IoT end-to-end encryption with E4 (3/n): automation component

Our previous post, introduced the architecture and components of E4, Teserakt’s encryption and key management for IoT systems. This post is about one of these components, the automation engine, which we’ll describe more in depth today.

Keys are not forever

Devices managed by E4 rely on symmetric or public-key cryptography to protect their communications. In both mode, each device has its own identity key, plus one one key for each topics (you can see a topic as a "conversation", as a data type, as a classification level, and so on). Since devices may be deployed for years, there must be processes to renew these keys remotely and securely. Motivations for rotating keys include:

  • Revocation of a key that has possibly been compromised, or is not to be trusted for various reasons

  • Provide forward secrecy, that is, guarantee that if a key is compromised at some point, then earlier communications (with a different key) cannot be decrypted

  • Provide backward secrecy, or post-compromise security, for example in order to guarantee that, would some topic key be compromised, future topic keys remain secret, and therefore content protected with these keys remain protected.

Such key rotation may be done manually, or partially manually by creating some scripts, creating a custom network service, and so on, and updating the script manually when needed. This is a cumbersome and error-prone procedure that is unlikely to scale and to prove reliable in critical environments. We therefore propose an automation system that is simple to use (both through a graphical UI and scripts), and that easily scales to many topics and many messages.

The E4 automation engine address exactly this problem. It automates the rotation of any E4 keys by defining rules following a simple grammar, depicted below:

In the context of our key rotation automation, we then use following terms:

  • Rule: a rule is the main component of the engine. It holds a list of triggers and targets controlling when the key rotation will happen, and which device or topic will be updated.
  • Trigger: a trigger is a condition that must be fulfilled in order to execute a key rotation for all the parent rule’s target. A trigger can be a predefined period of time, or a number of system events, such as a threshold of clients joining or leaving a certain topic.
  • Target: a target designates either a client device or a topic, for which the key will be updated when one of the rule’s triggers will have its condition fulfilled.

When to rotate keys

We currently support basic use cases such as:

  • Rotating a device or topic key at a fixed time interval
  • Rotating a topic key after a certain number of clients joined or left this topic

Those cover the most common scenarios requiring a key rotation. We’ve also carefully designed our engine and rules format to be flexible and easily extensible. If you have any other use cases which are not actually covered, we’d be happy to hear about them!

We’re often asked what’s the right time interval for key rotation. There is no single right answer, for it depends on a number of factor including the threat model and risks, the network reliability, the cost of sending control messages, etc.

How it works

As seen in our previous architecture post, the automation engine extends the C2 server functionalities. When enabled, it starts an internal scheduler that monitors the time-based rules, and also registers a channel on the C2 server to receive system events. On each events, the automation engine will check from the defined rules if any triggers are due, and request a key rotation for each of the rule’s targets on the C2. The trigger state then reset and is waiting again for its conditions to be met to fire again.

Client key rotation

A client’s key is its more critical key, and should ideally be better protected than topic keys, which are shared with other devices. In the symmetric key mode, the client’s key is also known to the C2, and used to protect control messages sent to that client. One of thes commands supported by control messages is SetIDKey, which offers a new client key to a remote device.

When a client key is rotated, the C2 server thus issues a SetIDKey command, containing a newly generated key, and protected with the current client key. The command is then transmitted to the client via the MQTT broker, by publishing it on the client control topic.

In the public-key mode, the C2 only knows clients’ public keys, not their private keys. Therefore client keys can’t be rotated in the public key mode.

Topic key rotation

When a topic key is rotated, the C2 server first generates a new random key, before issuing a SetTopicKey command for every subscribed clients on this topic. Each command is then protected, using the client key of its recipient, before transmitting on each client control topics.

When a topic has many subscribed clients, there might be a significant delay between the first and last SetTopicKey commands transmission (and thus reception). Some clients may thus use the old key while others are already using the new one, preventing the decryption of messages in each way. To avoid message loss, we thus need a key transition mechanism, which we implemented by defining a configurable grace period.

Grace period

The grace period is a client parameter, which defines the duration during which both the old and new key can be used, starting from the time when a new topic key is received. The client will store the newly received key, and keep the old one for a configurable amount of time. The devices then does the following:

  • When receiving a message, it first tries to unprotect it with the new key. If this fails (as indicated by invalid authentication tag), the client then tries with the old key if it is still within the validity period.

  • When transmitting a message, the new key is always used. This might prevent some devices from reading the message, however this is safest behavior in case the old key was compromised. Furthermore, using the old key would also prevent messages to be decrypted by clients whose grace period has passed.

Again, when we’re asked what’s the best grace period value, our answer is that it depends on several factors, including the number of clients subscribed to a topic, how often keys are rotated, the network’s latency and reliability, and so on. Ideally, the grace period should be chosen after analyzing empirical data. The best value of the grace period is one that sufficiently minimizes messages loss, while minimizing the amount of messages encrypted using the old key (thus, the shorter the period, the safer).

Conclusion

Key management, and more particularly secret rotation, can be a tedious task. With E4, we try to make it transparent, fail-safe, and easy to configure and operate. You can configure your device and topic keys rotation, either from the dedicated page on the web console, or from the command-line client, allowing to create or view active rotation policies.

You can try it yourself, and start rotating keys using our e4 client, with binaries for common architectures available on the release page. Automation rules can then be created from the demo console.

IoT end-to-end encryption with E4 (2/n): architecture and integration

The previous post in the series introduced the E4 protocol, which brings end-to-end encryption to IoT networks in a scalable way. We now describe how simple and unobtrusive E4 is to integrate in an existing infrastructure. We’ll start with a basic example of an IoT architecture, present the limitations in terms of data security, and then introduce E4 components and their integration.

An typical IoT architecture

A common industrial IoT architecture includes a multitude of connected devices, reporting their data such as measurements to a processing back-end. In the example depicted below, the MQTT transport protocol is used, such that devices and the back-end connect to an MQTT message broker in order to exchange messages according to a publish/subscribe logic.

Because devices directly publish their data as cleartext messages, this data is exposed to the message broker, which can read its content, modify it, and create fake messages on behalf of the devices. This can happen in different scenario, for example when:

  • The broker is operated by a third party, such as a cloud operator
  • The broker is compromised by an attacker, who then controls its operation
  • The broker database is compromised, and therefore all its messages

Other network components (routers, gateways, etc.) can also do the same if the connection between devices and the broker is not protected.

In other words important security properties are missing in this architecture:

  • Data authenticity and integrity of messages
  • Data confidentiality

The same concerns apply when data is sent to the devices, such as firmware or configuration updates, as well as when devices communicate with each other.

E4 integration

The E4 solution aims to protect IoT data without making any change to the message broker (or similar component), and in a way that it mostly independent of the transport protocol (in this example, MQTT). As you can see in the image below, our server component (C2) connects to the message broker like any other device, and other network components (devices and back-end) integrate our software library in order to support the E4 security layer. The subsequent sections explain how this integration is done.

Terminology

We defines a few terms:

  • E4 clients (or just "clients"): In E4, clients are anything able to publish or receive messages. In our examples, the devices, sensors or processing backend are all identified as clients. E4 clients must each have a unique identifier, and a secret key.
  • C2 server (or just "C2"): The back-end component of E4, which is seen by the network as just another device. C2 sends control messages to E4 clients and performs security monitoring operations, among others.
  • Topics: MQTT topics are the message queues where messages will be published and read from. In E4, topics can be associated with a secret key, which clients will use to protect messages tagged with this topic.
  • Protect/Unprotect: Instead of just encryption/decryption, E4’s security layer can also provide integrity, authenticity, and replay protection. That’s why we refer to the message transformation as "protection" rather than only encryption.

Modifications to clients

To integrate E4, the main modifications take place at the applications level, which need to send or receive messages on the clients. They need to integrate the E4 library in order to:

  • Protect a message before publishing it. This ensure that intercepted messages can’t be read nor modified by unauthorized parties.
  • Unprotect a message after receiving it. This allow to retrieve the original message payload from a protected message.
  • Subscribe to the device’s control topic. This allow the C2 server to send control messages to this unique client.
  • Store an identifier and related key material.
  • Process control messages sent by the C2.

C2 integration

The C2 is E4’s key management server, whose main roles are to:

  • Keep track of client devices and their associated cryptographic keys.
  • Identify communication topics, and their associated cryptographic keys.
  • Send control messages to the clients.
  • Perform analytics to monitor the network.

The C2 exposes an API (gRPC or HTTP) allowing to register all your clients and topics, and securely provision keys to devices in a manual or automated way. To do so, it only needs to reach the MQTT broker, as a normal client (ideally with MQTT QoS parameter set to 2).

Besides its core service, C2 provides additional and optional services:

  • A simple web-based UI, to manage clients and topics manually. See our public demo https://console.demo.teserakt.io.
  • A command-line tool (c2cli), allowing scripting and batch processing for your clients and topics management.
  • An automation engine to define, manage and automate key rotation policies (which will be presented in a future post in this series).

As depicted above, additional services directly connect to the core C2 service, meaning there is still no change to make to your existing infrastructure.

C2 database

The C2 can use popular SQL RDBMSs (such as MySQL, PostgreSQL, SQLite) to store your devices and topics. All the cryptographic keys are stored encrypted, and only decrypted when needed, that is, when transmitting them to clients in control messages.

In its simplest setting, database records are encrypted using a key controlled by the C2 service. Depending on the security needs and available technology, the following components can be used to manage the database encryption key:

  • HSM via PKCS#11 interface
  • Dedicated key management service such as Hashicorp’s Vault
  • Cloud-based key management services such as AWS KMS

Control messages

In order to control each devices internal state, such as their identity key (used by the C2 server to encrypt the device’s control messages), or the list of topic keys (used to encrypt or decrypt the data transmitted or received by the client), the C2 service emit control messages, which can be of one of the following types:

  • SetTopicKey to set a key for a given topic, allowing the device to read or publish messages on this topic.
  • RemoveTopic to remove a topic key from the client, preventing the device to read or publish messages on the topic.
  • ResetTopics to remove all the topic keys stored by th client.
  • SetIDKey to set the device identity key, required to process control messages.

As we’ll describe in a future post, E4 support symmetric-key and public-key cryptography modes. In the public-key mode (which is the safest), the following messages are used:

  • SetPubKey to add the public key of another device in the local storage, allowing to authenticate the message sender.
  • RemovePubKey to remove a stored public key from the client’s storage.
  • ResetPubKeys to remove all the public keys stored by the client.

Control messages are protected using the current device key, and published by the C2 on the device’s control topic (created from the client unique identifier we mentioned above).

The processing of control messages is implemented in the client software, as you can see for example in E4’s Go client.

Summary

E4 allows you to protect your IoT data with minimal changes required, and a short integration and testing period. E4 does not require any change to the message broker, nor additional hardware or software capabilities on the clients.

E4 is also robust and resilient to outages: although the C2 services are designed for scalability and high-availability, devices can still communicate reliably if these are unavailable or are temporarily down for maintenance..

If you’re interested in learning more or have any question, feel free to contact us!

Links

IoT end-to-end encryption with E4 (1/n): it's open-source!

We’re thrilled to announce a major milestone for Teserakt, with the launch of our flagship product E4 as open-source software.  We want to make Teserakt’s IoT protection solution as easy to try as possible, and available to any user from hobbyists to large corporations. This is why our client library is now free to use, enabling anyone to integrate strong end-to-end encryption into their applications.

We go even further: Although our server is not open-source, we run a public demo version of our web interface, available without any registration or email identification. Using this demo in combination with our open-source client application, anyone can test our key management server with no effort.

E4 is the product of two years of research, interacting with customers from industries such as aerospace, agriculture, automotive, energy, healthcare, and physical access, to understand real applications’ security needs and technical constraints. E4 is also ideal for mobile applications, to protect personal or other sensitive data.

This is the first of a series of posts about E4, its architecture, components, and internals, which we’ll be publishing during the coming weeks. Meanwhile, technical and commercial information about E4 is now available on E4’s page.

The Q&A below gives a general overview of E4, before more details in the upcoming posts. Don’t hesitate to contact us directly at [email protected] if you have specific questions or would like access to a private instance of the E4 server.

What is E4?

E4 is Teserakt’s innovative data protection software solution for IoT networks, consisting of 

  • A client library, used to encrypt and decrypt data, and 
  • A key server to manage devices rights manually or automatically.

What exactly is now open-source?

We open-source E4’s client library, available in the C and Go languages. It can be integrated into a wide range of embedded platforms, as well as mobile applications or cloud services, in order to enable end-to-end data protection.

E4’s key server is not open-source, but is not required to test the client library’s capabilities. Without the server, the client library can be used with fixed keys and without management and monitoring features.

Do I need the server for my application?

E4’s key server is used to manage devices remotely in order to provision keys, grant and revoke rights to devices, ensure perfect forward secrecy, automate key rotation policies, monitor message for anomalies, and so on.

Enterprise, production deployments will likely need the server, while personal projects and prototypes can work with only the client library. The server is offered on a commercial basis along with technical support, software maintenance, and extra features integration.

What network protocols do you support?

The client library is agnostic of the network protocol, as it works on top of the network layer in order to encrypt/decrypt data, and to process control messages from a server.  

Our server works by default with MQTT (the IoT protocol used by AWS’ IoT Hub and Google Cloud IoT platforms), and can be configured to work with other protocols such as Kafka, AMQP, Zero-MQ. To know if we support your protocol, please contact us at [email protected].

How to try it?

It takes less than 5 minutes! To try E4 without writing your own application, we created a simple interactive client application that you can use in combination with our public demo server interface. You can directly download the client’s binary for your platform or build it yourself, and then follow the instructions.

What cryptography does E4 use?

E4 can run in two modes, symmetric or public-key cryptography. The symmetric mode is optimized for the most constrained platforms, and only uses AES-SIV and SHA-3. The public-key mode uses in addition Ed25519 signatures and X25519 key exchange. Details of our crypto protocols will appear in a subsequent post. Note that we also support a cipher suite including only FIPS 140-2 primitives.

At what address can I find… ?

Lightweight crypto standards (sorry NIST)

NIST’s running a Lightweight Cryptography project in order to standardize symmetric encryption primitives that are lighter than the established NIST standards (read: AES and its modes of operations). NIST claims that “[because] the majority of current cryptographic algorithms were designed for desktop/server environments, many of these algorithms do not fit into constrained devices.” This is good motivation for the competition, however it’s factually incorrect: AES today fits in almost all IoT-ish chips and has even been used for bus and memory encryption.

But that’s not the point of this post—for more on that subject, come hear our talk at NIST’s Lightweight Cryptography Workshop in about 10 days, a talk derived from one of our previous posts, and based on a multitude of real examples. (See also this recent Twitter thread.)

Sorry NIST, this post is not about you, but about another standardization body: ISO, and specifically its ISO/IEC 29192 class of lightweight standards within the 35.030 category (“IT Security – including encryption”). This category includes no less than 9 standards relating to lightweight cryptography, which are (copying verbatim from the ISO page, links included):

ISO/IEC 29192-1:2012Information technology — Security techniques — Lightweight cryptography — Part 1: General90.93ISO/IEC JTC 1/SC 27
ISO/IEC 29192-2IT security techniques — Lightweight cryptography — Part 2: Block ciphers60.00ISO/IEC JTC 1/SC 27
ISO/IEC 29192-2:2012Information technology — Security techniques — Lightweight cryptography — Part 2: Block ciphers90.92ISO/IEC JTC 1/SC 27
ISO/IEC 29192-3:2012Information technology — Security techniques — Lightweight cryptography — Part 3: Stream ciphers90.93ISO/IEC JTC 1/SC 27
ISO/IEC 29192-4:2013Information technology — Security techniques — Lightweight cryptography — Part 4: Mechanisms using asymmetric techniques90.93ISO/IEC JTC 1/SC 27
ISO/IEC 29192-4:2013/AMD 1:2016Information technology — Security techniques — Lightweight cryptography — Part 4: Mechanisms using asymmetric techniques — Amendment 160.60ISO/IEC JTC 1/SC 27
ISO/IEC 29192-5:2016Information technology — Security techniques — Lightweight cryptography — Part 5: Hash-functions60.60ISO/IEC JTC 1/SC 27
ISO/IEC 29192-6:2019Information technology — Lightweight cryptography — Part 6: Message authentication codes (MACs)60.60ISO/IEC JTC 1/SC 27
ISO/IEC 29192-7:2019Information security — Lightweight cryptography — Part 7: Broadcast authentication protocols60.60ISO/IEC JTC 1/SC 27

Granted, few people and industries care about ISO/IEC standards, and you probably don’t, neither do we really to be honest—the fee around $100 to access the ISO standard documents probably doesn’t help in making ISO algorithms more popular.

We nonetheless think that it would be worthwhile to have a list of lightweight symmetric algorithms (whatever your weight metric) that received the blessing of allegedly some competent cryptographers, and therefore that are presumably safe to use. We’ve therefore reviewed what ISO has to offer so that you don’t have to, and summarize the fruit of our research in the remainder of this post:

Block ciphers

ISO/IEC 29192-2:2012 (thus, from 2012) standardizes these two block ciphers (literally just the block ciphers, and not any mode, which are covered in another ISO standard [nothing surprising here: ECB, CBC, OFB, CFB, CTR]):

  • PRESENT, anagram and little sister of SERPENT, designed in 2007, and the least Google-friendly cipher ever. Then marketed as ultra-lightweight, PRESENT is a simple substitution-permutation network with 4-bit S-boxes, which is more hardware-friendly than it is software-friendly, but will perform well enough almost everywhere. It has 64-bit blocks and a key of 80 or 128 bits.
  • CLEFIA, also designed in 2007, from Sony, initially aimed for DRM applications, is a boring Feistel scheme with 128-bit blocks and a key of 128, 192, or 256 bits. CLEFIA isn’t really more lightweight than AES, and I guess Sony lobbied for its standardization.

Stream ciphers

ISO/IEC 29192-3:2012 standardizes these two stream ciphers (again, one bit-oriented and one byte-oriented):

  • Trivium, one of the winners of the eSTREAM project, the favorite cipher of the DEF CON conference, is a minimalistic, bit-oriented stream cipher that is a simple combination of shift registers. Trivium is arguably the lightweightest algorithm in this post.
  • Enocoro, whose existence I had completely forgotten until writing this post. Designed by the Japanese firm Hitachi, Enocoro was submitted to the CRYPTREC contest in 2010 (see this cryptanalysis evaluation), yet ended up not being selected. Enocoro is a byte-oriented feedback shift register with a conservative design. It supports 80- and 128-bit keys, and is most probably safe (and safer with 128-bit keys).

Hash functions

This time not two but three algorithms, as defined in ISO/IEC 29192-5:2016:

  • PHOTON, designed in 2011, combines the simplicity of the sponge construction and the security guarantees of the AES permutation (no, Joan Daemen is not a designer of PHOTON). The permutation is optimized for low-end platforms, so you can’t directly reuse AES code/silicon. PHOTON comes in multiple versions, depending on the security level you want (the higher the security, the bigger the state).
  • SPONGENT, also from 2011, is to PRESENT what PHOTON is to AES. Nuff said.
  • Lesamnta-LW is different. It’s not a sponge but has a SHA-like round function and targets 120-bit security with a single version. It’s also less light than the above two.

MACs

Of the three algorithms listed in ISO/IEC 29192-6:2019, I was only familiar with one Chaskey, which is essentially a 32-bit version of SipHash. The other two are

  • LightMAC, which is actually a MAC construction rather than strictly speaking an algorithm. And since I didn’t pay the 118 CHF to read the full document, I don’t really know what ISO standardized here (the mode itself? an instantiation with a specific block cipher? Help please.)
  • “Tsudik’s keymode”, apparently from this 1992 paper by Gene Tsudik, which discussed the secret-prefix and secret-suffix MACs and proposes the hybrid version, which may or may not be what this “keymode” is about. I’ve no idea why it ended up being standardized in 2019.

AEAD

Nothing here, ISO hasn’t standardized authenticated ciphers.

Summary

All these algorithms are okay, but their being ISO standards doesn’t say much and doesn’t mean that they’re necessarily superior to others, just that some people bothered submitting them and lobbying for their standardization.

How to lock a GitHub user out of their repos (bug or feature?)

We accidentally discovered this while working with our friends from X41, when GitHub denied me access to my private repositories; for example git pull of git clone would fail and write “ERROR: Permission to <account>/<repo>.git denied to deploy key”. I asked for help but nobody found the root cause.

Here’s the problem: GitHub recently introduced deploy keys, or keys granting read-only right to a server to best automate deployment scripts. This is a great feature, however it can be abused as follows to lock a GitHub user out of their repositories:

  1. Find one or more SSH public keys belonging to the victim, but not associated to their GitHub account. For example, you may happen to know SSH keys associated to another project, or you may use public keys from https://gitlab.com/username.keys (comparing with https://github.com/username.keys).
  2. Add these SSH public keys as deploy keys of one of your GitHub projects (in Settings > Deploy keys > Add deploy key). Let’s say this project is at github.com/attacker/repo.
  3. When connecting to GitHub over SSH, the victim will then be identified at attacker/repo (for example when doing ssh -T [email protected]), if and only if at least of the keys added as deploy keys is prioritized by SSH over any key linked to the GitHub account.

For example, if you have private key files id_ecdsa_github and id_ecdsa_gitlab in your ~/.ssh/ directory, and if SSH offers the public key id_ecdsa_gitlab.pub first, and if the attacker has added that key as deploy key, then GitHub will identify you as attacker/repo when you’ll try to connect to one of your repositories, thereby denying you access to your private repositories (and only granting read access to public ones).

The way SSH prioritizes key can be found in its code, and also varies depending on whether you use ssh-agent (typically, when caching passphrase-protected private keys).

This behavior of GitHub and SSH can therefore be exploited to lock GitHub users out of their private repositories (via command-line SSH), and arguably qualifies as a denial-of-service attack. Note that it doesn’t require any action from the victim, and only needs public information (if we assume that public keys are public). We’ve lost one afternoon of work investigating the issue, being involuntarily attacked by X41. After understanding what happened, we asked them to remove our (freshly generated) key from their repo’s deploy keys, which indeed solved the issue.

You can argue that it’s the responsibility of users to properly configure their ~/.ssh/config, which of course avoids the attack, and that that behavior is acceptable. This is presumably the opinion of GitHub, since they responded to our bug bounty entry that “it does not present a security risk”. GitHub probably has good reasons to ignore the risk, but from our perspective the potential annoyance and DoS risk to GitHub users is not be negligible.

The problem can probably be fixed, although it’s not be straightforward, since SSH authentication would have to depend not only on the host name, but also on the repository accessed (whereas access control to a repository is typically enforced after SSH authentication).

Update: As @stuartpb noticed, this behavior can also be exploited by adding a victim’s public keys to a user account (new or repo-less), and is not specific to deploy keys.

Cryptography in industrial embedded systems: our experience of the needs and constraints

A common model is that industrial embedded systems (a.k.a. IoT, M2M, etc.) need small, fast, low-energy crypto primitives—requirements often summarized by the “lightweight” qualificative. For example, a premise of NIST’s Lightweight Cryptography standardization project is that AES is not lightweight enough, and more generally that “the majority of current cryptographic algorithms were designed for desktop/server environments, many of these algorithms do not fit into constrained devices”—note the implicit emphasis on size.

In this article we share some observations from our experience working with industrial embedded systems in various industries, on various platforms and using various network protocols. We notably challenge the truism that small devices need small crypto, and argue that finding a suitable primitive is usually the simplest task that engineers face when integrating cryptography in their products. This article reviews some of the other problems one has to deal with when deploying cryptography mechanism on “lightweight” platforms.

We’re of course fatally subject to selection bias, and don’t claim that our perspective should be taken as a reference or authoritative. We nonetheless hope to contribute to a better understanding of what are the “constrained devices” that NIST refers to, and more generally of “real-world” cryptography in the context of embedded systems.

Few details can be share, alas, about our experience. Let us only say that we’ve designed, integrated, implemented, or reviewed cryptographic components in systems used in automotive, banking, content protection, satellite communications, law enforcement technology, supply chain management, device tracking, or healthcare.

AES is lightweight enough

Most of the time. Ten years ago we worked on a low-cost RFID product that had to use something else that AES, partially for performance reasons. Today many RFID products include an AES engine, as specified for example in the norm ISO/IEC 29167-10.

Systems on chips and boards for industrial application often include AES hardware, and when they don’t a software implementation of AES comes at acceptable costs. For example, chips from the very popular STM32 family generally provide readily available AES in different modes of operations.

An example perhaps of a very low-cost device that uses a non-AES primitive would be the Multos Step/One, which is EMVCo-compliant. In this case, 3DES was chosen instead. ATM Encrypting-PIN-Pads frequently use 3DES. We believe that the continued use of 3DES has more to do with compatibility and cost of replacement than the prohibitive cost of running a wider block cipher.

Of course an algorithm more compact and faster than AES wouldn’t hurt, but the benefits would have to justify the integration costs.

Choosing primitives is a luxury

More than once we faced the following challenge: create a data protection mechanism given the cryptography primitives available on the devices, which could for example be AES-GCM and SHA-256 only. We may have to use AES-GCM and SHA-256 because they’re standards, because they’re efficiently implemented (for example through hardware accelerators), or for the sake of interoperability. Note that a platform may give you access to AES-GCM, but not to the AES core directly, so you can’t use it to implement (say) AES-SIV.

If you want to use another algorithm than the ones available, you have to justify that it’s worth the cost of implementing, integrating and testing the new primitive. AES-GCM is not perfect (risk of nonce reuse, etc.), but the risk it creates is usually negligible compared to other risks. The situation may be different with AES-ECB.

Statelessness

Not all platforms are stateful, or reliably stateful. This means that you can’t always persistently store a counter, seed, or other context-dependent on the device. The software/firmware on the platform may be updatable over-the-air, but not always. The keys stored on the device may not be modifiable after the personalization phase of the production cycle.

Randomness

The platform may not offer you a reliable pseudorandom generator, or it may only have some low-entropy non-cryptographic generator, anyway that can be a severe limitation, especially if you realize this after proudly completing an implementation of ECDSA.

There are well-known workarounds of course, such as deterministic ECDSA and EdDSA for ECC signatures, or AES-SIV for authenticated encryption (this robustness to weak/non-randomness is the main reason why we chose to make it the default cipher in our company’s product). But sometimes it can get trickier, when you really need some kind of randomness yet can’t fully trust the PRNG (it’s more fun when the platform is stateless).

It’s not only about (authenticated) encryption

When no established standard such as TLS is used—and sometimes even when it is—the security layer is typically implemented at the application layer between the transport and business logic. (Authenticated) encryption is a typical requirement, but seldom the only one: you may have to worry about replay attacks or have to “obfuscate” some metadata or header information, for example to protect anonymity.

In an ideal world, you ought to use a thoroughly-designed, provably-secure, peer-reviewed protocol. But 1) such a thing likely doesn’t exist, and 2) even when it does it would probably not be suitable to your use case, for example if the said protocol requires a trusted third party or three network round-trips.

Message size limitations

True story: “Our clear payload is N bytes; the protected payload must be N bytes too, and must be encrypted and authenticated.” That’s when you have to be creative. Such a situation can occur with protocols such as Bluetooth Low-Energy or WAN protocols such as LoRaWAN or Sigfox (where the uplink payloads are 12 bytes and downlink payloads 8 bytes).

Even when you can afford some overhead to send a nonce and a tag, this may come at prohibitive cost if we’re talking of millions of messages and a per-volume pricing model. In other contexts, additional payload size can increase the risk of packet loss.

Network unreliability

It’s not just about TCP being reliable (guaranteed, in-order packet delivery) and UDP being not. Protocols running on top of TCP can have their own reliability properties caused by the way they transmit messages. For example, MQTT (when running over TCP) guarantees message delivery, but not in-order.

Whatever protocol is used, devices may have no way to transmit nor receive messages for a certain period of time. For example, communicating with satellites in non-geostationary orbit, or devices that are out of range for periods of time, such as aircraft, ships or smart meters as a measuring driver passes by.

An excellent engineering question is to ask how one would reliably transmit data from Mars, particularly where that data should be processed as quickly as possible on receipt. If not Mars, then further away. At such great distances, what is instantaneous for us starts to take seconds or minutes of time. The answer is to resend on a broadcast channel repeatedly, usually as an illustrative example of UDP/broadcast protocols. In these cases, packets are expected to be lost and round-trips are impossible—there is no way to confirm receipt.

Such limitations often prevent the use of crypto protocols adding RTTs, or requiring even a moderate level of synchronization with other devices. Unreliable network becomes particularly fun when implementing key rotation or key distribution mechanisms.

When crypto is too big

Crypto can be too big (in code size, or RAM usage) for certain platforms; the main cases we’ve encountered are when public-key operations are impossible, or when a TLS implementation takes too much resources. Although there are good TLS implementations for constrained platforms (such as ARM’s mbedTLS, wolfSSL, or BearSSL), they may include a lot of code to support the TLS standards and operations such as parsing certificates. Even the size of a TLS-PSK stack can prohibitive—and not because of AES.

Sometimes public-key cryptography is even possible within the limited capacity of the device, but the limiting factor is the protocol. An example of such a problem can be found in BearSSL’s documentation. Quoting Thomas Pornin’s TLS 1.3 Status, when streaming ASN.1 certificates, the usual order is end-entity first, CAs later. In any given object, the public key follows the certificate. EdDSA combines the public key and data when signing. Thus, in order to validate a signature, the entire certificate must be buffered until the public key can be extracted.

Now while a highly constrained device may simply use PSK and avoid this problem entirely, it is also true that the device may be capable of Ed25519 signatures even without sufficient RAM to buffer large certificates. This problem arises entirely from the choice of PureEdDSA rather than HashEdDSA in TLS 1.3.

Untrusted infrastructure

More often than you might expect, the infrastructure we use should not be trusted. In the MQTT context, this means brokers. In other context this means wireless repeaters, conversion gateways between protocols such as communication via SMS and so on. In the context of currently proposed IoT standards, these nodes are often assumed trusted and capable of re-encrypting for each hop using TLS where possible, or some other point-to-point protocol.

We believe that the implicit trust in the infrastructure by having it handle keys invites a far greater risk than the challenges of underpowered devices.

Conclusions

Cryptography on constrained devices can pose many problems, but the speed and size of symmetric primitives (ciphers, hash functions) is rarely one, at least in our experience (YMMV).

We can’t ignore that economics and risk management play into cryptography. Standard NIST cryptography primitives and NSA Suites A & B, for example, were designed to provide the US Government with an assurance that data is protected for the lifetime of the relevant classified information—on the order of magnitude of 50 years. It took time for the community to gain confidence in AES, but it’s now widely and globally trusted—anyway, safe block ciphers are easy to design; even DES nor GOST have never really been broken.

Lightweight cryptography might be suitable where such expectations of long-term security do not hold, and would allow the use of a very “lightweight” component. An extreme example is that or memory encryption, or “scrambling”, where only a handful of high-frequency cycles can be allocated.

The open question is whether we can design algorithms to match this requirement, bearing in mind that we have no ability to predict future developments. Looking back at history, requirements are driven by applications on which the public research community has little view. As highlighted in this article, said requirements often involve various components and technologies, which make the engineering problem difficult to approach to outsiders.

E4 vs. (D)TLS #IoTencryption

Today at Teserakt we discussed the benefits of using our end-to-end encryption protocol E4 instead of TLS with pre-shared keys (PSK), as sometimes used on low-end devices that don’t use public-key cryptography. The discussion started after a call with a start-up that is specialized in low-power WAN networks and uses DTLS with PSK for protecting data sent to and received from devices. We thought we would write this quick post to share our thoughts and encourage readers to share their experience with similar protocols.

The following points (in arbitrary order) are the main benefits of E4 that we identified, and in our experience cover some of the most important problems encountered when attempting to deploy encryption on low-power devices:

  • An application layer protocol and can therefore go over other protocols, properly end-to-end, whereas (D)TLS is not end-to-end however you cut it.
  • “No ridiculously complex standards (TLS requires full ASN.1 parser)”
  • Remains secure if the device has neither a PRNG nor a clock.
  • Simpler key management (no PKI, X.509, etc.).
  • 0-RTT; E4 doesn’t need to perform a handshake mechanism to start sending encrypted data.
  • Much smaller code and RAM footprint. In particular no need to allocate MBs of memory to process certificate chains unlike TLS.

A last remark: as an alternative to (D)TLS for low-end platforms we’ll be evaluating the Noise family of protocols and in particular Rust implementations optimized for ARM-based chips. More on this in a future post 🙂