IoT end-to-end encryption with E4 (4/n): security model

Is your communication secure? This sounds like a simple question, but a closer look at any communication protocol, boundary or permissions system will show it is anything but simple. You will often see more seasoned security professionals, before implementing security controls, talk about a threat model. More simply put: whether something is secure or not depends on your assumptions and expectations.

Threat modelling

We will use two examples to introduce the concept of threat modelling in the context of encryption applications.

BEAST is an attack discovered several years ago that could decrypt data under specific conditions by misusing the TLS protocol. There are many good summaries and explanations of this attack online, so we will not repeat the details here. The important point is that the capabilities of an attacker were never really put forward, beyond a vague interception capability. More precisely, the model overlooked risks unique to the CBC mode of operation. We thus had higher expectations of the protocol than it actually provided.

Let us pick another, sometimes controversial example. Signal is a state-of-the-art end-to-end encrypted messaging service that requires accounts be bound to a phone number. This could be considered problematic if you expect Signal to hide who you are: you need to acquire a SIM card not associated with your identity, which is difficult in many European countries. You may also have more benign reasons: you might not want to give out a number you also use for WhatsApp and on which people can call you. However, if you start from the assumption that either this is not Signal’s concern, or that your adversary always knows who you are, then Signal suddenly looks much better. It depends on your perspective and your expectations.

The reasoning behind the E4 security model

When we set out to design E4, we set ourselves the goal of bringing state-of-the-art end-to-end encryption to the internet of things; or “Signal for IoT” if you like, though technically very different from Signal. However, embedded devices have a different set of constraints than mobile devices or computers. There is the obvious, namely that they tend to have a less powerful processor and much more limited memory. They may even run on battery. Less obvious is perhaps the fact that the device might be a sensor or other remote device, so human interaction to fix issues or accept key rotations may either be prohibitively expensive or outright impossible.

The Signal protocol allows you to verify the identity keys of the party you are talking with by comparing fingerprints. Once done, you can be sure that you are exchanging messages with the correct party – any change will be detected. Further, thanks to the double ratchet, your messages have “post-compromise security” – if the key for a single message is compromised, the damage is limited to a few messages, not the entire conversation. Signal also does not require heavy parsing of complicated data formats like ASN.1. It is widely recognized as the state of the art in messaging security.

It may therefore seem logical to adopt Signal directly for embedded and IoT devices. Unfortunately, given the constraints described above, it is not quite that straightforward. What we want, therefore, is to get as close to the level of security provided by Signal as is possible given the constraints of the embedded world. Also note that, instead of just “encryption”, we often prefer to talk of data protection, which covers other security aspects such as authenticity, integrity, and replay protection.

A high level view of requirements

What specifically do we want? After much research and discussion with our partners and customers, we came up with a list of requirements for E4. We will begin with some of the more obvious ones:

  • Data must be protected from its producer to its consumer(s), keeping in mind that there may be one or many data consumers and that this set may change over time.
  • Devices may be remote and must be autonomous. There may be no capacity for a human to validate keys or otherwise take decisions on the devices.
  • A fleet, or perhaps multiple fleets, of devices is managed by a single organization. This organization’s devices may have to interoperate with legacy systems, and potentially external systems.
  • The organization has the capacity to deploy a management component, but may otherwise use infrastructure it does not trust; in particular, it wishes to trust as little infrastructure as possible.
  • The network between devices and from device to any controlling servers may cross multiple protocols. For example, the first “hop” may be LTE (“4G”) or satellite for some devices, followed by the public cloud.
  • Devices don’t necessarily address messages to another device or entity, but instead may tag messages with a “topic”, such that all clients subscribed to that topic will receive the message.

Some consequences of these choices are immediate in our design. In order to trust the minimum possible infrastructure, we made the choice to provide application layer end-to-end encryption – as opposed to a transport-specific protocol. This means our cryptography can be adapted to the protocol in question and, if necessary, wrapped in multiple different protocols. This also means only the communicating devices, the producer and its consumers, are able to read the message.

The lack of operator/human interaction also implies keys need to be managed. As a result, we engineered a command-and-control server for this purpose, which we call C2 for short. The server needs to be deployed somewhere with sufficient security controls, but otherwise acts as a client within the network and does not need to remain online continuously.

The C2 occupies a privileged position in our protocol. As will be discussed later, in some deployments it will possess secret keys for the devices, and in all deployments it replaces the human operator in deciding what keys to trust, when to change them and so on. It then sends its instructions to each device over a channel secured uniquely for that device. We assume that the C2 is owned and operated by, or on behalf of, the same organization running the devices, and that the operators of the C2 are trusted. This is in contrast to the cloud and network infrastructure that might be used for communication, including brokers and gateways.

An obvious objection to this in a secure messenger scenario such as Signal would be that it amounts to key escrow, and the C2 can decrypt any messages on the network. This is true, but the scenario under consideration is different. In a secure messaging scenario, individuals may not be part of the same organization and may not completely trust each other, and as such key escrow is inappropriate. In the E4 model, human interaction with devices cannot be assumed: the device cannot ask a human to approve a fingerprint, verify another device, or join or leave a group. We must offload this work to a third party. We also make the assumption that all the devices belong to the same organization and that it is acceptable for that organization to have access to all keys of its own devices, should it desire.

We have talked about trust for a whole organization. It is of course entirely possible that an organization could run multiple, siloed C2s if they wished to enforce an entirely separate trust boundary. Nothing in E4 prevents this.

The challenges of embedded devices

Now we will look at the specific challenges posed by embedded devices.

  • Embedded devices may be capable of limited cryptographic primitives, or may only support a restricted set of algorithms. We will cover our specific cryptographic choices in a later post.
  • Devices may have very limited storage space. Within the constraint that something must be stored, E4 should still work.
  • We may not be able to rely on the clock on the device. Perhaps the clock is unreliable, or perhaps there simply is no clock. On the other hand, if a reliable clock exists we wish to exploit this fact.
  • The device may not have a cryptographic coprocessor or other “secure element”. Customers may for example be retrofitting older devices. In particular, while customers may be able to deploy new hardware based on hardware security solutions, we did not want to require this. On the other hand, if such a device exists, we would like to be compatible and able to exploit it.
  • The device may have poor to no entropy. This is a known problem with embedded devices whose complexity compared to desktop computers is much reduced and for which there is little environmental randomness to exploit, or doing so is incredibly slow.

Our desire to bring strong encryption to most embedded devices, even older ones, led to our decision to make E4 a software solution. We have of course not discounted the risk of hardware tampering and attacks and customers who wish to use components that protect against such risk may of course do so. For example, a platform can leverage AES accelerators or ARM TrustZone to make their E4 deployment more secure. We have chosen standard cryptographic primitives that are highly likely to be compatible with secure elements.

Limited storage space and limited cryptographic primitives led us to our most restrictive design: a symmetric-key only mode. There is a tradeoff here, of course, but we are still able to provide excellent levels of security.

Like most protocols, we wanted to provide anti-replay protection and we implement this using a time window during which messages may be accepted. If a clock is not considered reliable, we can use our protocol’s control messages for synchronization. If a clock does not exist, or this protection is not desired, it can be disabled.
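To make the time-window idea concrete, here is a minimal sketch in Python. It is not the E4 implementation: the class name, the 600-second window and the `sync` method are all hypothetical, and we assume the timestamp travels inside the authenticated part of the message so that an attacker cannot alter it.

```python
class ReplayWindow:
    """Accept a message only if its timestamp is close to the local time.

    Hypothetical sketch: names and the default window are illustrative.
    """

    def __init__(self, max_age=600, enabled=True):
        self.max_age = max_age   # allowed clock difference, in seconds
        self.enabled = enabled   # False for devices without any clock
        self.offset = 0.0        # correction learned from a control message

    def sync(self, reference_time, local_time):
        """Record the difference between a trusted reference time and the local clock."""
        self.offset = reference_time - local_time

    def accept(self, msg_timestamp, local_time):
        """Return True if the message falls inside the acceptance window."""
        if not self.enabled:
            return True  # replay protection disabled by configuration
        now = local_time + self.offset
        return abs(now - msg_timestamp) <= self.max_age


window = ReplayWindow(max_age=600)
print(window.accept(msg_timestamp=1000, local_time=1500))  # inside the window
print(window.accept(msg_timestamp=1000, local_time=2000))  # too old
```

Note that a time window on its own still lets an attacker replay a message while the window is open; a shorter window narrows that exposure at the cost of tolerating less clock drift.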

The lack of a reliable clock also introduces another problem: revocation. In the internet’s Public Key Infrastructure, a key (certificate) is associated with a window of validity (expiry date). There are documented problems with this approach, notably when keys must be revoked before that expiry date. We certainly cannot afford to process lists of revoked keys on constrained devices, and if the clock cannot be trusted then expiry itself becomes problematic. Luckily, we can simply take inspiration from the double ratchet and be blunt: when we want to expire a key, we replace it and flush it from memory.

The entropy problem has been studied in cryptographic research. Protocols that rely on certain parameters being random have been attacked when they were not, as with the PlayStation 3 ECDSA bug, where the per-signature nonce k was reused. This has led to deterministic designs, such as deterministic ECDSA for signatures and synthetic initialization vector (SIV) modes such as AES-SIV for encryption. These designs remain secure without the need to provide entropy to the cryptographic code. For key generation, the C2 takes over responsibility and is presumed to run on a server which has more than sufficient entropy – this is true of almost any normal server and does not impose any specific requirements.
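The synthetic IV idea can be illustrated with a toy construction. The sketch below is not AES-SIV and is not what E4 ships: to stay within the Python standard library it swaps in HMAC-SHA256 both as the MAC and as a keystream generator, and the function names are our own. It only shows the shape of the technique: the IV is computed from the key, the associated data and the plaintext, so the device never needs a random number, and the same IV doubles as the authentication tag.

```python
import hashlib
import hmac


def _keystream(key, iv, length):
    """Expand (key, iv) into `length` bytes using HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _synthetic_iv(mac_key, associated_data, plaintext):
    # Length-prefix the associated data so distinct (ad, plaintext) pairs
    # never produce the same MAC input.
    mac_input = len(associated_data).to_bytes(8, "big") + associated_data + plaintext
    return hmac.new(mac_key, mac_input, hashlib.sha256).digest()[:16]


def siv_encrypt(mac_key, enc_key, associated_data, plaintext):
    """Deterministic authenticated encryption: returns iv || ciphertext body."""
    iv = _synthetic_iv(mac_key, associated_data, plaintext)
    stream = _keystream(enc_key, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


def siv_decrypt(mac_key, enc_key, associated_data, ciphertext):
    """Decrypt, then recompute the IV and compare it to detect tampering."""
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(enc_key, iv, len(body))
    plaintext = bytes(c ^ s for c, s in zip(body, stream))
    expected = _synthetic_iv(mac_key, associated_data, plaintext)
    if not hmac.compare_digest(iv, expected):
        raise ValueError("authentication failed")
    return plaintext
```

Because the output is deterministic, encrypting the same plaintext twice under the same keys and associated data yields the same ciphertext, which reveals that the two messages are equal. Including a value that changes per message, such as a timestamp, in the input avoids this.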

Device communication

Our next step was to look at how devices actually communicate. There are two obvious cases: receiving firmware updates from the organization’s server, and submitting data back to a collection point.

However, we didn’t want to constrain our protocol to these two use cases and have to add fixes at a later date. Given the need for the C2 to manage keys on devices, we needed the control channel discussed previously; this should be separate from the normal message channels that devices may use. Devices may be subscribed to a single message channel in addition to their control channel, or they might be part of a group shared with other devices. Those mappings need to be changed dynamically, and we may wish to rotate keys based on various criteria that have been discussed in previous posts.

From the point of view of security constraints, this does not impose anything on us aside from one requirement: we need to be able to rotate keys for all channels, including the device’s own key, as needed.
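As an illustration of that requirement, here is a hypothetical device-side key store in Python. The command names (`set_topic_key`, `remove_topic`, `set_device_key`) are invented for this sketch and are not the E4 command set; we also assume the command has already been decrypted and authenticated on the device's control channel before `apply` is called. Replacing a key overwrites the old one, which is the same replace-and-flush approach we use in place of revocation lists.

```python
class DeviceKeyStore:
    """Hypothetical sketch: one device key plus one key per subscribed topic."""

    def __init__(self, device_key):
        self.device_key = bytearray(device_key)
        self.topic_keys = {}  # topic name -> bytearray key

    @staticmethod
    def _wipe(buf):
        # Best-effort zeroing; Python cannot guarantee no other copies exist.
        for i in range(len(buf)):
            buf[i] = 0

    def apply(self, command, topic=None, key=None):
        """Apply one already-authenticated control command."""
        if command == "set_topic_key":
            old = self.topic_keys.get(topic)
            if old is not None:
                self._wipe(old)              # flush the key being replaced
            self.topic_keys[topic] = bytearray(key)
        elif command == "remove_topic":
            old = self.topic_keys.pop(topic, None)
            if old is not None:
                self._wipe(old)
        elif command == "set_device_key":
            self._wipe(self.device_key)      # the device's own key rotates too
            self.device_key = bytearray(key)
        else:
            raise ValueError("unknown command: " + repr(command))


store = DeviceKeyStore(b"\x00" * 32)
store.apply("set_topic_key", topic="sensors/temp", key=b"\x11" * 32)
store.apply("set_topic_key", topic="sensors/temp", key=b"\x22" * 32)  # rotation
store.apply("remove_topic", topic="sensors/temp")                     # leave group
```

On a real embedded target the wipe would be done in C with a routine the compiler cannot optimize away; the Python version only conveys the intent.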

When we are not so constrained

Not all devices are so constrained. In particular, surprisingly small devices are capable of public-key cryptography, and post-quantum algorithms are being benchmarked on the ARM Cortex-M4.

Customers may wish to use public-key cryptography where it is available. They may also want to use post-quantum algorithms, and we should be ready to deploy these once standardized.

On the other hand, “cipher agility”, where protocols negotiate the cipher suite they will use, is inefficient and has documented downgrade problems. We wish to avoid this, and do so in the most trivial way possible: we have different versions of our protocol for these use cases, which customers may adopt as necessary.

The E4 model: summary

Putting all of this together, E4 is designed to meet the following constraints:

  • Devices in a single organization must be managed together and remotely, and include a key management solution that is easy to operate and scales to many devices and messages.
  • It is possible to deploy a command-and-control server in a suitably protected fashion.
  • The infrastructure used, aside from the C2 and communicating components, is minimally trusted. In particular, end-to-end encryption is required.
  • We wish to still be deployable in very constrained environments using only symmetric-key cryptography. Public-key and post-quantum support should nevertheless be possible if desired, with the choice made according to risk.
  • We wish to be able to retrofit and deploy inexpensively in software, while remaining compatible with and ready to support state of the art hardware protection with minimal effort.
  • We cannot necessarily trust the clock on the device. If one is present we work around this lack of trust; if not we may degrade the replay protection gracefully.
  • We cannot rely on randomness on the device. The managed keys approach allows us to mitigate this for key generation, and appropriate choice of cryptographic algorithms allows us to remove our reliance on this requirement elsewhere.
  • We wish to be able to rotate and replace keys flexibly and simply, without reliance on revocation schemes and expiry dates that have proven unreliable.

Are the communications of devices secure with E4? We believe they are. Messages cannot be modified or read by any intermediate network node such as a message broker; only the end devices can decrypt. We are able to rotate and replace keys flexibly, limiting the window of exposure and allowing a reasonable level of forward and backward secrecy. Except in very constrained cases, we can supply replay-protection in addition, and we do not require a source of randomness. We can do this in devices that can only support software updates and have limited processing power. However, if the devices are capable of public-key or even post-quantum algorithms, we can deploy a version of our protocol that leverages this. If the customer can deploy hardware security, we can adapt to that too. We can do this without requiring human interaction with the devices, supporting true embedded use cases, where the devices are in remote or inaccessible locations.