IoT end-to-end encryption with E4 (3/n): automation component

Our previous post introduced the architecture and components of E4, Teserakt’s encryption and key management solution for IoT systems. This post covers one of these components, the automation engine, in more depth.

Keys are not forever

Devices managed by E4 rely on symmetric or public-key cryptography to protect their communications. In both modes, each device has its own identity key, plus one key for each topic (you can see a topic as a "conversation", as a data type, as a classification level, and so on). Since devices may be deployed for years, there must be processes to renew these keys remotely and securely. Motivations for rotating keys include:

  • Revocation of a key that has possibly been compromised, or is not to be trusted for various reasons

  • Provide forward secrecy, that is, guarantee that if a key is compromised at some point, then earlier communications (with a different key) cannot be decrypted

  • Provide backward secrecy, or post-compromise security, for example to guarantee that, should some topic key be compromised, future topic keys remain secret, and therefore content protected with these keys remains protected.

Such key rotation may be done manually, or partially manually by creating some scripts, creating a custom network service, and so on, and updating the script manually when needed. This is a cumbersome and error-prone procedure that is unlikely to scale and to prove reliable in critical environments. We therefore propose an automation system that is simple to use (both through a graphical UI and scripts), and that easily scales to many topics and many messages.

The E4 automation engine addresses exactly this problem. It automates the rotation of any E4 key through rules that follow a simple grammar, depicted below:

In the context of our key rotation automation, we use the following terms:

  • Rule: a rule is the main component of the engine. It holds a list of triggers and targets controlling when the key rotation will happen, and which device or topic will be updated.
  • Trigger: a trigger is a condition that must be fulfilled in order to execute a key rotation for all of the parent rule’s targets. A trigger can be a predefined period of time, or a number of system events, such as a threshold of clients joining or leaving a certain topic.
  • Target: a target designates either a client device or a topic, whose key will be updated when the condition of one of the rule’s triggers is fulfilled.
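To make these terms concrete, here is a minimal sketch of how rules, triggers, and targets could be modelled. All class, field, and enum names are our own illustration and are not taken from the E4 code base; we also assume that a time trigger's threshold is expressed in seconds.

```python
from dataclasses import dataclass, field
from enum import Enum


class TargetType(Enum):
    CLIENT = "client"
    TOPIC = "topic"


class TriggerType(Enum):
    TIME_INTERVAL = "time_interval"              # threshold is a number of seconds
    CLIENT_SUBSCRIBED = "client_subscribed"      # threshold is a number of events
    CLIENT_UNSUBSCRIBED = "client_unsubscribed"  # threshold is a number of events


@dataclass(frozen=True)
class Target:
    type: TargetType
    name: str  # client name or topic name


@dataclass
class Trigger:
    type: TriggerType
    threshold: int


@dataclass
class Rule:
    description: str
    triggers: list = field(default_factory=list)
    targets: list = field(default_factory=list)


# A rule that rotates one topic key daily, or as soon as 10 clients have joined.
rule = Rule(
    description="Rotate the temperature topic key",
    triggers=[
        Trigger(TriggerType.TIME_INTERVAL, 24 * 3600),
        Trigger(TriggerType.CLIENT_SUBSCRIBED, 10),
    ],
    targets=[Target(TargetType.TOPIC, "sensors/temperature")],
)
```

A rule fires when any one of its triggers is fulfilled, which is why the triggers are kept as a flat list rather than a nested condition.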

When to rotate keys

We currently support basic use cases such as:

  • Rotating a device or topic key at a fixed time interval
  • Rotating a topic key after a certain number of clients have joined or left this topic

Those cover the most common scenarios requiring a key rotation. We’ve also carefully designed our engine and rules format to be flexible and easily extensible. If you have any other use cases that are not currently covered, we’d be happy to hear about them!

We’re often asked what the right time interval for key rotation is. There is no single right answer, for it depends on a number of factors including the threat model and risks, the network reliability, the cost of sending control messages, etc.

How it works

As seen in our previous architecture post, the automation engine extends the C2 server’s functionality. When enabled, it starts an internal scheduler that monitors the time-based rules, and also registers a channel on the C2 server to receive system events. On each event, the automation engine checks the defined rules for any triggers that are due, and requests a key rotation on the C2 for each of the rule’s targets. The trigger state is then reset, and the trigger waits for its conditions to be met again before firing again.
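The loop described above can be sketched as follows. This is a simplified illustration under our own assumptions, not the actual engine: the class and method names are hypothetical, time is passed in explicitly instead of being read from a clock, and events are not filtered by topic.

```python
class IntervalTrigger:
    """Due once `period` seconds have elapsed since the last reset."""

    def __init__(self, period, start):
        self.period = period
        self.last = start

    def observe(self, event_type):
        pass  # time-based triggers ignore system events

    def due(self, now):
        return now - self.last >= self.period

    def reset(self, now):
        self.last = now


class CountTrigger:
    """Due once `threshold` events of the given type have been observed."""

    def __init__(self, event_type, threshold):
        self.event_type = event_type
        self.threshold = threshold
        self.count = 0

    def observe(self, event_type):
        if event_type == self.event_type:
            self.count += 1

    def due(self, now):
        return self.count >= self.threshold

    def reset(self, now):
        self.count = 0


class Engine:
    def __init__(self, rotate_key):
        self.rules = []               # list of (triggers, targets) pairs
        self.rotate_key = rotate_key  # callback standing in for the C2 request

    def add_rule(self, triggers, targets):
        self.rules.append((triggers, targets))

    def handle(self, now, event_type=None):
        """Called on every scheduler tick (event_type=None) or system event."""
        for triggers, targets in self.rules:
            fired = False
            for trigger in triggers:
                if event_type is not None:
                    trigger.observe(event_type)
                if trigger.due(now):
                    trigger.reset(now)  # wait for the condition to be met again
                    fired = True
            if fired:
                for target in targets:
                    self.rotate_key(target)


rotated = []
engine = Engine(rotate_key=rotated.append)
engine.add_rule(
    [CountTrigger("client_subscribed", 2), IntervalTrigger(3600, start=0)],
    ["topic:sensors"],
)
engine.handle(now=10, event_type="client_subscribed")  # 1 of 2 events, no rotation
engine.handle(now=20, event_type="client_subscribed")  # threshold reached, rotation
engine.handle(now=3600)                                # interval elapsed, rotation
```

In this sketch only the trigger that fired is reset, so a time-based trigger keeps its own schedule even when an event-based trigger of the same rule causes a rotation in between.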

Client key rotation

A client’s key is its most critical key, and should ideally be better protected than topic keys, which are shared with other devices. In the symmetric-key mode, the client’s key is also known to the C2, and used to protect control messages sent to that client. One of the commands supported by control messages is SetIDKey, which sets a new client key on a remote device.

When a client key is rotated, the C2 server thus issues a SetIDKey command, containing a newly generated key, and protected with the current client key. The command is then transmitted to the client via the MQTT broker, by publishing it on the client control topic.

In the public-key mode, the C2 only knows clients’ public keys, not their private keys. Therefore client keys can’t be rotated in the public-key mode.

Topic key rotation

When a topic key is rotated, the C2 server first generates a new random key, then issues a SetTopicKey command for every client subscribed to this topic. Each command is protected using the client key of its recipient, then transmitted on that client’s control topic.
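The fan-out described above can be sketched as below. This is only an illustration under stated assumptions: the command encoding, the control topic naming scheme, and the 32-byte key length are hypothetical, and the protection and publishing steps are passed in as callbacks rather than implemented.

```python
import secrets


def control_topic(client_id):
    # Hypothetical naming scheme for a client's control topic.
    return "e4/" + client_id


def rotate_topic_key(topic, subscribers, client_keys, protect, publish, key_len=32):
    """Generate one new topic key and send it to each subscriber separately."""
    new_key = secrets.token_bytes(key_len)
    for client_id in subscribers:
        # Hypothetical command encoding, for illustration only.
        command = {"cmd": "SetTopicKey", "topic": topic, "key": new_key}
        # Each copy of the command is protected with its recipient's own key.
        payload = protect(client_keys[client_id], command)
        publish(control_topic(client_id), payload)
    return new_key


# Demo with stand-ins. `fake_protect` is NOT encryption: it only records
# which client key would have been used to protect the command.
def fake_protect(client_key, command):
    return {"protected_with": client_key, "command": command}


sent = []
client_keys = {"alice": b"A" * 32, "bob": b"B" * 32}
new_key = rotate_topic_key(
    "sensors/temperature",
    ["alice", "bob"],
    client_keys,
    fake_protect,
    lambda control, payload: sent.append((control, payload)),
)
```

Because the commands are sent one by one, the cost of a rotation grows linearly with the number of subscribers.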

When a topic has many subscribed clients, there might be a significant delay between the transmission (and thus reception) of the first and last SetTopicKey commands. Some clients may thus use the old key while others are already using the new one, preventing the decryption of messages in either direction. To avoid message loss, we thus need a key transition mechanism, which we implemented by defining a configurable grace period.

Grace period

The grace period is a client parameter, which defines the duration during which both the old and new key can be used, starting from the time when a new topic key is received. The client will store the newly received key, and keep the old one for a configurable amount of time. The device then does the following:

  • When receiving a message, it first tries to unprotect it with the new key. If this fails (as indicated by an invalid authentication tag), the client then tries with the old key if it is still within the validity period.

  • When transmitting a message, the new key is always used. This might prevent some devices from reading the message, but it is the safest behavior in case the old key was compromised. Furthermore, using the old key would also prevent messages from being decrypted by clients whose grace period has passed.
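The receive and transmit behavior above can be sketched as a small key ring. The names are our own and this is not the E4 client code. To keep the example self-contained, the stand-in protection functions only append an HMAC tag and provide no confidentiality, whereas a real client would use its authenticated cipher.

```python
import hashlib
import hmac


class TopicKeyRing:
    def __init__(self, key, grace_period):
        self.current = key
        self.previous = None
        self.previous_expiry = 0.0
        self.grace_period = grace_period  # in seconds

    def set_key(self, new_key, now):
        # Keep the old key, but only until the grace period has elapsed.
        self.previous = self.current
        self.previous_expiry = now + self.grace_period
        self.current = new_key

    def key_for_sending(self):
        return self.current  # the new key is always used when transmitting

    def unprotect(self, message, now, unprotect_fn):
        try:
            return unprotect_fn(self.current, message)
        except ValueError:
            pass  # invalid tag with the new key, maybe the sender uses the old one
        if self.previous is not None and now < self.previous_expiry:
            return unprotect_fn(self.previous, message)
        raise ValueError("message cannot be unprotected with any valid key")


# Stand-in scheme (authentication only, NOT encryption): 32-byte tag + plaintext.
def toy_protect(key, plaintext):
    return hmac.new(key, plaintext, hashlib.sha256).digest() + plaintext


def toy_unprotect(key, message):
    tag, plaintext = message[:32], message[32:]
    expected = hmac.new(key, plaintext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid authentication tag")
    return plaintext


ring = TopicKeyRing(b"old-key", grace_period=60)
ring.set_key(b"new-key", now=1000)
old_message = toy_protect(b"old-key", b"hello")

# Within the grace period, a message protected with the old key is accepted.
within = ring.unprotect(old_message, now=1030, unprotect_fn=toy_unprotect)

# After the grace period, the same message is rejected.
try:
    ring.unprotect(old_message, now=1061, unprotect_fn=toy_unprotect)
    rejected_after_grace = False
except ValueError:
    rejected_after_grace = True
```

Trying the new key first keeps the common case cheap once most clients have switched, at the cost of one extra failed attempt for stragglers during the transition.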

Again, when we’re asked what’s the best grace period value, our answer is that it depends on several factors, including the number of clients subscribed to a topic, how often keys are rotated, the network’s latency and reliability, and so on. Ideally, the grace period should be chosen after analyzing empirical data. The best value of the grace period is one that sufficiently minimizes message loss, while minimizing the number of messages encrypted using the old key (thus, the shorter the period, the safer).

Conclusion

Key management, and more particularly secret rotation, can be a tedious task. With E4, we try to make it transparent, fail-safe, and easy to configure and operate. You can configure the rotation of your device and topic keys either from the dedicated page on the web console or from the command-line client, allowing you to create or view active rotation policies.

You can try it yourself, and start rotating keys using our e4 client, with binaries for common architectures available on the release page. Automation rules can then be created from the demo console.

IoT end-to-end encryption with E4 (1/n): it's open-source!

We’re thrilled to announce a major milestone for Teserakt, with the launch of our flagship product E4 as open-source software.  We want to make Teserakt’s IoT protection solution as easy to try as possible, and available to any user from hobbyists to large corporations. This is why our client library is now free to use, enabling anyone to integrate strong end-to-end encryption into their applications.

We go even further: Although our server is not open-source, we run a public demo version of our web interface, available without any registration or email identification. Using this demo in combination with our open-source client application, anyone can test our key management server with no effort.

E4 is the product of two years of research, interacting with customers from industries such as aerospace, agriculture, automotive, energy, healthcare, and physical access, to understand real applications’ security needs and technical constraints. E4 is also ideal for mobile applications, to protect personal or other sensitive data.

This is the first of a series of posts about E4, its architecture, components, and internals, which we’ll be publishing during the coming weeks. Meanwhile, technical and commercial information about E4 is now available on E4’s page.

The Q&A below gives a general overview of E4, before more details in the upcoming posts. Don’t hesitate to contact us directly at [email protected] if you have specific questions or would like access to a private instance of the E4 server.

What is E4?

E4 is Teserakt’s innovative data protection software solution for IoT networks, consisting of:

  • A client library, used to encrypt and decrypt data, and
  • A key server to manage device rights manually or automatically.

What exactly is now open-source?

We are open-sourcing E4’s client library, available in the C and Go languages. It can be integrated into a wide range of embedded platforms, as well as mobile applications or cloud services, in order to enable end-to-end data protection.

E4’s key server is not open-source, but is not required to test the client library’s capabilities. Without the server, the client library can be used with fixed keys and without management and monitoring features.

Do I need the server for my application?

E4’s key server is used to manage devices remotely in order to provision keys, grant and revoke rights to devices, ensure perfect forward secrecy, automate key rotation policies, monitor messages for anomalies, and so on.

Enterprise, production deployments will likely need the server, while personal projects and prototypes can work with only the client library. The server is offered on a commercial basis along with technical support, software maintenance, and extra features integration.

What network protocols do you support?

The client library is agnostic of the network protocol, as it works on top of the network layer in order to encrypt/decrypt data, and to process control messages from a server.  

Our server works by default with MQTT (the IoT protocol used by the AWS IoT and Google Cloud IoT platforms), and can be configured to work with other protocols such as Kafka, AMQP, or ZeroMQ. To know if we support your protocol, please contact us at [email protected].

How to try it?

It takes less than 5 minutes! To try E4 without writing your own application, we created a simple interactive client application that you can use in combination with our public demo server interface. You can directly download the client’s binary for your platform or build it yourself, and then follow the instructions.

What cryptography does E4 use?

E4 can run in two modes, symmetric or public-key cryptography. The symmetric mode is optimized for the most constrained platforms, and only uses AES-SIV and SHA-3. The public-key mode additionally uses Ed25519 signatures and X25519 key exchange. Details of our crypto protocols will appear in a subsequent post. Note that we also support a cipher suite including only FIPS 140-2 primitives.

At what address can I find… ?

Lightweight crypto standards (sorry NIST)

NIST’s running a Lightweight Cryptography project in order to standardize symmetric encryption primitives that are lighter than the established NIST standards (read: AES and its modes of operations). NIST claims that “[because] the majority of current cryptographic algorithms were designed for desktop/server environments, many of these algorithms do not fit into constrained devices.” This is good motivation for the competition, however it’s factually incorrect: AES today fits in almost all IoT-ish chips and has even been used for bus and memory encryption.

But that’s not the point of this post—for more on that subject, come hear our talk at NIST’s Lightweight Cryptography Workshop in about 10 days, a talk derived from one of our previous posts, and based on a multitude of real examples. (See also this recent Twitter thread.)

Sorry NIST, this post is not about you, but about another standardization body: ISO, and specifically its ISO/IEC 29192 class of lightweight standards within the 35.030 category (“IT Security – including encryption”). This category includes no fewer than 9 standards relating to lightweight cryptography, which are (copying verbatim from the ISO page):

Standard | Title | Stage | Committee
ISO/IEC 29192-1:2012 | Information technology — Security techniques — Lightweight cryptography — Part 1: General | 90.93 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-2 | IT security techniques — Lightweight cryptography — Part 2: Block ciphers | 60.00 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-2:2012 | Information technology — Security techniques — Lightweight cryptography — Part 2: Block ciphers | 90.92 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-3:2012 | Information technology — Security techniques — Lightweight cryptography — Part 3: Stream ciphers | 90.93 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-4:2013 | Information technology — Security techniques — Lightweight cryptography — Part 4: Mechanisms using asymmetric techniques | 90.93 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-4:2013/AMD 1:2016 | Information technology — Security techniques — Lightweight cryptography — Part 4: Mechanisms using asymmetric techniques — Amendment 1 | 60.60 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-5:2016 | Information technology — Security techniques — Lightweight cryptography — Part 5: Hash-functions | 60.60 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-6:2019 | Information technology — Lightweight cryptography — Part 6: Message authentication codes (MACs) | 60.60 | ISO/IEC JTC 1/SC 27
ISO/IEC 29192-7:2019 | Information security — Lightweight cryptography — Part 7: Broadcast authentication protocols | 60.60 | ISO/IEC JTC 1/SC 27

Granted, few people and industries care about ISO/IEC standards. You probably don’t, and to be honest neither do we, really. The fee of around $100 to access the ISO standard documents probably doesn’t help make ISO algorithms more popular.

We nonetheless think that it would be worthwhile to have a list of lightweight symmetric algorithms (whatever your weight metric) that received the blessing of some allegedly competent cryptographers, and that are therefore presumably safe to use. We’ve therefore reviewed what ISO has to offer so that you don’t have to, and summarize the fruit of our research in the remainder of this post:

Block ciphers

ISO/IEC 29192-2:2012 (thus, from 2012) standardizes these two block ciphers (literally just the block ciphers, and not any mode, which are covered in another ISO standard [nothing surprising here: ECB, CBC, OFB, CFB, CTR]):

  • PRESENT, anagram and little sister of SERPENT, designed in 2007, and the least Google-friendly cipher ever. Then marketed as ultra-lightweight, PRESENT is a simple substitution-permutation network with 4-bit S-boxes, which is more hardware-friendly than it is software-friendly, but will perform well enough almost everywhere. It has 64-bit blocks and a key of 80 or 128 bits.
  • CLEFIA, also designed in 2007, from Sony, initially aimed for DRM applications, is a boring Feistel scheme with 128-bit blocks and a key of 128, 192, or 256 bits. CLEFIA isn’t really more lightweight than AES, and I guess Sony lobbied for its standardization.

Stream ciphers

ISO/IEC 29192-3:2012 standardizes these two stream ciphers (again, one bit-oriented and one byte-oriented):

  • Trivium, one of the winners of the eSTREAM project, the favorite cipher of the DEF CON conference, is a minimalistic, bit-oriented stream cipher that is a simple combination of shift registers. Trivium is arguably the lightweightest algorithm in this post.
  • Enocoro, whose existence I had completely forgotten until writing this post. Designed by the Japanese firm Hitachi, Enocoro was submitted to the CRYPTREC contest in 2010 (see this cryptanalysis evaluation), yet ended up not being selected. Enocoro is a byte-oriented feedback shift register with a conservative design. It supports 80- and 128-bit keys, and is most probably safe (and safer with 128-bit keys).
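To give an idea of how minimalistic Trivium is, here is a bit-level sketch of its three shift registers, written from the published specification. It is meant as an illustration only: it takes the key and IV as lists of 80 bits to sidestep byte-ordering conventions (which differ between implementations), it has not been checked against official test vectors, and it should not be used in production.

```python
def trivium_keystream(key_bits, iv_bits, n_bits):
    """Return n_bits keystream bits from an 80-bit key and 80-bit IV (lists of 0/1)."""
    assert len(key_bits) == 80 and len(iv_bits) == 80
    # 288-bit state; s[i] holds the bit called s_(i+1) in the specification.
    # Register A: 93 bits (key), register B: 84 bits (IV), register C: 111 bits.
    s = list(key_bits) + [0] * 13 + list(iv_bits) + [0] * 4 + [0] * 108 + [1, 1, 1]
    warmup = 4 * 288  # initialization rounds whose output is discarded
    out = []
    for i in range(warmup + n_bits):
        t1 = s[65] ^ s[92]
        t2 = s[161] ^ s[176]
        t3 = s[242] ^ s[287]
        if i >= warmup:
            out.append(t1 ^ t2 ^ t3)
        # The only nonlinear operations are three AND gates.
        t1 ^= (s[90] & s[91]) ^ s[170]
        t2 ^= (s[174] & s[175]) ^ s[263]
        t3 ^= (s[285] & s[286]) ^ s[68]
        # Shift each register by one position and feed in the new bits.
        s = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return out


zero = [0] * 80
stream = trivium_keystream(zero, zero, 64)
```

The whole cipher boils down to a handful of XORs and three ANDs per output bit, which is what makes it so cheap in hardware.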

Hash functions

This time not two but three algorithms, as defined in ISO/IEC 29192-5:2016:

  • PHOTON, designed in 2011, combines the simplicity of the sponge construction and the security guarantees of the AES permutation (no, Joan Daemen is not a designer of PHOTON). The permutation is optimized for low-end platforms, so you can’t directly reuse AES code/silicon. PHOTON comes in multiple versions, depending on the security level you want (the higher the security, the bigger the state).
  • SPONGENT, also from 2011, is to PRESENT what PHOTON is to AES. Nuff said.
  • Lesamnta-LW is different. It’s not a sponge but has a SHA-like round function and targets 120-bit security with a single version. It’s also less light than the above two.

MACs

Of the three algorithms listed in ISO/IEC 29192-6:2019, I was only familiar with one, Chaskey, which is essentially a 32-bit version of SipHash. The other two are:

  • LightMAC, which is actually a MAC construction rather than strictly speaking an algorithm. And since I didn’t pay the 118 CHF to read the full document, I don’t really know what ISO standardized here (the mode itself? an instantiation with a specific block cipher? Help please.)
  • “Tsudik’s keymode”, apparently from this 1992 paper by Gene Tsudik, which discussed the secret-prefix and secret-suffix MACs and proposed the hybrid version, which may or may not be what this “keymode” is about. I’ve no idea why it ended up being standardized in 2019.

AEAD

Nothing here: ISO hasn’t standardized any lightweight authenticated ciphers.

Summary

All these algorithms are okay, but their being ISO standards doesn’t say much and doesn’t mean that they’re necessarily superior to others, just that some people bothered submitting them and lobbying for their standardization.

How to lock a GitHub user out of their repos (bug or feature?)

We accidentally discovered this while working with our friends from X41, when GitHub denied me access to my private repositories; for example git pull or git clone would fail and write “ERROR: Permission to <account>/<repo>.git denied to deploy key”. I asked for help but nobody found the root cause.

Here’s the problem: GitHub recently introduced deploy keys, that is, keys granting a server read-only rights to a repository in order to automate deployment scripts. This is a great feature, but it can be abused as follows to lock a GitHub user out of their repositories:

  1. Find one or more SSH public keys belonging to the victim, but not associated with their GitHub account. For example, you may happen to know SSH keys associated with another project, or you may use public keys from https://gitlab.com/username.keys (comparing with https://github.com/username.keys).
  2. Add these SSH public keys as deploy keys of one of your GitHub projects (in Settings > Deploy keys > Add deploy key). Let’s say this project is at github.com/attacker/repo.
  3. When connecting to GitHub over SSH, the victim will then be identified as attacker/repo (for example when doing ssh -T [email protected]), if and only if at least one of the keys added as deploy keys is prioritized by SSH over any key linked to the GitHub account.

For example, if you have private key files id_ecdsa_github and id_ecdsa_gitlab in your ~/.ssh/ directory, and if SSH offers the public key id_ecdsa_gitlab.pub first, and if the attacker has added that key as a deploy key, then GitHub will identify you as attacker/repo when you try to connect to one of your repositories, thereby denying you access to your private repositories (and only granting read access to public ones).

The way SSH prioritizes keys can be found in its code, and also varies depending on whether you use ssh-agent (typically, when caching passphrase-protected private keys).

This behavior of GitHub and SSH can therefore be exploited to lock GitHub users out of their private repositories (via command-line SSH), and arguably qualifies as a denial-of-service attack. Note that it doesn’t require any action from the victim, and only needs public information (if we assume that public keys are public). We’ve lost one afternoon of work investigating the issue, being involuntarily attacked by X41. After understanding what happened, we asked them to remove our (freshly generated) key from their repo’s deploy keys, which indeed solved the issue.

You can argue that it’s the responsibility of users to properly configure their ~/.ssh/config, which of course avoids the attack, and that this behavior is acceptable. This is presumably the opinion of GitHub, since they responded to our bug bounty entry that “it does not present a security risk”. GitHub probably has good reasons to ignore the risk, but from our perspective the potential annoyance and DoS risk to GitHub users is not negligible.
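As an illustration of such a configuration, the following ~/.ssh/config fragment pins the key offered to GitHub. It reuses the example file name id_ecdsa_github from above, so adapt the path to your own key. The IdentitiesOnly option tells OpenSSH to offer only the configured identity files, even if ssh-agent holds other keys.

```
# ~/.ssh/config
Host github.com
    User git
    IdentityFile ~/.ssh/id_ecdsa_github
    IdentitiesOnly yes
```

With this in place, a key that is registered elsewhere (such as id_ecdsa_gitlab) should never be offered to GitHub, so it can no longer be matched against someone else's deploy keys.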

The problem can probably be fixed, although it would not be straightforward, since SSH authentication would have to depend not only on the host name, but also on the repository accessed (whereas access control to a repository is typically enforced after SSH authentication).

Update: As @stuartpb noticed, this behavior can also be exploited by adding a victim’s public keys to a user account (new or repo-less), and is not specific to deploy keys.

Cryptography in industrial embedded systems: our experience of the needs and constraints

A common model is that industrial embedded systems (a.k.a. IoT, M2M, etc.) need small, fast, low-energy crypto primitives, requirements often summarized by the “lightweight” qualifier. For example, a premise of NIST’s Lightweight Cryptography standardization project is that AES is not lightweight enough, and more generally that “the majority of current cryptographic algorithms were designed for desktop/server environments, many of these algorithms do not fit into constrained devices” (note the implicit emphasis on size).

In this article we share some observations from our experience working with industrial embedded systems in various industries, on various platforms and using various network protocols. We notably challenge the truism that small devices need small crypto, and argue that finding a suitable primitive is usually the simplest task that engineers face when integrating cryptography in their products. This article reviews some of the other problems one has to deal with when deploying cryptographic mechanisms on “lightweight” platforms.

We’re of course inevitably subject to selection bias, and don’t claim that our perspective should be taken as a reference or authoritative. We nonetheless hope to contribute to a better understanding of what the “constrained devices” that NIST refers to are, and more generally of “real-world” cryptography in the context of embedded systems.

Few details can be shared, alas, about our experience. Let us only say that we’ve designed, integrated, implemented, or reviewed cryptographic components in systems used in automotive, banking, content protection, satellite communications, law enforcement technology, supply chain management, device tracking, or healthcare.

AES is lightweight enough

Most of the time. Ten years ago we worked on a low-cost RFID product that had to use something other than AES, partially for performance reasons. Today many RFID products include an AES engine, as specified for example in the standard ISO/IEC 29167-10.

Systems on chips and boards for industrial applications often include AES hardware, and when they don’t, a software implementation of AES comes at an acceptable cost. For example, chips from the very popular STM32 family generally provide readily available AES in different modes of operation.

An example perhaps of a very low-cost device that uses a non-AES primitive would be the Multos Step/One, which is EMVCo-compliant. In this case, 3DES was chosen instead. ATM Encrypting-PIN-Pads frequently use 3DES. We believe that the continued use of 3DES has more to do with compatibility and cost of replacement than the prohibitive cost of running a wider block cipher.

Of course an algorithm more compact and faster than AES wouldn’t hurt, but the benefits would have to justify the integration costs.

Choosing primitives is a luxury

More than once we faced the following challenge: create a data protection mechanism given the cryptography primitives available on the devices, which could for example be AES-GCM and SHA-256 only. We may have to use AES-GCM and SHA-256 because they’re standards, because they’re efficiently implemented (for example through hardware accelerators), or for the sake of interoperability. Note that a platform may give you access to AES-GCM, but not to the AES core directly, so you can’t use it to implement (say) AES-SIV.

If you want to use an algorithm other than the ones available, you have to justify that it’s worth the cost of implementing, integrating and testing the new primitive. AES-GCM is not perfect (risk of nonce reuse, etc.), but the risk it creates is usually negligible compared to other risks. The situation may be different with AES-ECB.

Statelessness

Not all platforms are stateful, or reliably stateful. This means that you can’t always persistently store a counter, seed, or other context-dependent data on the device. The software/firmware on the platform may be updatable over-the-air, but not always. The keys stored on the device may not be modifiable after the personalization phase of the production cycle.

Randomness

The platform may not offer you a reliable pseudorandom generator, or it may only have some low-entropy non-cryptographic generator. Either way, that can be a severe limitation, especially if you realize this after proudly completing an implementation of ECDSA.

There are well-known workarounds of course, such as deterministic ECDSA and EdDSA for ECC signatures, or AES-SIV for authenticated encryption (this robustness to weak/non-randomness is the main reason why we chose to make it the default cipher in our company’s product). But sometimes it can get trickier, when you really need some kind of randomness yet can’t fully trust the PRNG (it’s more fun when the platform is stateless).

It’s not only about (authenticated) encryption

When no established standard such as TLS is used—and sometimes even when it is—the security layer is typically implemented at the application layer between the transport and business logic. (Authenticated) encryption is a typical requirement, but seldom the only one: you may have to worry about replay attacks or have to “obfuscate” some metadata or header information, for example to protect anonymity.

In an ideal world, you ought to use a thoroughly-designed, provably-secure, peer-reviewed protocol. But 1) such a thing likely doesn’t exist, and 2) even when it does it would probably not be suitable to your use case, for example if the said protocol requires a trusted third party or three network round-trips.

Message size limitations

True story: “Our clear payload is N bytes; the protected payload must be N bytes too, and must be encrypted and authenticated.” That’s when you have to be creative. Such a situation can occur with protocols such as Bluetooth Low-Energy or WAN protocols such as LoRaWAN or Sigfox (where the uplink payloads are 12 bytes and downlink payloads 8 bytes).

Even when you can afford some overhead to send a nonce and a tag, this may come at prohibitive cost if we’re talking of millions of messages and a per-volume pricing model. In other contexts, additional payload size can increase the risk of packet loss.
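To put numbers on this overhead, here is a small worked example. It assumes the sizes commonly used with AES-GCM (a 12-byte nonce and a 16-byte tag) and that the nonce is transmitted in full with every message; the function names are ours.

```python
def protected_size(payload_len, nonce_len=12, tag_len=16):
    """Size in bytes of a message carrying its nonce and authentication tag."""
    return payload_len + nonce_len + tag_len


def frames_needed(message_len, frame_len):
    """Number of fixed-size frames needed to carry a message (ceiling division)."""
    return -(-message_len // frame_len)


payload = 12                               # bytes, e.g. one full 12-byte uplink payload
size = protected_size(payload)             # 12 + 12 + 16 = 40 bytes
overhead = (size - payload) / payload      # 28 / 12, i.e. about 233% extra
frames = frames_needed(size, 12)           # 4 frames instead of 1
extra_per_million = (size - payload) * 1_000_000  # 28,000,000 extra bytes
```

Under these assumptions, protecting a 12-byte payload more than triples the amount of data sent, which is why nonces are often derived implicitly and tags truncated on such links, with the security trade-offs that this implies.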

Network unreliability

It’s not just about TCP being reliable (guaranteed, in-order packet delivery) and UDP being not. Protocols running on top of TCP can have their own reliability properties caused by the way they transmit messages. For example, MQTT (when running over TCP) guarantees message delivery, but not in-order.

Whatever protocol is used, devices may have no way to transmit or receive messages for a certain period of time. Examples include satellites in non-geostationary orbit, and devices that are out of range for periods of time, such as aircraft, ships, or smart meters that are only read when a measuring vehicle drives by.

An excellent engineering question is to ask how one would reliably transmit data from Mars, particularly where that data should be processed as quickly as possible on receipt. If not Mars, then further away. At such great distances, what is instantaneous for us starts to take seconds or minutes. The usual answer, often given as an illustration of UDP/broadcast protocols, is to resend repeatedly on a broadcast channel. In these cases, packets are expected to be lost and round-trips are impossible: there is no way to confirm receipt.

Such limitations often prevent the use of crypto protocols adding RTTs, or requiring even a moderate level of synchronization with other devices. Unreliable networks become particularly fun when implementing key rotation or key distribution mechanisms.

When crypto is too big

Crypto can be too big (in code size, or RAM usage) for certain platforms; the main cases we’ve encountered are when public-key operations are impossible, or when a TLS implementation takes too many resources. Although there are good TLS implementations for constrained platforms (such as ARM’s mbedTLS, wolfSSL, or BearSSL), they may include a lot of code to support the TLS standards and operations such as parsing certificates. Even the size of a TLS-PSK stack can be prohibitive, and not because of AES.

Sometimes public-key cryptography is even possible within the limited capacity of the device, but the limiting factor is the protocol. An example of such a problem can be found in BearSSL’s documentation. Quoting Thomas Pornin’s TLS 1.3 Status, when streaming ASN.1 certificates, the usual order is end-entity first, CAs later. In any given object, the public key follows the certificate. EdDSA combines the public key and data when signing. Thus, in order to validate a signature, the entire certificate must be buffered until the public key can be extracted.

Now while a highly constrained device may simply use PSK and avoid this problem entirely, it is also true that the device may be capable of Ed25519 signatures even without sufficient RAM to buffer large certificates. This problem arises entirely from the choice of PureEdDSA rather than HashEdDSA in TLS 1.3.

Untrusted infrastructure

More often than you might expect, the infrastructure we use should not be trusted. In the MQTT context, this means brokers. In other contexts this means wireless repeaters, conversion gateways between protocols (for example, for communication via SMS), and so on. In the context of currently proposed IoT standards, these nodes are often assumed trusted and capable of re-encrypting for each hop using TLS where possible, or some other point-to-point protocol.

We believe that the implicit trust in the infrastructure by having it handle keys invites a far greater risk than the challenges of underpowered devices.

Conclusions

Cryptography on constrained devices can pose many problems, but the speed and size of symmetric primitives (ciphers, hash functions) is rarely one, at least in our experience (YMMV).

We can’t ignore that economics and risk management play into cryptography. Standard NIST cryptography primitives and NSA Suites A & B, for example, were designed to provide the US Government with an assurance that data is protected for the lifetime of the relevant classified information, on the order of 50 years. It took time for the community to gain confidence in AES, but it’s now widely and globally trusted. In any case, safe block ciphers are easy to design; neither DES nor GOST has ever really been broken.

Lightweight cryptography might be suitable where such expectations of long-term security do not hold, and would allow the use of a very “lightweight” component. An extreme example is that of memory encryption, or “scrambling”, where only a handful of high-frequency cycles can be allocated.

The open question is whether we can design algorithms to match this requirement, bearing in mind that we have no ability to predict future developments. Looking back at history, requirements are driven by applications into which the public research community has little visibility. As highlighted in this article, said requirements often involve various components and technologies, which makes the engineering problem difficult for outsiders to approach.

E4 vs. (D)TLS #IoTencryption

Today at Teserakt we discussed the benefits of using our end-to-end encryption protocol E4 instead of TLS with pre-shared keys (PSK), as sometimes used on low-end devices that don’t use public-key cryptography. The discussion started after a call with a start-up that is specialized in low-power WAN networks and uses DTLS with PSK for protecting data sent to and received from devices. We thought we would write this quick post to share our thoughts and encourage readers to share their experience with similar protocols.

The following points (in arbitrary order) are the main benefits of E4 that we identified, and in our experience cover some of the most important problems encountered when attempting to deploy encryption on low-power devices:

  • E4 is an application-layer protocol and can therefore run over other protocols, properly end-to-end, whereas (D)TLS is not end-to-end however you cut it.
  • No ridiculously complex standards (TLS requires a full ASN.1 parser).
  • Remains secure if the device has neither a PRNG nor a clock.
  • Simpler key management (no PKI, X.509, etc.).
  • 0-RTT; E4 doesn’t need to perform a handshake mechanism to start sending encrypted data.
  • Much smaller code and RAM footprint. In particular, there is no need to allocate MBs of memory to process certificate chains, unlike with TLS.

A last remark: as an alternative to (D)TLS for low-end platforms we’ll be evaluating the Noise family of protocols and in particular Rust implementations optimized for ARM-based chips. More on this in a future post 🙂

A secure element host library bug

As part of our mission to bring state-of-the-art security to embedded devices, Teserakt occasionally works with customers to review the security of their products. With our cryptographic expertise we are often asked to look at the design of cryptographic solutions, but we also review code for potential security-sensitive issues.

One of our customers had chosen to use the A71CH Secure Element, which at its simplest is a smartcard element for embedded circuitry capable of public key cryptography on behalf of the host MCU. Such devices, like smartcards, store public-private keypairs in such a way that the private component is next-to-impossible to extract (in theory). This makes them hugely important for high-assurance systems.

The host MCU, the device to which the secure element is connected, speaks to the A71CH device using the GlobalPlatformPro smartcard standards, specifically SCP11. To facilitate this, NXP provide a host library that can be linked with embedded firmware to talk to the device.

During our review, we were asked to look specifically at components of the NXP library code to ensure this was also free of potential errors. Here we found some classic overflows. First, the deprecated code for selecting an applet is as follows:

U32 GP_SelectApplet(U8 * pAppletName, U8 appletNameLength, U8 * pResponse, U32 * pResponseLength)
{
    U32 st = 0;
    U8 txBuf[128];
    U8 len = appletNameLength;
    txBuf[0] = CLA_ISO7816;
    txBuf[1] = INS_GP_SELECT;
    txBuf[2] = 0x04; txBuf[3] = 0x00;
    txBuf[4] = len;
    // AV: buffer overflow here. max(U8)=255, sizeof txBuf=128.
    // fix this with:
    // #DEFINE TRANS_APPLET_OFFSET 5
    // memcpy(&txBuf[TRANS_APPLET_OFFSET], pAppletName,min(appletNameLength, sizeof(txBuf) - TRANS_APPLET_OFFSET));
    memcpy(&txBuf[5], pAppletName, len);
    txBuf[5+ len] = 0x00;
    assert(pAppletName != NULL);
    st = smCom_TransceiveRaw(txBuf, 6+len, pResponse, pResponseLength);
    return st;
}

We have included our review comments. The problem should be obvious to anyone familiar with secure coding in C: txBuf is a 128-byte buffer on the stack, but appletNameLength is a U8 and can be as large as 255, so memcpy can write well past the end of txBuf and overwrite adjacent stack data, potentially including the saved return address.
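As a minimal sketch of how the command could be built safely, the function below only assembles the SELECT command into a caller-supplied buffer and rejects any name that does not fit. The function name build_select_apdu, the error code ERR_BAD_ARGUMENT and the constant values are assumptions made for this illustration and are not part of the NXP library. Unlike the min()-based truncation suggested in our review comment, this variant refuses oversized input outright, which avoids silently sending a truncated applet name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t  U8;
typedef uint32_t U32;

/* Values and error code assumed for this sketch only. */
#define CLA_ISO7816        0x00
#define INS_GP_SELECT      0xA4
#define SELECT_HEADER_LEN  5u           /* CLA, INS, P1, P2, Lc */
#define ERR_BAD_ARGUMENT   0xFFFFFFFFu  /* hypothetical error code */

/* Builds the SELECT command into out (capacity outCap bytes) and stores the
 * total command length in *outLen. Returns 0 on success. */
U32 build_select_apdu(const U8 *name, U8 nameLen,
                      U8 *out, size_t outCap, size_t *outLen)
{
    /* Runtime checks: these remain in place in release builds. */
    if (name == NULL || out == NULL || outLen == NULL) {
        return ERR_BAD_ARGUMENT;
    }
    /* Header + name + one trailing byte must fit in the output buffer. */
    if (outCap < SELECT_HEADER_LEN + 1u ||
        nameLen > outCap - SELECT_HEADER_LEN - 1u) {
        return ERR_BAD_ARGUMENT;
    }
    out[0] = CLA_ISO7816;
    out[1] = INS_GP_SELECT;
    out[2] = 0x04;
    out[3] = 0x00;
    out[4] = nameLen;
    memcpy(&out[SELECT_HEADER_LEN], name, nameLen);
    out[SELECT_HEADER_LEN + nameLen] = 0x00;
    *outLen = SELECT_HEADER_LEN + (size_t)nameLen + 1u;
    return 0;
}
```

With a 128-byte buffer, as in the deprecated code, the longest name this accepts is 122 bytes; anything longer returns the error code before a single byte is copied.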

Having seen this, we next looked at the non-deprecated code for applet selection. This starts at:

U16 GP_Select(U8 *appletName, U16 appletNameLen, U8 *responseData, U16 *responseDataLen)
{
    U16 rv = 0;
    apdu_t apdu;
    apdu_t * pApdu = (apdu_t *) &apdu;
    U8 isOk = 0x00;
    assert(appletName != NULL);
    assert(responseData != NULL);
    pApdu->cla = CLA_ISO7816;
    pApdu->ins = INS_GP_SELECT;
    pApdu->p1 = 0x04;
    pApdu->p2 = 0x00;
    AllocateAPDUBuffer(pApdu);
    SetApduHeader(pApdu, USE_STANDARD_APDU_LEN);
    smApduAppendCmdData(pApdu, appletName, appletNameLen);
    rv = (U16)scp_Transceive(pApdu, SCP_MODE);
    if (rv == SMCOM_OK) {
        rv = smGetSw(pApdu, &isOk);
        if (isOk) {
            rv = smApduGetResponseBody(pApdu, responseData, responseDataLen);
        }
    }
    FreeAPDUBuffer(pApdu);
    return rv;
}

So this code is building an apdu_t data type (application protocol data unit), which is defined as follows:

typedef struct
{
    U8 cla;
    U8 ins;
    U8 p1;
    U8 p2;
    U8* pBuf;
    U16 buflen;
    U16 rxlen;
    U8 extendedLength;
    U8 hasData;
    U16 lc;
    U8 lcLength;
    U8 hasLe;
    U16 le;
    U8 leLength;
    U16 offset;
} apdu_t;

The interesting part here is pBuf and how it is manipulated, as this is where any payload data, such as the applet name, will go. The buffer is set up by AllocateAPDUBuffer, which is implemented as follows:

// defined elsewhere:
#define MAX_APDU_BUF_LENGTH 1454
// defined in sm_apdu.c:
#ifndef USE_MALLOC_FOR_APDU_BUFFER
static U8 sharedApduBuffer[MAX_APDU_BUF_LENGTH];
#endif
U8 AllocateAPDUBuffer(apdu_t * pApdu)
{
    // AV: again, no null pointer checks.
    assert(pApdu);
    // In case of e.g. TGT_A7, pApdu is pointing to a structure defined on the stack
    // so pApdu->pBuf contains random data
#ifdef USE_MALLOC_FOR_APDU_BUFFER
    pApdu->pBuf = (U8*) malloc(MAX_APDU_BUF_LENGTH);
#else
    pApdu->pBuf = sharedApduBuffer;
#endif
    return 0;
}

Here we see that there are two options depending on whether dynamic allocation is supported on the target platform. If it is, a buffer of 1454 bytes is allocated dynamically. If it isn’t, a static buffer is provided as part of the firmware image for use.

Now if we go back to GP_Select, we see that it calls smApduAppendCmdData to add the applet name to the apdu. Let’s look at the details of this:

U16 smApduAppendCmdData(apdu_t *pApdu, const U8 *data, U16 dataLen)
{
    // If this is the first commmand data section added to the buffer, we needs to ensure
    // the correct offset is used writing the data. This depends on
    // whether the APDU is a standard or an extended APDU.
    if (pApdu->hasData == 0)
    {
        pApdu->hasData = 1;
        ReserveLc(pApdu);
    }
    pApdu->lc += dataLen; // Value
    memcpy(&pApdu->pBuf[pApdu->offset], data, dataLen);
    pApdu->offset += dataLen; // adapt length
    pApdu->buflen = pApdu->offset;
    return pApdu->offset;
}

Here again, we have a problem. We know that pBuf is 1454 bytes, but dataLen is a U16 and can therefore be as large as 65535. memcpy copies exactly dataLen bytes regardless of their values, unlike strcpy, which stops at the first null character. The only difference from the deprecated code is that the overflow now lands either on the heap or in whatever follows the static buffer in memory, depending on the configuration at compile time.

The most obvious question is: how exploitable is this? The answer really is: it depends. In the use case we were considering, the applet name was always the hardcoded string “a71ch”, although NXP code often derived the length of this using strlen. It would therefore require an attacker to overwrite these values in order to achieve any meaningful exploit.

In more complicated scenarios, this could easily be exploitable. This is especially true as in an embedded context, the many exploit mitigations present in modern operating systems do not exist; the operating system itself may be very barebones and all code is running in ARM’s supervisor context (equivalent to x86’s “ring 0”).

We also noted a number of “assert” null pointer checks during this review. The assert macro expands to a no-op whenever NDEBUG is defined, as is common for release builds, so in production firmware these checks do not perform any kind of safety check at all. Again, if you are familiar with computer security you will already understand why this is a problem; if not, the quick explanation is that a null pointer refers to memory at “address 0”. An attacker who can write to this well-known address may be able to control a program that later dereferences a null pointer. Most modern desktop operating systems deliberately reserve the null page, with permissions set to deny any kind of access, so that an application crashes when it attempts access through a null pointer. Embedded systems may not include such defences.
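The difference between an assert and a real check can be sketched as follows. The function checked_copy and its status codes are hypothetical names chosen for this illustration; the point is only that an explicit if statement stays in the binary whatever the build flags, whereas the C standard defines assert(x) as ((void)0) once NDEBUG is defined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
#define COPY_OK        0
#define COPY_ERR_NULL  1
#define COPY_ERR_SIZE  2

/* Copies srcLen bytes from src to dst, which holds dstCap bytes.
 * An assert(dst != NULL) here would disappear when NDEBUG is defined;
 * the explicit comparisons below are always compiled in. */
int checked_copy(unsigned char *dst, size_t dstCap,
                 const unsigned char *src, size_t srcLen)
{
    if (dst == NULL || src == NULL) {
        return COPY_ERR_NULL;
    }
    if (srcLen > dstCap) {
        return COPY_ERR_SIZE;
    }
    memcpy(dst, src, srcLen);
    return COPY_OK;
}
```

The caller then has to handle the returned status, which is more work than an assert but gives the firmware a chance to fail safely instead of writing through a null pointer.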

On behalf of our customer, we reported all of these issues to the vendor NXP and the issues have since been fixed in the latest 1.06 build of the host library. Firstly, in smApduAppendCmdData a maximum payload length is computed:

// The maximum amount of data payload depends on (whichever is smaller)
// - STD-APDU (MAX=255 byte) / EXTENDED-APDU (MAX=65536 byte)
// - size of pApdu->pBuf (MAX_APDU_BUF_LENGTH)
// Standard Length APDU's:
// There is a pre-processor macro in place that ensures 'pApdu->pBuf' is of sufficient size
// Extended Length APDU's (not used by A71CH):
// APDU payload restricted by buffersize of 'pApdu->pBuf'
U16 maxPayload_noLe;
if (pApdu->extendedLength) {
    maxPayload_noLe = MAX_APDU_BUF_LENGTH - EXT_CASE4_APDU_OVERHEAD;
}
else
{
    maxPayload_noLe = APDU_HEADER_LENGTH + APDU_STD_MAX_DATA;
}

Armed with this information, the length is later checked:

// Value
if (dataLen <= (maxPayload_noLe - pApdu->offset))
{
    memcpy(&pApdu->pBuf[pApdu->offset], data, dataLen);
    pApdu->offset += dataLen;
}
else
{
    return ERR_INTERNAL_BUF_TOO_SMALL;
}

This prevents data from being copied into the buffer unless it is within bounds. The deprecated code has been removed entirely.
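The same pattern can be shown in a self-contained form. The sketch below is not NXP's code: cmd_buf_t, cmd_buf_append, BUF_CAPACITY and the status codes are hypothetical names, and the buffer is deliberately tiny. It compares the requested length against the remaining space, written as a subtraction that is guarded so it cannot wrap around, and leaves the buffer untouched on failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_CAPACITY          64u  /* hypothetical; the library uses MAX_APDU_BUF_LENGTH */
#define APPEND_OK             0
#define APPEND_ERR_TOO_SMALL  1    /* hypothetical status codes */
#define APPEND_ERR_ARG        2

typedef struct {
    uint8_t  data[BUF_CAPACITY];
    uint16_t offset;               /* number of bytes already written */
} cmd_buf_t;

/* Appends len bytes from src to the buffer, or fails without writing. */
int cmd_buf_append(cmd_buf_t *b, const uint8_t *src, uint16_t len)
{
    if (b == NULL || src == NULL) {
        return APPEND_ERR_ARG;
    }
    /* Check offset first so that the subtraction below cannot wrap. */
    if (b->offset > BUF_CAPACITY || len > BUF_CAPACITY - b->offset) {
        return APPEND_ERR_TOO_SMALL;
    }
    memcpy(&b->data[b->offset], src, len);
    b->offset = (uint16_t)(b->offset + len);
    return APPEND_OK;
}
```

For example, after appending 60 bytes to an empty buffer, a further 5-byte append is rejected and the offset stays at 60, while a 4-byte append succeeds and fills the buffer exactly.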

We believe this should serve as a cautionary tale about the state of security of embedded devices, or “IoT”. With a lot of old code in use, and code written in memory-unsafe languages such as C, it is difficult to build high-assurance hardware. Indeed, it is very easy for even the best of programmers to write code containing such bugs.

This post does, however, have a silver lining. We have only good things to say about NXP’s response to our submission. NXP patched and issued a new release well within the standard 90-day disclosure deadline, and kept us updated throughout the process. NXP PSIRT’s response should be held up as an example of how companies should accept security reports.

Details:

  • Affected version: 1.05 (prior versions may be affected).
  • Fixed version: 1.06
  • Reported to NXP: 2019-02-25
  • Fixed (software available): 2019-03-18
  • Software download: here.