Our previous post, introduced the architecture and components of E4, Teserakt’s encryption and key management for IoT systems. This post is about one of these components, the automation engine, which we’ll describe more in depth today.
Keys are not forever
Devices managed by E4 rely on symmetric or public-key cryptography to protect their communications. In both mode, each device has its own identity key, plus one one key for each topics (you can see a topic as a "conversation", as a data type, as a classification level, and so on). Since devices may be deployed for years, there must be processes to renew these keys remotely and securely. Motivations for rotating keys include:
Revocation of a key that has possibly been compromised, or is not to be trusted for various reasons
Provide forward secrecy, that is, guarantee that if a key is compromised at some point, then earlier communications (with a different key) cannot be decrypted
Provide backward secrecy, or post-compromise security, for example in order to guarantee that, would some topic key be compromised, future topic keys remain secret, and therefore content protected with these keys remain protected.
Such key rotation may be done manually, or partially manually by creating some scripts, creating a custom network service, and so on, and updating the script manually when needed. This is a cumbersome and error-prone procedure that is unlikely to scale and to prove reliable in critical environments. We therefore propose an automation system that is simple to use (both through a graphical UI and scripts), and that easily scales to many topics and many messages.
The E4 automation engine address exactly this problem. It automates the rotation of any E4 keys by defining rules following a simple grammar, depicted below:
In the context of our key rotation automation, we then use following terms:
- Rule: a rule is the main component of the engine. It holds a list of triggers and targets controlling when the key rotation will happen, and which device or topic will be updated.
- Trigger: a trigger is a condition that must be fulfilled in order to execute a key rotation for all the parent rule’s target. A trigger can be a predefined period of time, or a number of system events, such as a threshold of clients joining or leaving a certain topic.
- Target: a target designates either a client device or a topic, for which the key will be updated when one of the rule’s triggers will have its condition fulfilled.
When to rotate keys
We currently support basic use cases such as:
- Rotating a device or topic key at a fixed time interval
- Rotating a topic key after a certain number of clients joined or left this topic
Those cover the most common scenarios requiring a key rotation. We’ve also carefully designed our engine and rules format to be flexible and easily extensible. If you have any other use cases which are not actually covered, we’d be happy to hear about them!
We’re often asked what’s the right time interval for key rotation. There is no single right answer, for it depends on a number of factor including the threat model and risks, the network reliability, the cost of sending control messages, etc.
How it works
As seen in our previous architecture post, the automation engine extends the C2 server functionalities. When enabled, it starts an internal scheduler that monitors the time-based rules, and also registers a channel on the C2 server to receive system events. On each events, the automation engine will check from the defined rules if any triggers are due, and request a key rotation for each of the rule’s targets on the C2. The trigger state then reset and is waiting again for its conditions to be met to fire again.
Client key rotation
A client’s key is its more critical key, and should ideally be better protected than topic keys, which are shared with other devices. In the symmetric key mode, the client’s key is also known to the C2, and used to protect control messages sent to that client. One of thes commands supported by control messages is SetIDKey, which offers a new client key to a remote device.
When a client key is rotated, the C2 server thus issues a SetIDKey command, containing a newly generated key, and protected with the current client key. The command is then transmitted to the client via the MQTT broker, by publishing it on the client control topic.
In the public-key mode, the C2 only knows clients’ public keys, not their private keys. Therefore client keys can’t be rotated in the public key mode.
Topic key rotation
When a topic key is rotated, the C2 server first generates a new random key, before issuing a SetTopicKey command for every subscribed clients on this topic. Each command is then protected, using the client key of its recipient, before transmitting on each client control topics.
When a topic has many subscribed clients, there might be a significant delay between the first and last SetTopicKey commands transmission (and thus reception). Some clients may thus use the old key while others are already using the new one, preventing the decryption of messages in each way. To avoid message loss, we thus need a key transition mechanism, which we implemented by defining a configurable grace period.
The grace period is a client parameter, which defines the duration during which both the old and new key can be used, starting from the time when a new topic key is received. The client will store the newly received key, and keep the old one for a configurable amount of time. The devices then does the following:
When receiving a message, it first tries to unprotect it with the new key. If this fails (as indicated by invalid authentication tag), the client then tries with the old key if it is still within the validity period.
When transmitting a message, the new key is always used. This might prevent some devices from reading the message, however this is safest behavior in case the old key was compromised. Furthermore, using the old key would also prevent messages to be decrypted by clients whose grace period has passed.
Again, when we’re asked what’s the best grace period value, our answer is that it depends on several factors, including the number of clients subscribed to a topic, how often keys are rotated, the network’s latency and reliability, and so on. Ideally, the grace period should be chosen after analyzing empirical data. The best value of the grace period is one that sufficiently minimizes messages loss, while minimizing the amount of messages encrypted using the old key (thus, the shorter the period, the safer).
Key management, and more particularly secret rotation, can be a tedious task. With E4, we try to make it transparent, fail-safe, and easy to configure and operate. You can configure your device and topic keys rotation, either from the dedicated page on the web console, or from the command-line client, allowing to create or view active rotation policies.