Group Key Management for IM-style Chat Group

Shane B. Eisenman
CS4180 Network Security - Implementation Report
Last Updated: November 27, 2002


Download Code:

Summary of Implementation

The problem addressed in this project is how to provide efficient group key management as applied to IM-style chat groups.  As elaborated in the project proposal, any implemented solution to this problem should consider the issues of user authentication, message integrity, forward and backward secrecy, and minimizing third-party involvement.  The first version (V1.0) of this code successfully addresses these last two concerns, that is, providing perfect forward and backward secrecy and requiring minimal third-party involvement.  Providing user authentication and explicit message integrity is deferred to later versions. Additionally, the approach used here is quite scalable, featuring O(log N) key management complexity, which is quite important when dealing with the potentially large and highly dynamic membership a chat group implies.

An overview of the implemented framework is provided here.  The system consists of two types of entities, an Administrator Node (AN) and the Chat Group Participants (CGP).  Only one AN is required regardless of how many groups there are.  The AN is the bootstrap server for this system and each chat entity, compiled with the AN's static IP address, will contact it during an initial start up phase.  In theory, there is nothing stopping the AN functionality from being distributed in the case where large, active groups either create network congestion in the local area of the AN or tax the processing power of the AN.  The latter risk is limited by the minimal set of activities the AN must execute per CGP access. Briefly, the AN merely keeps group registration lists, which currently must be configured off-line.  Any query from a GCP is handled with 1 file access and 1 sent message.  At no point is the AN required to execute any cryptographic operation.  Within an established group, there is one GCP that plays a special role - the Group Manager (GM).  The GM is the GCP that first contacts the AN asking about a given group.  Once designated as the GM, one is GM for the life of the group.  In the current implementation there is no mechanism for GM authority transfer, although this is certainly possible.  Currently, if the GM leaves the group, the group is destroyed.  The GM is the key master for the group.  It maintains a binary tree of keys, a subset of which it distributes to each subsequent joiner of the group.  The lowest level of the tree contains the secret keys that only the GM and each respective GCP share, while the root of the tree contains the key that every GCP shares, the Traffic Encryption Key (TEK).  Intermediate levels of the tree contain keys that allow the GM to distribute the TEK to the users in a way that maintains perfect forward and backward secrecy.  These are called Key Encryption Keys (KEKs).  The current implementation uses a balanced binary tree, with a depth sufficient to accommodate 128 other GCPs, that is created and filled, excepting the lowest (user key) level, when the GCP discovers from the AN that it is the GM.  Non-GM GCPs have no special responsibilities as far as key managements is concerned.  

Application Feature Details

The basic operations on the group are create, destroy, join and leave.  The message format specifications for these operations are here.

As discussed previously, group creation occurs when the first GCP contacts the AN concerning a given group number.  When the AN receives a Group  Query message from a GCP, it first reads the 'grp' field of the message and determines whether a group with this number has been registered.  If so, it further checks to see that the requestor is registered in the group about which it is asking.  If so, it checks to see if there is a GM already assigned to this group.   If no GM was already assigned, the requestor is set as the GM.  In either case, the AN returns the GM's IP address to the requestor. Upon receipt of the Group Reply message, the GCP compares it's own IP address with that of the GM returned in the message, determining in this way whether it is the GM of the group or not.  If a GCP has determined that it is the GM, it creates a binary tree of the necessary depth and fills it with the initial TEK and KEKs. If a GCP is not the GM, it contacts the GM with a Join Request message at this point.

A Join Request message contains the secret key the GCP and the GM are to share.  Note that in V1.0 this key is sent unencrypted, which is of course not secure at all.  Further, the GCP sending the Join Request is not authenticated cryptographically in any way.  Upon receipt of the Join Request, the GM does verify that the requestor's IP address is registered in the group, but this provides only the most basic of security assurances.  A possible solution available for use in the next version  of this software is discussed in a later section.  The GM must perform several actions after determining that the requestor is a valid group registrant.  First it installs the joining GCP's secret key in the first available leaf node of the key tree.  Note that each key stored in the tree has an associated reference that consists of a unique ID, a version and a revision.  Then it sends a message to the active group members informing them of the new user and directing them to hash their current keys (TEK and KEKs), thereby incrementing the version number.  The GM does the same locally.  A SHA-1 hash function is used. Lastly, the GM returns two messages to the GCP. The first message contains the subset of keys the joining GCP will need to know.  This subset is defined as the the keys at the nodes that must be traversed when moving from the user leaf (GCP secret key) to the root node (TEK).  The second message is a list of the active member's IP addresses.

There are three possible scenarios under which a GCP may leave a group.  First, one can leave silently, that is, without notifying the group members.   Secondly, one can leave while announcing this intention to the group.  In both these cases it should be noted that the leaving is voluntarily.  For this reason no key changes are required from a security point of view since it is OK if the leaving GCP can still read messages, i.e. its keys are not considered "compromised".  In V1.0 these first two cases are implemented as follows. In the case of a vocal leave, the GCP notifies the group with a Leave message and then exits.  Upon receipt of the Leave message, the GM removes the leaving GCP's secret key from it's leaf, thereby opening a slot for a new prospective joiner.  Also, all GCPs receiving the Leave message remove the sender from their active member list.  By contrast it should be easy to see how leaving silently, although it does not compromise the security of the keys, is damaging to the group.  Although it is small issue that the silent leaver is still a member of each GCP's active user list, the bigger problem is that the silent leaver's secret key never gets removed from the key tree.  Each silent leaver decreases the group membership capacity by 1.  Lastly, a (non-GM) GCP can be forced out of the group, at which time its active membership privileges are revoked and the keys in the subset of the key tree it knew are compromised.  The decision about how to determine when to force a GCP out of the group is a group policy decision and not part of the key management strategy.  In V1.0, if any user complains about another, the latter is forced out of the group.  Obviously, this is a bit unreasonable, though it could provide for some amusing group dynamics.  Requiring that a majority of the active membership complain about a given user before evicting the latter seems like a more practical policy.  Once a member has been forced out, the GM removes the GCP's secret key from it's associated leaf node, and also throws out all keys the departing GCP knew (KEKs on the path from leaf to root, and the root (TEK) itself).  It then generates new KEKs for the appropriate nodes and distributes them securely using one message to the group.  It doesn't matter if the departed GCP reads this message because it will not be able to decrypt anything.  The way this is accomplished can be best explained through an example.  Imagine a binary key tree of depth 4, which supports 16 users, and nodes are marked with hexadecimal labels that define the range of participants knowing the key.

tree.jpgnewkeymsg.jpg

Assume C is leaving.  This means that the keys it knew (Key Encryption Keys CD, CF, 8F, and the Traffic Encryption Key 0F) need to be viewed as compromised and have to be changed in such a way that C cannot acquire the new keys.  This is done efficiently by following the tree from the leaf node corresponding to the leaving participant to the TEK stored in the root node, and encrypting the new node keys with all appropriate underlying node or leaf keys.  For our example, the tree in the above-left figure shows that the new Key Encryption Key CDnew (replacement for CD) needs to be received by D, CFnew by participants D, E and F, 8Fnew by 8...B, D...F , and the new Traffic Encryption Key 0Fnew by every participant except C. Instead of encrypting the new keys individually for each of the intended participants, we take advantage of the existing hierarchy:  
See the figure above-right for a graphical representation of the message sent out to the group.  Along the path to the leaving node's leaf, all new keys except the bottom two rows will be encrypted for their two children. The new key in the leaver's parent node will be encrypted once. This results in 2W-1 keys being sent out, where W represents the depth of the hierarchy.  Processing this multicast message will require at most W decryption operations from by the participants, with an average of less than 2 decryptions.

To destroy the group, the GM issues a Destroy message to the group warning them of the impending group dissolution, sends a GM Resignation message to the AN resigning its GM-ship, wipes the key tree from memory and then exits.  Upon receipt of the Destroy message, each GCP wipes it's keys from memory and then exits.  Only the GM can issue a Destroy message.  An interesting case to consider is when the GM leaves silently.  Assuming it wipes the key tree there is no security risk, and the remaining GCPs can continue chatting.  The only restrictions are that no new members can join the group and no member may be forced out.  In the V1.0 implementation there is no provision for a GM leaving silently.

Of course the most functional operation of a secure chat group is the actual chatting.  Outgoing text is encrypted with the TEK and decrypted with the same at the receiving GCPs.  An important question is how control and data messages are reliably delivered to every group member.  V1.0 assumes no multicast support and therefore opens/closes a unicast TCP session between the sender and every other member for each message sent.  This is clearly not scalable and this fact was the whole motivation behind multicast channels.  

Discussion of Security Features

V1.0 boasts (and sheepishly admits having) the following security properties.  The key distribution strategy of using a binary tree with levels of KEKs provides perfect forward and backward secrecy at all times.  A newly joined GCP cannot decrypt previous chat traffic because a cryptographically strong one-way hash function is used to increase the revision of all KEKs and the TEK each times a new user joins.  A GCP who is forced out of the group cannot decrypt susequent traffic because it is based on key versions of which it has no knowledge.  Since all keying material is explicitly erased from memory (not just freed) before the GCP exits, it is guaranteed that no third party can obtain information by viewing freed system memory.  Also, as stated earlier, no third party is involved with key management so no trust in such an entity is required.

On the downside, this implementation has the joining GCP send its secret key to the GM in plain view of even the most passive eavesdropper.  If this message is recognized, an attacker can also decrypt the messages coming from the GM to the joining GCP that contain a key subset and the active member IP addresses.  This allows not only decryption of the chat traffic until the TEK version is changed, but also allows the decryption of the new KEKs and TEK versions and thus continued decryption of traffic regardless of any action of any member in the group.   In addition it can do this without anyone being the wiser.  Of course, the "simple" solution is, assuming the necessary PKI, using public key cryptography to authenticate the joining member and protect the initial communitication of the private "session key" to the GM. For example the joiner could RSA sign a hash of his key and send this along with his key encrypted with the public key of the GM. Since the keys are relatively short this does not necessarily add much additional "cost" to the process of joining, and from here everything proceeds as already defined.

A slight privacy issue exists in that traffic between the chat group pricipals and the AN is unencrypted.  Although this does not affect the key management at all, it does divulge the IP address of the GM to the casual eavesdropper, which is something a non-registrant would not be allowed by the AN to obtain.  Armed with this knowledge an active attacker could focus a DoS attack on the GM (either by IP address spoofing or bombardment),  or even try to break into the GM to get at the key tree.  Again, this could be solved with public key encryption.

A larger issue with this implementation is that it essentially implements ECB DES even though the CBC mode of DES is used.  This is because the IV is currently directly derived from the key and NOT on the incoming ciphertext, as intended.  The reason for this is the presence of multiple sources in a chat group so that even if TCP is used to provide guaranteed and ordered transmissions between a <source,destination> pair, traffic from different sources is necessary interleaved at any given receiver.  It is possible to keep state that binds a source with the most recent IV from that source.  Though this can be expensive in a large group, perhaps this is a necessary cost to pay.  Note however, that there may be other ways around this if DES is not used for the traffic encryption.

Since chat traffic privacy is provided with a symmetric encryption scheme, user identity verification within the group is not immediately available.  Although the current implementation provides the IP address as a way to follow the chat flow, this alone cannot be relied upon due to the possibility of IP address spoofing.  One possible solution is append a RSA signed SHA-1 hash of the message content to the message and (optionally) encrypt this with the TEK.  This would provide verification of source identity for each chat message as well as a guarantee of message integrity.  Due to the extra cost this implies, it could be enabled as an option in groups that require these features.

Future Work


The following are additional features that do not directly affect the security of the key management scheme, but provide improvements to make the chat application itself more robust or efficient.

The damage a  silent leaver can cause was described above.  This issue can be solved by having the GM send out periodic "heartbeat" messages to which GCPs must reply to maintain their leaf nodes in the tree.  Though this increases the messaging overhead O(N), it scales in the sense that it solves an a more important "O(N) problem" so maintaining maximum group capacity is probably worth the extra messaging cost for groups of practical size.  

A problem with a symptom identical to the silent leaver is that of the multiple joiner.  In V1.0, it is allowed for the same IP address to be bound to i<=N leaf nodes in the key tree, which effectively blocks i-1 other users.  This is a trivial problem to solve.  The GM already keeps list of active GCPs so it would be simple to check this list when it received a Join Request, and then not allow a user to join from the same IP address more than once.  Of course, in a sort of DDoS scenario a member could still use multiple machines or IP spoofing to occupy all the leaves.  However, it should be noted that only valid group registrants are allowed to join at all, so the risk of either of these "attacks" is low.

Currently, incoming messages are not filtered by group.  This can become a problem if a user is running two instances of the Chat entity, each in a different group.  In V1.0 this is not really a problem because it requires the GCP to serve a particular port, which precludes two chat entities running on the same machine, but it could become an issue with a more flexible implementation.

It is usually desireable that a GCP who is forced out of a chat group should not be able to rejoin the group, either until its dissolution or for a certain amount of time ("go sit in the 'timeout' corner").  The current implementation has no provisions for keeping track of IP addresses that were recently expelled from the group.  This means a user can just quickly rejoin the group and start bad-mouthing Linux again, for example.  It would not be difficult for the GM to maintain a list of "barred" users, and only slightly more complicated to keep track of the approximate time they were forced out.

V0.1 uses TCP sockets so guaranteed delivery is assured.  In the case where TCP sockets are not used it is possible for a GCP to miss Key Revision Update and Key Version Update messages from the GM.  To solve this problem, the software should check the revision and version fields of the key identifier (this information is part of every chat message) and take action if it is discovered the key being used to encrypt the message is of a higer version or revision than the GCP is using.  In the case of key revision, the GCP simply hashes its keys the necessary number of times to bring it up to date.  The case of a missed key version update is more involved because the GCP must contact the GM directly, encrypting the resend request with his secret user (leaf) key to prove his identity, and have the GM resend the new keys.  A malicious group insider can make you hash your keys to a higher version than the group is using by sending a message with incorrect information in the key identifier fields but once this is discovered by the other member of the multicast group the malicious user can be expelled.

Finally, the maximum group size allowed in this code version is 128.  While this should be sufficient for most chat groups, it is possible to allow for larger groups by merging existing groups under a new GM.  Actually, the GM need not actually be a different IP address.  What needs to change is the  key tree.  One additional level will be added (a new root node) and the previous roots will become the two children of this new root.  The existing GCPs of each group only need to be informed of the (possibly new) IP address of the GM and the new root node key (TEK).  In this case backward secrecy is still maintained (the merged groups cannot read each other's traffic from before the merge) because the KEKs and previous TEKs are still distinct.