How do we do it?

Basics

People always safeguard their precious belongings. When it comes to a valuable object, we want to keep it safe, but not in a manner that makes it inaccessible or difficult to retrieve. Safeguarding sensitive information is no exception. The good news is that it is possible to perfectly hide any information content while keeping it easily accessible to its owner.

Any information content can be represented as a collection of bits, and each bit of information can be perfectly concealed behind another bit that acts as a key. To securely communicate a message from one party to another, it suffices that the two parties have access to a key with the same number of bits as the message. At first, this seems to be a classic chicken-and-egg problem: communicating the key seems to be as difficult as communicating the message itself. Fortunately, this is not the case. There is a big difference between the bits forming the key and the bits forming the message. Message bits have to keep their original values, but as far as the key bits are concerned, all that matters is that the keys at the two parties are the same; the exact content of the key does not matter. Although this property simplifies the issue at hand, it still requires a mechanism to make the same key, regardless of its exact bit values, available to the two parties.

Information theoretical results confirm that, to have perfect secrecy, the key should have some special features:

All the bits forming the key should be truly random and independent of each other, like flipping a coin several times and recording the sequence of heads and tails at the two parties.
The key should never be reused.

Information theory proves that, subject to the above conditions on the generation and distribution of security keys, it is possible to achieve perfect, also known as unbreakable, secrecy. Having a mathematical proof is very promising, as it guarantees that, regardless of what may be on the horizon, a secret safeguarded in this manner will remain secret. This means security is guaranteed regardless of any future advances in computer technology, and regardless of any future theoretical discoveries in breaking security locks. All in all, it will be absolutely impossible to break the safety provided by this locking mechanism.
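The bit-for-bit concealment described above is the classic one-time pad. A minimal sketch, assuming the shared key is already available to both parties: the function name `otp_xor` is hypothetical, and Python's `secrets` module (the operating system's random source) merely stands in for the truly random key the text calls for.

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte.

    The same call both conceals and reveals, because XOR is its own inverse.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"

# Assumption: a fresh key, as long as the message, used once and then discarded.
key = secrets.token_bytes(len(message))

ciphertext = otp_xor(message, key)    # what an eavesdropper could observe
recovered = otp_xor(ciphertext, key)  # what the key holder reads
```

Without the key, every message of the same length is an equally plausible explanation of the ciphertext, which is why the key must be random, as long as the message, and never reused.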

Relying on new discoveries in information theory, and after many years of hard work on bridging the gap between theory and practice in information theoretical security, we have been able to find practical ways to fully unleash the enormous strength of unbreakable security. In other words, we have solved the long-standing chicken-and-egg problem of perfect secrecy: how to create the same string of key bits at two different physical locations, without communicating or disclosing any information about it. This operation mimics telepathy, or thought transference, between two locations. The two parties manage to arrive at the same key without ever communicating with each other, or talking to others, about the key content; it is like being able to magically read each other’s minds.

You may ask yourself: if this is the right way to have a perfectly secure system, then what about all the other methods that are currently in use? The answer is that the techniques currently used to generate and share security keys are neither random nor secure. These techniques mostly rely on solving a difficult mathematical problem that ultimately provides the two parties with the same answer for the key. Although the mathematical problem is designed to be difficult for any party other than the legitimate ones, it is by no means impossible to solve. With computer technology moving forward at lightning speed, there is a good chance that what is considered difficult today, and consequently secure, will be a toy problem for the computers of tomorrow.

In addition to the severe limitations of the secrecy mechanisms used in current systems, as explained above, there is another major shortcoming regarding actual privacy. On top of the faulty mechanisms used to secure the data, all the tools required to open the security lock are themselves recorded and safeguarded in another vault. It is like locking a vault and safeguarding its key in another vault, and then safeguarding the key of the second vault in a third vault. The reality, regardless of how many such chained vaults we use, is that none of the vaults will be truly secure, as there is a key hidden somewhere, however hard it may appear to get, that can open that vault. In summary, current systems are neither secure nor confidential/private. Our technology addresses this second shortcoming concerning privacy as well. Our technology guarantees its users unbreakable security and absolute privacy, meaning that it does not require storing the credentials that could open the security lock belonging to its users. In other words, we do not need, and absolutely do not have, access to your private information to safeguard it. We keep your information absolutely confidential, hidden from all, including ourselves. This fulfills our commitment to keeping your confidential information hidden from anyone other than yourself, its rightful owner.

One piece of the puzzle remains. This remaining piece concerns a data transmission mechanism that meets the same standards of unbreakable secrecy and confidentiality. In other words, we do not want to leave any loose end that would act as a weak point in achieving perfect secrecy and confidentiality. Communication channels currently known and treated as secure are based on encryption keys that are not secret at all, just hard to guess. Another serious problem is that the details of currently used encryption keys are known to the systems/people creating them in the first place, and there is no guarantee that this knowledge cannot get into the hands of individuals with mischievous intentions. This is like the case of using several vaults, each safeguarding the key to another one: if someone manages to open one of these vaults, the entire chain will be compromised. In other words, the chain is inherently breakable, as it relies on a certain vault mechanism to protect something that is recorded. What is the solution? The solution is to avoid recording any useful information altogether, and this is exactly what our technology does.

In summary, current encryption and authentication mechanisms rely on safeguarding something that is recorded. The fact that a record exists somewhere makes such systems vulnerable: regardless of how impenetrable one may think their vaults are, the fact that there is a vault with something valuable inside it renders such systems potentially breakable. Regardless of how many guarding mechanisms they use to safeguard the valuables, the problem will always be the weakest link. We avoid this problem by not storing or recording any useful information. Hackers, regardless of what they do, can never break our vault, simply because there is no vault to be broken.

Authentication

Authentication is the cornerstone of security. Noting the shortcomings of known authentication techniques, we have designed an effective and user friendly authentication apparatus based on the following innovations:

Real-time, adaptive multi-factor authentication: Legacy two-factor authentication systems typically send a message, such as an SMS, containing an authenticating keyword (a numerical value) to a first device belonging to the client (typically a cell phone), asking the client to close the loop by manually entering the keyword into a second device of the client that is seeking connection to the server. The delay in closing the loop in such legacy setups provides the ground for malicious acts. In addition, there is no mechanism in place to detect suspicious behaviors that may be indicative of malicious activities. In our setup, the loop is closed automatically in real-time. Closing the loop in real-time enables us to accurately monitor the events occurring within the loop, enabling our authentication system to detect suspicious activities. In addition, our authentication system interacts with the user in real-time, and thereby adapts its functionality to particular circumstances. For example, relying on machine intelligence, the authenticator can ask the client to move his/her head within the frame used for face recognition and then accurately monitor the reaction of the client in order to verify that the client seeking access is genuine.

Mutual Authentication: Another difference from legacy structures is that our authenticator verifies the authenticity of the client as a person, as well as the authenticity of the client’s devices used to access the service. Each of the client’s devices is specified by a unique identifier, which is then verified against the server’s records prior to granting access. Each device’s identifier is analogous to a password used for that particular device. The important point is that, in our system, the identifiers associated with different devices are automatically changed over time without disclosing any information to an outsider that may be monitoring the activities for malicious purposes. Another distinction is that our overall authentication strategy authenticates each and every node, from server(s) to client and to client’s devices, against each and every other node. Again, the credentials for such a distributed authentication are truly random, change over time, and are established without disclosing any information to an outsider. Methods explained later for random key generation form the backbone for generating authentication credentials and for automatically changing these credentials over time.
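To make the idea of a device identifier that changes over time more concrete, here is a minimal sketch using a conventional stand-in, not the mechanism described above: both ends are assumed to already hold a shared secret and to agree on an epoch counter, and the identifier for each epoch is an HMAC over the device name and epoch. The function names, the device name, and the epoch scheme are all hypothetical. In the system described here the credentials are said to be truly random rather than derived by a fixed formula.

```python
import hashlib
import hmac

def device_identifier(shared_secret: bytes, device_name: str, epoch: int) -> str:
    """Identifier a device presents during one epoch (hypothetical scheme)."""
    message = f"{device_name}|{epoch}".encode("utf-8")
    return hmac.new(shared_secret, message, hashlib.sha256).hexdigest()

def server_accepts(shared_secret: bytes, device_name: str,
                   epoch: int, presented: str) -> bool:
    """Server-side check of a presented identifier against the current epoch."""
    expected = device_identifier(shared_secret, device_name, epoch)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, presented)

# Assumption: this secret was established earlier by a key generation step.
secret = b"example shared secret, illustration only"

id_epoch_1 = device_identifier(secret, "laptop", 1)
id_epoch_2 = device_identifier(secret, "laptop", 2)
```

An identifier captured during one epoch is rejected in the next, so an observer who records it gains nothing lasting.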

Machine learning for detecting suspicious activities, and accordingly adjusting authentication level: We rely on machine learning, while accounting for auxiliary algorithmic inputs such as time and location, to adjust the complexity of the authentication procedure depending on the circumstances. In this manner, we can achieve a high level of client convenience without compromising security. Details of the various authenticating factors, from interactive face recognition and speaker identification to the number of factors in automated multi-factor authentication, are dynamically adjusted, and more authenticating steps are added when the circumstances call for them, in order to provide the right balance between the client’s convenience and processing delay without the slightest compromise in security.
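As a rough illustration of adjusting the authentication level to the circumstances, the sketch below maps a few context signals to a list of required factors. Everything in it is an assumption made for illustration: the `LoginContext` fields, the weights, the thresholds, and the `anomaly_score`, which stands in for the output of a machine learning model that is not shown.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LoginContext:
    known_device: bool     # device identifier matched the server's records
    usual_location: bool   # location is consistent with the client's history
    usual_hours: bool      # time of day is consistent with the client's history
    anomaly_score: float   # 0.0 (normal) to 1.0 (highly unusual), hypothetical model output

def required_factors(ctx: LoginContext) -> List[str]:
    """Return the authentication steps to run; more risk means more steps."""
    risk = ctx.anomaly_score
    if not ctx.known_device:
        risk += 0.4
    if not ctx.usual_location:
        risk += 0.3
    if not ctx.usual_hours:
        risk += 0.1

    factors = ["device_identifier"]  # always checked
    if risk >= 0.3:
        factors.append("face_recognition")
    if risk >= 0.6:
        factors.append("speaker_identification")
    if risk >= 0.9:
        factors.append("interactive_liveness_check")
    return factors

routine = required_factors(LoginContext(True, True, True, 0.1))
unusual = required_factors(LoginContext(False, False, False, 0.5))
```

A routine login from a known device passes with a single automated check, while an unusual one triggers every step, including the interactive head-movement check described earlier.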

Distributed Key Maintenance

First level of data encryption for storage – Client-based Encryption: Data to be securely stored will be first encrypted within a client’s trusted device. This is called client-based encryption. The encryption key for client-based encryption will be regenerated within the client’s trusted device every time that it is needed. The key will not be stored, and it will never be accessible to any other device(s). Once the data is encrypted, it needs to be moved to the server side to be stored.
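One conventional way to regenerate a key on demand instead of storing it is to derive it from something the client supplies plus a value bound to the device. The sketch below does this with PBKDF2 from Python's standard library. This is a stand-in for illustration only: the text does not say how the key is regenerated, and the passphrase, the salt, and the function name are assumptions.

```python
import hashlib

def regenerate_client_key(passphrase: str, device_salt: bytes,
                          iterations: int = 200_000) -> bytes:
    """Recompute a 256-bit key from client-held inputs; nothing is written to disk."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        device_salt,
        iterations,
        dklen=32,
    )

# Assumption: the salt is tied to the trusted device and never leaves it.
device_salt = b"hypothetical-device-bound-salt"

key_first_use = regenerate_client_key("correct horse battery staple", device_salt)
key_next_use = regenerate_client_key("correct horse battery staple", device_salt)
```

The same inputs always yield the same key, so the key itself never has to be stored or sent to another device.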

Securely sending the data resulting from the client-based encryption to the storage server: For the purpose of communicating the encrypted data, we form an encrypted channel between the client’s device and our storage server. The encryption for this channel is realized using truly random keys that are established at the two ends without disclosing any information to nodes that may be monitoring the underlying information exchanges for eavesdropping purposes. The encryption keys for communications are dynamically changed over time. It should be added that our encrypted communications channel is built on top of the legacy encrypted channel provided by an underlying standard such as TLS or HTTPS.

Second level of data encryption for storage – Server-based Encryption: The data to be stored on the server side is already encrypted with a key that is not stored at all; the key can only be recovered at the client’s side by the client’s trusted device. As a second safeguard mechanism, we apply an additional layer of encryption, the so-called server-based encryption, to the data prior to storing it on the server. The key for the server-based encryption is generated randomly using a combination of our key generation algorithms. The key is then optimally divided into several key segments, and each segment is stored on a different storage server on the internet, including Box, Dropbox, Amazon, and Google Drive. Our main storage server is on Microsoft Azure, which also keeps one piece of the key used for server-based encryption. The division of the key into key segments is based on our proprietary technique and is optimum from an information theoretical perspective. Optimum here means that all key segments are needed to recover the main (composite) server-based encryption key, while missing a single key segment is equivalent to not having access to any information about the composite key. In addition, as is the case for all other keys used in different parts of our system, the server-based key and all its associated key segments are regularly and automatically updated over time without disclosing any information. The net outcome is that, in order for a hacker to access the server-based encryption key, all the storage servers involved in our system, namely “Microsoft Azure” as the primary server and “Box”, “Dropbox”, “Amazon”, “Google Drive” as the auxiliary servers, would have to be hacked at the same time.
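The property stated above, that every segment is needed and any missing segment leaves no information about the composite key, is met by the textbook XOR splitting scheme sketched below. This is offered only as a familiar construction with that property. The actual division technique is proprietary and may differ, and the function names and provider labels in the code are illustrative.

```python
import secrets
from functools import reduce
from typing import List

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key: bytes, n: int) -> List[bytes]:
    """Split key into n segments; all n are required to rebuild it."""
    if n < 2:
        raise ValueError("need at least two segments")
    # n - 1 segments are independent random strings ...
    segments = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # ... and the last one is the key XORed with all of them.
    segments.append(reduce(xor_bytes, segments, key))
    return segments

def combine(segments: List[bytes]) -> bytes:
    """XOR all segments together to recover the composite key."""
    return reduce(xor_bytes, segments)

providers = ["Microsoft Azure", "Box", "Dropbox", "Amazon", "Google Drive"]
server_key = secrets.token_bytes(32)

segments = split_key(server_key, len(providers))
placement = dict(zip(providers, segments))  # one segment per storage service

rebuilt = combine(list(placement.values()))
```

Any four of the five segments are, taken together, just random bytes, so an attacker has to obtain all five before learning anything about the key.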

Key Generation

The process of key generation involves generating two identical strings of binary numbers at two different locations. The binary numbers are random in the sense that their values are unknown before being generated. In addition, the two strings should be independently generated at the two locations without disclosing any information to potential observers that could be eavesdropping. In other words, it should be impossible to eavesdrop on the content regardless of the level of sophistication and abundance of resources available to potential hackers. It is also important that the binary numbers are generated truly randomly. In particular, the numbers cannot be generated using a pseudo-random algorithm; i.e., there cannot be a recipe-like procedure for generating the binary numbers. The reason is that any such pseudo-random algorithm would ultimately rely on some deterministic steps, and as a result, its operation could potentially be reproduced elsewhere by a hacker who has managed to retrieve the underlying recipe. In other words, to start with, there should not be any recipe with reproducible steps; if such a recipe exists, it can potentially be stolen and reproduced. Next, we explain how we have managed to realize such an apparently impossible task, i.e., the task of generating a truly random key at two separate locations without disclosing any information.

In addition, upon generating the key, its randomness is further enhanced by relying on proprietary techniques for privacy amplification such that NIST randomness criteria are fully satisfied. Sophisticated proprietary algorithms, based on information theoretical arguments, are used to combine the generated keys with the relevant history and other client-specific information to further enhance resilience to eavesdropping. In this manner, truly random and undisclosed keys, used as shared secrets, are generated in support of client-server connections as well as scenarios in which several distributed nodes are required to exchange information. In the case of multiple distributed nodes, the algorithms result in generating the same random bit string at all nodes without disclosing any information. In all these setups, the generated keys are updated on a regular basis, as well as in situations where our monitoring agents detect suspicious behaviors and accordingly require that some of the keys be updated immediately. In all cases, the “concealment” as well as the “randomness” of the generated keys are guaranteed based on both mathematical proofs and compliance with relevant tests set by NIST. Our technology relies on several innovations for key generation, to be explained next.
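Before turning to those innovations, the privacy amplification step mentioned above can be illustrated with a generic textbook construction: both parties compress their identical raw bit string through the same publicly known random binary matrix, so that whatever partial knowledge an eavesdropper had about the raw bits is spread thin over a shorter output. This is a sketch of the general idea, not the proprietary technique. The function name, the seed, and the sizes are assumptions.

```python
import random
from typing import List

def privacy_amplify(raw_bits: List[int], out_len: int, public_seed: int) -> List[int]:
    """Compress raw_bits to out_len bits using a random binary matrix.

    The matrix is derived from a public seed so both parties build the same
    one; each output bit is the parity of a random subset of the raw bits.
    """
    rng = random.Random(public_seed)
    output = []
    for _ in range(out_len):
        row = [rng.getrandbits(1) for _ in raw_bits]
        parity = sum(r & b for r, b in zip(row, raw_bits)) % 2
        output.append(parity)
    return output

# Assumption: both ends already hold the same 64 raw bits.
raw = [(i * 7 + 3) % 5 % 2 for i in range(64)]

key_at_a = privacy_amplify(raw, out_len=16, public_seed=2024)
key_at_b = privacy_amplify(list(raw), out_len=16, public_seed=2024)
```

The output is shorter than the input on purpose: the bits given up are the price paid for removing what an observer might have learned.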

Key Generation Relying on Reciprocity: Communications systems rely on some form of connection between points to transmit information. This is generally called a communications channel. One example is transmission over the Internet, where the channel is primarily composed of (fiber) optical links and includes several intermediary hardware components, such as switches and buffers, that control the flow of data among participating nodes. Another example is a wireless channel, in which information is converted to an electromagnetic wave and transmitted over the air. Most communication channels, including the examples provided above, are based on using a transmission medium with a physical property called reciprocity. As a simple example, in transmission over a link connecting a point A to a point B, the time that it takes for the signal to travel from node A to node B is equal (or approximately equal) to the time that it takes for the signal to travel from node B to node A. In wireless transmission, the change in the phase of the radio frequency (RF) signal when traveling from point A to point B is the same as the change in the phase when traveling from B to A. The numerical value of a quantity with the reciprocity property is typically affected by numerous unknown factors, resulting in exactly what we need: a random variable that can be measured at two physically separated points. This is a cornerstone of our technology. We have found creative ways to: (1) identify such reciprocal quantities, (2) enhance their randomness, and (3) remove any mismatches between the two measured values, and we do all these steps without disclosing any information about the final outcome.

As another example of reciprocity, let us consider the scenario of rotating a roulette wheel. The wheel can be turned in two directions, clockwise and counter-clockwise. If the wheel is turned with exactly the same force in either direction, the number of turns and the final outcome turn out to be the same. The point is that a phenomenon similar to the turning of a roulette wheel exists in transmitting signals between the two ends of a communication channel. To explain the randomness, let us consider a scenario in which a shape similar to a roulette wheel is painted on the side of a car tire, and the car travels between two cities. The tire turns thousands of times, and the exact number of turns is determined by the details of the route taken by the driver. Each and every small turn of the car affects the final outcome, meaning that it will be a random quantity. Obviously, if the distance between the two cities is traveled in either direction, i.e., from city A to city B, or vice versa, the number of times the tire turns will be the same, provided the driver takes exactly the same route in both directions and the road conditions remain the same in both directions. The important point is that, due to the extremely high speed of sending signals over communication channels, the above requirements can be satisfied, resulting in a pair of reciprocal random variables. For example, in the case of wireless transmission, the phase of the RF signal is affected by each and every object within the propagation path. The overall phase depends on the details of the propagation path, resulting in a random variable that depends on the exact position of the observation point and will be independent from one observation point to another that is only a few inches away (on the order of a wavelength). The positions of objects affecting the signal phase are unpredictable, and vary in an unpredictable manner over time. In spite of this feature, as the signal propagates at the speed of light, the back and forth trip between the two ends of a communication channel (the points of measuring the reciprocal random variables) will take a small fraction of a second, during which the environment effectively remains completely still, and exactly the same from the perspective of the two measurement points.
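The measure-and-reconcile idea behind reciprocity can be simulated in a few lines. The sketch below is a toy model under stated assumptions, not the actual procedure: the reciprocal quantity is a list of random values (think of phase samples), each end sees it with a small bounded measurement error, each sample becomes one bit according to its sign, and samples too close to zero are discarded. The two ends exchange only the positions of the samples they trust, never the values. All names and numbers are hypothetical.

```python
import random
from typing import List, Set

def measure(true_values: List[float], noise: float, rng: random.Random) -> List[float]:
    """One end's noisy view of the reciprocal quantity (error bounded by noise)."""
    return [h + rng.uniform(-noise, noise) for h in true_values]

def reliable_positions(samples: List[float], guard: float) -> Set[int]:
    """Positions whose sample is far enough from zero to trust its sign."""
    return {i for i, s in enumerate(samples) if abs(s) > guard}

def to_bits(samples: List[float], positions: Set[int]) -> List[int]:
    return [1 if samples[i] > 0 else 0 for i in sorted(positions)]

rng = random.Random(7)  # fixed seed so the simulation is repeatable
channel = [rng.uniform(-1.0, 1.0) for _ in range(256)]

NOISE, GUARD = 0.05, 0.10  # the guard band must exceed the measurement error

at_a = measure(channel, NOISE, rng)
at_b = measure(channel, NOISE, rng)

# Only positions are exchanged in public; they say nothing about the signs.
agreed = reliable_positions(at_a, GUARD) & reliable_positions(at_b, GUARD)

key_a = to_bits(at_a, agreed)
key_b = to_bits(at_b, agreed)
```

Because the guard band is wider than the largest possible measurement error, a kept sample has the same sign at both ends, so the two bit strings match exactly. A third observer a few wavelengths away would see an unrelated channel and gain nothing from the announced positions.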

Key Generation Relying on Error and Packet Loss: Error in transmission has always been an unavoidable nuisance in communication technology. We have managed to benefit from transmission error in key establishment. In other words, we have managed to turn something considered totally undesirable in one context into something very useful in another. To explain this concept, let us consider the scenario that there are many cans of paint of different colors. The cans are numbered sequentially without any consideration of their color. In other words, the numerical value written on each can does not convey any information about its color. Let us also assume there are three identical copies of each can, where the numerical value specifying the can is repeated on all three copies. Now assume a collection of such cans is thrown into a turbulent sea and two parties (considered the legitimate parties in a key exchange setup) set out to salvage as many cans as possible. An illegitimate party (an eavesdropper) sets out to do the same. Each legitimate party publicly announces the numerical labels of the cans that it has recovered. Then, the two legitimate parties, knowing the labels of the cans available to their counterpart, select the largest subset that is available to both of them. Then, each of the two legitimate parties mixes the colors corresponding to the selected cans, resulting in the same color being obtained at both ends from the mix. The lost cans are analogous to channel errors, and the final outcome of the mixing operation is analogous to the key. The important point is that, if the channel error is reasonably high, and the number of cans used in the experiment is sufficiently large, the chances that the eavesdropper gains access to a copy of all the cans that are being mixed by the two legitimate parties will become extremely small. Note that, if needed, the legitimate parties can request re-transmission of lost packets (cans that have not been recovered); however, a potential eavesdropper cannot benefit from this feature without being detected. We have managed to construct a structure similar to what is explained above in transmission over the Internet as well as in wireless communications.
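The paint can analogy translates almost directly into a simulation. In the sketch below, which is a toy model with made-up parameters and not the deployed protocol, labelled packets with random payloads are lost independently for each receiver, the two legitimate parties announce the labels they hold, and each XORs together the payloads of the labels they have in common. XOR plays the role of mixing paint: the result depends on every ingredient, so missing even one leaves the outcome unknown.

```python
import random
import secrets
from functools import reduce
from typing import Dict, Set

def receive(packets: Dict[int, bytes], loss_rate: float,
            rng: random.Random) -> Dict[int, bytes]:
    """Each packet independently survives with probability 1 - loss_rate."""
    return {label: data for label, data in packets.items()
            if rng.random() > loss_rate}

def mix(received: Dict[int, bytes], labels: Set[int]) -> bytes:
    """XOR the payloads of the chosen labels together (the 'paint mixing')."""
    chosen = [received[label] for label in sorted(labels)]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chosen)

rng = random.Random(11)  # fixed seed so the loss pattern is repeatable
packets = {label: secrets.token_bytes(16) for label in range(200)}

party_a = receive(packets, 0.3, rng)
party_b = receive(packets, 0.3, rng)
eavesdropper = receive(packets, 0.3, rng)

# Labels are announced publicly; payloads are never announced.
common = set(party_a) & set(party_b)

key_a = mix(party_a, common)
key_b = mix(party_b, common)

# The eavesdropper can rebuild the key only if no common packet was missed.
eavesdropper_has_all = common <= set(eavesdropper)
```

With 200 packets and a 30% loss rate, roughly a hundred packets are common to both parties, and the chance that an independent listener caught every one of them is vanishingly small.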

Key Generation Relying on Common Randomness: The concept of common randomness in Information Theory refers to scenarios in which a given informative content can be accessed by a number of different observers. The main criterion is that each such observation is unavoidably contaminated with some measurement noise, and as a result, although different observations will be similar, no two observations will be exactly the same. To explain this concept in simple words, let us consider the case that the informative content is a piece of music written in musical notes, but played by different bands at different times. The measurable observation will be the sound generated by each such performance; although the generated musical sound in all cases will be similar, no two performances will result in exactly the same sound waveforms. Information theoretical results indicate that, if two such sets of random variables are provided to two separate nodes, one set at each node, then the two nodes can collaborate over a public channel (without disclosing any information) and arrive at two smaller sets that are exactly the same. In a sense, through collaboration, each node can extract the essence of the similarity between the two contents (capture the similarities and condense them into two identical bit streams) and separate it from the dissimilarities, which in each case act as observation noise. Another example is the case of extracting numerical values from a biometric signature, for example, numerical attributes that are extracted from a face to be used for the purpose of face recognition. If the face of a person is presented at different times to the same face recognition camera/engine, the numerical attributes extracted in different experiments will be similar, but not exactly the same. We have managed to extract keys by processing sources of common randomness. In particular, two sources of such common randomness are face/voice recognition and the typing speed of a client. In both cases, the newly extracted numerical attributes at the client side are used in conjunction with a dictionary built at the server side to provide the two contents, which are similar but not exactly the same. Through innovative designs, we have been able to bring the existence results predicted by information theoretical arguments to practical extraction of a common encryption key.
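A well-known way to turn two similar but not identical readings into one identical key is the code-offset construction, sketched below with a simple repetition code. It is shown as a generic illustration of extracting agreement from noisy observations, not as the design referred to above. The bit lengths, the repetition factor, and the simulated readings are assumptions. Real biometric attributes are also not uniformly random, which a practical design would have to account for.

```python
import secrets
from typing import List

REP = 5  # each key bit is repeated 5 times, tolerating up to 2 flips per block

def encode(key_bits: List[int]) -> List[int]:
    return [bit for bit in key_bits for _ in range(REP)]

def decode(noisy: List[int]) -> List[int]:
    """Majority vote within each block of REP bits."""
    return [1 if sum(noisy[i:i + REP]) > REP // 2 else 0
            for i in range(0, len(noisy), REP)]

def xor_bits(a: List[int], b: List[int]) -> List[int]:
    return [x ^ y for x, y in zip(a, b)]

# Server side: the enrolled reading (the "dictionary" entry) and a fresh key.
key_bits = [secrets.randbits(1) for _ in range(8)]
enrolled = [secrets.randbits(1) for _ in range(8 * REP)]

# Public helper data: the encoded key masked by the enrolled reading.
helper = xor_bits(encode(key_bits), enrolled)

# Client side: a new reading that differs from the enrolled one in one bit
# per block, simulating measurement noise.
fresh = list(enrolled)
for start in range(0, len(fresh), REP):
    fresh[start] ^= 1

# Unmasking with the fresh reading leaves the encoded key plus a few errors,
# which the majority vote removes.
recovered = decode(xor_bits(helper, fresh))
```

The helper data is exchanged in the open, yet both ends finish with the same key bits as long as the two readings are close enough.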