By: Dr Najwa Aaraj, Chief Researcher, Cryptography Research Centre / Acting Chief Researcher, Autonomous Robotics Research Centre
Cryptography and machine learning have typically been separate disciplines. New cross-discipline research at Abu Dhabi’s Advanced Technology Council’s Cryptography Research Centre (CRC) promises to improve both domains. The CRC has already launched an ambitious cross-disciplinary secure cloud computing programme. In the long run, this will lead to better security and enable AI systems to safely leverage confidential data.
Applying machine learning to cryptography can improve the tools for analysing codes and lead to smaller, more efficient cryptographic algorithms. This could open up opportunities for securely running AI models on smaller IoT devices.
New AI and machine learning algorithms are also enabling better tools for detecting vulnerabilities in how cryptographic algorithms are physically implemented, the kind of weaknesses that side-channel attacks exploit.
Applying cryptography to machine learning can allow teams to securely train machine learning models. This could help drive the adoption of AI in privacy-sensitive industries, including medicine and finance.
New cryptographic techniques are also being developed to protect machine learning algorithms from poisoning, bias, or the extraction of training data. This promises to improve the security and safety of AI algorithms used for autonomous vehicles, industrial processes, and edge computing applications.
Improving cryptanalysis with machine learning
Cryptography pioneer Ron Rivest first remarked on the similarities between cryptography and machine learning in 1991. The two fields share similar workflows and, in some ways, represent different ends of similar processes. Machine learning helps find patterns hidden in seemingly random data, while cryptography makes ordered data seem random.
However, despite these similarities, the two fields evolved down different paths. Things began to change when the success of image recognition algorithms such as AlexNet catalysed an exploration of deep learning for other use cases. In 2019, researchers in Germany demonstrated the first deep learning-based cryptographic distinguishers, tools for telling whether seemingly random data are in fact the output of a cipher.
This inspired researchers at the CRC and the Politecnico di Torino in Italy to develop tools for analysing the performance improvements possible with various deep learning techniques. This research could pave the way for the broader adoption of deep learning in cryptography. The team is currently exploring how deep learning and conventional cryptographic distinguishers can be combined for optimal results.
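To make the idea concrete, the sketch below trains a small neural network to distinguish ciphertext pairs produced by a deliberately weak toy cipher from purely random data, loosely mirroring the related-plaintext setting used in this line of research. The toy cipher, the fixed plaintext difference, and all parameters are illustrative assumptions rather than the CRC's actual tooling; scikit-learn is used for brevity.

```python
# Minimal sketch of a machine learning distinguisher: train a classifier to tell
# ciphertext pairs from a deliberately weak toy cipher apart from random pairs.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
KEY = rng.integers(0, 256, size=4, dtype=np.uint8)

def weak_cipher(block):
    # Toy 4-byte "cipher" with almost no diffusion: XOR with the key, rotate bytes.
    return np.roll(block ^ KEY, 1)

n = 20000
delta = np.array([0x80, 0, 0, 0], dtype=np.uint8)    # fixed plaintext difference

# Positive samples: ciphertext pairs from plaintexts P and P ^ delta.
p = rng.integers(0, 256, (n, 4), dtype=np.uint8)
c1 = np.array([weak_cipher(x) for x in p])
c2 = np.array([weak_cipher(x ^ delta) for x in p])
pairs_real = np.concatenate([c1, c2], axis=1)

# Negative samples: pairs of unrelated random blocks.
pairs_random = rng.integers(0, 256, (n, 8), dtype=np.uint8)

X = np.unpackbits(np.vstack([pairs_real, pairs_random]), axis=1).astype(np.float32)
y = np.concatenate([np.ones(n), np.zeros(n)])        # 1 = real cipher, 0 = random

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=0)
clf.fit(X_tr, y_tr)

# Accuracy well above 0.5 means the network recognises the cipher's structure,
# i.e. it works as a distinguisher.
print("distinguishing accuracy:", round(clf.score(X_te, y_te), 3))
```

A secure cipher would leave such a classifier stuck at roughly 50 per cent accuracy; any consistent gap above that points to a structural weakness.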
Improving side-channel analysis with machine learning
Machine learning excels at finding relationships in data that no one had previously considered relevant. That makes it a natural fit for identifying side-channel vulnerabilities, which arise from how algorithms are implemented and run on physical systems.
Common side-channel attacks analyse how known calculations affect measurable phenomena such as timing variations, electromagnetic emissions, power consumption, or temperature in order to break a cipher and recover its keys. For example, in 2018, several side-channel vulnerabilities, including Spectre and Meltdown, were discovered in popular CPUs long after the chips had been released for general use.
Machine learning can help at various steps in launching a side-channel attack, including pre-processing, feature engineering, classification, and clustering. Side-channel attacks commonly consist of two phases: identifying a vulnerability and then exploiting it.
Machine learning techniques show the most promise in the identification phase and can sometimes outperform approaches that iteratively test for susceptibility to well-known side-channel weaknesses. Researchers are also exploring how reinforcement learning can automatically tune neural networks to further improve and automate side-channel analysis.
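The sketch below illustrates the classification step on simulated power traces: a classifier is first profiled on a device with a known key and then used to rank key guesses on traces from a target device. The leakage model, the stand-in S-box, and every parameter are simplifying assumptions for illustration; real attacks work on measured traces from actual hardware.

```python
# Minimal sketch of a profiled side-channel attack on simulated power traces.
# Leakage model: one trace sample correlates with the Hamming weight of an
# S-box output; a classifier learns to predict that intermediate value.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

SBOX = rng.permutation(256).astype(np.uint8)          # stand-in S-box, illustrative
HW = np.array([bin(v).count("1") for v in range(256)])

def simulate_trace(plaintext_byte, key_byte, n_samples=50, noise=1.0):
    """One noisy 'power trace'; sample 10 leaks the S-box output's Hamming weight."""
    trace = rng.normal(0, noise, n_samples)
    trace[10] += HW[SBOX[plaintext_byte ^ key_byte]]
    return trace

# Profiling phase: traces from a device whose key we control and know.
profiling_key = 0x3C
pts = rng.integers(0, 256, 5000)
X = np.array([simulate_trace(p, profiling_key) for p in pts])
y = HW[SBOX[pts ^ profiling_key]]                     # label = leaked Hamming weight

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Attack phase: rank key guesses using traces from the target device (unknown key).
target_key = 0xA7
atk_pts = rng.integers(0, 256, 200)
atk_traces = np.array([simulate_trace(p, target_key) for p in atk_pts])
probs = clf.predict_proba(atk_traces)                 # per-trace class probabilities

scores = np.zeros(256)
for guess in range(256):
    predicted_hw = HW[SBOX[atk_pts ^ guess]]
    # Map each predicted class to its column in predict_proba's output.
    cols = np.searchsorted(clf.classes_, predicted_hw)
    scores[guess] = np.sum(np.log(probs[np.arange(len(atk_pts)), cols] + 1e-12))

print("best key guess:", hex(int(np.argmax(scores))), "true key:", hex(target_key))
```

With enough traces, the log-likelihood score for the correct key guess pulls ahead of the 255 wrong guesses, which is the essence of a profiled (template-style) attack.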
Improving security on limited hardware
There is extensive interest in running machine learning and AI algorithms on medium-sized and smaller embedded systems and IoT devices. However, traditional AI models built for big data typically have to be shrunk to fit energy-constrained devices, which tends to reduce classification accuracy.
Researchers are exploring various approaches for generating more energy-efficient neural network architectures. For example, a team of Princeton researchers has developed a novel neural network synthesis methodology called SCANN that generates very compact neural networks for medium and small data sets. Early tests significantly shrank the size and number of connections in models with little drop in accuracy. This could make it possible to implement cryptographic algorithms on smaller devices with limited connectivity and power supplies.
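The sketch below illustrates the underlying compression idea with simple magnitude pruning: train a small network, zero out the weakest connections, and compare accuracy before and after. It is not the SCANN methodology itself, which grows and prunes architectures during training; the dataset, network size, and pruning ratio are arbitrary choices for illustration.

```python
# Minimal sketch of network compression by magnitude pruning: remove the
# smallest-magnitude connections and observe how accuracy changes.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X / 16.0, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_tr, y_tr)
print("dense accuracy:", round(clf.score(X_te, y_te), 3))

# Zero out the 80% of weights with the smallest magnitude in every layer.
for i, W in enumerate(clf.coefs_):
    threshold = np.quantile(np.abs(W), 0.8)
    clf.coefs_[i] = np.where(np.abs(W) < threshold, 0.0, W)

kept = sum(int(np.count_nonzero(W)) for W in clf.coefs_)
total = sum(W.size for W in clf.coefs_)
print(f"pruned accuracy: {clf.score(X_te, y_te):.3f} ({kept}/{total} weights kept)")
```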
Enterprises have been using various machine learning algorithms to improve anomaly detection and identify attacks as part of the Intrusion Detection Systems that form a key part of modern security infrastructure. Researchers at Princeton and the CRC recently developed an approach called SHARKS to extend threat detection to physical infrastructure. This work proposed a novel method for building a graph model of the behaviour of cyber-physical attack patterns and then applying machine learning techniques to identify new vulnerabilities. The team also developed techniques for creating and analysing attack graphs to identify novel vulnerabilities in 5G networks.
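The general flavour of this approach can be sketched as follows: observed attack steps are encoded as a directed graph, and simple structural features are used to score unobserved steps as candidate vulnerabilities, in the style of link prediction. The attack steps, features, and model below are hypothetical illustrations, not the SHARKS dataset or algorithm.

```python
# Minimal sketch of graph-based threat modelling: encode observed attack steps
# as a directed graph, then score unobserved steps with a simple classifier.
import itertools
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

# Hypothetical cyber-physical attack steps: "attacker capability" -> "effect".
observed_steps = [
    ("phishing_email", "workstation_access"),
    ("workstation_access", "plc_credentials"),
    ("plc_credentials", "modify_setpoint"),
    ("exposed_debug_port", "firmware_dump"),
    ("firmware_dump", "plc_credentials"),
    ("modify_setpoint", "pump_overpressure"),
]
G = nx.DiGraph(observed_steps)

def edge_features(g, u, v):
    # Cheap structural features for a candidate step u -> v.
    common = len(set(g.successors(u)) & set(g.predecessors(v)))
    return [g.out_degree(u), g.in_degree(v), common]

# Training data: observed steps are positives, absent edges are negatives.
pairs = [(u, v) for u, v in itertools.permutations(G.nodes, 2)]
X = [edge_features(G, u, v) for u, v in pairs]
y = [1 if G.has_edge(u, v) else 0 for u, v in pairs]

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Score every unobserved step; high scores are candidates for closer analysis.
candidates = [(u, v) for u, v in pairs if not G.has_edge(u, v)]
scores = clf.predict_proba([edge_features(G, u, v) for u, v in candidates])[:, 1]
for (u, v), s in sorted(zip(candidates, scores), key=lambda t: -t[1])[:3]:
    print(f"{u} -> {v}: {s:.2f}")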
Down the road, these kinds of techniques could achieve similar goals across much larger hardware, software, and network stacks.
Securely training AI algorithms
AI and machine learning algorithms show tremendous promise in areas like medicine for diagnosing disease and recommending the best course of treatment. At the same time, most countries have strict privacy regulations that restrict the flow of medical data, even among trusted partners. Similar constraints limit the use of other types of personally identifiable information that might improve AI models for financial management, customer service, or enterprise operations.
The CRC recently unveiled the UAE’s first secure cloud technologies programme. The group is developing tools and best practices for applying privacy-enhancing technologies to improve collaboration and analysis around sensitive data. For example, techniques such as federated learning, multi-party computation, and homomorphic encryption can train AI algorithms while keeping confidential data secure.
Federated learning provides a way to distribute data processing across multiple computers. For example, a voice recognition model on a smartphone could make local model updates in response to user feedback. All the smartphones then send their model updates to a centralized coordinator without including any personal data. The various updates are integrated into one global model update that is sent back down to improve voice recognition on everyone’s phone. Similar techniques could run on computers managed by different hospitals or research organizations. In this scenario, secure Multi-Party Computation (MPC) can also provide privacy for the model owner: all computations are carried out interactively between the nodes and the centralized coordinator without revealing either the nodes’ or the coordinator’s inputs.
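The sketch below shows the core of this idea in a few lines: clients average their model updates through a simplified secure-aggregation step in which pairwise random masks cancel out, so the coordinator only ever sees masked updates. The client count, update vectors, and masking scheme are toy assumptions; production systems use full secure aggregation or MPC protocols with finite-field arithmetic, dropout handling, and authentication.

```python
# Minimal sketch of federated averaging with a simplified secure-aggregation
# step: clients add pairwise random masks that cancel when the coordinator sums
# the updates, so no individual update is ever seen in the clear.
import numpy as np

rng = np.random.default_rng(42)
n_clients, dim = 3, 4
global_model = np.zeros(dim)

# Each client computes a local model update on its private data (simulated here).
local_updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Pairwise masks: client i adds mask_ij, client j subtracts it, so masks cancel.
masks = {}
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        masks[(i, j)] = rng.normal(size=dim)

masked_updates = []
for i in range(n_clients):
    masked = local_updates[i].copy()
    for j in range(n_clients):
        if i < j:
            masked += masks[(i, j)]
        elif j < i:
            masked -= masks[(j, i)]
    masked_updates.append(masked)   # this is all the coordinator ever receives

# Coordinator: the masks cancel in the sum, yielding the exact average update.
aggregate = sum(masked_updates) / n_clients
global_model += aggregate

true_average = sum(local_updates) / n_clients
print("aggregation error:", np.max(np.abs(aggregate - true_average)))
```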
Fully homomorphic encryption (FHE) allows computations to be carried out directly on encrypted data. In this case, data are encrypted by each node, which could be a bank or a hospital, and sent to a central server. Specially crafted algorithms then train the machine learning model without ever decrypting the data. This approach is the most convenient for constrained devices since it is non-interactive: unlike federated learning or MPC, the nodes are not involved in the data processing task. However, much work is still required to improve the performance of these systems. It is also essential to analyse the different ways that bias can creep in, since no one person ever sees all the data together.
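As a small illustration of computing on ciphertexts, the sketch below uses a toy Paillier scheme, which is additively homomorphic rather than fully homomorphic: a server adds encrypted values submitted by a data owner without ever being able to read them. The primes are deliberately tiny and completely insecure; real systems would use a production FHE library and realistic key sizes.

```python
# Toy additively homomorphic encryption (Paillier) with tiny, insecure
# parameters: the server sums encrypted values without ever decrypting them.
import math
import random

p, q = 1789, 1931                      # toy primes; never use sizes like this
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # inverse of L(g^lam mod n^2) mod n

def encrypt(m):
    r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# A hospital encrypts patient counts; the server sums them under encryption.
counts = [120, 75, 310]
ciphertexts = [encrypt(m) for m in counts]

encrypted_sum = 1
for c in ciphertexts:
    encrypted_sum = (encrypted_sum * c) % n2   # ciphertext product = plaintext sum

print("decrypted total:", decrypt(encrypted_sum), "expected:", sum(counts))
```

Training a model under FHE works on the same principle, but with much richer (and much slower) homomorphic operations, which is why performance remains the main research challenge.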
Protecting machine learning models with obfuscation and trusted computing
Researchers are in the early days of discovering new categories of security vulnerabilities unique to AI and machine learning algorithms. Early examples include techniques to steal confidential training data, poison AI models, and extract machine learning models without permission. This creates the need for data confidentiality techniques, new algorithmic defences, and algorithmic integrity measures.
New architectures such as multi-party computation and FHE can protect the confidentiality and privacy of training data. But new attack techniques such as model inversion can recover some or all of the dataset used to train a model. For example, attackers may reconstruct faces used to train a face recognition model.
New cryptographic primitives could help ensure the confidentiality of both the data and the model during training and classification. Developments in trusted computing hardware could strengthen these implementations. Other research is exploring new cryptographic techniques to prevent adversarial machine learning attacks, which manipulate data to deliberately cause misclassification.
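The sketch below shows what such manipulation looks like in its simplest form: a fast-gradient-sign-style perturbation against a toy logistic regression model trained on synthetic data. The data, model, and perturbation size are illustrative assumptions; real attacks target deep networks, but the mechanism of nudging every input feature along the loss gradient is the same.

```python
# Minimal sketch of an adversarial evasion attack (fast gradient sign method)
# against a toy logistic regression model trained on synthetic data.
import numpy as np

rng = np.random.default_rng(7)
d, n = 50, 200                      # many weakly informative features per class

# Synthetic two-class data: class 0 centred at -0.2, class 1 at +0.2.
X = np.vstack([rng.normal(-0.2, 1.0, (n, d)), rng.normal(0.2, 1.0, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Train logistic regression with plain full-batch gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

def score(v):
    # Model's probability that input v belongs to class 1.
    return 1 / (1 + np.exp(-(v @ w + b)))

# A typical class-1 input: the class mean.
x = np.full(d, 0.2)

# FGSM: for true label 1 the loss gradient w.r.t. the input is (p - 1) * w,
# so step each feature by eps in the direction that increases the loss.
eps = 0.25
grad_x = (score(x) - 1.0) * w
x_adv = x + eps * np.sign(grad_x)

print(f"clean score {score(x):.2f} -> adversarial score {score(x_adv):.2f}")
```

Because the perturbation is spread across many features, each individual change is small, yet the model's confidence in the true class collapses, which is exactly the behaviour that defences for safety-critical AI systems must prevent.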
New combinations of machine learning and cryptography will be vital for protecting AI systems as the field evolves. This kind of research requires collaboration across traditionally separate domains, including security, cryptography, hardware, machine learning, and autonomous systems, to keep pace with the rapid discovery of new AI security risks.