For decades, Artificial Intelligence (“AI”) has served as an inspiration for Hollywood’s latest heroes and bleakest dystopian futures. More recently, AI systems have taken on the role of personal assistants: our phones and computers, cars and planes, homes and cities are all powered, to an ever-increasing extent, by AI systems. Even your TV remote will soon understand when you speak to it.
In most of these cases, the instructions given to the AI system stay within the confines of human predictability, and the underlying processing rests on an individual’s consent or on what can be deemed the data controller’s legitimate interest. The assumption is that AI systems use personal data to make things easier, and that individuals should be able to choose how their data is used.
However, AI systems have now become so capable that some worry they are starting to replace us: whether it is recognising faces, diagnosing medical conditions or even reading emotions, AI systems sometimes appear to be a better, or at least a viable, option. They have even taken on art – AI compositions, poetry and paintings are perhaps not quite there yet, but they are not far off either. AI systems that decide whether bail should be granted are, perhaps, a natural extension of this trend. However, when they absorb our flaws and biases and apply them unjustly, at a broader scale and more efficiently than we ever could, there are substantial ethical, cultural and legal questions to be answered.
Comparatively little has been written about AI’s interplay with privacy laws, despite the central role AI systems have played in our lives over the last few years. This is surprising, considering the types of information AI systems learn about us, our preferences and our propensities. For example, when and to what extent AI systems are regulated by the General Data Protection Regulation (“GDPR”) deserves closer attention.
To see what privacy, as a discipline drawing on law and governance, can offer in the context of AI, it is worth trying to understand what the term “Artificial Intelligence” means. There are currently no universal AIs that can deal with all kinds of tasks equally well. Rather, various approaches to machine learning (“ML”) are used for highly specific purposes. Significant time is invested in tweaking and training these ML models before they are deployed in a production environment. To increase their effectiveness, different ML approaches are sometimes layered on top of one another: to recognise a face, for example, an ML model that specialises in recognising eyes is deployed first, followed by another for the nose and mouth, and then another to analyse the face as a whole.
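To make that layering concrete, here is a purely illustrative Python sketch. Every function in it is a hypothetical stand-in for a specialised model, not part of any real face-recognition library; the point is simply that the stages chain together into a single pipeline.

```python
# Purely illustrative sketch of "layered" ML: specialised stages chained together.
# The component functions are hypothetical stubs, not a real face-recognition library.
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height of a detected feature

def detect_eyes(image) -> List[Region]:
    """Stage 1: a model specialised in finding eyes (stubbed for illustration)."""
    return [(40, 30, 10, 5), (70, 30, 10, 5)]

def detect_nose_and_mouth(image) -> List[Region]:
    """Stage 2: a model specialised in noses and mouths (stubbed for illustration)."""
    return [(55, 45, 12, 8), (50, 65, 22, 6)]

def score_whole_face(image, features: List[Region]) -> float:
    """Stage 3: a model that assesses the face as a whole (stubbed for illustration)."""
    return 0.97 if len(features) >= 4 else 0.1

def recognise_face(image) -> float:
    """Chain the specialised models into a single face-recognition pipeline."""
    features = detect_eyes(image) + detect_nose_and_mouth(image)
    return score_whole_face(image, features)

print(recognise_face(image=None))  # placeholder input; a real pipeline would take pixel data
```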
With this (extremely basic) outline of what the term AI means in place (this post uses ML and AI interchangeably), the next aspect that matters from a privacy perspective is how information is processed and retained within ML models.
While approaches differ depending on the type of ML used (e.g., see here for how this works in neural networks), for the purposes of this blog post the key points are that:
- information is retained in the (trained) ML model; and
- information and correlations are represented as vectors which are not easily translatable into a form that humans can understand (both points are illustrated in the short sketch below).
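To make these two points concrete, here is a minimal sketch (using scikit-learn, an assumed choice of library): a classifier is trained on toy “personal data”, and everything it retains of that data sits in numeric weight vectors that are not self-explanatory to a human reader.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy "personal data": each row stands in for one individual's features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# What the trained model retains of its training data lives in these numeric
# vectors: informative to the model, yet not self-explanatory to a person.
print(model.coef_)
print(model.intercept_)
```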
These two points raise interesting questions: If information is retained in the internal structures of a trained ML model, can it be read out? The answer is yes, in certain circumstances. By way of example, researchers were able to attack a facial recognition ML model in a way that allowed them to reconstruct pictures of faces using only the probability scores that the ML model gave as output. A similar attack was demonstrated on an ML model that was trained to suggest appropriate medical treatments for patients. Here, researchers were able to infer information about the genetic markers of the individuals whose data was used to train the model. There is thus a risk that (some) trained ML models contain personal data.
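The sketch below is a deliberately simplified illustration of the principle behind such model-inversion attacks: armed with nothing more than the probability scores the model outputs, an attacker searches for an input the model considers highly typical of a target class. The “secret” prototype, the toy data and the classifier are all invented for the example; real attacks on facial recognition models are considerably more sophisticated.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
secret_prototype = np.array([2.0, -1.0, 0.5, 1.5, -0.5])  # stands in for a training-set face

# Class 1 samples cluster around the secret prototype; class 0 is background noise.
X = np.vstack([rng.normal(size=(200, 5)),
               secret_prototype + 0.3 * rng.normal(size=(200, 5))])
y = np.array([0] * 200 + [1] * 200)
model = LogisticRegression().fit(X, y)

def attacker_loss(candidate):
    # The attacker sees only the model's output probability, plus a simple
    # prior keeping the guess in a plausible range (real attacks use richer
    # priors, e.g. over natural images).
    p = model.predict_proba(candidate.reshape(1, -1))[0, 1]
    return -np.log(p + 1e-9) + 0.05 * np.sum(candidate ** 2)

# Starting from nothing, search for an input the model scores as highly
# typical of the target class.
reconstruction = minimize(attacker_loss, x0=np.zeros(5)).x
print(secret_prototype)
print(reconstruction)
```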
Selling ML models that contain personal data would require the data controller to comply with its privacy obligations as to notice and an appropriate legal basis, which may be difficult, particularly in the context of sensitive personal data and biometrics. Further, where a service provider trains an ML model on behalf of another company, that service provider may be a data processor and the (trained) ML model may have to remain under the data controller’s control. In other words, the service provider would not be allowed to sell the trained ML model, even where being able to do so is a crucial part of its business plan. Other disciplines can provide valuable tools and insights here (e.g., the concept of differential privacy).
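As a flavour of what differential privacy involves, the sketch below shows the classic Laplace mechanism: noise calibrated to a query’s sensitivity is added before a result is released, limiting what can be inferred about any one individual. (Training ML models with differential privacy typically relies on more elaborate techniques, such as noisy gradient updates; the values chosen here are purely illustrative.)

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise scaled to sensitivity / epsilon."""
    noise = np.random.default_rng().laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

ages = [34, 29, 51, 47, 38]                    # toy personal data
count_over_40 = sum(age > 40 for age in ages)  # the true answer is 2

# A counting query changes by at most 1 when any one person is added or
# removed, so its sensitivity is 1; epsilon here is an illustrative choice.
print(laplace_mechanism(count_over_40, sensitivity=1.0, epsilon=0.5))
```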
Another question comes up in the context of profiling: If the authors of ML models have trouble understanding how their ML models arrived at particular outputs, how can a data controller using off-the-shelf ML models be expected to meet its obligations under Articles 13(2)(f), 14(2)(g) and 15(1)(h) GDPR and provide “meaningful information about the logic involved”? Encouraging generic descriptions does not appear to have been the intention behind the GDPR, and it will be very interesting to see how the regulators respond to this issue.
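By way of illustration, one technique sometimes offered as a step towards a human-readable account of a model’s logic is permutation importance, sketched below with scikit-learn on toy data. Whether output of this kind amounts to “meaningful information about the logic involved” for GDPR purposes is precisely the open question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# An "off-the-shelf" model whose internal logic is not easy to narrate.
model = RandomForestClassifier(random_state=0).fit(X, y)

# Permutation importance: how much does performance drop when each feature is shuffled?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for feature, score in enumerate(result.importances_mean):
    print(f"feature {feature}: importance {score:.3f}")
```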
AI can cause challenges under the GDPR’s security provisions, too. Since ML models can be gatekeepers as well as attack vectors, one ML model can be used to defeat another. Consider a bank that uses voice recognition ML models to secure its clients’ bank accounts. Access is granted only if the ML model, trained on the client’s voice during account setup, positively identifies the client’s unique voice fingerprint. Can such a bank demonstrate appropriate technical and organisational security measures if ML models that are engineered to imitate human voices are available and highly effective? Perhaps not, but the regulator will need to determine what is appropriate in this “damned if you do, damned if you don’t” scenario.
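The sketch below is a schematic (and entirely hypothetical) rendering of that voice-verification scenario: enrolment produces a voiceprint embedding, and access turns on whether a later attempt is similar enough to it. A synthetic voice that imitates the client closely enough clears the same threshold, which is the crux of the security question.

```python
import numpy as np

def voice_embedding(audio: np.ndarray) -> np.ndarray:
    """Placeholder for the bank's voiceprint model (not a real implementation)."""
    return audio / np.linalg.norm(audio)

def grant_access(enrolled: np.ndarray, attempt: np.ndarray, threshold: float = 0.9) -> bool:
    """Grant access if the attempt's embedding is close enough to the enrolled voiceprint."""
    similarity = float(np.dot(enrolled, attempt))
    return similarity >= threshold

rng = np.random.default_rng(1)
clients_voice = rng.normal(size=128)             # stands in for the client's enrolment audio
enrolled_print = voice_embedding(clients_voice)

# A later, genuine attempt: the same voice with minor variation passes.
genuine_audio = clients_voice + 0.01 * rng.normal(size=128)
print(grant_access(enrolled_print, voice_embedding(genuine_audio)))   # True

# A synthetic voice engineered to imitate the client closely is, to this
# similarity check, indistinguishable from the real thing and passes too.
spoofed_audio = clients_voice + 0.01 * rng.normal(size=128)
print(grant_access(enrolled_print, voice_embedding(spoofed_audio)))   # True
```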
In conclusion, it is clear that AI poses many unique privacy questions, which will only increase in number and prominence as the use of AI becomes more widespread. Whether our current privacy frameworks have all the answers remains to be seen, and it will fall to the regulators to fill any gaps. In the meantime, we should improve our technical knowledge of AI. By doing so, we will be able to provide practical and sensible privacy guidance to our internal and external stakeholders and, where possible, help them make better and more informed choices.