One of the emerging challenges in machine learning concerns the problem of transparency; put simply, not all ML models are easily explainable. Historically this does not seem to have been a major problem, and the term 'black box' is often associated with such models, which may even have added to the mystique surrounding AI.
The issue with black boxes is that you can't easily tell what's happening inside, which can be compounded when an ML model is also non-deterministic (in fact, or for all practical purposes). So how does this problem manifest itself?
A question of trust
We can imagine that an ML model with these characteristics might be deployed within an investment bank, for example. Such a system would have the potential to block a legitimate transaction - due to a false positive, or Type I error. This blocked transaction might represent a significant financial event, such as a stop loss request to sell. Worse still, you won't necessarily be able to diagnose why the transaction was blocked.
For many environments, particularly safety or business critical systems, or highly regulated contexts, this level of opaqueness is unacceptable. Imagine, for example, a lethal autonomous weapon system (LAWS) autonomously controlled by an unexplainable AI. A digital foreign exchange network. An ambulance dispatch system. All of these systems need a level of trust and certainty.
In an increasingly regulated landscape, the concerns around transparency are becoming even more complex and pressing - especially for enterprises handling financial and personal data, or safety-critical systems. This can be a particular challenge in complying with regulations such as GDPR, where users may legally request further information on the specific nature of data processing.
As we move to increasing levels of autonomy and automation in AI decision making, especially in environments handling high value, sensitive or mission-critical data, the issue of trust and accountability in AI becomes a major concern.
Why are some AI models intrinsically opaque?
ML models such as neural networks are effectively little more than graphs of weights and vectors, trained on large quantities of data. These models can be accurate whilst at the same time maddeningly opaque. Model behaviour can also deviate over time and start to exhibit 'concept drift', and it may not be obvious why.
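As a concrete illustration, the sketch below trains a small neural net with scikit-learn and then inspects its parameters: the 'model' is nothing more than a few matrices of floating point weights. The synthetic dataset and hyperparameters are illustrative assumptions, not recommendations.

```python
# A minimal sketch using scikit-learn; the synthetic dataset and
# hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=500, random_state=0)
model.fit(X, y)

print("training accuracy:", model.score(X, y))

# The entire 'explanation' of the model's behaviour is these weight matrices:
for i, weights in enumerate(model.coefs_):
    print(f"layer {i} weight matrix shape: {weights.shape}")
# Thousands of raw floating point values, none of which maps intuitively
# onto a human-readable decision rule.
```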
In many cases there may be no accepted techniques to properly understand model behaviour, other than by observing it. Where techniques do exist (for example feature sensitivity analysis) they can be slow, and too low level to explain overall system function.
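One common form of feature sensitivity analysis is permutation importance, sketched below with scikit-learn against the model and data from the previous snippet (an assumption carried over for illustration). Each feature is shuffled in turn and the drop in score recorded, which tells us which inputs matter, but not why.

```python
# A sketch of one such technique: permutation feature importance. Reuses the
# X, y and model objects from the earlier snippet (an illustrative
# assumption). Slow, and it only indicates which inputs matter, not why.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```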
If the original training data is later decoupled or lost, then we effectively have a working ML system whose exact functioning we may not be able to explain, or even account for how it got there; it just works well. Going forward this is likely to be very unsatisfactory when mapped against compliance needs. This remains an active area of research.
Explainable AI
An emerging area of interest on this problem is the field of 'explainable AI' (XAI). Explainable AI offers reasoning as to why an ML system arrived at particular outputs and predictions, and attempts to provide explanations of how it got there.
Given that we are talking about potentially highly complex systems, explainability refers not only to whether outputs and decisions of the system are interpretable, but also how the entire process and intention surrounding the model can be adequately explained. In many cases this is likely to be a compromise.
These explanations should be understandable, at the very least by a human domain expert.
With advances in explainable AI we may be able to mitigate some of the threats around concept drift, adversarial attacks, and intentionally planted bias, however this is still a relatively new field.
Unsurprisingly, XAI works best with AI models that are inherently explainable; however, several AI techniques remain largely inscrutable, and for the foreseeable future are likely to remain so. Neural nets, for example, work so well partly because they are so beautifully simple, yet this simplicity produces models capable of very sophisticated behaviour, and when you examine the underlying graph the cause of that behaviour may be far from intuitive.
Work in progress
There are several major initiatives in progressing explainable technology, and this work goes back several decades. There are two main sets of techniques currently used to develop explainable systems in the research community: post-hoc and ante-hoc.
Post-hoc techniques allow models to be trained as normal, with explainability incorporated at test time. Examples include: Local Interpretable Model-Agnostic Explanations (LIME), Layer-wise Relevance Propagation (LRP), and BETA.
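To give a flavour of the post-hoc approach, here is a minimal sketch using the open-source lime package to explain a single prediction of the classifier from the earlier snippets (the model, data and feature names are illustrative assumptions).

```python
# A minimal post-hoc sketch using LIME to explain one prediction of the
# classifier from the earlier snippets (model, X and the feature names are
# illustrative assumptions). Requires `pip install lime`.
from lime.lime_tabular import LimeTabularExplainer

feature_names = [f"feature_{i}" for i in range(X.shape[1])]
explainer = LimeTabularExplainer(
    X,
    feature_names=feature_names,
    class_names=["negative", "positive"],
    mode="classification",
)

# LIME fits a simple, interpretable surrogate model in the local neighbourhood
# of a single instance, and reports which features drove that one prediction.
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```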
Ante-hoc techniques rely on building explainability into a model right from the outset. Examples include: Reversed Time Attention Model (RETAIN), and Bayesian deep learning (BDL).
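As a flavour of the ante-hoc style, the sketch below uses Monte Carlo dropout, a common approximation to Bayesian deep learning, so that the model reports an uncertainty estimate alongside each prediction. The architecture, shapes and reuse of the earlier training data are illustrative assumptions, and this is only one of many possible formulations.

```python
# A sketch of approximate Bayesian deep learning via Monte Carlo dropout;
# uncertainty estimation is designed into the model from the outset.
# Architecture and data (X, y from the earlier snippets) are assumptions.
import numpy as np
import tensorflow as tf

mc_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
mc_model.compile(optimizer="adam", loss="binary_crossentropy")
mc_model.fit(X, y, epochs=10, verbose=0)

def predict_with_uncertainty(x, n_samples=50):
    # Keep dropout active at inference time (training=True) and sample
    # repeatedly; the spread of the samples gives a rough per-prediction
    # uncertainty estimate.
    samples = np.stack([mc_model(x, training=True).numpy() for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

mean, std = predict_with_uncertainty(X[:5])
print("predictions:", mean.ravel(), "uncertainty:", std.ravel())
```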
Notable efforts include work by Google, and projects such as the DARPA XAI program, which aims to produce 'glass box' models that are explainable to a 'human-in-the-loop' without greatly sacrificing AI performance. This is an ambitious objective, particularly in a field that is advancing so rapidly. In 2018 an interdisciplinary conference called Fairness, Accountability, and Transparency (FAT) was established to research transparency and explainability in the context of socio-technical systems, many of which include AI. The first global workshop exclusively dedicated to explainable AI, the International Joint Conference on Artificial Intelligence: Workshop on Explainable Artificial Intelligence (XAI), was launched in 2017.
Accuracy versus explainability
It is fair to say that full transparency may not always be possible in AI (and it may not always be necessary). Even if a model is explainable, it could be that the explanation is only understandable by a human with a certain level of expertise. For example, the processing function of a highly customised deep learning network may be highly accurate, but challenging to describe succinctly to a lay person.
One way we can think of AI techniques is as a spectrum, with accuracy at one end and explainability at the other. Starting from the accuracy end, we might expect the spectrum to approximate this order:
- Neural nets
- Ensemble methods
- Support vector machines
- Graphical models
- Decision trees
- Regression algorithms
- Classifiers
Part of the dialogue here will be around the question of what constitutes an acceptable level of explanation.
Going forward
Whilst ML systems are becoming increasingly beneficial and sophisticated, there are all sorts of indirect but closely related impacts to consider, such as ensuring there is no racial bias in a system, understanding outcomes in safety critical systems, avoiding threat to life, the potential for insurance claims and litigation, and ethical considerations. Organisations need to feel that they retain control of these systems.
On a more practical level, one possibility would be to replace a 'black box' system in production environments with a more deterministic, explainable model (for example, replace a neural net with a decision tree). This may not be practical for certain classes of problem, but there could be an interesting line of research in translating opaque models into more understandable (i.e. comprehensible) forms, perhaps with some compromises in performance.
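A rough sketch of that idea is a global surrogate: fit an interpretable model (here a shallow decision tree) to the predictions of the opaque one, and measure how faithfully it imitates them. The names and data are carried over from the earlier snippets as an illustrative assumption.

```python
# A rough sketch of a global surrogate: fit a shallow decision tree to the
# predictions of the opaque model (the MLP from the earlier snippets, an
# illustrative assumption), trading some fidelity for readability.
from sklearn.tree import DecisionTreeClassifier, export_text

black_box_predictions = model.predict(X)

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box_predictions)

# Fidelity: how often the surrogate agrees with the black box it imitates.
print("fidelity:", surrogate.score(X, black_box_predictions))
print(export_text(surrogate,
                  feature_names=[f"feature_{i}" for i in range(X.shape[1])]))
```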
Complementary tools such as blockchain could also help make AI more coherent and understandable, by immutably registering all data, variables and processes involved in a decision process, anchored in time. We can then trace back and determine why particular decisions were made by the model, with the assurance that those states have not been tampered with.
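The sketch below illustrates the underlying idea in miniature: a hash-chained audit log of model decisions, where each entry commits to the previous one so that later tampering is detectable. A real deployment would anchor these hashes in a shared ledger; the record fields are illustrative assumptions.

```python
# A miniature sketch of a tamper-evident audit trail for model decisions.
# Each entry includes the hash of the previous entry, so altering any past
# record breaks the chain. Record fields are illustrative assumptions; a real
# deployment would anchor the hashes in a shared, immutable ledger.
import hashlib
import json
import time

audit_log = []

def record_decision(model_version, inputs, output):
    previous_hash = audit_log[-1]["hash"] if audit_log else "0" * 64
    entry = {
        "timestamp": time.time(),
        "model_version": model_version,
        "inputs": inputs,
        "output": output,
        "previous_hash": previous_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    audit_log.append(entry)
    return entry

record_decision("fraud-model-v1", {"amount": 25000, "currency": "GBP"}, "blocked")
```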