As part of our podcast series, we spoke to Dr Allison Gardner, who is talking in the Health Tech stream at the 2020 conference. Allison explains why investors in health tech need to be asking questions about teams as well as tech, why we must hold onto our senior clinicians and prevent deskilling, and why it’s absolutely critical that AI systems are designed, developed, and deployed correctly.
Allison’s main focus is AI ethics as well as gender and diversity within the tech field. She feels that a lack of diversity is “one of the fundamental reasons why we have had the errors that we’ve had within machine learning with regards to algorithmic bias, and the problematic outcomes when you have badly developed and utilised algorithms. Particularly within the health industry, because we know even within medicine that you have problems with gender differences in diagnosis. When you embed that within technology and diagnostic technology, it can actually be more insidious”.
USING MACHINE LEARNING TO PREDICT COVID-19 PEAKS
Our interview took place during lockdown so naturally we were curious whether machine learning could be used to predict COVID-19 peaks.
While there’s been a lot of interest in the tracer apps, there are other aspects which don’t get much coverage, such as modelling of the disease. There’s some interesting work on testing data from Suez on whether that data can be used to predict potential peaks as much as a week in advance.
Machine learning that can utilise that sort of information can be very useful in terms of managing a crisis. For example, if you know that you’re going to get a localised peak one week in advance, a local lockdown can be implemented early to prevent it from spreading outwards.
The symptom-tracker app that has been developed, and the information it is collecting, has also been able to uncover unusual symptomatology and expand our knowledge of the disease. Tracer apps can also be useful if implemented in the right way, as an addition and adjunct to real-world tracking and testing, but there are some governance, privacy, and ethical issues with those.
PRIVACY IS A KEY CHALLENGE IN HEALTH
The key challenge is privacy. It depends on what type of prediction modelling you’re doing. If you’re looking for people’s private personal data, how can you be sure that this is done ethically, and that people have control of their data and can exercise their rights under GDPR?
My major concern is about any sharing of personal health data where you might be given assurances that the data is going to be held safely within a centralised database, but then later you discover it’s been sold off. And that has happened even with NHS data and it’s a huge concern.
The big debate at the moment, for example, around the tracer apps is whether we would have centralised or decentralised apps. With a decentralised app, your data is kept on your phone and it’s not shared in a central database, but there is a point where it needs to go into a central database if you’re going to do sensible tracking. So data privacy is a huge thing.
HOW THESE SYSTEMS ARE DEPLOYED IS CRUCIAL
How these systems are actually deployed is crucial. You can develop an ethically-developed algorithm as best you can, with de-biased data and appropriate mitigations, but when you actually sell that onto a customer and it gets deployed, you need to be sure that it is deployed in the way that it was developed for, with the appropriate mitigations.
Different cultural settings will have different effects on the same algorithm, and the outcomes can be very different. It takes dynamic, constant evaluation of the algorithm to ensure that the deployment processes you put in place are carried out correctly. And that goes right down to the design of the interfaces and the operational procedures, so that humans don’t just default to the machine decision; rather, it augments their decisions.
So an override must be possible, re-evaluation must be possible, and the people should understand the metrics. I do think people need to look at that aspect of machine learning and not just the development of the algorithm itself.
The case has been well made that historic datasets contain structural bias. I still see people saying, I’m selling a fully de-biased dataset. Which shows a lack of understanding. A few years ago there was an argument about the ‘black box’ and ‘we can’t explain AI’. That’s been pretty much debunked. You can explain AI. People can’t hide behind that excuse any longer.
Resistance to regulation was based on an argument that regulation stifles innovation. But regulation can actually inspire innovation because you’re going to have to be more innovative to develop good quality, ethical AI. Meaningful regulation still has not happened. The general public’s understanding of what AI is and the pervasiveness of this technology within their lives already is still not there.
My biggest concern from a health point of view is a rush to deploy AI solutions without fully understanding issues such as gender and ethnic differences within the medical field, especially if they’re built on historical data. That still surprises and worries me when grants are being awarded.
There doesn’t seem to be any requirement in grant applications to (1) address lack of diversity in the team; (2) disaggregate the dataset so you can determine and evaluate the algorithm based on gender differences and intersectionality; (3) make the machine-learning tool explainable, particularly if it’s a diagnostic or a decision-making algorithm, including operational ones such as bed management.
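As an editorial illustration of point (2), and not something described in the interview, the sketch below shows one way a disaggregated evaluation might look: the same metric (here recall, the share of true cases the tool catches) is reported per gender and per gender-by-ethnicity intersection rather than as a single overall figure. The toy records, the field names, and the `disaggregated_recall` helper are all hypothetical.

```python
from collections import defaultdict


def disaggregated_recall(records, group_keys):
    """Recall (sensitivity) per subgroup; group_keys=[] gives the overall figure."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for r in records:
        if r["actual"] == 1:  # only real cases count towards recall
            group = tuple(r[k] for k in group_keys)
            actual_pos[group] += 1
            if r["predicted"] == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / n for g, n in actual_pos.items()}


# Hypothetical toy data: 'actual' is the true diagnosis, 'predicted' the tool's output.
records = [
    {"gender": "F", "ethnicity": "A", "actual": 1, "predicted": 1},
    {"gender": "F", "ethnicity": "A", "actual": 1, "predicted": 0},
    {"gender": "F", "ethnicity": "B", "actual": 1, "predicted": 0},
    {"gender": "F", "ethnicity": "B", "actual": 1, "predicted": 0},
    {"gender": "M", "ethnicity": "A", "actual": 1, "predicted": 1},
    {"gender": "M", "ethnicity": "A", "actual": 1, "predicted": 1},
    {"gender": "M", "ethnicity": "B", "actual": 1, "predicted": 1},
    {"gender": "M", "ethnicity": "B", "actual": 1, "predicted": 0},
    {"gender": "M", "ethnicity": "B", "actual": 0, "predicted": 0},
]

overall = disaggregated_recall(records, [])
by_gender = disaggregated_recall(records, ["gender"])
by_intersection = disaggregated_recall(records, ["gender", "ethnicity"])

print("overall:", overall)
print("by gender:", by_gender)
print("by gender x ethnicity:", by_intersection)
```

In this made-up data the overall recall of 0.5 hides the fact that the tool catches 75% of cases in men, 25% in women, and none at all in one intersectional subgroup, which is the kind of gap a single headline figure would not reveal.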
THE RISK OF DESKILLING AMONG CLINICIANS
The dependence on algorithms and its long-term impact on clinicians’ skills need to be addressed carefully, and this comes back to operational processes. There is some evidence from other industry sectors that, if systems are not deployed properly, de-skilling can result. We don’t want to reduce the ability of our senior clinicians. We want our trainee clinicians to get to the level of our best senior clinicians. Any system we develop must enable that to happen.
Let’s take the example of predictive or diagnostic algorithms. Yes, they should augment the decision-making process. One diagnostic algorithm for childhood diseases diagnosed about six different diseases, and it was tested against six groups, three of junior doctors and three of senior doctors. When I looked at some of the results, it did actually diagnose better than some doctors: it diagnosed better than trainee doctors. It wasn’t better than senior doctors.
The question I asked was, how are you going to use that system? Are you going to use it to train junior doctors? And if they use that system and they default to it, that’s great, they’ll improve. But it might be that they’ll never get as good as a senior doctor because it would create this sort of false barrier to them. We will lose that nuanced, inspirational, and experienced view that some very senior doctors and clinicians have. I worry about that.
So how is that going to be deployed? And what safeguards have you got for de-skilling? This is where you get the belief that, oh, don’t worry, a human was making the decision but operational procedures meant they defaulted to the algorithm. It should be a machine in the loop, and it should be a human making the decisions. When people are planning these systems I would like them to think very carefully about how it’s deployed and the long-term consequences.
Coming back to deployment, it can still go wrong if you don’t deploy correctly. An example was a bed management algorithm in the States. They tried really hard to ensure it was an unbiased, ethically-developed bed management process that decides who gets a bed and who doesn’t. However, it was discriminating against Black and Asian minority groups. It can have serious health consequences when people are being discharged from hospital too early. Luckily, the hospital was monitoring it, and found out fairly quickly, and admitted it and corrected it.
THE FUTURE OF ALGORITHMIC DECISION-MAKING IN HEALTH
I think algorithmic decision-making will allow us to make very complex decisions better. It will make it easier to manage co-morbid conditions and to deliver personalised medicine, because it’s hard to do a one-size-fits-all. Some algorithms currently are one-size-fits-all, but if they are developed to allow for that flexibility, that is fantastic.
My own particular background was in genomic analysis and disease prediction from that analysis. So a better understanding of our genome and the inherited traits around various diseases and how they intersect with environmental issues, that’s a really interesting area. The value of machine learning is that it can expand the dataset (although it can cause problems as well if you don’t understand your decision-making process), and it can help increase efficiency within the health sector. But an increase in efficiency shouldn’t be done at the cost of human labour, which can happen. We need to be careful. We need to be wary of tech-solutionism as well.
THE IMPORTANCE OF IEEE STANDARDS
We asked Allison about her work on the IEEE P7000 series of the Global Initiative on Ethics of Autonomous and Intelligent Systems, and specifically on algorithmic bias, providing a framework for algorithmic impact assessments.
I think these standards are really important. My experience, particularly with P7003, which deals entirely with algorithmic bias, is about how you can ethically develop a process that doesn’t cause bias. And it is really complex, because it isn’t just a simple solution of just looking at the dataset.
This is why conferences such as yours and this multidisciplinary approach are so vital. We need to think more deeply about what questions we are actually asking, and look to see whether AI is a good solution for that. It starts by asking, what problem am I solving? Is that problem I’m deciding to solve in itself a biased one? Have I got a diverse group of people contributing? And what problems am I not even thinking to address?
So before you even hit the AI, you need to be thinking, where do I go to get my dataset from? How am I choosing that dataset? Am I inserting my own biases as to what datasets I think will be the best ones? So again, if you’re doing a predictive policing dataset, or even a diagnostic dataset, and you go for particular types of data, you need to analyse that. For example, what proportion is from Black, Asian and Minority Ethnic (BAME) groups? What gender proportions are there? What age range do I have? And then the intersectionalities of that?
Where was that data collected? So was that data collected in one locality, and where I’m wanting to deploy is in a different locality? So is that actually applicable? And the idea of localisation is really important from a standard point of view, because these are meant to be global standards.
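As an editorial illustration of the composition questions above (proportions by ethnic group, gender and age, and their intersections), here is a minimal sketch using only the Python standard library. It is not taken from P7003 or from the interview; the rows, the field names, and the `composition` helper are hypothetical.

```python
from collections import Counter


def composition(rows, keys):
    """Share of the dataset falling into each combination of the given attributes."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical toy dataset of ten people.
rows = (
    [{"gender": "M", "ethnicity": "White", "age_band": "40-64"}] * 4
    + [{"gender": "M", "ethnicity": "BAME", "age_band": "40-64"}] * 1
    + [{"gender": "F", "ethnicity": "White", "age_band": "18-39"}] * 3
    + [{"gender": "F", "ethnicity": "BAME", "age_band": "18-39"}] * 1
    + [{"gender": "F", "ethnicity": "White", "age_band": "65+"}] * 1
)

by_gender = composition(rows, ["gender"])
by_ethnicity = composition(rows, ["ethnicity"])
by_intersection = composition(rows, ["gender", "ethnicity", "age_band"])

print("gender:", by_gender)
print("ethnicity:", by_ethnicity)
for group, share in sorted(by_intersection.items()):
    print(group, share)
```

In this toy example the gender split looks balanced, yet listing the intersections shows that some combinations, such as BAME women aged 65 and over, do not appear in the data at all.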
That’s been a really interesting conversation. Because all of the datasets have what we call features. So, the features are your gender, your address, your ethnic group, age, what you do for a living, whether you have a cat or a dog, what your blood group is, and so on. But as you do what’s called feature selection, you are trying to develop a model that will use as few of them as possible, because if you have too many you get something called overfitting, which means you just start seeing patterns where there are no patterns available.
So you try and you tweak your datasets. They’re never fully perfect. There’s all sorts of missing values and incorrect values. So, how you clean your datasets is a human decision-making process. What the standard is trying to do is to help people navigate this, and think about bias constantly, in the right way. And then what metrics you use to measure this, how you would do it, and how you deploy it.
At the centre of P7003 there is an algorithmic impact risk analysis that you can do. And in there is a go/no-go clause, where once you’ve done all of this and you feel like you are not producing an ethical algorithm, there’s a point where you say, we can’t deploy it. We’re seeing this at the moment with some facial recognition technology, where people are saying that we can’t get this right yet. It’s still too biased, it’s not being used ethically so we’re not going to develop it or use it.
But writing a global standard for different cultures, where one feature might be considered a protected characteristic, and in one country it is legal, and in another country illegal – that creates difficulties. So there are some interesting conversations and fortunately you have multidisciplinary teams working on these, but it’s really hard.
So, eventually these standards will get developed by experts, and will be written and published. And people can then use those as a working model to see if they are producing ethical AI, and they don’t necessarily require regulation. It’s something you can claim to work to, but really they should be supported by regulation and certification on top of that so that you can audit the systems.
You can then say, you say you’ve used this system, and you follow this standard – have you really done that? The International Organization for Standardization (ISO) and the British Standards Institution (BSI) have that sort of framework there.
WHY WE NEED WOMEN LEADING IN AI
Allison, along with Ivana Bartoletti and Reema Patel, co-founded Women Leading in AI in 2018. We asked her what prompted them to create the organisation and what they hoped to achieve through it.
In 2018, when we founded Women Leading in AI, it was at the point where the issue of AI was bubbling up, and there was fantastic work by great people such as Joy Buolamwini and Timnit Gebru, who were illustrating how these systems can be really biased if not done properly. And many of those biases were traced back to the fact that we just did not have gender or ethnically diverse teams developing these systems.
So we had huge errors in development that caused discriminatory outcomes, which could actually become embedded in real systemic inequality. I was watching all of that as somebody who’s always been interested in women’s rights, and who has a political background.
I met Ivana and Reema through the think-tank, Fabian Society. Reema worked at the Ada Lovelace Institute, and Ivana is a privacy expert.
A lot of conversations about regulation were happening. I got a little mad because I could see that a lot of the policy people talking about AI – not Ivana or Reema – didn’t understand AI, and were falling for the black box explanations and were not quite understanding that it’s not just about the data, there’s more to it. I thought, they need to speak to somebody who develops machine learning algorithms, who understands it.
And Ivana said, “Right, let’s do something about it!”. I agreed, and we said we’ll do a conference, with just women’s voices, and it got a bigger interest than we anticipated.
Because of our links with Westminster – in particular, I’m a councillor, and Reema’s a councillor – we have good relationships cross-party. Women Leading in AI is not party political. We feel quite strongly about that. Because of these strong cross-party relationships, we could access people like the Digital, Culture, Media, and Sport Committee, the All-Party Parliamentary Group on AI, and help that conversation, and drive for proper regulation, while also creating a large network of women who had a space where they could talk about their experience, and help us elevate others to have that opportunity as well.
I’m always conscious when I get invites because of my role. I’m thinking, really, I should be trying to reach out to elevate others. So it’s trying to get that balance. We’re very keen on doing that. So we usually do an annual conference, where we make a rod for our own backs, because we believe in accessibility, so we don’t charge. And that’s a real ethical issue, because it’s a lot of work. And asking for speakers to do things for free, it’s in a lot of ways unethical in itself. And your conference is fantastic, as I think you’ve had that conversation, and I think you’ve got the balance right. We’ve learnt a lot from you.
IF WE DON’T GET IT RIGHT, PEOPLE WILL DIE
If we don’t get this right, all the inequality can become embedded in the system and people can be blind to it. And I still go to conferences and say, you do know that ECG traces can be different between male and female. And I still get looks of surprise. And I’m thinking, the research is out there, or, that research was mainly based upon male mice etc. You’ve got to address this issue. And the problem is, with incorrectly deployed health systems of any sort, but particularly with AI – because it embeds biases, and can augment them – people will die if it’s not right. This is high stakes.
So we need to see change sooner or it will become embedded in our systems and become very hard to undo. We’ve been trying to deal with this for decades, and unlike other science areas, where we have managed to improve gender diversity, it is getting worse in computer science.
All these initiatives are great but they’re not necessarily fixing the problem. So I would do two things, which is I would consider looking at smaller segments in how we operate in our processes, because we cannot fix it by battering men over the heads, and saying, you need more diversity. Then you get the surface fixes which don’t work because they don’t actually ask the women how they need to be helped.
SMALL NUDGES MIGHT MAKE A DIFFERENCE
The question is, why are there too many men in tech? And why are men staying in tech and why are women being forced out of it? The spotlight needs to change on that type of question. Women don’t need fixing, we don’t need encouraging, we don’t need help. We just need to stop having to fight the system.
And that’s actually historic. The field of computer science and programming was taken away from women. We were the machine developers and programmers, and we got pushed out, and the field became a safe space for a certain type of male who feel quite threatened if women come into it. And again, it is probably not conscious, but it’s created in that way. That’s why we have things called tool boxes, instead of sewing boxes. And then they say, but women use tool boxes as well. Then I say, men sew as well, so, what’s the difference? It’s subtle.
It’s really endemic in the subject area but we haven’t won that battle. And we’re not going to win it. So how do homogeneous groups – male groups – develop good design? If they want the money, they’re going to have to have diverse groups. If they want to sell their products, they’re going to have to show disaggregated data and evaluations of their algorithms to show it doesn’t cause any discrimination. And where it does – because all will – what mitigations are in place that you hardwired in so that post-deployment it can’t be changed? How are you being transparent in your informing of your customers of these systems? And this is why you need regulation.
In fact, regulation requiring diverse teams will stimulate innovation. Because once you’ve got diverse teams, you’re going to think of more problems that you can solve that you never thought of before and address a whole group of people that have been sidelined. And I talk about women, 50 percent of the population, that can deal with that.
I want to see these small nudges on this regulation being introduced. And one of the best nudges, I really recommend, and I intend to do a campaign on this, is to have grant-awarding bodies build this into their application. And if applicants and tech startups cannot demonstrate, in a meaningful way, that they are combatting discriminatory practices, then they don’t get the funding. Suddenly you’ll find that they will start sorting it out.
IT’S EASILY SOLVED WITH ETHICAL DEPLOYMENT
Particularly in the UK there seems to be resistance to implementing any form of regulation. It’s also quite difficult to regulate something that is so cross-sectoral. I think it is easily solved with ethical deployment. A lot of the work of Women Leading in AI is trying to navigate that.
Any algorithmic impact bills that have been introduced have met with huge resistance. In the USA, Lilian Edwards recommended they develop a draft bill, which was extremely good, parts of which seem to have been taken up by Australia.
But when any of these go through there is huge resistance from the tech industry. The regulation gets amended to the point where it can become fairly meaningless. So a regulation has to have some teeth.
And the classic example was New York City, who did an Algorithmic Impact Assessment bill two years ago. The original version was really good, and showed a lot of foresight. But it got amended to the point where all that ended up being allowed for was that there’d be an AI ethics council who would review the algorithms and make recommendations, which they implemented.
One year later, the council said, this is just a waste of time. They had not been able to do anything because every time they asked for something to evaluate, intellectual property rights were claimed: they can’t tell us. So they’ve done nothing of value all year. It became a meaningless piece of legislation.
THE CURRENT LEGISLATION WE CAN USE
If we don’t get any AI specific regulation for quite a number of years, and we’ll have to keep fighting that, there is currently legislation that we could use such as the laws on equalities, human rights, data privacy, and GDPR.
What’s going to have to happen is a class action addressing issues of a particular algorithm. And we have tried it. We tried it with facial recognition in Cardiff and in London and it didn’t get passed. But something’s going to get passed, and that will certainly make people sit up and listen. It’s hitting people in the pocket, isn’t it? But I’d rather not wait for that.
Let’s just be honest and know that we need to do it. Let’s get some good regulations, a good ombudsman. So somebody has a right to redress. Imagine the scenario where you have a heart attack prediction algorithm and you are a Black female and yours was missed. And you find that your diagnosis was based upon an algorithm that has all of those problems.
We know that Black women’s symptomatology is very different, and sometimes their diagnosis can get missed, and they die because of it. In that case, you should be able to seek redress. But if you don’t know that you’ve been subject to an algorithmic decision-making tool, and that the human in the loop wasn’t a meaningful human in the loop, it’s very hard for you to exercise your rights under GDPR for example to do that. So this is where we want transparency as well.
IDENTIFY WHERE AI IS USED
So legislation is needed. I would argue that a very simple piece of legislation is needed: the requirement to use an ‘info mark’ on anything that impacts a person and has had an algorithmic decision-making process involved. So that you are aware.
This gives you the grassroots-up viewpoint. People start seeing the same one everywhere, and they start to see how pervasive it is, and they will start asking questions. And when the people start asking questions, then maybe industry, public-sector, and legislators will start thinking, we need to do something about it now, because they’re now demanding it.
It’s one of those little nudge techniques that will help. Keep it simple because we’ve been going around in circles debating this for quite a few years. Bring in the requirements within grant applications. Have an info mark so people are aware. We’ve got all of this great work with standards developing as well, so we can scale these solutions up. And my argument is, if you’re not doing anything wrong, what’s the problem letting people know that AI’s been involved in this process?
You can listen to the full interview with Allison on our podcast here.
Photo credits: Christina and H. Shaw on Unsplash.