Open access
Discussion
Published Online 31 August 2021

Bias in medical artificial intelligence

Publication: The Bulletin of the Royal College of Surgeons of England
Volume 103, Number 6, September 2021, pages 302–305

Abstract

How human biases impact the use of artificial intelligence in medicine.
Bias, discrimination and social injustices have plagued human society since the dawn of time. Healthcare and its affiliated technologies are not immune to such perils. With the advent of artificial intelligence (AI) in diagnostic and therapeutic medical interfaces, bias may be intentionally or inadvertently introduced into medical technology.
This paper begins with a brief review of the basic concepts behind AI in order to explore how and where bias may arise. It then presents several documented instances in which medical technology has introduced bias against certain patient subgroups. Finally, a simple method is proposed to help identify and treat such bias.

AI: a summary

The world of AI is filled with jargon, but as practitioners and users of medical technology we must understand how AI functions if we are to identify where bias can occur and produce discrimination and inequity in healthcare. To begin with, massive collections of data are acquired. These arise from scientific research and electronic health records, and can include information on potentially any aspect of medical practice, such as patient characteristics, symptoms of specific diseases, diagnostic criteria, medication doses and abnormal signs on radiographs. These data are used to construct algorithms, which are written in programming languages and fed into the machine being developed. The algorithms become the machine's background reference, through which it recognises and interprets incoming information. Just as a child interprets the world through the instructions of its parents and teachers, together with its own memories and experiences, AI interprets the world through its data and algorithms.
Like humans, AI also needs to learn from new experience in order to improve its performance. When an algorithm improves its performance by learning from data, this is termed machine learning; deep learning is a branch of machine learning that uses multilayered ('deep') neural networks and requires far less human guidance to extract patterns from raw data.1
AI uses computer vision2 to interpret visual information such as images and videos, and can then react to it on the basis of its algorithms. Natural language processing (NLP)2 allows AI to understand and interpret human language, whether spoken or written.
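For readers unfamiliar with how this looks in practice, the minimal Python sketch below illustrates the pattern described above: historical data are used to fit an algorithm, which then interprets a new case. The variables, values and model choice are invented for illustration only and are not drawn from any clinical system.

```python
# Minimal, illustrative sketch of supervised machine learning:
# historical data -> fitted algorithm -> prediction for a new patient.
# All variables and values are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical records: [age, systolic blood pressure, oxygen saturation]
X_train = np.array([
    [34, 118, 98],
    [71, 160, 91],
    [58, 142, 93],
    [25, 110, 99],
])
y_train = np.array([0, 1, 1, 0])  # 0 = disease absent, 1 = disease present

model = LogisticRegression()
model.fit(X_train, y_train)        # the algorithm 'learns' from the data

new_patient = np.array([[66, 150, 92]])
print(model.predict(new_patient))  # the machine's interpretation of a new case
```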

The bias cascade

From reviewing how AI is created, one can see where potential entry points for bias are located. These successive entry points create a domino effect of biases and prejudices with alarming clinical consequences.
The bias cascade starts with the data collection process. Disparities are known to exist in the recruitment of research subjects, where certain populations (eg women) are underrepresented. Since data form the technology's backbone, biased or skewed data fed into AI algorithms will eventually result in biased, and therefore inaccurate, clinical performance: machines are only as impartial as the data that have been fed into them. The experts who use these data to construct algorithms may also carry their own prejudices, which can further compound the bias inherent in the technology. Finally, access to healthcare facilities and medical technology suffers its own forms of inequality, so that bias accumulates like compound interest.
The medical literature is teeming with evidence of bias in medical technology. What follows are some examples of such bias on four fronts: racial, gender-based, socioeconomic and linguistic. When multiple factors are involved in creating bias (whether consecutively or simultaneously), this is referred to as compound bias.

Racial bias

Racial inequalities are intricately woven into the fabric of modern society. Awareness and transparency can help to uncover such inequalities where they have made their way into healthcare. A leading example is a study published in The New England Journal of Medicine in 2020 by Sjoding et al, which sparked introspection in the medical community by unmasking racial bias in pulse oximetry.3 The authors found that, at the same pulse oximeter readings, Black patients were significantly more likely than White patients to have occult hypoxaemia (a low arterial oxygen saturation despite a seemingly reassuring oximeter reading). Given the prevalence of hypoxaemia in patients with COVID-19 and the importance of pulse oximetry readings in steering management, this study has prompted a serious look at the unacknowledged and unconscious biases embedded in the design of hospital instruments.
In plastic surgery, AI designed to detect features of attractiveness on patient photographs would make detrimental decisions for Asian patients if it were trained to recognise only White features as signs of beauty.4 Alarmingly, AI taught to detect skin cancer on images of fair skin may not be able to reliably diagnose lesions on darker skin.5

Gender bias

AI interfaces designed to read radiological images use computer vision to identify abnormalities. They are expected to outperform human radiologists by reading images faster, with greater precision and a lower margin of error, yet they too may exhibit 'ethically impaired' computer vision if they are built on inherently biased data. Larrazabal et al found that AI-driven software used to read chest X-rays underperformed worryingly in the diagnosis of various thoracic diseases whenever it was confronted with data from underrepresented genders.6 When the software was retrained with more gender-balanced data, it detected disease on X-rays more accurately, showing that more diverse data improve clinical performance in a diverse patient population. This highlights the importance of addressing gender diversity and equity in AI before it is released into hospital practice as a diagnostic gold standard.
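One way such underperformance can be surfaced is sketched below: rather than reporting only an overall accuracy figure, performance is computed separately for each sex. The data, labels and scores are synthetic placeholders (in practice the scores would come from the trained model); this is not the code used by Larrazabal et al.

```python
# Illustrative sketch: evaluate a classifier separately for each sex so that
# underperformance in an underrepresented group is not hidden by a good
# overall score. All data below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
sex = rng.choice(["F", "M"], size=1000, p=[0.2, 0.8])  # gender-imbalanced dataset
y_true = rng.integers(0, 2, size=1000)                 # 'true' disease labels
y_score = rng.random(size=1000)                        # model's predicted probabilities

print("Overall AUC:", roc_auc_score(y_true, y_score))
for group in ["F", "M"]:
    mask = sex == group
    print(f"AUC for {group}:", roc_auc_score(y_true[mask], y_score[mask]))
```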

Socioeconomic bias

A prominent example of compound bias was demonstrated in a study by Obermeyer et al.7 Their group showed how socioeconomic disparities, originally driven by racial disparities, resulted in an underestimation of sickness in Black patients. They studied a hospital AI algorithm designed to identify the patients who would benefit most from extra care and thereby reduce future healthcare costs. The algorithm was fed insurance and cost data, meaning that the patients who already cost the most were predicted to cost the most in the future and were therefore selected to receive additional care. However, in the US, more money is generally spent on White patients than on Black patients and so, for any given amount spent on healthcare, Black patients were actually sicker.
The algorithm overlooked all the socioeconomic and racial barriers Black patients face, from difficulties in accessing hospital care in poverty-stricken communities to the inability to afford insurance premiums and treatment, and the distrust of healthcare and fear that Black communities may hold as a result of past experiences of racial discrimination.7 Consequently, the sicker and poorer patients, on whom less money was already being spent, were unlikely to be offered the extra care they required, on the basis of the algorithm's inherently biased choices.
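The sketch below illustrates, with entirely synthetic data, the kind of audit that can expose such a proxy problem: within each band of the algorithm's risk score, a direct measure of health (here an invented chronic-condition count) is compared between racial groups. It mirrors the spirit of the published analysis but is not the authors' code or data.

```python
# Illustrative audit sketch (synthetic data): at the same algorithmic risk score,
# compare a direct health measure between groups. If one group is sicker at any
# given score, the score is a biased proxy for health need.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "race": rng.choice(["Black", "White"], size=5000),
    "risk_decile": rng.integers(1, 11, size=5000),  # algorithm's predicted-cost decile
})
# Synthetic chronic-condition counts, constructed so that Black patients are
# sicker at the same score (mimicking the failure mode, not the real data).
lam = np.where(df["race"] == "Black", 4.0, 3.0) * df["risk_decile"].to_numpy() / 10
df["chronic_conditions"] = rng.poisson(lam)

audit = df.groupby(["risk_decile", "race"])["chronic_conditions"].mean().unstack()
print(audit)  # mean illness burden per group within each risk-score decile
```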

Linguistic bias

AI can comprehend human language through NLP – but which language? The simple answer is the language or dialect it has been fed and taught to recognise through machine learning. This creates yet another bias with which patients and healthcare professionals must contend.
A team at the University of Toronto used an AI algorithm to identify language impairment as an early sign of Alzheimer's disease, with the aim of facilitating earlier diagnosis.8 In practice, however, the algorithm performed best on Canadian English, putting French speakers and speakers of other dialects at a disadvantage (and therefore at risk of being misdiagnosed).

4D solutions to discrimination

Eliminating bias and creating fairness and equity in AI means being conscious of bias at every potential entry point. Rajkomar et al recommend doing so at various 'equity checkpoints', which they locate in 'design, data collection, training, evaluation, launch review and monitored deployment'.9 A simpler, cyclical approach is proposed below, with four main phases or entry points: data, development, delivery and dashboard (Figure 1).
Figure 1 The 4D model: a cycle of four entry points where bias in artificial intelligence may be detected and the key aspects at each point that help eliminate bias.

Data

Since data are the driving force behind the technology, it is imperative that the data used to feed AI algorithms are unbiased and representative of the target populations. This begins with diverse and well-balanced study populations in medical research, paying particular attention to racial and ethnic diversity, gender balance, socioeconomic equity, and other social and ethical determinants of disease and of access to healthcare. The same applies to electronic health records and any other sources of data used in AI algorithms. The use of neutral and fair language is mandatory when designing the keywords used to retrieve data records.
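As a simple illustration of what a representativeness check might look like before any algorithm is built, the sketch below compares a hypothetical study cohort's make-up against assumed target-population proportions; both the cohort and the reference figures are invented placeholders rather than real census data.

```python
# Illustrative sketch: compare a training dataset's demographic make-up with
# target-population proportions before building an algorithm from it.
# Both the cohort and the reference proportions are invented placeholders.
import pandas as pd

cohort = pd.Series(["female"] * 300 + ["male"] * 700)    # hypothetical study cohort
observed = cohort.value_counts(normalize=True)

target = pd.Series({"female": 0.51, "male": 0.49})       # assumed target population
comparison = pd.DataFrame({"dataset": observed, "target": target})
comparison["shortfall"] = comparison["target"] - comparison["dataset"]
print(comparison)  # a large shortfall flags an underrepresented group before training
```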
Global collaborations play a vital and constructive role in procuring massive quantities of diverse data. In much the same way that the World Health Organization amassed global information on the COVID-19 pandemic, healthcare professionals worldwide can contribute data to create equitable AI platforms. Furthermore, because academic centres, where most medical research is based, may not be as representative of the general community as other healthcare institutions, data should also be gathered from a broader range of care settings.

Development

This phase relies on engineers and programmers to carefully select data variables and write unbiased algorithms. They must also work in liaison with clinical practitioners to test the technology and ensure that it does not exhibit bias in practice before it is released to market. Machines should be tested in different patient populations for both scientific and ethical performance, and an ethics committee should approve each machine for clinical use.
A study led by Dr Fahrenbach at the University of Chicago is a brilliant example of how socioeconomic bias in an AI algorithm was identified early, averting an ethical disaster.10 The algorithm was designed to predict which patients would have the shortest hospital stays so that additional care and resources could be allocated to them after discharge. Among the patient characteristics used as data variables, and found to correlate significantly with length of hospital stay, were patients' zip codes. On closer inspection, the study team found that patients living in more affluent areas had shorter stays and were being selected by the algorithm to receive extra care they did not need, whereas those from underprivileged areas would have been denied care they did need. Because the algorithm was tested early, it was never released for hospital use.
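The following sketch, again using synthetic data and invented selection rates, shows one way such a pre-deployment test might be run: tabulating how often the algorithm selects patients for extra care across area-income bands and looking for a gradient that tracks affluence rather than need. It is illustrative only and is not the Chicago team's analysis.

```python
# Illustrative pre-deployment check (synthetic data): does selection for extra
# post-discharge care track area income rather than clinical need?
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({"income_band": rng.choice(["low", "middle", "high"], size=3000)})

# Synthetic selections, deliberately skewed towards affluent areas to mimic
# the failure mode described above.
selection_rate = df["income_band"].map({"low": 0.10, "middle": 0.20, "high": 0.35})
df["selected_for_extra_care"] = rng.random(len(df)) < selection_rate

print(df.groupby("income_band")["selected_for_extra_care"].mean())
# A strong gradient by income band is a warning sign to investigate before release.
```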

Delivery

Healthcare professionals must become more conscious of inherent biases in the machinery and AI interfaces they use. This starts by ensuring equitable patient access to the technology, and then diligently observing how it performs in diverse populations and underrepresented communities. Any bias noticed should immediately be reported to an appropriate committee at the hospital, which can then communicate with the developers.

Dashboard

Understanding the bias inherent in medical technology allows clinicians to question the accuracy of the technology when its results do not match what their clinical expertise would lead them to expect. It is crucial to create a 'dashboard' through which AI technologies can be evaluated, with performance data then used to give feedback. At the hospital level, feedback from healthcare professionals should be reviewed by a committee comprising AI experts/biomedical engineers, clinical experts, ethicists and administrators. This committee should then be able to communicate with designated decision makers, which may include the developing company or healthcare legislators. For example, in the case of racial bias in pulse oximetry mentioned earlier, a letter was written to the US Food and Drug Administration to draw attention to the matter.11 Awareness and accountability are key components of any ethical dashboard.
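A minimal sketch of what such a dashboard calculation might look like is shown below, using hypothetical subgroup names, made-up counts and an assumed review threshold: sensitivity is tracked per patient group and any group falling well below the overall figure is flagged for committee review.

```python
# Illustrative 'dashboard' sketch (made-up counts): track sensitivity per patient
# group from routine use and flag groups for committee review when they fall
# well below the overall figure. Group names and the tolerance are assumptions.
import pandas as pd

counts = pd.DataFrame(
    {"true_positives": [90, 40, 12], "false_negatives": [10, 10, 8]},
    index=["group_A", "group_B", "group_C"],  # hypothetical patient subgroups
)
counts["sensitivity"] = counts["true_positives"] / (
    counts["true_positives"] + counts["false_negatives"]
)
overall = counts["true_positives"].sum() / (
    counts["true_positives"].sum() + counts["false_negatives"].sum()
)
counts["flag_for_review"] = counts["sensitivity"] < overall - 0.10  # assumed tolerance
print(counts)
print("Overall sensitivity:", round(overall, 2))
```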

Conclusions

Bias exists in medical technology and may be racial, gender-based, socioeconomic or linguistic. Understanding how AI works helps us identify bias before it can have dire consequences on patient care. Ethical governance over AI is imperative if it is to succeed in helping those patients it is intended to help.

References

1.
IBM. Machine learning. https://www.ibm.com/cloud/learn/machine-learning (cited June 2021).
2.
Great Learning. What is artificial intelligence? How does AI work, types and future of it? https://www.mygreatlearning.com/blog/what-is-artificial-intelligence (cited June 2021).
3.
Sjoding MW, Dickson RP, Iwashyna TJ et al. Racial bias in pulse oximetry measurement. N Engl J Med 2020; 383: 2477–2478.
4.
Koimizu J, Numajiri T, Kato K. Machine learning and ethics in plastic surgery. Plast Reconstr Surg Glob Open 2019; 7: e2162.
5.
Goyal M, Knackstedt T, Yan S, Hassanpour S. Artificial intelligence-based image classification methods for diagnosis of skin cancer: challenges and opportunities. Comput Biol Med 2020; 127: 104065.
6.
Larrazabal AJ, Nieto N, Peterson V et al. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc Natl Acad Sci USA 2020; 117: 12592–12594.
7.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019; 366: 447–453.
8.
Fraser KC, Meltzer JA, Rudzicz F. Linguistic features identify Alzheimer’s disease in narrative speech. J Alzheimers Dis 2016; 49: 407–422.
9.
Rajkomar A, Hardt M, Howell MD et al. Ensuring fairness in machine learning to advance health equity. Ann Intern Med 2018; 169: 866–872.
10.
IEEE Spectrum. Racial bias found in algorithms that determine health care for millions of patients. https://spectrum.ieee.org/the-human-os/biomedical/ethics/racial-bias-found-in-algorithms-that-determine-health-care-for-millions-of-patients (cited June 2021).
11.
Sjoding M, Iwashyna TJ, Valley TS. More on racial bias in pulse oximetry measurement. N Engl J Med 2021; 384: 1278.

Author

AJMS AlHasan
Specialist General Surgeon, Jaber Al-Ahmad Al-Sabah Hospital, Kuwait
