Large language models (LLMs), like Google’s Bard or OpenAI’s ChatGPT, seem to have taken over the internet, and chances are you have tried them too. Beyond generating silly poems about your friend’s cat or assisting in writing that new LinkedIn post, LLMs are already projected to replace common search engines as we know them (Grant, 2023), thereby becoming a widely used and accessible source of knowledge. Furthermore, LLMs come with the benefit of creating quick, concise, yet sophisticated answers without having to click through several websites and articles that might even contradict each other. Sounds great, right?
This article was written in 2023 as part of the summer academy «Stupid Statistics?!? Durchblick behalten im Daten-Dschungel der Gegenwart» of the Swiss Study Foundation and was editorially supported by Reatch. The article reflects the personal opinion of the author and does not necessarily correspond to that of Reatch or the Swiss Study Foundation. First published in the Reatch blog.
Well, it’s not that simple. While AI will likely make many areas of our lives easier, its benefits might not be shared by everyone alike. AI algorithms have repeatedly been shown to exhibit bias against disadvantaged, especially racialised, groups in areas including health (Obermeyer et al., 2019), education (Cheuk, 2021), and employment (Chen, 2023). This is, unfortunately, an inherent danger of AI, because its algorithms recognise and reproduce patterns based on huge datasets (Rajpurkar et al., 2022), which often reflect widespread discriminatory patterns in society. Simply put, when an AI is trained on biased data, it will likewise generate biased output, reproducing existing inequalities. So, could biases creep into the algorithms of modern technology? To investigate this, I sat down for a serious conversation with ChatGPT.
Before we start, it is worth mentioning that ChatGPT is non-deterministic, meaning that the same question can yield different answers. For that reason, I used different phrasings for my questions, and the overall output was reasonably consistent. The representative quotes provided here come from the older GPT-3.0 version, with which I conducted the research in the spring of 2023. I held the conversation in English, the language that makes up the largest share of ChatGPT’s training data and is arguably the lingua franca of modern science. Some text fragments have been omitted for conciseness or to avoid repetition. Without further ado, let’s dive in.
The number of possible conversation topics with ChatGPT surrounding bias seemed endless, so it was not an easy choice. In my last semester at the University of St Andrews, I took a biology class, entitled “The Science of Race and Racism”. Knowledge obtained in this class allowed me to evaluate the information that ChatGPT provided to the best of my abilities, as well as place it in a larger context. So, for this article, I decided to ask a seemingly simple, yet surprisingly complicated, question, a question you might also be curious about: “Are there biological differences between human races?” ChatGPT replied:
There is no universally accepted definition of “race” within the scientific community, and the concept has been highly controversial and politically charged. Therefore, it is important to clarify that the term “race” is often used in a social, cultural, or political context, rather than a biological one.
While there are certainly genetic variations among humans, the genetic differences between individuals within any given “racial” group are generally greater than the genetic differences between individuals of different “racial” groups. These variations are not sufficient to classify people into distinct, biologically meaningful racial categories.
Additionally, many of the physical traits that are commonly associated with different “races” – such as skin colour, hair texture, and facial features – are actually the result of adaptations to environmental factors such as climate and sunlight exposure. These adaptations have evolved independently in different populations around the world and are not necessarily indicative of deeper biological differences.
ChatGPT correctly mentioned an important argument against the concept of biologically distinct human “races”: only 5-10% of genetic variation is detectable between racialised groups, much less than the 90-95% of variation found within populations (Lewontin, 1972; Rosenberg et al., 2002; Long and Kittles, 2003). Contrary to ChatGPT, some academics have argued that this small but remaining variation along racialised lines justifies “race” as a biologically meaningful category (Woodley, 2010). However, I believe that this view is based on questionable methodological practices – which I will describe in more detail later – and risks underestimating the complexity of human variation. There are no clear-cut genetic boundaries between groups of populations, not even along continental lines (Barbujani and Belle, 2006). On the contrary, genetic differences between human populations are continuous and increase linearly with geographical distance from East Africa, where modern humans originated and migrated from (Handley et al., 2007). In fact, human evolutionary history is mainly dominated by the migration of small and unrepresentative subsets of larger populations to new territories – known as ‘founder effects’ – rather than by natural selection (Ramachandran et al., 2005). Because each new founder population carries a smaller and unrepresentative gene pool, genetic diversity within populations declines with increasing geographic distance from East Africa. Therefore, categorising humans according to continents, or even characteristics like skin or hair colour (which are genetically independent, or ‘discordant’, from most other human traits anyway; see Graves and Goodman, 2022), is arbitrary and does not account for the complexity of human genetic history.
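The variance apportionment behind the 5-10% figure can be made concrete with a small simulation. The sketch below is purely illustrative and not from any of the cited studies: the population labels and allele frequencies are invented, but the decomposition of total variance into a within-group and a between-group share follows the standard law of total variance used in such analyses.

```python
# Illustrative sketch (invented numbers): even when two populations differ in
# allele frequency at a SNP, most of the variance lies WITHIN each population.
import random

random.seed(42)

# Two hypothetical populations with slightly different frequencies of one
# variant allele at a single biallelic SNP (values chosen arbitrarily).
freqs = {"pop_A": 0.55, "pop_B": 0.45}
n_per_pop = 10_000

# Each individual carries 0, 1, or 2 copies of the variant allele.
genotypes = {
    pop: [sum(random.random() < p for _ in range(2)) for _ in range(n_per_pop)]
    for pop, p in freqs.items()
}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [g for pops in genotypes.values() for g in pops]
grand_mean = mean(all_values)

# Total variance of genotypes across everyone, regardless of population.
total_var = mean([(g - grand_mean) ** 2 for g in all_values])

# Between-group variance: how far the population means sit from the grand mean.
between_var = mean([(mean(g) - grand_mean) ** 2 for g in genotypes.values()])

share_between = between_var / total_var
print(f"Share of variance between groups: {share_between:.1%}")
print(f"Share of variance within groups:  {1 - share_between:.1%}")
```

With these invented frequencies, only a few percent of the variance falls between the groups; the rest is variation among individuals within each population, mirroring the pattern Lewontin and later studies reported at genome scale.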
While most geneticists today follow this argument and reject a biological concept of human “race”, the debate among scholars is still not settled. For example, the influential Harvard geneticist David Reich recently argued that there is sufficient evidence that racial categories are biologically meaningful, as they can be useful predictors in medical research (Reich, 2018). This was refuted in an open letter by 67 scientists warning against the conflation of human populations (which can be relevant in medical contexts) and “race” (BuzzFeed Opinion, 2018). ChatGPT assured me that, where debates were controversial, it would “strive to present the different arguments and evidence in a clear and balanced manner, without endorsing any particular viewpoint”. So, what makes ChatGPT take a stance on a debated issue? On one hand, this could be chance, or simply a reflection of the scientific majority opinion. On the other, the consistency with which ChatGPT rejected the concept of “race” opens the possibility of a predefined setting imposed by its developers at OpenAI. ChatGPT has been found to exhibit political biases, despite claiming neutrality (Rozado, 2023). While in this case ChatGPT – in my judgement – rightly deemed the concept of “race” biologically meaningless, failing to portray other perspectives, especially in science, can become a serious issue, as I will show later.
Before that, I decided to press ChatGPT a little on the implications of its arguments. I asked whether, if racial categories are not biologically meaningful, this means that there are no biological differences between “races”. Things suddenly became a bit more complicated:
I did not say that there are no biological differences between races. […] There are certainly genetic variations among humans, and some of these variations do correspond with geographic and ethnic ancestry. The genetic differences between individuals of different geographic and ethnic ancestry reflect the evolutionary history of human populations, including patterns of migration, adaptation to different environments, and genetic drift. For example, genetic studies have found that certain populations have a higher frequency of particular genetic variants that confer resistance to certain diseases, such as sickle cell anaemia in populations from regions with a high incidence of malaria. […]
Even though most of the output is scientifically accurate, the first sentence caught my attention. ChatGPT seems to conflate “race”, as prompted by my question, with human populations of different “geographic and ethnic ancestry”. Interestingly, this is also the fallacy that David Reich is accused of by his opponents. However, ethnicity, like “race”, is socially defined and conceptually different from human populations descending from different geographic locations. Two individuals who identify as Black, a commonly used racial or ethnic category in many countries, might (partly) descend from vastly different populations on the African continent, with considerable genetic diversity between them. An example to illustrate this was mentioned by ChatGPT itself: due to the protective properties of sickle-cell anaemia against malaria, the frequency of the genetic variant causing sickle-cell anaemia increases only with the frequency of malaria transmission (and therefore deaths, acting as a selection pressure) in a given location, including outside Africa (Kato et al., 2018; Esoh and Wonkam, 2021). Consequently, sickle-cell anaemia reliably correlates only with malaria incidence, not with continent or any socially constructed concept of “race” or ethnicity. Guessing whether a Black or African person has the variant for sickle-cell anaemia therefore becomes as reliable as guessing whether I had orange juice for breakfast this morning (I didn’t). This point is further supported by research showing that even knowing, rather than assuming, an individual’s ancestry only slightly improves the ability to predict their genotype (Feldman and Lewontin, 2008).
Next, I wanted to know what scientific evidence ChatGPT uses to justify its stance on human genetic variation. Therefore, I asked how it knew about the greater variation within than between human populations, to which ChatGPT replied:
[…] One way to measure genetic variation is to analyse genetic markers, such as single nucleotide polymorphisms (SNPs), that are found throughout the human genome. Studies have found that the vast majority of genetic variation in humans is found within populations rather than between them.
For example, a landmark study published in the journal Nature in 2002 analysed genetic data from over 1,000 individuals from 52 populations around the world. The study found that only about 5-10% of genetic variation was found between populations, while the remaining 90-95% of genetic variation was found within populations. Subsequent studies using larger and more diverse datasets have confirmed these findings.
ChatGPT mentioned the use of single nucleotide polymorphisms (SNPs), which are single base positions in the DNA that can have two or more alternative sequences, or variants. Algorithms can then analyse the distributions of SNPs in individuals and cluster them into genetically similar groups. While some researchers have argued that humans reliably cluster together according to continents or self-identified racial categories (Tang et al., 2005; Li et al., 2008), the overlap between genetic clusters and socially defined race is weak at best (Tishkoff et al., 2009; Graves and Goodman, 2022). Also, it is important to remember that in these analyses, the number of clusters, or groups, is set a priori. Simply put, an algorithm divides people up into as many groups as it is asked to. It should not be surprising if individuals from close geographical regions cluster together; but as genetic variation is continuous, the distinctions and categories arise from the methods, and cannot be derived from the data alone (Gravlee, 2009).
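The a-priori point can be demonstrated with a toy clustering run. The sketch below is a hypothetical illustration, not the method of the cited studies: it uses a minimal hand-written 1-D k-means (real analyses use algorithms such as STRUCTURE on many SNPs), and the “geographic gradient” data are invented. The point is that the algorithm returns exactly as many clusters as it is told to, even when the underlying variation is perfectly continuous.

```python
# Hypothetical sketch (invented data): clustering a smooth, continuous gradient
# still yields exactly k groups, because k is fixed a priori by the analyst.
import random

random.seed(0)

# Individuals sampled along a continuous "geographic" gradient -- no true clusters.
samples = sorted(random.uniform(0.0, 1.0) for _ in range(300))

def kmeans_1d(points, k, iters=50):
    """A minimal 1-D k-means: always returns exactly k cluster centres."""
    # Spread the initial centres evenly across the sorted data.
    centres = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centre.
            nearest = min(range(k), key=lambda c: abs(p - centres[c]))
            buckets[nearest].append(p)
        # Move each centre to the mean of its bucket (keep it if the bucket is empty).
        centres = [sum(b) / len(b) if b else centres[i] for i, b in enumerate(buckets)]
    return centres

for k in (2, 5, 8):
    print(f"k={k}: the algorithm dutifully reports {len(kmeans_1d(samples, k))} clusters")
```

Whatever value of k is chosen, the continuum is carved into that many segments; nothing in the output tells you whether those boundaries reflect real discontinuities or merely the parameter you picked.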
Furthermore, such results themselves can be called into question: arguably, the most rigorous and authoritative genomic investigation found that even when the algorithm was asked to group human populations into five categories (a common contemporary number for ethnic categories in Western countries, like African, European, Asian, etc.), several very different genetic patterns emerge for populations on the African continent while Europeans show an almost indistinguishable pattern to populations in the Middle East, Central Asia and India (Tishkoff et al., 2009; for further reading, their paper includes various interesting illustrations on this point, which can be accessed with a university login). This underlines the vast genetic variability that exists on the African continent and further indicates that biology does not lend support to societal notions of human ethnicity or race.
As the conversation continued, I encountered some cases of factual inaccuracy in the output of ChatGPT. Digging deeper into the sources ChatGPT used, I noticed that some scientific papers and references were simply invented, a common problem with LLMs (Hillier, 2023). Furthermore, ChatGPT explained to me that “[t]he concept that genetic variation within groups is greater than between groups is often referred to as the ‘Lewontin’s fallacy’ after geneticist Richard Lewontin, who first described it in the 1970s.” While the notion does originate from work by Lewontin in the 1970s (Lewontin, 1972, 1973), the so-called ‘Lewontin’s fallacy’ describes the opposite of what ChatGPT proposes. The term was coined in a 2003 article by Edwards, in which he argues that Lewontin’s conclusion rests on a statistical error, one that conceals that humans can be genetically distinguished into stable ethnic groups, especially when many different genetic traits are considered in combination (Edwards, 2003; Hochman, 2021). Let’s see how ChatGPT describes the concept when asked specifically:
Lewontin’s fallacy is a concept in population genetics that refers to the idea that genetic variation within groups (such as racial or ethnic groups) is greater than the variation between groups. The term “fallacy” is used because this observation, while technically true, can be misleading or misinterpreted. […]
While Lewontin’s observation is accurate, it has been criticised as a fallacy because it can be misinterpreted to suggest that race is a meaningless or arbitrary category. This is not the case. While genetic variation within groups is greater than between groups, it is still possible to use genetic information to infer ancestry or geographic origins with a certain degree of accuracy. […]
Therefore, while Lewontin’s fallacy highlights the limitations of using racial or ethnic categories to make biological or genetic inferences, it does not mean that these categories are entirely meaningless or irrelevant. It’s important to consider both genetic and social factors when studying human diversity and health.
ChatGPT again conflates “race” with genetic information about geographic origin, but now uses this argument to refute the idea that race is a biologically “meaningless category”, directly contradicting its own earlier statement about the non-existence of “distinct, biologically meaningful racial categories”. I speculate that this could be due to the training data: articles describing Lewontin’s fallacy might be more likely to defend it, and therefore ChatGPT does, too. There is again no mention of the scientific debate behind the fallacy, which in this case reproduces outdated ideas around “race”, precisely because Edwards ironically falls for the same fallacy as Reich and, in this case, ChatGPT: he seems to overlook that, as demonstrated, regarding genetic differences as continuous is a more accurate depiction of reality. Most importantly, though, this example shows that the content of ChatGPT’s answers can vary substantially, simply depending on how you ask. This is arguably a big problem for search engines that claim neutrality to their users.
After this point, arguments became repetitive, and I noticed that I had exhausted what ChatGPT could say about the topic, so I decided to terminate the conversation. Despite the short encounter, several interesting conclusions can be drawn. I was positively surprised that ChatGPT initially rejected “race” as a biologically meaningful category, which is by far the dominant stance among scholars. However, the right (or rather wrong) questions could provoke contradictory and – in my view – problematic answers. Ironically, ChatGPT seems to be aware of its own problems. When asked directly, it affirmed that “it is possible that I may return information that is based on a scientific consensus that is obsolete or outdated” or “that the data I was trained on may contain biases, including racial biases, and that I may reproduce those biases in my responses”. Given how little this level of self-reflection was represented in its answers, you would be right to call ChatGPT… slightly hypocritical.
Lastly, despite ChatGPT’s stated intent to present scientific debates as such, it became clear that this was not the case. Though speculative, the direction of ChatGPT’s answers seemed to depend on either pre-defined settings or the phrasing of questions. Concerning the former, some of OpenAI’s competitors are already working on AIs that are less “politically correct”, or “woke” (I am looking at you, Elon), than ChatGPT (Bryant, 2023). In the worst case, this could lead to a situation in which users turn to the AI that only affirms their own viewpoint, contributing further to polarisation. Regarding the latter, output from LLMs should not depend on arbitrary wording or chance but should rather reflect the stage of a scientific debate and indicate whether a consensus has been reached or is currently shifting. Especially in a scientific setting, constructive debates and opposing viewpoints are indispensable in the pursuit of knowledge, as most things in science are (un)fortunately complicated. Their nuanced depiction should be prioritised over concise and snappy answers in the future development of LLMs. As I hope to have shown, this is relevant not only for the sake of scientific accuracy but especially for topics that carry important social ramifications for already underprivileged communities.
References
Barbujani, G. and Belle, E.M.S. (2006) ‘Genomic boundaries between human populations’, Human heredity, 61(1), pp. 15–21. Available at: https://doi.org/10.1159/000091….
Bryant, M. (2023) ‘Elon Musk reportedly planning to launch AI rival to ChatGPT maker’, The Guardian, 15 April. Available at: https://www.theguardian.com/te… (Accessed: 27 April 2023).
BuzzFeed Opinion (2018) ‘How Not To Talk About Race And Genetics’, BuzzFeed News, 30 March. Available at: https://www.buzzfeednews.com/article/bfopinion/race-genetics-david-reich (Accessed: 16 January 2024).
Chen, Z. (2023) ‘Ethics and discrimination in artificial intelligence-enabled recruitment practices’, Humanities and Social Sciences Communications, 10(1), p. 567. Available at: https://doi.org/10.1057/s41599….
Cheuk, T. (2021) ‘Can AI be racist? Color-evasiveness in the application of machine learning to science assessments’, Science Education, 105(5), pp. 825–836. Available at: https://doi.org/10.1002/SCE.21….
Edwards, A.W.F. (2003) ‘Human genetic diversity: Lewontin’s fallacy’, BioEssays, 25(8), pp. 798–801. Available at: https://doi.org/10.1002/BIES.1….
Esoh, K. and Wonkam, A. (2021) ‘Evolutionary history of sickle-cell mutation: implications for global genetic medicine’, Human Molecular Genetics, 30(1), pp. 119–128. Available at: https://doi.org/10.1093/HMG/DD….
Feldman, M.W. and Lewontin, R.C. (2008) ‘Race, ancestry, and medicine.’, in B.A. Koenig, S.S.-J. Lee, and S.S. Richardson (eds) Revisiting race in a genomic age. New Brunswick, NJ: Rutgers University Press, pp. 89–101.
Fujimura, J.H. and Rajagopalan, R. (2011) ‘Different differences: The use of “genetic ancestry” versus race in biomedical human genetic research’, Social Studies of Science, 41(1), pp. 5–30. Available at: https://doi.org/10.1177/030631….
Grant, N. (2023) ‘Google Devising Radical Search Changes to Beat Back A.I. Rivals’, The New York Times, 16 April. Available at: https://www.nytimes.com/2023/0… (Accessed: 26 April 2023).
Graves, J.L. and Goodman, A.H. (2022) Racism, Not Race: Answers to Frequently Asked Questions. New York: Columbia University Press.
Gravlee, C.C. (2009) ‘How race becomes biology: Embodiment of social inequality’, American Journal of Physical Anthropology, 139(1), pp. 47–57. Available at: https://doi.org/10.1002/ajpa.2….
Handley, L.J.L. et al. (2007) ‘Going the distance: human population genetics in a clinal world’, Trends in Genetics, 23(9), pp. 432–439. Available at: https://doi.org/10.1016/j.tig…..
Hillier, M. (2023) ‘Why does ChatGPT generate fake references?’, TECHE. Available at: https://teche.mq.edu.au/2023/0… (Accessed: 28 April 2023).
Hochman, A. (2021) ‘Janus-faced race: Is race biological, social, or mythical?’, American Journal of Physical Anthropology, 175(2), pp. 453–464. Available at: https://doi.org/10.1002/AJPA.2….
Kato, G.J. et al. (2018) ‘Sickle cell disease’, Nature Reviews Disease Primers, 4(1), pp. 1–22. Available at: https://doi.org/10.1038/nrdp.2….
Lewontin, R.C. (1972) ‘The apportionment of human diversity’, Evolutionary Biology, 6, pp. 381–398.
Lewontin, R.C. (1973) ‘Population genetics.’, Annual review of genetics, 7(1), pp. 1–17. Available at: https://doi.org/10.1146/ANNURE….
Li, J.Z. et al. (2008) ‘Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation’, Science, 319(5866), pp. 1100–1104. Available at: https://doi.org/10.1126/scienc….
Long, J.C. and Kittles, R.A. (2003) ‘Human genetic diversity and the nonexistence of biological races’, Human biology, 75(4), pp. 449–471. Available at: https://doi.org/10.1353/HUB.20….
Obermeyer, Z. et al. (2019) ‘Dissecting racial bias in an algorithm used to manage the health of populations’, Science, 366(6464), pp. 447–453. Available at: https://doi.org/10.1126/scienc….
Rajpurkar, P. et al. (2022) ‘AI in health and medicine’, Nature Medicine, 28(1), pp. 31–38. Available at: https://doi.org/10.1038/s41591….
Ramachandran, S. et al. (2005) ‘Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa’, Proceedings of the National Academy of Sciences, 102(44), pp. 15942–15947. Available at: https://doi.org/10.1073/pnas.0….
Reich, D. (2018) ‘How Genetics Is Changing Our Understanding of “Race”’, The New York Times, 23 March. Available at: https://www.nytimes.com/2018/03/23/opinion/sunday/genetics-race.html (Accessed: 18 January 2024).
Rosenberg, N.A. et al. (2002) ‘Genetic Structure of Human Populations’, Science, 298(5602), pp. 2381–2385. Available at: https://doi.org/10.1126/scienc….
Rozado, D. (2023) ‘The Political Biases of ChatGPT’, Social Sciences, 12(3), p. 148. Available at: https://doi.org/10.3390/SOCSCI….
Tang, H. et al. (2005) ‘Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies’, The American Journal of Human Genetics, 76(2), pp. 268–275. Available at: https://doi.org/10.1086/427888.
Tishkoff, S.A. et al. (2009) ‘The genetic structure and history of Africans and African Americans’, Science, 324(5930), pp. 1035–1044. Available at: https://doi.org/10.1126/SCIENC….
Woodley, M.A. (2010) ‘Is Homo sapiens polytypic? Human taxonomic diversity and its implications’, Medical Hypotheses, 74(1), pp. 195–201. Available at: https://doi.org/10.1016/j.mehy….
The image was generated using Stable Diffusion XL Base 1.0 and the prompt: “a laptop and people with different races and DNA”