LLM Stereotypes Mirror Lived Disability Experiences
Category: User-Centered Design · Effect: Strong · Year: 2023
Large Language Models often perpetuate subtle, harmful stereotypes about disability that mirror real-world biases, rather than producing overtly offensive content.
Design Takeaway
Prioritize inclusive data sourcing and user-centered evaluation methods that capture nuanced biases, not just overt offensiveness, when designing AI systems.
Why It Matters
Designers developing AI-powered tools must recognize that 'non-offensive' doesn't equate to 'unbiased.' Understanding how LLMs can subtly reinforce negative stereotypes is crucial for creating inclusive and equitable user experiences.
Key Finding
People with disabilities who interacted with an AI language model found that it reflected the subtle, harmful stereotypes they encounter daily rather than overtly offensive content, suggesting a need for better training data.
Key Findings
- Participants rarely found LLM outputs overtly offensive or toxic.
- LLM responses reflected subtle, harmful stereotypes (e.g., inspiration porn, able-bodied saviors) that mirrored participants' lived experiences and dominant media portrayals.
- Participants identified training data as a likely source of these stereotypes.
- Participants recommended training LLMs on diverse, disability-positive resources.
Research Evidence
Aim: To identify categories of harms perpetuated by Large Language Models (LLMs) towards the disability community from their perspective.
Method: Qualitative research using focus groups and annotation.
Procedure: Researchers conducted 19 focus groups with 56 participants with disabilities. Participants interacted with a dialog model, discussing and annotating its responses related to disability.
Sample Size: 56 participants
Context: Artificial Intelligence (AI) development, specifically Large Language Models (LLMs) and their societal impact.
Design Principle
AI systems should be evaluated not only for overt harmfulness but also for their subtle reinforcement of societal stereotypes, especially concerning marginalized groups.
How to Apply
When developing or evaluating AI systems, actively recruit individuals from diverse and marginalized communities to test for subtle biases and stereotypes in AI outputs, using their lived experiences as a benchmark.
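The study itself did not publish an annotation schema or scoring code, so the sketch below is an assumption: hypothetical field names, category labels, and a `bias_report` helper chosen purely to illustrate the principle that a response can pass an overt-offensiveness check while still accumulating subtle-stereotype annotations.

```python
# Minimal, illustrative sketch (not the study's instrument): recording
# participant annotations of LLM responses and reporting subtle stereotype
# categories alongside, rather than behind, an overt-offensiveness flag.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Annotation:
    response_id: str          # which LLM response was annotated
    participant_id: str       # which participant made the judgment
    overtly_offensive: bool   # the check most moderation pipelines already run
    subtle_stereotypes: list[str] = field(default_factory=list)  # e.g. "inspiration porn"

def bias_report(annotations: list[Annotation]) -> dict:
    """Summarize overt vs. subtle harms across all annotated responses."""
    overt = sum(a.overtly_offensive for a in annotations)
    subtle = Counter(tag for a in annotations for tag in a.subtle_stereotypes)
    return {
        "responses_annotated": len({a.response_id for a in annotations}),
        "overtly_offensive_annotations": overt,
        "subtle_stereotype_counts": dict(subtle),
    }

if __name__ == "__main__":
    sample = [
        Annotation("r1", "p01", False, ["inspiration porn"]),
        Annotation("r1", "p02", False, ["able-bodied savior"]),
        Annotation("r2", "p03", False, []),
    ]
    print(bias_report(sample))
```

A report structured this way makes the study's central point visible in the numbers: responses that would score zero on overt offensiveness can still carry a high count of subtle-stereotype tags.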
Limitations
The study focused on a specific dialog model and the disability community; findings may not generalize to all LLMs or to other marginalized groups without further research. Additionally, 'harm' was participant-defined and nuanced, which strengthens ecological validity but makes direct comparison with standardized toxicity benchmarks difficult.
Student Guide (IB Design Technology)
Simple Explanation: AI chatbots can sometimes say things that aren't obviously rude but still make people with disabilities feel misunderstood or stereotyped, just like in real life.
Why This Matters: This research highlights that even AI tools designed to be helpful can unintentionally cause harm by reflecting societal biases, which is important for creating responsible and inclusive designs.
Critical Thinking: How can designers proactively identify and mitigate 'subtle biases' in AI systems before they are deployed, beyond simply checking for overtly offensive content?
IA-Ready Paragraph: This research underscores the critical need for user-centered evaluation of AI systems, particularly concerning subtle biases. As demonstrated by Gadiraju et al. (2023), AI models can inadvertently perpetuate harmful stereotypes about disability that mirror lived experiences, even when not overtly offensive. This highlights the importance of involving diverse user groups in the design and testing phases to ensure AI tools are equitable and inclusive.
Project Tips
- When testing AI tools, think about how they might represent different groups of people.
- Consider involving users from diverse backgrounds in your design process to uncover potential biases.
How to Use in IA
- Reference this study when discussing the ethical implications of AI or the importance of user testing with diverse groups in your design project.
Examiner Tips
- Demonstrate an understanding of 'subtle bias' in AI, not just overt toxicity, and how user perspectives are crucial for identifying it.
Study Variables
Independent Variable: LLM responses to disability-related prompts.
Dependent Variable: Participant perceptions of LLM outputs (e.g., subtle stereotypes, harmfulness).
Controlled Variables: The same dialog model and focus-group procedure across sessions; all participants identified as having a disability.
Strengths
- Directly incorporates the perspectives of individuals with disabilities.
- Focuses on nuanced, subtle harms often overlooked in bias detection.
Critical Questions
- What are the ethical responsibilities of designers when AI systems perpetuate subtle biases?
- How can training data be curated to actively promote positive representations rather than merely avoid negative ones? (One illustrative approach is sketched below.)
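One hedged, illustrative answer to the curation question above: instead of only filtering out overtly toxic text, a corpus builder can up-weight disability-positive sources when sampling documents for training or fine-tuning. The source names and weights below are hypothetical placeholders, not recommendations from the study.

```python
# Illustrative sketch: weighted sampling that over-represents
# disability-positive sources in a training corpus. All names and weights
# are made-up placeholders for the idea, not a vetted data policy.
import random

SOURCE_WEIGHTS = {
    "disability_led_blogs": 3.0,    # first-person, community-authored writing
    "accessibility_guides": 2.0,    # plain-language guidance written with the community
    "general_web_crawl": 1.0,       # the bulk source most corpora start from
}

def sample_sources(n: int, seed: int = 0) -> list[str]:
    """Draw n document-source labels, biased toward disability-positive material."""
    rng = random.Random(seed)
    names = list(SOURCE_WEIGHTS)
    weights = [SOURCE_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

if __name__ == "__main__":
    draws = sample_sources(10_000)
    for name in SOURCE_WEIGHTS:
        print(name, draws.count(name))
```

The design choice here is additive rather than purely subtractive: the goal participants described is positive representation, not merely the absence of toxicity.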
Extended Essay Application
- Investigate the potential for subtle biases in AI-generated content within a specific domain (e.g., educational materials, creative writing tools) by conducting user studies with relevant communities.
Source
"I wouldn't say offensive but...": Disability-Centered Perspectives on Large Language Models · 2023 · 10.1145/3593013.3593989