LLM-Assisted Heuristic Evaluation Identifies 55% of Usability Issues in Automotive REM Tools

Category: User-Centred Design · Effect: Moderate · Year: 2026

Pairing Large Language Models (LLMs) with human expert review can significantly broaden the scope and speed of heuristic usability evaluations for complex engineering software.

Design Takeaway

Incorporate LLM-driven heuristic analysis as a preliminary step in usability testing for complex engineering software to quickly identify a substantial portion of potential issues, thereby optimizing the use of human expert resources.

Why It Matters

Requirements Engineering and Management (REM) tools are critical for safety-critical industries like automotive design. Poor usability in these tools can lead to workflow disruptions and compliance risks. This research demonstrates how AI can augment traditional usability testing, offering a faster, broader initial assessment.

Key Finding

An LLM identified 55% of the relevant usability issues in an automotive REM tool, with a significant portion of these findings aligning with those of human experts.

Research Evidence

Aim: To evaluate the usability of an automotive Requirements Engineering and Management (REM) tool using both human expert heuristics and an LLM-based approach, and to compare the effectiveness of each method.

Method: Comparative Heuristic Usability Evaluation

Procedure: Human experts conducted a heuristic usability evaluation of the IBM DOORS Next Generation REM tool using Nielsen's 10 Usability Heuristics. Subsequently, an LLM (ChatGPT-5) was prompted with the same heuristics and static screenshots of the tool to perform its own evaluation. The findings from both methods were compared to identify overlapping and unique issues.
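The final comparison step can be sketched as simple set operations over the two issue lists. The issue labels below are hypothetical placeholders, not findings from the study; only the overlap/unique logic mirrors the procedure described above.

```python
# Illustrative comparison of issue sets from two heuristic evaluations.
# Issue labels are invented for demonstration purposes only.
human_issues = {"hidden save action", "inconsistent icons",
                "no undo for link edits", "cryptic error text"}
llm_issues = {"inconsistent icons", "no undo for link edits",
              "low-contrast status badges"}

overlap = human_issues & llm_issues      # issues found by both methods
llm_only = llm_issues - human_issues     # novel LLM findings
human_only = human_issues - llm_issues   # issues the LLM missed

# Share of expert-identified issues that the LLM also flagged
coverage = len(overlap) / len(human_issues)
print(f"Overlapping issues: {sorted(overlap)}")
print(f"LLM-only issues: {sorted(llm_only)}")
print(f"Coverage of expert findings: {coverage:.0%}")
```

With real evaluation data, `coverage` corresponds to the kind of percentage reported in the study's headline result.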

Context: Automotive engineering software (Requirements Engineering and Management tools)

Design Principle

Augment human expertise with AI for efficient and comprehensive design evaluation.

How to Apply

When evaluating complex software, use an LLM with relevant heuristics and screenshots to generate an initial list of potential usability problems. Then, have human experts review and validate these findings, focusing their efforts on the most critical or unique issues identified by the AI.
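The first step of this workflow can be sketched as a prompt builder that pairs Nielsen's 10 heuristics with a named screenshot. The function name, wording, and severity scale below are illustrative assumptions, and the actual LLM call is omitted because it depends on whichever model and API you use.

```python
# Sketch of a heuristic-evaluation prompt builder. The prompt wording and
# severity scale are assumptions; the LLM call itself is intentionally omitted.
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

def build_prompt(screen_name: str) -> str:
    """Assemble an evaluation prompt for one static screenshot."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(NIELSEN_HEURISTICS, 1))
    return (
        f"Evaluate the attached screenshot '{screen_name}' of a requirements "
        "management tool against Nielsen's 10 usability heuristics:\n"
        f"{numbered}\n"
        "For each violation, give the heuristic number, a short description, "
        "and a severity rating from 0 (cosmetic) to 4 (catastrophic)."
    )

prompt = build_prompt("requirements-list-view.png")
print(prompt)
```

The resulting prompt, sent with the screenshot, produces the candidate issue list that human experts then review and validate.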

Limitations

The LLM's evaluation was based on static screenshots, lacking dynamic interaction. The LLM's unconfirmed findings require further validation. The specific LLM used may influence results.

Student Guide (IB Design Technology)

Simple Explanation: Using AI like ChatGPT can help find many of the same problems in software design that human experts do, and sometimes even find new ones, making the testing process faster.

Why This Matters: This research shows how new AI technologies can be used in design projects to make usability testing more efficient and thorough, which is important for creating user-friendly products.

Critical Thinking: To what extent can LLMs truly understand the subjective aspects of user experience, such as delight or frustration, which are crucial for holistic design evaluation?

IA-Ready Paragraph: This design project explored the potential of AI in usability evaluation. By employing a Large Language Model (LLM) alongside human heuristic review, we aimed to assess the efficiency and effectiveness of AI in identifying usability issues within complex engineering software. The LLM successfully flagged a significant percentage of relevant issues, demonstrating its capacity to augment traditional design research methods and accelerate the identification of potential design flaws.

How to Use in IA

Independent Variables: Evaluation method (human expert vs. LLM); type of usability issue

Dependent Variables: Number of identified usability issues; percentage of overlapping issues; percentage of novel issues identified by the LLM

Controlled Variables: Usability heuristics used (Nielsen's 10); target software (IBM DOORS Next Generation); screenshots used for LLM evaluation

Source

Hybrid Usability Evaluation of an Automotive REM Tool: Human and LLM-Based Heuristic Assessment of IBM Doors Next · Applied Sciences · 2026 · 10.3390/app16020723