LLM-Assisted Heuristic Evaluation Identifies 55% of Usability Issues in Automotive REM Tools

Category: User-Centred Design · Effect: Moderate · Year: 2026

Pairing Large Language Models (LLMs) with human expert review can significantly broaden the scope and speed of heuristic usability evaluations for complex engineering software.

Design Takeaway

Incorporate LLM-driven heuristic analysis as a preliminary step in usability testing for complex engineering software to quickly identify a substantial portion of potential issues, thereby optimizing the use of human expert resources.

Why It Matters

Requirements Engineering and Management (REM) tools are critical for safety-critical industries like automotive design. Poor usability in these tools can lead to workflow disruptions and compliance risks. This research demonstrates how AI can augment traditional usability testing, offering a faster, broader initial assessment.

Key Finding

An LLM identified 55% of the relevant usability issues in an automotive REM tool, with a significant portion of these findings aligning with those of human experts.

Research Evidence

Aim: To evaluate the usability of an automotive Requirements Engineering and Management (REM) tool using both human expert heuristics and an LLM-based approach, and to compare the effectiveness of each method.

Method: Comparative Heuristic Usability Evaluation

Procedure: Human experts conducted a heuristic usability evaluation of the IBM DOORS Next Generation REM tool using Nielsen's 10 Usability Heuristics. Subsequently, an LLM (ChatGPT-5) was prompted with the same heuristics and static screenshots of the tool to perform its own evaluation. The findings from both methods were compared to identify overlapping and unique issues.
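The final comparison step can be sketched as simple set operations over the two issue lists. The issue labels below are hypothetical placeholders, not findings from the study; only the overlap/unique logic mirrors the procedure described above.

```python
# Illustrative comparison of issue sets from two heuristic evaluations.
# Issue labels are invented for demonstration purposes only.
human_issues = {"hidden save action", "inconsistent icons",
                "no undo for link edits", "cryptic error text"}
llm_issues = {"inconsistent icons", "no undo for link edits",
              "low-contrast status badges"}

overlap = human_issues & llm_issues      # issues found by both methods
llm_only = llm_issues - human_issues     # novel LLM findings
human_only = human_issues - llm_issues   # issues the LLM missed

# Share of expert-identified issues that the LLM also flagged
coverage = len(overlap) / len(human_issues)
print(f"Overlapping issues: {sorted(overlap)}")
print(f"LLM-only issues: {sorted(llm_only)}")
print(f"Coverage of expert findings: {coverage:.0%}")
```

With real evaluation data, `coverage` corresponds to the kind of percentage reported in the study's headline result.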

Context: Automotive engineering software (Requirements Engineering and Management tools)

Design Principle

Augment human expertise with AI for efficient and comprehensive design evaluation.

How to Apply

When evaluating complex software, use an LLM with relevant heuristics and screenshots to generate an initial list of potential usability problems. Then, have human experts review and validate these findings, focusing their efforts on the most critical or unique issues identified by the AI.
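The first step of this workflow can be sketched as a prompt builder that pairs Nielsen's 10 heuristics with a named screenshot. The function name, wording, and severity scale below are illustrative assumptions, and the actual LLM call is omitted because it depends on whichever model and API you use.

```python
# Sketch of a heuristic-evaluation prompt builder. The prompt wording and
# severity scale are assumptions; the LLM call itself is intentionally omitted.
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

def build_prompt(screen_name: str) -> str:
    """Assemble an evaluation prompt for one static screenshot."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(NIELSEN_HEURISTICS, 1))
    return (
        f"Evaluate the attached screenshot '{screen_name}' of a requirements "
        "management tool against Nielsen's 10 usability heuristics:\n"
        f"{numbered}\n"
        "For each violation, give the heuristic number, a short description, "
        "and a severity rating from 0 (cosmetic) to 4 (catastrophic)."
    )

prompt = build_prompt("requirements-list-view.png")
print(prompt)
```

The resulting prompt, sent with the screenshot, produces the candidate issue list that human experts then review and validate.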

Limitations

The LLM's evaluation was based on static screenshots, lacking dynamic interaction. The LLM's unconfirmed findings require further validation. The specific LLM used may influence results.

Student Guide (IB Design Technology)

Simple Explanation: Using AI like ChatGPT can help find many of the same problems in software design that human experts do, and sometimes even find new ones, making the testing process faster.

Why This Matters: This research shows how new AI technologies can be used in design projects to make usability testing more efficient and thorough, which is important for creating user-friendly products.

Critical Thinking: To what extent can LLMs truly understand the subjective aspects of user experience, such as delight or frustration, which are crucial for holistic design evaluation?

IA-Ready Paragraph: This design project explored the potential of AI in usability evaluation. By employing a Large Language Model (LLM) alongside human heuristic review, we aimed to assess the efficiency and effectiveness of AI in identifying usability issues within complex engineering software. The LLM successfully flagged a significant percentage of relevant issues, demonstrating its capacity to augment traditional design research methods and accelerate the identification of potential design flaws.

How to Use in IA

Independent Variables: Evaluation method (human expert vs. LLM); type of usability issue

Dependent Variables: Number of identified usability issues; percentage of overlapping issues; percentage of novel issues identified by the LLM

Controlled Variables: Usability heuristics used (Nielsen's 10); target software (IBM DOORS Next Generation); screenshots used for LLM evaluation

Source

Hybrid Usability Evaluation of an Automotive REM Tool: Human and LLM-Based Heuristic Assessment of IBM Doors Next · Applied Sciences · 2026 · 10.3390/app16020723