ADITI ROY CHOUDHURY

Designing
Trust in AI.

How psychological principles and
iterative research shaped an
agentic QA platform for a Fortune
500 healthcare enterprise

ROLE

UX Researcher & Designer

DURATION

10 months, Phased

TEAM

1 Architect

4 Data Science Engineers

4 Quality Engineers

4 Fullstack Engineers

2 Product Managers

1 Director

THE HONEST PREMISE

Onboarding critical, sensitive drug data into a healthcare database is tedious and time-consuming, especially when the volume is large and the process is not automated. Our team devised an agentic AI solution that lets QA users monitor this process and onboard drug data that passes through multiple applications before it is finally registered in the database. Mid-project, however, the client tripled the scope. They wanted automation, but their hesitation to trust an AI tool with sensitive drug data was inevitable and valid.

I treated it as a behavioural design problem — because when users don't trust an AI tool, they abandon it. And trust is a psychological construct, not a visual one.

THE USERS AND THE REAL PROBLEM

QA engineers at a major healthcare enterprise were spending 30–50% of every sprint manually writing, formatting, and onboarding drug data into their database, and then running test cases on it. The work was repetitive, error-prone, and deeply unstandardized. Senior and junior engineers produced wildly different coverage quality. Test documentation lived across JIRA, offline spreadsheets, and SharePoint, with no unified view of progress or status.

We offered an agentic AI solution that automates the QA team's entire process of registering drug data and takes care of end-to-end QA testing, from test case generation to execution. The technology existed; the human problem of adopting and trusting an AI with sensitive data was unsolved.

That was the real design brief.

THE PROBLEM SPACE: A WEEK IN THREE ACTS

Monday

The Cognitive Drain

Low-value repetition creates mental fatigue before the real work begins.

"I spend Monday mornings copy-pasting acceptance criteria from Jira into Excel, formatting the same login steps for the 50th time. By Wednesday, I'm still not done writing tests. I need to be extra careful and therefore it slows me down."

QA ENGINEER

Wednesday

The Decision Paralysis

The 'Broken Windows Theory' of QA: When documentation is messy, engineers stop caring about precision.

"I can't tell if we have good coverage or not. Tests are scattered across Google Sheets, and nobody maps them back to the original stories. Onboarding drug data with precision becomes more complex and time-consuming. There are so many applications in place."

QA ENGINEER

Friday

The Algorithm Aversion

Users want automation, but hesitate to trust an AI tool with the end-to-end process.

"Given that some of the processes are quite standard, is there a way we can automate them? It would save us a large amount of manual effort. But if AI is generating the steps and also running the tests, should I be trusting it with the end-to-end process? At what point do I come in?"

QA ENGINEER

RESEARCH APPROACH

Building a behavioural baseline.

Given the constraints of a single accessible POC on the client side, a large and distributed cross-functional team, and limited access to end users, I adopted a multi-source approach that favoured depth of insight over breadth.

The gap between expectation and output is where trust breaks. That became my design north star.

Stakeholder Workshops

SURFACE CONFLICTING MENTAL MODELS

Structured workshops with the full team of architects, data scientists, QA, and full-stack engineers. The goal was to surface conflicting mental models early (what the DS team thought users wanted versus what users actually needed) and resolve those tensions before the final structure was set. The workshops also doubled as alignment checkpoints as scope evolved.

Contextual Inquiry

WATCH THE REAL WORKFLOW

I observed QA engineers using their existing platforms in real time: how they navigated JIRA, how they structured test cases in spreadsheets, where they hesitated, and what took the longest. Watching the actual workflow surfaced what they'd never think to mention: the copy-paste patterns, the informal workarounds, the visible frustration.

Stakeholder Interviews

LEARN THE BACKEND CONSTRAINTS

In-depth interviews across the full team: not just end users, but architects and data scientists too. Understanding the backend constraints directly from the DS team was critical. I needed to know what the AI could and couldn't do reliably, so that interfaces set accurate expectations instead of over-promising. It also meant working actively with the product manager to keep business outcomes in mind while designing.

Cross-Organisational Validation

TRIANGULATE ACROSS TWO CULTURES

To test whether the pain points were client-specific or representative, I ran parallel sessions with QA engineers at Brillio. This triangulation meant decisions weren't built on one team's context, but on behavioural patterns that held across two different engineering cultures. A range of potential pain points were identified within our own team at Brillio.

Task Analysis

DECOMPOSE EACH SCREEN

For each screen I broke down the cognitive tasks a user had to complete: what decisions were being made, in what sequence, and with what information. This was crucial for the running-test flow, where users upload cases, track status, trigger actions, and hold a mental model of the pipeline all at once. The analysis directly shaped the information hierarchy.

Moderated Usability Testing

SIX LONGITUDINAL ROUNDS

6 rounds of moderated remote testing across the project, with the QA lead as primary participant. Although single-participant testing was a constraint, it also served as an opportunity: longitudinal observation of one expert across 6 sessions revealed how mental models, comfort and behaviour shifted as the product evolved.

DESIGN DECISION - THE PSYCHOLOGICAL PIVOT

Three months in, the client expanded scope significantly. It turned out that the QA users were picturing a different workflow for the agent: not a single test-case upload, but bulk document ingestion. On paper, a redesign problem. In practice, a trust collapse waiting to happen.

“Manually, we do look at individual test cases, but if we are involving an agent in our workflow, we expect a bulk interaction. If we can upload multiple test cases at a time, wouldn't it reduce the workload?”

PHASE 1: VERSION 1

Single User Story Entry

User enters a story manually or uploads a CSV in a provided template.

→ 3-step guided flow, human agent in the loop.

PHASE 2: SCOPE CREEP

Bulk Upload + SOP Entry

User uploads multiple stories in a CSV template; the agent extracts data and generates test steps.

→ Smart edits — only one upload button.

PHASE 3: COMPLEXITY PEAK

Bulk Upload + Interactions

User uploads test cases in large numbers, interacts with tests, and runs them after review.

→ 3-step simple journey with test management.

Three cognitive science principles drove the core design architecture — not as post-hoc justification, but as active decision-making frameworks during design.

COGNITIVE LOAD THEORY

Progressive Disclosure

“I just want to upload a doc with my test cases. Why am I seeing 12 fields? Why so much manual work? This is exactly what I wanted to avoid.”

The behavioural risk. A 12-field form for someone who just wants to upload a document triggers cognitive overload. Overwhelmed users don't push through; they leave, or they comply without understanding what they're confirming. Neither builds trust.

The response. Chunked into a 3-step guided flow. Advanced options like bulk tracking were collapsible and surfaced only when contextually relevant. The basic flow worked immediately; power features were introduced progressively.

HICK'S LAW

Decision Scaffolding

“There are too many options. I want something straightforward — I already have the test cases ready.”

The behavioural risk. More choices = longer decisions = higher abandonment. In an AI workflow, decision fatigue is dangerous. Users who are only just starting to trust a completely new platform with end-to-end testing might start rubber-stamping AI output just to get through the interface.

The response. Reduced the decision surface from many input fields to one upload with a predefined template. Smart defaults auto-detect test cases. Primary actions are dominant; secondary actions are present but recessed. Every confirmation is a single action. This keeps the human in the loop. The message to users is that the AI can take care of everything, but they are free to intervene and review whenever they want. It may be an end-to-end process, but the decision to generate test steps and review them sits with the user.

SELF-DETERMINATION THEORY

Locus of Control

“I'm working across 32 test cases. I can't review every step — but I need to feel like I could.”

The behavioural risk. Users at scale cannot review everything. They shift to an agentic tool to automate tests while keeping a sense of control. Naturally, they want to save time, but they also want to remain accountable for the outputs; after all, they are dealing with sensitive data. AI works best with a human in the loop.

The response. The stage-based flow gives users perceived ownership over an AI process they didn't author. They don't need to read every test case; they just need to feel like they could. The second screen hands the locus of control back.

Handing complete control to AI is scary. But fewer, clearer decisions build confidence: users feel in control, not managed. And time is saved.

OUTCOME AND IMPACT

“Signed off without a single UX revision request”

across a cross-functional team of 11, in a Fortune 500 healthcare enterprise context.

QGentic is in phased release. Phases 1 & 2 are live with the client's QA team; Phase 3 is in active development.

QA engineers who once spent 2–3 hours running a single test can now process hundreds of test cases in the same window — including review time.

The tool doesn't eliminate their judgment. It eliminates their drudgery. That distinction was intentional, and users recognised it.

Adoption is growing steadily. The QA lead who pushed back hardest across 6 sessions, and who represented users most faithfully, signed off on the design. That is the outcome I weigh most heavily.

Responses from the tool's global feedback button indicate that users feel comfortable onboarding data through the application, despite failures in a few tests.

LESSONS

Trust in AI is an architecture, not a feature.

Every decision (the 3-step flow, progressive disclosure, the single upload, the feedback loop) answered one question: does this make the user feel in control of an AI they didn't build? The answer has to be yes at every stage, or adoption fails regardless of the technology.

Scope creep is a research finding in disguise.

When the client said everything happens in bulk, that wasn't a problem to manage: it was the most important insight of the project. It reframed the entire product. A researcher who only works within fixed requirements misses what the research is telling them.

Psychology doesn't stop at the therapy room.

Cognitive Load Theory, Hick's Law, Locus of Control, and Broken Windows are not academic abstractions but descriptions of how humans behave under pressure. An engineer staring at a 12-field form on a Monday is experiencing overload like any user. The domain changes; the brain doesn't.

Single-participant longitudinal testing has underrated value.

6 sessions with one expert user over 8 months gave me what multiple participants in a single round never could: a record of how trust, mental models and behaviour change over time. I could see the product working. That's a different kind of evidence.

Collaboration across disciplines is itself a research method.

The quality of insight came from the breadth of people I spoke to: QA lead, architects, data scientists, engineers at Brillio. No single stakeholder had the full picture. Holding all those perspectives at once and finding the decision that serves them all is the hardest technical skill in the room.
