Designing Trust In AI

How psychological principles tamed a moving target

The honest premise: Mid-project, the client tripled the scope. Most designers would have patched the UI and moved on. I treated it as a behavioural design problem — because when users don't trust an AI tool, they abandon it. And trust is a psychological construct, not a visual one.

ROLE

UX Researcher & Designer

DURATION

8 months, phased

TEAM

1 Architect

4 Data Science Engineers

4 Quality Engineers

2 Fullstack Engineers

1 Product Manager

THE USERS AND THE REAL PROBLEM

Identifying core pain points.

Monday: The Cognitive Drain

Low-value repetition creates mental fatigue before the real work begins.

"I spend Monday mornings copy-pasting acceptance criteria from Jira into Excel, formatting the same login steps for the 50th time. By Wednesday, I'm still not done writing tests."

-QA Engineer

Wednesday: The Decision Paralysis

The 'Broken Windows Theory' of QA: When documentation is messy, engineers stop caring about precision.

"I can't tell if we have good coverage or not. Tests are scattered across Google Sheets, and nobody maps them back to the original stories."

-QA Engineer

Friday: The Algorithmic Aversion

Engineers want automation, but only if they can trust what the AI produces.

"I would like it to be automated but I should want to trust the AI and its outputs."

-QA Engineer

QA engineers were losing 30–50% of sprint capacity to manual test case creation — repetitive, error-prone, and deeply unstandardised. Senior and junior engineers produced wildly different coverage quality. Tests lived in scattered spreadsheets, disconnected from the stories that spawned them.

DISCOVERY

Building a Behavioural Baseline.

40% of a QA engineer's week is spent writing test cases manually. What if AI could do it in 2 minutes?

The first phase was discovering the problem. The testing process was scattered across several platforms (Jira among them). A single platform for end-to-end testing would ease the process considerably, with the agent handling most test case generation and execution automatically.


I shadowed live QA workflows with engineers, mapped their mental models from story → test → execution, and ran 10 structured co-design syncs across architects, QA leads, and developers. This wasn't just requirements gathering — I was building a behavioural baseline: what do users expect the AI to do, and where does reality diverge from that expectation?

The gap between expectation and output is where trust breaks. That became my design north star.

DESIGN DECISIONS

The Psychological Pivot

Three months in, the client expanded scope significantly — from single test case upload to bulk document ingestion. On paper, a redesign problem. In practice, a trust collapse waiting to happen.

Before-After Scope Timeline

"We do not upload a single test case. There is not single test case engagement. Everything is done in bulk!"

PHASE 1: Version 1

Single User Story Entry

The user either enters a user story manually or uploads a CSV document with all the details, using a template we provide.

3-Step Guided Flow with Human Agent in the Loop

PHASE 2: SCOPE CREEP

Bulk Upload + SOP Entry

The user uploads multiple user stories in a CSV template; the agent extracts the data into the system and generates test steps.

Smart edits: only one upload button.

PHASE 3: COMPLEXITY PEAK

No SOP Entry, but Bulk Upload and Interactions

The user uploads only test cases, in large numbers, interacts with the tests, and runs them after review.

3-step simple journey with test management

COGNITIVE LOAD THEORY

Progressive Disclosure

THE TENSION

“I just want to upload a doc with my multiple test cases. Why am I seeing 12 fields? Why do I have to do so much manual work?”

The behavioural risk: Presenting a 12-field form to someone who just wants to upload a document triggers cognitive overload. When users feel overwhelmed, they don't push through; they leave, or worse, they comply without understanding what they're confirming. Neither builds trust in an AI tool.

THE TENSION

“There are too many options right now. I want something straightforward as I already have the test cases ready with me!”

The behavioural risk: More choices mean longer decision time (Hick's Law), and longer decisions mean higher abandonment. In an AI-assisted workflow, decision fatigue is especially dangerous: users start rubber-stamping AI output just to get through the interface.

THE TENSION

“I’m working across 32 test cases but I can’t sit and review every single step - but I need to feel like I could”

The behavioural risk: Users working at scale cannot afford to review everything. They are moving from Jira to an agentic AI tool precisely so that their tests are automated while they keep a sense of control. They want to save time and still be in charge of every kind of review. AI works best with a human in the loop.

THE RESPONSE

Chunked the experience into a 3-step guided flow: Input/Upload → Review and Manage → Run. Advanced options like bulk tracking were collapsible, surfaced only when contextually relevant. The basic flow worked immediately; power features unfolded progressively.

UPLOAD → REVIEW/MANAGE → RUN

Reduce cognitive load at the start, let users build trust in the agent, and allow them to manage only when needed.

THE RESPONSE

NOT STARTED: TC 01 Login Flow, TC 02 Upload

GENERATING: TC 02 Auth

REVIEW: TC 02 Session

RUN: TC 02 Nav, TC 02 Upload

People trust systems more when they feel the locus of control is internal, i.e. they're driving, not being driven. The stage-based flow gives users perceived ownership over an AI process they didn't author. They don't need to read every test case. They just need to feel like they could. Letting them manage tests on the second screen gives that control back.

Give the locus of control back to the users: they should control the final outcome.
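One way to picture the stage-based flow is as a small state machine in which the agent moves a test case forward but only the user can release it to run. The sketch below is illustrative, not the shipped implementation: the stage names come from the board above, while `advance`, `group_by_stage`, the `approved_by_user` flag, and the "send back to regenerate" transition are assumptions made for the example.

```python
from enum import Enum


class Stage(Enum):
    NOT_STARTED = "NOT STARTED"
    GENERATING = "GENERATING"
    REVIEW = "REVIEW"
    RUN = "RUN"


# Allowed moves between stages (assumed for illustration).
# REVIEW -> GENERATING models a user sending a test case back to the agent.
TRANSITIONS = {
    Stage.NOT_STARTED: {Stage.GENERATING},
    Stage.GENERATING: {Stage.REVIEW},
    Stage.REVIEW: {Stage.RUN, Stage.GENERATING},
    Stage.RUN: set(),
}


def advance(current: Stage, target: Stage, approved_by_user: bool = False) -> Stage:
    """Return the new stage, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    # The human-in-the-loop gate: nothing runs without explicit approval.
    if target is Stage.RUN and not approved_by_user:
        raise PermissionError("A test case runs only after the user approves it")
    return target


def group_by_stage(test_cases: dict[str, Stage]) -> dict[Stage, list[str]]:
    """Group test case names into board columns, one column per stage."""
    board = {stage: [] for stage in Stage}
    for name, stage in test_cases.items():
        board[stage].append(name)
    return board
```

The design point is the single gate between REVIEW and RUN: users can skip reading individual steps, yet no test executes without their approval.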

THE RESPONSE

TOO MANY INPUT FIELDS → SINGULAR UPLOAD → MANAGE

Reduced the decision surface from multiple input fields to one upload with a predefined template that works for the QEs. Smart defaults let the AI auto-detect test cases from uploaded documents. Primary actions (critical path) were visually dominant; secondary actions (annotation, edge cases) were present but recessed. Every confirmation was single-action.


The principle at work: fewer, clearer decisions build confidence — users feel in control, not managed.
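Hick's Law is commonly written as T = b · log2(n + 1), where n is the number of equally likely choices and b is an empirically fitted constant. The sketch below is a rough illustration rather than a measurement from this project: it treats each of the 12 form fields as one choice (a simplification) and uses a placeholder `b = 1.0`, so the results are relative units, not seconds.

```python
import math


def hick_decision_time(n_choices: int, b: float = 1.0) -> float:
    """Hick's Law: T = b * log2(n + 1) for n equally likely choices.

    b is an empirical constant; 1.0 is a placeholder, so the result is
    a relative figure rather than a time in seconds.
    """
    if n_choices < 0:
        raise ValueError("n_choices must be non-negative")
    return b * math.log2(n_choices + 1)


before = hick_decision_time(12)  # 12-field form, each field counted as a choice
after = hick_decision_time(1)    # a single upload action
print(f"relative decision time: {before:.2f} -> {after:.2f}")
# prints "relative decision time: 3.70 -> 1.00"
```

Because the relationship is logarithmic, the first few choices removed matter most, which is one argument for collapsing the form to a single upload rather than trimming a field or two.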

HICK’S LAW

Decision Scaffolding
