Experiments
A/B test AI responses to find what works best
Experiments let you test different AI approaches and automatically adopt what works best.
What Are Experiments?
Experiments are A/B tests for AI responses:
- Test different response styles
- Measure customer outcomes
- Automatically use winners
- Continuously improve
How Experiments Work
Setup
Define an experiment:
- Choose what to test (response style, length, etc.)
- Set the variants (A vs B)
- Define success metrics
- Set traffic allocation
Running
During the experiment:
- Tickets randomly get variant A or B
- AI generates responses accordingly
- Outcomes are tracked
Analysis
After enough data:
- Statistical significance calculated
- Winner determined
- Results reported
Application
The winning approach is:

- Automatically adopted
- Applied to future tickets
- Stored in Client Brain
Creating an Experiment
Navigate to Experiments
Go to Settings → AI → Experiments
Click New Experiment
Configure the Experiment
| Field | Description |
|---|---|
| Name | Descriptive name |
| Hypothesis | What you're testing |
| Variants | A and B approaches |
| Metric | What determines success |
| Traffic | % of tickets to include |
| Duration | How long to run |
Example: Response Length
Name: Short vs Detailed Responses
Hypothesis: Shorter responses resolve tickets faster
Variant A: Standard response length
Variant B: Concise responses (50% shorter)
Success Metric: Resolution rate
Traffic: 20%
Duration: 2 weeks
Start the Experiment
Click Start Experiment to begin.
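The example above maps directly onto the configuration fields in the table. As a plain data sketch (a hypothetical shape for illustration, not an actual API payload):

```python
# Hypothetical configuration object mirroring the fields in the table above
experiment = {
    "name": "Short vs Detailed Responses",
    "hypothesis": "Shorter responses resolve tickets faster",
    "variants": {
        "A": "Standard response length",
        "B": "Concise responses (50% shorter)",
    },
    "metric": "resolution_rate",
    "traffic": 0.20,      # 20% of tickets enrolled
    "duration_days": 14,  # 2 weeks
}
```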
Experiment Types
Tone Testing
Test different communication styles:
- Formal vs casual
- Technical vs simple
- Empathetic vs direct
Length Testing
Test response length:
- Concise vs comprehensive
- Bullet points vs paragraphs
- With vs without context
Structure Testing
Test response formats:
- Question-first vs answer-first
- With vs without greeting
- Signature styles
Content Testing
Test what information to include:
- Links vs inline content
- Step-by-step vs summary
- With vs without images
Success Metrics
Available Metrics
| Metric | Description |
|---|---|
| Resolution rate | % tickets resolved |
| First-contact resolution | Resolved in one reply |
| Customer satisfaction | CSAT score |
| Response time | How quickly responses are sent |
| Escalation rate | % needing human |
| Reopening rate | % reopened after close |
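Each metric in the table is a simple ratio over the tickets enrolled in a variant. A minimal sketch, assuming each ticket record carries `resolved` and `escalated` flags (field names are assumptions for illustration):

```python
def resolution_rate(tickets: list[dict]) -> float:
    """Share of tickets marked resolved -- the 'Resolution rate' metric."""
    return sum(1 for t in tickets if t["resolved"]) / len(tickets)

def escalation_rate(tickets: list[dict]) -> float:
    """Share of tickets that needed a human -- the 'Escalation rate' metric."""
    return sum(1 for t in tickets if t["escalated"]) / len(tickets)
```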
Choosing Metrics
Pick metrics that:
- Align with your goals
- Are measurable
- Have enough volume
Monitoring Experiments
Experiment Dashboard
View running experiments:
- Current performance
- Sample size
- Estimated completion
- Statistical significance
Pausing Experiments
If a variant performs poorly:
- Click Pause
- All traffic goes to the other variant
- Review results
- Decide to resume or end
Interpreting Results
Statistical Significance
Results need 95% confidence to be meaningful:
- < 95% - Not enough data, continue
- ≥ 95% - Results are reliable
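The 95% threshold above corresponds to a standard two-proportion z-test. A minimal sketch of that check for rate-based metrics such as resolution rate (not the product's actual calculation):

```python
from math import sqrt, erf

def significant(conv_a: int, n_a: int, conv_b: int, n_b: int,
                confidence: float = 0.95) -> bool:
    """Two-proportion z-test: is the difference between two rates significant?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = abs(p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
    return p_value < 1 - confidence
```

For example, 600/1000 resolved vs 500/1000 resolved clears 95% confidence, while two identical rates never do.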
Winning Variant
The winner is the variant with:
- Better metric performance
- Statistical significance
- A practically meaningful difference (not just a statistically significant one)
Inconclusive Results
If no clear winner:
- Variants may be equivalent
- Consider testing something else
- Or refine the hypothesis
Applying Results
Auto-Apply
If enabled, winning approaches automatically:
- Update AI behavior
- Store in Client Brain
- Apply to future tickets
Manual Apply
Review results first:
- See experiment results
- Click Apply Winner
- Confirm the change
Rollback
If issues arise after applying:
- Go to experiment results
- Click Rollback
- Reverts to previous behavior
Best Practices
Test One Thing
Change only one variable per experiment:
- ✅ Short vs long responses
- ❌ Short formal vs long casual
Adequate Sample Size
Need enough tickets for reliable results:
- Minimum ~100 tickets per variant
- More for small effect sizes
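The "more for small effect sizes" point can be made concrete with the standard sample-size formula for comparing two proportions. A rough sketch, assuming 95% confidence and 80% power (the z-values below encode those assumptions):

```python
from math import ceil

def sample_size_per_variant(p_base: float, p_expected: float,
                            z_alpha: float = 1.96,    # 95% confidence
                            z_power: float = 0.84) -> int:  # 80% power
    """Rough per-variant sample size for detecting a change between two rates."""
    variance = p_base * (1 - p_base) + p_expected * (1 - p_expected)
    delta = p_expected - p_base
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)
```

Detecting a jump from a 60% to a 70% resolution rate needs a few hundred tickets per variant; detecting 60% to 63% needs several thousand.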
Run Long Enough
Give experiments time:
- At least 1-2 weeks
- Account for weekly patterns
Document Learnings
Record what you learned:
- Why the winner won
- Insights for future tests
- Failed hypotheses
Scheduled Experiments
Set up recurring experiments:
- Click Schedule
- Define recurrence
- AI automatically tests variations
Good for:
- Continuous improvement
- Seasonal optimization
- Changing customer needs