Experiments
A/B test AI responses to find what works best
Experiments let you test different AI approaches and automatically adopt what works best.
What Are Experiments?
Experiments are A/B tests for AI responses:
- Test different response styles
- Measure customer outcomes
- Automatically use winners
- Continuously improve
How Experiments Work
Setup
Define an experiment:
- Choose what to test (response style, length, etc.)
- Set the variants (A vs B)
- Define success metrics
- Set traffic allocation
Running
During the experiment:
- Tickets randomly get variant A or B
- AI generates responses accordingly
- Outcomes are tracked
Analysis
After enough data:
- Statistical significance calculated
- Winner determined
- Results reported
Application
The winning approach is:

- Automatically adopted
- Applied to future tickets
- Stored in Client Brain
Creating an Experiment
Navigate to Experiments
Go to Settings → AI → Experiments
Click New Experiment
Configure the Experiment
| Field | Description |
|---|---|
| Name | Descriptive name |
| Hypothesis | What you're testing |
| Variants | A and B approaches |
| Metric | What determines success |
| Traffic | % of tickets to include |
| Duration | How long to run |
Example: Response Length
Name: Short vs Detailed Responses
Hypothesis: Shorter responses resolve tickets faster
Variant A: Standard response length
Variant B: Concise responses (50% shorter)
Success Metric: Resolution rate
Traffic: 20%
Duration: 2 weeks
Start the Experiment
Click Start Experiment to begin.
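The example above maps directly onto the configuration fields in the table. As a plain data sketch (a hypothetical shape for illustration, not an actual API payload):

```python
# Hypothetical configuration object mirroring the fields in the table above
experiment = {
    "name": "Short vs Detailed Responses",
    "hypothesis": "Shorter responses resolve tickets faster",
    "variants": {
        "A": "Standard response length",
        "B": "Concise responses (50% shorter)",
    },
    "metric": "resolution_rate",
    "traffic": 0.20,      # 20% of tickets enrolled
    "duration_days": 14,  # 2 weeks
}
```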
Experiment Types
Tone Testing
Test different communication styles:
- Formal vs casual
- Technical vs simple
- Empathetic vs direct
Length Testing
Test response length:
- Concise vs comprehensive
- Bullet points vs paragraphs
- With vs without context
Structure Testing
Test response formats:
- Question-first vs answer-first
- With vs without greeting
- Signature styles
Content Testing
Test what information to include:
- Links vs inline content
- Step-by-step vs summary
- With vs without images
Success Metrics
Available Metrics
| Metric | Description |
|---|---|
| Resolution rate | % tickets resolved |
| First-contact resolution | Resolved in one reply |
| Customer satisfaction | CSAT score |
| Response time | How quickly responses are sent |
| Escalation rate | % needing human |
| Reopening rate | % reopened after close |
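Each metric in the table is a simple ratio over the tickets enrolled in a variant. A minimal sketch, assuming each ticket record carries `resolved` and `escalated` flags (field names are assumptions for illustration):

```python
def resolution_rate(tickets: list[dict]) -> float:
    """Share of tickets marked resolved -- the 'Resolution rate' metric."""
    return sum(1 for t in tickets if t["resolved"]) / len(tickets)

def escalation_rate(tickets: list[dict]) -> float:
    """Share of tickets that needed a human -- the 'Escalation rate' metric."""
    return sum(1 for t in tickets if t["escalated"]) / len(tickets)
```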
Choosing Metrics
Pick metrics that:
- Align with your goals
- Are measurable
- Have enough volume
Monitoring Experiments
Experiment Dashboard
View running experiments:
- Current performance
- Sample size
- Estimated completion
- Statistical significance
Pausing Experiments
If a variant performs poorly:
- Click Pause
- All traffic goes to the other variant
- Review results
- Decide to resume or end
Interpreting Results
Statistical Significance
Results need 95% confidence to be meaningful:
- < 95% - Not enough data, continue
- ≥ 95% - Results are reliable
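The 95% threshold above corresponds to a standard two-proportion z-test. A minimal sketch of that check for rate-based metrics such as resolution rate (not the product's actual calculation):

```python
from math import sqrt, erf

def significant(conv_a: int, n_a: int, conv_b: int, n_b: int,
                confidence: float = 0.95) -> bool:
    """Two-proportion z-test: is the difference between two rates significant?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = abs(p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
    return p_value < 1 - confidence
```

For example, 600/1000 resolved vs 500/1000 resolved clears 95% confidence, while two identical rates never do.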
Winning Variant
The winner is the variant with:
- Better metric performance
- Statistical significance
- A practically meaningful difference (not just a statistically significant one)
Inconclusive Results
If no clear winner:
- Variants may be equivalent
- Consider testing something else
- Or refine the hypothesis
Applying Results
Auto-Apply
If enabled, winning approaches automatically:
- Update AI behavior
- Store in Client Brain
- Apply to future tickets
Manual Apply
Review results first:
- See experiment results
- Click Apply Winner
- Confirm the change
Rollback
If issues arise after applying:
- Go to experiment results
- Click Rollback
- Reverts to previous behavior
Best Practices
Test One Thing
Change only one variable per experiment:
- ✅ Short vs long responses
- ❌ Short formal vs long casual
Adequate Sample Size
Need enough tickets for reliable results:
- Minimum ~100 tickets per variant
- More for small effect sizes
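The "more for small effect sizes" point can be made concrete with the standard sample-size formula for comparing two proportions. A rough sketch, assuming 95% confidence and 80% power (the z-values below encode those assumptions):

```python
from math import ceil

def sample_size_per_variant(p_base: float, p_expected: float,
                            z_alpha: float = 1.96,    # 95% confidence
                            z_power: float = 0.84) -> int:  # 80% power
    """Rough per-variant sample size for detecting a change between two rates."""
    variance = p_base * (1 - p_base) + p_expected * (1 - p_expected)
    delta = p_expected - p_base
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)
```

Detecting a jump from a 60% to a 70% resolution rate needs a few hundred tickets per variant; detecting 60% to 63% needs several thousand.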
Run Long Enough
Give experiments time:
- At least 1-2 weeks
- Account for weekly patterns
Document Learnings
Record what you learned:
- Why the winner won
- Insights for future tests
- Failed hypotheses
Scheduled Experiments
Set up recurring experiments:
- Click Schedule
- Define recurrence
- AI automatically tests variations
Good for:
- Continuous improvement
- Seasonal optimization
- Changing customer needs