A/B Testing Engine: Optimising Intervention Effectiveness
Whistl never stops improving. The A/B Testing Engine continuously tests variations of interventions—different wording, timing, and formats—to discover what works best. Winning variations are promoted across the user base while maintaining personalisation. This is evidence-based behaviour change at scale.
Why A/B Testing Matters
Small changes can have big impacts on intervention effectiveness:
Research on Message Framing
- Gain vs. loss framing: Different users respond to different frames (Rothman et al., 2006)
- Message length: Shorter isn't always better (Keller & Lehmann, 2008)
- Tone matching: Coaching style must match user preference (Miller & Rollnick, 2012)
- Timing effects: When you intervene matters as much as what you say (Heron & Smyth, 2010)
The Limits of Expert Design
- Experts can't predict: What researchers think works ≠ what actually works
- Context matters: Effectiveness varies by user, time, situation
- Continuous improvement: What works today may not work tomorrow
- Scale reveals patterns: Large user base enables statistical confidence
What Whistl A/B Tests
The testing engine evaluates variations across multiple dimensions:
Message Wording
| Step | Variant A | Variant B | Variant C |
|---|---|---|---|
| Acknowledge | "I hear you. What's driving this?" | "I understand this is hard. Talk to me." | "You want through. I get it. Why?" |
| Reflect | "Last time you felt this way..." | "Remember what happened last time..." | "Think about the last time..." |
| Breathe | "Let's breathe together." | "Time to breathe. With me." | "Breathe. Just breathe." |
| Visualize | "Picture your goal..." | "Remember what you're saving for..." | "This is what you're working toward..." |
Timing Variations
- Immediate vs. delayed: Intervene right away or wait 30 seconds?
- Breathing duration: 2 minutes vs. 3 minutes vs. 90 seconds
- Step spacing: Show steps one at a time or all together?
- Follow-up timing: Check in after 1 hour or 2 hours?
Visual Format
- Text-only vs. image: Does showing goal images help?
- Progress bar style: Linear vs. circular vs. numeric
- Color schemes: Calming blues vs. urgent reds vs. neutral
- Animation: Animated breathing pacer vs. static
How A/B Testing Works
Whistl's testing engine follows rigorous methodology:
Test Assignment
```python
# User assignment to test variants
import hashlib

def assign_test_variant(user_id, test_id):
    # Use a stable hash for consistent assignment (Python's built-in
    # hash() is salted per process, so it would change between sessions)
    digest = hashlib.sha256(f"{user_id}:{test_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # Assign to variant based on bucket
    if bucket < 33:
        return "A"  # Control group
    elif bucket < 66:
        return "B"  # Variant B
    else:
        return "C"  # Variant C

# A user always sees the same variant for a given test;
# different users are spread evenly across variants.
```
Sample Size Requirements
- Minimum per variant: 100 interventions
- Statistical power: 80% (standard for behavioural research)
- Confidence level: 95% (p < 0.05)
- Minimum effect size: 5% improvement to adopt
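The requirements above translate into a concrete sample-size calculation. Here is a minimal sketch of the standard normal approximation for a two-proportion test (the function name and the 64% baseline acceptance rate used in the example are illustrative, not taken from Whistl's internals):

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-variant n for a two-sided two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 5-point lift from a 64% baseline acceptance rate
n = sample_size_two_proportions(0.64, 0.69)
```

Under these assumptions the answer is roughly 1,395 interventions per variant, which is why the 100-intervention figure is best read as a floor for even looking at a test, not as enough data to call a winner on a 5-point effect.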
Success Metrics
| Metric | Definition | Target |
|---|---|---|
| Intervention Acceptance | User engaged with intervention | >70% |
| Urge Pass Rate | Urge didn't return within 2 hours | >60% |
| Step Completion | User completed the full step | >80% |
| Helpfulness Rating | User rated 4+ stars | >4.0/5.0 |
| No Bypass | User didn't bypass after intervention | >75% |
Current Active Tests
Examples of tests running across the Whistl user base:
Test 1: Acknowledge Message Tone
Test ID: ACK_TONE_001
Status: Running (67% complete)

| Variant | Message | Sample | Acceptance Rate | Helpfulness |
|---|---|---|---|---|
| A (Control) | "I hear you. What's driving this?" | 1,234 | 89% | 4.2/5 |
| B | "I understand this is frustrating. Talk to me." | 1,198 | 91% | 4.4/5 |
| C | "This is hard. I'm here. What's happening?" | 1,211 | 87% | 4.3/5 |

Current leader: Variant B (+2% acceptance, +0.2 helpfulness)
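Whether Variant B's lead is already statistically significant can be checked with a standard two-proportion z-test. A minimal sketch using the Test 1 acceptance figures above (the helper name is illustrative):

```python
from math import sqrt, erfc

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value
    return z, p_value

# Variant A vs. Variant B acceptance rates from Test 1
z, p = two_proportion_z_test(0.89, 1234, 0.91, 1198)
```

With these figures the p-value comes out around 0.10, short of the p < 0.05 bar, which is consistent with the test still being listed as running rather than complete.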
Test 2: Breathing Duration
Test ID: BREATHE_DURATION_002
Status: Running (45% complete)

| Variant | Duration | Sample | Completion Rate | Urge Pass Rate |
|---|---|---|---|---|
| A (Control) | 2 minutes | 892 | 78% | 54% |
| B | 90 seconds | 867 | 84% | 49% |
| C | 3 minutes | 901 | 71% | 58% |

Current leader: Variant A (best balance of completion and effectiveness)
Test 3: Visualization Format
Test ID: VISUAL_FORMAT_003
Status: Complete (Variant B winner)

| Variant | Format | Sample | Motivation Increase |
|---|---|---|---|
| A (Control) | Text-only goal reminder | 2,100 | 45% |
| B | Goal image + progress bar | 2,087 | 61% ✓ WINNER |
| C | Goal image + time travel projection | 2,134 | 58% |

Result: Variant B promoted to all users
From Test to Production
When a test completes, winning variants are rolled out:
Rollout Process
- Statistical validation: Confirm significance and effect size
- Segment analysis: Check if winner varies by user type
- Gradual rollout: 10% → 50% → 100% over 1 week
- Monitoring: Watch for unexpected effects
- Documentation: Update intervention library
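The gradual-rollout step can reuse the same deterministic bucketing as test assignment. A minimal sketch (the function name is illustrative; the stage percentages come from the schedule above):

```python
import hashlib

def in_rollout(user_id, feature_id, rollout_percent):
    """True when this user falls inside the current rollout percentage."""
    digest = hashlib.sha256(f"{user_id}:{feature_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # Buckets are stable, so widening 10% -> 50% -> 100% only ever adds
    # users; nobody who already received the winner loses it mid-rollout.
    return bucket < rollout_percent

# Example: check one user against each stage of the week-long schedule
stages = [in_rollout("user_42", "VISUAL_FORMAT_003", pct) for pct in (10, 50, 100)]
```

Because each user's bucket never changes, the enabled population at 10% is always a strict subset of the population at 50%, which keeps monitoring comparisons clean across stages.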
Personalisation Override
Even winning variants respect personal preferences:
- Coaching style: Tough Love users still get Tough Love variants
- Step order: Personal step ordering takes precedence
- Opt-out: Users can disable experimental features
Ethical Considerations
Whistl's A/B testing follows ethical guidelines:
Ethical Principles
- No harmful variants: All variants must be supportive, not punitive
- Crisis exclusion: Users in crisis don't receive test variants
- Transparency: Users can view active tests in settings
- Opt-out available: Users can choose control variants only
Data Privacy
- Anonymous aggregation: Test results are aggregated, not individual
- No external sharing: Test data stays within Whistl
- Minimal collection: Only data needed for testing is collected
Effectiveness Improvements
A/B testing has driven measurable improvements:
Cumulative Impact (12 Months)
| Metric | Baseline | Current | Improvement |
|---|---|---|---|
| Overall Intervention Acceptance | 64% | 73% | +9 pts |
| Urge Pass Rate | 52% | 61% | +9 pts |
| User Satisfaction | 4.1/5 | 4.6/5 | +0.5 |
| Step Completion Rate | 67% | 78% | +11 pts |
User Testimonials
"I noticed the messages changed over time. They got... better? More helpful. Didn't realise they were testing stuff." — Marcus, 28
"The breathing timer changed from 2 minutes to something else and back. Asked support—they said they were testing what works. Cool that they care about getting it right." — Sarah, 34
"Love that Whistl is always improving. It's not static software—it's getting smarter." — Jake, 31
Conclusion
Whistl's A/B Testing Engine ensures that every intervention is backed by evidence, not just intuition. By continuously testing and learning, Whistl gets more effective over time—for every user.
This is behaviour change science in action: hypothesis, test, learn, improve. Repeat forever.
Experience Evidence-Based Protection
Whistl's interventions are continuously tested and improved. Download free and benefit from ongoing optimisation.
Download Whistl Free

Related: 8-Step Negotiation Engine | Step Effectiveness Tracking | Intervention Type Predictor