A/B Testing
- During A/B testing you send users to different variants (typically two) of UIs, AI models, pricing structures, etc., and collect data to measure the effects
These online systems are often tied to feature flags and use canary or rollout testing on a few random (or non-random) user groups while collecting large amounts of click data (a minimal bucketing sketch follows below)
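A minimal sketch of how such a rollout might assign users to variants deterministically. The hashing scheme and function name are hypothetical and not tied to any particular feature-flag library; the point is that the same user always lands in the same bucket:

```python
# Hypothetical sketch: deterministic user bucketing for a feature-flag rollout,
# so the same user always sees the same variant across sessions.
import hashlib

def assign_variant(user_id: str, experiment: str, rollout_pct: float = 0.5) -> str:
    """Hash user_id + experiment name into [0, 1) and split into A/B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "B" if bucket < rollout_pct else "A"

# Canary-style rollout: only ~10% of users get variant B
print(assign_variant("user_123", "new_checkout_ui", rollout_pct=0.1))
```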
The sections below draw on the Frequentist vs Bayesian comparison covered in Bayesian Statistics
Frequentist A/B Testing
In this setup, the Frequentist approach works as follows:
- Null Hypothesis (H0): there is no difference between the variants
- Compute the p-value: the probability of seeing the observed effect (or something more extreme) under H0
- If the p-value < α (e.g., 0.05), reject H0
- Limitations:
- you can't quantify how large the effect is or how uncertain you are about it
- you can only say "yes, there appears to be a statistically significant difference" (see the sketch after this list)
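As a concrete example, a two-proportion z-test on conversion counts is a common Frequentist A/B test. This is a minimal sketch: the counts are made up and statsmodels is assumed to be available:

```python
# Minimal sketch: Frequentist two-proportion z-test on conversion rates.
# Conversion/visitor counts are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 145]   # conversions in variant A and variant B
visitors = [2400, 2390]    # users exposed to A and B

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0, the difference is significant")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```

Note that this only answers "is there a difference?"; it doesn't tell you how large the lift is or how uncertain you are about it.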
Bayesian A/B Testing
- Assign priors to conversion rates of each group (e.g., Beta distributions)
- Use observed data to compute posterior distributions
- Computing posterior distributions in production requires a feedback and update mechanism
- TODO: you will create a ML Sys Design video around this
- Compute:
- Posterior probability that variant B is better than A
- Expected loss (e.g., how bad a wrong decision could be)
- With this approach you can directly interpret the probability that variant B is better than variant A (a minimal Beta-Binomial sketch follows after this list)
- you can also utilize Causal Inference to understand the effects of variants on users
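A minimal sketch of the Beta-Binomial version, assuming the same hypothetical conversion counts as above and a uniform Beta(1, 1) prior:

```python
# Minimal sketch: Bayesian A/B test with Beta priors on conversion rates.
# Counts and priors are hypothetical; Beta(1, 1) is a uniform prior.
import numpy as np

rng = np.random.default_rng(42)

conv_a, n_a = 120, 2400   # conversions / visitors for variant A
conv_b, n_b = 145, 2390   # conversions / visitors for variant B

# Beta-Binomial conjugacy: posterior = Beta(prior + successes, prior + failures)
post_a = rng.beta(1 + conv_a, 1 + (n_a - conv_a), size=100_000)
post_b = rng.beta(1 + conv_b, 1 + (n_b - conv_b), size=100_000)

# Posterior probability that B beats A
p_b_better = (post_b > post_a).mean()

# Expected loss of shipping B: conversion rate we expect to give up if B is actually worse
expected_loss_b = np.maximum(post_a - post_b, 0).mean()

print(f"P(B > A) = {p_b_better:.3f}")
print(f"Expected loss if we ship B = {expected_loss_b:.5f}")
```

In a live system the posterior would be updated as new click data streams in, which is the feedback and update mechanism mentioned above.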
Causal Inference
Correlation isn't causation - you want to directly measure what happens when you change X (a variant) for your user base
Causal Inference helps us estimate model impact and personalization effects (an ATE sketch follows the definitions below)
- Treatment Effect: The causal impact of a treatment (e.g., new algorithm) on an outcome
- Counterfactual: What would have happened if the unit didn’t receive the treatment
- ATE / CATE: Average Treatment Effect / Conditional ATE
- Confounding: A variable that affects both treatment and outcome
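A minimal sketch of why confounding matters when estimating the ATE, using simulated data and inverse propensity weighting; the "heavy user" confounder and all numbers are made up:

```python
# Minimal sketch: naive difference in means vs. a confounder-adjusted ATE estimate
# via inverse propensity weighting (IPW). All data is simulated/hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000

# Confounder: "heavy user" flag that affects both treatment assignment and outcome
heavy_user = rng.binomial(1, 0.3, size=n)

# Treatment (new algorithm) is assigned more often to heavy users -> confounding
treated = rng.binomial(1, 0.3 + 0.4 * heavy_user)

# Outcome: true treatment effect is +0.05; heavy users convert more regardless
outcome = rng.binomial(1, 0.10 + 0.05 * treated + 0.15 * heavy_user)

# Naive estimate is biased because heavy users are over-represented in the treated group
naive_ate = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# IPW: model the propensity P(treated | confounder), then reweight the outcomes
propensity = LogisticRegression().fit(heavy_user.reshape(-1, 1), treated)
e = propensity.predict_proba(heavy_user.reshape(-1, 1))[:, 1]
ipw_ate = np.mean(treated * outcome / e) - np.mean((1 - treated) * outcome / (1 - e))

print(f"Naive ATE: {naive_ate:.4f}  (biased by confounding)")
print(f"IPW ATE:   {ipw_ate:.4f}  (closer to the true effect of 0.05)")
```

The naive difference in means overstates the effect because heavy users are both more likely to be treated and more likely to convert; reweighting by the propensity score recovers an estimate closer to the true effect.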