library(dplyr)

n_user <- 2000

set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |>
  mutate(group = if_else(group == 0, "control", "treatment")) |>
  group_by(user_id, group) |>
  summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")
data
#> # A tibble: 2,000 × 4
#>    user_id group     clicks pageviews
#>      <int> <chr>      <int>     <int>
#>  1       1 treatment      1         6
#>  2       2 treatment      2        11
#>  3       3 control        0        17
#>  4       4 control        4        12
#>  5       5 control        5        10
#>  6       6 control        1        15
#>  7       7 control        2         6
#>  8       8 treatment      2        11
#>  9       9 treatment      2        16
#> 10      10 control        0        17
#> # ℹ 1,990 more rows
In online A/B testing, we often face a significant practical challenge: the randomization unit differs from the analysis unit. Typically, control and treatment groups are randomly assigned at the user level, while metrics such as click-through rate are measured at a more granular level (e.g., per page-view). In this case, the randomization unit is the user, but the analysis unit is the page-view.
This discrepancy raises concerns for statistical hypothesis testing, which assumes that data points are independent and identically distributed (i.i.d.). Specifically, a single user can generate multiple page-views, and each user may have a different probability of clicking. Consequently, the data may exhibit within-user correlation, thereby violating the i.i.d. assumption.
When the standard Z-test is applied to such correlated data, the resulting p-values do not follow the expected uniform distribution under the null hypothesis. Small p-values occur more often than the nominal significance level implies, even when there is no true difference, which increases the risk of falsely detecting a significant difference.
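To make this concrete, here is a minimal simulation sketch in base R. It runs repeated A/A tests (no true difference) and applies a naive page-view-level Z-test. The data-generating process (Poisson page-view counts, Beta-distributed per-user click probabilities) and the function name `simulate_naive_p_value` are assumptions chosen for illustration, not part of the deltatest package.

```r
# A/A simulation: both groups share one data-generating process, so every
# "significant" result is a false positive.
set.seed(42)

# Hypothetical data-generating process (an assumption for illustration):
# each user has an individual click probability and page-view count.
simulate_naive_p_value <- function(n_users = 200) {
  is_treatment <- rep(c(FALSE, TRUE), length.out = n_users)
  pageviews <- rpois(n_users, lambda = 10) + 1
  click_prob <- rbeta(n_users, shape1 = 1, shape2 = 3)
  clicks <- rbinom(n_users, size = pageviews, prob = click_prob)

  # Naive analysis: pool page-views per group and treat each one as i.i.d.
  n_c <- sum(pageviews[!is_treatment])
  n_t <- sum(pageviews[is_treatment])
  p_c <- sum(clicks[!is_treatment]) / n_c
  p_t <- sum(clicks[is_treatment]) / n_t
  se <- sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
  z <- (p_t - p_c) / se
  2 * pnorm(-abs(z))
}

p_values <- replicate(2000, simulate_naive_p_value())

# Share of A/A tests rejected at the 5% level; a value near 0.05 would be
# expected if the p-values were uniform.
naive_rejection_rate <- mean(p_values < 0.05)
print(naive_rejection_rate)
```

Under these assumed settings, the rejection rate should come out well above the nominal 0.05, because the naive standard error ignores the variation in click probability between users.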
To address this problem, Deng et al. (2018) proposed a modified statistical hypothesis testing method. Their approach replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which accounts for within-user correlation. To simplify the application of this method, the deltatest package has been developed.
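For reference, the first-order Delta-method approximation for the variance of a ratio of means can be written as below. The notation is introduced here for illustration and does not come from the package: within one group of n users, X_i and Y_i are the clicks and page-views of user i, with means μ_X and μ_Y, variances σ_X² and σ_Y², and covariance σ_XY. The subscripts t and c denote the treatment and control groups, and R is the ratio of total clicks to total page-views in a group.

```latex
\operatorname{Var}\!\left(\frac{\bar{X}}{\bar{Y}}\right)
  \approx \frac{1}{n}\left(
    \frac{\sigma_X^2}{\mu_Y^2}
    - 2\,\frac{\mu_X}{\mu_Y^3}\,\sigma_{XY}
    + \frac{\mu_X^2}{\mu_Y^4}\,\sigma_Y^2
  \right),
\qquad
Z = \frac{R_t - R_c}{\sqrt{\widehat{\operatorname{Var}}(R_t) + \widehat{\operatorname{Var}}(R_c)}}
```

In practice the population moments are replaced by their sample estimates computed over users, so the user remains the unit over which independence is assumed.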
To illustrate how to use this package, we prepare a data frame that includes columns for the number of clicks and page-views aggregated for each user. This data frame also contains a column indicating whether each user was assigned to the control or treatment group.
The statistical hypothesis test using the Delta method can then be performed on this data as follows:
library(deltatest)
deltatest(data, clicks / pageviews, by = group)
#>
#> Two Sample Z-test Using the Delta Method
#>
#> data: clicks/pageviews by group
#> Z = 0.31437, p-value = 0.7532
#> alternative hypothesis: true difference in means between control and treatment is not equal to 0
#> 95 percent confidence interval:
#> -0.01410593 0.01949536
#> sample estimates:
#>   mean in control mean in treatment        difference
#>       0.245959325       0.248654038       0.002694713
This version of the Z-test yields p-values that follow the expected uniform distribution under the null hypothesis, even when within-user correlation is present.
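This behavior can be checked with a small A/A simulation. The sketch below uses a hand-written Delta-method variance for a ratio of means in base R. The helper names `delta_var_ratio` and `simulate_delta_p_value` and the data-generating process are hypothetical and chosen for illustration. The code is not the deltatest package's own implementation.

```r
set.seed(42)

# Delta-method variance of mean(x) / mean(y), where x and y hold one value
# per user and users are assumed independent.
delta_var_ratio <- function(x, y) {
  n <- length(x)
  mx <- mean(x)
  my <- mean(y)
  (var(x) / my^2 - 2 * mx * cov(x, y) / my^3 + mx^2 * var(y) / my^4) / n
}

# Hypothetical data-generating process (an assumption for illustration):
# per-user click probabilities and page-view counts, no treatment effect.
simulate_delta_p_value <- function(n_users = 200) {
  is_treatment <- rep(c(FALSE, TRUE), length.out = n_users)
  pageviews <- rpois(n_users, lambda = 10) + 1
  click_prob <- rbeta(n_users, shape1 = 1, shape2 = 3)
  clicks <- rbinom(n_users, size = pageviews, prob = click_prob)

  # Click-through rate per group: total clicks over total page-views.
  ctr_c <- sum(clicks[!is_treatment]) / sum(pageviews[!is_treatment])
  ctr_t <- sum(clicks[is_treatment]) / sum(pageviews[is_treatment])

  # User-level variance estimates replace the page-view-level formula.
  var_c <- delta_var_ratio(clicks[!is_treatment], pageviews[!is_treatment])
  var_t <- delta_var_ratio(clicks[is_treatment], pageviews[is_treatment])

  z <- (ctr_t - ctr_c) / sqrt(var_c + var_t)
  2 * pnorm(-abs(z))
}

p_values <- replicate(2000, simulate_delta_p_value())

# Share of A/A tests rejected at the 5% level.
delta_rejection_rate <- mean(p_values < 0.05)
print(delta_rejection_rate)
```

With these assumed settings, the rejection rate should land close to the nominal 0.05, up to simulation noise. A histogram of the p-values, for example `hist(p_values)`, gives a quick visual check of uniformity.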
For more details, refer to https://hoxo-m.github.io/deltatest/.
Installation
You can install the deltatest package from CRAN.
install.packages("deltatest")
References
- CRAN: Package deltatest
- hoxo-m/deltatest: Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing
- Deng, A., Knoblich, U., & Lu, J. (2018). Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.