Generate Dummy Data — generate_dummy_data

Generate random dummy data for simulation studies. For details, see Section 4.3 in Deng et al. (2017).

Usage

generate_dummy_data(
  n_user,
  model = c("Bernoulli", "normal"),
  xi = 0,
  sigma = 0,
  lambda = 3,
  random_unit = c("user", "session", "pageview"),
  treatment_ratio = 0.5
)

Arguments

n_user: integer value specifying the number of users included in the generated data. Since multiple rows are generated for each user, the number of rows in the data exceeds the number of users.
model: character string specifying the model that generates the potential outcomes. It must be one of "Bernoulli" (default) or "normal". You can specify just the initial letter.
xi: numeric value specifying the treatment effect variation (TEV) under the Bernoulli model, where \(TEV = 2\xi\). This argument is ignored if the model argument is set to "normal". The default is 0.
sigma: numeric value specifying the treatment effect variation (TEV) under the normal model, where \(TEV = \sigma\). This argument is ignored if the model argument is set to "Bernoulli". The default is 0.
lambda: numeric value specifying the Poisson rate parameter for the number of sessions and page-views generated by a single user. The default is 3.
random_unit: character string specifying the randomization unit. It must be one of "user" (default), "session", or "pageview". You can specify just the initial letter.
treatment_ratio: numeric value specifying the proportion assigned to the treatment group. The default is 0.5.
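
As a minimal sketch of how these arguments combine, the call below switches to the normal model and session-level randomization. The specific values (500 users, sigma = 0.5, lambda = 5, treatment_ratio = 0.3) are arbitrary choices for illustration, not recommendations from the package or the paper.

```r
library(deltatest)

set.seed(314)

# Normal model: sigma sets the treatment effect variation; xi is ignored here.
# All numeric values below are illustrative assumptions.
dummy_normal <- generate_dummy_data(
  n_user = 500,
  model = "normal",
  sigma = 0.5,
  lambda = 5,
  random_unit = "session",
  treatment_ratio = 0.3
)

# The same model and randomization unit, abbreviated to their initial letters
dummy_short <- generate_dummy_data(n_user = 500, model = "n", random_unit = "s")

head(dummy_normal)
```

Because the data are random, the rows returned depend on the seed and on the package version, so no output is shown here.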

Value

data.frame (a tibble) with the columns user_id, group, and metric, where each row holds the metric value for one page-view.
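
To illustrate the shape of the returned data, here is a small base R sketch that summarizes a hand-made data frame with the same three columns. The toy values are invented for illustration, and the assumption that group is coded as 0 and 1 is ours; this page does not state which value denotes treatment.

```r
# Toy stand-in with the documented columns; the values are made up.
toy <- data.frame(
  user_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  group   = c(1, 1, 1, 0, 0, 1, 1, 1, 1),
  metric  = c(0, 1, 0, 0, 0, 1, 1, 0, 1)
)

# Number of page-view rows contributed by each user
views_per_user <- table(toy$user_id)

# Mean of the metric within each group, averaged over page-view rows
group_means <- aggregate(metric ~ group, data = toy, FUN = mean)

print(views_per_user)
print(group_means)
```

The same two summaries can be applied unchanged to the output of generate_dummy_data(), since it carries the same column names.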

References

Deng, A., Lu, J., & Litz, J. (2017). Trustworthy Analysis of Online A/B Tests: Pitfalls, challenges and solutions. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. doi:10.1145/3018661.3018677

Examples

library(deltatest)

set.seed(314)
generate_dummy_data(n_user = 2000)
#> # A tibble: 31,812 × 3
#>    user_id group metric
#>      <int> <int>  <int>
#>  1       1     1      0
#>  2       1     1      0
#>  3       1     1      0
#>  4       1     1      0
#>  5       1     1      0
#>  6       1     1      1
#>  7       2     1      0
#>  8       2     1      0
#>  9       2     1      1
#> 10       2     1      0
#> # ℹ 31,802 more rows
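
The printed output above has 31,812 rows for n_user = 2000, so each user contributes many page-view rows. A short sketch for checking this on your own draw follows; the variable names are ours, and the result may differ from the printed output if the package version or seed differs.

```r
library(deltatest)

set.seed(314)
dummy <- generate_dummy_data(n_user = 2000)

# Average number of page-view rows per distinct user in the data
n_rows  <- nrow(dummy)
n_users <- length(unique(dummy$user_id))
n_rows / n_users

# Distribution of rows per user
summary(as.vector(table(dummy$user_id)))
```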