The `ShipOfTheseus` class decomposes the difference in outcome rates between two datasets and visualizes the results as a Theseus Plot. It provides methods to compute contributions of individual attributes, summarize results in tables, and generate waterfall-style plots for intuitive interpretation.
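To make "difference in outcome rates" concrete, here is a minimal base R sketch of the overall quantity being decomposed. It does not call the class. The toy data, the `group` column, and the sign convention (second dataset minus first) are assumptions made for illustration.

```r
# Toy data (illustrative only): a binary outcome y and one attribute column.
data1 <- data.frame(
  group = c("a", "a", "b", "b"),
  y     = c(1, 0, 1, 1)
)
data2 <- data.frame(
  group = c("a", "a", "b", "b"),
  y     = c(0, 0, 1, 0)
)

# The outcome rate of each dataset is the mean of the binary indicator.
rate1 <- mean(data1$y)
rate2 <- mean(data2$y)

# Overall difference in rates (assumed convention: data2 minus data1).
rate_diff <- rate2 - rate1
print(rate_diff)
```

The methods documented below attribute this overall difference to the values of a chosen column.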
Methods
Method new()
The constructor of the ShipOfTheseus class.
Usage
ShipOfTheseus$new(data1, data2, outcome, labels, ylab, digits, text_size)
Arguments
data1: data frame representing the first group (e.g., the baseline or "original" data).
data2: data frame representing the second group (e.g., the comparison or "refitted" data).
outcome: string specifying the outcome variable used to compute the rate metric (default is "y"). Typically, this is a binary indicator (e.g., 0/1) that is aggregated to form rates.
labels: character vector of length 2 giving the labels for the two groups. The first corresponds to `data1`, the second to `data2`. Default is c("Original", "Refitted").
ylab: string specifying the y-axis label for plots. If NULL (default), no label is displayed.
digits: integer indicating the number of decimal places to use for displaying numeric values (default is 3).
text_size: numeric value specifying the relative size of text elements in plots (default is 1).
Returns
A ShipOfTheseus object, which can be used with plot() to create Theseus plots.
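A possible usage sketch, following the signatures documented on this page. The toy data and the `carrier` column are made up, and the package name `TheseusPlot` is an assumption not stated here, so the calls are guarded and only run if a package of that name is installed.

```r
# Toy data (illustrative only); both data frames share the same columns.
data1 <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL", "DL"),
  y       = c(1, 0, 1, 1, 0, 1)
)
data2 <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL", "DL"),
  y       = c(1, 1, 0, 1, 0, 0)
)

# Assumption: the class is exported by a package named "TheseusPlot".
if (requireNamespace("TheseusPlot", quietly = TRUE)) {
  ship <- TheseusPlot::ShipOfTheseus$new(
    data1     = data1,
    data2     = data2,
    outcome   = "y",
    labels    = c("Original", "Refitted"),
    ylab      = NULL,
    digits    = 3,
    text_size = 1
  )
  print(ship$table("carrier"))  # contribution table for the carrier column
  print(ship$plot("carrier"))   # Theseus plot for the same column
}
```

All constructor arguments are passed explicitly here so the sketch does not rely on default values.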
Method table()
Generate a contribution table for a given column.
Usage
ShipOfTheseus$table(column_name, n = Inf, continuous = continuous_config())
Arguments
column_name: string. The name of the column to analyze.
n: integer. Maximum number of top contributing attributes to display. If the number of attributes exceeds `n`, the remaining attributes are aggregated.
continuous: list. A configuration list for handling continuous variables (e.g., specifying number of bins or custom breaks).
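The effect of `n` can be sketched in base R. This is not the class's implementation: the helper `top_n_contributions()` is hypothetical, and both the ranking by absolute contribution and the "Other" label for the aggregated remainder are assumptions.

```r
# Hypothetical helper: keep the n largest contributions, sum the rest.
top_n_contributions <- function(contrib, n = Inf) {
  # contrib: named numeric vector of per-attribute contributions.
  # Assumption: attributes are ranked by absolute contribution.
  contrib <- contrib[order(abs(contrib), decreasing = TRUE)]
  if (length(contrib) <= n) {
    return(contrib)  # nothing to aggregate
  }
  head_part <- contrib[seq_len(n)]
  rest <- sum(contrib[(n + 1):length(contrib)])
  c(head_part, Other = rest)
}

# Made-up contributions for four attribute values.
contrib <- c(AA = 0.02, UA = -0.05, DL = 0.01, WN = -0.005)
res <- top_n_contributions(contrib, n = 2)
print(res)
```

Because the remainder is summed rather than dropped, the total contribution is the same before and after truncation.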
Method plot()
Generate a Theseus plot for a specified column.
Usage
ShipOfTheseus$plot(
column_name,
n = 10L,
main_item = NULL,
bar_max_value = NULL,
levels = NULL,
continuous = continuous_config()
)
Arguments
column_name: The name of the column to visualize.
n: integer. Maximum number of top contributing attributes to display. Remaining attributes are aggregated if necessary.
main_item: string. The attribute used as the reference for scaling the bar heights.
bar_max_value: numeric. Maximum value for scaling the contribution bars.
levels: character vector specifying the display order of attributes.
continuous: list. Configuration for handling continuous variables (e.g., number of bins or custom breaks).
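The two options mentioned for continuous variables, a number of bins or custom breaks, can be illustrated with base R's `cut()`. How `continuous_config()` discretizes a column is not described on this page, so this is only a sketch of the general idea, with a made-up `distance` variable.

```r
# Illustrative continuous variable.
distance <- c(120, 350, 800, 1500, 2400, 95, 610)

# Option 1: a fixed number of equal-width bins.
bins_by_count <- cut(distance, breaks = 3)

# Option 2: custom break points chosen by the analyst.
bins_by_breaks <- cut(
  distance,
  breaks = c(0, 500, 1000, Inf),
  labels = c("short", "medium", "long")
)

print(table(bins_by_breaks))
```

Either way, each resulting bin can then be treated like a categorical attribute value.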
Method plot_flip()
Generate a flipped Theseus plot for a specified column.
Usage
ShipOfTheseus$plot_flip(
column_name,
n = 10L,
main_item = NULL,
bar_max_value = NULL,
levels = NULL,
continuous = continuous_config()
)
Arguments
column_name: The name of the column to visualize.
n: integer. Maximum number of top contributing attributes to display. Remaining attributes are aggregated if necessary.
main_item: string. The attribute used as the reference for scaling the bar heights.
bar_max_value: numeric. Maximum value for scaling the contribution bars.
levels: character vector specifying the display order of attributes.
continuous: list. Configuration for handling continuous variables (e.g., number of bins or custom breaks).