Align Data with a Missing Reference — getDataInShape • folda

Aligns a dataset (data) with a reference dataset (missingReference) so that the column names and factor levels of data match those of missingReference. Columns that appear in the reference but not in data are created and filled with NA, and factor levels are reset to the reference levels. Missing values are then imputed from the reference, and the matching flag variables (columns with the _FLAG suffix, as in the example below) are updated for both categorical and numerical columns.

Usage

getDataInShape(data, missingReference)

Arguments

data: A data frame to be aligned and adjusted according to the missingReference.
missingReference: A reference data frame that provides the structure (column names, factor levels, and missing value reference) for aligning data.

Value

A data frame whose column names and factor levels match those of missingReference. Missing values in data are replaced with the values in the first row of missingReference, and the corresponding flag variables are updated accordingly.
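To make "aligned" concrete, the check below compares column names and factor levels between a result and its reference. It is a minimal base-R sketch: is_aligned is a hypothetical helper written for this illustration, not a function exported by folda, and the two small data frames are made up.

```r
# Hypothetical helper (not part of folda): TRUE when `aligned` has the same
# columns in the same order, and the same factor levels, as `reference`.
is_aligned <- function(aligned, reference) {
  if (!identical(names(aligned), names(reference))) return(FALSE)
  same_levels <- vapply(names(reference), function(name) {
    # levels() is NULL for non-factor columns, so those compare as equal.
    identical(levels(aligned[[name]]), levels(reference[[name]]))
  }, logical(1))
  all(same_levels)
}

reference <- data.frame(X1 = factor("A", levels = c("A", "B")), X2 = 1)
good <- data.frame(X1 = factor(c("A", "B"), levels = c("A", "B")), X2 = c(2, 3))
bad <- data.frame(X1 = factor(c("B", "C")), X2 = c(2, 3))

is_aligned(good, reference)  # TRUE
is_aligned(bad, reference)   # FALSE: the levels are "B", "C" instead of "A", "B"
```

A check like this could be applied to the output of getDataInShape before the data is passed on to a fitted model.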

Examples

# Data to align: X1 uses the levels "B" and "C", and both X1 and X2 contain NA.
data <- data.frame(
  X1_FLAG = c(0, 0, 0),
  X1 = factor(c(NA, "C", "B"), levels = LETTERS[2:3]),
  X2_FLAG = c(NA, 0, 1),
  X2 = c(2, NA, 3)
)

# Reference: a single row of imputation values; X1 only allows the levels "A" and "B".
missingReference <- data.frame(
  X1_FLAG = 1,
  X1 = factor("A", levels = LETTERS[1:2]),
  X2 = 1,
  X2_FLAG = 1
)

# The columns come back in the order used by missingReference.
getDataInShape(data, missingReference)
#>   X1_FLAG X1 X2 X2_FLAG
#> 1       1  A  2       0
#> 2       1  A  1       1
#> 3       0  B  3       1
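
The steps described on this page (create absent columns, match factor levels, impute from the first reference row, set flags) can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not folda's implementation: align_to_reference is a hypothetical name, every non-factor column is assumed to be numeric, flag columns are assumed to be identified by the _FLAG suffix, and any flag still NA at the end is assumed to mean "not imputed" and is set to 0.

```r
# Hypothetical sketch, NOT the code used by getDataInShape.
align_to_reference <- function(data, reference) {
  cols <- names(reference)
  is_flag <- grepl("_FLAG$", cols)

  # Step 1: create absent columns, then coerce each column to the reference type.
  # Factor values outside the reference levels become NA here.
  for (name in cols) {
    if (!name %in% names(data)) data[[name]] <- NA
    if (is.factor(reference[[name]])) {
      data[[name]] <- factor(as.character(data[[name]]),
                             levels = levels(reference[[name]]))
    } else {
      data[[name]] <- as.numeric(data[[name]])  # assumption: numeric column
    }
  }

  # Step 2: impute value columns from the first reference row and raise the flag.
  for (name in cols[!is_flag]) {
    missing <- is.na(data[[name]])
    if (!any(missing)) next
    fill <- reference[[name]][1]
    if (is.factor(fill)) fill <- as.character(fill)
    data[[name]][missing] <- fill
    flag <- paste0(name, "_FLAG")
    if (flag %in% cols) data[[flag]][missing] <- 1
  }

  # Step 3 (assumption): a flag that is still NA means "not imputed".
  for (name in cols[is_flag]) {
    data[[name]][is.na(data[[name]])] <- 0
  }

  # Keep only the reference columns, in the reference order.
  data[cols]
}

# Made-up toy data: the flag columns are absent and "Z" is not a reference level.
toy <- data.frame(score = c(10, NA), grade = factor(c("B", "Z")))
ref <- data.frame(
  score = 5,
  score_FLAG = 1,
  grade = factor("A", levels = c("A", "B")),
  grade_FLAG = 1
)

result <- align_to_reference(toy, ref)
result$score                # 10 5
as.character(result$grade)  # "B" "A"
result$score_FLAG           # 0 1
result$grade_FLAG           # 0 1
```

Treating out-of-reference factor values as missing keeps the result restricted to the levels the reference knows about, which is why "Z" is replaced by the reference value "A" in this sketch.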