Imputes missing values in a data frame, using separate methods for numerical and categorical variables, and optionally adds flag columns that mark which values were missing. Numerical variables are imputed with the mean or the median. Categorical variables are imputed with the mode or assigned to a new level. The function also removes constant columns, meaning columns that are entirely NA or fully observed with a single value.
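The steps described above can be sketched in base R. This is an illustration only, not the function's actual implementation: the label "Missing" and the helper `isConstant` are hypothetical names chosen for this sketch.

```r
# Illustrative base-R sketch; not the package's implementation.
x <- c(NA, 2, 3, 10, NA)

# Record which entries were missing before imputing (1 = missing, 0 = observed).
x_flag <- as.integer(is.na(x))

# Replace missing entries with the median of the observed values.
x_imputed <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

# For a factor, add an extra level and assign it to the missing entries.
# "Missing" is a hypothetical label used only in this sketch.
f <- factor(c("A", NA, NA, "B", "B"), levels = LETTERS[1:3])
levels(f) <- c(levels(f), "Missing")
f[is.na(f)] <- "Missing"

# Hypothetical helper: a column is constant if it is all NA, or if it has
# no NA and only one distinct value.
isConstant <- function(col) {
  all(is.na(col)) || (!anyNA(col) && length(unique(col)) == 1)
}
```

Computing the flag before imputing matters: once the NA values are overwritten, the information about which entries were missing is gone.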

Usage

missingFix(data, missingMethod = c("medianFlag", "newLevel"))

Arguments

data

A data frame containing the data to be processed. Missing values (NA) will be imputed based on the methods provided in missingMethod.

missingMethod

A character vector of length 2 specifying the imputation methods. The first element sets the method for numerical variables ("mean", "median", "meanFlag", or "medianFlag"), and the second element sets the method for categorical variables ("mode", "modeFlag", or "newLevel"). If a method name includes "Flag", flag columns are also added for that variable type (named with a _FLAG suffix, such as X5_FLAG in the example below).
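One way to read the method names is as a base method plus an optional "Flag" suffix. The snippet below is a hypothetical sketch of that decomposition; the variable names `addFlag` and `baseMethod` are assumptions, and the function's internals may parse the argument differently.

```r
# Hypothetical decomposition of the method names; internals may differ.
missingMethod <- c("medianFlag", "newLevel")

# TRUE where the name ends in "Flag", i.e. a flag column is requested.
addFlag <- grepl("Flag$", missingMethod)

# The imputation method with the "Flag" suffix removed.
baseMethod <- sub("Flag$", "", missingMethod)
```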

Value

A list with two elements:

data

The input data frame with missing values imputed, constant columns removed, and flag columns added if applicable.

ref

A reference row containing the imputed values and flag levels, which can be used for future predictions or reference.
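As a rough illustration of how a reference row might be applied to new data, the sketch below fills missing values column by column. `fillFromRef` is a hypothetical helper, not part of the documented API, and it deliberately ignores flag columns and factor levels.

```r
# Hypothetical helper, not part of the documented API.
# Fills NA entries in newData with the value stored in the reference row.
fillFromRef <- function(newData, ref) {
  for (col in intersect(names(newData), names(ref))) {
    miss <- is.na(newData[[col]])
    newData[[col]][miss] <- ref[[col]][1]
  }
  newData
}

# A made-up reference row and new observations for the sketch.
ref <- data.frame(X3 = 3, X5 = 3)
newData <- data.frame(X3 = c(1, NA), X5 = c(NA, 7))
filled <- fillFromRef(newData, ref)
```

Reusing the stored values, rather than recomputing means or medians on the new data, keeps the imputation consistent with what was done on the original data.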

Examples

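dat <- data.frame(
  X1 = rep(NA, 5),                                  # all NA: removed
  X2 = factor(rep(NA, 5), levels = LETTERS[1:3]),   # all NA: removed
  X3 = 1:5,                                         # numerical, no missing values
  X4 = LETTERS[1:5],                                # categorical, no missing values
  X5 = c(NA, 2, 3, 10, NA),                         # numerical: median imputed, flag added
  X6 = factor(c("A", NA, NA, "B", "B"), levels = LETTERS[1:3])  # categorical: new level
)
# Uses the default missingMethod = c("medianFlag", "newLevel")
missingFix(dat)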
#> $data
#>   X3 X4 X5          X6 X5_FLAG
#> 1  1  A  3           A       1
#> 2  2  B  2 new0_0Level       0
#> 3  3  C  3 new0_0Level       0
#> 4  4  D 10           B       0
#> 5  5  E  3           B       1
#> 
#> $ref
#>   X3 X4 X5          X6 X5_FLAG
#> 1  3  A  3 new0_0Level       1
#>