This function fits a ULDA (Uncorrelated Linear Discriminant Analysis) model to the provided data, with an option for forward selection of variables based on Pillai's trace or Wilks' Lambda. It can also handle missing values, perform downsampling, and compute the linear discriminant scores and group means for classification. The function returns a fitted ULDA model object.
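The two test statistics named above can be computed by hand from scatter matrices. The following is a minimal base R sketch on the built-in iris data, not the package's own code; the variable names (St, Sw, Sb) and the choice of two predictors are assumptions made for illustration. It cross-checks the hand computation against stats::manova.

```r
# Illustrative only: Pillai's trace and Wilks' Lambda for two iris predictors.
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
g <- iris$Species

# Total scatter: cross-product of the data centered on the overall mean.
St <- crossprod(scale(X, center = TRUE, scale = FALSE))

# Within-class scatter: sum of the per-class centered cross-products.
Sw <- Reduce(`+`, lapply(split(as.data.frame(X), g), function(d) {
  crossprod(scale(as.matrix(d), center = TRUE, scale = FALSE))
}))

# Between-class scatter is the remainder.
Sb <- St - Sw

pillai <- sum(diag(Sb %*% solve(St)))  # trace(Sb St^-1); larger means more separation
wilks  <- det(Sw) / det(St)            # smaller means more separation

# Cross-check against the MANOVA summary from the stats package.
fit <- manova(X ~ g)
ref_pillai <- summary(fit, test = "Pillai")$stats[1, "Pillai"]
ref_wilks  <- summary(fit, test = "Wilks")$stats[1, "Wilks"]
```

A forward selection procedure of the kind described would recompute such a statistic as candidate variables are added; how the package does this internally is not shown here.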
Arguments
- datX
A data frame of predictor variables.
- response
A factor representing the response variable with multiple classes.
- subsetMethod
A character string specifying the method for variable selection. Options are "forward" for forward selection or "all" for using all variables. Default is "forward".
- testStat
A character string specifying the test statistic to use for forward selection. Options are "Pillai" or "Wilks". Default is "Pillai".
- correction
A logical value indicating whether to apply a multiple comparison correction during forward selection. Default is TRUE.
- alpha
A numeric value between 0 and 1 specifying the significance level for the test statistic during forward selection. Default is 0.1.
- prior
A numeric vector representing the prior probabilities for each class in the response variable. If NULL, the observed class frequencies are used as the prior. Default is NULL.
- misClassCost
A square matrix \(C\), where each element \(C_{ij}\) represents the cost of classifying an observation into class \(i\) given that it truly belongs to class \(j\). If NULL, a default matrix with equal misclassification costs for all class pairs is used. Default is NULL.
- missingMethod
A character vector of length 2 specifying how to handle missing values for numerical and categorical variables, respectively. Default is c("medianFlag", "newLevel").
- downSampling
A logical value indicating whether to perform downsampling to balance the class distribution in the training data or to improve computational efficiency. Default is FALSE. Note that if downsampling is applied and prior is NULL, the class prior is calculated from the downsampled data. To retain the original prior, specify it explicitly using the prior parameter.
- kSample
An integer specifying the maximum number of samples to take from each class during downsampling. If NULL, the number of samples is limited to the size of the smallest class. Default is NULL.
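To make the orientation of the misClassCost matrix concrete, here is a minimal base R sketch of a cost-sensitive decision: given posterior probabilities \(P(j \mid x)\), choose the class \(i\) that minimizes the expected cost \(\sum_j C_{ij} P(j \mid x)\). The class labels, posterior values, and the helper pick() are hypothetical and are not part of the package; whether the package applies the costs in exactly this way is an assumption.

```r
# Illustrative only: rows of C are predicted classes, columns are true classes.
classes   <- c("A", "B", "C")
posterior <- c(A = 0.50, B = 0.40, C = 0.10)  # hypothetical P(class | x)

# Equal-cost matrix: 0 on the diagonal, 1 for every misclassification.
C_equal <- matrix(1, 3, 3, dimnames = list(classes, classes)) - diag(3)

# Hypothetical asymmetric costs: predicting "A" when the truth is "B" costs 5.
C_asym <- C_equal
C_asym["A", "B"] <- 5

# pick() is a hypothetical helper, not a package function.
pick <- function(C, p) {
  expected <- drop(C %*% p)  # expected[i] = sum_j C[i, j] * p[j]
  names(expected)[which.min(expected)]
}

pick(C_equal, posterior)  # "A": with equal costs the most probable class wins
pick(C_asym, posterior)   # "B": the costly A-for-B error shifts the decision
```

With equal costs the rule reduces to picking the class with the highest posterior, which is why the NULL default behaves like an ordinary classifier.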
Value
A list of class ULDA containing the following components:
- scaling
The matrix of scaling coefficients for the linear discriminants.
- groupMeans
The group means of the linear discriminant scores.
- prior
The prior probabilities for each class.
- misClassCost
The misclassification cost matrix.
- misReference
A reference for handling missing values.
- terms
The terms used in the model formula.
- xlevels
The levels of the factors used in the model.
- varIdx
The indices of the selected variables.
- varSD
The standard deviations of the selected variables.
- varCenter
The means of the selected variables.
- statPillai
The Pillai's trace statistic.
- pValue
The p-value associated with Pillai's trace.
- predGini
The Gini index of the predictions on the training data.
- confusionMatrix
The confusion matrix for the training data predictions.
- forwardInfo
Information about the forward selection process, if applicable.
- stopInfo
A message indicating why forward selection stopped, if applicable.
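A short usage sketch showing how the returned components might be inspected. The function name folda::folda() is an assumption, since this section does not state the function's name; the argument and component names are taken from the lists above. The call is guarded so that the snippet does nothing if the package is not installed.

```r
# Illustrative only: fit on iris and look at a few returned components.
datX     <- iris[, -5]  # data frame of predictors
response <- iris[, 5]   # factor with three classes

if (requireNamespace("folda", quietly = TRUE)) {
  # ASSUMPTION: the fitting function is exported as folda::folda().
  fit <- folda::folda(datX = datX, response = response,
                      subsetMethod = "forward", testStat = "Pillai",
                      alpha = 0.1)
  print(fit$confusionMatrix)  # training-data confusion matrix
  print(fit$forwardInfo)      # forward selection information
  print(fit$stopInfo)         # why forward selection stopped
}
```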
References
Howland, P., Jeon, M., & Park, H. (2003). Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition. SIAM Journal on Matrix Analysis and Applications.
Wang, S. (2024). A New Forward Discriminant Analysis Framework Based On Pillai's Trace and ULDA. arXiv preprint arXiv:2409.03136. Available at https://arxiv.org/abs/2409.03136.