Process (check and update) data according to a specified feature list

preprocess_data(x, feature_list)

Arguments

x

matrix, data.frame or data.table. The data to check and update according to the specification in feature_list.

feature_list

List. Output from running get_data_specs or get_model_specs.

Value

List with two named elements: x_dt, the checked and updated data x in data.table format, and updated_feature_list, the output from check_features.

Details

This function handles all preprocessing and checking of the data provided in x against feature_list, which is typically the output from get_model_specs.
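
To illustrate the kind of check described above, here is a minimal base R sketch. It is not the package's implementation: sketch_preprocess is a hypothetical helper, and it assumes only that the feature list carries the labels and classes elements seen in the example output. It drops columns the feature list does not mention and stops if a listed feature is missing or has an unexpected class.

```r
# Hypothetical helper for illustration only; not part of the package.
# Assumes feature_list has `labels` (character vector) and
# `classes` (named character vector), as in the example output.
sketch_preprocess <- function(x, feature_list) {
  x <- as.data.frame(x)

  # Every feature named in the list must be present in the data
  missing_cols <- setdiff(feature_list$labels, names(x))
  if (length(missing_cols) > 0) {
    stop("Missing feature(s): ", paste(missing_cols, collapse = ", "))
  }

  # Keep only the listed features, in the order given by the list
  x <- x[, feature_list$labels, drop = FALSE]

  # Compare the first class of each column with the expected class
  actual <- vapply(x, function(col) class(col)[1], character(1))
  expected <- feature_list$classes[feature_list$labels]
  mismatch <- names(actual)[actual != expected]
  if (length(mismatch) > 0) {
    stop("Class mismatch for: ", paste(mismatch, collapse = ", "))
  }

  x
}

# Toy data with one column the feature list does not use
toy <- data.frame(a = c(1.5, 2.5), b = factor(c("u", "v")), unused = 1:2)
specs <- list(labels = c("a", "b"), classes = c(a = "numeric", b = "factor"))
result <- sketch_preprocess(toy, specs)
names(result)
```

The sketch leaves out factor level checks and the updated feature list that preprocess_data returns; it only shows the column selection and class comparison steps.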

Author

Martin Jullum

Examples

# Load example data
if (requireNamespace("MASS", quietly = TRUE)) {
  data("Boston", package = "MASS")
  # Use the first rows of the data as a small training set
  x_train <- data.table::as.data.table(head(Boston))
  x_train[, rad := as.factor(rad)]
  data_features <- get_data_specs(x_train)
  model <- lm(medv ~ lstat + rm + rad + indus, data = x_train)

  model_features <- get_model_specs(model)
  preprocess_data(x_train, model_features)
}
#> 
#> Success with message:
#> The columns(s) crim, zn, chas, nox, age, dis, tax, ptratio, black, medv is not used by the model and thus removed from the data.
#> $x_dt
#>    lstat    rm rad indus
#> 1:  4.98 6.575   1  2.31
#> 2:  9.14 6.421   2  7.07
#> 3:  4.03 7.185   2  7.07
#> 4:  2.94 6.998   3  2.18
#> 5:  5.33 7.147   3  2.18
#> 6:  5.21 6.430   3  2.18
#> 
#> $updated_feature_list
#> $updated_feature_list$labels
#> [1] "lstat" "rm"    "rad"   "indus"
#> 
#> $updated_feature_list$classes
#>     lstat        rm       rad     indus 
#> "numeric" "numeric"  "factor" "numeric" 
#> 
#> $updated_feature_list$factor_levels
#> $updated_feature_list$factor_levels$lstat
#> NULL
#> 
#> $updated_feature_list$factor_levels$rm
#> NULL
#> 
#> $updated_feature_list$factor_levels$rad
#> [1] "1" "2" "3"
#> 
#> $updated_feature_list$factor_levels$indus
#> NULL
#> 
#> 
#> $updated_feature_list$specs_type
#> [1] "model"
#> 
#>