R/preprocess_data.R
check_features.Rd
Checks that two extracted feature lists have exactly the same properties
check_features(f_list_1, f_list_2, use_1_as_truth = T)
f_list_1, f_list_2: List. As extracted from either get_data_specs or get_model_specs.
use_1_as_truth: Logical. If TRUE, f_list_2 is compared to f_list_1, i.e. additional elements are allowed in f_list_2, and if f_list_1's feature classes contain NAs, the feature class check is ignored regardless of what is specified in f_list_1. If FALSE, f_list_1 and f_list_2 are equated and must contain exactly the same elements. Set to TRUE when comparing a model and data, and FALSE when comparing two data sets.
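The two comparison modes described above can be illustrated with a minimal base R sketch. The helper labels_compatible is hypothetical and not part of the package. It checks feature labels only, ignores element order, and leaves out the class and factor-level checks, so it is a simplified illustration of the idea rather than the actual implementation.

```r
# Hypothetical helper, not part of the package: a simplified label check
# that mirrors the two comparison modes (labels only, order ignored).
labels_compatible <- function(f_list_1, f_list_2, use_1_as_truth = TRUE) {
  if (use_1_as_truth) {
    # Every label in f_list_1 must appear in f_list_2; extras are allowed.
    all(f_list_1$labels %in% f_list_2$labels)
  } else {
    # Both lists must hold exactly the same set of labels.
    setequal(f_list_1$labels, f_list_2$labels)
  }
}

# Toy feature lists: the data has one feature the model does not use.
model_like <- list(labels = c("lstat", "rm"))
data_like  <- list(labels = c("lstat", "rm", "indus"))

labels_compatible(model_like, data_like, use_1_as_truth = TRUE)
#> [1] TRUE
labels_compatible(model_like, data_like, use_1_as_truth = FALSE)
#> [1] FALSE
```

With use_1_as_truth = TRUE the extra indus column in the data is tolerated, which matches the model-versus-data use case. With FALSE the same pair is rejected, which is the behaviour one would want when comparing two data sets.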
List. f_list_1 is returned as supplied if all checks are carried out. If some info is missing from f_list_1, the function continues consistency checking using f_list_2 and returns that.
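The fallback rule for the return value can be sketched in base R as follows. The helper pick_reference is hypothetical, and the definition of "missing info" used here (no labels, or NA among the classes) is an assumption made for illustration. The source does not spell out which fields count.

```r
# Hypothetical helper, not part of the package: choose which list to return.
# Assumption: "missing info" means absent labels or NA feature classes.
pick_reference <- function(f_list_1, f_list_2) {
  info_missing <- is.null(f_list_1$labels) || anyNA(f_list_1$classes)
  if (info_missing) f_list_2 else f_list_1
}

# A fully specified list and one whose classes are unknown.
complete <- list(labels = c("lstat", "rad"),
                 classes = c(lstat = "numeric", rad = "factor"))
partial  <- list(labels = c("lstat", "rad"),
                 classes = c(lstat = NA_character_, rad = NA_character_))

identical(pick_reference(complete, partial), complete)
#> [1] TRUE
identical(pick_reference(partial, complete), complete)
#> [1] TRUE
```

In both calls the fully specified list is the one returned: it is passed back unchanged when supplied first, and it replaces the incomplete list when supplied second.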
# Load example data
if (requireNamespace("MASS", quietly = TRUE)) {
  data("Boston", package = "MASS")
  # Use the first six rows of Boston as a small training set
  x_train <- data.table::as.data.table(head(Boston))
  # Convert rad to a factor so that its levels are recorded
  x_train[, rad := as.factor(rad)]
  data_features <- get_data_specs(x_train)
  model <- lm(medv ~ lstat + rm + rad + indus, data = x_train)
  model_features <- get_model_specs(model)
  check_features(model_features, data_features)
}
#> $labels
#> [1] "lstat" "rm" "rad" "indus"
#>
#> $classes
#> lstat rm rad indus
#> "numeric" "numeric" "factor" "numeric"
#>
#> $factor_levels
#> $factor_levels$lstat
#> NULL
#>
#> $factor_levels$rm
#> NULL
#>
#> $factor_levels$rad
#> [1] "1" "2" "3"
#>
#> $factor_levels$indus
#> NULL
#>
#>
#> $specs_type
#> [1] "model"
#>