Convert the data into a torch::dataset() from which the vaeac model creates batches.

vaeac_dataset(X, one_hot_max_sizes)

Arguments

X

A torch_tensor containing the data, of shape N x p, where N and p are the number of observations and features, respectively.

one_hot_max_sizes

A torch tensor of length n_features containing the one-hot sizes of the n_features features. That is, if the ith feature is a categorical feature with 5 levels, then one_hot_max_sizes[i] = 5, while the size of a continuous feature can be either 0 or 1.
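To illustrate the encoding rule above, the following base-R sketch derives one_hot_max_sizes from a data.frame: a factor column gets its number of levels and a numeric column gets 1. The helper get_one_hot_max_sizes() and the example data are hypothetical and not part of the package; if a torch tensor is required, the resulting vector could be converted with torch::torch_tensor().

```r
# Hypothetical helper (not part of the package): number of levels for
# factor (categorical) columns, 1 for continuous (numeric) columns.
get_one_hot_max_sizes <- function(x) {
  vapply(x, function(col) {
    if (is.factor(col)) nlevels(col) else 1L
  }, integer(1))
}

# Hypothetical data: two continuous features and one categorical
# feature with 5 levels (not all levels need to be observed)
x <- data.frame(
  age = c(23, 35, 51),
  income = c(1.2, 3.4, 2.2),
  color = factor(
    c("red", "green", "blue"),
    levels = c("red", "green", "blue", "black", "white")
  )
)
sizes <- get_one_hot_max_sizes(x)
sizes # age = 1, income = 1, color = 5
```

Using 1 rather than 0 for continuous features is an arbitrary choice here; the text above allows either.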

Details

This function creates a torch::dataset() object that represents a map from keys (indices) to data samples. It is used by torch::dataloader() to load the data from which the batches are extracted in every epoch of the neural network's training phase. Note that a dataset object is an R6 instance (see https://r6.r-lib.org/articles/Introduction.html), which follows the classical object-oriented programming paradigm with self reference. That is, vaeac_dataset() is a subclass of torch::dataset().
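The dataloader asks the dataset for samples by index, so one epoch amounts to partitioning the indices 1, ..., N into batches. The base-R sketch below imitates only that partitioning under the batch_size, shuffle and drop_last settings used in the examples; the function make_batches() is hypothetical and is not how torch implements its samplers.

```r
# Illustrative sketch (not torch's implementation): split the indices
# 1..N of a dataset into batches for one epoch.
make_batches <- function(N, batch_size, shuffle = TRUE, drop_last = FALSE) {
  # Random permutation of the indices if shuffling, otherwise 1..N
  idx <- if (shuffle) sample.int(N) else seq_len(N)
  # Consecutive positions go to batch 1, 2, ...
  batch_id <- ceiling(seq_len(N) / batch_size)
  batches <- unname(split(idx, batch_id))
  # Optionally drop a final batch that is smaller than batch_size
  if (drop_last && length(batches[[length(batches)]]) < batch_size) {
    batches <- batches[-length(batches)]
  }
  batches
}

set.seed(1)
batches <- make_batches(N = 14, batch_size = 10)
lengths(batches) # 10 4
```

With N = 14 and batch_size = 10, the epoch therefore consists of one full batch and one batch with the remaining 4 observations, unless drop_last = TRUE, in which case only the full batch is kept.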

Author

Lars Henry Berge Olsen

Examples

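if (FALSE) { # \dontrun{
library(torch)

# Simulate N observations of p continuous features
p <- 5
N <- 14
batch_size <- 10
one_hot_max_sizes <- rep(1, p) # all features are continuous
vaeac_ds <- vaeac_dataset(
  torch_tensor(matrix(rnorm(p * N), ncol = p),
    dtype = torch_float()
  ),
  one_hot_max_sizes
)
vaeac_ds

# Create a dataloader that shuffles the observations and batches them
vaeac_dl <- torch::dataloader(
  vaeac_ds,
  batch_size = batch_size,
  shuffle = TRUE,
  drop_last = FALSE
)
vaeac_dl$.length() # number of batches: ceiling(14 / 10) = 2

# Iterate over the batches of one epoch
vaeac_iterator <- vaeac_dl$.iter()
vaeac_iterator$.next() # batch 1 (10 observations)
vaeac_iterator$.next() # batch 2 (the remaining 4 observations)
vaeac_iterator$.next() # no batches left (iterator exhausted)
} # }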