crslab.data package
Module contents
Data module that reads, processes, and batches data for the whole system.
-
crslab.data.dataset_register_table
Records all supported datasets.
- Type
dict
-
crslab.data.dataset_language_map
Records the language corresponding to each dataset.
- Type
dict
-
crslab.data.dataloader_register_table
Records the dataloader corresponding to each model.
- Type
dict
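The three tables above are plain dicts, so they can be queried like any other mapping. A minimal sketch of that lookup pattern follows. The table contents, the `toy_` names, and the `lookup` helper are all hypothetical placeholders and do not reflect the actual entries in crslab.data.

```python
# Hypothetical stand-ins for the real tables in crslab.data.
class ToyDataset:
    """Placeholder for a dataset class."""


class ToyDataLoader:
    """Placeholder for a dataloader class."""


toy_dataset_register_table = {"ToyData": ToyDataset}         # dataset name -> dataset class
toy_dataset_language_map = {"ToyData": "en"}                 # dataset name -> language
toy_dataloader_register_table = {"ToyModel": ToyDataLoader}  # model name -> dataloader class


def lookup(table, key):
    """Return table[key], raising a readable error for unregistered keys."""
    if key not in table:
        raise NotImplementedError(f"{key!r} is not registered")
    return table[key]


dataset_cls = lookup(toy_dataset_register_table, "ToyData")
language = lookup(toy_dataset_language_map, "ToyData")
loader_cls = lookup(toy_dataloader_register_table, "ToyModel")
```

Keeping the registrations in dicts means a new dataset or model can be supported by adding one entry, without touching the code that performs the lookup.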
-
crslab.data.get_dataloader(opt, dataset, vocab) → crslab.data.dataloader.base.BaseDataLoader
Get the dataloader that batches the dataset.
- Parameters
opt (Config or dict) – config for the dataloader or the whole system.
dataset – processed raw data, without side data.
vocab (dict) – vocabulary information: useful sizes, indices, and the mappings between tokens and indices.
- Returns
dataloader
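To make the three parameters concrete, here is a hedged sketch of a dataloader factory with the same argument order. It assumes the dataloader is chosen by a model name stored in `opt`. The keys `model_name`, `batch_size`, `tok2ind`, and `unk`, the `ToyDataLoader` class, and `get_dataloader_sketch` are illustrative assumptions and are not taken from CRSLab.

```python
class ToyDataLoader:
    """Hypothetical dataloader that yields fixed-size batches of token indices."""

    def __init__(self, opt, dataset, vocab):
        self.batch_size = opt["batch_size"]
        self.dataset = dataset
        self.vocab = vocab

    def batches(self):
        tok2ind = self.vocab["tok2ind"]
        unk = self.vocab["unk"]
        for start in range(0, len(self.dataset), self.batch_size):
            chunk = self.dataset[start:start + self.batch_size]
            # Convert every token to its index; unknown tokens map to `unk`.
            yield [[tok2ind.get(tok, unk) for tok in sample] for sample in chunk]


# Hypothetical register table: model name -> dataloader class.
toy_dataloader_table = {"ToyModel": ToyDataLoader}


def get_dataloader_sketch(opt, dataset, vocab):
    """Pick a dataloader class by model name and build it (assumed behaviour)."""
    model_name = opt["model_name"]
    if model_name not in toy_dataloader_table:
        raise NotImplementedError(f"no dataloader registered for {model_name!r}")
    return toy_dataloader_table[model_name](opt, dataset, vocab)


opt = {"model_name": "ToyModel", "batch_size": 2}
dataset = [["hello", "world"], ["hi"], ["hello", "there"]]
vocab = {"tok2ind": {"hello": 1, "world": 2, "hi": 3}, "unk": 0}

loader = get_dataloader_sketch(opt, dataset, vocab)
all_batches = list(loader.batches())
```

With a batch size of 2, the three samples are split into one batch of two samples and one batch of one sample.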
-
crslab.data.get_dataset(opt, tokenize, restore, save) → crslab.data.dataset.base.BaseDataset
Get and process the dataset.
- Parameters
opt (Config or dict) – config for the dataset or the whole system.
tokenize (str) – how to tokenize the dataset.
restore (bool) – whether to restore a previously processed and saved dataset.
save (bool) – whether to save the dataset after processing.
- Returns
processed dataset
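The `restore` and `save` flags describe a process-once, reuse-later workflow. The sketch below illustrates one way such flags could behave, using a pickle file as the cache. The `ToyDataset` class, the `get_dataset_sketch` function, the cache file name, and the `opt` keys `dataset`, `cache_dir`, and `raw` are assumptions made for illustration and do not describe CRSLab's actual storage format.

```python
import os
import pickle
import tempfile


class ToyDataset:
    """Hypothetical dataset that tokenizes raw text and can cache the result."""

    def __init__(self, opt, tokenize, restore=False, save=False):
        cache_path = os.path.join(opt["cache_dir"], f"toy_{tokenize}.pkl")
        if restore:
            # Reuse previously processed data instead of processing again.
            with open(cache_path, "rb") as f:
                self.data = pickle.load(f)
        else:
            # Whitespace splitting stands in for a real tokenizer.
            self.data = [line.split() for line in opt["raw"]]
            if save:
                with open(cache_path, "wb") as f:
                    pickle.dump(self.data, f)


# Hypothetical register table: dataset name -> dataset class.
toy_dataset_table = {"Toy": ToyDataset}


def get_dataset_sketch(opt, tokenize, restore, save):
    """Pick a dataset class by name and build it (assumed behaviour)."""
    name = opt["dataset"]
    if name not in toy_dataset_table:
        raise NotImplementedError(f"dataset {name!r} is not registered")
    return toy_dataset_table[name](opt, tokenize, restore, save)


with tempfile.TemporaryDirectory() as cache_dir:
    opt = {"dataset": "Toy", "cache_dir": cache_dir, "raw": ["hello world", "hi"]}
    # First call processes the raw text and writes the cache.
    processed = get_dataset_sketch(opt, "whitespace", restore=False, save=True).data
    # Second call skips processing and reads the cache back.
    restored = get_dataset_sketch(opt, "whitespace", restore=True, save=False).data
```

In this sketch, calling with `restore=True` before any run with `save=True` would fail because no cache file exists yet, so the first run on a new dataset should process and save.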