crslab.data package
Module contents
Data module that reads, processes, and batches data for the whole system.
crslab.data.dataset_register_table
Records all supported datasets.
- Type
dict
crslab.data.dataset_language_map
Records the language corresponding to each dataset.
- Type
dict
crslab.data.dataloader_register_table
Records the dataloader corresponding to each model.
- Type
dict
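The three tables above are plain dictionaries. As a rough sketch of how register tables of this kind are typically consulted, the snippet below builds stand-in tables and looks entries up by name. The class names, keys, language code, and the `describe` helper are hypothetical placeholders chosen for illustration; they are not taken from the crslab source.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset class."""


class ToyDataLoader:
    """Hypothetical stand-in for a dataloader class."""


# Hypothetical tables that mirror the three dicts described above.
dataset_register_table = {"ToyData": ToyDataset}
dataset_language_map = {"ToyData": "en"}
dataloader_register_table = {"ToyModel": ToyDataLoader}


def describe(dataset_name, model_name):
    """Look up the entries registered for a dataset name and a model name."""
    if dataset_name not in dataset_register_table:
        raise KeyError(f"unsupported dataset: {dataset_name}")
    if model_name not in dataloader_register_table:
        raise KeyError(f"no dataloader registered for model: {model_name}")
    return {
        "dataset_cls": dataset_register_table[dataset_name],
        "language": dataset_language_map[dataset_name],
        "dataloader_cls": dataloader_register_table[model_name],
    }


info = describe("ToyData", "ToyModel")
print(info["language"])
```

Keeping the lookups behind a single helper means an unsupported name fails early with a clear message instead of surfacing later as an unrelated error.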
crslab.data.get_dataloader(opt, dataset, vocab) → crslab.data.dataloader.base.BaseDataLoader
Gets the dataloader used to batchify the dataset.
- Parameters
opt (Config or dict) – configuration for the dataloader or for the whole system.
dataset – the processed raw data, without side data.
vocab (dict) – useful sizes, indices, and the mappings between tokens and indices.
- Returns
dataloader
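To make the role of the three arguments concrete, here is a minimal, self-contained sketch of a dataloader factory in the same spirit. It does not call crslab; `ToyDataLoader`, `get_toy_dataloader`, `toy_dataloader_table`, and the `model_name`, `batch_size`, and `pad` keys are all assumptions made for this example, and the real function may select and construct its dataloader differently.

```python
class ToyDataLoader:
    """Hypothetical dataloader that splits a dataset into fixed-size batches."""

    def __init__(self, opt, dataset, vocab):
        self.batch_size = opt["batch_size"]  # assumed config key
        self.dataset = dataset
        self.pad_idx = vocab["pad"]  # assumed vocab entry

    def batches(self):
        """Yield consecutive slices of the dataset, batch_size items each."""
        for start in range(0, len(self.dataset), self.batch_size):
            yield self.dataset[start:start + self.batch_size]


# Hypothetical register table: model name -> dataloader class.
toy_dataloader_table = {"ToyModel": ToyDataLoader}


def get_toy_dataloader(opt, dataset, vocab):
    """Pick a dataloader class by model name and build it (illustrative only)."""
    model_name = opt["model_name"]  # assumed config key
    if model_name not in toy_dataloader_table:
        raise NotImplementedError(f"no dataloader for model: {model_name}")
    return toy_dataloader_table[model_name](opt, dataset, vocab)


opt = {"model_name": "ToyModel", "batch_size": 2}
vocab = {"pad": 0, "tok2ind": {"hello": 1}, "ind2tok": {1: "hello"}}
loader = get_toy_dataloader(opt, [[1], [1, 1], [1]], vocab)
batches = list(loader.batches())
print(len(batches))
```

With three samples and a batch size of 2, the loader yields one full batch followed by one partial batch.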
crslab.data.get_dataset(opt, tokenize, restore, save) → crslab.data.dataset.base.BaseDataset
Gets and processes the dataset.
- Parameters
opt (Config or dict) – configuration for the dataset or for the whole system.
tokenize (str) – how to tokenize the dataset.
restore (bool) – whether to restore a previously processed and saved dataset.
save (bool) – whether to save the dataset after processing.
- Returns
processed dataset
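One plausible reading of the restore and save flags is a process-once, reuse-later cache. The sketch below illustrates that idea only: `get_toy_dataset`, the `cache_dir` and `raw_utterances` keys, the pickle file layout, and the fallback to reprocessing when no saved file exists are all assumptions for this example, not crslab behavior. Here `tokenize` merely labels the cache file, and the "processing" is a plain whitespace split.

```python
import os
import pickle
import tempfile


def get_toy_dataset(opt, tokenize, restore, save):
    """Return processed data, optionally restoring from or saving to disk."""
    cache_path = os.path.join(
        opt["cache_dir"], f"{opt['dataset']}_{tokenize}.pkl"
    )
    # Reuse an earlier result when asked to and when one is available.
    if restore and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Stand-in for real processing: split each utterance on whitespace.
    processed = [utterance.split() for utterance in opt["raw_utterances"]]
    if save:
        os.makedirs(opt["cache_dir"], exist_ok=True)
        with open(cache_path, "wb") as f:
            pickle.dump(processed, f)
    return processed


with tempfile.TemporaryDirectory() as cache_dir:
    opt = {
        "dataset": "ToyData",
        "cache_dir": cache_dir,
        "raw_utterances": ["hello there", "any good movies"],
    }
    # First call processes the raw data and writes the cache file.
    first = get_toy_dataset(opt, "whitespace", restore=False, save=True)
    # Second call ignores the (now empty) raw data and reads the cache.
    opt["raw_utterances"] = []
    second = get_toy_dataset(opt, "whitespace", restore=True, save=False)
```

In this sketch the second call returns the same token lists as the first even though the raw input was cleared, because the result comes from the saved file.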