crslab.data package

Module contents

Data module that reads, processes, and batches data for the whole system.

crslab.data.dataset_register_table

Records all supported datasets.

Type

dict

crslab.data.dataset_language_map

Records the language of each dataset.

Type

dict

crslab.data.dataloader_register_table

Records the dataloader that corresponds to each model.

Type

dict
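
The three tables above are plain dicts used as registries. The following is a minimal sketch of that pattern, not CRSLab's own code: the class `ToyDataset`, the tables prefixed with `toy_`, the helper `lookup_dataset`, the `"dataset"` key in `opt`, and the entry `"ToyCorpus"` are all hypothetical names chosen for illustration.

```python
class ToyDataset:
    """Hypothetical stand-in for a registered dataset class."""

    def __init__(self, opt):
        self.name = opt["dataset"]


# Assumed shape: dataset name -> dataset class.
toy_dataset_register_table = {"ToyCorpus": ToyDataset}
# Assumed shape: dataset name -> language code.
toy_dataset_language_map = {"ToyCorpus": "en"}


def lookup_dataset(opt):
    """Resolve a dataset instance and its language from the two tables."""
    name = opt["dataset"]  # assumed config key
    if name not in toy_dataset_register_table:
        supported = ", ".join(sorted(toy_dataset_register_table))
        raise NotImplementedError(
            f"Dataset {name!r} is not registered; supported: {supported}"
        )
    return toy_dataset_register_table[name](opt), toy_dataset_language_map[name]


dataset, language = lookup_dataset({"dataset": "ToyCorpus"})
print(dataset.name, language)
```

Keeping the tables as module-level dicts means that supporting a new dataset or model only requires adding one entry, and an unknown name fails early with a clear error.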

crslab.data.get_dataloader(opt, dataset, vocab) → crslab.data.dataloader.base.BaseDataLoader

Get the dataloader that batches the dataset.

Parameters
  • opt (Config or dict) – config for dataloader or the whole system.

  • dataset – the processed raw data, without side data.

  • vocab (dict) – vocabulary information: the relevant sizes, indices, and the mappings between tokens and indices.

Returns

dataloader
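
To illustrate how the three parameters might fit together, here is a minimal sketch under stated assumptions. `ToyDataLoader`, `toy_dataloader_register_table`, `toy_get_dataloader`, the config keys `"model_name"` and `"batch_size"`, and the vocab key `"pad"` are hypothetical; the real dataloader classes and keys may differ.

```python
class ToyDataLoader:
    """Hypothetical dataloader: slices examples into padded, fixed-size batches."""

    def __init__(self, opt, dataset, vocab):
        self.batch_size = opt.get("batch_size", 2)  # assumed config key
        self.dataset = dataset
        self.pad_idx = vocab["pad"]  # assumed vocab key for the padding index

    def batches(self):
        for start in range(0, len(self.dataset), self.batch_size):
            chunk = self.dataset[start:start + self.batch_size]
            width = max(len(seq) for seq in chunk)
            # Right-pad every sequence in the batch to the same length.
            yield [seq + [self.pad_idx] * (width - len(seq)) for seq in chunk]


# Assumed shape: model name -> dataloader class.
toy_dataloader_register_table = {"ToyModel": ToyDataLoader}


def toy_get_dataloader(opt, dataset, vocab):
    """Pick the dataloader class registered for the configured model."""
    model_name = opt["model_name"]  # assumed config key
    if model_name not in toy_dataloader_register_table:
        raise NotImplementedError(f"No dataloader registered for {model_name!r}")
    return toy_dataloader_register_table[model_name](opt, dataset, vocab)


opt = {"model_name": "ToyModel", "batch_size": 2}
vocab = {"pad": 0}
data = [[5, 6, 7], [8], [9, 10]]
loader = toy_get_dataloader(opt, data, vocab)
result = list(loader.batches())
print(result)
```

In this sketch the model name selects the dataloader class, while `vocab` supplies the indices (here only the padding index) needed to turn examples into uniform batches.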

crslab.data.get_dataset(opt, tokenize, restore, save) → crslab.data.dataset.base.BaseDataset

Get and process the dataset.

Parameters
  • opt (Config or dict) – config for dataset or the whole system.

  • tokenize (str) – how to tokenize the dataset.

  • restore (bool) – whether to restore saved dataset which has been processed.

  • save (bool) – whether to save dataset after processing.

Returns

processed dataset
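
One plausible reading of the `restore` and `save` flags is a process-once, reuse-later cache. The sketch below shows that idea only; `ToyCachedDataset`, the config keys `"cache_dir"` and `"raw_lines"`, the cache file name, and the whitespace tokenization are all assumptions for illustration and are not taken from CRSLab.

```python
import os
import pickle
import tempfile


class ToyCachedDataset:
    """Hypothetical dataset that can save and restore its processed form."""

    def __init__(self, opt, tokenize, restore, save):
        # Assumed layout: one cache file per tokenization scheme.
        cache_path = os.path.join(opt["cache_dir"], f"{tokenize}_processed.pkl")
        if restore:
            # Reuse data that an earlier run already processed and saved.
            with open(cache_path, "rb") as f:
                self.data = pickle.load(f)
        else:
            # Stand-in for real processing: split each line on whitespace.
            self.data = [line.split() for line in opt["raw_lines"]]
            if save:
                with open(cache_path, "wb") as f:
                    pickle.dump(self.data, f)


with tempfile.TemporaryDirectory() as cache_dir:
    opt = {"cache_dir": cache_dir, "raw_lines": ["hello there", "any good movies"]}
    first = ToyCachedDataset(opt, tokenize="whitespace", restore=False, save=True)
    second = ToyCachedDataset(opt, tokenize="whitespace", restore=True, save=False)
    same = first.data == second.data
    print(same)
```

Under this reading, a first run would pass `restore=False, save=True` to process and cache the data, and later runs would pass `restore=True` to skip processing.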