crslab.data.dataset.tgredial package

Submodules

TGReDial

References

Zhou, Kun, et al. “Towards Topic-Guided Conversational Recommender System.” In COLING 2020.

class crslab.data.dataset.tgredial.tgredial.TGReDialDataset(opt, tokenize, restore=False, save=False)

Bases: crslab.data.dataset.base.BaseDataset

train_data

Training dataset.

valid_data

Validation dataset.

test_data

Test dataset.

vocab
{
    'tok2ind': map from token to index,
    'ind2tok': map from index to token,
    'topic2ind': map from topic to index,
    'ind2topic': map from index to topic,
    'entity2id': map from entity to index,
    'id2entity': map from index to entity,
    'word2id': map from word to index,
    'vocab_size': len(self.tok2ind),
    'n_topic': len(self.topic2ind) + 1,
    'n_entity': max(self.entity2id.values()) + 1,
    'n_word': max(self.word2id.values()) + 1,
}
Type

dict
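
The size entries follow from the mappings by the expressions shown in the dict. A minimal sketch with made-up toy mappings: the helper name `build_vocab`, the tokens, and the ids are illustrative assumptions, and building the reverse maps by inverting the forward ones is also an assumption (the real mappings come from the processed dataset files):

```python
def build_vocab(tok2ind, topic2ind, entity2id, word2id):
    """Assemble a dict with the keys listed above (hypothetical helper)."""
    return {
        "tok2ind": tok2ind,
        # Reverse maps are derived by inversion here; this is an assumption.
        "ind2tok": {idx: tok for tok, idx in tok2ind.items()},
        "topic2ind": topic2ind,
        "ind2topic": {idx: topic for topic, idx in topic2ind.items()},
        "entity2id": entity2id,
        "id2entity": {idx: ent for ent, idx in entity2id.items()},
        "word2id": word2id,
        "vocab_size": len(tok2ind),
        # One more than the number of listed topics.
        "n_topic": len(topic2ind) + 1,
        # Sized by the largest id, so gaps in the id space are tolerated.
        "n_entity": max(entity2id.values()) + 1,
        "n_word": max(word2id.values()) + 1,
    }


# Toy data, not taken from TGReDial.
vocab = build_vocab(
    tok2ind={"__pad__": 0, "__unk__": 1, "movie": 2, "hello": 3},
    topic2ind={"comedy": 1, "sci-fi": 2},
    entity2id={"entity_a": 0, "entity_b": 5},
    word2id={"word_a": 0, "word_b": 3},
)
print(vocab["vocab_size"], vocab["n_topic"], vocab["n_entity"], vocab["n_word"])
# prints: 4 3 6 4
```

Note that 'n_entity' and 'n_word' depend on the largest id rather than the number of entries, which is why the toy entity map with ids 0 and 5 yields 6.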

Notes

'unk' and 'pad_topic' must be specified in 'special_token_idx' in resources.py.
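
A missing key in 'special_token_idx' is easy to catch before constructing the dataset. The following is a sketch only: the function name `check_special_tokens` and the index values are hypothetical, and the entry is modelled as a plain dict:

```python
REQUIRED_SPECIAL_TOKENS = ("unk", "pad_topic")


def check_special_tokens(special_token_idx):
    """Return the (unk, pad_topic) indices, or raise if either is absent."""
    missing = [name for name in REQUIRED_SPECIAL_TOKENS
               if name not in special_token_idx]
    if missing:
        raise KeyError(f"special_token_idx is missing: {missing}")
    return special_token_idx["unk"], special_token_idx["pad_topic"]


# Hypothetical index values, for illustration only.
example = {"pad": 0, "unk": 3, "pad_topic": 0}
unk_idx, pad_topic_idx = check_special_tokens(example)
```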

Select the resource that matches the tokenization method and initialize the base dataset.

Parameters
  • opt (Config or dict) – configuration for the dataset or for the whole system.

  • tokenize (str) – tokenization method used for the dataset.

  • restore (bool) – whether to restore a previously processed and saved dataset. Defaults to False.

  • save (bool) – whether to save the dataset after processing. Defaults to False.
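
A hedged sketch of how the constructor might be called, based only on the signature above. The wrapper `load_tgredial` is a hypothetical convenience function; the tokenizer name in the usage comments is an assumption and must match an entry in resources.py, and the option keys a real run needs are not listed on this page:

```python
def load_tgredial(opt, tokenize, restore=False, save=False):
    """Construct a TGReDialDataset (hypothetical wrapper around the class)."""
    # Imported lazily so the sketch can be read without CRSLab installed.
    from crslab.data.dataset.tgredial.tgredial import TGReDialDataset

    return TGReDialDataset(opt, tokenize, restore=restore, save=save)


# Possible usage (assumed tokenizer name; requires CRSLab and its data):
#   first run: process the raw data and save the result
#     dataset = load_tgredial(opt, "pkuseg", save=True)
#   later runs: reuse the saved, processed dataset
#     dataset = load_tgredial(opt, "pkuseg", restore=True)
#   then read dataset.train_data, dataset.valid_data, dataset.test_data
#   and dataset.vocab
```

Saving on the first run and restoring afterwards avoids repeating the processing step each time the dataset is loaded.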

Module contents