
Alex Context NLG Dataset

Please use the following text to cite this item:
Dušek, Ondřej and Jurčíček, Filip, 2016, Alex Context NLG Dataset, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1675.
Date issued
2016-04-05
Size
1859 entries, 5577 sentences
Language(s)
English
Description
A dataset intended for fully trainable natural language generation (NLG) systems in task-oriented spoken dialogue systems (SDS), covering the English public transport information domain. Each data instance (a pair consisting of a source meaning representation and the target natural language paraphrase to be generated) is accompanied by its preceding context, i.e., the user utterance. Taking the form of the previous user utterance into account when generating the system response allows NLG systems trained on this dataset to entrain (adapt) to the preceding utterance, i.e., to reuse its wording and syntactic structure. This is expected to improve the perceived naturalness of the output and may even lead to a higher task success rate. Crowdsourcing was used to obtain both the natural context user utterances and the natural system responses to be generated.
Files in this item
Name
README.md
Size
11.38 KB
Format
application/octet-stream
Description
Dataset description and documentation
MD5
2e8449b98200579becfd6d46b401741c
Name
dataset.csv
Size
1.12 MB
Format
application/octet-stream
Description
The dataset in CSV format
MD5
81f85c82b5ca7f5e23605face62fd5fd
Name
dataset.json
Size
1.77 MB
Format
application/octet-stream
Description
The dataset in JSON format
MD5
552ea0396c3184a74588b2c151b73bef
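The MD5 values listed for the three files can be used to check that a download is intact. The following is a minimal sketch, assuming the files were saved under their listed names in a single local directory; the helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the dataset distribution.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "README.md": "2e8449b98200579becfd6d46b401741c",
    "dataset.csv": "81f85c82b5ca7f5e23605face62fd5fd",
    "dataset.json": "552ea0396c3184a74588b2c151b73bef",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches.

    Assumption: the files keep their listed names inside `directory`.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `verify_downloads("path/to/download")` returns a dictionary with one boolean per file; a `False` entry means the file is missing or its checksum differs from the listed one.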