dc.contributor.author | Dušek, Ondřej |
dc.contributor.author | Jurčíček, Filip |
dc.date.accessioned | 2016-04-05T12:02:13Z |
dc.date.available | 2016-04-05T12:02:13Z |
dc.date.issued | 2016-04-05 |
dc.identifier.uri | http://hdl.handle.net/11234/1-1675 |
dc.description | A dataset for fully trainable natural language generation (NLG) systems in task-oriented spoken dialogue systems (SDS), covering the English public transport information domain. Each data instance (a pair of a source meaning representation and a target natural language paraphrase to be generated) is accompanied by its preceding context (the user utterance). Taking the form of the previous user utterance into account when generating the system response allows NLG systems trained on this dataset to entrain (adapt) to that utterance, i.e., to reuse its wording and syntactic structure. This is expected to improve the perceived naturalness of the output and may even lead to a higher task success rate. Crowdsourcing was used to obtain both natural context user utterances and natural system responses to be generated. |
dc.language.iso | eng |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ |
dc.source.uri | https://github.com/UFAL-DSG/alex_context_nlg_dataset |
dc.subject | dialogue system |
dc.subject | natural language generation |
dc.subject | dialogue alignment |
dc.subject | entrainment |
dc.title | Alex Context NLG Dataset |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Ondřej Dušek odusek@ufal.mff.cuni.cz Charles University in Prague, UFAL |
sponsor | Grantová agentura Univerzity Karlovy v Praze GAUK 2058214 Adaptivní generátor přirozeného jazyka nationalFunds |
sponsor | Ministerstvo školství, mládeže a tělovýchovy České republiky LK11221 Vývoj metod pro návrh statistických mluvených dialogových systémů nationalFunds |
sponsor | Univerzita Karlova v Praze (mimo GAUK) SVV 260 333 Specifický vysokoškolský výzkum nationalFunds |
size.info | 1859 entries |
size.info | 5577 sentences |
files.size | 3042834 |
files.count | 3 |
Files in this item
Download all files of this item (2.9 MB)
Licence category:
Licence: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Publicly Available
Name | Size | Format | Description | MD5 |
README.md | 11.38 KB | Unknown | Dataset description and documentation | 2e8449b98200579becfd6d46b401741c |
dataset.csv | 1.12 MB | Unknown | The dataset in CSV format | 81f85c82b5ca7f5e23605face62fd5fd |
dataset.json | 1.77 MB | Unknown | The dataset in JSON format | 552ea0396c3184a74588b2c151b73bef |
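
The MD5 checksums listed for the three files can be used to check that a download is intact. The following is a minimal sketch, assuming the files were saved under their listed names in a single local directory; the helper names `md5_of` and `verify_downloads` are hypothetical and not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "README.md": "2e8449b98200579becfd6d46b401741c",
    "dataset.csv": "81f85c82b5ca7f5e23605face62fd5fd",
    "dataset.json": "552ea0396c3184a74588b2c151b73bef",
}


def md5_of(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed local location of the download
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        label = {True: "OK", False: "MISMATCH", None: "missing"}[status]
        print(f"{name}: {label}")
```

Run from the directory holding the downloaded files, the script prints one status line per file. A mismatch usually indicates an incomplete or altered download.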