ATCC: Pronunciation lexicon and n-gram counts for ASR module
Please use the following text to cite this item:
Šmídl, Luboš, 2013,
ATCC: Pronunciation lexicon and n-gram counts for ASR module, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-000D-EC92-F.
Authors
Šmídl, Luboš
Item identifier
http://hdl.handle.net/11858/00-097C-0000-000D-EC92-F
Date issued
2013-01-01
Size
236500 other
Language(s)
Description
The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used to construct a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0).
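As an illustration of how n-gram counts feed a language model, the following minimal sketch turns unigram and bigram counts into unsmoothed conditional probabilities. The line format (`w1 w2 ... count`, whitespace-separated, count last) is an assumption, because no preview of the count files is available on this page, and the toy counts below are invented for the example, not taken from the corpus. A practical model would also need smoothing and would normally be built with a dedicated toolkit.

```python
def read_counts(lines):
    """Parse count lines of the ASSUMED form 'w1 ... wn <count>'."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        *words, count = fields
        counts[tuple(words)] = int(count)
    return counts


def bigram_mle(unigrams, bigrams):
    """Unsmoothed estimate P(w2 | w1) = c(w1, w2) / c(w1)."""
    probs = {}
    for (w1, w2), count in bigrams.items():
        history = unigrams.get((w1,), 0)
        if history > 0:
            probs[(w1, w2)] = count / history
    return probs


# Invented toy counts, used only to exercise the functions.
unigram_lines = ["descend\t4", "climb\t6"]
bigram_lines = ["descend to\t3", "descend and\t1", "climb to\t6"]

unigrams = read_counts(unigram_lines)
bigrams = read_counts(bigram_lines)
probs = bigram_mle(unigrams, bigrams)
print(probs[("descend", "to")])  # 0.75
```

To use the real files, the lists of strings would be replaced by open file handles, after checking that the actual layout matches the assumed one.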
Acknowledgement
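Technology Agency of the Czech Republic (Technologická agentura České republiky)
Project code: TA01030476
Project name: Intelligent technologies for improving air traffic safety (Inteligentní technologie pro zvýšení bezpečnosti letového provozu)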
Subject(s)
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name
- v01_dict_mix.txt
- Size
- 169.64 KB
- Format
- text/plain
- Description
- Pronunciation lexicon
- MD5
- 8de3a9ec38eda5c1ad1624c9e61de967

&. d eh s ih m l
&0 z ia r ow
&0 z ih r ow
&0 z iy r ow
&1 w ah n
&1+ w ah n
&10 t eh n
&100 hh ah n d r ah d
&100 hh ah n d r ih d
&1000 th aw z n d
&11 ih l eh v n
&12 t w eh l v
&13 th er t iy n
&15 f ih f t iy n
&1500 f ih f t iy n hh ah n d r ah d
&1500 w ah n th aw z n d f ay v hh ah n d r ah d
&1500 w ah n th aw z n d f ay v hh ah n d r ih d
&18 ey t iy n
&19 n ay n t iy n
&2 t uw
&200 t uw hh ah n d r ah d
&200 t uw hh ah n d r ih d
&2000 t uw th aw z n d
&21 t w eh n t iy w ah n
&24 t w eh n t iy f ao r
&27 t w eh n t iy s eh v n
&28 t w eh n t iy ey t
&3 th r iy
&30 th er t iy
&300 th r iy hh ah n d r ah d
&300 th r iy hh ah n d r ih d
&31 th er t iy w ah n
&4 f ao
&4 f ao r
&4 f ow r
&5 f ay v
&50 f ih f t iy
&500 f ay v hh ah n d r ah d
&500 f ay v hh ah n d r ih d
&6 s ih k s
&60 s ih k s t iy
&7 s eh v n
&70 s eh v n t iy
&700 s eh v n hh ah n d r ah d
&700 s eh v n hh ah n d r ih d
&8 ey t
&80 ey t iy
&9 n ay n ah r
&9+ n ay n
&9+ n ay n ah r
((+4)) f aa r
((+4)) f ow r
((+7)) s eh v ah n
((+7)) s eh v eh n
((+7)) s eh v ih n
((+9)) n ay n
((+a)) aa
((+A)) aa
((+�)) aa
((+a)) ah
((+A)) ah
((+a)) ey
((+A)) ey
((+a)) ia
((+A)) ia
((+act)) ae k t
((+adar)) ah d aa r
((+adar)) ey d aa r
((+ague)) aa g
((+aha)) aa hh aa
((+�ha)) aa hh ah
((+aha)) ah hh ah
((+aining)) ey n ih ng
((+aining)) ih n ih ng
((+air)) ea
((+air)) ea r
((+air)) eh r
((+air)) ey r
((+aj)) aa jh
((+aj)) ae jh
((+aj)) ah jh
((+aj)) ay
((+al)) ah l
((+amrock)) ae m r aa k
((+amrock)) ae m r oh k
((+an)) aa n
((+an)) ae n
((+an)) ah n
((+and)) ae n d
((+and)) ah n d
((+andinavian)) ae n d ih n ey v ia n
((+andinavian)) ae n d ih n ey v iy ah n
((+andinavian)) ae n d ih n ey v iy ih n
((+anding)) ae n d ih ng
((+anding)) ah n d ih ng
((+anex)) aa n eh k s
((+anex)) ah n eh k s
((+anex)) ey n eh k s
((+ank)) ae ng k
((+ank)) ah ng k
((+ansa)) aa n s ah
((+ansa)) aa n z ah
((+ansa)) ae n s ah
((+ansa)) ae n z ah
((+ansa)) ah n s ah
((+anway)) ae n w ey
((+anway)) ah n w ey
((+ar)) aa . . .
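The lexicon preview above pairs each word token with a sequence of phone symbols, and the same token can appear with several pronunciations. A minimal sketch of loading such a lexicon into a dictionary might look as follows. It assumes one entry per line with whitespace-separated fields, which the flattened preview suggests but does not confirm; the function name `parse_lexicon` is hypothetical. The sample lines are copied from the preview.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Map each word token to its list of pronunciations.

    ASSUMED format: '<word> <phone> <phone> ...', one entry per line.
    """
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without phones
        word, phones = fields[0], fields[1:]
        if phones not in lexicon[word]:  # drop exact duplicate variants
            lexicon[word].append(phones)
    return dict(lexicon)


# Entries taken from the preview of v01_dict_mix.txt.
sample = [
    "&4 f ao",
    "&4 f ao r",
    "&4 f ow r",
    "&9+ n ay n",
    "((+air)) ea r",
]
lexicon = parse_lexicon(sample)
print(len(lexicon["&4"]))  # 3 pronunciation variants
```

Keeping every variant as a separate list preserves the pronunciation alternatives that a decoder can choose between.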
- Name
- v01_words_mix.1gram.counts
- Size
- 52.17 KB
- Format
- application/octet-stream
- Description
- Unigram counts (non-speech events included)
- MD5
- fe04b2e566718c79733c924146089b1d

- Name
- v01_words_mix.1gram.no-nse.counts
- Size
- 52.13 KB
- Format
- application/octet-stream
- Description
- Unigram counts (non-speech events removed)
- MD5
- 4e7cf3fd766462eae4fc892a63f161ef

- Name
- v01_words_mix.2gram.counts
- Size
- 718.53 KB
- Format
- application/octet-stream
- Description
- Bigram counts (non-speech events included)
- MD5
- 32c5c56c74fe3b72d263c72f1e1dd035

- Name
- v01_words_mix.2gram.no-nse.counts
- Size
- 683.03 KB
- Format
- application/octet-stream
- Description
- Bigram counts (non-speech events removed)
- MD5
- 587c12207b27f1d3a844c5d874d30990

- Name
- v01_words_mix.3gram.counts
- Size
- 3.08 MB
- Format
- application/octet-stream
- Description
- Trigram counts (non-speech events included)
- MD5
- 674daa0835b58c7d885045d763331658

- Name
- v01_words_mix.3gram.no-nse.counts
- Size
- 2.81 MB
- Format
- application/octet-stream
- Description
- Trigram counts (non-speech events removed)
- MD5
- 6ccbdc77f93b9925b1716d7e22402736
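
The MD5 checksums listed for each file can be used to check downloads for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` and the directory argument are hypothetical, while the file names and checksums are copied from the listing above.

```python
import hashlib
import os

# File names and MD5 checksums as listed on this page.
EXPECTED_MD5 = {
    "v01_dict_mix.txt": "8de3a9ec38eda5c1ad1624c9e61de967",
    "v01_words_mix.1gram.counts": "fe04b2e566718c79733c924146089b1d",
    "v01_words_mix.1gram.no-nse.counts": "4e7cf3fd766462eae4fc892a63f161ef",
    "v01_words_mix.2gram.counts": "32c5c56c74fe3b72d263c72f1e1dd035",
    "v01_words_mix.2gram.no-nse.counts": "587c12207b27f1d3a844c5d874d30990",
    "v01_words_mix.3gram.counts": "674daa0835b58c7d885045d763331658",
    "v01_words_mix.3gram.no-nse.counts": "6ccbdc77f93b9925b1716d7e22402736",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Return {name: True/False/None}; None means the file is missing."""
    results = {}
    for name, checksum in expected.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            results[name] = None
        else:
            results[name] = md5_of_file(path) == checksum
    return results
```

Calling `verify(".")` in the download directory returns `True` for each file whose checksum matches, `False` for a mismatch, and `None` for a file that is not present.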


