
ATCC: Pronunciation lexicon and n-gram counts for ASR module

Please use the following text to cite this item:
Šmídl, Luboš, 2013, ATCC: Pronunciation lexicon and n-gram counts for ASR module, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-000D-EC92-F.
Date issued
2013-01-01
Size
236500 other
Language(s)
Description
The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used to construct a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0).
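As a rough illustration of how n-gram counts feed a language model, the sketch below computes unsmoothed maximum-likelihood bigram probabilities, P(w2 | w1) = count(w1 w2) / count(w1). The on-disk layout of the `.counts` files is not shown in this record, so the sketch works on in-memory dictionaries; the function name `bigram_mle` and the toy counts are hypothetical and are not taken from the corpus. A practical model would add smoothing or back-off.

```python
def bigram_mle(unigram_counts, bigram_counts, w1, w2):
    """Return the maximum-likelihood estimate of P(w2 | w1).

    unigram_counts maps a word to its count; bigram_counts maps a
    (w1, w2) tuple to its count. Unseen histories or pairs yield 0.0.
    """
    history = unigram_counts.get(w1, 0)
    if history == 0:
        return 0.0
    return bigram_counts.get((w1, w2), 0) / history


# Toy counts invented for illustration only (NOT from the ATCC files).
toy_unigrams = {"climb": 6, "descend": 4, "to": 10}
toy_bigrams = {("climb", "to"): 3, ("descend", "to"): 4}

p_climb_to = bigram_mle(toy_unigrams, toy_bigrams, "climb", "to")
p_unseen = bigram_mle(toy_unigrams, toy_bigrams, "to", "climb")
print(p_climb_to, p_unseen)
```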
Acknowledgement
This item is Publicly Available
and licensed under:
 Files in this item
Name
v01_dict_mix.txt
Size
169.64 KB
Format
text/plain
Description
Pronunciation lexicon
MD5
8de3a9ec38eda5c1ad1624c9e61de967
Preview
  File Preview
    &.	d eh s ih m l
    &0	z ia r ow
    &0	z ih r ow
    &0	z iy r ow
    &1	w ah n
    &1+	w ah n
    &10	t eh n
    &100	hh ah n d r ah d
    &100	hh ah n d r ih d
    &1000	th aw z n d
    &11	ih l eh v n
    &12	t w eh l v
    &13	th er t iy n
    &15	f ih f t iy n
    &1500	f ih f t iy n hh ah n d r ah d
    &1500	w ah n th aw z n d f ay v hh ah n d r ah d
    &1500	w ah n th aw z n d f ay v hh ah n d r ih d
    &18	ey t iy n
    &19	n ay n t iy n
    &2	t uw
    &200	t uw hh ah n d r ah d
    &200	t uw hh ah n d r ih d
    &2000	t uw th aw z n d
    &21	t w eh n t iy w ah n
    &24	t w eh n t iy f ao r
    &27	t w eh n t iy s eh v n
    &28	t w eh n t iy ey t
    &3	th r iy
    &30	th er t iy
    &300	th r iy hh ah n d r ah d
    &300	th r iy hh ah n d r ih d
    &31	th er t iy w ah n
    &4	f ao
    &4	f ao r
    &4	f ow r
    &5	f ay v
    &50	f ih f t iy
    &500	f ay v hh ah n d r ah d
    &500	f ay v hh ah n d r ih d
    &6	s ih k s
    &60	s ih k s t iy
    &7	s eh v n
    &70	s eh v n t iy
    &700	s eh v n hh ah n d r ah d
    &700	s eh v n hh ah n d r ih d
    &8	ey t
    &80	ey t iy
    &9	n ay n ah r
    &9+	n ay n
    &9+	n ay n ah r
    ((+4))	f aa r
    ((+4))	f ow r
    ((+7))	s eh v ah n
    ((+7))	s eh v eh n
    ((+7))	s eh v ih n
    ((+9))	n ay n
    ((+a))	aa
    ((+A))	aa
    ((+�))	aa
    ((+a))	ah
    ((+A))	ah
    ((+a))	ey
    ((+A))	ey
    ((+a))	ia
    ((+A))	ia
    ((+act))	ae k t
    ((+adar))	ah d aa r
    ((+adar))	ey d aa r
    ((+ague))	aa g
    ((+aha))	aa hh aa
    ((+�ha))	aa hh ah
    ((+aha))	ah hh ah
    ((+aining))	ey n ih ng
    ((+aining))	ih n ih ng
    ((+air))	ea
    ((+air))	ea r
    ((+air))	eh r
    ((+air))	ey r
    ((+aj))	aa jh
    ((+aj))	ae jh
    ((+aj))	ah jh
    ((+aj))	ay
    ((+al))	ah l
    ((+amrock))	ae m r aa k
    ((+amrock))	ae m r oh k
    ((+an))	aa n
    ((+an))	ae n
    ((+an))	ah n
    ((+and))	ae n d
    ((+and))	ah n d
    ((+andinavian))	ae n d ih n ey v ia n
    ((+andinavian))	ae n d ih n ey v iy ah n
    ((+andinavian))	ae n d ih n ey v iy ih n
    ((+anding))	ae n d ih ng
    ((+anding))	ah n d ih ng
    ((+anex))	aa n eh k s
    ((+anex))	ah n eh k s
    ((+anex))	ey n eh k s
    ((+ank))	ae ng k
    ((+ank))	ah ng k
    ((+ansa))	aa n s ah
    ((+ansa))	aa n z ah
    ((+ansa))	ae n s ah
    ((+ansa))	ae n z ah
    ((+ansa))	ah n s ah
    ((+anway))	ae n w ey
    ((+anway))	ah n w ey
    ((+ar))	aa  . . .
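Judging from the preview above, each lexicon line holds an entry, a tab, and a space-separated phone sequence, and an entry may appear on several lines with different pronunciations. A minimal sketch for reading such lines into a dictionary of variants follows; the function name `parse_lexicon` is hypothetical, and the tab-separated layout is an assumption based only on the preview.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Map each lexicon entry to a list of its pronunciation variants."""
    lexicon = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<entry><TAB><space-separated phones>".
        word, sep, phones = line.partition("\t")
        if not sep:
            continue  # no tab found, so this is not a lexicon entry
        lexicon[word].append(phones.split())
    return dict(lexicon)


# A few lines copied from the preview of v01_dict_mix.txt.
sample = [
    "&0\tz ia r ow",
    "&0\tz ih r ow",
    "&0\tz iy r ow",
    "&4\tf ao",
    "&4\tf ao r",
    "&4\tf ow r",
]
lexicon = parse_lexicon(sample)
print(lexicon["&4"])
```

To read the real file, pass an open file object to `parse_lexicon`. The file's character encoding is not stated in this record, so opening it with `errors="replace"` is a cautious default.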
Name
v01_words_mix.1gram.counts
Size
52.17 KB
Format
application/octet-stream
Description
Unigram counts (non-speech events included)
MD5
fe04b2e566718c79733c924146089b1d
Name
v01_words_mix.1gram.no-nse.counts
Size
52.13 KB
Format
application/octet-stream
Description
Unigram counts (non-speech events removed)
MD5
4e7cf3fd766462eae4fc892a63f161ef
Name
v01_words_mix.2gram.counts
Size
718.53 KB
Format
application/octet-stream
Description
Bigram counts (non-speech events included)
MD5
32c5c56c74fe3b72d263c72f1e1dd035
Name
v01_words_mix.2gram.no-nse.counts
Size
683.03 KB
Format
application/octet-stream
Description
Bigram counts (non-speech events removed)
MD5
587c12207b27f1d3a844c5d874d30990
Name
v01_words_mix.3gram.counts
Size
3.08 MB
Format
application/octet-stream
Description
Trigram counts (non-speech events included)
MD5
674daa0835b58c7d885045d763331658
Name
v01_words_mix.3gram.no-nse.counts
Size
2.81 MB
Format
application/octet-stream
Description
Trigram counts (non-speech events removed)
MD5
6ccbdc77f93b9925b1716d7e22402736
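The MD5 values listed for each file can be used to check downloaded copies. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_item` are hypothetical, and it assumes the files sit in one local directory under the names shown in this listing.

```python
import hashlib
import os

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "v01_dict_mix.txt": "8de3a9ec38eda5c1ad1624c9e61de967",
    "v01_words_mix.1gram.counts": "fe04b2e566718c79733c924146089b1d",
    "v01_words_mix.1gram.no-nse.counts": "4e7cf3fd766462eae4fc892a63f161ef",
    "v01_words_mix.2gram.counts": "32c5c56c74fe3b72d263c72f1e1dd035",
    "v01_words_mix.2gram.no-nse.counts": "587c12207b27f1d3a844c5d874d30990",
    "v01_words_mix.3gram.counts": "674daa0835b58c7d885045d763331658",
    "v01_words_mix.3gram.no-nse.counts": "6ccbdc77f93b9925b1716d7e22402736",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_item(directory):
    """Map each listed file name to True if its MD5 matches the listing.

    Files that are missing from the directory are reported as False.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of_file(path) == expected
    return results
```

Calling `verify_item(".")` in the download directory returns a dictionary of per-file results; any `False` entry indicates a missing or altered file.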