dc.contributor.author | Šmídl, Luboš |
dc.date.accessioned | 2013-01-01T14:56:06Z |
dc.date.available | 2013-01-01T14:56:06Z |
dc.date.issued | 2013-01-01 |
dc.identifier | ZCU_CZ_ ATCC-LM4ASR |
dc.identifier.uri | http://hdl.handle.net/11858/00-097C-0000-000D-EC92-F |
dc.description | The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams, and trigrams) that can be used to construct a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0). |
dc.description.sponsorship | Technology Agency of the Czech Republic, project No. TA01030476 |
dc.language.iso | eng |
dc.publisher | University of West Bohemia, Department of Cybernetics |
dc.rights | Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/ |
dc.subject | pronunciation lexicon |
dc.subject | n-gram counts |
dc.subject | language model |
dc.title | ATCC: Pronunciation lexicon and n-gram counts for ASR module |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Šmídl |
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Luboš |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | University of West Bohemia |
metashare.ResourceInfo#DistributionInfo.availability | restrictedUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | academic-nonCommercialUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | attribution |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium | downloadable |
metashare.ResourceInfo#ValidationInfo.validated | True |
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | Inteligentní technologie pro zvýšení bezpečnosti letového provozu (Intelligent Technologies for Increasing Air Traffic Safety) |
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.fundingType | nationalFunds |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#TextInfo#SizeInfo.size | 236500 |
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | other |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | ircing@kky.zcu.cz |
metashare.ResourceInfo#ContentInfo.detailedType | other |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
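sponsor | Technologická agentura České republiky (Technology Agency of the Czech Republic), TA01030476, Inteligentní technologie pro zvýšení bezpečnosti letového provozu (Intelligent Technologies for Increasing Air Traffic Safety), nationalFunds |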
size.info | 236500 other |
files.size | 7896750 |
files.count | 7 |
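The description notes that the n-gram counts can be used to construct a language model. As a minimal sketch, assuming each line of a `*.counts` file lists the n-gram tokens followed by a count, all whitespace-separated (the record gives the format only as "Unknown", so this layout is an assumption), the Python below computes unsmoothed maximum-likelihood bigram probabilities P(w2 | w1) = c(w1 w2) / Σ_w c(w1 w). The names `read_counts` and `bigram_mle` and the sample tokens are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Ngram = Tuple[str, ...]


def read_counts(lines: Iterable[str], order: int) -> Dict[Ngram, float]:
    """Parse 'tok_1 ... tok_n count' lines (assumed layout) into a dict."""
    counts: Dict[Ngram, float] = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) != order + 1:
            continue  # skip blank lines and n-grams of another order
        counts[tuple(fields[:-1])] += float(fields[-1])
    return dict(counts)


def bigram_mle(bigrams: Dict[Ngram, float]) -> Dict[Ngram, float]:
    """Unsmoothed P(w2 | w1) = c(w1 w2) / sum over w of c(w1 w)."""
    history_totals: Dict[str, float] = defaultdict(float)
    for (w1, _w2), count in bigrams.items():
        history_totals[w1] += count
    return {
        (w1, w2): count / history_totals[w1]
        for (w1, w2), count in bigrams.items()
        if history_totals[w1] > 0  # guard against all-zero histories
    }


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the corpus.
    sample = ["cleared to 3", "cleared for 1", "to land 2"]
    print(bigram_mle(read_counts(sample, order=2)))
```

To apply it to the corpus, pass the opened `v01_words_mix.2gram.counts` (assumed to be UTF-8 text in the layout above) as `lines` with `order=2`. An unsmoothed estimate gives zero probability to unseen bigrams, so a model for an ASR module would normally be built with a smoothing method such as Kneser-Ney, for example with a toolkit like SRILM.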
Files in this item (7 files, 7.53 MB in total)
This item is publicly available and licensed under Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0).
Name | Size | Format | Description | MD5 |
v01_dict_mix.txt | 169.64 KB | Text file | Pronunciation lexicon | 8de3a9ec38eda5c1ad1624c9e61de967 |
v01_words_mix.1gram.counts | 52.17 KB | Unknown | Unigram counts (non-speech events included) | fe04b2e566718c79733c924146089b1d |
v01_words_mix.1gram.no-nse.counts | 52.13 KB | Unknown | Unigram counts (non-speech events removed) | 4e7cf3fd766462eae4fc892a63f161ef |
v01_words_mix.2gram.counts | 718.53 KB | Unknown | Bigram counts (non-speech events included) | 32c5c56c74fe3b72d263c72f1e1dd035 |
v01_words_mix.2gram.no-nse.counts | 683.03 KB | Unknown | Bigram counts (non-speech events removed) | 587c12207b27f1d3a844c5d874d30990 |
v01_words_mix.3gram.counts | 3.08 MB | Unknown | Trigram counts (non-speech events included) | 674daa0835b58c7d885045d763331658 |
v01_words_mix.3gram.no-nse.counts | 2.81 MB | Unknown | Trigram counts (non-speech events removed) | 6ccbdc77f93b9925b1716d7e22402736 |

Preview of v01_dict_mix.txt (truncated):
&. d eh s ih m l
&0 z ia r ow
&0 z ih r ow
&0 z iy r ow
&1 w ah n
&1+ w ah n
&10 t eh n
&100 hh ah n d r ah d
&100 hh ah n d r ih d
&1000 th aw z n d
&11 ih l eh v n
&12 t w eh l v
&13 th er t iy n
&15 f ih f t iy n
&1500 f ih f t iy n hh ah n d r ah d
&1500 w ah n th aw z n d f ay v hh ah n d r ah d
&1500 w ah n th aw z n d f ay v hh ah n d r ih d
&18 ey t iy n
&19 n ay n t iy n
&2 t uw
&200 t uw hh ah n d r ah d
&200 t uw hh ah n d r ih d
&2000 t uw th aw z n d
&21 t w eh n t iy w ah n
&24 t w eh n t iy f ao r
&27 t w eh n t iy s eh v n
&28 t w eh n t iy ey t
&3 th r iy
&30 th er t iy
&300 th r iy hh ah n d r ah d
&300 th r iy hh ah n d r ih d
&31 th er t iy w ah n
&4 f ao
&4 f ao r
&4 f ow r
&5 f ay v
&50 f ih f t iy
&500 f ay v hh ah n d r ah d
&500 f ay v hh ah n d r ih d
&6 s ih k s
&60 s ih k s t iy
&7 s eh v n
&70 s eh v n t iy
&700 s eh v n hh ah n d r ah d
&700 s eh v n hh ah n d r ih d
&8 ey t
&80 ey t iy
&9 n ay n ah r
&9+ n a ...
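The preview of `v01_dict_mix.txt` in the file listing suggests one entry per line: a word followed by its phones, with a word repeated on separate lines for pronunciation variants (`&4`, for instance, has three). As a sketch under that assumption (the layout is inferred from the flattened preview, not documented in this record), the Python below groups the variants per word and collects the phone inventory. `load_lexicon` and `phone_inventory` are hypothetical names, and the `&` prefix is kept verbatim as part of the word because its meaning is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pronunciation = Tuple[str, ...]


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[Pronunciation]]:
    """Map each word to its distinct pronunciation variants, in file order.

    Assumed layout: 'word phone_1 phone_2 ...' per line, whitespace-separated.
    """
    lexicon: Dict[str, List[Pronunciation]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without phones
        word, phones = fields[0], tuple(fields[1:])
        if phones not in lexicon[word]:  # drop exact duplicate variants
            lexicon[word].append(phones)
    return dict(lexicon)


def phone_inventory(lexicon: Dict[str, List[Pronunciation]]) -> Set[str]:
    """Collect every phone symbol used anywhere in the lexicon."""
    return {phone for variants in lexicon.values() for pron in variants for phone in pron}


if __name__ == "__main__":
    # Entries copied from the lexicon preview in the file listing.
    preview = ["&4 f ao", "&4 f ao r", "&4 f ow r", "&9 n ay n ah r"]
    lex = load_lexicon(preview)
    print(lex["&4"])
    print(sorted(phone_inventory(lex)))
```

Comparing the phone inventory against the phone set of an acoustic model is a quick way to spot symbols that the model would not recognize.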
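The MD5 values in the file listing allow a downloaded copy to be checked. A minimal sketch using Python's standard `hashlib`, assuming the seven files keep their original names and sit in one directory; `EXPECTED_MD5`, `md5_of`, and `verify` are illustrative names, and the checksums are copied from this record.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Checksums as listed in this record's file listing.
EXPECTED_MD5: Dict[str, str] = {
    "v01_dict_mix.txt": "8de3a9ec38eda5c1ad1624c9e61de967",
    "v01_words_mix.1gram.counts": "fe04b2e566718c79733c924146089b1d",
    "v01_words_mix.1gram.no-nse.counts": "4e7cf3fd766462eae4fc892a63f161ef",
    "v01_words_mix.2gram.counts": "32c5c56c74fe3b72d263c72f1e1dd035",
    "v01_words_mix.2gram.no-nse.counts": "587c12207b27f1d3a844c5d874d30990",
    "v01_words_mix.3gram.counts": "674daa0835b58c7d885045d763331658",
    "v01_words_mix.3gram.no-nse.counts": "6ccbdc77f93b9925b1716d7e22402736",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> Dict[str, Optional[bool]]:
    """True = checksum matches, False = mismatch, None = file not found."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, ok in verify(target).items():
        print(f"{name}: {'missing' if ok is None else 'OK' if ok else 'MISMATCH'}")
```

Run it with the download directory as the only argument (it defaults to the current directory); each file is reported as OK, MISMATCH, or missing.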