dc.contributor.author | Šmídl, Luboš |
dc.date.accessioned | 2013-01-01T14:56:06Z |
dc.date.available | 2013-01-01T14:56:06Z |
dc.date.issued | 2013-01-01 |
dc.identifier | ZCU_CZ_ ATCC-LM4ASR |
dc.identifier.uri | http://hdl.handle.net/11858/00-097C-0000-000D-EC92-F |
dc.description | The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used to construct a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0). |
dc.description.sponsorship | Technology Agency of the Czech Republic, project No. TA01030476 |
dc.language.iso | eng |
dc.publisher | University of West Bohemia, Department of Cybernetics |
dc.rights | Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/ |
dc.subject | pronunciation lexicon |
dc.subject | n-gram counts |
dc.subject | language model |
dc.title | ATCC: Pronunciation lexicon and n-gram counts for ASR module |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Šmídl |
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Luboš |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | University of West Bohemia |
metashare.ResourceInfo#DistributionInfo.availability | restrictedUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | academic-nonCommercialUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | attribution |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium | downloadable |
metashare.ResourceInfo#ValidationInfo.validated | True |
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | Inteligentní technologie pro zvýšení bezpečnosti letového provozu |
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.fundingType | nationalFunds |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#TextInfo#SizeInfo.size | 236500 |
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | other |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | ircing@kky.zcu.cz |
metashare.ResourceInfo#ContentInfo.detailedType | other |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
sponsor | Technology Agency of the Czech Republic (Technologická agentura České republiky), TA01030476, Inteligentní technologie pro zvýšení bezpečnosti letového provozu, nationalFunds |
size.info | 236500 other |
files.size | 7896750 |
files.count | 7 |
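The description says the n-gram counts can be used to construct a language model. As a minimal sketch of the simplest such estimate, the Python below reads trigram counts and computes unsmoothed maximum-likelihood probabilities, P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2 *). It assumes each line of a `.counts` file holds the n-gram tokens followed by an integer count, separated by whitespace (similar to SRILM count files); the record lists the file format as unknown, so inspect the files before relying on this. The function names and the toy counts are hypothetical and are not part of the resource.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

NGram = Tuple[str, ...]


def parse_counts(lines: Iterable[str]) -> Dict[NGram, int]:
    """Parse lines assumed to look like 'w1 ... wn<whitespace>count'."""
    counts: Dict[NGram, int] = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        *tokens, count = fields
        counts[tuple(tokens)] += int(count)
    return dict(counts)


def history_totals(trigrams: Dict[NGram, int]) -> Dict[NGram, int]:
    """Sum trigram counts over the last word, giving c(w1 w2 *)."""
    totals: Dict[NGram, int] = defaultdict(int)
    for (w1, w2, _w3), count in trigrams.items():
        totals[(w1, w2)] += count
    return dict(totals)


def mle_prob(trigrams: Dict[NGram, int], totals: Dict[NGram, int],
             w1: str, w2: str, w3: str) -> float:
    """Unsmoothed MLE: P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2 *)."""
    denom = totals.get((w1, w2), 0)
    if denom == 0:
        return 0.0  # unseen history; a real LM would back off or smooth
    return trigrams.get((w1, w2, w3), 0) / denom


# Invented toy counts for illustration only (not taken from the corpus).
SAMPLE = [
    "cleared to land\t3",
    "cleared to descend\t1",
    "contact tower one\t2",
]

trigrams = parse_counts(SAMPLE)
totals = history_totals(trigrams)
print(mle_prob(trigrams, totals, "cleared", "to", "land"))  # 0.75
```

The raw estimate gives zero probability to any unseen trigram, so a usable model would add smoothing and back-off (for example Kneser-Ney), normally through a language-modelling toolkit rather than hand-written code.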
Files in this item
Download all files in this item (7.53 MB)
License category:
Licence: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
Publicly Available
- Name
- v01_dict_mix.txt
- Size
- 169.64 KB
- Format
- Text file
- Description
- Pronunciation lexicon
- MD5
- 8de3a9ec38eda5c1ad1624c9e61de967
Preview (truncated):
&. d eh s ih m l
&0 z ia r ow
&0 z ih r ow
&0 z iy r ow
&1 w ah n
&1+ w ah n
&10 t eh n
&100 hh ah n d r ah d
&100 hh ah n d r ih d
&1000 th aw z n d
&11 ih l eh v n
&12 t w eh l v
&13 th er t iy n
&15 f ih f t iy n
&1500 f ih f t iy n hh ah n d r ah d
&1500 w ah n th aw z n d f ay v hh ah n d r ah d
&1500 w ah n th aw z n d f ay v hh ah n d r ih d
&18 ey t iy n
&19 n ay n t iy n
&2 t uw
&200 t uw hh ah n d r ah d
&200 t uw hh ah n d r ih d
&2000 t uw th aw z n d
&21 t w eh n t iy w ah n
&24 t w eh n t iy f ao r
&27 t w eh n t iy s eh v n
&28 t w eh n t iy ey t
&3 th r iy
&30 th er t iy
&300 th r iy hh ah n d r ah d
&300 th r iy hh ah n d r ih d
&31 th er t iy w ah n
&4 f ao
&4 f ao r
&4 f ow r
&5 f ay v
&50 f ih f t iy
&500 f ay v hh ah n d r ah d
&500 f ay v hh ah n d r ih d
&6 s ih k s
&60 s ih k s t iy
&7 s eh v n
&70 s eh v n t iy
&700 s eh v n hh ah n d r ah d
&700 s eh v n hh ah n d r ih d
&8 ey t
&80 ey t iy
&9 n ay n ah r
&9+ n a . . .
- Name
- v01_words_mix.1gram.counts
- Size
- 52.17 KB
- Format
- Unknown
- Description
- Unigram counts (non-speech events included)
- MD5
- fe04b2e566718c79733c924146089b1d
- Name
- v01_words_mix.1gram.no-nse.counts
- Size
- 52.13 KB
- Format
- Unknown
- Description
- Unigram counts (non-speech events removed)
- MD5
- 4e7cf3fd766462eae4fc892a63f161ef
- Name
- v01_words_mix.2gram.counts
- Size
- 718.53 KB
- Format
- Unknown
- Description
- Bigram counts (non-speech events included)
- MD5
- 32c5c56c74fe3b72d263c72f1e1dd035
- Name
- v01_words_mix.2gram.no-nse.counts
- Size
- 683.03 KB
- Format
- Unknown
- Description
- Bigram counts (non-speech events removed)
- MD5
- 587c12207b27f1d3a844c5d874d30990
- Name
- v01_words_mix.3gram.counts
- Size
- 3.08 MB
- Format
- Unknown
- Description
- Trigram counts (non-speech events included)
- MD5
- 674daa0835b58c7d885045d763331658
- Name
- v01_words_mix.3gram.no-nse.counts
- Size
- 2.81 MB
- Format
- Unknown
- Description
- Trigram counts (non-speech events removed)
- MD5
- 6ccbdc77f93b9925b1716d7e22402736
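The preview of `v01_dict_mix.txt` suggests one entry per line: a word followed by its phones, with repeated words giving pronunciation variants (for example three pronunciations of "&0"). A minimal Python sketch for loading such a file into a dictionary, assuming exactly that whitespace-separated layout; `load_lexicon` is a hypothetical helper, not something shipped with the resource.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each word to its list of pronunciations (each a list of phones).

    Assumes one entry per line: the word, then its phones, whitespace-separated.
    Repeated words are treated as pronunciation variants.
    """
    lexicon: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without phones
        word, *phones = fields
        lexicon[word].append(phones)
    return dict(lexicon)


# A few entries copied from the preview of v01_dict_mix.txt.
SAMPLE = [
    "&0 z ia r ow",
    "&0 z ih r ow",
    "&0 z iy r ow",
    "&1000 th aw z n d",
    "&9 n ay n ah r",
]

lexicon = load_lexicon(SAMPLE)
print(len(lexicon["&0"]))  # 3
print(lexicon["&9"])  # [['n', 'ay', 'n', 'ah', 'r']]
```

To load the real file, pass an open file object, e.g. `open("v01_dict_mix.txt", encoding="utf-8")`; the UTF-8 encoding is an assumption, since the record does not state it.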
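Each file in the listing comes with an MD5 checksum. A minimal Python sketch for checking downloaded copies, assuming all seven files were extracted into one directory under their original names; `EXPECTED_MD5`, `md5_of` and `verify` are hypothetical names, and the hash values are copied from the listing above.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Checksums as listed for each file of this record.
EXPECTED_MD5 = {
    "v01_dict_mix.txt": "8de3a9ec38eda5c1ad1624c9e61de967",
    "v01_words_mix.1gram.counts": "fe04b2e566718c79733c924146089b1d",
    "v01_words_mix.1gram.no-nse.counts": "4e7cf3fd766462eae4fc892a63f161ef",
    "v01_words_mix.2gram.counts": "32c5c56c74fe3b72d263c72f1e1dd035",
    "v01_words_mix.2gram.no-nse.counts": "587c12207b27f1d3a844c5d874d30990",
    "v01_words_mix.3gram.counts": "674daa0835b58c7d885045d763331658",
    "v01_words_mix.3gram.no-nse.counts": "6ccbdc77f93b9925b1716d7e22402736",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> Dict[str, bool]:
    """Return True per file name if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(Path(".")).items():
        print(f"{'OK ' if ok else 'BAD'} {name}")
```

MD5 is adequate for detecting corrupted or incomplete downloads, which is what the listed checksums are for; it is not a defence against deliberate tampering.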