dc.contributor.author Šmídl, Luboš
dc.date.accessioned 2013-01-01T14:56:06Z
dc.date.available 2013-01-01T14:56:06Z
dc.date.issued 2013-01-01
dc.identifier ZCU_CZ_ ATCC-LM4ASR
dc.identifier.uri http://hdl.handle.net/11858/00-097C-0000-000D-EC92-F
dc.description The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams, and trigrams) that can be used to construct a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0).
dc.description.sponsorship Technology Agency of the Czech Republic, project No. TA01030476
dc.language.iso eng
dc.publisher University of West Bohemia, Department of Cybernetics
dc.rights Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc/3.0/
dc.subject pronunciation lexicon
dc.subject n-gram counts
dc.subject language model
dc.title ATCC: Pronunciation lexicon and n-gram counts for ASR module
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContactInfo#PersonInfo.surname Šmídl
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName Luboš
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName University of West Bohemia
metashare.ResourceInfo#DistributionInfo.availability restrictedUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse academic-nonCommercialUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse attribution
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium downloadable
metashare.ResourceInfo#ValidationInfo.validated True
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName Inteligentní technologie pro zvýšení bezpečnosti letového provozu (Intelligent technologies for increasing air traffic safety)
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.fundingType nationalFunds
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#TextInfo#SizeInfo.size 236500
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit other
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email ircing@kky.zcu.cz
metashare.ResourceInfo#ContentInfo.detailedType other
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
sponsor Technologická agentura České republiky TA01030476 Inteligentní technologie pro zvýšení bezpečnosti letového provozu nationalFunds
size.info 236500 other
files.size 7896750
files.count 7


 Files in this item

Total download size of all files in this item: 7.53 MB

This item is publicly available and licensed under Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0).

Name: v01_dict_mix.txt
Size: 169.64 KB
Format: Text file
Description: Pronunciation lexicon
MD5: 8de3a9ec38eda5c1ad1624c9e61de967

File preview:
&.	d eh s ih m l
&0	z ia r ow
&0	z ih r ow
&0	z iy r ow
&1	w ah n
&1+	w ah n
&10	t eh n
&100	hh ah n d r ah d
&100	hh ah n d r ih d
&1000	th aw z n d
&11	ih l eh v n
&12	t w eh l v
&13	th er t iy n
&15	f ih f t iy n
&1500	f ih f t iy n hh ah n d r ah d
&1500	w ah n th aw z n d f ay v hh ah n d r ah d
&1500	w ah n th aw z n d f ay v hh ah n d r ih d
&18	ey t iy n
&19	n ay n t iy n
&2	t uw
&200	t uw hh ah n d r ah d
&200	t uw hh ah n d r ih d
&2000	t uw th aw z n d
&21	t w eh n t iy w ah n
&24	t w eh n t iy f ao r
&27	t w eh n t iy s eh v n
&28	t w eh n t iy ey t
&3	th r iy
&30	th er t iy
&300	th r iy hh ah n d r ah d
&300	th r iy hh ah n d r ih d
&31	th er t iy w ah n
&4	f ao
&4	f ao r
&4	f ow r
&5	f ay v
&50	f ih f t iy
&500	f ay v hh ah n d r ah d
&500	f ay v hh ah n d r ih d
&6	s ih k s
&60	s ih k s t iy
&7	s eh v n
&70	s eh v n t iy
&700	s eh v n hh ah n d r ah d
&700	s eh v n hh ah n d r ih d
&8	ey t
&80	ey t iy
&9	n ay n ah r
&9+	n a . . .
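
Judging from the preview above, each lexicon line appears to hold one entry followed by whitespace and a space-separated phone sequence, and an entry may repeat with alternative pronunciations. The following is a minimal sketch for reading such lines, not the authors' tooling. The function name `parse_lexicon` is hypothetical, the whitespace-separated layout is an assumption based only on the preview, and the sample lines are copied from it.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Map each lexicon entry to a list of pronunciation variants.

    Assumed layout (inferred from the preview only): the first
    whitespace-separated token is the entry, the rest are phones.
    """
    lexicon = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank lines and entries without phones
        entry, phones = tokens[0], tokens[1:]
        lexicon[entry].append(phones)
    return dict(lexicon)


# Sample lines copied from the file preview.
sample = [
    "&0\tz ia r ow",
    "&0\tz ih r ow",
    "&0\tz iy r ow",
    "&4\tf ao",
    "&4\tf ao r",
    "&4\tf ow r",
    "&8\tey t",
]

lexicon = parse_lexicon(sample)
for entry, variants in lexicon.items():
    print(entry, [" ".join(v) for v in variants])
```

To read the downloaded file instead of the sample, pass an open file object to `parse_lexicon`. The file's text encoding is not stated in this record, so it would need to be checked first.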
                                            
Name: v01_words_mix.1gram.counts
Size: 52.17 KB
Format: Unknown
Description: Unigram counts (non-speech events included)
MD5: fe04b2e566718c79733c924146089b1d

Name: v01_words_mix.1gram.no-nse.counts
Size: 52.13 KB
Format: Unknown
Description: Unigram counts (non-speech events removed)
MD5: 4e7cf3fd766462eae4fc892a63f161ef

Name: v01_words_mix.2gram.counts
Size: 718.53 KB
Format: Unknown
Description: Bigram counts (non-speech events included)
MD5: 32c5c56c74fe3b72d263c72f1e1dd035

Name: v01_words_mix.2gram.no-nse.counts
Size: 683.03 KB
Format: Unknown
Description: Bigram counts (non-speech events removed)
MD5: 587c12207b27f1d3a844c5d874d30990

Name: v01_words_mix.3gram.counts
Size: 3.08 MB
Format: Unknown
Description: Trigram counts (non-speech events included)
MD5: 674daa0835b58c7d885045d763331658

Name: v01_words_mix.3gram.no-nse.counts
Size: 2.81 MB
Format: Unknown
Description: Trigram counts (non-speech events removed)
MD5: 6ccbdc77f93b9925b1716d7e22402736
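
The MD5 values listed for the seven files can be used to check that downloads arrived intact. The sketch below is one way to do that with Python's standard `hashlib` module. The function names `md5_of_file` and `verify_downloads` and the idea of a local download directory are assumptions made for illustration. The file names and checksums are taken from this record.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "v01_dict_mix.txt": "8de3a9ec38eda5c1ad1624c9e61de967",
    "v01_words_mix.1gram.counts": "fe04b2e566718c79733c924146089b1d",
    "v01_words_mix.1gram.no-nse.counts": "4e7cf3fd766462eae4fc892a63f161ef",
    "v01_words_mix.2gram.counts": "32c5c56c74fe3b72d263c72f1e1dd035",
    "v01_words_mix.2gram.no-nse.counts": "587c12207b27f1d3a844c5d874d30990",
    "v01_words_mix.3gram.counts": "674daa0835b58c7d885045d763331658",
    "v01_words_mix.3gram.no-nse.counts": "6ccbdc77f93b9925b1716d7e22402736",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if present with a matching MD5."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling `verify_downloads("path/to/downloads")` (a hypothetical directory) returns a dictionary whose values are `False` for any file that is missing or whose checksum differs from the one listed here.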
