Czech data - both train and test+eval sets, as well as the valency dictionary - for the CoNLL 2009 Shared Task. Documentation is included. The data are generated from PDT 2.0. LDC catalog number: LDC2009E34B and MSM 0021620838 (http://ufal.mff.cuni.cz:8080/bib/?section=grant&id=116488695895567&mode=view)
Czech trial (example) data for the CoNLL 2009 Shared Task. The data are generated from PDT 2.0. LDC2009E32B and MSM 0021620838 (http://ufal.mff.cuni.cz:8080/bib/?section=grant&id=116488695895567&mode=view)
A taxonomic revision of Taraxacum sect. Leucantha Soest is presented. Species in this section are mainly characterized by pale-bordered and appressed outer involucral bracts, achenes covered with subsparse coarse spinules, a thick cylindrical cone and a relatively short, thicker rostrum, and often white or pale yellowish flowers. They occur in subsaline wet meadows and steppe depressions over a large area including Mongolia, South Siberia, NE, N and W China, Tibet, the Western Himalayas, Tadzhikistan, Kyrgyzstan and E and NE Kazakhstan. Eighteen species are recognized, seven of them described as new: Taraxacum niveum from the Altai and Dzhungaria, T. candidatum centred in Ladakh, Tadzhikistan and Kyrgyzstan, T. album from Kyrgyzstan, T. flavidum from Mongolia and Transbaikalia, T. occultum from East Mongolia, T. virgineum from Ladakh, India, and T. inimitabile from Gobi-Altai, Mongolia. An analysis of syntypes of the names T. dealbatum Hand.-Mazz. and T. sinense Dahlstedt is given. For the safe interpretation of the name T. luridum, an epitype was designated. All the species are agamospermous, but sexuality and diploidy are documented for a few Transbaikalian plants of the section Leucantha.
On the basis of rich material from Asia, a recently described group of dandelions, Taraxacum sect. Stenoloba Kirschner et Štěpánek, is revised taxonomically. Four previously described species are recognized: T. sinomongolicum, newly typified; T. mongoliforme, with a lectotype replacing the original holotype, which is no longer extant, and a new epitype; T. scariosum, a new combination of Leontodon scariosus Tausch, replacing the frequently confused names T. asiaticum, newly typified, and T. stenolobum; and T. multisectum, a taxon compared for the first time with other members of the section. Three new species are described: T. abax occupies a large range from S Siberia and Mongolia to NE China, whereas T. abalienatum and T. odibile are known from Mongolia and SE Siberia. Taraxacum abax and T. abalienatum represent core species of the section Stenoloba, whilst T. odibile exhibits a mixture of characters of sections Stenoloba and Leucantha. All the known members of the section Stenoloba are agamosperms. Taraxacum mongoliforme, T. abax and T. scariosum proved to be triploid with 2n = 24. This account includes detailed descriptions and an identification key.
On the basis of the authors’ collections and cultivated material from Asia, a recently described group of dandelions, Taraxacum sect. Suavia, is revised. In addition to three species described previously (T. haneltii, T. sumneviczii and T. formosissimum), six new species from Mongolia and Kyrgyzstan are recognized. Three of them, T. suave, T. stupendum and T. margaritarium, possess most of the features characterizing the section Suavia, one, T. suasorium, is regarded as intermediate between sections Suavia and Leucantha, whilst the remaining two, T. nobile and T. venustius, exhibit some characters of another related section (T. sect. Stenoloba). The members of the section Suavia are agamospermous. Detailed descriptions, drawings and an identification key are given.
Karyological variation, reproductive isolation, morphological differentiation and geographic distribution of the cytotypes of Centaurea phrygia were investigated in Central Europe. Occurrence of two dominant cytotypes, diploid (2n = 22) and tetraploid (2n = 44), was confirmed and additionally triploid, pentaploid and hexaploid ploidy levels identified using flow cytometry. Allozyme variation as well as morphological and genome size data suggest an autopolyploid origin of the tetraploids. Crossing experiments and flow cytometric screening of mixed populations revealed strong reproductive isolation of the cytotypes. Multivariate morphometric analysis revealed significant differentiation between the cytotypes in several morphological characters (pappus length, length and colour of appendages on involucral bracts, involucre width). The cytotypes have a parapatric distribution with only a small contact zone: diploids occupy the whole of the Central and North European geographic range of the species except for the major part of the Western Carpathians, whereas tetraploids are confined to the Western Carpathians and adjacent areas, both cytotypes co-occurring only in a limited area of intra-montane basins of the Western Carpathians. Based on this array of data, taxonomic treatment of the cytotypes as autonomous species is proposed. The name Centaurea phrygia is applied to the diploids and the name C. erdneri belongs to the tetraploids; nomenclature of hybrids with C. jacea is also resolved.
Syntactic (including deep-syntactic - tectogrammatical) annotation of user-generated noisy sentences. The annotation was made on Czech-English and English-Czech Faust Dev/Test sets.
The English data includes manual annotations of English reference translations of Czech source texts. These texts were translated independently by two translators. After some necessary cleaning, 1000 segments were randomly selected for manual annotation. Both reference translations were annotated, which means 2000 annotated segments in total.
The Czech data includes manual annotations of Czech reference translations of English source texts. These texts were translated independently by three translators. After some necessary cleaning, 1000 segments were randomly selected for manual annotation. All three reference translations were annotated, which means 3000 annotated segments in total.
Faust is part of PDT-C 1.0 (http://hdl.handle.net/11234/1-3185).
This machine translation test set contains 2223 Czech sentences collected within the FAUST project (https://ufal.mff.cuni.cz/grants/faust, http://hdl.handle.net/11234/1-3308).
Each original (noisy) sentence was normalized (clean1 and clean2) and translated to English independently by two translators.
A detailed study of Taraxacum sect. Ruderalia for the 8th volume of the Flora of the Czech Republic revealed five new agamospermous species, viz. T. atroviride Štěpánek et Trávníček, T. clarum Kirschner, Štěpánek et Trávníček, T. moldavicum Chán, H. Ollgaard, Štěpánek, Trávníček et Žíla, T. urbicola Kirschner, Štěpánek et Trávníček and T. violaceifrons Trávníček. These species are formally described, thoroughly characterized morphologically and compared with similar taxa. They are known from numerous localities in Central Europe; T. moldavicum, in addition to the Central European distribution, is known to occur in two regions in Denmark. All these species are also documented by photographs of their general habit and important features.
Plant names based on the original material from a restricted region are scientifically important for the study of local biodiversity. Names typified with or entirely based on the original material from the Czech Republic are studied in the present paper; the names are confined to cases of generally accepted names published and taxa described in the period 1753–1820. Some names with original material coming from a border region (mostly near the Polish border) are included, too. Brief notes and references are given to introduce the authors of names and the history of their herbarium collections. New data are given on publications and herbaria of F. W. Schmidt, T. Haenke and J. E. Pohl, including examples of their handwriting; the other authors being C. Linnaeus (and J. Burser), J. Zauschner, K. L. Willdenow, J. C. Mikan, K. Sternberg, H. A. Schrader, L. Trattinick, K. B. Presl, J. S. Presl, P. M. Opiz, I. F. Tausch and H. G. L. Reichenbach. Nomenclatural and taxonomic notes are given on Aconitum plicatum, Allium senescens subsp. montanum, Gagea bohemica, Plantago uliginosa, Spergularia salina, Valeriana officinalis, V. exaltata, V. sambucifolia and Veronica triloba. A number of names are typified (lecto-, neo-, epitypes): Allium montanum, Athyrium distentifolium, Erysimum arcuatum (= Barbarea vulgaris subsp. arcuata), Schmidtia (= Coleanthus) subtilis, Epilobium nutans, Ornithogalum bohemicum (= Gagea bohemica), Hieracium sudeticum, Myosotis sparsiflora, Cynoglossum (= Omphalodes) scorpioides, Pedicularis sudetica, Phyteuma nigrum, Plantago uliginosa (with an identification key), Poa laxa, Soldanella montana, Symphytum bohemicum, Thlaspi caerulescens, Valeriana exaltata (with notes on the typification of V. officinalis), V. sambucifolia, Veronica triloba (with a note on the status of names in Čelakovský's works), Viola sudetica and V. saxatilis. The other names included in the list are Avenula planiculmis, Cardamine amara subsp. opicii, Eriophorum vaginatum, Hieracium rupestre (= H. schmidtii), Luzula sudetica, Mentha longifolia, Potentilla lindackeri, Rosa elliptica, Salix silesiaca, Stipa capillata and Viola rupestris. A few cases of names excluded from the list are also analysed: Achillea millefolium subsp. sudetica, Alchemilla fissa, Carex bohemica, Dactylorhiza longebracteata, Gagea pusilla, Geranium bohemicum, Matricaria recutita, Veronica dentata, Spergularia salina (correct name: S. marina), Gentianella obtusifolia, Myosotis alpestris and Mentha rotundifolia. For most cases, conservation status and situation at the original localities (in many cases in protected areas) are discussed.