CoNLL 2017 and 2018 shared tasks:
Multilingual Parsing from Raw Text to Universal Dependencies
This package contains the test data in the form in which they were presented
to the participating systems: raw text files and files preprocessed by UDPipe.
The metadata.json files contain lists of files to process and to output;
README files in the respective folders describe the syntax of metadata.json.
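The preprocessed files can be read with a few lines of standard-library Python. The sketch below assumes that the UDPipe output is in the CoNLL-U format (ten tab-separated columns per token, `#` comment lines, a blank line between sentences); the function name `parse_conllu` and the sample sentence are illustrative and are not part of the package.

```python
# Minimal CoNLL-U reader (illustrative sketch; assumes CoNLL-U input).
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError("expected 10 tab-separated columns: %r" % line)
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        # Handle a final sentence that is not followed by a blank line.
        sentences.append(current)
    return sentences


# Hypothetical two-token sentence, used only to exercise the parser.
sample = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)
sentences = parse_conllu(sample)
```

All values are kept as strings, so multiword-token ranges such as `1-2` in the `id` column pass through unchanged.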
For full training, development and gold standard test data, see
Universal Dependencies 2.0 (CoNLL 2017)
Universal Dependencies 2.2 (CoNLL 2018)
See the download links at http://universaldependencies.org/.
For more information on the shared tasks, see
http://universaldependencies.org/conll17/
http://universaldependencies.org/conll18/
Contents:
conll17-ud-test-2017-05-09 ... CoNLL 2017 test data
conll18-ud-test-2018-05-06 ... CoNLL 2018 test data
conll18-ud-test-2018-05-06-for-conll17 ... CoNLL 2018 test data with metadata
and filenames modified so that it is digestible by the 2017 systems.
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Changes in version 1.1:
1. Universal Dependencies tagset instead of the older and smaller Google Universal POS tagset.
2. SVM classifier trained on Universal Dependencies 1.2 instead of HamleDT 2.0.
3. Balto-Slavic, Germanic and Romance languages were each tagged by a classifier trained only on the respective group of languages. Other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
Growth in length and weight, based on a combination of scale annulus interpretation and back-calculation using the Fraser-Lee model, was studied in male and female barbel, Barbus barbus, from a section of the River Jihlava sampled in 1999–2001. Results were compared with growth data obtained with similar methods in 1976, prior to construction and functioning of a hydropower scheme complex, and during the period of the scheme’s partial operation (1980–1984). Recent growth rate, under seemingly fully-stabilised environmental conditions and complete adaptation of the barbel population, showed the highest values. Distinct sexual dimorphism in growth rate was also confirmed, with females growing faster than males, though to a lesser extent than recorded both during previous periods and from several other localities. Further, when back-calculated lengths for previous years of recently tagged-and-recaptured fish (1999–2001) were compared with lengths directly measured at the corresponding ages, no significant differences were found between the two methods in most age groups. Finally, the linear Fraser-Lee model proved a sufficiently accurate and practical method for back-calculating lengths for previous years of life in barbel as well.
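The linear Fraser-Lee back-calculation mentioned above can be sketched as follows. This is the standard textbook form, L_i = c + (L_c - c) * (S_i / S_c), where c is the intercept of the body length on scale radius regression; the function name, parameter names and example values are illustrative assumptions and are not taken from the study.

```python
def fraser_lee_length(length_at_capture, scale_radius_at_capture,
                      annulus_radius, intercept):
    """Back-calculate body length at the time an annulus was formed.

    length_at_capture       -- L_c, body length when the fish was caught
    scale_radius_at_capture -- S_c, total scale radius at capture
    annulus_radius          -- S_i, scale radius measured to annulus i
    intercept               -- c, intercept of the length-on-radius regression
    """
    if scale_radius_at_capture <= 0:
        raise ValueError("scale radius at capture must be positive")
    proportion = annulus_radius / scale_radius_at_capture
    return intercept + (length_at_capture - intercept) * proportion


# Hypothetical fish: 300 mm at capture, scale radius 5.0 units,
# annulus at 2.5 units, regression intercept 20 mm.
example_length = fraser_lee_length(300.0, 5.0, 2.5, 20.0)
```

With these made-up inputs the back-calculated length is 160 mm: the 280 mm grown beyond the intercept is scaled by the ratio 2.5 / 5.0 and the intercept is added back.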
The goal of this paper is to provide an overview of the structure and contents of the soon-to-be available ORAL corpus, which combines previously published corpora (ORAL2006, ORAL2008 and ORAL2013) with newly transcribed material into a single conveniently accessible and more richly annotated resource, about 6 million running words in length. The recordings and corresponding transcripts span a decade between 2002 and 2011; most of them capture interactions of mutually well-acquainted speakers, in informal situations and natural settings. The corpus is complemented by a marginal portion of more formal data, mostly public talks. It is tagged and lemmatized, and an effort was made to adapt existing tools (targeted at written language) to yield better results on spoken data. We hope the availability of such a resource will spawn further discussions on the morphological and syntactic analysis of spoken language, perhaps resulting in more radical departures in the future from the part-of-speech classification inherited from the linguistic analysis of written language.
MorphoDiTa: Morphological Dictionary and Tagger is an open-source tool for morphological analysis of natural language texts. It performs morphological analysis, morphological generation, tagging and tokenization and is distributed as a standalone tool or a library, along with trained linguistic models. In the Czech language, MorphoDiTa achieves state-of-the-art results with a throughput around 10-200K words per second. MorphoDiTa is free software under the LGPL license, and the linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions.
We studied movement and abundance of barbel, Barbus barbus, over three years (October 1995 to September 1998) in two stretches (Woolmer’s Park, Holwell Bridge) of a section of the River Lee (Hertfordshire, England) delimited by water retention structures. Of 349 tagged individuals (168 at Woolmer’s Park; 181 at Holwell Bridge), 51.8 % and 13.3 % respectively were recaptured at least once, with a much higher rate of multiple recaptures at Woolmer’s Park, where monitoring of movements was over a longer period, than at Holwell Bridge, where too few recaptures were made for further movement analysis. At Woolmer’s Park, 77.1 % of the barbel showed limited between-capture movements (i.e. resident component) and the rest greater between-capture movements (i.e. mobile component). There was no preferential directional movement across size classes. Based on the available recapture data, population size (estimated through a Bayesian method) first increased moderately (1995–96) and then sharply (1996–97) at Woolmer’s Park, and even further later at Holwell Bridge (1998–99). This may reflect a recovery phase in the local population, or possibly a rising part of a cyclic recruitment pattern, such as reported for barbel elsewhere and for other cyprinids in the UK. Habitat enhancement is recommended over stocking, given the adequate abundance of barbel in areas with suitable habitat. However, it remains unclear whether fencing-off of the banks from livestock will enhance 0+ barbel numbers, which appear to be low relative to some European rivers of similar width and depth.
Altogether 701 adult barbel, Barbus barbus, were captured by electrofishing and individually tagged to study their local displacement and movements in a stretch of the River Jihlava (Czech Republic). A total of 149 fish were recaptured and 105 of them (70.47 %) were considered “resident” because they were always recaptured in the same, relatively restricted (250–780 m) stream section, which always contained a pool and was demarcated naturally by riffles at both ends. The remaining 44 recaptured specimens (29.53 %) belonged to the “mobile” part of the population, their movements encompassing two (or exceptionally more) adjacent stream sections and a maximum distance of 1680 m downstream or 2020 m upstream. The proportion of mobile barbel, relatively low in the smaller and middle size classes, increased in the largest size classes (451–550 mm of SL). The rather limited extent of movements also suggests a relatively small home range in the studied stretch, which nevertheless provides satisfactory resources and favourable conditions required by barbel over their entire life cycle. The extent of movements and the corresponding proportion of mobile fish appear to increase with diminishing habitat patchiness. In the stretch of the River Jihlava studied, with a rich, patchy, heterogeneous habitat and well-developed riffle-pool-raceway structure, each section (pool) can be considered a more or less isolated spatial unit containing its own, to a certain degree isolated, component of a metapopulation.
A four-year experiment with a total of 993 individually-tagged barbel, Barbus barbus, resulted in the assessment of survival and abundance. The mean annual survival rate was 0.862, but the partial values assessed separately for seasons (spring – autumn and autumn – spring) differed considerably and the possible reasons for this phenomenon are discussed. On the basis of the known survival rate, the abundance was subsequently estimated (for the entire studied stretch and per hectare) using the Petersen capture-recapture method for the period spring 1999 to autumn 2002, and the mean value reached 303 ± 110 ind.ha-1 (minimum 195, maximum 498 ind.ha-1). The Jolly-Seber method was also used to estimate abundance from autumn 1999 to spring 2001 and gave a mean of 425 ± 120 ind.ha-1 and a range of 233–563 ind.ha-1. In autumn 2001, these results were supported by a simultaneously conducted census using Zippin's removal method (316 ind.ha-1). The abundance showed a significant tendency to increase during the four-year survey, which is in accordance with the long-term changes observed in the dynamics of the fish community in this stream.
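As a rough illustration of the Petersen capture-recapture idea, the sketch below computes the basic Lincoln-Petersen estimate N = M * C / R. The abstract does not state which variant was applied or how the survival rate entered the calculation, so this is only the unadjusted textbook form; the function name and the input numbers are hypothetical.

```python
def petersen_estimate(marked, caught, recaptured):
    """Basic Lincoln-Petersen population estimate.

    marked     -- M, number of fish marked and released in the first sample
    caught     -- C, total number of fish caught in the second sample
    recaptured -- R, number of marked fish found in the second sample
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed for an estimate")
    if recaptured > min(marked, caught):
        raise ValueError("recaptures cannot exceed marked or caught fish")
    return marked * caught / recaptured


# Hypothetical survey: 100 fish marked, 80 caught later, 20 of them marked.
example_population = petersen_estimate(100, 80, 20)
```

Here one quarter of the second sample carries a mark, so the 100 marked fish are taken to be one quarter of the population, which gives an estimate of 400 individuals. Dividing such an estimate by the area of the stretch in hectares would give a density in the same unit as the figures above.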