Files in this item
- Name: language-modeling-corpus.zip
- Size: 633.79 MB
- Format: application/zip
- Description: Sentences for unsupervised training and validation of language models
- MD5: b6ed0a8e7dc263d1a1e635d6d5770f6b
- Contents:
  - dataset_mlm_non-crossing_only-relevant_training.txt (6 MB)
  - dataset_mlm_non-crossing_all_validation.txt (64 MB)
  - dataset_mlm_all_only-relevant_validation.txt (719 kB)
  - dataset_mlm_all_only-relevant_training.txt (7 MB)
  - dataset_mlm_non-crossing_all_training.txt (499 MB)
  - dataset_mlm_all_all_validation.txt (78 MB)
  - dataset_mlm_non-crossing_only-relevant_validation.txt (536 kB)
  - dataset_mlm_all_all_training.txt (601 MB)
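
Each archive is published with an MD5 checksum, and its largest members run to several hundred megabytes. The minimal Python sketch below checks a download against the listed checksum and streams one archive member line by line without extracting it. The helper names `md5_of_file` and `iter_member_lines` are hypothetical, and the UTF-8 encoding is an assumption that the listing does not state.

```python
import hashlib
import io
import zipfile
from typing import Iterator


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default
    so that archives of several hundred MB never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_member_lines(zip_path: str, member: str) -> Iterator[str]:
    """Yield the lines of one text member of a ZIP archive, trailing newline
    stripped, without extracting the archive.
    ASSUMPTION: members are UTF-8 text (not documented in the listing)."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                yield line.rstrip("\n")
```

For an intact download, `md5_of_file("language-modeling-corpus.zip") == "b6ed0a8e7dc263d1a1e635d6d5770f6b"` should hold. Member names inside an archive may carry a directory prefix, so check `zipfile.ZipFile(path).namelist()` for the exact name before calling `iter_member_lines`.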
- Name: named-entity-recognition-annotations-small.zip
- Size: 978.29 MB
- Format: application/zip
- Description: Sentences and NER tags that we used for the supervised training, validation, and testing of intermediate language models
- MD5: 92d80ec8d6e66263295797b1e81bd60d
- Contents:
  - dataset_ner_manatee_non-crossing_only-relevant_testing_001-400.ner_tags.txt (41 kB)
  - dataset_ner_manatee+regests_all_only-relevant_validation_automatically_tagged.sentences.txt (628 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_validation_automatically_tagged.ner_tags.txt (10 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (3 MB)
  - dataset_ner_regests_testing.sentences.txt (186 kB)
  - dataset_ner_fuzzy-regex+regests_all_all_training.ner_tags.txt (50 MB)
  - dataset_ner_manatee_non-crossing_all_validation.ner_tags.txt (2 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (317 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_validation_automatically_tagged.ner_tags.txt (10 MB)
  - dataset_ner_manatee+regests_non-crossing_all_training.sentences.txt (43 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (1 MB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_training.sentences.txt (4 MB)
  - dataset_ner_manatee+regests_all_only-relevant_training_automatically_tagged.sentences.txt (3 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_training.ner_tags.txt (546 kB)
  - dataset_ner_manatee_non-crossing_all_validation_automatically_tagged.ner_tags.txt (3 MB)
  - dataset_ner_manatee+regests_all_all_training_automatically_tagged.sentences.txt (64 MB)
  - dataset_ner_fuzzy-regex+regests_all_all_training_automatically_tagged.ner_tags.txt (63 MB)
  - dataset_ner_manatee+regests_all_all_training.sentences.txt (64 MB)
  - dataset_ner_manatee_all_all_validation.sentences.txt (14 MB)
  - dataset_ner_fuzzy-regex_all_only-relevant_validation_automatically_tagged.sentences.txt (908 kB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_training.ner_tags.txt (956 kB)
  - dataset_ner_fuzzy-regex_all_only-relevant_validation_automatically_tagged.ner_tags.txt (394 kB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_validation.ner_tags.txt (347 kB)
  - dataset_ner_fuzzy-regex+regests_all_all_training.sentences.txt (156 MB)
  - dataset_ner_regests_testing_001-400.sentences.txt (91 kB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex_all_all_testing.ner_tags.txt (12 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing.ner_tags.txt (114 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_testing.ner_tags.txt (8 MB)
  - dataset_ner_regests_testing.ner_tags.txt (69 kB)
  - dataset_ner_manatee_non-crossing_all_validation_automatically_tagged.sentences.txt (8 MB)
  - dataset_ner_manatee_non-crossing_all_training.ner_tags.txt (13 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training.sentences.txt (3 MB)
  - dataset_ner_fuzzy-regex+regests_all_all_validation_automatically_tagged.sentences.txt (39 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500.ner_tags.txt (9 kB)
  - dataset_ner_manatee+regests_all_all_validation_automatically_tagged.ner_tags.txt (5 MB)
  - dataset_ner_fuzzy-regex_all_only-relevant_training.sentences.txt (3 MB)
  - dataset_ner_manatee_all_only-relevant_testing.ner_tags.txt (164 kB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_automatically_tagged.docx (58 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_training.ner_tags.txt (36 MB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (725 kB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_training.ner_tags.txt (1 MB)
  - dataset_ner_manatee_all_all_training_automatically_tagged.ner_tags.txt (26 MB)
  - dataset_ner_manatee+regests_all_all_validation_automatically_tagged.sentences.txt (14 MB)
  - dataset_ner_manatee_all_only-relevant_validation_automatically_tagged.ner_tags.txt (212 kB)
  - dataset_ner_fuzzy-regex_all_only-relevant_testing.ner_tags.txt (308 kB)
  - dataset_ner_manatee+regests_non-crossing_all_training.ner_tags.txt (13 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_training_automatically_tagged.ner_tags.txt (46 MB)
  - dataset_ner_manatee+regests_non-crossing_all_training_automatically_tagged.sentences.txt (43 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged.ner_tags.txt (12 kB)
  - dataset_ner_manatee+regests_non-crossing_all_validation.ner_tags.txt (2 MB)
  - dataset_ner_regests_validation.ner_tags.txt (49 kB)
  - dataset_ner_manatee_all_only-relevant_training_automatically_tagged.ner_tags.txt (899 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_validation_automatically_tagged.sentences.txt (25 MB)
  - dataset_ner_fuzzy-regex_all_only-relevant_validation.sentences.txt (908 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_validation.sentences.txt (25 MB)
  - dataset_ner_manatee_all_all_testing.sentences.txt (14 MB)
  - dataset_ner_fuzzy-regex_all_all_training.ner_tags.txt (51 MB)
  - dataset_ner_fuzzy-regex_all_all_validation_automatically_tagged.ner_tags.txt (16 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (266 kB)
  - dataset_ner_manatee_all_only-relevant_validation.sentences.txt (500 kB)
  - dataset_ner_manatee_all_only-relevant_training.sentences.txt (2 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing.sentences.txt (343 kB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (597 kB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation.ner_tags.txt (201 kB)
  - dataset_ner_fuzzy-regex+regests_all_all_validation.sentences.txt (39 MB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_training_automatically_tagged.sentences.txt (4 MB)
  - dataset_ner_manatee_non-crossing_all_testing.sentences.txt (8 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_automatically_tagged.ner_tags.txt (11 kB)
  - dataset_ner_fuzzy-regex_all_only-relevant_training_automatically_tagged.sentences.txt (3 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_validation.sentences.txt (340 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_testing.sentences.txt (25 MB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_validation_automatically_tagged.ner_tags.txt (445 kB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_validation.ner_tags.txt (162 kB)
  - dataset_ner_fuzzy-regex_all_only-relevant_training.ner_tags.txt (1 MB)
  - dataset_ner_manatee_all_only-relevant_training_automatically_tagged.sentences.txt (2 MB)
  - dataset_ner_regests_validation.sentences.txt (128 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_training.sentences.txt (110 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation.sentences.txt (598 kB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_validation.sentences.txt (1 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_training.sentences.txt (1 MB)
  - dataset_ner_manatee_all_all_training.sentences.txt (63 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_training.ner_tags.txt (897 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex_all_all_testing.sentences.txt (39 MB)
  - dataset_ner_manatee+regests_all_all_validation.sentences.txt (14 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (679 kB)
  - dataset_ner_manatee_all_only-relevant_training.ner_tags.txt (731 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation.sentences.txt (725 kB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged.sentences.txt (28 kB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500.sentences.txt (28 kB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_001-400.sentences.txt (123 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_training.ner_tags.txt (36 MB)
  - dataset_ner_fuzzy-regex_non-crossing_all_training_automatically_tagged.sentences.txt (109 MB)
  - dataset_ner_manatee_all_only-relevant_validation.ner_tags.txt (164 kB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation.ner_tags.txt (249 kB)
  - dataset_ner_fuzzy-regex+regests_all_only-relevant_validation_automatically_tagged.sentences.txt (1 MB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_training.sentences.txt (2 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_automatically_tagged.sentences.txt (28 kB)
  - dataset_ner_regests_training.sentences.txt (1 MB)
  - dataset_ner_regests_training_automatically_tagged.sentences.txt (1 MB)
  - dataset_ner_regests_testing_001-400.ner_tags.txt (34 kB)
  - dataset_ner_manatee+regests_all_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex_all_all_validation.sentences.txt (39 MB)
  - dataset_ner_fuzzy-regex+regests_all_all_validation.ner_tags.txt (12 MB)
  - dataset_ner_manatee+regests_all_only-relevant_validation.ner_tags.txt (213 kB)
  - dataset_ner_manatee+regests_non-crossing_all_validation_automatically_tagged.sentences.txt (8 MB)
  - dataset_ner_fuzzy-regex_all_only-relevant_testing.sentences.txt (928 kB)
  - dataset_ner_manatee_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (147 kB)
  - dataset_ner_manatee+regests_all_all_training.ner_tags.txt (20 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (2 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_testing.ner_tags.txt (214 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_training.sentences.txt (109 MB)
  - dataset_ner_manatee+regests_all_only-relevant_training.sentences.txt (3 MB)
  - dataset_ner_manatee+regests_non-crossing_all_validation_automatically_tagged.ner_tags.txt (3 MB)
  - dataset_ner_manatee_all_all_testing.ner_tags.txt (4 MB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_validation.sentences.txt (468 kB)
  - dataset_ner_manatee_all_all_validation_automatically_tagged.ner_tags.txt (5 MB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (198 kB)
  - dataset_ner_manatee+regests_all_only-relevant_training.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_validation.ner_tags.txt (8 MB)
  - dataset_ner_manatee+regests_all_only-relevant_validation_automatically_tagged.ner_tags.txt (263 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_training_automatically_tagged.ner_tags.txt (45 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_testing.sentences.txt (632 kB)
  - dataset_ner_regests_validation_automatically_tagged.sentences.txt (128 kB)
  - dataset_ner_regests_validation_automatically_tagged.ner_tags.txt (50 kB)
  - dataset_ner_manatee+regests_all_all_validation.ner_tags.txt (4 MB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (2 MB)
  - dataset_ner_manatee_all_only-relevant_testing.sentences.txt (498 kB)
  - dataset_ner_manatee_non-crossing_all_validation.sentences.txt (8 MB)
  - dataset_ner_manatee_non-crossing_all_training.sentences.txt (42 MB)
  - dataset_ner_manatee+regests_non-crossing_all_training_automatically_tagged.ner_tags.txt (18 MB)
  - dataset_ner_manatee_all_only-relevant_validation_automatically_tagged.sentences.txt (500 kB)
  - dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged.docx (39 kB)
  - dataset_ner_fuzzy-regex_non-crossing_all_validation.ner_tags.txt (8 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex_all_all_validation_automatically_tagged.sentences.txt (39 MB)
  - dataset_ner_fuzzy-regex_all_only-relevant_validation.ner_tags.txt (299 kB)
  - dataset_ner_manatee_all_all_training_automatically_tagged.sentences.txt (63 MB)
  - dataset_ner_regests_training.ner_tags.txt (411 kB)
  - dataset_ner_manatee+regests_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (468 kB)
  - dataset_ner_manatee_non-crossing_all_testing.ner_tags.txt (2 MB)
  - dataset_ner_manatee+regests_non-crossing_all_validation.sentences.txt (8 MB)
  - dataset_ner_manatee+regests_all_all_training_automatically_tagged.ner_tags.txt (26 MB)
  - dataset_ner_fuzzy-regex_all_all_training_automatically_tagged.sentences.txt (155 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_validation.ner_tags.txt (114 kB)
  - dataset_ner_manatee+regests_all_only-relevant_validation.sentences.txt (628 kB)
  - dataset_ner_fuzzy-regex_all_all_training_automatically_tagged.ner_tags.txt (63 MB)
  - dataset_ner_manatee_non-crossing_all_training_automatically_tagged.ner_tags.txt (17 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex+regests_all_all_validation_automatically_tagged.ner_tags.txt (16 MB)
  - dataset_ner_fuzzy-regex_all_all_training.sentences.txt (156 MB)
  - dataset_ner_regests_training_automatically_tagged.ner_tags.txt (424 kB)
  - dataset_ner_manatee_non-crossing_all_training_automatically_tagged.sentences.txt (42 MB)
  - dataset_ner_manatee_all_all_validation.ner_tags.txt (4 MB)
  - dataset_ner_fuzzy-regex+regests_non-crossing_all_training_automatically_tagged.sentences.txt (110 MB)
  - dataset_ner_fuzzy-regex+regests_all_all_training_automatically_tagged.sentences.txt (156 MB)
  - dataset_ner_fuzzy-regex_non-crossing_all_validation_automatically_tagged.sentences.txt (25 MB)
  - dataset_ner_fuzzy-regex_non-crossing_all_validation.sentences.txt (25 MB)
  - dataset_ner_fuzzy-regex_all_all_validation.ner_tags.txt (12 MB)
  - dataset_ner_fuzzy-regex_all_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
  - dataset_ner_fuzzy-regex_non-crossing_only-relevant_training.sentences.txt (2 MB)
  - dataset_ner_manatee_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (340 kB)
  - dataset_ner_manatee_all_all_validation_automatically_tagged.sentences.txt (14 MB)
  - dataset_ner_manatee_all_all_training.ner_tags.txt (20 MB)
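
The file names in these archives appear to follow a fixed order of underscore-separated fields:

- a task (`mlm` or `ner`)
- an optional tag source such as `manatee` or `fuzzy-regex+regests`
- two filter fields (`non-crossing`/`all`, then `only-relevant`/`all`)
- a split
- optional range, tagging, and version suffixes

The listing does not define these fields. The sketch below is a hypothetical parser inferred only from the names shown here, and the group names (`source`, `crossing`, `relevance`, and so on) are our own labels.

```python
import re
from typing import Dict, Optional

# Pattern inferred only from the file names listed on this page. The group
# names are our own hypothetical labels, not terminology from the dataset.
FILENAME_RE = re.compile(
    r"""
    ^dataset_(?P<task>mlm|ner)
    (?:_(?P<source>(?:manatee|fuzzy-regex|regests)(?:\+regests)?))?
    (?:_(?P<crossing>non-crossing|all)_(?P<relevance>only-relevant|all))?
    _(?P<split>training|validation|testing)
    (?:_(?P<range>\d{3}-\d{3}))?            # e.g. 001-400, 401-500
    (?:_(?P<tagging>automatically_tagged|tagged))?
    (?:_(?P<version>\d{3}))?                # e.g. 004, 007
    (?:\.(?P<kind>sentences|ner_tags))?     # absent for plain MLM text files
    \.(?P<ext>txt|docx)$
    """,
    re.VERBOSE,
)


def parse_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a file name into its fields.

    Returns None for names that do not follow the pattern; fields absent
    from a given name (e.g. no range or version) are returned as None.
    """
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None
```

With the parsed fields, selecting, for example, every `training` split from a single tag source reduces to filtering on `parse_name(n)["source"]` and `parse_name(n)["split"]`.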
- Name: named-entity-recognition-annotations-large.zip
- Size: 1.31 GB
- Format: application/zip
- Description: Sentences and NER tags for supervised training, validation, and testing of language models
- MD5: 4bf38b89ed8948d3fa355b6e1e55d6de
- Contents:
  - dataset_mlm_non-crossing_only-relevant_validation_automatically_tagged_004.sentences.txt (534 kB)
  - dataset_mlm_all_only-relevant_training_automatically_tagged_004.sentences.txt (7 MB)
  - dataset_mlm_non-crossing_all_training_automatically_tagged_004.sentences.txt (498 MB)
  - dataset_mlm_all_all_validation_automatically_tagged_004.sentences.txt (77 MB)
  - dataset_mlm_non-crossing_only-relevant_training_automatically_tagged_007.sentences.txt (6 MB)
  - dataset_mlm_all_all_training_automatically_tagged_004.sentences.txt (599 MB)
  - dataset_mlm_all_only-relevant_validation_automatically_tagged_004.ner_tags.txt (265 kB)
  - dataset_mlm_all_only-relevant_validation_automatically_tagged_004.sentences.txt (717 kB)
  - dataset_mlm_all_only-relevant_validation_automatically_tagged_007.ner_tags.txt (248 kB)
  - dataset_mlm_all_all_validation_automatically_tagged_007.ner_tags.txt (28 MB)
  - dataset_mlm_non-crossing_only-relevant_training_automatically_tagged_004.ner_tags.txt (2 MB)
  - dataset_mlm_non-crossing_only-relevant_training_automatically_tagged_007.ner_tags.txt (2 MB)
  - dataset_mlm_non-crossing_all_validation_automatically_tagged_007.sentences.txt (64 MB)
  - dataset_mlm_non-crossing_all_training_automatically_tagged_004.ner_tags.txt (203 MB)
  - dataset_mlm_non-crossing_all_training_automatically_tagged_007.ner_tags.txt (184 MB)
  - dataset_mlm_all_only-relevant_training_automatically_tagged_004.ner_tags.txt (3 MB)
  - dataset_mlm_non-crossing_only-relevant_training_automatically_tagged_004.sentences.txt (6 MB)
  - dataset_mlm_all_only-relevant_training_automatically_tagged_007.ner_tags.txt (3 MB)
  - dataset_mlm_all_all_training_automatically_tagged_004.ner_tags.txt (241 MB)
  - dataset_mlm_all_all_training_automatically_tagged_007.ner_tags.txt (220 MB)
  - dataset_mlm_non-crossing_only-relevant_validation_automatically_tagged_004.ner_tags.txt (202 kB)
  - dataset_mlm_non-crossing_only-relevant_validation_automatically_tagged_007.ner_tags.txt (188 kB)
  - dataset_mlm_non-crossing_all_validation_automatically_tagged_004.sentences.txt (64 MB)
  - dataset_mlm_non-crossing_only-relevant_validation_automatically_tagged_007.sentences.txt (534 kB)
  - dataset_mlm_all_only-relevant_training_automatically_tagged_007.sentences.txt (7 MB)
  - dataset_mlm_non-crossing_all_training_automatically_tagged_007.sentences.txt (498 MB)
  - dataset_mlm_all_all_validation_automatically_tagged_004.ner_tags.txt (30 MB)
  - dataset_mlm_all_all_validation_automatically_tagged_007.sentences.txt (77 MB)
  - dataset_mlm_all_all_training_automatically_tagged_007.sentences.txt (599 MB)
  - dataset_mlm_all_only-relevant_validation_automatically_tagged_007.sentences.txt (717 kB)
  - dataset_mlm_non-crossing_all_validation_automatically_tagged_004.ner_tags.txt (25 MB)
  - dataset_mlm_non-crossing_all_validation_automatically_tagged_007.ner_tags.txt (23 MB)
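
The NER archives store sentences and tags in separate files that share a name stem and differ only in the `.sentences.txt` / `.ner_tags.txt` suffix. The sketch below offers two helpers, and both names, `pair_members` and `iter_aligned`, are hypothetical:

- `pair_members` matches archive member names into (sentences, tags) pairs.
- `iter_aligned` walks a pair together, raising an error if the line counts differ.

`iter_aligned` rests on an ASSUMPTION that the listing does not state: that line *i* of a sentences file belongs to line *i* of its tags file.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List, Tuple

# Suffixes as they appear in the listing; everything before them is the stem.
SUFFIXES = {".sentences.txt": "sentences", ".ner_tags.txt": "ner_tags"}


def pair_members(names: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Match each *.sentences.txt name with the *.ner_tags.txt name sharing
    its stem. Returns (pairs ordered by stem, sorted unpaired names); the
    unpaired list collects other file types (e.g. .docx) and half-pairs."""
    by_stem: Dict[str, Dict[str, str]] = defaultdict(dict)
    unpaired: List[str] = []
    for name in names:
        for suffix, kind in SUFFIXES.items():
            if name.endswith(suffix):
                by_stem[name[: -len(suffix)]][kind] = name
                break
        else:  # no known suffix matched
            unpaired.append(name)
    pairs: List[Tuple[str, str]] = []
    for stem in sorted(by_stem):
        files = by_stem[stem]
        if len(files) == 2:
            pairs.append((files["sentences"], files["ner_tags"]))
        else:
            unpaired.extend(files.values())
    return pairs, sorted(unpaired)


def iter_aligned(
    sentence_lines: Iterable[str], tag_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (sentence, tags) line pairs.
    ASSUMPTION: line i of a sentences file belongs to line i of its ner_tags
    file; a ValueError is raised as soon as one file runs out before the other."""
    for lineno, (sentence, tags) in enumerate(
        zip_longest(sentence_lines, tag_lines), start=1
    ):
        if sentence is None or tags is None:
            raise ValueError(f"line counts differ at line {lineno}")
        yield sentence, tags
```

For example, `pair_members(zipfile.ZipFile("named-entity-recognition-annotations-large.zip").namelist())` should pair each `_004` and `_007` sentences file with its tags file. The `.docx` files in the small archive would end up in the unpaired list.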