Files in this item

This item is publicly available and licensed under: Public Domain Dedication (CC Zero)
Name: language-modeling-corpus.zip
Size: 633.79 MB
Format: application/zip
Description: Sentences for unsupervised training and validation of language models
MD5: b6ed0a8e7dc263d1a1e635d6d5770f6b
 File Preview  
    • dataset_mlm_non-crossing_only-relevant_training.txt (6 MB)
    • dataset_mlm_non-crossing_all_validation.txt (64 MB)
    • dataset_mlm_all_only-relevant_validation.txt (719 kB)
    • dataset_mlm_all_only-relevant_training.txt (7 MB)
    • dataset_mlm_non-crossing_all_training.txt (499 MB)
    • dataset_mlm_all_all_validation.txt (78 MB)
    • dataset_mlm_non-crossing_only-relevant_validation.txt (536 kB)
    • dataset_mlm_all_all_training.txt (601 MB)
Name: named-entity-recognition-annotations.zip
Size: 978.29 MB
Format: application/zip
Description: Sentences and NER tags for supervised training, validation, and testing of language models
MD5: 92d80ec8d6e66263295797b1e81bd60d
 File Preview  
    • dataset_ner_manatee_non-crossing_only-relevant_testing_001-400.ner_tags.txt (41 kB)
    • dataset_ner_manatee+regests_all_only-relevant_validation_automatically_tagged.sentences.txt (628 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_validation_automatically_tagged.ner_tags.txt (10 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (3 MB)
    • dataset_ner_regests_testing.sentences.txt (186 kB)
    • dataset_ner_fuzzy-regex+regests_all_all_training.ner_tags.txt (50 MB)
    • dataset_ner_manatee_non-crossing_all_validation.ner_tags.txt (2 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (317 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_validation_automatically_tagged.ner_tags.txt (10 MB)
    • dataset_ner_manatee+regests_non-crossing_all_training.sentences.txt (43 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (1 MB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_training.sentences.txt (4 MB)
    • dataset_ner_manatee+regests_all_only-relevant_training_automatically_tagged.sentences.txt (3 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_training.ner_tags.txt (546 kB)
    • dataset_ner_manatee_non-crossing_all_validation_automatically_tagged.ner_tags.txt (3 MB)
    • dataset_ner_manatee+regests_all_all_training_automatically_tagged.sentences.txt (64 MB)
    • dataset_ner_fuzzy-regex+regests_all_all_training_automatically_tagged.ner_tags.txt (63 MB)
    • dataset_ner_manatee+regests_all_all_training.sentences.txt (64 MB)
    • dataset_ner_manatee_all_all_validation.sentences.txt (14 MB)
    • dataset_ner_fuzzy-regex_all_only-relevant_validation_automatically_tagged.sentences.txt (908 kB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_training.ner_tags.txt (956 kB)
    • dataset_ner_fuzzy-regex_all_only-relevant_validation_automatically_tagged.ner_tags.txt (394 kB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_validation.ner_tags.txt (347 kB)
    • dataset_ner_fuzzy-regex+regests_all_all_training.sentences.txt (156 MB)
    • dataset_ner_regests_testing_001-400.sentences.txt (91 kB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex_all_all_testing.ner_tags.txt (12 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing.ner_tags.txt (114 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_testing.ner_tags.txt (8 MB)
    • dataset_ner_regests_testing.ner_tags.txt (69 kB)
    • dataset_ner_manatee_non-crossing_all_validation_automatically_tagged.sentences.txt (8 MB)
    • dataset_ner_manatee_non-crossing_all_training.ner_tags.txt (13 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training.sentences.txt (3 MB)
    • dataset_ner_fuzzy-regex+regests_all_all_validation_automatically_tagged.sentences.txt (39 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500.ner_tags.txt (9 kB)
    • dataset_ner_manatee+regests_all_all_validation_automatically_tagged.ner_tags.txt (5 MB)
    • dataset_ner_fuzzy-regex_all_only-relevant_training.sentences.txt (3 MB)
    • dataset_ner_manatee_all_only-relevant_testing.ner_tags.txt (164 kB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_automatically_tagged.docx (58 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_training.ner_tags.txt (36 MB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (725 kB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_training.ner_tags.txt (1 MB)
    • dataset_ner_manatee_all_all_training_automatically_tagged.ner_tags.txt (26 MB)
    • dataset_ner_manatee+regests_all_all_validation_automatically_tagged.sentences.txt (14 MB)
    • dataset_ner_manatee_all_only-relevant_validation_automatically_tagged.ner_tags.txt (212 kB)
    • dataset_ner_fuzzy-regex_all_only-relevant_testing.ner_tags.txt (308 kB)
    • dataset_ner_manatee+regests_non-crossing_all_training.ner_tags.txt (13 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_training_automatically_tagged.ner_tags.txt (46 MB)
    • dataset_ner_manatee+regests_non-crossing_all_training_automatically_tagged.sentences.txt (43 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged.ner_tags.txt (12 kB)
    • dataset_ner_manatee+regests_non-crossing_all_validation.ner_tags.txt (2 MB)
    • dataset_ner_regests_validation.ner_tags.txt (49 kB)
    • dataset_ner_manatee_all_only-relevant_training_automatically_tagged.ner_tags.txt (899 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_validation_automatically_tagged.sentences.txt (25 MB)
    • dataset_ner_fuzzy-regex_all_only-relevant_validation.sentences.txt (908 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_validation.sentences.txt (25 MB)
    • dataset_ner_manatee_all_all_testing.sentences.txt (14 MB)
    • dataset_ner_fuzzy-regex_all_all_training.ner_tags.txt (51 MB)
    • dataset_ner_fuzzy-regex_all_all_validation_automatically_tagged.ner_tags.txt (16 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (266 kB)
    • dataset_ner_manatee_all_only-relevant_validation.sentences.txt (500 kB)
    • dataset_ner_manatee_all_only-relevant_training.sentences.txt (2 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing.sentences.txt (343 kB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (597 kB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation.ner_tags.txt (201 kB)
    • dataset_ner_fuzzy-regex+regests_all_all_validation.sentences.txt (39 MB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_training_automatically_tagged.sentences.txt (4 MB)
    • dataset_ner_manatee_non-crossing_all_testing.sentences.txt (8 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_automatically_tagged.ner_tags.txt (11 kB)
    • dataset_ner_fuzzy-regex_all_only-relevant_training_automatically_tagged.sentences.txt (3 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_validation.sentences.txt (340 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_testing.sentences.txt (25 MB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_validation_automatically_tagged.ner_tags.txt (445 kB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_validation.ner_tags.txt (162 kB)
    • dataset_ner_fuzzy-regex_all_only-relevant_training.ner_tags.txt (1 MB)
    • dataset_ner_manatee_all_only-relevant_training_automatically_tagged.sentences.txt (2 MB)
    • dataset_ner_regests_validation.sentences.txt (128 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_training.sentences.txt (110 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_validation.sentences.txt (598 kB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_validation.sentences.txt (1 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_training.sentences.txt (1 MB)
    • dataset_ner_manatee_all_all_training.sentences.txt (63 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_training.ner_tags.txt (897 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex_all_all_testing.sentences.txt (39 MB)
    • dataset_ner_manatee+regests_all_all_validation.sentences.txt (14 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (679 kB)
    • dataset_ner_manatee_all_only-relevant_training.ner_tags.txt (731 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation.sentences.txt (725 kB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged.sentences.txt (28 kB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500.sentences.txt (28 kB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_001-400.sentences.txt (123 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_training.ner_tags.txt (36 MB)
    • dataset_ner_fuzzy-regex_non-crossing_all_training_automatically_tagged.sentences.txt (109 MB)
    • dataset_ner_manatee_all_only-relevant_validation.ner_tags.txt (164 kB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_validation.ner_tags.txt (249 kB)
    • dataset_ner_fuzzy-regex+regests_all_only-relevant_validation_automatically_tagged.sentences.txt (1 MB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_training.sentences.txt (2 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_automatically_tagged.sentences.txt (28 kB)
    • dataset_ner_regests_training.sentences.txt (1 MB)
    • dataset_ner_regests_training_automatically_tagged.sentences.txt (1 MB)
    • dataset_ner_regests_testing_001-400.ner_tags.txt (34 kB)
    • dataset_ner_manatee+regests_all_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex_all_all_validation.sentences.txt (39 MB)
    • dataset_ner_fuzzy-regex+regests_all_all_validation.ner_tags.txt (12 MB)
    • dataset_ner_manatee+regests_all_only-relevant_validation.ner_tags.txt (213 kB)
    • dataset_ner_manatee+regests_non-crossing_all_validation_automatically_tagged.sentences.txt (8 MB)
    • dataset_ner_fuzzy-regex_all_only-relevant_testing.sentences.txt (928 kB)
    • dataset_ner_manatee_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (147 kB)
    • dataset_ner_manatee+regests_all_all_training.ner_tags.txt (20 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (2 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_testing.ner_tags.txt (214 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_training.sentences.txt (109 MB)
    • dataset_ner_manatee+regests_all_only-relevant_training.sentences.txt (3 MB)
    • dataset_ner_manatee+regests_non-crossing_all_validation_automatically_tagged.ner_tags.txt (3 MB)
    • dataset_ner_manatee_all_all_testing.ner_tags.txt (4 MB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_validation.sentences.txt (468 kB)
    • dataset_ner_manatee_all_all_validation_automatically_tagged.ner_tags.txt (5 MB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_validation_automatically_tagged.ner_tags.txt (198 kB)
    • dataset_ner_manatee+regests_all_only-relevant_training.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_validation.ner_tags.txt (8 MB)
    • dataset_ner_manatee+regests_all_only-relevant_validation_automatically_tagged.ner_tags.txt (263 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_training_automatically_tagged.ner_tags.txt (45 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_testing.sentences.txt (632 kB)
    • dataset_ner_regests_validation_automatically_tagged.sentences.txt (128 kB)
    • dataset_ner_regests_validation_automatically_tagged.ner_tags.txt (50 kB)
    • dataset_ner_manatee+regests_all_all_validation.ner_tags.txt (4 MB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_training_automatically_tagged.sentences.txt (2 MB)
    • dataset_ner_manatee_all_only-relevant_testing.sentences.txt (498 kB)
    • dataset_ner_manatee_non-crossing_all_validation.sentences.txt (8 MB)
    • dataset_ner_manatee_non-crossing_all_training.sentences.txt (42 MB)
    • dataset_ner_manatee+regests_non-crossing_all_training_automatically_tagged.ner_tags.txt (18 MB)
    • dataset_ner_manatee_all_only-relevant_validation_automatically_tagged.sentences.txt (500 kB)
    • dataset_ner_manatee_non-crossing_only-relevant_testing_401-500_tagged.docx (39 kB)
    • dataset_ner_fuzzy-regex_non-crossing_all_validation.ner_tags.txt (8 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_only-relevant_training.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex_all_all_validation_automatically_tagged.sentences.txt (39 MB)
    • dataset_ner_fuzzy-regex_all_only-relevant_validation.ner_tags.txt (299 kB)
    • dataset_ner_manatee_all_all_training_automatically_tagged.sentences.txt (63 MB)
    • dataset_ner_regests_training.ner_tags.txt (411 kB)
    • dataset_ner_manatee+regests_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (468 kB)
    • dataset_ner_manatee_non-crossing_all_testing.ner_tags.txt (2 MB)
    • dataset_ner_manatee+regests_non-crossing_all_validation.sentences.txt (8 MB)
    • dataset_ner_manatee+regests_all_all_training_automatically_tagged.ner_tags.txt (26 MB)
    • dataset_ner_fuzzy-regex_all_all_training_automatically_tagged.sentences.txt (155 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_validation.ner_tags.txt (114 kB)
    • dataset_ner_manatee+regests_all_only-relevant_validation.sentences.txt (628 kB)
    • dataset_ner_fuzzy-regex_all_all_training_automatically_tagged.ner_tags.txt (63 MB)
    • dataset_ner_manatee_non-crossing_all_training_automatically_tagged.ner_tags.txt (17 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex+regests_all_all_validation_automatically_tagged.ner_tags.txt (16 MB)
    • dataset_ner_fuzzy-regex_all_all_training.sentences.txt (156 MB)
    • dataset_ner_regests_training_automatically_tagged.ner_tags.txt (424 kB)
    • dataset_ner_manatee_non-crossing_all_training_automatically_tagged.sentences.txt (42 MB)
    • dataset_ner_manatee_all_all_validation.ner_tags.txt (4 MB)
    • dataset_ner_fuzzy-regex+regests_non-crossing_all_training_automatically_tagged.sentences.txt (110 MB)
    • dataset_ner_fuzzy-regex+regests_all_all_training_automatically_tagged.sentences.txt (156 MB)
    • dataset_ner_fuzzy-regex_non-crossing_all_validation_automatically_tagged.sentences.txt (25 MB)
    • dataset_ner_fuzzy-regex_non-crossing_all_validation.sentences.txt (25 MB)
    • dataset_ner_fuzzy-regex_all_all_validation.ner_tags.txt (12 MB)
    • dataset_ner_fuzzy-regex_all_only-relevant_training_automatically_tagged.ner_tags.txt (1 MB)
    • dataset_ner_fuzzy-regex_non-crossing_only-relevant_training.sentences.txt (2 MB)
    • dataset_ner_manatee_non-crossing_only-relevant_validation_automatically_tagged.sentences.txt (340 kB)
    • dataset_ner_manatee_all_all_validation_automatically_tagged.sentences.txt (14 MB)
    • dataset_ner_manatee_all_all_training.ner_tags.txt (20 MB)
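
Checking a download against the published MD5

Both archives are listed with an MD5 checksum, so a downloaded copy can be checked for corruption before unpacking. The following is a minimal sketch, assuming the archives were saved under their listed names in the current directory. The helper names `md5_of_file` and `verify_download` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksums as published in the file listing of this item.
EXPECTED_MD5 = {
    "language-modeling-corpus.zip": "b6ed0a8e7dc263d1a1e635d6d5770f6b",
    "named-entity-recognition-annotations.zip": "92d80ec8d6e66263295797b1e81bd60d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a file's MD5 with the published value for its file name."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no published checksum for {path.name}")
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_download(name) else "MISMATCH")
        else:
            print(name, "not found in the current directory")
```

Reading in chunks matters here because the archives are several hundred megabytes each. A mismatch most likely indicates an incomplete or corrupted download.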