Files in this record

Download all files in this record (437.36 MB)
Name
0_README.txt
Size
4.34 KB
Format
Text file
Description
Description of the SQL file
MD5
0077e7229f1f119f44406e95ffb203ec
File preview
This SQL dump contains linguistically annotated data from the online forum PC Games (https://forum.pcgames.de). All posts (approx. 2.4 million) were scraped in April 2019 (for details see Kissling 2019 and the GitHub URL below), resulting in 120 million tokens from almost 70,000 authors.
In this database you will find tokenized, part-of-speech-tagged and partly lemmatized information on the posts, plus metadata such as the authors and the location of each post in the forum structure. Lastly, the table infinitives contains the results of the API requests made to the Oxford Dictionary of English.
The order of the words in a post cannot be reconstructed with this database. Usernames were replaced with author_ids.

Additional information:
As this corpus was analyzed in terms of productivity and language contact between German and English (Kissling 2020), there is additional information about German base forms found in present-day English, mainly focussing on the formula "German_verb_stem + -en = English verb infinitiv . . .
                                            
Name
posts_German_PC_Games_online_forum.sql
Size
437.35 MB
Format
Unknown
Description
SQL dump of the corpus
MD5
92d4d397befe0d9e552ef421c5eb6018
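
Verifying the downloads
The MD5 values listed for the two files can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, assuming both files were saved under their listed names in the current directory. The helper names md5_of_file and verify_downloads are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "0_README.txt": "0077e7229f1f119f44406e95ffb203ec",
    "posts_German_PC_Games_online_forum.sql": "92d4d397befe0d9e552ef421c5eb6018",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Compare each listed file against its expected checksum.

    Returns a dict mapping file name to "ok", "mismatch" or "missing".
    The download directory is an assumption; adjust it as needed.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use low for the 437.35 MB SQL file. A "mismatch" result usually means the download was interrupted and should be repeated.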