Search Results
12. HWC2023 – Hamburg.de Website Corpus 2023
- Creator:
- Rüdiger, Jan Oliver
- Publisher:
- Leibniz-Institut für Deutsche Sprache
- Type:
- text and corpus
- Subject:
- corpus, Web corpus, web corpora, Germanistik, German, websites, crawling corpus, and CorpusExplorer
- Language:
- German
- Description:
- A petition for a referendum (called "Schluss mit Gendersprache in Verwaltung und Bildung", in English: "Abolition of gender language in administration and education") was launched in Hamburg in February 2023. The project "Empirical Gender Linguistics" at the Leibniz Institute for the German Language took this as an opportunity to scrape the entire "https://www.hamburg.de" website (except the list of ships in the Port of Hamburg and the yellow pages). The Hamburg.de website is the central digital contact point for citizens. The scraped texts were cleaned, processed and annotated using http://www.CorpusExplorer.de (TreeTagger for POS and lemma information). We use the corpus to analyze the use of words with gender signs.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/3.0/
13. Indonesian web corpus
- Creator:
- Medveď, Marek and Suchomel, Vít
- Publisher:
- Masaryk University, NLP Centre
- Type:
- text and corpus
- Subject:
- Web corpus
- Language:
- Indonesian
- Description:
- Indonesian web corpus crawled in 2010. Encoded in UTF-8, cleaned, deduplicated, and tagged with MorphInd.
- Rights:
- NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, and ACA
14. Nottinghamer Korpus Deutscher YouTube-Sprache (The NottDeuYTSch Corpus) (2022-07-27)
- Creator:
- Cotgrove, Louis Alexander
- Publisher:
- University of Nottingham
- Type:
- text and corpus
- Subject:
- youth language, Computer-Mediated Communication, Digitally-Mediated Communication, CMC, DMC, online, YouTube, digital, emoji, translanguaging, multilingualism, social media, digital humanities, and Web corpus
- Language:
- German, English, Russian, Turkish, and Serbo-Croatian
- Description:
- The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic. It represents an authentic language snapshot of young German speakers. The corpus was proportionally sampled by video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance. It also contains a considerable amount of associated metadata for each comment, which enables further longitudinal cross-sectional analyses.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
15. Oromo web corpus
- Creator:
- Suchomel, Vít and Rychlý, Pavel
- Publisher:
- Masaryk University, NLP Centre
- Type:
- text and corpus
- Subject:
- text corpora, Ethiopian languages, Oromo, Web corpus, and under-resourced language
- Language:
- Oromo
- Description:
- Oromo web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
- Rights:
- NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, and ACA