Bavaria's Dialects Online (BDO) is the digital language information system of the three projects "Bavarian Dictionary", "Franconian Dictionary", and "Dialectological Information System of Bavarian Swabia". The database combines the results of dialect research from these projects and presents dictionary articles as well as research data in a freely accessible online tool.
BDO is not only aimed at scholars, but also at the lay public interested in the language. Here, the vocabulary of all Bavarian dialects is collected in one place and made accessible. The system shows the richness of the dialects of Bavaria in combination. With the new database, one will be able to compare the dialect vocabulary of Old Bavaria, Franconia and Swabia. Authentic dialect evidence is used to illustrate the dialect words in their variety of meanings and regional distribution, as well as to show their use in idioms, proverbs, and much more. BDO allows a whole new look at the vocabulary of the dialects of all parts of the state of Bavaria.
The database contains about 5 million items of dialectal linguistic evidence on the Bavarian, Franconian, and Swabian dialects, collected in different projects within the Free State of Bavaria.
In 1984, linguists at the University of Augsburg began to collect dialect data for the research and documentation project "Linguistic Map of Swabia" (German: "Sprachatlas von Bayerisch-Schwaben (SBS)"). In 1986, the University of Bayreuth followed with preparations for the "Linguistic Map of North- and East-Bavaria" (German: "Sprachatlas von Nordostbayern (SNOB)"). In the following years, partner projects of the other regions also started to collect data in their particular region. All six language projects then formed the "Research Association of the Bavarian Linguistic Map" (German: "Bayerischer Sprachatlas (BSA)"), which was funded by the DFG and the Bavarian State Ministry of Science, Research and the Arts.
The first digital publication of BayDat by Ralf Zimmermann in 2007 at the University of Würzburg (see linked paper) was re-designed in 2019 by Manuel Raaf at the Bavarian Academy of Sciences and Humanities.
For detailed information, please see https://baydat.badw.de/info
The database offers access to over 6 million items of dialectal linguistic evidence from the project "Dictionary of Bavarian Dialects" (German: Das Bayerische Wörterbuch) as image snippets. Part of the material is already lemmatized, and lemmatization is ongoing.
The area covered by the Dictionary of Bavarian Dialects (Bayerisches Wörterbuch) comprises Upper Bavaria, Lower Bavaria, the Upper Palatinate and neighbouring regions of Bavarian Swabia, Middle Franconia and Upper Franconia. Over and above the vernaculars spoken today, Bavaria’s literary tradition since its beginnings in the 8th century is also taken into account.
Starting in 1913, language material was collected from all Bavarian-speaking regions in Bavaria. Questionnaires were sent out to local informants throughout Bavaria, and contemporary and historical literary sources were excerpted. Today the collection comprises around nine million dialect examples. With the exception of the “Wörterlisten” (word lists), which can be digitally searched and edited, this material consists of index cards, to which corresponding standard German or quasi-standard German keywords have been added, filed alphabetically (see link below for more information).
For detailed information, please see https://www.bwb.badw.de/en/the-project.html and https://www.bwb.badw.de/en/digital-platform.html
HinDialect: 26 Hindi-related languages and dialects of the Indic Continuum in North India
Languages
This is a collection of folksongs for 26 languages that form a dialect continuum in North India and nearby regions.
Namely Angika, Awadhi, Baiga, Bengali, Bhadrawahi, Bhili, Bhojpuri, Braj, Bundeli, Chhattisgarhi, Garhwali, Gujarati, Haryanvi, Himachali, Hindi, Kanauji, Khadi Boli, Korku, Kumaoni, Magahi, Malvi, Marathi, Nimadi, Panjabi, Rajasthani, Sanskrit.
This data was originally collected by the Kavita Kosh Project at http://www.kavitakosh.org/ . Here are the main characteristics of the languages in this collection:
- They are all Indic languages except for Korku.
- The majority of them are genealogically closely related to the standard Hindi dialect (such as Haryanvi and Bhojpuri), although the collection also contains languages such as Bengali and Gujarati, which are more distant relatives.
- They are all primarily spoken in (North) India (Bengali is also spoken in Bangladesh)
- All except Sanskrit are living languages
Data
Categorising them by pre-existing available NLP resources, we have:
* Band 1 languages : Hindi, Panjabi, Gujarati, Bengali, Nepali. These languages already have other large standard datasets available. Kavita Kosh may have very little data for these languages.
* Band 2 languages: Bhojpuri, Magahi, Awadhi, Braj. These languages have growing interest and some datasets of a relatively small size as compared to Band 1 language resources.
* Band 3 languages: All other languages in the collection are previously zero-resource languages. These are the languages for which this dataset is the most relevant.
Script
This dataset is entirely in Devanagari. Content in the case of languages not written in Devanagari (such as Bengali and Gujarati) has been transliterated by the Kavita Kosh Project.
Format
The dataset contains one text file per language, holding all folksongs for that language. Folksongs are separated from each other by an empty line. The first line of each folksong is its title, and line breaks within folksongs are preserved.
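The format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution: the function name `parse_folksongs` is hypothetical, and the sketch assumes UTF-8 text in which one or more blank lines separate folksongs.

```python
from typing import List, Tuple


def parse_folksongs(text: str) -> List[Tuple[str, List[str]]]:
    """Split the contents of one language file into (title, lines) pairs.

    Assumption: folksongs are separated by one or more empty lines and
    the first non-empty line of each block is the title.
    """
    songs: List[Tuple[str, List[str]]] = []
    block: List[str] = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.strip():
            # Non-empty line: belongs to the current folksong.
            block.append(line)
        elif block:
            # Empty line closes the current folksong.
            songs.append((block[0], block[1:]))
            block = []
    if block:
        # The last folksong may not be followed by an empty line.
        songs.append((block[0], block[1:]))
    return songs


if __name__ == "__main__":
    # Placeholder content; real files contain Devanagari text.
    sample = "Title A\nline 1\nline 2\n\nTitle B\nline 3\n"
    for title, lines in parse_folksongs(sample):
        print(title, len(lines))
```

For a real file, the text would be read with an explicit encoding, for example `open(path, encoding="utf-8").read()`, where `path` points to one of the per-language files.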
HinDialect: 26 Hindi-related languages and dialects of the Indic Continuum in North India
Languages
This is a collection of folksongs for 26 languages that form a dialect continuum in North India and nearby regions.
Namely Angika, Awadhi, Baiga, Bengali, Bhadrawahi, Bhili, Bhojpuri, Braj, Bundeli, Chhattisgarhi, Garhwali, Gujarati, Haryanvi, Himachali, Hindi, Kanauji, Khadi Boli, Korku, Kumaoni, Magahi, Malvi, Marathi, Nimadi, Panjabi, Rajasthani, Sanskrit.
This data was originally collected by the Kavita Kosh Project at http://www.kavitakosh.org/ . Here are the main characteristics of the languages in this collection:
- They are all Indic languages except for Korku.
- The majority of them are genealogically closely related to the standard Hindi dialect (such as Haryanvi and Bhojpuri), although the collection also contains languages such as Bengali and Gujarati, which are more distant relatives.
- All except Nepali are primarily spoken in (North) India
- All except Sanskrit are living languages
Data
Categorising them by pre-existing available NLP resources, we have:
* Band 1 languages : Hindi, Marathi, Punjabi, Sindhi, Gujarati, Bengali, Nepali. These languages already have other large datasets available. Since Kavita Kosh focusses largely on Hindi-related languages, we may have very little data for these other languages in this particular dataset.
* Band 2 languages: Bhojpuri, Magahi, Awadhi, Brajbhasha. These languages have growing interest and some datasets of a relatively small size as compared to Band 1 language resources.
* Band 3 languages: All other languages in the collection are previously zero-resource languages. These are the languages for which this dataset is the most relevant.
Script
This dataset is entirely in Devanagari. Content in the case of languages not written in Devanagari (such as Bengali and Gujarati) has been transliterated by the Kavita Kosh Project.
Format
The data is segregated by language, and contains each folksong in a different JSON file.
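A loader for this layout might look like the sketch below. It rests on assumptions the description does not confirm: that each language has its own directory under a common root, and that each folksong file carries a `.json` extension. The function name `load_folksongs` is hypothetical, and because the JSON schema is not specified here, the parsed objects are returned as-is without accessing any keys.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_folksongs(root: Path) -> Dict[str, List[Any]]:
    """Load every JSON file below `root`, grouped by language directory.

    Assumption: `root` contains one sub-directory per language, and each
    folksong is stored in its own *.json file inside that directory.
    """
    corpus: Dict[str, List[Any]] = {}
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        songs: List[Any] = []
        for json_path in sorted(lang_dir.glob("*.json")):
            with json_path.open(encoding="utf-8") as handle:
                # The schema is unknown, so keep the parsed object unchanged.
                songs.append(json.load(handle))
        corpus[lang_dir.name] = songs
    return corpus
```

Inspecting one loaded object, for example with `print(corpus[language][0])`, is a reasonable first step to find out which fields the files actually contain.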
The database currently contains about 1 million items of dialectal linguistic evidence from the project "The Franconian Dictionary" (German: Das Fränkische Wörterbuch), each of which is lemmatized, annotated, and linked to the original questionnaire. The database is a work in progress, and more data will be added regularly.
The Franconian Dictionary was initiated by the Munich office of the Bavarian Dictionary project, which sent out questionnaires for a dialect survey in Franconia. In the wake of this survey, an office was established in Erlangen in 1933 (see link below for more information).
Over the course of 90 years, thousands of volunteers helped to compile a considerable collection of vernacular examples of usage, drawn from the Bavarian districts of Upper, Middle and Lower Franconia. For the most part they represent the East Franconian dialect, and to a lesser extent also Rhine-Franconian, Swabian and North-Bavarian vernaculars. Between 2007 and 2008 a small selection of the research results was published in three editions of one printed volume by Eberhard Wagner and Alfred Klepsch: “Handwörterbuch von Bayerisch-Franken” (see link below for more information).
Since 2012 the Franconian Dictionary, a project of the Bavarian Academy of Sciences and Humanities, has been entrusted to the Friedrich-Alexander-University in Erlangen and Nuremberg (FAU). The project is supervised by Prof. Dr. Mechthild Habermann, Chair of the Faculty of German Linguistics at the FAU.
For detailed information, please see http://www.wbf.badw.de/en/the-project.html and http://www.wbf.badw.de/en/wbf-digital.html
This program enables the user to visualize f0 contours, to plot vowels in the F1/F2 space for multiple points in the vowel interval, e.g. at 20%, 50% and 80%, and to visualize vowel durations.
(The tool is implemented in R. We used the following packages: phonR, gplots, plotrix, lattice, readxl, WriteXLS, DT, psych and pracma. We thank the developers of these packages.)
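To illustrate what measuring at multiple points in the vowel interval means, the sketch below computes the time stamps at 20%, 50% and 80% of a vowel and the vowel's duration. It is written in Python for illustration only and is not the tool's R code. The function names `measurement_times` and `vowel_duration` are hypothetical, and onset and offset are assumed to be given in seconds.

```python
from typing import List, Sequence


def vowel_duration(onset: float, offset: float) -> float:
    """Return the duration of a vowel interval in seconds."""
    if offset <= onset:
        raise ValueError("offset must be later than onset")
    return offset - onset


def measurement_times(
    onset: float,
    offset: float,
    fractions: Sequence[float] = (0.2, 0.5, 0.8),
) -> List[float]:
    """Return the time points at the given fractions of the vowel interval.

    A fraction of 0.0 corresponds to the vowel onset, 1.0 to the offset.
    """
    duration = vowel_duration(onset, offset)
    return [onset + fraction * duration for fraction in fractions]


if __name__ == "__main__":
    # Hypothetical vowel starting at 1.00 s and ending at 1.10 s.
    for t in measurement_times(1.00, 1.10):
        print(round(t, 3))
```

Formant values (F1, F2) would then be read off at each of these time points, which gives a short trajectory per vowel in the F1/F2 space rather than a single point.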