{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with Text Analysis\n",
"The aim of this notebook is to get started with quantitative analysis of text data. We take a Czech text, split it into tokens, run a frequency analysis, and look at the nature of the data.\n",
"\n",
"In this notebook (and in the following exercises), we will use these modules:\n",
"* Natural Language Toolkit (`nltk`)\n",
"* `numpy` - module for numerical operations (extremely handy for vector and matrix calculations)\n",
"* `pandas` - module for data processing (handy for table-like data)\n",
"* `matplotlib` - module for plotting and visualization\n",
"\n",
"These four packages are widely used across text analysis tasks, and many other packages are built on top of them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK 1**: Get the text of Karel Hynek Mácha's *Máj*. Store it in a plain text file, ideally UTF-8 encoded (not .DOC or .DOCX)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For future use, store the text as `../resources/maj.txt`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install necessary packages\n",
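"In this notebook, we use NLTK (Natural Language Toolkit) to tokenize the input text and pandas, a package for convenient handling of tabular data. The cell below also installs `numpy` and `matplotlib`.\n",
"\n",
"N.B. In some installations, `pip` is available only as `pip3`. If the installation fails with `/bin/bash: pip: command not found` (as in the output below), replace `pip` with `pip3` and re-run the cell. Alternatively, use the `%pip install` magic, which installs packages into the environment of the running kernel."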
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/bin/bash: pip: command not found\n",
"/bin/bash: pip: command not found\n",
"/bin/bash: pip: command not found\n",
"/bin/bash: pip: command not found\n"
]
}
],
"source": [
"!pip install --user nltk\n",
"!pip install --user pandas\n",
"!pip install --user matplotlib\n",
"!pip install --user numpy"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package punkt to /home/zuzana/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n"
]
}
],
"source": [
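"import numpy as np\n",
"import pandas as pd\n",
"import nltk\n",
"from collections import Counter\n",
"from nltk.tokenize import word_tokenize\n",
"\n",
"# Tokenizer models used by word_tokenize.\n",
"# Newer NLTK releases look for 'punkt_tab' instead; if word_tokenize raises a LookupError,\n",
"# run nltk.download('punkt_tab') as well.\n",
"nltk.download('punkt')"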
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get the data\n",
"You will probably have to change the filename so that it points to the file you saved in Task 1."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Read the whole poem into a single string; the file is assumed to be UTF-8 encoded\n",
"with open('../resources/maj.txt', encoding='utf-8') as f:\n",
"    text = f.read()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK 2**: What is the purpose of tokenization? Hint: print `word_tokenize(\"your short text\")` in a separate cell. Why don't we just split the text on spaces?"
]
},
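{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you get stuck on Task 2, the next cell is a minimal sketch that uses only the Python standard library. It compares a plain whitespace split with a simplified regular-expression tokenizer on the opening line of the poem. The regex is an assumption made for illustration; it is *not* the algorithm behind `word_tokenize`, which handles more cases. Compare both results with `word_tokenize` on the same line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Opening line of Máj, typed in by hand as a small illustrative sample\n",
"sample = 'Byl pozdní večer – první máj – večerní máj – byl lásky čas.'\n",
"\n",
"# Naive approach: split on whitespace only, so punctuation stays glued to words\n",
"naive_tokens = sample.split()\n",
"\n",
"# Simplified regex tokenizer (an illustration, NOT how nltk.word_tokenize works):\n",
"# a token is either a run of word characters or a single punctuation character\n",
"regex_tokens = re.findall(r'\\w+|[^\\w\\s]', sample)\n",
"\n",
"print(naive_tokens[-1])   # čas.\n",
"print(regex_tokens[-2:])  # ['čas', '.']"
]
},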
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({'1': 1,\n",
" 'Byl': 3,\n",
" 'pozdní': 6,\n",
" 'večer': 5,\n",
" '–': 256,\n",
" 'první': 6,\n",
" 'máj': 15,\n",
" 'večerní': 7,\n",
" 'byl': 5,\n",
" 'lásky': 11,\n",
" 'čas': 23,\n",
" '.': 203,\n",
" 'Hrdliččin': 1,\n",
" 'zval': 3,\n",
" 'ku': 10,\n",
" 'lásce': 7,\n",
" 'hlas': 28,\n",
" ',': 405,\n",
" 'kde': 14,\n",
" 'borový': 3,\n",
" 'zaváněl': 2,\n",
" 'háj': 4,\n",
" 'O': 2,\n",
" 'šeptal': 2,\n",
" 'tichý': 7,\n",
" 'mech': 2,\n",
" ';': 67,\n",
" 'květoucí': 3,\n",
" 'strom': 3,\n",
" 'lhal': 2,\n",
" 'žel': 4,\n",
" 'svou': 14,\n",
" 'lásku': 2,\n",
" 'slavík': 2,\n",
" 'růži': 3,\n",
" 'pěl': 2,\n",
" 'růžinu': 2,\n",
" 'jevil': 2,\n",
" 'vonný': 3,\n",
" 'vzdech': 3,\n",
" 'Jezero': 2,\n",
" 'hladké': 2,\n",
" 'v': 130,\n",
" 'křovích': 2,\n",
" 'stinných': 2,\n",
" 'zvučelo': 2,\n",
" 'temně': 2,\n",
" 'tajný': 2,\n",
" 'bol': 2,\n",
" 'břeh': 5,\n",
" 'je': 32,\n",
" 'objímal': 2,\n",
" 'kol': 14,\n",
" 'a': 77,\n",
" 'slunce': 6,\n",
" 'jasná': 2,\n",
" 'světů': 2,\n",
" 'jiných': 2,\n",
" 'bloudila': 1,\n",
" 'blankytnými': 1,\n",
" 'pásky': 1,\n",
" 'planoucí': 1,\n",
" 'tam': 24,\n",
" 'co': 39,\n",
" 'slzy': 10,\n",
" 'I': 3,\n",
" 'světy': 1,\n",
" 'jich': 3,\n",
" 'oblohu': 1,\n",
" 'skvoucí': 1,\n",
" 've': 8,\n",
" 'chrám': 2,\n",
" 'věčné': 1,\n",
" 'vzešly': 1,\n",
" 'až': 22,\n",
" 'se': 93,\n",
" 'milostí': 1,\n",
" 'k': 24,\n",
" 'sobě': 4,\n",
" 'vroucí': 1,\n",
" 'změnivše': 1,\n",
" 'jiskry': 5,\n",
" 'hasnoucí': 1,\n",
" 'bloudící': 1,\n",
" 'milenci': 1,\n",
" 'sešly': 1,\n",
" 'Ouplné': 1,\n",
" 'lůny': 6,\n",
" 'krásná': 2,\n",
" 'tvář': 10,\n",
" 'tak': 9,\n",
" 'bledě': 1,\n",
" 'jasně': 2,\n",
" 'bledá': 3,\n",
" 'jak': 18,\n",
" 'milence': 1,\n",
" 'milenka': 1,\n",
" 'hledá': 1,\n",
" 'růžovou': 1,\n",
" 'vzplanula': 1,\n",
" 'zář': 13,\n",
" 'na': 29,\n",
" 'vodách': 1,\n",
" 'obrazy': 1,\n",
" 'své': 3,\n",
" 'zřela': 1,\n",
" 'sama': 1,\n",
" 'láskou': 2,\n",
" 'mřela': 1,\n",
" 'Dál': 1,\n",
" 'blyštil': 1,\n",
" 'bledý': 2,\n",
" 'dvorů': 2,\n",
" 'stín': 15,\n",
" 'jenž': 18,\n",
" 'šly': 1,\n",
" 'vzdy': 10,\n",
" 'blíž': 5,\n",
" 'objetí': 1,\n",
" 'by': 11,\n",
" 'níž': 6,\n",
" 'vinuly': 1,\n",
" 'soumraku': 1,\n",
" 'klín': 13,\n",
" 'posléze': 1,\n",
" 'šerem': 1,\n",
" 'jedno': 3,\n",
" 'splynou': 1,\n",
" 'S': 3,\n",
" 'nimi': 4,\n",
" 'stromy': 1,\n",
" 'stromům': 1,\n",
" 'vinou': 3,\n",
" 'Nejzáze': 1,\n",
" 'stíní': 1,\n",
" 'šero': 1,\n",
" 'hor': 8,\n",
" 'bříza': 1,\n",
" 'boru': 2,\n",
" 'bříze': 1,\n",
" 'bor': 1,\n",
" 'kloní': 3,\n",
" 'Vlna': 1,\n",
" 'za': 11,\n",
" 'vlnou': 1,\n",
" 'potokem': 1,\n",
" 'spěchá': 4,\n",
" 'Vře': 1,\n",
" 'plnou': 1,\n",
" 'každý': 2,\n",
" 'tvor': 3,\n",
" 'Za': 5,\n",
" 'růžového': 1,\n",
" 'večera': 1,\n",
" 'pod': 10,\n",
" 'dubem': 1,\n",
" 'sličná': 1,\n",
" 'děva': 1,\n",
" 'sedí': 3,\n",
" 'skály': 4,\n",
" 'břehu': 9,\n",
" 'jezera': 10,\n",
" 'daleko': 3,\n",
" 'přes': 4,\n",
" 'jezero': 6,\n",
" 'hledí': 2,\n",
" 'To': 7,\n",
" 'jí': 6,\n",
" 'modro': 3,\n",
" 'nohoum': 1,\n",
" 'vine': 2,\n",
" 'dále': 3,\n",
" 'zeleně': 2,\n",
" 'zakvítá': 2,\n",
" 'zeleněji': 2,\n",
" 'prosvítá': 2,\n",
" 'dálce': 5,\n",
" 'bledé': 5,\n",
" 'jasno': 2,\n",
" 'splyne': 3,\n",
" 'Po': 6,\n",
" 'šírošíré': 3,\n",
" 'hladině': 2,\n",
" 'umdlelý': 1,\n",
" 'dívka': 1,\n",
" 'zrak': 12,\n",
" 'upírá': 1,\n",
" 'po': 31,\n",
" 'nic': 4,\n",
" 'mimo': 1,\n",
" 'promyk': 1,\n",
" 'hvězd': 1,\n",
" 'nezírá': 2,\n",
" 'Dívčina': 1,\n",
" 'anjel': 1,\n",
" 'padlý': 1,\n",
" 'amarant': 1,\n",
" 'jaro': 2,\n",
" 'svadlý': 1,\n",
" 'ubledlých': 1,\n",
" 'lících': 2,\n",
" 'krásy': 1,\n",
" 'spějí': 1,\n",
" 'Hodina': 1,\n",
" 'všecko': 3,\n",
" 'vzala': 1,\n",
" 'ta': 2,\n",
" 'usta': 1,\n",
" 'zraky': 3,\n",
" 'čelo': 1,\n",
" 'její': 6,\n",
" 'půvabný': 1,\n",
" 'žal': 4,\n",
" 'i': 43,\n",
" 'smutek': 2,\n",
" 'psala': 1,\n",
" 'Tak': 5,\n",
" 'zašel': 1,\n",
" 'dnes': 3,\n",
" 'dvacátý': 1,\n",
" 'den': 19,\n",
" 'krajinu': 2,\n",
" 'tichou': 1,\n",
" 'kráčí': 3,\n",
" 'sen.': 5,\n",
" 'Poslední': 1,\n",
" 'požár': 4,\n",
" 'kvapně': 2,\n",
" 'hasne': 1,\n",
" 'nebe': 7,\n",
" 'růžojasné': 1,\n",
" 'nad': 18,\n",
" 'modrými': 1,\n",
" 'horami': 4,\n",
" 'míhá': 5,\n",
" '„': 55,\n",
" 'On': 4,\n",
" 'nejde': 1,\n",
" 'již': 20,\n",
" 'nevrátí': 1,\n",
" '!': 82,\n",
" 'Svedenou': 1,\n",
" 'tu': 7,\n",
" 'zachvátí': 1,\n",
" '“': 55,\n",
" 'Hluboký': 1,\n",
" 'ňadra': 2,\n",
" 'zdvíhá': 5,\n",
" 'bolestný': 1,\n",
" 'srdcem': 1,\n",
" 'bije': 2,\n",
" 'cit': 5,\n",
" 'u': 17,\n",
" 'tajemné': 1,\n",
" 'vod': 6,\n",
" 'stonání': 1,\n",
" 'mísí': 2,\n",
" 'dívky': 3,\n",
" 'pláč': 4,\n",
" 'lkání': 3,\n",
" 'V': 15,\n",
" 'slzích': 1,\n",
" 'zhlíží': 2,\n",
" 'hvězdný': 5,\n",
" 'svit': 11,\n",
" 'plynou': 3,\n",
" 'Vřelé': 1,\n",
" 'ty': 2,\n",
" 'tváře': 3,\n",
" 'chladné': 1,\n",
" 'padající': 2,\n",
" 'hvězdy': 6,\n",
" 'hynou': 3,\n",
" 'kam': 2,\n",
" 'zapadnou': 1,\n",
" 'květ': 6,\n",
" 'uvadne': 1,\n",
" 'Viz': 2,\n",
" 'mihla': 1,\n",
" 'kraje': 5,\n",
" 'ní': 5,\n",
" 'nahnuté': 1,\n",
" 'větýrek': 1,\n",
" 'bílým': 1,\n",
" 'šatem': 1,\n",
" 'vlaje': 1,\n",
" 'Oko': 1,\n",
" 'má': 6,\n",
" 'dálku': 3,\n",
" 'napnuté': 3,\n",
" 'Teď': 5,\n",
" 'rychle': 3,\n",
" 'utírá': 1,\n",
" 'rukou': 2,\n",
" 'si': 10,\n",
" 'zastírá': 1,\n",
" 'upírajíc': 1,\n",
" 'dálné': 2,\n",
" 'hory': 10,\n",
" 'vlnách': 2,\n",
" 'jiskra': 1,\n",
" 'jiskru': 1,\n",
" 'honí': 1,\n",
" 'vodě': 2,\n",
" 'hvězda': 3,\n",
" 's': 18,\n",
" 'hvězdou': 1,\n",
" 'hraje': 3,\n",
" 'Jak': 6,\n",
" 'holoubátko': 1,\n",
" 'sněhobílé': 1,\n",
" 'černým': 1,\n",
" 'mračnem': 1,\n",
" 'přelétá': 2,\n",
" 'lílie': 3,\n",
" 'vodní': 1,\n",
" 'zakvétá': 1,\n",
" 'temné': 5,\n",
" 'číle': 1,\n",
" 'níží': 2,\n",
" 'temných': 5,\n",
" 'cosi': 1,\n",
" 'blíží': 2,\n",
" 'Malá': 1,\n",
" 'chvíle': 3,\n",
" 'čápa': 2,\n",
" 'vážný': 2,\n",
" 'let': 5,\n",
" 'ne': 1,\n",
" 'holoubě': 1,\n",
" 'či': 2,\n",
" 'bílá': 3,\n",
" 'plachta': 1,\n",
" 'větrem': 2,\n",
" 'houpá': 1,\n",
" 'Štíhlé': 1,\n",
" 'veslo': 1,\n",
" 'modru': 2,\n",
" 'koupá': 1,\n",
" 'dlouhé': 5,\n",
" 'pruhy': 2,\n",
" 'kolem': 16,\n",
" 'tvoří': 2,\n",
" 'Těm': 1,\n",
" 'zlaté': 1,\n",
" 'růže': 2,\n",
" 'při': 5,\n",
" 'doubí': 1,\n",
" 'horách': 6,\n",
" 'nebi': 1,\n",
" 'hoří': 2,\n",
" 'růžovým': 1,\n",
" 'zlatem': 1,\n",
" 'čela': 1,\n",
" 'broubí': 3,\n",
" 'Rychlý': 1,\n",
" 'to': 20,\n",
" 'člůnek': 1,\n",
" 'blíže': 4,\n",
" 'on': 11,\n",
" 'Ty': 2,\n",
" 'péra': 2,\n",
" 'kvítí': 2,\n",
" 'klobouk': 2,\n",
" 'oko': 6,\n",
" 'ním': 4,\n",
" 'svítí': 4,\n",
" 'ten': 4,\n",
" 'plášť': 2,\n",
" 'Již': 3,\n",
" 'člůn': 2,\n",
" 'skalou': 1,\n",
" 'víže': 2,\n",
" 'Vzhůru': 1,\n",
" 'skále': 4,\n",
" 'lehký': 3,\n",
" 'krok': 4,\n",
" 'uzounkou': 1,\n",
" 'stezkou': 2,\n",
" 'plavce': 1,\n",
" 'vede': 1,\n",
" 'Dívce': 1,\n",
" 'zardí': 1,\n",
" 'dub': 2,\n",
" 'skryta': 1,\n",
" 'Vstříc': 1,\n",
" 'mu': 13,\n",
" 'běží': 2,\n",
" 'zaplesá': 1,\n",
" 'dlouhý': 5,\n",
" 'skok': 3,\n",
" 'plavci': 1,\n",
" 'prsou': 1,\n",
" 'leží': 2,\n",
" 'Ha': 1,\n",
" 'Běda': 1,\n",
" 'mi': 5,\n",
" 'Vtom': 1,\n",
" 'známou': 1,\n",
" 'osvítila': 1,\n",
" 'hrůzou': 4,\n",
" 'krev': 4,\n",
" 'žilách': 1,\n",
" 'staví': 1,\n",
" 'Kde': 3,\n",
" 'Vilém': 2,\n",
" 'můj': 15,\n",
" '?': 21,\n",
" 'plavec': 1,\n",
" 'tichými': 1,\n",
" 'slovy': 1,\n",
" 'šepce': 4,\n",
" 'praví': 1,\n",
" ':': 15,\n",
" 'Tam': 5,\n",
" 'jezeru': 7,\n",
" 'vížka': 1,\n",
" 'ční': 1,\n",
" 'stromů': 1,\n",
" 'noc': 14,\n",
" 'bílý': 5,\n",
" 'hlubokoť': 2,\n",
" 'stopen': 3,\n",
" 'však': 5,\n",
" 'hlouběji': 1,\n",
" 'ještě': 11,\n",
" 'vodu': 1,\n",
" 'vryt': 1,\n",
" 'z': 19,\n",
" 'mala': 5,\n",
" 'okénka': 1,\n",
" 'lampy': 4,\n",
" 'myšlenkou': 3,\n",
" 'baví': 1,\n",
" 'že': 4,\n",
" 'příští': 4,\n",
" 'jej': 3,\n",
" 'žití': 2,\n",
" 'zbaví': 1,\n",
" 'hanu': 2,\n",
" 'tvoji': 1,\n",
" 'vinu': 4,\n",
" 'dozvěděl': 1,\n",
" 'svůdce': 4,\n",
" 'tvého': 1,\n",
" 'vraždě': 1,\n",
" 'zavraždil': 1,\n",
" 'otce': 3,\n",
" 'svého': 4,\n",
" 'Msta': 1,\n",
" 'patách': 1,\n",
" 'jeho': 20,\n",
" 'činu': 1,\n",
" 'Hanebně': 1,\n",
" 'zemře': 1,\n",
" 'Poklid': 1,\n",
" 'dán': 2,\n",
" 'květou': 1,\n",
" 'zbledlé': 1,\n",
" 'obdrží': 1,\n",
" 'stán': 6,\n",
" 'štíhlé': 1,\n",
" 'oudy': 1,\n",
" 'kolo': 9,\n",
" 'vpletou': 1,\n",
" 'skoná': 1,\n",
" 'strašný': 9,\n",
" 'lesů': 10,\n",
" 'pán': 7,\n",
" 'hanbu': 1,\n",
" 'měj': 2,\n",
" 'světa': 1,\n",
" 'kletbu': 2,\n",
" 'mou': 5,\n",
" 'Obrátí': 1,\n",
" 'Utichl': 2,\n",
" 'slezl': 1,\n",
" 'krátký': 1,\n",
" 'svůj': 5,\n",
" 'najde': 1,\n",
" 'Ten': 3,\n",
" 'letí': 1,\n",
" 'menší': 2,\n",
" 'mezi': 5,\n",
" 'zajde': 2,\n",
" 'Tiché': 1,\n",
" 'jsou': 5,\n",
" 'vlny': 1,\n",
" 'temný': 4,\n",
" 'vše': 8,\n",
" 'lazurným': 1,\n",
" 'pláštěm': 1,\n",
" 'krylo': 1,\n",
" 'vodou': 1,\n",
" 'bílých': 5,\n",
" 'skví': 1,\n",
" 'šatů': 1,\n",
" 'krajina': 2,\n",
" 'Jarmilo': 7,\n",
" 'hlubinách': 1,\n",
" 'vody': 6,\n",
" 'Je': 3,\n",
" 'Zve': 1,\n",
" 'hrám': 1,\n",
" 'hrdliččin': 5,\n",
" 'Ilustrace': 5,\n",
" 'č': 5,\n",
" '2': 2,\n",
" 'Klesla': 1,\n",
" 'nebes': 2,\n",
" 'výše': 2,\n",
" 'mrtvá': 2,\n",
" 'siný': 1,\n",
" 'padá': 2,\n",
" 'neskončené': 1,\n",
" 'říše': 1,\n",
" 'věčně': 4,\n",
" 'věčný': 1,\n",
" 'byt': 3,\n",
" 'Její': 1,\n",
" 'zní': 6,\n",
" 'hrobu': 1,\n",
" 'všeho': 1,\n",
" 'jekot': 1,\n",
" 'hrůzný': 1,\n",
" 'kvíl': 1,\n",
" 'Kdy': 1,\n",
" 'dopadne': 1,\n",
" 'konce': 7,\n",
" 'Nikdy': 1,\n",
" 'nikde': 1,\n",
" 'žádný': 10,\n",
" 'cíl': 2,\n",
" 'Kol': 1,\n",
" 'bílé': 10,\n",
" 'věže': 2,\n",
" 'větry': 2,\n",
" 'hrají': 2,\n",
" 'vlnky': 3,\n",
" 'šepotají': 2,\n",
" 'Na': 5,\n",
" 'zdě': 1,\n",
" 'stříbrnou': 1,\n",
" 'rozlila': 1,\n",
" 'hluboko': 1,\n",
" 'věži': 1,\n",
" 'temno': 6,\n",
" 'pouhé': 6,\n",
" 'neb': 3,\n",
" 'jasna': 1,\n",
" 'měsíce': 5,\n",
" 'světlá': 2,\n",
" 'moc': 5,\n",
" 'uzounkým': 1,\n",
" 'oknem': 1,\n",
" 'sklepení': 2,\n",
" 'proletši': 1,\n",
" 'změní': 2,\n",
" 'pološerou': 1,\n",
" 'Sloup': 1,\n",
" 'sloupu': 3,\n",
" 'rameno': 1,\n",
" 'podává': 1,\n",
" 'temnotou': 1,\n",
" 'noční': 9,\n",
" 'Z': 5,\n",
" 'venku': 2,\n",
" 'větru': 3,\n",
" 'vání': 2,\n",
" 'zvražděných': 1,\n",
" 'vězňů': 1,\n",
" 'vlasami': 1,\n",
" 'vězně': 14,\n",
" 'pohrává': 1,\n",
" 'kamenný': 4,\n",
" 'složen': 3,\n",
" 'stůl': 4,\n",
" 'hlavu': 3,\n",
" 'o': 6,\n",
" 'ruce': 3,\n",
" 'opírá': 3,\n",
" 'polou': 3,\n",
" 'sedě': 3,\n",
" 'kleče': 3,\n",
" 'půl': 4,\n",
" 'hloub': 2,\n",
" 'myšlenek': 2,\n",
" 'zabírá': 2,\n",
" 'tváři': 9,\n",
" 'mračna': 2,\n",
" 'jdou': 2,\n",
" 'zahalil': 3,\n",
" 'vězeň': 10,\n",
" 'ně': 3,\n",
" 'duši': 2,\n",
" 'myšlenka': 4,\n",
" 'umírá': 7,\n",
" 'Hluboká': 2,\n",
" 'rouškou': 1,\n",
" 'teď': 4,\n",
" 'přikrýváš': 1,\n",
" 'dědinu': 1,\n",
" 'ona': 4,\n",
" 'truchlí': 2,\n",
" 'pro': 2,\n",
" 'mě': 5,\n",
" 'Že': 1,\n",
" 'pouhý': 2,\n",
" 'sen': 10,\n",
" 'Ta': 1,\n",
" 'dávno': 3,\n",
" 'neví': 1,\n",
" 'mně': 4,\n",
" 'Sotvaže': 1,\n",
" 'zítra': 6,\n",
" 'jasný': 1,\n",
" 'lesy': 6,\n",
" 'vstane': 1,\n",
" 'já': 6,\n",
" 'hanebně': 1,\n",
" 'jsem': 17,\n",
" 'odpraven': 2,\n",
" 'vesele': 1,\n",
" 'vzplane.': 1,\n",
" 'Umlknul': 2,\n",
" 'jen': 9,\n",
" 'sloupy': 1,\n",
" 'dál': 11,\n",
" 'rozlíhá': 1,\n",
" 'jakby': 1,\n",
" 'přimrazen': 2,\n",
" 'konci': 1,\n",
" 'síně': 2,\n",
" 'usne': 1,\n",
" 'temnotě': 1,\n",
" 'Hluboké': 3,\n",
" 'ticho': 9,\n",
" 'té': 3,\n",
" 'temnosti': 1,\n",
" 'zpět': 3,\n",
" 'vábí': 1,\n",
" 'časy': 1,\n",
" 'pominulé': 1,\n",
" 'svých': 1,\n",
" 'snách': 1,\n",
" 'dny': 2,\n",
" 'mladosti': 1,\n",
" 'zas': 8,\n",
" 'žije': 1,\n",
" 'uplynulé': 1,\n",
" 'vzpomnění': 1,\n",
" 'mladistvých': 1,\n",
" 'mladistvé': 2,\n",
" 'sny': 1,\n",
" 'vábilo': 1,\n",
" 'lilo': 1,\n",
" 'srdce': 5,\n",
" 'citech': 1,\n",
" 'potopilo': 1,\n",
" 'marná': 1,\n",
" 'touha': 1,\n",
" 'zašlý': 1,\n",
" 'svět': 4,\n",
" 'jezerem': 4,\n",
" 'hora': 1,\n",
" 'horu': 2,\n",
" 'západní': 2,\n",
" 'stíhá': 4,\n",
" 'zdá': 2,\n",
" 'temném': 1,\n",
" 'posledně': 2,\n",
" 'dítko': 1,\n",
" 'Od': 2,\n",
" 'vyhnán': 1,\n",
" 'loupežnickém': 1,\n",
" 'roste': 2,\n",
" 'sboru': 1,\n",
" 'Později': 1,\n",
" 'vůdcem': 1,\n",
" 'spolku': 1,\n",
" 'zván': 1,\n",
" 'dovede': 1,\n",
" 'činy': 1,\n",
" 'neslýchané': 1,\n",
" 'všude': 2,\n",
" 'jest': 3,\n",
" 'jméno': 3,\n",
" 'znané': 1,\n",
" 'každémuť': 1,\n",
" 'Strašný': 2,\n",
" 'Až': 3,\n",
" 'poslez': 1,\n",
" 'láska': 3,\n",
" 'svadlé': 1,\n",
" 'nejvejš': 1,\n",
" 'roznítí': 1,\n",
" 'pomstu': 2,\n",
" 'poznav': 1,\n",
" 'padlé': 1,\n",
" 'zavraždí': 1,\n",
" 'neznaného': 1,\n",
" 'Protož': 1,\n",
" 'vězení': 2,\n",
" 'být': 1,\n",
" 'vyvstane': 1,\n",
" 'Sok': 1,\n",
" 'otec': 2,\n",
" 'Vrah': 1,\n",
" 'syn': 1,\n",
" 'mojí': 1,\n",
" 'Neznámý': 1,\n",
" 'čin': 1,\n",
" 'pronesl': 1,\n",
" 'dvojí': 2,\n",
" 'Proč': 4,\n",
" 'vyvržen': 1,\n",
" 'stal': 1,\n",
" 'Čí': 2,\n",
" 'pomstí': 1,\n",
" 'nesu': 1,\n",
" 'Ne': 1,\n",
" 'života': 2,\n",
" 'snad': 1,\n",
" 'vyváben': 1,\n",
" 'bych': 1,\n",
" 'ztrestal': 1,\n",
" 'A': 16,\n",
" 'jestliže': 1,\n",
" 'vůli': 1,\n",
" 'nejednal': 1,\n",
" 'proč': 1,\n",
" 'smrtí': 1,\n",
" 'zlou': 1,\n",
" 'časně': 1,\n",
" 'hynu': 1,\n",
" 'Časně': 1,\n",
" 'Hrůzou': 1,\n",
" 'obražený': 1,\n",
" 'od': 2,\n",
" 'stěn': 2,\n",
" 'hluboké': 3,\n",
" 'noci': 4,\n",
" 'němý': 1,\n",
" 'daleké': 3,\n",
" 'kobky': 1,\n",
" 'zajme': 2,\n",
" 'paměť': 1,\n",
" 'nový': 4,\n",
" 'Ach': 5,\n",
" 'Anjel': 1,\n",
" 'klesla': 1,\n",
" 'dřív': 1,\n",
" 'než': 7,\n",
" 'ji': 1,\n",
" 'znal': 1,\n",
" 'tvůj': 2,\n",
" 'Má': 1,\n",
" 'kletba': 1,\n",
" 'Léč': 1,\n",
" 'hluboký': 4,\n",
" 'umoří': 1,\n",
" 'slova': 4,\n",
" 'Kvapně': 1,\n",
" 'vstal': 2,\n",
" 'nocí': 4,\n",
" 'řinčí': 2,\n",
" 'řetězů': 3,\n",
" 'hřmot': 4,\n",
" 'okna': 1,\n",
" 'zalétá': 2,\n",
" 'ven': 3,\n",
" 'hluky': 1,\n",
" 'Ouplný': 1,\n",
" 'měsíc': 5,\n",
" 'přikryl': 2,\n",
" 'mrak': 7,\n",
" 'nade': 3,\n",
" 'horní': 1,\n",
" 'vychází': 2,\n",
" 'ztracené': 1,\n",
" 'světlo': 5,\n",
" 'Zrak': 1,\n",
" 'tyto': 1,\n",
" 'bolný': 1,\n",
" 'vodí': 1,\n",
" 'krásnáť': 1,\n",
" 'krásný': 6,\n",
" 'střídá': 2,\n",
" 'mrtvý': 5,\n",
" 'hled': 1,\n",
" 'více': 2,\n",
" 'neuhlídá': 1,\n",
" 'jako': 14,\n",
" 'šedý': 2,\n",
" 'rozestírá': 2,\n",
" 'Sklesl': 1,\n",
" 'sklesl': 1,\n",
" 'pak': 5,\n",
" 'tichu': 1,\n",
" 'horám': 1,\n",
" 'mraku': 1,\n",
" 'ohromna': 1,\n",
" 'ptáka': 1,\n",
" 'peruť': 1,\n",
" 'dlouhá': 3,\n",
" 'šírou': 3,\n",
" 'dálkou': 3,\n",
" 'tma': 3,\n",
" 'pouhá': 2,\n",
" 'Slyš': 1,\n",
" 'sladký': 4,\n",
" 'pronikl': 1,\n",
" 'temnou': 1,\n",
" 'lesní': 1,\n",
" 'trouba': 1,\n",
" 'uvádí': 1,\n",
" 'hudbu': 2,\n",
" 'jemnou': 1,\n",
" 'Vše': 1,\n",
" 'uspal': 1,\n",
" 'tento': 2,\n",
" 'zvuk': 5,\n",
" 'dálka': 1,\n",
" 'dřímá': 3,\n",
" 'Vězeň': 2,\n",
" 'zapomněl': 1,\n",
" 'vlastních': 1,\n",
" 'muk': 2,\n",
" 'hudba': 1,\n",
" 'ucho': 5,\n",
" 'jímá': 1,\n",
" 'milý': 1,\n",
" 'život': 4,\n",
" 'vdechne': 1,\n",
" 'zítřejší': 2,\n",
" 'ach': 3,\n",
" 'mine': 1,\n",
" 'mé': 2,\n",
" 'nikdy': 6,\n",
" 'těch': 1,\n",
" 'zvuků': 1,\n",
" 'nedoslechne': 1,\n",
" 'Zpět': 1,\n",
" 'sklesne': 1,\n",
" 'řetěz': 1,\n",
" 'hluk': 9,\n",
" 'kobkou': 2,\n",
" 'hloubi': 2,\n",
" 'opět': 10,\n",
" 'svírá': 1,\n",
" 'trouby': 2,\n",
" 'jemný': 1,\n",
" 'Budoucí': 1,\n",
" 'Zítřejší': 1,\n",
" 'Co': 2,\n",
" 'něj': 5,\n",
" 'spaní': 2,\n",
" 'bez': 8,\n",
" 'snění': 1,\n",
" 'Snad': 1,\n",
" 'žiji': 1,\n",
" 'jiný': 2,\n",
" 'Či': 1,\n",
" 'čem': 1,\n",
" 'tady': 2,\n",
" 'toužil': 1,\n",
" 'neměla': 1,\n",
" 'šírá': 1,\n",
" 'zem': 2,\n",
" 'zjeví': 1,\n",
" 'Kdo': 2,\n",
" 'ví': 1,\n",
" 'neví.': 1,\n",
" 'mlčí': 1,\n",
" 'Tichá': 1,\n",
" 'přikrývá': 1,\n",
" 'Zhasla': 1,\n",
" 'šírý': 3,\n",
" 'dol': 4,\n",
" 'hrob': 6,\n",
" 'daleký': 3,\n",
" 'zívá': 2,\n",
" 'Umlkl': 2,\n",
" 'vítr': 2,\n",
" 'usnul': 1,\n",
" 'líbý': 1,\n",
" 'síni': 1,\n",
" 'mrtvé': 5,\n",
" 'temná': 2,\n",
" 'Temnější': 2,\n",
" 'nastává': 2,\n",
" 'Pryč': 2,\n",
" 'myšlenko': 2,\n",
" 'citu': 1,\n",
" 'myšlenku': 2,\n",
" 'překonává': 2,\n",
" 'mokrých': 1,\n",
" 'kapka': 1,\n",
" 'kapkou': 1,\n",
" 'jejich': 2,\n",
" 'pádu': 1,\n",
" 'dutý': 2,\n",
" 'dalekou': 1,\n",
" 'rozložen': 1,\n",
" 'měřil': 2,\n",
" 'hyne': 4,\n",
" 'delší': 1,\n",
" 'hrůzy': 1,\n",
" 'Kapky': 1,\n",
" 'svým': 4,\n",
" 'pádem': 3,\n",
" 'měří': 2,\n",
" 'Zde': 1,\n",
" 'ba': 3,\n",
" 'kmit': 3,\n",
" 'vloudí': 1,\n",
" 'pustý': 1,\n",
" 'přebývá': 1,\n",
" 'díl': 2,\n",
" 'není': 1,\n",
" 'chvíl': 1,\n",
" 'nemine': 1,\n",
" 'nevstane': 1,\n",
" 'času': 2,\n",
" 'neubývá': 1,\n",
" 'mne': 2,\n",
" 'věčnost': 1,\n",
" 'dívá': 1,\n",
" 'prázdno': 2,\n",
" 'mnou': 3,\n",
" 'pode': 4,\n",
" 'Bez': 2,\n",
" 'místo': 2,\n",
" 'smrtelný': 1,\n",
" 'mysle': 1,\n",
" 'toť': 1,\n",
" '‚nic': 1,\n",
" '‘': 1,\n",
" 'nazývá': 1,\n",
" 'skončí': 1,\n",
" 'pusté': 1,\n",
" 'uveden': 1,\n",
" 'omdlívá': 2,\n",
" 'lehounce': 1,\n",
" 'jezerní': 1,\n",
" 'věží': 1,\n",
" 'uspávati': 1,\n",
" 'zdají': 1,\n",
" 'hlubokých': 2,\n",
" 'mrákotách': 2,\n",
" 'Strážného': 1,\n",
" 'vzbudil': 1,\n",
" 'jejž': 2,\n",
" 'činí': 1,\n",
" 'padání': 1,\n",
" 'světlem': 1,\n",
" 'vstoupil': 1,\n",
" 'Lehký': 1,\n",
" 'chod': 2,\n",
" 'nevzbudil': 1,\n",
" 'strašných': 1,\n",
" 'zdání': 1,\n",
" 'dlouhou': 2,\n",
" 'síní': 1,\n",
" 'bledší': 5,\n",
" 'vzadu': 1,\n",
" 'zmizí': 2,\n",
" 'pustopustá': 1,\n",
" 'ostatní': 2,\n",
" 'zastíní': 1,\n",
" 'Leč': 3,\n",
" 'nepohnutý': 1,\n",
" 'halil': 1,\n",
" 'ač': 1,\n",
" 'strážce': 4,\n",
" 'rudá': 2,\n",
" 'ubledlou': 2,\n",
" 'polila': 1,\n",
" 'prchla': 1,\n",
" 'čírá': 1,\n",
" 'znovu': 3,\n",
" 'mdlobách': 1,\n",
" 'jeví': 1,\n",
" 'hlasu': 1,\n",
" 'šepot': 4,\n",
" 'mdlý': 1,\n",
" 'trapnýť': 1,\n",
" 'zlý': 1,\n",
" 'Duch': 1,\n",
" 'duch': 1,\n",
" 'duše': 1,\n",
" 'jednotlivá': 1,\n",
" 'ze': 2,\n",
" 'sevřených': 1,\n",
" 'ust': 1,\n",
" 'Než': 1,\n",
" 'dostihne': 1,\n",
" 'strašná': 1,\n",
" 'ničím': 1,\n",
" 'jakž': 1,\n",
" ...})"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count how many times each token occurs in the text\n",
"tokens = Counter()\n",
"for token in word_tokenize(text):\n",
"    if token:  # skip empty strings, just in case\n",
"        tokens[token] += 1\n",
"tokens"
]
},
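{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `Counter` accepts any iterable, the counting loop above can also be written as a single call. A stdlib-only sketch; the short `sample_tokens` list is a hypothetical stand-in for `word_tokenize(text)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Hypothetical token list standing in for word_tokenize(text); note the empty string at the end\n",
"sample_tokens = ['Byl', 'pozdní', 'večer', '–', 'první', 'máj', '–', 'večerní', 'máj', '–', '']\n",
"\n",
"# The explicit loop used above\n",
"loop_counts = Counter()\n",
"for token in sample_tokens:\n",
"    if token:\n",
"        loop_counts[token] += 1\n",
"\n",
"# The same counts in one call; the generator skips empty strings just like `if token:`\n",
"direct_counts = Counter(token for token in sample_tokens if token)\n",
"\n",
"print(loop_counts == direct_counts)  # True\n",
"print(direct_counts['–'])            # 3"
]
},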
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create DataFrame\n",
"A pandas DataFrame is a two-dimensional table with labelled rows and columns, which makes it easy to sort, filter, and summarize data. Let's experiment with it."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" token freq\n",
"0 1 1\n",
"1 Byl 3\n",
"2 pozdní 6\n",
"3 večer 5\n",
"4 – 256"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(list(tokens.items()), columns=['token', 'freq'])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DataFrame Info\n",
"**TASK 3**: How many different tokens are in the text? This number is the *vocabulary size*."
]
},
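{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep two numbers apart: the *vocabulary size* is the number of distinct tokens (the keys of the `Counter`, i.e. the rows of `df`), while the total number of tokens is the sum of all frequencies. With the DataFrame, these are `len(df)` and `df['freq'].sum()`. A small stdlib sketch on a few counts copied from the `tokens` output above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# A few counts copied from the `tokens` output above (not the full text)\n",
"counts = Counter({'máj': 15, 'čas': 23, ',': 405, '.': 203})\n",
"\n",
"vocabulary_size = len(counts)        # number of distinct tokens\n",
"total_tokens = sum(counts.values())  # number of running tokens\n",
"\n",
"print(vocabulary_size, total_tokens)  # 4 646"
]
},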
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1926 entries, 0 to 1925\n",
"Data columns (total 2 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 token 1926 non-null object\n",
" 1 freq 1926 non-null int64 \n",
"dtypes: int64(1), object(1)\n",
"memory usage: 30.2+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" token freq\n",
"240 ! 82\n",
"1068 ( 4\n",
"1069 ) 4\n",
"17 , 405\n",
"11 . 203"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values(by='token', ascending=True).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK 4**: What are the most frequent tokens in the text? First write down the 5 tokens you expect to be the most frequent. Then sort the data by frequency and display the 5 most frequent tokens."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
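"### Pandas Series\n",
"A pandas Series is a one-dimensional labelled array. Selecting a single row or a single column of a DataFrame returns a Series, while filtering rows with a condition (as in `df.loc[df.freq==10]` below) returns a smaller DataFrame.\n",
"\n",
"Let's look at a single row, a single column, and a single cell."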
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"token 1\n",
"freq 1\n",
"Name: 0, dtype: object"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[0]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 3\n",
"2 6\n",
"3 5\n",
"4 256\n",
" ... \n",
"1921 1\n",
"1922 1\n",
"1923 1\n",
"1924 1\n",
"1925 1\n",
"Name: freq, Length: 1926, dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['freq']"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[0, 'token']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tokens with a certain frequency"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" token freq\n",
"14 ku 10\n",
"64 slzy 10\n",
"89 tvář 10\n",
"116 vzdy 10\n",
"154 pod 10\n",
"161 jezera 10\n",
"294 si 10\n",
"298 hory 10\n",
"467 lesů 10\n",
"531 žádný 10\n",
"534 bílé 10\n",
"592 vězeň 10\n",
"608 sen 10\n",
"848 opět 10"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df.freq==10]"
]
},
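{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same filter can be applied directly to the `Counter`, without pandas. A stdlib-only sketch; `with_frequency` is a hypothetical helper, and the counts are a few values copied from the `tokens` output above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"def with_frequency(counts, freq):\n",
"    # Tokens that occur exactly `freq` times, in the order they were first counted\n",
"    return [token for token, count in counts.items() if count == freq]\n",
"\n",
"# A few counts copied from the `tokens` output above\n",
"counts = Counter({'ku': 10, 'slzy': 10, 'máj': 15, 'tvář': 10, 'Byl': 3})\n",
"\n",
"print(with_frequency(counts, 10))  # ['ku', 'slzy', 'tvář']"
]
},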
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Visualization"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAADtCAYAAABESjVvAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQvElEQVR4nO3df4xdZZ3H8fdnS5eisgVLl9ROdYh2V2GjlR3QFUNcjMgPYyG6LmRTiYFUE0yQmF3Rf5DETTRZZbdmhdTFtRoEiWBo/LG7BUmIJihTrUhbDV2sMk2ltWKREFhbvvvHnOqlTjt35s502sf3K7m55zzPc+753n8+98wz594nVYUkqS1/MtcFSJJmnuEuSQ0y3CWpQYa7JDXIcJekBh031wUAnHLKKTU8PDzXZUjSMWXjxo2/rKrFE/UdFeE+PDzM6OjoXJchSceUJD87VJ/TMpLUIMNdkhpkuEtSg46KOXdJGtRvf/tbxsbGeOaZZ+a6lBm3YMEChoaGmD9/ft/HGO6SmjA2NsaJJ57I8PAwSea6nBlTVezZs4exsTFOO+20vo9zWkZSE5555hkWLVrUVLADJGHRokVT/ovEcJfUjNaC/YDpvC/DXZIa1Pece5J5wCiwo6reluQ04HZgEbARWFVV/5fkeOALwF8De4C/r6rtM165JB3G8HVfn9HX2/7xiycds2bNGm666SbOPPNMbr311hk9/1RN5cr9GmBrz/4ngBur6hXAE8CVXfuVwBNd+43dOElq3mc+8xk2bNjwvGDft2/fnNTSV7gnGQIuBv6j2w9wHvCVbsg64JJue2W3T9f/5rQ6ESZJnfe97308+uijXHjhhSxcuJBVq1ZxzjnnsGrVKnbv3s073vEOzjrrLM466yy+853vALBnzx7OP/98zjjjDK666ipe9rKX8ctf/nJG6un3yv1fgX8Cnuv2FwG/rqoDH0ljwNJueynwGEDXv7cb/zxJVicZTTK6e/fu6VUvSUeJm2++mZe85CXcd999XHvttWzZsoV77rmH2267jWuuuYZrr72WBx98kDvvvJOrrroKgBtuuIE3vvGNbN68mUsvvZSf//znM1bPpHPuSd4G7KqqjUneNFMnrqq1wFqAkZERF3KV1JS3v/3tnHDCCQDcc889bNmy5Xd9Tz75JE899RT3338/d911FwAXX3wxJ5988oydv59/qJ4DvD3JRcAC4M+AfwNOSnJcd3U+BOzoxu8AlgFjSY4DFjL+j1VJ+qPxwhe+8Hfbzz33HA888AALFiw4YuefdFqmqj5cVUNVNQxcBnyrqv4BuA94ZzfsCuDubnt9t0/X/62q8spc0h+t888/n09/+tO/29+0aRMA5557Ll/60pcA+OY3v8kTTzwxY+cc5OcHPgTcnuRjwA+AW7r2W4AvJtkG/IrxDwRJOqL6uXXxSFmzZg1XX301r371q9m3bx/nnnsuN998M9dffz2XX345Z5xxBm94wxt46UtfOmPnzNFwUT0yMlIu1iFpEFu3buVVr3rVXJcxkAMLF51yyil/0DfR+0uysapGJnotv6EqSQ3yVyEl6Sixffv2GXstr9wlNeNomGaeDdN5X4a7pCYsWLCAPXv2NBfwB37Pfaq3UTotI6kJQ0NDjI2N0eI33g+sxDQVhrukJsyfP39KKxW1zmkZSWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQZOGe5IFSb6X5IdJNie5oWv/fJKfJtnUPVZ07UmyJsm2JA8lOXOW34Mk6SD9/HDYs8B5VfVUkvnAt5N8s+v7x6r6ykHjLwSWd4/XATd1z5KkI2TSK/ca91S3O797HO4Hk1cCX+iOewA4KcmSwUuVJPWrrzn3JPOSbAJ2ARuq6rtd1z93Uy83Jjm+a1sKPNZz+FjXdvBrrk4ymmS0xd9flqS51Fe4V9X+qloBDAFnJ/kr4MPAK4GzgBcDH5rKia
tqbVWNVNXI4sWLp1a1JOmwpnS3TFX9GrgPuKCqdnZTL88C/wmc3Q3bASzrOWyoa5MkHSH93C2zOMlJ3fYJwFuAHx+YR08S4BLg4e6Q9cC7u7tmXg/sraqds1C7JOkQ+rlbZgmwLsk8xj8M7qiqryX5VpLFQIBNwPu68d8ALgK2AU8D75nxqiVJhzVpuFfVQ8BrJ2g/7xDjC7h68NIkSdPlN1QlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ3qZ5m9BUm+l+SHSTYnuaFrPy3Jd5NsS/LlJH/atR/f7W/r+odn+T1Ikg7Sz5X7s8B5VfUaYAVwQbc26ieAG6vqFcATwJXd+CuBJ7r2G7txkqQjaNJwr3FPdbvzu0cB5wFf6drXMb5INsDKbp+u/83dItqSpCOkrzn3JPOSbAJ2ARuA/wV+XVX7uiFjwNJueynwGEDXvxdYNMFrrk4ymmR09+7dA70JSdLz9RXuVbW/qlYAQ8DZwCsHPXFVra2qkaoaWbx48aAvJ0nqMaW7Zarq18B9wN8AJyU5rusaAnZ02zuAZQBd/0Jgz0wUK0nqTz93yyxOclK3fQLwFmAr4yH/zm7YFcDd3fb6bp+u/1tVVTNYsyRpEsdNPoQlwLok8xj/MLijqr6WZAtwe5KPAT8AbunG3wJ8Mck24FfAZbNQtyTpMCYN96p6CHjtBO2PMj7/fnD7M8DfzUh1kqRp8RuqktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1KB+1lBdluS+JFuSbE5yTdf+0SQ7kmzqHhf1HPPhJNuS/CTJWyc7x4927B3sXUiSnqefNVT3AR+squ8nORHYmGRD13djVf1L7+AkpzO+buoZwEuAe5L8RVXtn8nCJUmHNumVe1XtrKrvd9u/AbYCSw9zyErg9qp6tqp+CmxjgrVWJUmzZ0pz7kmGGV8s+7td0/uTPJTkc0lO7tqWAo/1HDbGBB8GSVYnGU0yuv9pp2UkaSb1He5JXgTcCXygqp4EbgJeDqwAdgKfnMqJq2ptVY1U1ci8FyycyqGSpEn0Fe5J5jMe7LdW1V0AVfV4Ve2vqueAz/L7qZcdwLKew4e6NknSEdLP3TIBbgG2VtWnetqX9Ay7FHi4214PXJbk+CSnAcuB781cyZKkyfRzt8w5wCrgR0k2dW0fAS5PsgIoYDvwXoCq2pzkDmAL43faXO2dMpJ0ZKWq5roGjl+yvJ7d+chclyFJx5QkG6tqZKI+v6EqSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWpQP8vsLUtyX5ItSTYnuaZrf3GSDUke6Z5P7tqTZE2SbUkeSnLmbL8JSdLz9XPlvg/4YFWdDrweuDrJ6cB1wL1VtRy4t9sHuJDxdVOXA6uBm2a8aknSYU0a7lW1s6q+323/BtgKLAVWAuu6YeuAS7rtlcAXatwDwEkHLaYtSZplU5pzTzIMvBb4LnBqVe3sun4BnNptLwUe6zlsrGuTJB0hfYd7khcBdwIfqKone/tqfJXtKa20nWR1ktEko/uf3juVQyVJk+gr3JPMZzzYb62qu7rmxw9Mt3TPu7r2HcCynsOHurbnqaq1VTVSVSPzXrBwuvVLkibQz90yAW4BtlbVp3q61gNXdNtXAHf3tL+7u2vm9cDenukbSdIRcFwfY84BVgE/SrKpa/sI8HHgjiRXAj8D3tX1fQO4CNgGPA28ZyYLliRNbtJwr6pvAzlE95snGF/A1QPWJUkagN9QlaQGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGm
S4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAb1s4bq55LsSvJwT9tHk+xIsql7XNTT9+Ek25L8JMlbZ6twSdKh9XPl/nngggnab6yqFd3jGwBJTgcuA87ojvlMknkzVawkqT+ThntV3Q/8qs/XWwncXlXPVtVPGV8k++wB6pMkTcMgc+7vT/JQN21zcte2FHisZ8xY1/YHkqxOMppkdP/TewcoQ5J0sOmG+03Ay4EVwE7gk1N9gapaW1UjVTUy7wULp1mGJGki0wr3qnq8qvZX1XPAZ/n91MsOYFnP0KGuTZJ0BE0r3JMs6dm9FDhwJ8164LIkxyc5DVgOfG+wEiVJU3XcZAOS3Aa8CTglyRhwPfCmJCuAArYD7wWoqs1J7gC2APuAq6tq/6xULkk6pFTVXNfA8UuW17M7H5nrMiTpmJJkY1WNTNTnN1QlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQZOGe5LPJdmV5OGethcn2ZDkke755K49SdYk2ZbkoSRnzmbxkqSJ9XPl/nnggoPargPurarlwL3dPsCFjC+KvRxYDdw0M2VKkqZi0nCvqvuBXx3UvBJY122vAy7paf9CjXsAOCnJkhmqVZLUp+nOuZ9aVTu77V8Ap3bbS4HHesaNdW1/IMnqJKNJRvc/vXeaZUiSJjLwP1SrqoCaxnFrq2qkqkbmvWDhoGVIknpMN9wfPzDd0j3v6tp3AMt6xg11bZKkI2i64b4euKLbvgK4u6f93d1dM68H9vZM30iSjpDjJhuQ5DbgTcApScaA64GPA3ckuRL4GfCubvg3gIuAbcDTwHtmoWZJ0iQyPmU+t45fsrye3fnIXJchSceUJBuramSiPr+hKkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNeioCffh674+1yVIUjOOmnCXJM0cw12SGmS4S1KDDHdJatCkKzEdTpLtwG+A/cC+qhpJ8mLgy8AwsB14V1U9MViZkqSpmIkr97+tqhU9q4FcB9xbVcuBe7t9SdIRNBvTMiuBdd32OuCSWTiHJOkwBg33Av4nycYkq7u2U6tqZ7f9C+DUiQ5MsjrJaJLR/U/vHbAMSVKvgebcgTdW1Y4kfw5sSPLj3s6qqiQTrsBdVWuBtTC+QPaAdUiSegx05V5VO7rnXcBXgbOBx5MsAeiedw1apCRpaqYd7klemOTEA9vA+cDDwHrgim7YFcDdgxYpSZqaQaZlTgW+muTA63ypqv4ryYPAHUmuBH4GvGvwMiVJUzHtcK+qR4HXTNC+B3jzIEVJkgbjN1QlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNeioDPfh674+1yVI0jHtqAx3MOAlaRBHbbhLkqbvmAh3r+IlaWqOiXAHA16SpuKYCXd4fsAb9pJ0aMdUuB/MsJekic1auCe5IMlPkmxLct1snafXwWE/lX1JasmshHuSecC/AxcCpwOXJzl9Ns41U6b6YTDb+5I0iEEWyD6cs4Ft3TqrJLkdWAlsmaXzNedA0G//+MVzun801PDHtn801PDHtn801DDd/UNJVR12wHQkeSdwQVVd1e2vAl5XVe/vGbMaWN3t/iXwkxkvRJLa9rKqWjxRx2xduU+qqtYCa+fq/JLUstn6h+oOYFnP/lDXJkk6AmYr3B8Elic5LcmfApcB62fpXJKkg8zKtExV7UvyfuC/gXnA56pq82ycS5L0h2blH6qSpLl1TH9DVZI0McNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNej/AeJuXBbxn7u4AAAAAElFTkSuQmCC\n",
"text/plain": [
"