In a fragmented newsreel segment, composer Julius Kalaš, the newly elected Chairman of the Syndicate of Czechoslovak Film Artists and Technicians, gives a speech at the 1st International Congress of Film Professionals in Mariánské Lázně in 1947.
Parallel corpus, 3,297,283 words.
The idea was to create a small parallel corpus that would make it possible to work with entire texts in translation analysis rather than short extracts. At the same time, the project aimed to gain experience that could be used in creating a larger parallel corpus of English and Czech in the future.
Although the main part of the work has been completed -- and the aims of the KACENKA grant met -- we keep improving and enlarging KACENKA gradually. It currently contains 3,297,283 words, of which 1,689,513 were acquired by scanning.
Most of the English texts for KACENKA were retrieved from Internet resources. The rest -- and nearly all the Czech texts -- had to be scanned and processed with an OCR programme.
KACENKA is stored on a single CD-ROM; its use is limited by copyright restrictions.
Painter Kamil Lhoták on Bohumil Veselý's balcony. Kamil Lhoták with gilder Brůžek while choosing a frame in a fragmented segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1957, issue no. 2.
Archaeologist and karst scientist Professor Karel Absolon in his study on the occasion of his 80th birthday in a segment from Týden ve filmu (Week in Film) 1957, issue no. 26.
Conductor Karel Ančerl on Bohumil Veselý's balcony. Ančerl conducts the Czech Philharmonic in a segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1956, issue no. 4.
Writer Karel Čapek and actress Olga Scheinpflugová on Peace Square in Prague after their wedding ceremony on 26 August 1935 in a segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1935, issue no. 36. Karel Čapek with his brother, the painter Josef Čapek, in the garden of his villa in Prague-Vinohrady in the documentary Jaro v Praze (Spring in Prague, dir. Jaroslav Novotný, 1930).
Ethnographer Karel Plicka accepting the title of Merited Artist in a segment from Týden ve filmu (Week in Film) 1958, issue no. 22. The commendation is presented by Minister of Culture František Kahuda.
Sculptor Karel Pokorný giving a speech about Mikoláš Aleš in a fragmented segment from Československé filmové noviny (Czechoslovak Film News) 1946, issue no. 52. Pokorný working on a bust of J. V. Stalin in a segment from Československé filmové noviny (Czechoslovak Film News) 1952, issue no. 51.
Footage of painter Karel Svolinský in a fragmented segment from Týden ve filmu (Week in Film) 1956, issue no. 4. Svolinský working in the studio of the Academy of Arts, Architecture and Design in a segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1954, issue no. 18.
KER is a keyword extractor designed for scanned texts in Czech and English. It is based on the standard tf-idf algorithm, with idf tables trained on texts from Wikipedia. To deal with data sparsity, texts are preprocessed by MorphoDiTa, a morphological dictionary and tagger.
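The tf-idf scoring described above can be illustrated with a minimal sketch. It is not KER's actual code: the function names `train_idf` and `extract_keywords`, the `top_k` and `default_idf` parameters, and the tiny background collection are all hypothetical, and the input is assumed to be already tokenized and lemmatized (the step KER delegates to MorphoDiTa, which this sketch does not call).

```python
import math
from collections import Counter


def train_idf(documents):
    """Build an idf table from a background collection of tokenized documents.

    idf(term) = log(N / df(term)), where N is the number of documents and
    df(term) is the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def extract_keywords(tokens, idf, top_k=3, default_idf=0.0):
    """Rank the terms of one document by tf * idf and return the top_k terms.

    Terms missing from the idf table get default_idf (an assumption of this
    sketch; a real system might prefer a high value for unseen terms).
    """
    tf = Counter(tokens)
    total = sum(tf.values())
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]


# Hypothetical background collection standing in for the Wikipedia texts.
background = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "corpus", "is", "parallel"],
]
idf_table = train_idf(background)
keywords = extract_keywords(
    ["the", "corpus", "the", "corpus", "parallel"], idf_table, top_k=2
)
```

Because "the" occurs in every background document, its idf is zero and it never ranks as a keyword, however frequent it is in the input. Mapping inflected forms to lemmas before counting is what reduces sparsity: all forms of a word then share one tf count and one idf entry.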