Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
This text corpus contains a carefully optimized set of sentences that can be used when preparing a speech corpus for the development of a personalized text-to-speech system. It was designed primarily for the voice conservation procedure, which must be performed in the relatively short period before a person loses his/her own voice, typically because of a total laryngectomy.
Total laryngectomy is a radical treatment procedure that is often unavoidable to save the lives of patients diagnosed with severe laryngeal cancer. Although it is very effective as a primary treatment, it significantly handicaps patients through the permanent loss of their ability to use their voice and produce speech. Fortunately, modern methods of computer text-to-speech (TTS) synthesis offer a possibility for "digital conservation" of a patient's original voice for his/her future speech communication, a procedure called voice banking or voice conservation. Moreover, the banking procedure can be undertaken by any person facing voice degradation or loss in the more distant future, or who simply wishes to keep his/her voice-print.