1. Title: Historical Ethiopic Manuscript Recognition: Sequence-to-Sequence Learning
Description: The present growth of document digitization and changes in the linguistic landscape of the world demand an immediate solution for enabling information access for everybody. This requires research in the areas of Document Image Analysis (DIA) and Natural Language Processing (NLP). Currently, many well-known scripts have Optical Character Recognition (OCR) systems with performance high enough for OCR applications to be deployed in industrial settings. However, OCR systems yield very good results only in narrow domains and for very specific use cases. OCR is therefore still considered a challenging task, and for other indigenous scripts, such as Ethiopic scripts, no well-developed OCR systems exist. Ethiopic scripts have a rich collection of manuscripts ranging from historical to modern, from vellum-written to paper-printed, and from simple to complex layouts. Given the increasing need to digitize such documents and the impact of digitization on the accessibility of indigenous knowledge, OCR development for Ethiopic scripts is nowadays receiving a moderate amount of attention from researchers in computing and linguistics. However, attempts made so far at Ethiopic script recognition have mainly focused on modern printed texts and followed a segmentation-based OCR approach that requires several preprocessing steps. Such an approach is very limited in addressing the recognition of historical Ethiopic manuscripts, which differ from modern documents in various ways, such as writing material, writing style and character shape, morphological structure, and background. Moreover, these attempts have not reported results on large datasets or covered all possible characters used in historical Ethiopic manuscripts. The objective of this research is, therefore, to explore and tailor contemporary deep learning techniques for the OCR of historical Ethiopic manuscripts.
In addition, we will develop a standard database for the OCR of historical Ethiopic manuscripts. Since this is experimental research, we will use Python and the Keras Application Programming Interface (API) with TensorFlow for implementation. At the end of this research, the following outputs are expected to be delivered: (i) a well-structured, standard dataset of historical Ethiopic manuscripts and (ii) a robust OCR model designed for the recognition of historical manuscripts written in Ethiopic scripts. We expect this research to be completed within 20 months.
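The title frames recognition as sequence-to-sequence learning rather than character segmentation. One common segmentation-free formulation, assumed here for illustration and not necessarily the model this project will adopt, trains a network with Connectionist Temporal Classification (CTC) so that it outputs a character distribution for every horizontal slice (frame) of a text-line image; characters never have to be segmented explicitly. The plain-Python sketch below shows only the final greedy decoding step. The function name, the toy three-character alphabet and the frame probabilities are hypothetical, and the sketch reserves class 0 for the CTC blank.

```python
# Minimal sketch of greedy (best-path) CTC decoding, one common way a
# segmentation-free sequence model turns per-frame predictions into text.
# Alphabet, probabilities and the blank index are hypothetical choices.
from typing import List, Sequence

BLANK = 0  # class index reserved for the CTC "blank" symbol (assumption)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the most likely class per frame, merge repeats, drop blanks.

    alphabet[i - 1] is the character for class index i; index 0 is blank.
    """
    best_path: List[int] = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    previous = BLANK
    for idx in best_path:
        # A repeated class counts once unless a blank separates the two runs.
        if idx != previous and idx != BLANK:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)


alphabet = "ሰላም"  # hypothetical alphabet: classes 1..3
# One row per frame; columns are [blank, ሰ, ላ, ም].
frames = [
    [0.1, 0.8, 0.05, 0.05],   # ሰ
    [0.2, 0.7, 0.05, 0.05],   # ሰ again (merged with the previous frame)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # ላ
    [0.8, 0.1, 0.05, 0.05],   # blank
    [0.1, 0.05, 0.05, 0.8],   # ም
]
print(ctc_greedy_decode(frames, alphabet))  # ሰላም
```

A Keras/TensorFlow pipeline would more likely call TensorFlow's built-in `tf.nn.ctc_greedy_decoder`; if so, its blank-index convention should be checked against the one used during training.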
Researchers: Mrs. Bezawork Tilahun (PI), Mr. Tadele Menigiste, Mrs. Tsion Worku, Mr. Birhanu Hailu, Dr. Tesfa Tegegne and Dr. Lemma Kassaye
2. Title: Developing continuum-of-care enhancing tools and evaluating the quality of intrapartum care using an electronic partograph at public health facilities of Bahir Dar City Administration, Northwest Ethiopia
Description: Since the dawn of Information and Communication Technology (ICT), the world has changed greatly. Rapidly evolving ICT has made doing things simpler and more convenient in all facets of life. It is a vital facilitator for increasing productivity and improving the quality of products. Furthermore, the recent growth of the Internet and the World Wide Web has become a major driver of the way society accesses and views information. Extensive application of ICT significantly benefits the health sector in general and the continuum of care, i.e., maternal, newborn, and child health, in particular. Healthcare practice supported by electronic processes and communication is known as e-health. E-health holds great promise for the health sector, including efficiency, enhanced quality and empowerment. Annual maternal and fetal deaths remain very high worldwide. According to the World Health Organization (WHO), 295,000 mothers died in 2017, and 29 infant deaths per 1,000 live births occurred globally due to obstructed and prolonged labour. Most of these deaths occur in developing countries. In 2016, Ethiopia's maternal mortality ratio was estimated at 412 per 100,000 live births and its infant mortality rate at 48 per 1,000 live births. Many maternal and fetal deaths can be avoided through complete antenatal care (ANC) visits, timely identification and management of labour abnormalities, and postnatal care. ANC is advocated as the cornerstone for reducing child deaths and improving maternal health. The partograph, a single-page paper labour-monitoring tool recommended by WHO, is the tool most commonly used by health professionals to supervise active labour. It presents labour progression as well as fetal and maternal well-being graphically, and it has a proven record of helping to identify obstetric and fetal complications in time. However, partograph utilization is low in developing countries.
For the majority of labours, the partograph is either not initiated at all or not initiated at the right time, and the partographs that are initiated are often incomplete and error-prone, leading to delayed decisions and loss of clients' decisions. Reasons for poor partograph utilization include a lack of pre-printed partographs, workload pressure, insufficient knowledge and a poor attitude towards the partograph. Another reason for low usage is that manually filling out and interpreting the partograph is a tiresome and complex process. Although WHO has introduced different versions of the partograph (simple, moderate and detailed), the problem persists. Studies in Ethiopia have shown that the rate of non-use of the partograph by obstetric care providers attending women in labour is substantial, ranging from 43% to 52%. The aim of this research is to develop continuum-of-care enhancing tools and to evaluate the quality of intrapartum care using an electronic partograph (e-partograph) that reduces the complexity of the partograph. A digital partograph could boost partograph utilization, as automation may increase efficiency and usability and, by enforcing use, reduce errors and reluctance to use the tool. As part of this research, we plan to develop continuum-of-care enhancing tools, named “ADHERE-IN”, evaluate their usability and acceptability, explore the challenges of using the ADHERE-IN tools, and evaluate the effect of the electronic partograph on partograph utilization in Bahir Dar City. Moreover, we will evaluate the proportion of mothers who receive the expected care using the partograph and investigate factors that affect partograph utilization in the selected health facilities. The project will be completed in one and a half years (from 01/11/202-01/04/2022).
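Interpreting plotted points by hand is part of what makes the paper partograph tiresome. To illustrate the kind of check an e-partograph could automate, the sketch below classifies a cervical-dilatation reading against the alert and action lines. It assumes the WHO modified partograph convention (plotting from 4 cm, an alert line rising 1 cm per hour, and an action line 4 hours to its right); the `Exam` class and `classify` function are hypothetical and do not describe the ADHERE-IN tools.

```python
# Hypothetical sketch of an automatic partograph check, NOT the ADHERE-IN design.
# Assumes the WHO modified partograph convention: plotting starts at 4 cm
# cervical dilatation, the alert line rises 1 cm per hour from that point, and
# the action line runs parallel to it, 4 hours to the right.
from dataclasses import dataclass

ALERT_START_CM = 4.0       # dilatation at which active-phase plotting starts
FULL_DILATATION_CM = 10.0
RATE_CM_PER_HOUR = 1.0     # slope of the alert line
ACTION_OFFSET_HOURS = 4.0  # horizontal gap between alert and action lines


@dataclass(frozen=True)
class Exam:
    hours_since_active_start: float  # time of the vaginal examination
    dilatation_cm: float             # recorded cervical dilatation


def hours_behind_alert_line(exam: Exam) -> float:
    """How many hours the plotted point lies to the right of the alert line."""
    # Time at which the alert line reaches the recorded dilatation.
    alert_time = (exam.dilatation_cm - ALERT_START_CM) / RATE_CM_PER_HOUR
    return exam.hours_since_active_start - alert_time


def classify(exam: Exam) -> str:
    """Return 'normal', 'alert' or 'action' for one plotted dilatation reading."""
    if not ALERT_START_CM <= exam.dilatation_cm <= FULL_DILATATION_CM:
        raise ValueError("active-phase dilatation must be between 4 and 10 cm")
    lag = hours_behind_alert_line(exam)
    if lag <= 0:
        return "normal"  # on or to the left of the alert line
    if lag < ACTION_OFFSET_HOURS:
        return "alert"   # between the alert and action lines
    return "action"      # on or to the right of the action line


readings = [Exam(2, 7), Exam(4, 6), Exam(6, 5)]
print([classify(e) for e in readings])  # ['normal', 'alert', 'action']
```

Measuring the horizontal distance from the alert line reduces the rule to a single subtraction, so each new examination can be flagged as soon as it is entered.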
Researchers: Mr. Addisu Damene (PI), Dr. Muluken Azege, Dr. Eyaya Misgan, Mr. Dabere Nigatu, Dr. Enyew Abate and Dr. Esubalew Alemneh
3. Title: Amharic Text Corpus Development Based on Part-of-Speech Tags
Description: A corpus is one of the main resources for studying natural language and developing various tools for processing human languages. Amharic corpus development is still in its infancy. Few studies have been carried out on corpus development, and their outputs are not openly available to academics and other stakeholders. This research concentrates on Amharic text corpus development. The corpus will be prepared by collecting various Amharic documents. The collected data will be cleaned, and language experts will annotate it with part-of-speech tags. The final output of this research will be a standardized Amharic text corpus, which is vital for Amharic linguistic analysis.
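Expert annotation ultimately has to be stored in a machine-readable form. The sketch below assumes a simple, widely used convention of one sentence per line with each token written as `word/TAG`; the format, the placeholder tags (`N`, `V`, `PUNC`) and the helper names are hypothetical, since the project's tagset is not specified here. It parses annotated lines and counts tag frequencies, a basic consistency check on an annotated corpus, using the familiar primer sentence አበበ በሶ በላ ("Abebe ate beso").

```python
# Hypothetical sketch: one annotated sentence per line, each token written as
# word/TAG. The tag names used below (N, V, PUNC) are placeholders, not the
# project's tagset.
from collections import Counter
from typing import Iterable, List, Tuple


def parse_line(line: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last '/' so a word that itself contains '/' keeps it.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


def tag_counts(lines: Iterable[str]) -> Counter:
    """Count how often each tag occurs, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts.update(tag for _, tag in parse_line(line))
    return counts


corpus = ["አበበ/N በሶ/N በላ/V ።/PUNC"]  # "Abebe ate beso."
print(parse_line(corpus[0]))
print(tag_counts(corpus))  # Counter({'N': 2, 'V': 1, 'PUNC': 1})
```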
Researchers: Mr. Tsegaye Abebe (PI), Dr. Lemma Kassaye, and Dr. Esubalew Alemneh
- Worku Kelemework (until he left for PhD study)
- Hailu Beshada (until he left due to a transfer)
4. Title: Speech Corpus Development for Speaker-Independent Continuous Amharic Speech Recognition (Continued from Previous Research)
Description: Speech corpus development is very important for developing speech recognizers, speech analyzers and phonetic studies. It also supports the development of standardized applications such as question answering, dialogue systems and many other interactive systems. Developing countries like Ethiopia lack standardized corpora that every researcher can access and use, and as a result, research outcomes suffer in performance. Various studies have been conducted, but most of their data are not available to other researchers for future work. A standardized speech corpus is therefore very important both for the quality of research and for reducing the time researchers spend on data collection.
In the previous research period, we collected more than 200 hours of speech data from 250 speakers, each of whom read 420 sentences, for a total of more than 100,000 sentences, from five Amharic-speaking areas: Debre Markos (Gojjam), Gondar (Gondar), Kombolcha (Wollo), Debre Berihan (Shewa) and Hawassa (Debub). Despite this progress, the speech corpus still needs more data from other areas, such as Oromia, Benishangul, Afar and Somali. In addition, the speech was collected from speakers aged 20 to 40, so speakers under 20 are not represented. Hence, the aim of this research is to collect additional speech data for speaker-independent continuous Amharic speech recognition that can be made available to every researcher: 200 speakers aged 20 to 40 from Oromia, Benishangul, Afar and Somali, and 120 speakers aged 13 to 20 from Gojjam, Gondar, Addis Ababa and Oromia, recorded reading Amharic text. In total, 320 speakers will participate.
In addition, the collected speech data must be analyzed and transcribed. Transcription will be done manually because no automatic speech transcriber is available for Amharic. Manually transcribing one minute of speech takes 3 hours. The total collected speech amounts to 109,200 sentences (260 speakers × 420 sentences) and more than 200 hours of data. The primary goal of this research is to find a standard scheme that allows the corpus to be built more efficiently and to be used and shared more easily.
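The transcription burden follows directly from the figures above. The short calculation below multiplies them out: at the stated rate of 3 hours of manual work per minute of audio, the 200 hours of speech already collected correspond to roughly 36,000 person-hours of transcription. The constant and function names are illustrative, and the per-speaker figure for the new speakers is an assumption flagged in the code.

```python
# Back-of-the-envelope corpus sizes using the figures stated in the text.
SENTENCES_PER_SPEAKER = 420      # sentences read by each speaker
TRANSCRIPTION_HOURS_PER_MIN = 3  # stated manual effort per minute of audio
MINUTES_PER_HOUR = 60


def total_sentences(speakers: int, per_speaker: int = SENTENCES_PER_SPEAKER) -> int:
    """Number of read sentences recorded from a group of speakers."""
    return speakers * per_speaker


def transcription_hours(audio_hours: float,
                        rate: float = TRANSCRIPTION_HOURS_PER_MIN) -> float:
    """Person-hours of manual transcription needed for a given amount of audio."""
    return audio_hours * MINUTES_PER_HOUR * rate


print(total_sentences(260))      # 109200 sentences already collected
print(transcription_hours(200))  # 36000 person-hours for 200 hours of audio

# New phase: 200 speakers aged 20-40 plus 120 speakers aged 13-20.
new_speakers = 200 + 120
print(new_speakers)              # 320
# Assumption (not stated in the text): new speakers also read 420 sentences each.
print(total_sentences(new_speakers))  # 134400
```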
Researchers: Dr. Tesfa Tegegne (PI), Mr. Addisu Damena, Mr. Tsegaye Abebe, and Mr. Belsty Yalew