The African continent is made up of people with rich, diverse cultures and spoken languages. Despite this diversity, one common point of unification, especially among West African communities, is spoken pidgin English. With the development of web technology and the dominance of English in web content, this growing population is disadvantaged in understanding content on the web. To address this, researchers in machine translation from Pidgin English to English have so far leveraged only unsupervised and supervised Neural Machine Translation (NMT) based models. In this paper, we propose a hybrid-strategic model that improves the accuracy of the baseline NMT model in translating pidgin English to English.

From the JW300 public dataset, we used 22,047 sentence pairs for training our model, 1,000 for tuning, and 2,520 for testing. The Bilingual Evaluation Understudy (BLEU) score was used as the evaluation metric. Our findings show that our hybrid model outperforms the baseline NMT model, with a BLEU score of 1.05 on two-level translation. This indicates that accuracy depends on the level and type of hybrid used. In-depth studies of pre-translation strategies for developing translation models remain largely unexplored for pidgin-English translation.

Statistical machine translation (SMT) depends heavily on the parallel training data available for a language pair. This thesis addresses the problems of data sparsity and translation quality in SMT systems for English-Hindi and other Indo-Aryan language pairs. We also show how machine translation (MT) can improve speech recognition output in a computer-assisted translation (CAT) system.

We describe an MT system for English to Hindi in which a pre-processing module transforms the English data into Hindi word order to obtain better phrase alignments. Morphologically rich languages require large amounts of parallel data to estimate the parameters of a statistical MT system adequately. Since collecting large parallel corpora is expensive, we explore alternative strategies: first, we explore distributed representations in a Recurrent Neural Network based Language Model (RNNLM) framework with different linguistic features; second, we explore lexical resources such as WordNet to overcome sparsity. These techniques rely on large monolingual data in the target language and also require a WordNet for the source language.

However, WordNet is not available for Indo-Aryan languages other than Hindi, and there are few sources from which clean monolingual data can be extracted. We therefore improve MT systems for the other Indo-Aryan language pairs, which are trained on small parallel corpora, using two recently developed SMT techniques, "Triangulation" and "System Combination", to reduce data sparsity and improve translation quality. Triangulation is the process of using an intermediate language as a pivot to translate a source language into a target language; it has been found to be very useful for improving translations when multilingual parallel corpora are available. We use phrase-table triangulation instead of sentence-based triangulation because it gives better translations. Because triangulation exploits additional multi-parallel data, it yields separately estimated phrase tables that can be further smoothed using smoothing methods. We then explore various system combination techniques through which these triangulated systems can be combined to improve the translations.

In the concluding chapter of the thesis, we show how MT can improve Automatic Speech Recognition (ASR) output in a CAT system, where a human translator translates a source-language string into a target-language string using different input methods such as speech and typing. We improve speech recognition for a translator speaking in the target language by taking advantage of the source-language string and synset information from WordNet: we use an MT system to translate the source-language string into the target language, and then use the semantic information WordNet provides for the words in the translated string to bias the speech recognizer toward that knowledge.
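The phrase-table triangulation mentioned in the thesis abstract is commonly formulated by marginalizing over pivot phrases: phi(t | s) = sum over p of phi(t | p) * phi(p | s), assuming the target phrase t depends on the source phrase s only through the pivot phrase p. The following is a minimal sketch under simplifying assumptions, not the thesis implementation: phrase tables are plain dictionaries of conditional probabilities in one direction only, lexical weights and pruning are ignored, and linear interpolation is used as one possible way to smooth several separately estimated tables (the abstract does not name its smoothing method). The names `triangulate`, `interpolate`, and the toy phrases are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical representation: phrase -> {translation phrase: conditional probability}.
PhraseTable = Dict[str, Dict[str, float]]


def triangulate(src_piv: PhraseTable, piv_tgt: PhraseTable) -> PhraseTable:
    """Pivot triangulation: phi(t|s) = sum_p phi(t|p) * phi(p|s).

    Assumes t is conditionally independent of s given the pivot phrase p.
    """
    result: PhraseTable = {}
    for s, pivots in src_piv.items():
        scores: Dict[str, float] = defaultdict(float)
        for p, p_given_s in pivots.items():
            # Pivot phrases missing from the pivot->target table contribute nothing.
            for t, t_given_p in piv_tgt.get(p, {}).items():
                scores[t] += t_given_p * p_given_s
        if scores:
            result[s] = dict(scores)
    return result


def interpolate(tables: List[PhraseTable], weights: List[float]) -> PhraseTable:
    """Linear interpolation of phrase tables (one possible smoothing choice).

    Weights are assumed to sum to 1; a pair absent from a table scores 0 there.
    """
    merged: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for table, weight in zip(tables, weights):
        for s, targets in table.items():
            for t, prob in targets.items():
                merged[s][t] += weight * prob
    return {s: dict(targets) for s, targets in merged.items()}


if __name__ == "__main__":
    # Toy tables with placeholder phrases (not real data).
    src_piv = {"s1": {"p1": 0.6, "p2": 0.4}}
    piv_tgt = {"p1": {"t1": 0.5, "t2": 0.5}, "p2": {"t1": 1.0}}
    tri = triangulate(src_piv, piv_tgt)
    direct = {"s1": {"t2": 1.0}}  # e.g. a table from another pivot or a direct corpus
    smoothed = interpolate([direct, tri], [0.5, 0.5])
    print({t: round(p, 3) for t, p in tri["s1"].items()})
    print({t: round(p, 3) for t, p in smoothed["s1"].items()})
```

Working at the phrase-table level means every source-to-pivot phrase pair can be chained with every matching pivot-to-target pair, which is one reason it can cover more translations than pivoting whole sentences through an intermediate translation.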
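The thesis abstract says the MT output and WordNet synset information are used to bias the speech recognizer, but it does not say how the biasing is implemented. As a swapped-in illustration of the idea, the sketch below rescores an ASR n-best list, rewarding hypotheses whose words appear in the MT output or among their synonyms. The synonym map is a hypothetical stand-in for WordNet synset lookups, the example uses English words for readability, and `expected_vocabulary`, `rescore_nbest`, and the `bonus` weight are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for WordNet lookups: word -> synonyms drawn from its synsets.
SynonymMap = Dict[str, Set[str]]


def expected_vocabulary(mt_output: str, synonyms: SynonymMap) -> Set[str]:
    """Collect the words of the MT output plus their WordNet-style synonyms."""
    vocab: Set[str] = set()
    for word in mt_output.lower().split():
        vocab.add(word)
        vocab |= synonyms.get(word, set())
    return vocab


def rescore_nbest(
    nbest: List[Tuple[str, float]], vocab: Set[str], bonus: float = 1.0
) -> List[Tuple[str, float]]:
    """Add `bonus` to a hypothesis's log score for each of its words found in `vocab`,
    then return the hypotheses sorted best-first. `bonus` is an assumed tuning weight."""
    rescored = []
    for hypothesis, log_score in nbest:
        hits = sum(1 for w in hypothesis.lower().split() if w in vocab)
        rescored.append((hypothesis, log_score + bonus * hits))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy example; all words and scores are made up.
    synonyms = {"big": {"large"}, "house": {"home"}}
    vocab = expected_vocabulary("the big house", synonyms)
    nbest = [("the barge home", -4.0), ("the large home", -4.5)]
    for hypothesis, score in rescore_nbest(nbest, vocab):
        print(f"{score:.2f}  {hypothesis}")
```

Here the synonym "large" lets the second hypothesis overtake the acoustically preferred first one, even though the translator did not say the exact word the MT system produced. An alternative is to bias the recognizer's language model during decoding; rescoring is used here only because it keeps the sketch independent of any particular ASR toolkit.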