IIT Roorkee researchers have developed an efficient method for Sanskrit text sentiment analysis.
Sanskrit is one of the world’s most ancient languages, yet natural language processing tasks such as machine translation and sentiment analysis have not been explored to their full potential for it because sufficient labeled data is unavailable.
The proposed technique has achieved 87.50% accuracy for machine translation and 92.83% accuracy for sentiment classification.
HOW THE IIT ROORKEE METHOD WORKS
The researchers proposed a method comprising models for machine translation, translation evaluation, and sentiment analysis.
The machine translations serve as a cross-lingual mapping between the source and target languages. The resulting English translations are mature and natural enough to be comparable to original English sentences.
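The translate-then-classify idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the researchers' models: the lookup table `TOY_TRANSLATIONS`, the word sets `POSITIVE` and `NEGATIVE`, and all function names are hypothetical stand-ins for a trained translation model and a trained sentiment classifier, and the translation evaluation step is omitted. The two romanized Sanskrit sentences are common textbook examples, not items from the research dataset.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a trained Sanskrit-to-English translation model.
# Keys are romanized Sanskrit; these are textbook examples, not dataset items.
TOY_TRANSLATIONS: Dict[str, str] = {
    "ramah vanam gacchati": "Rama goes to the forest",
    "satyameva jayate": "Truth alone triumphs",
}

# Tiny illustrative English sentiment lexicon (an assumption for this sketch).
POSITIVE = {"triumphs", "happy", "good"}
NEGATIVE = {"sad", "terrible", "bad"}


def translate(sanskrit: str) -> str:
    """Return the English translation from the toy lookup table."""
    return TOY_TRANSLATIONS[sanskrit]


def classify_english(text: str) -> str:
    """Label English text by counting positive and negative lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sanskrit_sentiment(
    sanskrit: str, translator: Callable[[str], str] = translate
) -> str:
    """Map Sanskrit to English first, then classify the English text."""
    return classify_english(translator(sanskrit))


if __name__ == "__main__":
    for sentence in TOY_TRANSLATIONS:
        print(sentence, "->", sanskrit_sentiment(sentence))
```

The point of the design is that the classifier only ever sees English, so tools and labeled data that exist for English can be reused for a language where labeled data is scarce.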
Professor Balasubramanian Raman, Department of Computer Science and Engineering, IIT Roorkee, said, "We have trained our model to predict sentiment scores in the range of positive, neutral, or negative. And the model uses statistics, natural language processing, and machine learning to determine the sentiment with over 90% accuracy."
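As a rough illustration of three-way sentiment labels and of how an accuracy figure is computed, the Python sketch below maps a numeric score to positive, neutral, or negative and measures the fraction of correct predictions. The score range of -1 to 1, the `neutral_band` threshold, and both function names are assumptions made for this example; the article does not describe how the researchers' model produces or thresholds its scores.

```python
from typing import Sequence


def label_from_score(score: float, neutral_band: float = 0.1) -> str:
    """Map a score assumed to lie in [-1, 1] to one of three labels.

    Scores within +/- neutral_band of zero count as neutral. The band
    width is a hypothetical choice, not a value from the research.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Made-up scores and labels, purely to exercise the two functions.
    scores = [0.8, -0.6, 0.05, 0.3]
    truth = ["positive", "negative", "neutral", "neutral"]
    predictions = [label_from_score(s) for s in scores]
    print(predictions, accuracy(predictions, truth))
```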
The dataset for this research was taken from the Valmiki Ramayana website, developed and maintained by researchers at IIT Kanpur.
The research team comprises Professor Balasubramanian Raman of the Department of Computer Science and Engineering, his PhD student Puneet Kumar, and MSc student Kshitij Pathania of the Department of Mathematics.
The work has been published as a research paper in Applied Intelligence, a reputed peer-reviewed journal.
In future work, the researchers plan to exploit the morphological properties of Sanskrit for better classification, using only root words with their respective prefixes and suffixes.
They also plan to evaluate whether the morphological richness of Sanskrit is retained when translating to English.
In addition, the researchers aim to obtain a model that discerns the context of words in multiple languages and provides lower-dimensional word embeddings.