Cuneiform Text Dialect Identification Using Machine Learning Algorithms and Natural Language Processing (NLP)

Elaf A Saeed, Ammar D Jasim, Munther A Abdul Malik

doi:10.31987/ijict.7.2.265

Abstract

Identifying the languages written in cuneiform script is challenging because resources are scarce and tokenization is difficult. Seven languages and dialects written in cuneiform need to be distinguished: Sumerian and six dialects of Akkadian (Old Babylonian, Middle Babylonian Peripheral, Standard Babylonian, Neo-Babylonian, Late Babylonian, and Neo-Assyrian). This problem is addressed by the Cuneiform Language Identification task in VarDial 2019. This paper presents ten machine learning algorithms drawn from four types of machine learning: supervised, ensemble, instance-based, and Artificial Neural Network learning. They include the Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), and Decision Tree (DT) algorithms within supervised learning; the K-Nearest Neighbors (KNN) algorithm within instance-based learning; and the Random Forest (RF), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and Gradient Boosting (GB) algorithms within ensemble learning. In addition, n-grams, a natural language processing technique, are used to identify the cuneiform dialect. The best result belongs to an ensemble of Random Forest classifiers working on character-level features, with a macro-averaged F1 score of 96%, and the best outcome for the n-gram approach is 0.82% for the di-gram.
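To make the two ideas named in the abstract concrete, the following is a minimal Python sketch of character-level di-gram (bigram) identification and of a macro-averaged F1 score. It is an illustration only, not the authors' implementation: the function names, the add-one smoothing, the labels `dialect_A` and `dialect_B`, and the Latin placeholder strings standing in for cuneiform signs are all assumptions made here.

```python
from collections import Counter, defaultdict
import math


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string (n=2 gives di-grams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train_profiles(samples, n=2):
    """Count n-gram frequencies per label from (text, label) pairs."""
    profiles = defaultdict(Counter)
    for text, label in samples:
        profiles[label].update(char_ngrams(text, n))
    return profiles


def predict(text, profiles, n=2):
    """Return the label whose add-one-smoothed n-gram counts give the highest log-likelihood."""
    vocab = {gram for counts in profiles.values() for gram in counts}
    best_label, best_score = None, -math.inf
    for label, counts in profiles.items():
        total = sum(counts.values())
        # The extra +1 in the denominator reserves probability mass for unseen n-grams.
        score = sum(
            math.log((counts[gram] + 1) / (total + len(vocab) + 1))
            for gram in char_ngrams(text, n)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: Latin placeholders, not real cuneiform text.
train = [("abababab", "dialect_A"), ("cdcdcdcd", "dialect_B")]
profiles = train_profiles(train)
guess = predict("abab", profiles)
score = macro_f1(["dialect_A", "dialect_A", "dialect_B"],
                 ["dialect_A", "dialect_B", "dialect_B"])
```

Because the macro average weights each class equally regardless of how many samples it has, it is a natural fit for a task with seven classes; the same character n-gram counts could also serve as input features for a classifier such as Random Forest.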

Journal: Iraqi Journal of Information and Communication Technology
Publication Date: Sep 1, 2024
License type: cc-by

