A robust feature engineering approach for Arabic extremist content detection in social media
DOI:
https://doi.org/10.37868/dss.v7.id316
Abstract
In this paper, we propose an end-to-end machine learning system for analyzing sentiment polarity in extremist (terrorism-related) Arabic content, using newly designed features that concentrate on the linguistic discourse, properties, and changes of such content. We constructed three corpora (V1, V2, and V3) from Arabic tweets and pre-processed them with three linguistic techniques: a normal stemmer, root-pattern stemming, and light stemming. We employed several machine learning algorithms, including support vector machines (SVMs), Naive Bayes (NB), and k-nearest neighbors (KNN), with bag-of-words (BOW) and n-gram models to extract features. A large-scale comparative analysis on a real-dataset benchmark identified the linear SVM with a unigram model and Term Frequency-Inverse Document Frequency (TF-IDF) weighting as the preferred configuration. With this setup, our approach achieved better accuracy for extremist sentiment detection and higher recall on V1 (81.097%) and V2 (81.707%). These results were superior to those of the other SVM kernels and of the KNN algorithm, which was nevertheless very competitive. Our results also outperformed the established approach of Kanan & Fox for classifying extremist Arabic texts: on average, our BEA achieved an accuracy higher than the 78.00% they obtained using P-Stemmer and SVM. Precision-recall and ROC AUC values for the SVM settings further supported this performance, and the high scores reflected the model's ability to handle complex features of Arabic such as syllabic lengthening and diacritics. The present study demonstrates the potential of this approach to support extremism detection in Arabic textual data, and may offer a clearer perspective for those working in security, education, and policy making.
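To make the unigram TF-IDF representation named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `unigram_tfidf`, the whitespace tokenization, the placeholder tokens, and the specific weighting variant (relative term frequency times the natural log of N/df) are all assumptions, since the abstract does not state which TF-IDF formulation or tokenizer was used.

```python
import math
from collections import Counter


def unigram_tfidf(docs):
    """Return one {token: weight} dict per document.

    Assumed weighting (not specified in the abstract):
    tf = count / document length, idf = ln(N / df).
    """
    # Unigram model: each whitespace-separated token is one feature.
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Placeholder tokens standing in for stemmed tweet text (hypothetical data).
docs = ["a b a", "b c", "c c d"]
vectors = unigram_tfidf(docs)
```

In a setup like the one described, sparse vectors of this kind would be passed to a classifier such as a linear SVM. Tokens that occur in few documents receive larger weights than tokens spread across many documents.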
License
Copyright (c) 2026 Ahmed Salman Ibraheem

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
This journal permits and encourages authors to post items/PDFs submitted to the journal on personal websites or institutional repositories after publication, while providing bibliographic details that credit its publication in this journal.




