BERT-base Arabic Hate Speech Detector
Date: July 2025
Overview
A multilingual BERT model fine-tuned for Arabic hate speech detection, reaching 94.4% accuracy and a 94.6% macro F1 score across five classes. It is deployed as a public API with explainable predictions, making it suitable for real-world content moderation platforms.
Problem / Opportunity
Arabic social media platforms struggle with content moderation due to limited Arabic NLP tools and the complexity of detecting hate speech across dialectal variations. Manual moderation is slow and inconsistent, while existing automated tools lack accuracy for Arabic text. There's a critical need for high-accuracy, explainable AI systems that can handle Arabic's unique linguistic challenges and provide transparent decision-making for content moderation teams.
Approach
- Selected and fine-tuned multilingual BERT architecture optimized for Arabic hate speech classification across five distinct categories (Neutral, Offensive, Sexism, Religious Discrimination, Racism)
- Preprocessed the Arabic text dataset, handling dialectal variations, diacritics, and social media-specific patterns to improve model robustness (a minimal preprocessing sketch follows this list)
- Implemented comprehensive evaluation framework measuring accuracy, F1 Macro, precision, and recall across all five classes to ensure balanced performance
- Developed an explainability layer providing prediction confidence scores and feature attribution so that model decisions are interpretable for content moderators (see the attribution sketch after this list)
- Deployed production API using Hugging Face Spaces with FastAPI backend enabling real-time inference at scale with documented endpoints
- Created interactive web interface allowing users to test the model instantly and understand predictions through visual explanations
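The preprocessing bullet can be made concrete with a minimal sketch. The normalization rules below (diacritic and tatweel removal, alef/yaa/taa-marbuta unification, URL and mention stripping, squeezing elongated letters) are common choices for Arabic social media text and are assumptions, not the project's exact pipeline.

import re

# Hypothetical preprocessing helper illustrating the steps described above;
# the production pipeline may differ.
DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652]")  # Arabic tashkeel marks
TATWEEL = "\u0640"                                        # elongation character

def normalize_arabic(text: str) -> str:
    text = re.sub(r"https?://\S+|@\w+|#", " ", text)      # drop URLs, mentions, '#'
    text = DIACRITICS.sub("", text)                        # remove diacritics
    text = text.replace(TATWEEL, "")                       # remove tatweel
    text = re.sub("[إأآا]", "ا", text)                     # unify alef variants
    text = re.sub("ى", "ي", text)                          # unify alef maqsura / yaa
    text = re.sub("ة", "ه", text)                          # unify taa marbuta / haa
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)             # squeeze repeated letters
    return re.sub(r"\s+", " ", text).strip()

print(normalize_arabic("هــــذا رااااائع @user https://t.co/x"))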
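The explainability layer's implementation is not detailed in this write-up; the following is a hypothetical occlusion-based sketch of feature attribution, in which each token is removed in turn and the drop in the predicted class's confidence is taken as that token's importance.

from transformers import pipeline

# Hypothetical occlusion-based attribution, not the project's actual explainability code
classifier = pipeline("text-classification", model="hossam87/bert-base-arabic-hate-speech")

def token_attributions(text: str):
    base = classifier(text)[0]                          # {'label': ..., 'score': ...}
    tokens = text.split()
    attributions = []
    for i, token in enumerate(tokens):
        masked = " ".join(tokens[:i] + tokens[i + 1:])  # text with this token removed
        out = classifier(masked)[0]
        # Confidence lost for the original prediction when the token is removed
        drop = base["score"] - (out["score"] if out["label"] == base["label"] else 0.0)
        attributions.append((token, round(drop, 3)))
    return base["label"], attributions

print(token_attributions("هذا نص عربي للاختبار"))  # tokens with the largest drop matter most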
Technical Details: Fine-tuning utilized mixed-precision training (fp16), 4 epochs, batch size 32, learning rate 3e-5, and AdamW optimizer on dual NVIDIA T4 GPUs. Model checkpoint selected at epoch 3 (lowest validation loss).
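A sketch of how that configuration maps onto the Hugging Face Trainer API, including a compute_metrics function for the balanced evaluation described in the approach. Argument names follow recent transformers versions; the project's actual training script may differ, and dataset loading is omitted.

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Five-class head on a multilingual BERT backbone, as described above
model = AutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased", num_labels=5)

def compute_metrics(eval_pred):
    # Balanced metrics across all five classes: accuracy, macro F1, precision, recall
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "f1_macro": f1, "precision_macro": precision, "recall_macro": recall}

args = TrainingArguments(
    output_dir="bert-base-arabic-hate-speech",
    num_train_epochs=4,
    per_device_train_batch_size=32,    # batch size 32 (assumed per device)
    learning_rate=3e-5,
    fp16=True,                         # mixed-precision training
    optim="adamw_torch",               # AdamW optimizer
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # keeps the lowest-validation-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Pass `args` and `compute_metrics` to transformers.Trainer together with the
# tokenized train/validation splits, then call trainer.train().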
Outcomes & Impact
Model Performance
- 94.4% accuracy across five hate speech categories, demonstrating strong classification capability
- 94.6% F1 Macro score, indicating balanced performance across all classes without bias toward majority categories
- Precision and recall balanced across all five classes, ensuring reliable content moderation
Deployment & Reach
- Public API deployed on Hugging Face Spaces, serving real-time inference with a sub-2-second average prediction time for a responsive user experience (a minimal serving sketch follows this list)
- Interactive demo enabling instant testing and explainability for researchers and content moderators
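A minimal sketch of such a FastAPI backend; the route name, request schema, and response fields here are illustrative assumptions rather than the deployed API's actual contract.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI(title="Arabic Hate Speech Detector")
# Load the fine-tuned model once at startup and reuse it across requests
classifier = pipeline("text-classification", model="hossam87/bert-base-arabic-hate-speech")

class ClassifyRequest(BaseModel):
    text: str

@app.post("/classify")  # illustrative endpoint name
def classify(req: ClassifyRequest):
    pred = classifier(req.text)[0]
    # Return the label plus confidence so moderators can see how certain the model is
    return {"label": pred["label"], "confidence": round(pred["score"], 4)}

# Run locally with: uvicorn app:app --host 0.0.0.0 --port 7860  (7860 is the port Hugging Face Spaces expects)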
Business Impact
- Explainable predictions supporting human-in-the-loop moderation workflows through confidence scoring (see the routing sketch after this list)
- Research contribution to Arabic NLP community via open-source model and documentation
- Scalable solution suitable for integration into content moderation pipelines
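Below is the routing sketch referenced above: a hypothetical threshold-based policy (the 0.85 cutoff is illustrative, not a project value) that applies high-confidence predictions automatically and queues the rest for human review.

from transformers import pipeline

classifier = pipeline("text-classification", model="hossam87/bert-base-arabic-hate-speech")
CONFIDENCE_THRESHOLD = 0.85  # illustrative cutoff; in practice tuned per platform

def route(text: str) -> dict:
    pred = classifier(text)[0]
    confident = pred["score"] >= CONFIDENCE_THRESHOLD
    return {
        "text": text,
        "label": pred["label"],
        "confidence": round(pred["score"], 4),
        # Confident predictions are applied automatically; the rest go to a moderator queue
        "action": "auto_moderate" if confident else "human_review",
    }

print(route("هذا نص عربي للاختبار"))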
Sample Prediction
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# Wrap both in a text-classification pipeline for one-call inference
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Neutral test sentence: "This is an Arabic text for testing"
text = "هذا نص عربي للاختبار"
result = classifier(text)
print(result)  # e.g. [{'label': '<one of the five classes>', 'score': <confidence>}]
Learnings/Takeaways
- Transfer learning and fine-tuning pre-trained models are highly effective for specialized NLP tasks in non-English languages.
- Interpretability and clear labeling of hate speech are crucial for practical moderation tools.
- Deploying as a web app (Hugging Face Space) bridges research with real-world, accessible applications.