BERT-base Arabic Hate Speech Detector
Date: July 2025
Overview
A multilingual BERT model fine-tuned for Arabic hate speech detection, reaching 94.4% accuracy and a 94.6% macro F1 score across five classes. It is deployed as a public API with explainable predictions, making it suitable for real-world content moderation platforms.
Problem / Opportunity
Arabic social media platforms struggle with content moderation due to limited Arabic NLP tools and the complexity of detecting hate speech across dialectal variations. Manual moderation is slow and inconsistent, while existing automated tools lack accuracy for Arabic text. There's a critical need for high-accuracy, explainable AI systems that can handle Arabic's unique linguistic challenges and provide transparent decision-making for content moderation teams.
Approach
- Selected and fine-tuned multilingual BERT architecture optimized for Arabic hate speech classification across five distinct categories (Neutral, Offensive, Sexism, Religious Discrimination, Racism)
- Preprocessed the Arabic text dataset, handling dialectal variations, diacritics, and social media-specific patterns to improve model robustness (a minimal preprocessing sketch follows this list)
- Implemented comprehensive evaluation framework measuring accuracy, F1 Macro, precision, and recall across all five classes to ensure balanced performance
- Developed an explainability layer providing prediction confidence scores and feature attribution so that model decisions are interpretable for content moderators (see the attribution sketch after this list)
- Deployed production API using Hugging Face Spaces with FastAPI backend enabling real-time inference at scale with documented endpoints
- Created interactive web interface allowing users to test the model instantly and understand predictions through visual explanations
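The preprocessing bullet can be made concrete with a minimal sketch. The normalization rules below (diacritic and tatweel removal, alef/yaa/taa-marbuta unification, URL and mention stripping, squeezing elongated letters) are common choices for Arabic social media text and are assumptions, not the project's exact pipeline.

import re

# Hypothetical preprocessing helper illustrating the steps described above;
# the production pipeline may differ.
DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652]")  # Arabic tashkeel marks
TATWEEL = "\u0640"                                        # elongation character

def normalize_arabic(text: str) -> str:
    text = re.sub(r"https?://\S+|@\w+|#", " ", text)      # drop URLs, mentions, '#'
    text = DIACRITICS.sub("", text)                        # remove diacritics
    text = text.replace(TATWEEL, "")                       # remove tatweel
    text = re.sub("[إأآا]", "ا", text)                     # unify alef variants
    text = re.sub("ى", "ي", text)                          # unify alef maqsura / yaa
    text = re.sub("ة", "ه", text)                          # unify taa marbuta / haa
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)             # squeeze repeated letters
    return re.sub(r"\s+", " ", text).strip()

print(normalize_arabic("هــــذا رااااائع @user https://t.co/x"))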
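The explainability layer's implementation is not detailed in this write-up; the following is a hypothetical occlusion-based sketch of feature attribution, in which each token is removed in turn and the drop in the predicted class's confidence is taken as that token's importance.

from transformers import pipeline

# Hypothetical occlusion-based attribution, not the project's actual explainability code
classifier = pipeline("text-classification", model="hossam87/bert-base-arabic-hate-speech")

def token_attributions(text: str):
    base = classifier(text)[0]                          # {'label': ..., 'score': ...}
    tokens = text.split()
    attributions = []
    for i, token in enumerate(tokens):
        masked = " ".join(tokens[:i] + tokens[i + 1:])  # text with this token removed
        out = classifier(masked)[0]
        # Confidence lost for the original prediction when the token is removed
        drop = base["score"] - (out["score"] if out["label"] == base["label"] else 0.0)
        attributions.append((token, round(drop, 3)))
    return base["label"], attributions

print(token_attributions("هذا نص عربي للاختبار"))  # tokens with the largest drop matter most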
Technical Details: Fine-tuning utilized mixed-precision training (fp16), 4 epochs, batch size 32, learning rate 3e-5, and AdamW optimizer on dual NVIDIA T4 GPUs. Model checkpoint selected at epoch 3 (lowest validation loss).
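A sketch of how that configuration maps onto the Hugging Face Trainer API, including a compute_metrics function for the balanced evaluation described in the approach. Argument names follow recent transformers versions; the project's actual training script may differ, and dataset loading is omitted.

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Five-class head on a multilingual BERT backbone, as described above
model = AutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased", num_labels=5)

def compute_metrics(eval_pred):
    # Balanced metrics across all five classes: accuracy, macro F1, precision, recall
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "f1_macro": f1, "precision_macro": precision, "recall_macro": recall}

args = TrainingArguments(
    output_dir="bert-base-arabic-hate-speech",
    num_train_epochs=4,
    per_device_train_batch_size=32,    # batch size 32 (assumed per device)
    learning_rate=3e-5,
    fp16=True,                         # mixed-precision training
    optim="adamw_torch",               # AdamW optimizer
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # keeps the lowest-validation-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Pass `args` and `compute_metrics` to transformers.Trainer together with the
# tokenized train/validation splits, then call trainer.train().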
Outcomes & Impact
Model Performance
- 94.4% accuracy across five hate speech categories, demonstrating strong classification capability
- 94.6% F1 Macro score, indicating balanced performance across all classes without bias toward majority categories
- Precision and recall balanced across all five classes, ensuring reliable content moderation
Deployment & Reach
- Public API deployed on Hugging Face Spaces, serving real-time inference with a sub-2-second average prediction time for a responsive user experience (a minimal serving sketch follows this list)
- Interactive demo enabling instant testing and explainability for researchers and content moderators
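A minimal sketch of such a FastAPI backend; the route name, request schema, and response fields here are illustrative assumptions rather than the deployed API's actual contract.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI(title="Arabic Hate Speech Detector")
# Load the fine-tuned model once at startup and reuse it across requests
classifier = pipeline("text-classification", model="hossam87/bert-base-arabic-hate-speech")

class ClassifyRequest(BaseModel):
    text: str

@app.post("/classify")  # illustrative endpoint name
def classify(req: ClassifyRequest):
    pred = classifier(req.text)[0]
    # Return the label plus confidence so moderators can see how certain the model is
    return {"label": pred["label"], "confidence": round(pred["score"], 4)}

# Run locally with: uvicorn app:app --host 0.0.0.0 --port 7860  (7860 is the port Hugging Face Spaces expects)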
Business Impact
- Explainable predictions supporting human-in-the-loop moderation workflows through confidence scoring (see the routing sketch after this list)
- Research contribution to Arabic NLP community via open-source model and documentation
- Scalable solution suitable for integration into content moderation pipelines
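Below is the routing sketch referenced above: a hypothetical threshold-based policy (the 0.85 cutoff is illustrative, not a project value) that applies high-confidence predictions automatically and queues the rest for human review.

from transformers import pipeline

classifier = pipeline("text-classification", model="hossam87/bert-base-arabic-hate-speech")
CONFIDENCE_THRESHOLD = 0.85  # illustrative cutoff; in practice tuned per platform

def route(text: str) -> dict:
    pred = classifier(text)[0]
    confident = pred["score"] >= CONFIDENCE_THRESHOLD
    return {
        "text": text,
        "label": pred["label"],
        "confidence": round(pred["score"], 4),
        # Confident predictions are applied automatically; the rest go to a moderator queue
        "action": "auto_moderate" if confident else "human_review",
    }

print(route("هذا نص عربي للاختبار"))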
Sample Prediction
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# Wrap both in a text-classification pipeline for one-call inference
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Neutral test sentence: "This is an Arabic text for testing"
text = "هذا نص عربي للاختبار"
result = classifier(text)
print(result)  # e.g. [{'label': '<one of the five classes>', 'score': <confidence>}]
Learnings/Takeaways
- Transfer learning and fine-tuning pre-trained models are highly effective for specialized NLP tasks in non-English languages.
- Interpretability and clear labeling of hate speech are crucial for practical moderation tools.
- Deploying as a web app (Hugging Face Space) bridges research with real-world, accessible applications.