BERT-base Arabic Hate Speech Detector

Date: July 2025

Overview

Multilingual BERT model achieving 94.4% accuracy and a 94.6% F1 Macro score in Arabic hate speech detection across five classes, deployed as a public API with explainable predictions for content moderation. The balanced per-class performance makes the system suitable for real-world content moderation platforms.

Problem / Opportunity

Arabic social media platforms struggle with content moderation due to limited Arabic NLP tools and the difficulty of detecting hate speech across dialectal variations. Manual moderation is slow and inconsistent, while existing automated tools lack sufficient accuracy on Arabic text. There is a critical need for high-accuracy, explainable AI systems that can handle Arabic's unique linguistic challenges and provide transparent decision-making for content moderation teams.

Approach

  1. Selected and fine-tuned multilingual BERT architecture optimized for Arabic hate speech classification across five distinct categories (Neutral, Offensive, Sexism, Religious Discrimination, Racism)
  2. Preprocessed Arabic text dataset handling dialectal variations, diacritics, and social media-specific patterns to improve model robustness (a normalization sketch follows this list)
  3. Implemented comprehensive evaluation framework measuring accuracy, F1 Macro, precision, and recall across all five classes to ensure balanced performance
  4. Developed explainability layer providing prediction confidence scores and feature attribution to make model decisions interpretable for content moderators
  5. Deployed production API using Hugging Face Spaces with FastAPI backend enabling real-time inference at scale with documented endpoints (see the serving sketch after this list)
  6. Created interactive web interface allowing users to test the model instantly and understand predictions through visual explanations (a minimal interface sketch also follows)
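
As an illustration of step 2, the snippet below sketches common Arabic social-media normalization (diacritic stripping, letter unification, elongation removal). The exact rules used in the project are not documented here, so treat every rule as an assumption rather than the project's actual pipeline.

import re

# Arabic diacritics (tashkeel) and the tatweel elongation character.
DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652]")
TATWEEL = re.compile("\u0640")

def normalize_arabic(text: str) -> str:
    """Illustrative Arabic normalization; rules are assumptions, not the project's."""
    text = DIACRITICS.sub("", text)              # strip diacritics
    text = TATWEEL.sub("", text)                 # remove tatweel elongation
    text = re.sub(r"[إأآ]", "ا", text)           # unify alef variants
    text = re.sub(r"ى", "ي", text)               # unify alef maqsura with ya
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # collapse letter repetition ("ههههه" -> "هه")
    text = re.sub(r"https?://\S+", "", text)     # drop URLs common in social posts
    return re.sub(r"\s+", " ", text).strip()     # normalize whitespace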
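
For step 5, a serving layer on top of the model could look like the FastAPI sketch below. The endpoint name, request schema, and response fields are illustrative assumptions; the deployed API's actual contract may differ.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI(title="Arabic Hate Speech Detector")

# Load the published checkpoint once at startup.
classifier = pipeline(
    "text-classification",
    model="hossam87/bert-base-arabic-hate-speech",
    top_k=None,  # return confidence scores for all five classes
)

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")  # hypothetical endpoint name
def predict(req: PredictRequest):
    scores = classifier(req.text)
    if scores and isinstance(scores[0], list):  # older transformers versions nest the list
        scores = scores[0]
    best = max(scores, key=lambda s: s["score"])
    return {"label": best["label"], "confidence": best["score"], "all_scores": scores}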
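
Hugging Face Spaces demos are commonly built with Gradio, so step 6's interface might reduce to a few lines like the following; this is a guess at the structure, not the project's actual app code.

import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="hossam87/bert-base-arabic-hate-speech",
                      top_k=None)

def classify(text):
    # Map each class label to its confidence for gr.Label's bar display.
    scores = classifier(text)
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    return {s["label"]: s["score"] for s in scores}

demo = gr.Interface(fn=classify,
                    inputs=gr.Textbox(label="Arabic text"),
                    outputs=gr.Label(num_top_classes=5))
demo.launch()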

Technical Details: Fine-tuning utilized mixed-precision training (fp16), 4 epochs, batch size 32, learning rate 3e-5, and AdamW optimizer on dual NVIDIA T4 GPUs. Model checkpoint selected at epoch 3 (lowest validation loss).
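
A Trainer-based sketch consistent with those hyperparameters is shown below. The fp16 flag, epoch count, batch size, learning rate, optimizer, and best-checkpoint selection mirror the details above; the dataset variables and metric wiring are assumptions added for illustration.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=5)  # five hate-speech classes

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1_macro": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="bert-arabic-hate-speech",
    num_train_epochs=4,
    per_device_train_batch_size=32,   # batch size 32 (assumed per device on the dual T4s)
    learning_rate=3e-5,
    fp16=True,                        # mixed-precision training
    optim="adamw_torch",              # AdamW optimizer
    eval_strategy="epoch",            # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,      # keep the checkpoint with the lowest validation loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,           # hypothetical tokenized train/validation Datasets
    eval_dataset=val_ds,
    compute_metrics=compute_metrics,
)
trainer.train()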

Outcomes & Impact

Model Performance

94.4% accuracy and a 94.6% F1 Macro score across the five classes, with precision and recall tracked per class to confirm balanced performance.

Deployment & Reach

Live as a public Hugging Face Space with a FastAPI backend, serving real-time predictions through documented endpoints and an interactive web interface.

Business Impact

Gives Arabic content moderation teams an automated first pass with confidence scores and feature attribution, supporting faster and more transparent review decisions.

Sample Prediction

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the published checkpoint and its tokenizer from the Hugging Face Hub.
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap both in a text-classification pipeline for single-call inference.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "هذا نص عربي للاختبار"  # "This is an Arabic text for testing"
result = classifier(text)
print(result)  # e.g. [{'label': '<class name>', 'score': <confidence>}]
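
Step 4 describes confidence scores and feature attribution. Reusing the classifier above, the sketch below shows both: the full five-class score distribution via the pipeline's standard top_k=None option, and a simple occlusion-based attribution that is an assumed stand-in for whatever attribution method the project actually uses.

def flat(result):
    """Unnest pipeline output; older transformers versions wrap single-string results."""
    return result[0] if result and isinstance(result[0], list) else result

# Full confidence distribution over all five classes.
for s in flat(classifier(text, top_k=None)):
    print(f"{s['label']}: {s['score']:.3f}")

# Occlusion attribution (illustrative stand-in): remove one word at a time and
# measure how much the predicted class's confidence drops.
def occlusion_attribution(text):
    base = flat(classifier(text, top_k=None))
    top = max(base, key=lambda s: s["score"])
    words = text.split()
    contributions = []
    for i, word in enumerate(words):
        occluded = " ".join(words[:i] + words[i + 1:])
        out = flat(classifier(occluded, top_k=None))
        score = next(s["score"] for s in out if s["label"] == top["label"])
        contributions.append((word, top["score"] - score))  # larger drop = more influential word
    return top["label"], contributions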

Project Visualizations

[Figure: Hate Speech Detection]

Live Demo & Repository

Model checkpoint: https://huggingface.co/hossam87/bert-base-arabic-hate-speech

Technical Skills

Python · PyTorch · Transformers · Hugging Face · BERT · Text Classification · Machine Learning · Arabic NLP

Learnings/Takeaways

- Transfer learning and fine-tuning pre-trained models are highly effective for specialized NLP tasks in non-English languages.
- Interpretability and clear labeling of hate speech are crucial for practical moderation tools.
- Deploying as a web app (Hugging Face Space) bridges the gap between research and accessible, real-world applications.
