Customer Churn Predictor Web Application

Objective

To develop and deploy an interactive web application that predicts customer churn probability in real time, enabling proactive retention strategies based on data-driven insights.

Approach

The project used a telecommunications customer dataset. Cleaning began with handling missing values in 'TotalCharges'. Categorical features were then label-encoded, and numerical features such as 'tenure' and 'MonthlyCharges' were scaled with StandardScaler. To address the class imbalance in the target variable (Churn), SMOTE (Synthetic Minority Over-sampling Technique) was applied to the training data. The predictive model is a soft-voting ensemble of two classifiers tuned with GridSearchCV: a Random Forest (200 estimators) and an XGBoost classifier (learning rate 0.2, max depth 10). The final decision threshold was set to 0.41, chosen from the Precision-Recall curve to balance identifying potential churners against minimizing false positives.
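The soft-voting and thresholding steps can be sketched in plain Python. This is a minimal illustration, not the project's code: the probability lists are hypothetical stand-ins for the `predict_proba` output of the two models, and the helper names `soft_vote` and `predict_churn` are invented for this example. Equal model weights are assumed, which is the default behaviour of scikit-learn's `VotingClassifier(voting="soft")`.

```python
def soft_vote(*model_probs):
    """Average per-customer churn probabilities across models (equal weights)."""
    return [sum(p) / len(p) for p in zip(*model_probs)]


def predict_churn(probs, threshold=0.41):
    """Label a customer as churn (1) when the probability reaches the threshold."""
    return [int(p >= threshold) for p in probs]


# Hypothetical churn probabilities for four customers (not project data).
rf_probs = [0.40, 0.55, 0.10, 0.46]   # stand-in for the Random Forest output
xgb_probs = [0.50, 0.65, 0.20, 0.30]  # stand-in for the XGBoost output

ensemble_probs = soft_vote(rf_probs, xgb_probs)

# Compare the conventional 0.5 cut-off with the tuned 0.41 threshold.
labels_default = predict_churn(ensemble_probs, threshold=0.5)
labels_tuned = predict_churn(ensemble_probs, threshold=0.41)

print(labels_default)
print(labels_tuned)
```

In this toy input the first customer has an averaged probability of about 0.45, so they are flagged only under the lower threshold. That is the intended effect of moving the cut-off below 0.5: more churners are caught, at the cost of more false positives.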

Results

The final ensemble model achieved a test accuracy of 75.7% and a ROC-AUC of 0.747. For the churn class, recall was 72% and precision was 53%, giving an F1-score of 0.61: the model catches most churners, but roughly half of the customers it flags do not actually churn. Key visualizations created during the analysis included correlation matrices, feature distribution plots, and a Precision-Recall curve, which was used to select a prediction threshold that meets business objectives.
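As a quick consistency check, the reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score_from` is made up for this sketch; the two inputs are the churn-class figures quoted above.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Churn-class precision and recall as reported in the results.
churn_f1 = f1_score_from(precision=0.53, recall=0.72)
print(round(churn_f1, 2))  # 0.61
```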

Code Repository

GitHub Repository | Live Demo

Technical Skills

Python, Pandas, NumPy, XGBoost, scikit-learn, Streamlit, Plotly, Voting Classifier (Ensemble Learning), SMOTE (Imbalanced Data Handling), GridSearchCV (Hyperparameter Tuning)

Learnings/Takeaways

This project provided hands-on experience with imbalanced datasets and showed how SMOTE can improve model performance on the minority class. A key technical lesson was the value of ensemble learning: a soft-voting classifier combining Random Forest and XGBoost produced a more robust and accurate model than either achieved alone. The project also offered practical MLOps experience through deploying the trained model as an interactive Streamlit web application, which makes the predictions accessible and actionable for non-technical users. Finally, it highlighted the importance of looking beyond accuracy: using the Precision-Recall curve to tune the decision threshold aligns the model's output with specific business goals for churn prevention.
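One way to turn a business goal into a threshold is to sweep candidate cut-offs and keep the one with the highest recall that still meets a precision floor. The sketch below assumes that selection rule purely for illustration; the write-up does not state which criterion was actually used. The labels, scores, and function names are hypothetical toy values, not project data.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted as churn."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds meeting the precision floor, or None if none qualifies."""
    best = None
    for t in sorted(set(scores)):  # each observed score is a candidate cut-off
        precision, recall = precision_recall_at(y_true, scores, t)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


# Toy validation set: 1 = churned, 0 = stayed (hypothetical values).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.25, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]

lenient = pick_threshold(y_true, scores, min_precision=0.6)
strict = pick_threshold(y_true, scores, min_precision=0.7)
print(lenient)
print(strict)
```

Raising the precision floor pushes the chosen threshold up and lowers recall, which is the trade-off the Precision-Recall curve makes visible. With real data, scikit-learn's `precision_recall_curve` provides the same sweep over thresholds.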