Case Study · Flagship Project

Interpretable Credit Risk Scoring

A machine learning pipeline that predicts loan default probability across 300K+ applicants, then makes every prediction explainable to risk managers, auditors, and product teams using SHAP and Tableau.

LightGBM · SHAP · Python · Tableau · Optuna · Home Credit Dataset · Fintech · Risk
300K+ Loan Applicants
0.25 Optimized Threshold
6 Dashboard Views
SHAP Explainability Layer
Live Dashboard
Interpretable Credit Risk Dashboard (Tableau Public)
Fully interactive: use the tabs, filters, and tooltips directly on the embedded view, or open it in Tableau Public for the best experience.
The Problem

Many loan applicants, particularly those with limited or no formal credit history, are denied financing not because they're high risk, but because traditional scoring models can't assess them. The Home Credit dataset represents exactly this population: 300K+ applicants where standard bureau data is thin or absent.

The challenge isn't just prediction accuracy. In regulated financial environments, a model that says "denied" without explanation is unusable. Risk managers need to know why. Auditors need documentation. Product teams need actionable thresholds. A black-box ML model, no matter how accurate, fails all three.

Approach
01
Data Cleaning & Feature Engineering
Handled missing values, encoded categorical variables, and addressed class imbalance in the raw Home Credit dataset. Structured outputs into analysis-ready formats.
Pandas · NumPy · Class Imbalance · Encoding
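A minimal sketch of this preprocessing step, using pandas on a synthetic stand-in for the application table (the column names and values here are illustrative, not the dataset's actual schema):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw application table (columns are invented).
df = pd.DataFrame({
    "AMT_INCOME": [120000, np.nan, 95000, 200000],
    "CONTRACT_TYPE": ["Cash", "Revolving", "Cash", None],
    "TARGET": [0, 1, 0, 0],  # 1 = default
})

# 1) Impute numeric gaps with the median; flag categorical gaps explicitly.
df["AMT_INCOME"] = df["AMT_INCOME"].fillna(df["AMT_INCOME"].median())
df["CONTRACT_TYPE"] = df["CONTRACT_TYPE"].fillna("Missing")

# 2) One-hot encode categoricals into model-ready columns.
df = pd.get_dummies(df, columns=["CONTRACT_TYPE"])

# 3) Quantify class imbalance; LightGBM can consume this ratio
#    directly via its scale_pos_weight parameter.
neg, pos = (df["TARGET"] == 0).sum(), (df["TARGET"] == 1).sum()
scale_pos_weight = neg / pos
```

The median/flag-and-encode choices are one reasonable recipe, not necessarily the exact transformations used in the project.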
02
LightGBM Modeling with Optuna Tuning
Trained a LightGBM classifier with automated hyperparameter optimization via Optuna. Tuned the decision threshold to 0.25 to maximize recall, prioritizing the detection of actual defaulters over precision.
LightGBM · Optuna · Threshold Tuning · AUC-ROC
03
SHAP Explainability Layer
Applied SHAP to generate both global feature importance and individual prediction explanations. Computed SHAP values by risk band so stakeholders can see how risk drivers differ across segments, not just the global picture.
SHAP · Global Importance · Risk Segmentation · Local Explanations
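The by-band aggregation can be illustrated with pandas alone, assuming per-applicant SHAP values have already been computed (in the real pipeline they would come from something like `shap.TreeExplainer(model)`; the values and cut points below are invented):

```python
import pandas as pd

# Stand-in SHAP values for three features across six applicants.
shap_vals = pd.DataFrame(
    [[0.4, -0.1, 0.05], [0.3, -0.2, 0.0], [0.1, 0.3, -0.1],
     [0.0, 0.4, -0.2], [-0.3, 0.1, 0.5], [-0.2, 0.0, 0.6]],
    columns=["EXT_SOURCE_2", "EXT_SOURCE_3", "ANNUITY_RATIO"],
)
# Predicted default probabilities for the same six applicants.
probs = pd.Series([0.8, 0.7, 0.4, 0.35, 0.1, 0.05])

# Assign risk bands from predicted probability (cut points illustrative).
bands = pd.cut(probs, bins=[0, 0.25, 0.5, 1.0],
               labels=["Low", "Medium", "High"])

# Global importance: mean absolute SHAP value over all applicants.
global_importance = shap_vals.abs().mean()

# Segmented importance: mean absolute SHAP value within each risk band,
# the view that reveals when a feature drives risk differently by segment.
by_band = shap_vals.abs().groupby(bands).mean()
```

In this toy data the top driver in the High band differs from the top driver in the Low band, which is exactly the kind of segment-level contrast a global importance chart would hide.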
04
Tableau-Ready Exports & Dashboard
Structured all model outputs into clean CSV exports purpose-built for Tableau. Built a 6-view interactive dashboard that lets risk teams explore predictions, SHAP values, and model performance without touching any code.
Tableau Public · CSV Exports · Dashboard Design
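A sketch of the export step; the file names and columns are assumptions. A long (tidy) SHAP table is generally easier to filter and pivot in Tableau than one wide column per feature:

```python
import pandas as pd

# Illustrative prediction-level output: one row per applicant.
scores = pd.DataFrame({
    "SK_ID_CURR": [1001, 1002],
    "default_probability": [0.31, 0.08],
    "risk_band": ["Medium", "Low"],
})

# SHAP values in long format: one row per (applicant, feature) pair.
shap_long = pd.DataFrame({
    "SK_ID_CURR": [1001, 1001, 1002, 1002],
    "feature": ["EXT_SOURCE_2", "EXT_SOURCE_3"] * 2,
    "shap_value": [0.12, -0.05, -0.20, -0.08],
})

# Plain CSVs are all Tableau Public needs; no code on the consumer side.
scores.to_csv("predictions_for_tableau.csv", index=False)
shap_long.to_csv("shap_long_for_tableau.csv", index=False)
```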
Key Findings
A threshold of 0.25 rather than the default 0.50 significantly improved recall, catching more actual defaulters at the cost of some precision. In credit risk, missing a true default is far more costly than a false positive.
External credit scores (EXT_SOURCE_2, EXT_SOURCE_3) and the credit-annuity ratio emerged as the strongest global predictors, consistent with domain intuition, which helps validate the model's behavior for risk managers.
SHAP values revealed that the same features drive risk differently across the Low, Medium, and High risk bands, a finding that wouldn't surface from global importance alone and one with direct implications for segmented lending policy.
The confusion matrix shows 843 true positives vs. 4,122 false negatives at threshold 0.25, a deliberate trade-off favoring default detection in a class-imbalanced dataset (91% non-default).
Exporting structured CSVs from the modeling pipeline into Tableau made the model's outputs accessible to compliance, product, and executive audiences, with no code required.
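The threshold trade-off in the first finding can be demonstrated on toy data: every applicant flagged at 0.50 is also flagged at 0.25, so lowering the cutoff can only keep or grow recall. The scores below are synthetic, not the model's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# ~9% default rate, matching the dataset's roughly 91% non-default skew.
y = (rng.random(n) < 0.09).astype(int)
# Toy scores merely correlated with the label (illustrative only).
p = np.clip(0.15 * y + rng.beta(2, 8, n), 0, 1)

def recall_at(threshold):
    pred = p >= threshold
    tp = np.sum(pred & (y == 1))    # defaulters correctly flagged
    fn = np.sum(~pred & (y == 1))   # defaulters missed
    return tp / (tp + fn)

r50 = recall_at(0.50)
r25 = recall_at(0.25)  # never lower than r50, by construction
```

Precision moves the other way, which is why the cost asymmetry between a missed default and a false positive has to justify the choice.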
Business Implication

This project demonstrates that interpretability isn't a trade-off against accuracy; it's a requirement for deployment. A model that risk teams don't trust won't be used, regardless of its AUC score.

The SHAP + Tableau layer transforms a machine learning output into a decision support tool that risk managers can act on, auditors can review, and executives can understand. In a regulated lending environment, that's the difference between a proof of concept and a production-ready system.