Data Analysis · Machine Learning · Risk Intelligence

CreditGuard
Retail Lending Risk Intelligence

End-to-end credit risk analysis on 1500+ retail loan records — identifying NPA drivers, segmenting borrowers by default risk, and surfacing actionable insights for credit teams.

Python · Pandas · NumPy · Scikit-Learn · SQL · Data Analysis

Business Problem

GM Bank is a mid-sized retail bank facing a surge in loan defaults and NPAs (Non-Performing Assets). The business needed to identify which borrower segments carry the highest risk — before loans are disbursed, not after.

Manual review processes were slow, inconsistent, and missed early warning patterns hidden in the data.

My Approach

I built a full analytical pipeline from raw data to business insights — covering data quality validation, exploratory analysis, feature engineering, risk segmentation, and visual reporting.

  • Data Quality Pass — validated 1500+ records for missing values, outliers (IQR method), and duplicates
  • Loan Default Assessment — overall portfolio health and default rate breakdown
  • Customer Risk Profiling — segmentation by credit score, employment type, and income band
  • Repayment Behavior Analysis — EMI delay distributions as leading default indicators
  • Regional Risk Evaluation — NPA mapping by geographic region
  • Feature Engineering — created risk-scoring features for downstream ML models
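As an illustration of the data quality pass, here is a minimal sketch of how missing values, duplicates, and IQR outliers could be checked with Pandas. The function name `data_quality_report`, the column names, and the toy rows are assumptions made for this example, not the project's actual code or data; the 1.5 multiplier is the conventional IQR fence, not a value confirmed by the project.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame, numeric_col: str, k: float = 1.5) -> dict:
    """Summarise missing values, duplicate rows, and IQR outliers for one column."""
    q1 = df[numeric_col].quantile(0.25)
    q3 = df[numeric_col].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Rows outside [lower, upper] are flagged. Comparisons with NaN are False,
    # so missing values are never counted as outliers.
    outlier_mask = (df[numeric_col] < lower) | (df[numeric_col] > upper)
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "outlier_rows": df.index[outlier_mask].tolist(),
        "bounds": (float(lower), float(upper)),
    }


# Toy records, invented purely for illustration (hypothetical column names).
loans = pd.DataFrame({
    "loan_id": [1, 2, 3, 4, 5, 6, 7, 7, 8],
    "income": [40_000, 42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 55_000, 500_000],
    "credit_score": [610, 720, None, 580, 690, 640, 700, 700, 560],
})

report = data_quality_report(loans, "income")
print(report)
```

Flagging outliers rather than dropping them keeps the decision with the analyst: an unusually high income may be a data-entry error or a genuine high-net-worth borrower.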

Key Findings

  • Personal loans and credit cards show the highest default risk across all loan types
  • Customers with low credit scores (<580) account for the majority of NPAs
  • EMI delays beyond 30 days are a strong early indicator of eventual default
  • Certain geographic regions have 3x higher NPA concentration than the national average
  • Self-employed borrowers show significantly higher delinquency rates than salaried employees
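Findings like the credit score result above come from comparing default rates across segments. The sketch below shows one way this could be done with `pd.cut` and a group-by. Only the <580 cut-off comes from the findings; the other band edges, the column names `credit_score` and `defaulted`, and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Only the 580 boundary comes from the findings; the other edges are assumed.
SCORE_BINS = [300, 580, 670, 740, 851]
SCORE_LABELS = ["<580", "580-669", "670-739", "740+"]


def default_summary_by_band(df: pd.DataFrame) -> pd.DataFrame:
    """Default rate and share of all defaults per credit score band."""
    # right=False makes each band include its lower edge, e.g. [580, 670).
    bands = pd.cut(df["credit_score"], bins=SCORE_BINS, labels=SCORE_LABELS, right=False)
    summary = df.groupby(bands, observed=False)["defaulted"].agg(loans="size", defaults="sum")
    summary["default_rate"] = summary["defaults"] / summary["loans"]
    summary["share_of_defaults"] = summary["defaults"] / summary["defaults"].sum()
    return summary


# Toy borrowers, invented for illustration only.
borrowers = pd.DataFrame({
    "credit_score": [520, 560, 575, 600, 650, 700, 720, 780],
    "defaulted":    [1,   1,   0,   1,   0,   0,   0,   0],
})

summary = default_summary_by_band(borrowers)
print(summary)
```

Reporting both columns matters: `default_rate` says how risky a band is, while `share_of_defaults` says how much of the NPA book it explains. The same pattern would apply to employment type, loan type, or region by swapping the grouping key.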

SQL Analysis

Alongside the Python analysis, I wrote SQL queries against the cleaned dataset for quick business intelligence: aggregations by loan type, region, employment category, and risk tier. The queries are designed to run on live banking databases, not just in notebooks.
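To give a flavour of these aggregations, here is a hedged sketch of a default-rate-by-loan-type query, run from Python against an in-memory SQLite database that stands in for a real banking database. The table name `loans`, its columns, and the rows are hypothetical; the project's actual schema and queries are not shown here.

```python
import sqlite3

# Hypothetical schema and toy rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id INTEGER, loan_type TEXT, defaulted INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        (1, "personal", 1), (2, "personal", 0), (3, "personal", 1), (4, "personal", 0),
        (5, "credit_card", 1), (6, "credit_card", 0), (7, "credit_card", 0),
        (8, "home", 0), (9, "home", 0), (10, "home", 0),
    ],
)

# Default rate per loan type, highest risk first.
QUERY = """
SELECT loan_type,
       COUNT(*)                                       AS loans,
       SUM(defaulted)                                 AS defaults,
       ROUND(100.0 * SUM(defaulted) / COUNT(*), 1)    AS default_rate_pct
FROM loans
GROUP BY loan_type
ORDER BY default_rate_pct DESC, loan_type
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

The query uses only standard aggregates, so it should port to most relational databases with little change; grouping by region, employment category, or risk tier would follow the same shape.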

Impact

The framework identifies high-risk borrower segments with 87% precision, enabling credit teams to apply enhanced due diligence before disbursement — and reducing NPA formation proactively.
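For readers less familiar with the metric: precision is the share of borrowers flagged as high-risk who actually defaulted, TP / (TP + FP). The sketch below computes it in plain Python on invented labels; the numbers are illustrative only and do not reproduce the 87% figure. For binary labels it should match scikit-learn's `precision_score`.

```python
def precision(flagged, actual):
    """Share of flagged borrowers (1) who actually defaulted (1): TP / (TP + FP)."""
    true_pos = sum(1 for f, a in zip(flagged, actual) if f == 1 and a == 1)
    false_pos = sum(1 for f, a in zip(flagged, actual) if f == 1 and a == 0)
    if true_pos + false_pos == 0:
        return 0.0  # nothing was flagged, so precision is reported as 0 here
    return true_pos / (true_pos + false_pos)


# Toy labels, invented for illustration: 1 = flagged high-risk / defaulted.
flagged = [1, 1, 1, 0, 0, 1, 0, 1]
actual  = [1, 1, 0, 0, 1, 1, 0, 1]

score = precision(flagged, actual)
print(f"precision = {score:.2f}")
```

High precision means few low-risk applicants are sent for enhanced due diligence unnecessarily; it says nothing about defaulters the framework misses, which recall would measure.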

Analysis Visuals

Credit Score vs Default Rate — risk concentration in low-score bands
EMI Delay Distribution — leading indicator of default risk
Default Rate by Loan Type — personal loans & credit cards highest risk
Regional NPA Mapping — geographic risk concentration