1. Introduction

This document provides a standardized framework for evaluating the effectiveness of Smile’s MAW product during backtesting and live POC phases, ensuring alignment across:

Blacklist identification logic
Risk segmentation performance
Business-specific approval strategies

2. Blacklist Definition & Data Handling

2.1 Blacklist Decision Rule

True Blacklist:
- L5-level signals in SMS and voice data mark users as confirmed high-risk, fraudulent, or delinquent

2.2 Seat Data Treatment

Seat data must be excluded from blacklist count calculations

Reason:

Seat data includes mixed operational activities:
- Marketing outreach
- Customer service calls
- Payment reminders
- Collection activities
These introduce label noise and distort blacklist precision

However:

Seat data should still be used as a behavioral signal in risk analysis (not for labeling)
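
The labeling rule in sections 2.1 and 2.2 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `Signal` record, its field names, and the channel values "sms", "voice", and "seat" are assumptions made for the example and do not reflect the actual MAW data schema.

```python
from dataclasses import dataclass

# Channels whose L5 signals count toward the blacklist label (section 2.1).
LABEL_CHANNELS = {"sms", "voice"}


@dataclass(frozen=True)
class Signal:
    user_id: str
    channel: str  # hypothetical values: "sms", "voice", "seat"
    level: int    # 1..5, standing for L1..L5


def blacklisted_users(signals):
    """Return the IDs of users with at least one L5 signal in SMS or voice."""
    return {
        s.user_id
        for s in signals
        if s.level == 5 and s.channel in LABEL_CHANNELS
    }


signals = [
    Signal("u1", "sms", 5),
    Signal("u2", "seat", 5),   # seat L5: kept as a behavioral signal, never a label
    Signal("u3", "voice", 4),  # below L5, so not a blacklist hit
]
print(blacklisted_users(signals))  # {'u1'}
```

Seat records stay in the input so they remain available for behavioral analysis; they are only filtered out at the point where the label is computed.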

3. Scenario-Based Risk Evaluation Framework

3.1 New Customer Acquisition (Data Coverage ≥ 80%)

3.1.1 Indicators for Increasing Approval (Pass Rate ↑)

Profiles considered lower risk:

Presence of:
- Multiple L1 / L3 signals in SMS or voice
- No L4 / L5 signals
Seat data signals:
- L5 present with a valid call duration (duration greater than 0)
- Requires validation via:
  - Connection frequency
  - Call duration consistency

Interpretation:

Likely normal financial behavior / engaged users

3.1.2 Indicators for Reducing Approval (Pass Rate ↓)

Profiles considered higher risk:

Abnormal recent activity (last ~7 days):
- Multiple outbound attempts
- No successful connections (0 duration)
Risk signals in telco data:
- Multiple L4 hits
- Even a small number of L5 hits
Aggressive borrowing behavior:
- High distinct CID count
- Example threshold:
  - more than 10 unique CIDs

Interpretation:

Indicates over-leveraging, potential fraud, or credit stress
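
The indicators in 3.1.1 and 3.1.2 could be combined into a single directional check along these lines. This is a sketch under stated assumptions: the profile fields are hypothetical names, "multiple" is read as "at least 2", and the CID cutoff uses the example threshold of 10 given above. The seat-data indicator is left out because the text requires separate validation of connection frequency and duration consistency.

```python
from dataclasses import dataclass

MULTIPLE = 2   # assumption: "multiple" means at least 2
CID_HIGH = 10  # example threshold from the text (more than 10 unique CIDs)


@dataclass
class NewCustomerProfile:
    l1_l3_hits: int            # L1 / L3 signals in SMS or voice
    l4_hits: int               # L4 signals in SMS or voice
    l5_hits: int               # L5 signals in SMS or voice
    outbound_attempts_7d: int  # outbound attempts in the last ~7 days
    connected_calls_7d: int    # calls with duration > 0 in the last ~7 days
    distinct_cids: int         # distinct CID count


def approval_direction(p):
    """Return 'decrease', 'increase', or 'neutral' for the pass-rate lever."""
    higher_risk = (
        (p.outbound_attempts_7d >= MULTIPLE and p.connected_calls_7d == 0)
        or p.l4_hits >= MULTIPLE
        or p.l5_hits >= 1          # even a small number of L5 hits
        or p.distinct_cids > CID_HIGH
    )
    if higher_risk:
        return "decrease"
    if p.l1_l3_hits >= MULTIPLE and p.l4_hits == 0 and p.l5_hits == 0:
        return "increase"
    return "neutral"


example = NewCustomerProfile(
    l1_l3_hits=3, l4_hits=0, l5_hits=0,
    outbound_attempts_7d=1, connected_calls_7d=1, distinct_cids=4,
)
print(approval_direction(example))  # increase
```

Higher-risk indicators are checked first so that a single L5 hit overrides any number of benign L1 / L3 signals.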

3.2 Existing Customers (Credit Limit Increase / Repeat Loans)

3.2.1 High-Quality Customer Signals

Seat data:
- L5 with normal call duration (healthy engagement)
Borrowing behavior:
- Controlled CID exposure:
  - Suggested range: 3–5 CID hits
Risk outcome indicators:
- No signs of default risk
- CID behavior not in extreme ranges

Interpretation:

Eligible for credit line increase / retention strategies

3.2.2 Fraud / Uncertain Risk Signals

Very low CID activity:
- 0–1 CID hits

Interpretation:

Insufficient behavioral data
Must be cross-validated with other data sources (e.g., device, KYC, bureau)
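
As an illustration of the CID bands in 3.2.1 and 3.2.2, the sketch below maps a CID hit count to a band. The band names are hypothetical. Counts that the text does not assign to any band (for example 2, or anything above 5) are returned as "unclassified" rather than guessed.

```python
def cid_band(cid_hits):
    """Map an existing customer's CID hit count to a band from section 3.2."""
    if cid_hits < 0:
        raise ValueError("cid_hits must be non-negative")
    if cid_hits <= 1:
        # 0-1 hits: insufficient behavioral data, cross-validate elsewhere
        return "insufficient_data"
    if 3 <= cid_hits <= 5:
        # Suggested range for controlled CID exposure
        return "controlled"
    # The text does not define a band for the remaining counts
    return "unclassified"


for hits in (0, 2, 4, 12):
    print(hits, cid_band(hits))
```

How the unclassified counts should be treated is a per-client calibration question and is deliberately left open here.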

4. Key Evaluation Parameters

4.1 Time Window Configuration

Must be aligned with product type:

New Customers:
- Short-term loans:
  - Last 6–12 months
- Installment products:
  - Full historical dataset preferred
Existing Customers:
- Based on:
  - Loan tenure
  - Repayment cycle
  - Credit review frequency
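
A lookback window of this kind might be applied as in the sketch below. The function name and the 30-day approximation of a month are assumptions made for the example; passing `None` keeps the full history, as preferred for installment products.

```python
from datetime import date, timedelta


def within_window(event_dates, as_of, months):
    """Keep events from the last `months` months before `as_of`.

    `months=None` keeps the full historical dataset.
    """
    if months is None:
        return list(event_dates)
    cutoff = as_of - timedelta(days=30 * months)  # assumption: 30-day months
    return [d for d in event_dates if cutoff <= d <= as_of]


events = [date(2023, 1, 1), date(2024, 1, 15), date(2024, 6, 1)]
as_of = date(2024, 6, 30)

print(len(within_window(events, as_of, 6)))     # 2
print(len(within_window(events, as_of, 12)))    # 2
print(len(within_window(events, as_of, None)))  # 3
```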

4.2 Dynamic Threshold Setting

Thresholds must not be static; they should be calibrated per client.

Examples of thresholds to calibrate:

Number of dial attempts
Connection rate
Distinct CID count
L4/L5 hit frequency

Adjustment drivers:

Target segment risk profile
Approval rate targets
Local market behavior (PH / ID / LATAM / Africa differences)
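
One way to keep thresholds calibrated per client is a shared default with per-client overrides, sketched below. All names are hypothetical, and every default value except the CID limit (the example threshold from section 3.1.2) is an illustrative placeholder, not a recommended setting.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Thresholds:
    max_dial_attempts: int = 5        # placeholder value, not from the text
    min_connection_rate: float = 0.2  # placeholder value, not from the text
    max_distinct_cids: int = 10       # example threshold from section 3.1.2
    max_l4_hits: int = 1              # placeholder reading of "multiple L4"
    max_l5_hits: int = 0              # any L5 hit exceeds the limit


DEFAULT = Thresholds()

# Hypothetical per-client overrides, e.g. driven by segment risk or market.
CLIENT_OVERRIDES = {
    "client_a": {"max_distinct_cids": 8},
}


def thresholds_for(client_id):
    """Return the default thresholds with any client-specific overrides applied."""
    return replace(DEFAULT, **CLIENT_OVERRIDES.get(client_id, {}))


print(thresholds_for("client_a").max_distinct_cids)  # 8
print(thresholds_for("client_b").max_distinct_cids)  # 10
```

Keeping the overrides as data rather than code means a recalibration during a POC only changes configuration, not rule logic.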

5. Backtesting Methodology

5.1 Sample Segmentation

Split test population into:

Approved vs Rejected (client decision baseline)
Good vs Bad (actual repayment outcome)
MAW risk tiers (L1–L5 distribution)

5.2 Core Evaluation Metrics

Hit Rate (Blacklist Detection):
- % of bad users correctly identified via L5
KS / AUC (if MAW signals are combined into a score)
Approval Rate Impact:
- Pass rate change when applying MAW rules
Bad Rate Improvement:
- Compare:
  - With MAW rules
  - Without MAW rules
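
The hit rate, pass rate, and bad rate comparison can be computed as in the following sketch. It assumes a backtest sample of loans with known repayment outcomes, represented as dictionaries with hypothetical keys `bad` (actual outcome) and `l5` (L5 hit in SMS or voice), and it uses "reject any L5" as the only MAW rule. KS / AUC are omitted because they require a combined score.

```python
def hit_rate(loans):
    """Share of bad users that carry an L5 flag."""
    bad = [x for x in loans if x["bad"]]
    return sum(x["l5"] for x in bad) / len(bad) if bad else 0.0


def bad_rate(loans):
    """Share of loans in the given population that turned bad."""
    return sum(x["bad"] for x in loans) / len(loans) if loans else 0.0


def maw_impact(loans):
    """Compare the sample with and without the 'reject any L5' rule."""
    if not loans:
        raise ValueError("loans must not be empty")
    kept = [x for x in loans if not x["l5"]]
    return {
        "hit_rate": hit_rate(loans),
        "pass_rate_with_maw": len(kept) / len(loans),
        "bad_rate_without_maw": bad_rate(loans),
        "bad_rate_with_maw": bad_rate(kept),
    }


loans = [
    {"bad": True, "l5": True},
    {"bad": True, "l5": False},
    {"bad": False, "l5": False},
    {"bad": False, "l5": False},
    {"bad": False, "l5": True},
]
print(maw_impact(loans))
```

In this toy sample the rule catches one of the two bad loans, keeps three of five loans, and lowers the bad rate from 2/5 to 1/3, at the cost of rejecting one good loan.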

5.3 Rule Simulation

Simulate scenarios:

Reject:
- Any L5 (SMS/voice)
- High CID count (> threshold)
- Multiple L4
Approve:
- Only L1–L3
- Stable seat engagement

Then compare:

Approval rate vs Bad rate tradeoff
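
A simple simulation of the reject rules above, swept over several CID thresholds, might look like this. The record keys and sample data are invented for illustration, and "multiple L4" is read as more than one L4 hit, which is an assumption.

```python
def simulate(users, cid_threshold, max_l4=1):
    """Apply the reject rules and return (approval_rate, bad_rate_of_approved)."""
    def rejected(u):
        return (
            u["l5"] > 0                  # any L5 (SMS/voice)
            or u["cids"] > cid_threshold  # high CID count
            or u["l4"] > max_l4           # multiple L4 (assumed: more than 1)
        )

    kept = [u for u in users if not rejected(u)]
    approval_rate = len(kept) / len(users)
    bad_rate = sum(u["bad"] for u in kept) / len(kept) if kept else 0.0
    return approval_rate, bad_rate


# Invented sample: l4 / l5 are hit counts, cids is the distinct CID count.
users = [
    {"l5": 0, "l4": 0, "cids": 3, "bad": False},
    {"l5": 0, "l4": 0, "cids": 7, "bad": False},
    {"l5": 0, "l4": 2, "cids": 4, "bad": True},
    {"l5": 1, "l4": 0, "cids": 2, "bad": True},
    {"l5": 0, "l4": 0, "cids": 12, "bad": True},
    {"l5": 0, "l4": 1, "cids": 9, "bad": False},
]

for threshold in (5, 10, 15):
    approval, bad = simulate(users, threshold)
    print(f"CID > {threshold}: approval={approval:.2f}, bad_rate={bad:.2f}")
```

Loosening the CID threshold from 10 to 15 raises approval in this sample but also admits a bad loan, which is the tradeoff the comparison is meant to expose.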

6. POC / Live Testing Recommendations

Start with:
- Partial traffic rollout (10%–30%)
Monitor:
- Early delinquency (D7, D14)
- Approval rate shift
Gradually adjust:
- CID thresholds
- L4/L5 sensitivity
- Time windows
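
The text does not say how the partial traffic split should be made. One common option, shown here purely as an assumption, is deterministic hash-based bucketing, so that the same applicant always lands in the same arm and raising the percentage only adds applicants to the rollout. The function name and ID format are hypothetical.

```python
import hashlib


def in_rollout(applicant_id, percent):
    """Return True if the applicant falls into the first `percent` of 100 buckets."""
    digest = hashlib.sha256(applicant_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent


ids = [f"applicant-{i}" for i in range(1000)]
share = sum(in_rollout(a, 20) for a in ids) / len(ids)
print(f"share routed to MAW rules at 20%: {share:.2%}")
```

Because the bucket depends only on the applicant ID, D7 and D14 delinquency can later be compared between the two arms without storing the assignment separately.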

7. Summary

MAW is behavioral, telco-derived data, not a standalone credit score
Best performance achieved when combined with:
- Device intelligence
- Credit bureau
- Alternative data (e.g., Smile's Footprint Score)
Seat data:
- Valuable for behavior
- Not suitable for labeling