1. Introduction

This document is intended to assist prospective Smile clients in evaluating the outcomes of back-testing for Footprint Score, Phone Footprint and Email Footprint attributes.

Footprint attributes should always be evaluated based on the intended decision use case.
Attributes with strong fraud-discriminatory power may not perform well for lending risk, and vice versa.

Clients are advised to:

Evaluate KS, IV, and stability metrics within the relevant business context
Avoid assessing performance across the full attribute set without segmentation by use case

2. Most Common Use Cases

2.1 Onboarding Use Case

Objective:

Prevent fake or unreachable accounts at entry
Filter low-quality or disposable identities
Minimize onboarding friction

Key principles:

Not intended for deep risk discrimination
Avoid over-penalizing new or thin-file users

2.1.1 Phone Footprint – Recommended Attributes

Validity & Reachability
- valid, active
Phone Type
- phoneType
Disposable Indicators
- disposable
Carrier Presence
- Carrier exists and is not null

2.1.2 Email Footprint – Recommended Attributes

Email Validity
- registered, deliverable
Domain Type
- isFreeWebmail, disposable
Basic Tenure
- tenure
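
The onboarding attributes above could be combined into a lightweight gate along the following lines. This is only a sketch: it assumes each footprint response has already been flattened into a dictionary keyed by the attribute names listed above, and the function name `onboarding_flags`, the flag labels, and the `carrier` key are hypothetical.

```python
from typing import Any, Dict, List


def onboarding_flags(phone: Dict[str, Any], email: Dict[str, Any]) -> List[str]:
    """Return reasons to add friction at onboarding; an empty list means pass.

    Assumption: missing attributes are treated the same as a negative value.
    """
    flags: List[str] = []
    if not phone.get("valid") or not phone.get("active"):
        flags.append("phone_unreachable")
    if phone.get("disposable"):
        flags.append("phone_disposable")
    if not phone.get("carrier"):  # hypothetical key: carrier must exist and not be null
        flags.append("phone_no_carrier")
    if not email.get("registered") or not email.get("deliverable"):
        flags.append("email_unreachable")
    if email.get("disposable"):
        flags.append("email_disposable")
    # Tenure is deliberately not a hard rule here, so that new or
    # thin-file users are not over-penalized.
    return flags


example = onboarding_flags(
    {"valid": True, "active": True, "disposable": False, "carrier": "ExampleTel"},
    {"registered": True, "deliverable": True, "disposable": True},
)
print(example)  # ['email_disposable']
```

Returning reasons rather than a single pass/fail value keeps friction proportionate: a client can decide which flags block sign-up and which only trigger an extra verification step.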

2.2 KYC Use Case

Objective:

Strengthen identity verification
Assess identity persistence over time

Key principles:

Long-standing identities are more reliable
Stability signals are critical

2.2.1 Phone Footprint – Recommended Attributes

Tenure & Stability
- tenure, minTenure, maxTenure
Carrier Consistency
- originalCarrier vs currentCarrier
Activity Signals
- Active over time

2.2.2 Email Footprint – Recommended Attributes

Email Age
- tenure
Domain Registration Data
- creationTime, registrarName
Corporate Association
- companyName

2.3 Fraud Detection Use Case

Objective:

Detect synthetic identities and high-risk users

Key principles:

Fraud clusters around disposable and abnormal patterns
Typically yields high KS performance

2.3.1 Phone Footprint – Recommended Attributes

Disposable & Risk Indicators
- disposable, phoneType
Abnormal Tenure
- Very short tenure, phoneNumberAge
Carrier Anomalies
- Unknown / virtual carriers
Activity Inconsistency
- active = false with recent usage claim

2.3.2 Email Footprint – Recommended Attributes

Disposable Indicators
- disposable
Free Webmail + Short Tenure
- isFreeWebmail + low tenure
Domain Breach / Reputation
- breached
Domain Registration Anomalies
- Very recent creationTime
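
As an illustration of the combined email signals above (free webmail with short tenure, very recent domain creation), the sketch below derives boolean fraud indicators. The thresholds, the assumption that `tenure` is expressed in days, and the assumption that `creationTime` has already been parsed into a `datetime.date` are all hypothetical and should be replaced with values appropriate to the actual data.

```python
from datetime import date
from typing import Any, Dict


def email_fraud_signals(
    email: Dict[str, Any],
    today: date,
    min_tenure_days: int = 90,      # hypothetical threshold
    min_domain_age_days: int = 30,  # hypothetical threshold
) -> Dict[str, bool]:
    """Derive boolean fraud indicators from email footprint attributes."""
    tenure = email.get("tenure")          # assumed to be in days
    created = email.get("creationTime")   # assumed to be a datetime.date
    return {
        "disposable": bool(email.get("disposable")),
        # Free webmail alone is common; it is the combination with a
        # short tenure that is treated as suspicious here.
        "free_webmail_short_tenure": (
            bool(email.get("isFreeWebmail"))
            and tenure is not None
            and tenure < min_tenure_days
        ),
        "recent_domain": (
            created is not None
            and (today - created).days < min_domain_age_days
        ),
    }


signals = email_fraud_signals(
    {"disposable": False, "isFreeWebmail": True, "tenure": 10,
     "creationTime": date(2024, 5, 20)},
    today=date(2024, 6, 1),
)
```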

2.4 Lending Risk (Credit Risk) Use Case

Objective:

Improve credit underwriting decisions

Key principles:

Best used as model features, not hard rules
Focus on stability and long-term behavior

2.4.1 Phone Footprint – Recommended Attributes

Tenure Depth
- tenure
Carrier Stability
- No frequent carrier switching
Long-term Activity
- Consistently active

2.4.2 Email Footprint – Recommended Attributes

Email Age
- tenure
Domain Quality
- Non-disposable
- Non-free or long-tenured free domains
Reputation
- breached, firstBreachDate, lastBreachDate
- Prefer no breach or long time since last breach
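
Because these attributes are best used as model features rather than hard rules, one possible preparation step is to turn them into numeric inputs. The sketch below is an assumption-laden example: the output feature names are hypothetical, `lastBreachDate` is assumed to be a `datetime.date`, and carrier stability is approximated by comparing `originalCarrier` with `currentCarrier`.

```python
from datetime import date
from typing import Any, Dict, Optional


def lending_features(
    phone: Dict[str, Any], email: Dict[str, Any], as_of: date
) -> Dict[str, Optional[int]]:
    """Turn footprint attributes into model inputs (no approve/decline logic)."""
    last_breach = email.get("lastBreachDate")  # assumed to be a datetime.date
    days_since_breach = (as_of - last_breach).days if last_breach else None
    return {
        "phone_tenure": phone.get("tenure"),
        # 1 if the carrier differs from the original one; 0 otherwise
        # (also 0 when both values are missing).
        "carrier_changed": int(
            phone.get("originalCarrier") != phone.get("currentCarrier")
        ),
        "email_tenure": email.get("tenure"),
        "email_disposable": int(bool(email.get("disposable"))),
        "email_breached": int(bool(email.get("breached"))),
        # None means no known breach; the model decides how to treat it.
        "days_since_last_breach": days_since_breach,
    }


features = lending_features(
    {"tenure": 48, "originalCarrier": "A", "currentCarrier": "B"},
    {"tenure": 60, "disposable": False, "breached": True,
     "lastBreachDate": date(2023, 1, 1)},
    as_of=date(2024, 1, 1),
)
```

Leaving missing values as `None` instead of imputing them keeps the decision about how to handle absent breach history inside the model, where it can be validated.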

2.5 Collection Use Case

Objective:

Improve contactability and collection efficiency

Key principles:

Not intended for approval or risk scoring
Focus on channel effectiveness

2.5.1 Phone Footprint – Recommended Attributes

Current Activity
- active
Carrier Reliability
- Known carrier
Carrier Anomalies
- Unknown / virtual carriers
Phone Activity
- Active status

2.5.2 Email Footprint – Recommended Attributes

Deliverability
- deliverable
Email Activity
- registeredProfileCount
Domain Reputation
- Non-disposable

3. Evaluation Principles for Footprint Data

3.1 Compositional Nature of Footprint Features

Smile Footprint Score and Email/Phone Footprint attributes are compositional by design.
They are intended to work collectively within a model, rather than as strong standalone predictors.

3.2 Limitations of Feature-by-Feature Evaluation

3.2.1 Feature Interactions Are Not Captured

Many features derive value from interactions.
Example:
- email.registeredProfilesCount may be weak alone,
- But becomes highly predictive when combined with:
  - mobile.registeredProfilesCount
  - analysis.fraudScore
Single-variable evaluation misses these joint effects.
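
A small synthetic example can make the point concrete. The data below is entirely hypothetical (not Smile data): two binary flags stand in for a pair of footprint features, and the outcome is bad only when exactly one flag is set. Each flag shows no separation on its own, while the pair separates the outcome perfectly.

```python
from itertools import product

# Hypothetical rows of (flag_a, flag_b, is_bad); is_bad = flag_a XOR flag_b.
rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)] * 25  # 100 rows


def bad_rate(rows, key):
    """Bad rate per group, where `key` maps a row to its group."""
    totals, bads = {}, {}
    for row in rows:
        group = key(row)
        totals[group] = totals.get(group, 0) + 1
        bads[group] = bads.get(group, 0) + row[2]
    return {group: bads[group] / totals[group] for group in sorted(totals)}


alone_a = bad_rate(rows, lambda r: r[0])
alone_b = bad_rate(rows, lambda r: r[1])
joint = bad_rate(rows, lambda r: (r[0], r[1]))

print(alone_a)  # {0: 0.5, 1: 0.5}
print(alone_b)  # {0: 0.5, 1: 0.5}
print(joint)    # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

Real footprint interactions are rarely this extreme, but the mechanism is the same: a univariate metric computed on either flag would report no signal at all.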

3.2.2 Collinearity Is Misinterpreted

Many attributes are correlated by nature (e.g., social/media registration signals)
Individual IV:
- May double-count signal
- Or underestimate group-level contribution
The signal exists at a group level, not always at an individual feature level.

3.2.3 Linear Metrics Are Insufficient

IV / KS are univariate metrics and, as typically applied, favor simple monotonic relationships
Footprint data often exhibits:
- Non-linear patterns
- Conditional dependencies
These are better captured by:
- Tree-based models
- Ensemble learning approaches
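
To illustrate how a non-monotonic pattern can be understated, the sketch below computes KS on a hypothetical U-shaped feature, where bads sit at both extremes. The data is synthetic, and the "folded" transform stands in for what a tree-based model can learn with two splits.

```python
def ks_statistic(values, labels):
    """Maximum gap between the cumulative distributions of bads (1) and goods (0)."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(values, labels))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        value = pairs[i][0]
        # Consume all rows that share this value before comparing the CDFs.
        while i < len(pairs) and pairs[i][0] == value:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


# Hypothetical feature: values 0 and 3 are always bad, values 1 and 2 always good.
x = [0, 1, 2, 3] * 25
y = [1 if v in (0, 3) else 0 for v in x]

raw_ks = ks_statistic(x, y)
folded_ks = ks_statistic([abs(v - 1.5) for v in x], y)

print(raw_ks)     # 0.5
print(folded_ks)  # 1.0
```

The feature separates goods from bads perfectly, yet KS on the raw values reports only 0.5; the full signal appears once the non-monotonic shape is taken into account.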

4. Recommended Evaluation Methodology

4.1 Model-Based Evaluation (Primary Method)

Train models using all footprint features together:

Logistic Regression (baseline)
Gradient Boosted Models:
- XGBoost
- LightGBM

Evaluate on a holdout dataset using:

AUC-ROC
Gini coefficient

This reflects the combined predictive power of the footprint features more faithfully than single-attribute metrics.
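
For reference, the two holdout metrics can be computed as follows. This is a dependency-free sketch of the definitions with made-up scores; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used, and the scores would come from the trained model.

```python
def auc_roc(scores, labels):
    """Probability that a random bad (1) scores above a random good (0).

    Ties count as half a win. Runs in O(n_bad * n_good), which is fine
    for illustration but slow for large holdout sets.
    """
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not bads or not goods:
        raise ValueError("both classes must be present in the holdout set")
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bads) * len(goods))


def gini(scores, labels):
    """Gini coefficient derived from AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc_roc(scores, labels) - 1.0


# Hypothetical holdout predictions (higher score = riskier).
holdout_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
holdout_labels = [1, 1, 0, 1, 0, 0]

holdout_auc = auc_roc(holdout_scores, holdout_labels)   # 8 of 9 pairs ranked correctly
holdout_gini = gini(holdout_scores, holdout_labels)
```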

4.2 Incremental Lift / Ablation Analysis

Start with an existing model, or build a baseline model
Add footprint features
Measure incremental improvement:
- Δ AUC
- Δ Gini
- Business KPIs (approval rate, bad rate)

This shows the real contribution of footprint data to your current stack.
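
The comparison above can be sketched as follows. All scores and labels are made up for illustration, the cutoff of 0.5 is arbitrary, and the "enhanced" scores simply represent the output of a baseline model retrained with footprint features added.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (bad, good) pairs where the bad scores higher."""
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return wins / (len(bads) * len(goods))


def approval_and_bad_rate(scores, labels, cutoff):
    """Approve applicants whose risk score is below the cutoff."""
    approved = [y for s, y in zip(scores, labels) if s < cutoff]
    approval_rate = len(approved) / len(labels)
    bad_rate = sum(approved) / len(approved) if approved else 0.0
    return approval_rate, bad_rate


# Hypothetical holdout set: the same applicants scored by both models.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
baseline = [0.6, 0.5, 0.3, 0.7, 0.4, 0.3, 0.2, 0.1]
enhanced = [0.8, 0.6, 0.5, 0.55, 0.4, 0.3, 0.2, 0.1]

delta_auc = auc(enhanced, labels) - auc(baseline, labels)
delta_gini = 2.0 * delta_auc  # because Gini = 2 * AUC - 1

baseline_kpis = approval_and_bad_rate(baseline, labels, cutoff=0.5)
enhanced_kpis = approval_and_bad_rate(enhanced, labels, cutoff=0.5)

print(f"delta AUC:  {delta_auc:.3f}")
print(f"delta Gini: {delta_gini:.3f}")
print("baseline (approval rate, bad rate):", baseline_kpis)
print("enhanced (approval rate, bad rate):", enhanced_kpis)
```

Both models should be evaluated on the same holdout rows, so that any difference reflects the added features rather than a change in the sample.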

4.3 Feature Contribution Analysis (SHAP)

After model training:

Use SHAP values to assess feature importance

Benefits:

Accounts for feature interactions
Provides directional and magnitude insights
More robust than IV for this type of data
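
The idea behind SHAP can be shown on a toy case. The sketch below does not use the `shap` package; it computes exact Shapley values by enumerating feature orderings, which is only feasible for a handful of features. The scoring function, feature names, and baseline are all hypothetical.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical risk score: a small main effect plus an interaction term."""
    a = features["short_tenure"]
    b = features["free_webmail"]
    return 0.1 * a + 0.6 * a * b


def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)      # start from the reference input
        previous = model(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature on
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


instance = {"short_tenure": 1, "free_webmail": 1}
baseline = {"short_tenure": 0, "free_webmail": 0}
phi = shapley_values(toy_model, instance, baseline)
# Approximately {'short_tenure': 0.4, 'free_webmail': 0.3}
```

Although `free_webmail` has no effect on its own in this toy model, it receives a share of the interaction term, and the contributions sum to the difference between the instance score and the baseline score. A univariate metric would not credit it at all.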

5. Summary

Footprint data is designed to operate as an ensemble of signals
Individual feature testing is likely to underestimate its value
The correct evaluation approach is:

Model-level performance improvement, specifically the incremental AUC (or Gini) gained when adding footprint features to an existing model

6. Key Takeaway

The value of Smile Footprint lies not in isolated variables, but in how these signals interact within a model to improve overall predictive performance across different business use cases.