Guideline for Footprints Product

1. Introduction

This document is intended to assist prospective Smile clients in evaluating the outcomes of back-testing for Footprint Score, Phone Footprint and Email Footprint attributes.

Footprint attributes should always be evaluated based on the intended decision use case. Attributes with strong fraud-discriminatory power may not perform well for lending risk, and vice versa.

Clients are advised to:

  • Evaluate KS, IV, and stability metrics within the relevant business context
  • Avoid assessing performance across the full attribute set without segmentation by use case

2. Most Common Use Cases

2.1 Onboarding Use Case

Objective:

  • Prevent fake or unreachable accounts at entry
  • Filter low-quality or disposable identities
  • Minimize onboarding friction

Key principles:

  • Not intended for deep risk discrimination

  • Avoid over-penalizing new or thin-file users


2.1.1 Phone Footprint – Recommended Attributes

  • Validity & Reachability

    • valid, active
  • Phone Type

    • phoneType
  • Disposable Indicators

    • disposable
  • Carrier Presence

    • Carrier exists and is not null

2.1.2 Email Footprint – Recommended Attributes

  • Email Validity

    • registered, deliverable
  • Domain Type

    • isFreeWebmail, disposable
  • Basic Tenure

    • tenure
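
As an illustration, the phone and email checks recommended for onboarding can be combined into a light screening step. The sketch below is only one possible arrangement. It assumes each footprint arrives as a dictionary keyed by the attribute names listed above, that `currentCarrier` is the field holding the carrier, and that a missing value is treated as a failed check. The function name and flag labels are hypothetical.

```python
from typing import Any, List, Mapping


def onboarding_flags(phone: Mapping[str, Any], email: Mapping[str, Any]) -> List[str]:
    """Return reasons to ask for extra verification; an empty list means pass."""
    flags: List[str] = []

    # Phone: validity, reachability, disposable indicator, carrier presence.
    if not phone.get("valid"):
        flags.append("phone_invalid")
    if not phone.get("active"):
        flags.append("phone_inactive")
    if phone.get("disposable"):
        flags.append("phone_disposable")
    if not phone.get("currentCarrier"):  # assumed carrier field; null or missing
        flags.append("phone_no_carrier")

    # Email: deliverability and disposable indicator.
    if not email.get("deliverable"):
        flags.append("email_undeliverable")
    if email.get("disposable"):
        flags.append("email_disposable")

    # tenure and isFreeWebmail are deliberately not hard rules here,
    # so that new or thin-file users are not over-penalized.
    return flags
```

Returning reasons rather than a single pass/fail value keeps friction low: a client can choose to step up verification for some flags and ignore others.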

2.2 KYC Use Case

Objective:

  • Strengthen identity verification

  • Assess identity persistence over time

Key principles:

  • Long-standing identities are more reliable
  • Stability signals are critical

2.2.1 Phone Footprint – Recommended Attributes

  • Tenure & Stability
    • tenure, minTenure, maxTenure
  • Carrier Consistency
    • originalCarrier vs currentCarrier
  • Activity Signals
    • Active over time

2.2.2 Email Footprint – Recommended Attributes

  • Email Age
    • tenure
  • Domain Registration Data
    • creationTime, registrarName
  • Corporate Association
    • companyName
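
The phone stability signals listed for KYC can be turned into explicit features before review or modelling. The following is a minimal sketch, assuming `minTenure` and `maxTenure` are numeric bounds in the same unit and that both carrier fields are plain strings. The function and output names are hypothetical.

```python
from typing import Any, Dict, Mapping


def kyc_stability_features(phone: Mapping[str, Any]) -> Dict[str, Any]:
    """Derive simple identity-persistence features from a phone footprint."""
    original = phone.get("originalCarrier")
    current = phone.get("currentCarrier")
    low = phone.get("minTenure")
    high = phone.get("maxTenure")

    return {
        # None means "unknown", which is kept distinct from "unchanged".
        "carrier_changed": (original != current) if original and current else None,
        # Width of the tenure estimate; a narrow range is a more precise estimate.
        "tenure_spread": (high - low) if low is not None and high is not None else None,
    }
```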

2.3 Fraud Detection Use Case

Objective:

  • Detect synthetic identities and high-risk users

Key principles:

  • Fraud clusters around disposable and abnormal patterns
  • Footprint attributes typically yield high KS in this use case

2.3.1 Phone Footprint – Recommended Attributes

  • Disposable & Risk Indicators
    • disposable, phoneType
  • Abnormal Tenure
    • Very short tenure, phoneNumberAge
  • Carrier Anomalies
    • Unknown / virtual carriers
  • Activity Inconsistency
    • active = false with recent usage claim

2.3.2 Email Footprint – Recommended Attributes

  • Disposable Indicators
    • disposable
  • Free Webmail + Short Tenure
    • isFreeWebmail + low tenure
  • Domain Breach / Reputation
    • breached
  • Domain Registration Anomalies
    • Very recent creationTime
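
The combined email indicators above (free webmail with short tenure, very recent domain creation) can be derived as explicit features. This is a sketch under stated assumptions: `tenure` is taken to be in days, `creationTime` is taken to be already parsed into a `datetime.date`, and both cut-offs are hypothetical placeholders that should be set from back-testing.

```python
from datetime import date
from typing import Any, Dict, Mapping

SHORT_TENURE_DAYS = 30   # hypothetical cut-off, tune on your own data
RECENT_DOMAIN_DAYS = 90  # hypothetical cut-off, tune on your own data


def email_fraud_features(email: Mapping[str, Any], today: date) -> Dict[str, bool]:
    """Derive combined fraud indicators from an email footprint."""
    tenure = email.get("tenure")          # assumed unit: days
    created = email.get("creationTime")   # assumed type: datetime.date

    short_tenure = tenure is not None and tenure < SHORT_TENURE_DAYS
    recent_domain = created is not None and (today - created).days < RECENT_DOMAIN_DAYS

    return {
        "disposable": bool(email.get("disposable")),
        "free_webmail_short_tenure": bool(email.get("isFreeWebmail")) and short_tenure,
        "recent_domain": recent_domain,
    }
```

The `breached` attribute is left out on purpose: its direction should be confirmed in back-testing before it is used as a flag.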

2.4 Lending Risk (Credit Risk) Use Case

Objective:

  • Improve credit underwriting decisions

Key principles:

  • Best used as model features, not hard rules
  • Focus on stability and long-term behavior

2.4.1 Phone Footprint – Recommended Attributes

  • Tenure Depth
    • tenure
  • Carrier Stability
    • No frequent carrier switching
  • Long-term Activity
    • Consistently active

2.4.2 Email Footprint – Recommended Attributes

  • Email Age
    • tenure
  • Domain Quality
    • Non-disposable
    • Non-free or long-tenured free domains
  • Reputation
    • breached, firstBreachDate, lastBreachDate
    • Prefer no breach or long time since last breach
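
The preference for "no breach or long time since last breach" can be expressed as a single model feature. The sketch below assumes `lastBreachDate` is an ISO 8601 date string (YYYY-MM-DD); the actual format should be checked against the API response. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def years_since_last_breach(
    breached: bool, last_breach_date: Optional[str], as_of: date
) -> Optional[float]:
    """Years between the last known breach and the decision date.

    Returns None when there is no breach on record, so a model can treat
    "never breached" separately from "breached long ago".
    """
    if not breached or not last_breach_date:
        return None
    last = date.fromisoformat(last_breach_date)  # assumes YYYY-MM-DD
    return (as_of - last).days / 365.25
```

Consistent with the principle above, this value is meant as a model input and not as a hard rule.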

2.5 Collection Use Case

Objective:

  • Improve contactability and collection efficiency

Key principles:

  • Not intended for approval or risk scoring

  • Focus on channel effectiveness


2.5.1 Phone Footprint – Recommended Attributes

  • Current Activity
    • active
  • Carrier Reliability
    • Known carrier
  • Carrier Anomalies
    • Unknown / virtual carriers
  • Phone Activity
    • Active status

2.5.2 Email Footprint – Recommended Attributes

  • Deliverability
    • deliverable
  • Email Activity
    • registeredProfileCount
  • Domain Reputation
    • Non-disposable

3. Evaluation Principles for Footprint Data

3.1 Compositional Nature of Footprint Features

Smile Footprint Score and Email/Phone Footprint attributes are compositional by design. They are intended to work collectively within a model, rather than as strong standalone predictors.


3.2 Limitations of Feature-by-Feature Evaluation

3.2.1 Feature Interactions Are Not Captured

  • Many features derive value from interactions.

  • Example:

    • email.registeredProfilesCount may be weak alone,

    • But becomes highly predictive when combined with:

      • mobile.registeredProfilesCount

      • analysis.fraudScore

  • Single-variable evaluation misses these joint effects.
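
The effect can be seen in a toy example. The counts below are synthetic and were constructed purely for illustration; they are not Smile data. Each cell holds (rows, bads) for a pair of hypothetical binary flags: "email profile count is low" and "mobile profile count is low".

```python
from collections import defaultdict

# (email_low, mobile_low) -> (number of rows, number of bads); synthetic counts.
CELLS = {
    (0, 0): (100, 5),
    (0, 1): (100, 30),
    (1, 0): (100, 30),
    (1, 1): (100, 5),
}


def bad_rate_by(key_fn):
    """Aggregate the cells by key_fn(cell) and return the bad rate per group."""
    totals = defaultdict(lambda: [0, 0])
    for cell, (rows, bads) in CELLS.items():
        group = key_fn(cell)
        totals[group][0] += rows
        totals[group][1] += bads
    return {group: bads / rows for group, (rows, bads) in totals.items()}


# Email flag on its own: both groups have the same bad rate, so IV and KS are zero.
marginal = bad_rate_by(lambda cell: cell[0])

# Joint view: do the email and mobile flags disagree?
joint = bad_rate_by(lambda cell: cell[0] != cell[1])
```

In this constructed case the email flag alone shows a 17.5% bad rate in both groups, while the "flags disagree" group has a 30% bad rate against 5% when they agree. A single-variable screen would discard the email flag even though it carries signal in combination.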


3.2.2 Collinearity Is Misinterpreted

  • Many attributes are correlated by nature (e.g., social media registration signals)

  • Individual IV:

    • May double-count signal
    • Or underestimate group-level contribution
  • The signal exists at a group level, not always at an individual feature level.


3.2.3 Linear Metrics Are Insufficient

  • IV and KS score one attribute at a time, and they are most informative when that attribute has a roughly monotonic relationship with the outcome
  • Footprint data often exhibits:
    • Non-linear patterns
    • Conditional dependencies
  • These are better captured by:
    • Tree-based models
    • Ensemble learning approaches
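
To make the limitation concrete, the sketch below computes IV and KS for one binned attribute using the standard definitions. The bin counts are synthetic. The two inputs contain the same bins in a different order: one ordering is U-shaped in bad rate, the other is monotonic.

```python
import math
from typing import List, Tuple


def iv_and_ks(bins: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (IV, KS) for ordered bins given as (goods, bads) counts."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    ks = 0.0
    cum_good = 0.0
    cum_bad = 0.0
    for good, bad in bins:
        share_good = good / total_good
        share_bad = bad / total_bad
        # Bins with no goods or no bads are skipped in the IV sum (a common convention).
        if share_good > 0 and share_bad > 0:
            iv += (share_good - share_bad) * math.log(share_good / share_bad)
        cum_good += share_good
        cum_bad += share_bad
        ks = max(ks, abs(cum_good - cum_bad))
    return iv, ks


u_shaped = [(20, 40), (60, 20), (20, 40)]   # risky at both ends (synthetic)
monotone = [(20, 40), (20, 40), (60, 20)]   # same bins, risk falls steadily

iv_u, ks_u = iv_and_ks(u_shaped)
iv_m, ks_m = iv_and_ks(monotone)
```

IV is identical for both orderings, but KS on the U-shaped ordering is half that of the monotonic one. Neither metric says anything about how the attribute behaves conditionally on other attributes.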

4. Recommended Evaluation Methodology

4.1 Model-Based Evaluation (Primary Method)

Train models using all footprint features together:

  • Logistic Regression (baseline)

  • Gradient Boosted Models:

    • XGBoost
    • LightGBM

Evaluate on a holdout dataset using:

  • AUC-ROC
  • Gini coefficient

This reflects the combined predictive power of the footprint features more faithfully than attribute-level metrics.
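
For reference, both holdout metrics can be computed directly from labels and model scores. The sketch below uses only the standard library and a pairwise definition of AUC, which is easy to audit but slow on large samples. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used. It assumes labels are 1 for bad and 0 for good, and that a higher score means higher predicted risk.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (bad, good) pairs in which the bad has the higher score; ties count half."""
    bad = [s for y, s in zip(labels, scores) if y == 1]
    good = [s for y, s in zip(labels, scores) if y == 0]
    if not bad or not good:
        raise ValueError("need at least one bad and one good observation")
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def gini(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient derived from AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc_roc(labels, scores) - 1.0
```

Because Gini is a linear transform of AUC, the two rank models identically; reporting both is a matter of convention.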


4.2 Incremental Lift / Ablation Analysis

  1. Start with an existing model, or build a baseline model
  2. Add footprint features
  3. Measure incremental improvement:
    • Δ AUC
    • Δ Gini
    • Business KPIs (approval rate, bad rate)

This shows the real contribution of footprint data to your current decisioning stack.
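
The comparison in step 3 can be sketched as follows, assuming both models have been scored on the same holdout sample, labels are 1 for bad and 0 for good, and a higher score means higher predicted risk. The function names, the fixed approval rate, and the example scores are hypothetical.

```python
from typing import Dict, Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (bad, good) pairs ranked correctly; ties count half."""
    bad = [s for y, s in zip(labels, scores) if y == 1]
    good = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def bad_rate_at_approval(labels, scores, approval_rate: float) -> float:
    """Bad rate among the lowest-risk share of applicants that would be approved."""
    n_approved = max(1, int(len(labels) * approval_rate))
    ranked = sorted(zip(scores, labels))  # ascending predicted risk
    return sum(y for _, y in ranked[:n_approved]) / n_approved


def incremental_lift(labels, baseline, enhanced, approval_rate: float) -> Dict[str, float]:
    """Compare a baseline model with the same model plus footprint features."""
    auc_base = auc_roc(labels, baseline)
    auc_enh = auc_roc(labels, enhanced)
    return {
        "delta_auc": auc_enh - auc_base,
        "delta_gini": 2.0 * (auc_enh - auc_base),
        "bad_rate_baseline": bad_rate_at_approval(labels, baseline, approval_rate),
        "bad_rate_enhanced": bad_rate_at_approval(labels, enhanced, approval_rate),
    }


# Made-up holdout scores, for illustration only.
labels = [0, 0, 1, 0, 1, 1]
baseline = [0.2, 0.5, 0.4, 0.3, 0.6, 0.7]
enhanced = [0.2, 0.3, 0.5, 0.1, 0.6, 0.7]
report = incremental_lift(labels, baseline, enhanced, approval_rate=0.5)
```

Holding the approval rate fixed while comparing bad rates keeps the business comparison like for like.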


4.3 Feature Contribution Analysis (SHAP)

After model training:

  • Use SHAP values to assess feature importance

Benefits:

  • Accounts for feature interactions
  • Provides directional and magnitude insights
  • More robust than IV for this type of data
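
To show what SHAP values represent, the sketch below computes exact Shapley values by brute force for a tiny hand-written scoring function. It does not use the SHAP library, which computes or approximates the same quantity efficiently for real models. The scoring function, its coefficients, and the `shortTenure` flag are hypothetical, and "absent" features are replaced by a single baseline value, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    f: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of f at x, relative to a baseline input."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features in the subset take their real value, the rest the baseline.
                without = {m: (x[m] if m in subset else baseline[m]) for m in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (f(with_feature) - f(without))
        phi[name] = total
    return phi


def toy_score(row: Dict[str, float]) -> float:
    """Hypothetical risk score with one additive term and one interaction term."""
    return 0.1 * row["disposable"] + 0.4 * row["isFreeWebmail"] * row["shortTenure"]


x = {"disposable": 1, "isFreeWebmail": 1, "shortTenure": 1}
baseline = {"disposable": 0, "isFreeWebmail": 0, "shortTenure": 0}
phi = shapley_values(toy_score, x, baseline)
```

Here the interaction term contributes nothing unless both flags are present, yet each flag receives half of its 0.4 contribution. That shared credit is what a single-variable metric cannot assign.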

5. Summary

  • Footprint data is designed to operate as an ensemble of signals

  • Individual feature testing tends to underestimate its value

  • The correct evaluation approach is:

    Model-level performance improvement, specifically the incremental AUC (or Gini) gained when adding footprint features to an existing model


6. Key Takeaway

The value of Smile Footprint lies not in isolated variables, but in how these signals interact within a model to improve overall predictive performance across different business use cases.