Transaction Categorisation AI: Smart Spending Analysis
Whistl's AI doesn't just categorise transactions—it understands spending context, identifies risky patterns, and detects merchant manipulation. Using natural language processing and machine learning, Whistl transforms raw transaction data into actionable financial insights.
Why Transaction Categorisation Matters
Raw bank transaction data is cryptic and unhelpful:
- "POS 3847291 MELBOURNE AU 14/02"
- "SPRTBET*8372918 SYDNEY"
- "AMZN Mktp AU*2K83H Sydney"
Without intelligent categorisation, you can't see patterns. Whistl's AI transforms this data into meaningful categories that reveal spending behaviour.
How Whistl's Categorisation AI Works
Whistl uses a multi-layer approach to transaction categorisation:
Layer 1: Merchant Identification
The AI first identifies the merchant from transaction descriptors:
# Merchant extraction examples
"SPRTBET*8372918 SYDNEY" → Sportsbet
"CROWN CASINO MELBOURNE" → Crown Casino
"AMZN Mktp AU*2K83H" → Amazon
"WOOLWORTHS 1234 SYDNEY" → Woolworths
"UBER TRIP 14/02" → Uber
Layer 2: Category Classification
Once the merchant is identified, it's classified into spending categories:
Primary Categories
| Category | Examples | Risk Level |
|---|---|---|
| Gambling | Sportsbet, Crown, TAB | Critical |
| Shopping | Amazon, eBay, retailers | High |
| Dining | Restaurants, cafes, delivery | Medium |
| Entertainment | Netflix, Spotify, events | Medium |
| Transport | Uber, fuel, public transport | Low |
| Groceries | Woolworths, Coles | Low |
| Utilities | Electricity, water, internet | Low |
| Healthcare | Pharmacy, doctors | Low |
Layer 3: Risk Assessment
Each transaction is assessed for risk based on multiple factors:
- Merchant type: Gambling = critical risk
- Amount: Large amounts = higher risk
- Time: Late night = elevated risk
- Frequency: Multiple transactions = velocity risk
- Category budget: Over budget = increased risk
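The factors above could be folded into a single score. The following is a minimal sketch, assuming a simple additive scheme: the weights, the 200 "large amount" cut-off, the 22:00 to 05:00 late-night window and the `Transaction` fields are all hypothetical and are not Whistl's actual values.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical base scores per category risk level (illustrative only).
CATEGORY_BASE = {"critical": 0.6, "high": 0.3, "medium": 0.2, "low": 0.0}

@dataclass
class Transaction:
    category_risk: str    # "critical", "high", "medium" or "low"
    amount: float
    timestamp: datetime
    recent_count: int     # transactions in the same category in the recent window
    budget_ratio: float   # month-to-date spending divided by the category budget

def risk_score(tx: Transaction) -> float:
    """Combine the five factors into a score between 0.0 and 1.0."""
    score = CATEGORY_BASE.get(tx.category_risk, 0.0)    # merchant type
    if tx.amount >= 200:                                # amount (assumed cut-off)
        score += 0.15
    if tx.timestamp.hour >= 22 or tx.timestamp.hour < 5:  # time (assumed window)
        score += 0.1
    if tx.recent_count >= 3:                            # frequency (assumed trigger)
        score += 0.1
    if tx.budget_ratio > 1.0:                           # category budget exceeded
        score += 0.1
    return min(score, 1.0)
```

An additive score keeps each factor's contribution easy to inspect; a trained model could replace it without changing the inputs.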
Natural Language Processing for Categorisation
Whistl uses NLP to understand transaction descriptions:
Text Preprocessing
# Raw transaction description
raw = "SPRTBET*8372918 SYDNEY AU 14/02 19:34"

# Preprocessing steps
cleaned = raw.lower()                    # "sprtbet*8372918 sydney au..."
cleaned = remove_special_chars(cleaned)  # "sprtbet 8372918 sydney au..."
cleaned = remove_numbers(cleaned)        # "sprtbet sydney au"
tokens = tokenize(cleaned)               # ["sprtbet", "sydney", "au"]
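The helper functions in the preprocessing steps are not defined in the article. One plausible way to write them with Python's standard `re` module is sketched below; the regular expressions are an assumption about what "special characters" and "numbers" mean here.

```python
import re

def remove_special_chars(text: str) -> str:
    # Replace anything that is not a lowercase letter, digit or whitespace
    # with a space, so "sprtbet*8372918" becomes "sprtbet 8372918".
    return re.sub(r"[^a-z0-9\s]", " ", text)

def remove_numbers(text: str) -> str:
    # Replace every run of digits with a space.
    return re.sub(r"\d+", " ", text)

def tokenize(text: str) -> list[str]:
    # Split on whitespace; repeated spaces produce no empty tokens.
    return text.split()

def preprocess(raw: str) -> list[str]:
    cleaned = raw.lower()
    cleaned = remove_special_chars(cleaned)
    cleaned = remove_numbers(cleaned)
    return tokenize(cleaned)

print(preprocess("SPRTBET*8372918 SYDNEY AU 14/02 19:34"))
# ['sprtbet', 'sydney', 'au']
```

Stripping digits from mixed reference codes such as `2K83H` leaves stray single-letter tokens, so a production version would probably drop very short tokens as well.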
Merchant Matching
# Merchant database lookup
merchant_db = {
    "sprtbet": {"name": "Sportsbet", "category": "gambling", "risk": "critical"},
    "crown": {"name": "Crown Casino", "category": "gambling", "risk": "critical"},
    "amazon": {"name": "Amazon", "category": "shopping", "risk": "high"},
    "woolworths": {"name": "Woolworths", "category": "groceries", "risk": "low"},
}

# Fuzzy matching for variations
def match_merchant(tokens):
    for token in tokens:
        for merchant_key in merchant_db:
            if fuzzy_match(token, merchant_key) > 0.8:
                return merchant_db[merchant_key]
    return None
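The article does not say which similarity measure `fuzzy_match` uses. As an illustration only, the sketch below substitutes `difflib.SequenceMatcher` from the Python standard library and returns the best-scoring merchant rather than the first one over the threshold; the trimmed `merchant_db` is a stand-in for the real database.

```python
from difflib import SequenceMatcher

# Trimmed stand-in for the merchant database.
merchant_db = {
    "sprtbet": {"name": "Sportsbet", "category": "gambling", "risk": "critical"},
    "crown": {"name": "Crown Casino", "category": "gambling", "risk": "critical"},
    "amazon": {"name": "Amazon", "category": "shopping", "risk": "high"},
    "woolworths": {"name": "Woolworths", "category": "groceries", "risk": "low"},
}

def fuzzy_match(a: str, b: str) -> float:
    # Similarity in [0.0, 1.0]; 1.0 means the strings are identical.
    return SequenceMatcher(None, a, b).ratio()

def match_merchant(tokens, threshold=0.8):
    """Return the record of the best match scoring above the threshold, else None."""
    best_record, best_score = None, threshold
    for token in tokens:
        for key, record in merchant_db.items():
            score = fuzzy_match(token, key)
            if score > best_score:
                best_record, best_score = record, score
    return best_record

print(match_merchant(["sprtbet", "sydney", "au"])["name"])  # Sportsbet
```

With this particular scorer, heavily abbreviated descriptors such as `amzn` do not clear a strict `> 0.8` threshold against `amazon`, so a real lookup table would likely need alias keys for common abbreviations in addition to fuzzy matching.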
Handling Ambiguous Transactions
Some transactions are ambiguous and require context:
- "THE STAR SYDNEY": Could be casino or entertainment venue
- "CLUBS AUSTRALIA": Could be RSL club or nightclub
- "TAB": Could be betting or newsagency
Whistl uses additional context to disambiguate:
- Transaction amount (betting tends to be specific amounts)
- Time of day (casinos more likely at night)
- Location data (near known venues)
- User correction history
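How these signals are weighed is not described, so the following is only a sketch under stated assumptions: a past user correction wins outright, and otherwise each remaining signal casts one vote for the gambling reading. Treating "specific amounts" as whole-dollar amounts, the 21:00 to 04:00 window and the two-vote cut-off are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Context:
    amount: float
    hour: int                               # local hour of day, 0-23
    near_known_venue: bool                  # from location data
    user_correction: Optional[str] = None   # category the user chose previously

def disambiguate(candidates: tuple[str, str], ctx: Context) -> str:
    """Choose between a (gambling_reading, benign_reading) pair of categories."""
    gambling, benign = candidates
    # An explicit correction from the user overrides every heuristic.
    if ctx.user_correction in candidates:
        return ctx.user_correction
    votes = 0
    if float(ctx.amount).is_integer():      # assumed: whole-dollar stake
        votes += 1
    if ctx.hour >= 21 or ctx.hour < 4:      # assumed night-time window
        votes += 1
    if ctx.near_known_venue:
        votes += 1
    return gambling if votes >= 2 else benign
```

For example, a 100.00 charge at 23:00 resolves to the gambling reading, while a 37.45 charge at lunchtime resolves to the benign one.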
Merchant Embedding Risk Detection
Whistl detects when merchants are similar to known risky merchants:
Embedding Technology
Merchants are represented as vectors in a high-dimensional space:
# Merchant embeddings (simplified)
merchant_vectors = {
    "sportsbet": [0.92, 0.15, 0.88, ...],   # Close to gambling cluster
    "ladbrokes": [0.91, 0.14, 0.87, ...],   # Close to sportsbet
    "crown": [0.89, 0.12, 0.91, ...],       # Close to gambling cluster
    "woolworths": [0.12, 0.85, 0.09, ...],  # Far from gambling cluster
}

# Similarity calculation
def merchant_risk(new_merchant):
    vector = get_embedding(new_merchant)
    similarity_to_gambling = cosine_similarity(vector, gambling_centroid)
    return similarity_to_gambling
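To make the similarity step concrete, here is a small pure-Python sketch of cosine similarity against a cluster centroid. The three-dimensional vectors are toy values for illustration (real embeddings have many more dimensions), and computing `gambling_centroid` as the mean of known gambling merchants is an assumption.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); returns 0.0 for a zero vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    # Component-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

# Toy 3-dimensional vectors, for illustration only.
gambling_vectors = [
    [0.92, 0.15, 0.88],
    [0.91, 0.14, 0.87],
    [0.89, 0.12, 0.91],
]
gambling_centroid = centroid(gambling_vectors)

bookmaker_like = [0.90, 0.13, 0.90]
supermarket_like = [0.12, 0.85, 0.09]

print(cosine_similarity(bookmaker_like, gambling_centroid) > 0.99)   # True
print(cosine_similarity(supermarket_like, gambling_centroid) < 0.5)  # True
```

Cosine similarity compares direction rather than magnitude, so a merchant scores as risky when its vector points the same way as the gambling cluster regardless of vector length.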
New Merchant Detection
When you transact with a new merchant, Whistl assesses risk:
- Name similarity: Does the name sound like a gambling site?
- Category patterns: Does it fit gambling transaction patterns?
- Amount patterns: Are amounts typical of betting?
- Time patterns: Does timing match gambling behaviour?
Spending Velocity Detection
Whistl tracks spending velocity within categories:
Velocity Calculation
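# Spending velocity calculation
# (`transactions` and get_historical_average() are provided elsewhere in the app)
from datetime import date, timedelta

def calculate_velocity(category, window_days=7):
    cutoff = date.today() - timedelta(days=window_days)
    recent_spending = sum(
        tx.amount for tx in transactions
        if tx.category == category and tx.date >= cutoff
    )
    historical_average = get_historical_average(category, window_days)
    if not historical_average:
        return None  # no baseline yet, so no ratio can be computed
    return recent_spending / historical_average

# Risk thresholds
def velocity_risk_level(velocity_ratio):
    if velocity_ratio is None:
        return "NORMAL"    # nothing to compare against (assumption)
    if velocity_ratio > 2.0:
        return "HIGH"      # more than 2x normal spending
    elif velocity_ratio > 1.5:
        return "ELEVATED"  # more than 1.5x normal spending
    return "NORMAL"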
Velocity-Based Interventions
When velocity exceeds thresholds, Whistl intervenes:
- 1.5x normal: AI check-in message
- 2.0x normal: SpendingShield elevation
- 3.0x normal: Category block consideration
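The escalation ladder above can be expressed as a small lookup. This is a sketch only: the intervention identifiers are made-up labels, and the strict `>` comparison is an assumption about how the boundaries are treated.

```python
# Thresholds checked from most to least severe; labels are hypothetical.
INTERVENTIONS = [
    (3.0, "category_block_consideration"),
    (2.0, "spendingshield_elevation"),
    (1.5, "ai_check_in"),
]

def intervention_for(velocity_ratio: float) -> str:
    """Return the strongest intervention whose threshold the ratio exceeds."""
    for threshold, action in INTERVENTIONS:
        if velocity_ratio > threshold:
            return action
    return "none"

print(intervention_for(1.6))  # ai_check_in
print(intervention_for(3.2))  # category_block_consideration
```

Keeping the thresholds in one ordered table means a new tier can be added without touching the control flow.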
Category Budget Tracking
Whistl tracks spending against category budgets:
Budget Ratio Calculation
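# Budget ratio calculation (assumes the category budget is greater than zero)
import calendar
from datetime import date

def budget_ratio(category):
    budget = get_monthly_budget(category)
    spent = get_month_to_date_spending(category)
    today = date.today()
    day_of_month = today.day
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_remaining = days_in_month - day_of_month
    # Projected full month spending, assuming the current daily rate continues
    projected = spent / day_of_month * days_in_month
    ratio = spent / budget
    projected_ratio = projected / budget
    return {
        "current_ratio": ratio,
        "projected_ratio": projected_ratio,
        "remaining": budget - spent,
        "days_remaining": days_remaining,
    }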
Budget-Based Risk
| Budget Ratio | Risk Level | Action |
|---|---|---|
| <50% | Low | Normal monitoring |
| 50-80% | Moderate | Budget reminder |
| 80-100% | High | Spending warning |
| >100% | Critical | Category block consideration |
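The table can be read as a simple threshold lookup, sketched below together with the month-end projection. The boundary handling is an assumption, since the table's ranges share their endpoints: here each band includes its lower bound, and exactly 100% still counts as High.

```python
def project_month_end(spent: float, day_of_month: int, days_in_month: int) -> float:
    # Linear projection: keep spending at the month-to-date daily rate.
    return spent / day_of_month * days_in_month

def budget_risk(ratio: float) -> tuple[str, str]:
    """Map a budget ratio (1.0 == 100% of budget) to (risk level, action)."""
    if ratio > 1.0:
        return ("Critical", "Category block consideration")
    if ratio >= 0.8:
        return ("High", "Spending warning")
    if ratio >= 0.5:
        return ("Moderate", "Budget reminder")
    return ("Low", "Normal monitoring")

# Worked example: 300 spent by day 10 of a 30-day month, against a 600 budget.
budget = 600
current = 300 / budget                               # 0.5
projected = project_month_end(300, 10, 30) / budget  # 900 / 600 = 1.5
print(budget_risk(current)[0])    # Moderate
print(budget_risk(projected)[0])  # Critical
```

The worked example shows why the projected ratio is useful: spending that looks moderate today is already on track to exceed the budget by half.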
User Corrections and Learning
Whistl learns from user corrections to improve categorisation:
Correction Interface
- Users can recategorise any transaction
- Mark transactions as "not gambling" (false positives)
- Mark transactions as "risky" (false negatives)
- Add notes to transactions
Learning from Corrections
# Learning from user corrections
def learn_from_correction(transaction, new_category):
    # Update merchant category mapping (create the entry if the merchant is new)
    merchant_db.setdefault(transaction.merchant, {})["category"] = new_category
    # Update model weights
    update_nlp_weights(transaction.description, new_category)
    # Update similar merchants
    for similar_merchant in find_similar_merchants(transaction.merchant):
        suggest_category(similar_merchant, new_category)
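One practical question is how quickly a correction should override the default category. The sketch below is a hypothetical design, not Whistl's implementation: it counts corrections per merchant and only overrides the default once the same category has been chosen `min_votes` times, so a single mis-tap does not reclassify a merchant. The class name and the default of two votes are assumptions.

```python
from collections import Counter, defaultdict

class CorrectionStore:
    """Tracks user corrections per merchant and applies them once confirmed."""

    def __init__(self, min_votes: int = 2):
        self.min_votes = min_votes
        self._votes = defaultdict(Counter)   # merchant key -> category counts

    def record(self, merchant_key: str, new_category: str) -> None:
        self._votes[merchant_key][new_category] += 1

    def category_for(self, merchant_key: str, default: str) -> str:
        votes = self._votes.get(merchant_key)
        if not votes:
            return default
        category, count = votes.most_common(1)[0]
        return category if count >= self.min_votes else default

store = CorrectionStore()
store.record("the star", "entertainment")
print(store.category_for("the star", "gambling"))  # gambling (one vote so far)
store.record("the star", "entertainment")
print(store.category_for("the star", "gambling"))  # entertainment
```

Because the store is a plain in-memory structure, it fits the on-device model described in the article: nothing here requires sending corrections to a server.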
Privacy: On-Device Categorisation
All transaction categorisation happens on your device:
- Transaction data: Never leaves your phone
- Merchant database: Stored locally, updated securely
- NLP models: Run on-device via Core ML
- Category history: Stored encrypted locally
Effectiveness Data
From categorisation accuracy testing:
| Metric | Result |
|---|---|
| Merchant Identification Accuracy | 94% |
| Category Classification Accuracy | 91% |
| Risk Assessment Accuracy | 87% |
| User Correction Rate | 6% of transactions |
| Learning Improvement (30 days) | +12% accuracy |
User Testimonials
"The categorisation is scary accurate. It knew a transaction was gambling-related even though the descriptor was cryptic." — Marcus, 28
"I love seeing my spending by category. Finally understand where my money goes." — Emma, 26
"When I went over my shopping budget, Whistl caught it immediately. Not weeks later like my old budgeting app." — Sarah, 34
Conclusion
Whistl's transaction categorisation AI transforms cryptic bank data into meaningful insights. By understanding not just what you spent, but what it means for your financial health, Whistl enables proactive protection.
This isn't just categorisation—it's intelligent spending analysis that protects you from yourself.
Get Smart Spending Analysis
Whistl's AI categorises transactions and detects risky patterns. Download free and understand your spending.