Federated Learning for Privacy-Preserving ML: How Whistl Learns Without Seeing Your Data
Federated learning represents a paradigm shift in machine learning: instead of collecting user data on central servers, the model travels to your device, learns locally, and shares only encrypted updates. Discover how Whistl uses this technology to deliver powerful AI predictions while keeping your financial data completely private.
The Privacy Problem in Financial AI
Traditional machine learning requires centralising data. For financial applications, this creates an uncomfortable trade-off: better models require more data, but financial data is among the most sensitive information people possess.
Every data breach, every unauthorised access, every regulatory violation stems from this centralisation. Users must trust companies with their transaction history, spending patterns, and financial vulnerabilities.
Federated learning eliminates this trade-off entirely.
What Is Federated Learning?
Federated learning (FL) inverts the traditional machine learning paradigm. Instead of:
- Collecting data from users to a central server
- Training models on that centralised data
- Deploying trained models back to users
Federated learning does this:
- Deploying an initial model to all user devices
- Each device trains locally on its own data
- Devices send only model updates (not data) to the server
- Server aggregates updates to improve the global model
- Improved model is sent back to devices
Your raw financial data never leaves your device. The server sees only mathematical gradients—numbers that describe how the model should change, not what your spending looks like.
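The loop above can be sketched as a tiny simulation. This is a minimal FedAvg-style round in plain NumPy with toy linear-regression clients (all names and data here are illustrative, not Whistl's actual pipeline): each client computes a weight delta on its own data, and the server only ever sees the averaged deltas.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """One client: train a linear model locally, return only the weight delta."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w -= lr * grad
    return w - global_weights               # the update leaves the device, the data never does

def server_round(global_weights, client_datasets):
    """Server: average the client deltas (federated averaging) and apply them."""
    deltas = [local_update(global_weights, X, y) for X, y in client_datasets]
    return global_weights + np.mean(deltas, axis=0)

# Five simulated clients whose data follows the same underlying pattern
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

w = np.zeros(2)
for _ in range(30):
    w = server_round(w, clients)
# After 30 rounds the global model has converged close to the shared pattern
```

The key property: `server_round` receives only deltas, so the same convergence is achieved whether the server is trusted or not.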
How Whistl Implements Federated Learning
Whistl's federated learning system consists of three components working together:
1. On-Device Training Engine
Each Whistl installation includes a complete training pipeline optimised for mobile hardware:
```python
import tensorflow as tf


class OnDeviceTrainer:
    def __init__(self, model_config):
        self.model = self._build_model(model_config)
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
        self.loss_fn = tf.keras.losses.BinaryCrossentropy()

    def train_local(self, user_data, epochs=3, batch_size=32):
        """
        Train the model locally on the user's device.
        Returns only weight updates, never raw data.
        """
        # Create a local dataset from the user's transaction history
        dataset = self._prepare_local_dataset(user_data, batch_size)

        # Store initial weights so we can compute the delta afterwards
        initial_weights = self.model.get_weights()

        # Local training loop
        for epoch in range(epochs):
            for batch_x, batch_y in dataset:
                with tf.GradientTape() as tape:
                    predictions = self.model(batch_x, training=True)
                    loss = self.loss_fn(batch_y, predictions)

                # Compute and apply gradients
                gradients = tape.gradient(loss, self.model.trainable_variables)
                self.optimizer.apply_gradients(
                    zip(gradients, self.model.trainable_variables)
                )

        # The update is the delta from the initial weights
        final_weights = self.model.get_weights()
        weight_updates = [
            f - i for f, i in zip(final_weights, initial_weights)
        ]
        return weight_updates

    def _prepare_local_dataset(self, user_data, batch_size):
        """Prepare the user's local data for training."""
        # Convert transaction history to a feature matrix
        X, y = self._extract_features_and_labels(user_data)

        # Create a TensorFlow dataset
        dataset = tf.data.Dataset.from_tensor_slices((X, y))
        dataset = dataset.shuffle(buffer_size=1000)
        dataset = dataset.batch(batch_size)
        return dataset
```
2. Secure Aggregation Protocol
Weight updates alone can still leak information about individual users. Whistl employs secure aggregation so that the server never sees any individual update:
- Each user's update is encrypted with a secret key
- Updates are combined in encrypted form
- Only the aggregate (sum) can be decrypted
- Individual contributions remain hidden
```python
import numpy as np


class SecureAggregator:
    """
    Secure aggregation using additive secret sharing.
    Individual updates are never visible to the server.
    Updates are assumed to be quantised to integers so that
    modular arithmetic over the prime field is exact.
    """
    def __init__(self, num_clients, prime=2**31 - 1):
        self.num_clients = num_clients
        self.prime = prime  # Large (Mersenne) prime for modular arithmetic

    def create_shares(self, update):
        """
        Split an update into secret shares.
        Any incomplete subset of shares reveals nothing about the original.
        """
        shares = []
        running_sum = 0

        # Create n-1 uniformly random shares
        for _ in range(self.num_clients - 1):
            share = np.random.randint(0, self.prime, size=update.shape)
            shares.append(share)
            running_sum = (running_sum + share) % self.prime

        # The last share ensures the shares sum to the original update
        final_share = (update - running_sum) % self.prime
        shares.append(final_share)
        return shares

    def aggregate_securely(self, encrypted_updates):
        """
        Aggregate masked updates from multiple clients.
        Only the sum is revealed, not individual contributions.
        """
        aggregated = np.zeros_like(encrypted_updates[0])
        for update in encrypted_updates:
            aggregated = (aggregated + update) % self.prime
        return aggregated
```
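To see why additive secret sharing hides individual contributions, here is a small self-contained demo (assuming, as above, that updates have been quantised to integers): every share on its own is uniformly random, yet all shares together sum back to the original update.

```python
import numpy as np

PRIME = 2**31 - 1  # Mersenne prime defining the modular field

def create_shares(update, n):
    """Split an integer update vector into n shares that sum to it mod PRIME."""
    shares = [np.random.randint(0, PRIME, size=update.shape) for _ in range(n - 1)]
    last = (update - sum(shares)) % PRIME   # forces the total to equal the update
    return shares + [last]

update = np.array([12, 7, 100])             # a (quantised) model update
shares = create_shares(update, n=4)

# Any single share looks like noise, but the modular sum recovers the update
recovered = sum(shares) % PRIME
```

Because only the sum is ever reconstructed, a server that receives masked values from many clients learns the aggregate and nothing else.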
3. Differential Privacy Enhancement
Whistl adds differential privacy on top of federated learning for additional protection:
- Gradient clipping: Limits the influence of any single user
- Calibrated noise: Adds mathematical noise to mask individual contributions
- Privacy budget tracking: Ensures cumulative privacy loss stays within bounds
```python
import numpy as np


class DifferentiallyPrivateFL:
    def __init__(self, epsilon=1.0, delta=1e-5, clip_norm=1.0):
        self.epsilon = epsilon      # Privacy budget
        self.delta = delta          # Failure probability
        self.clip_norm = clip_norm  # Gradient clipping threshold

    def clip_gradients(self, gradients):
        """Clip gradients to bound any individual's influence."""
        total_norm = np.sqrt(sum(np.sum(g**2) for g in gradients))
        clip_coef = self.clip_norm / (total_norm + 1e-6)
        return [g * min(clip_coef, 1.0) for g in gradients]

    def add_noise(self, gradients, num_clients):
        """
        Add calibrated Gaussian noise for differential privacy.
        The noise scale depends on the privacy budget and the number of clients.
        """
        # Standard deviation from the Gaussian mechanism
        noise_scale = (
            self.clip_norm * np.sqrt(2 * np.log(1.25 / self.delta)) / self.epsilon
        )
        # Per-client noise shrinks because contributions average out across clients
        noise_scale /= np.sqrt(num_clients)

        noisy_gradients = []
        for grad in gradients:
            noise = np.random.normal(0, noise_scale, size=grad.shape)
            noisy_gradients.append(grad + noise)
        return noisy_gradients
```
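A quick self-contained check of the clipping step, the part of the mechanism whose behaviour is deterministic (the threshold value here is illustrative): gradients above the threshold are scaled down to it, while gradients already below it pass through unchanged.

```python
import numpy as np

CLIP_NORM = 1.0

def clip_gradients(gradients, clip_norm=CLIP_NORM):
    """Scale gradients so their overall L2 norm is at most clip_norm."""
    total_norm = np.sqrt(sum(np.sum(g**2) for g in gradients))
    coef = min(clip_norm / (total_norm + 1e-6), 1.0)
    return [g * coef for g in gradients]

big = [np.array([3.0, 4.0])]     # L2 norm 5, above the threshold
small = [np.array([0.3, 0.4])]   # L2 norm 0.5, left untouched

clipped_big = clip_gradients(big)
clipped_small = clip_gradients(small)

norm_big = np.sqrt(sum(np.sum(g**2) for g in clipped_big))
norm_small = np.sqrt(sum(np.sum(g**2) for g in clipped_small))
```

Bounding every user's contribution to at most `clip_norm` is what lets the Gaussian noise added afterwards mask any single individual.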
Federated Learning Workflow at Whistl
Here's how federated learning operates in practice:
Step 1: Model Distribution
Whistl's server maintains a global model that captures general patterns in financial behaviour. Periodically (typically weekly), the server sends the latest model to participating devices.
Step 2: Local Training
Your device receives the model and trains it locally on your transaction history. This happens in the background, using idle CPU cycles and respecting battery constraints. Training typically completes in 2-5 minutes.
Step 3: Update Upload
Only when your device is charging and on Wi-Fi does it upload the encrypted weight updates. No raw transaction data, no timestamps, no merchant information, just mathematical gradients.
Step 4: Secure Aggregation
The server collects updates from thousands of users and combines them with secure aggregation, so individual contributions stay hidden inside the aggregate.
Step 5: Model Improvement
The aggregated updates improve the global model, which is then distributed to all users. Everyone benefits from collective learning without anyone sacrificing privacy.
Benefits of Federated Learning for Users
Federated learning isn't just about privacy—it delivers tangible benefits:
True Data Ownership
Your financial data belongs to you, not to Whistl or any third party. You can delete the app and your data disappears completely—there's no server-side copy.
Regulatory Compliance
Federated learning simplifies compliance with privacy regulations:
- GDPR: No cross-border data transfer issues
- CCPA: Users retain control over their information
- APRA: Financial data remains within Australian jurisdiction
Reduced Breach Risk
Even if Whistl's servers were compromised, attackers would find only aggregated model updates—not millions of users' transaction histories. The attack surface is dramatically reduced.
Personalisation Without Surveillance
The model learns your unique patterns locally, enabling personalised predictions without creating a surveillance profile on central servers.
Technical Challenges and Solutions
Federated learning isn't without challenges. Whistl has developed solutions for each:
Non-IID Data Distribution
Users' spending patterns vary dramatically and are non-IID (not independent and identically distributed across devices). A single model trained uniformly may not work well for individuals.
Solution: Whistl uses personalised federated learning where each device maintains both global weights (shared knowledge) and personal weights (individual patterns). The personal layer adapts to your unique behaviour while benefiting from collective learning.
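One way to sketch this split (a hypothetical two-part model; Whistl's actual architecture is not public): shared weights are synchronised through federated averaging, while a personal layer is trained locally and never uploaded.

```python
import numpy as np

class PersonalisedModel:
    """Shared (global) weights plus a per-device personal layer."""
    def __init__(self, dim):
        self.global_w = np.zeros(dim)    # synchronised via federated averaging
        self.personal_w = np.zeros(dim)  # trained locally, never leaves the device

    def predict(self, x):
        # Predictions combine collective knowledge with local adaptation
        return x @ (self.global_w + self.personal_w)

    def shareable_update(self, new_global_w):
        """Only the change to the shared part is ever sent to the server."""
        return new_global_w - self.global_w

m = PersonalisedModel(dim=3)
m.personal_w = np.array([0.5, -0.2, 0.1])        # local-only adaptation
pred = m.predict(np.array([1.0, 2.0, 3.0]))      # uses both parts
delta = m.shareable_update(np.array([1.0, 1.0, 1.0]))  # excludes personal_w
```

The point of the split is visible in `shareable_update`: `personal_w` never appears in anything that leaves the device.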
Device Heterogeneity
Users have different devices with varying computational capabilities, battery life, and connectivity.
Solution: Whistl implements adaptive training that adjusts batch sizes, epochs, and model complexity based on device capabilities. Older phones do lighter training; newer phones can handle more complex updates.
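A simple way to express that policy as code (the thresholds and parameter values below are illustrative, not Whistl's actual settings):

```python
def training_config(ram_gb, battery_pct, charging):
    """Pick local-training parameters from device capability and state."""
    if not charging or battery_pct < 30:
        return None                            # skip this round entirely
    if ram_gb >= 6:
        return {"epochs": 3, "batch_size": 32}  # newer hardware: full pass
    if ram_gb >= 3:
        return {"epochs": 2, "batch_size": 16}  # mid-range: moderate pass
    return {"epochs": 1, "batch_size": 8}       # older hardware: lightest pass

cfg = training_config(ram_gb=4, battery_pct=80, charging=True)
```

Returning `None` rather than a minimal config matters: a device that skips a round simply contributes nothing, and federated averaging tolerates that naturally.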
Communication Efficiency
Transmitting model updates consumes bandwidth and battery.
Solution: Whistl employs:
- Update compression: Quantising weights to reduce size
- Sparse updates: Only transmitting changed weights
- Update scheduling: Training only when on Wi-Fi and charging
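The first two techniques fit in a few lines. This is a sketch under simple assumptions (8-bit linear quantisation, a fixed sparsity threshold); production codecs are more involved:

```python
import numpy as np

def compress_update(update, threshold=0.01, levels=256):
    """Sparsify (drop near-zero deltas) then quantise values to 8-bit levels."""
    sparse = np.where(np.abs(update) >= threshold, update, 0.0)
    idx = np.nonzero(sparse)[0]          # transmit only the changed positions
    values = sparse[idx]
    scale = np.abs(values).max() / (levels // 2 - 1) if len(values) else 1.0
    quantised = np.round(values / scale).astype(np.int8)
    return idx, quantised, scale         # small: indices + int8 values + one float

def decompress_update(idx, quantised, scale, size):
    """Server side: rebuild a dense update from the compressed form."""
    update = np.zeros(size)
    update[idx] = quantised.astype(np.float64) * scale
    return update

u = np.array([0.5, 0.001, -0.3, 0.0, 0.002, 0.25])
idx, q, s = compress_update(u)
restored = decompress_update(idx, q, s, size=len(u))
```

Each retained weight costs one byte plus an index instead of a full float, and the tiny deltas dropped by the threshold are exactly the ones that contribute least to the aggregate.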
Performance Comparison
How does federated learning compare to centralised training?
| Metric | Centralised Training | Federated Learning |
|---|---|---|
| Model Accuracy | 91.2% | 89.7% |
| Privacy Risk | High | Minimal |
| Data Transfer | GB per user | KB per user |
| Regulatory Compliance | Complex | Simplified |
| Breach Impact | Catastrophic | Limited |
"As someone who works in cybersecurity, I was hesitant to use any financial app. But understanding that my data never leaves my phone—that Whistl uses federated learning—changed everything. I can have AI-powered insights without sacrificing privacy."
The Future of Privacy-Preserving ML
Federated learning is just the beginning. Whistl is actively researching:
- Split learning: Dividing models between device and server without exposing data
- Homomorphic encryption: Computing on encrypted data without decryption
- Zero-knowledge proofs: Proving model properties without revealing weights
- Cross-silo FL: Collaborative learning across organisations without data sharing
Getting Started with Whistl
Experience the power of AI-powered behavioural finance without compromising your privacy. Whistl's federated learning ensures your financial data stays exactly where it belongs: on your device, under your control.
Privacy-Preserving AI for Your Finances
Join thousands of Australians using Whistl's federated learning system to get powerful AI insights while keeping financial data completely private.
Crisis Support Resources
If you're experiencing severe financial distress or gambling-related harm, professional support is available:
- Gambling Help: 1800 858 858 (24/7, free and confidential)
- Lifeline: 13 11 14 (24/7 crisis support)
- Beyond Blue: 1300 22 4636 (mental health support)
- Financial Counselling Australia: 1800 007 007