On-Device ML: Core ML vs TensorFlow Lite Implementation
Whistl runs all AI processing on your device—never in the cloud. This technical deep dive compares Core ML (iOS) and TensorFlow Lite (Android), explaining model conversion, hardware acceleration, performance optimisation, and why on-device processing is essential for financial privacy.
Why On-Device Machine Learning?
Cloud-based AI requires sending sensitive data to remote servers. For a financial behaviour app, this creates unacceptable risks:
- Privacy exposure: Transaction data, location, biometrics leave your device
- Latency: Network round-trip adds 100-500ms delay
- Offline failure: No connectivity = no protection
- Cost: Cloud inference at scale is expensive
On-device ML solves all four problems while enabling real-time intervention.
Core ML (iOS Implementation)
Apple's Core ML framework provides native machine learning support for iOS, iPadOS, and macOS.
Model Format and Conversion
Whistl's neural network is trained in PyTorch, then converted to Core ML format:
# PyTorch to Core ML conversion
import torch
import coremltools as ct

# Load trained PyTorch model
torch_model = torch.load('whistl_impulse_predictor.pt')
torch_model.eval()

# Create example input (56 features)
example_input = torch.randn(1, 56)

# Trace and convert
traced_model = torch.jit.trace(torch_model, example_input)
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape, name='features')],
    convert_to='mlprogram'  # ML Program (MIL) backend for best performance
)

# Save Core ML model
mlmodel.save('WhistlImpulsePredictor.mlpackage')
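Before shipping, it's worth sanity-checking the converted package. A minimal sketch of that check (coremltools can only run predictions on macOS, where the Core ML runtime is available):

```python
import numpy as np
import coremltools as ct

# Quick sanity check of the converted package: one forward pass
mlmodel = ct.models.MLModel('WhistlImpulsePredictor.mlpackage')

features = np.random.randn(1, 56).astype(np.float32)
out = mlmodel.predict({'features': features})
print(out)  # expect a single probability in [0, 1]
```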
Model Architecture in Core ML
Model: WhistlImpulsePredictor
Input: 56 Float32 features
Output: 1 Float32 probability

Layer Configuration:
├── Input (56)
├── Dense(56→128) + ReLU
├── Dense(128→64) + ReLU
├── Dense(64→32) + ReLU
├── Dense(32→1) + Sigmoid
└── Output (1)

Model Size: 450KB (compressed)
Quantisation: Float16 (optional Int8 for smaller size)
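For reference, a minimal PyTorch sketch of this architecture; the layer shapes come from the specification above, while the class body itself is illustrative:

```python
import torch.nn as nn

# Feedforward 56 -> 128 -> 64 -> 32 -> 1, matching the layer table above
class WhistlImpulsePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(56, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # impulse probability in [0, 1]
        )

    def forward(self, x):
        return self.net(x)
```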
Hardware Acceleration
Core ML automatically routes inference to the optimal hardware:
| Device | Neural Engine | GPU | CPU |
|---|---|---|---|
| iPhone 12+ | 16-core (primary) | Fallback | Fallback |
| iPhone 11 | 8-core (primary) | Fallback | Fallback |
| iPhone X/XS | Not available | Primary | Fallback |
| iPad Pro | 16-core (primary) | Fallback | Fallback |
Neural Engine delivers 15x faster inference than CPU with 1/10th the power consumption.
Inference Code (Swift)
import CoreML

class ImpulsePredictor {
    private let model: WhistlImpulsePredictor

    init() {
        let config = MLModelConfiguration()
        config.computeUnits = .all  // Use Neural Engine + GPU + CPU
        // Force-try for brevity; use do/catch in production code
        self.model = try! WhistlImpulsePredictor(configuration: config)
    }

    func predict(features: [Float]) -> Float {
        // Convert array to MLMultiArray
        let multiArray = try! MLMultiArray(shape: [56], dataType: .float32)
        for (index, value) in features.enumerated() {
            multiArray[index] = NSNumber(value: value)
        }
        // Run inference
        let output = try! model.prediction(features: multiArray)
        return output.probability
    }
}

// Usage
let predictor = ImpulsePredictor()
let risk = predictor.predict(features: userFeatures)
if risk > 0.6 {
    activateIntervention()
}
Performance Benchmarks (iOS)
| Device | Neural Engine | Inference Time | Power Draw |
|---|---|---|---|
| iPhone 15 Pro | 16-core | 3.2ms | 12mW |
| iPhone 14 | 16-core | 4.1ms | 15mW |
| iPhone 13 | 16-core | 5.8ms | 18mW |
| iPhone 12 | 8-core | 8.4ms | 22mW |
| iPhone 11 | 8-core | 12.1ms | 28mW |
All devices achieve real-time inference (<50ms) with negligible battery impact.
TensorFlow Lite (Android Implementation)
Google's TensorFlow Lite provides on-device ML for Android and other platforms.
Model Format and Conversion
PyTorch models are converted to TFLite format via ONNX:
# PyTorch to TFLite conversion (via ONNX)
import torch
import onnx
import tensorflow as tf
from onnx_tf.backend import prepare

# Export PyTorch to ONNX
torch_model = torch.load('whistl_impulse_predictor.pt')
torch_model.eval()
dummy_input = torch.randn(1, 56)
torch.onnx.export(
    torch_model,
    dummy_input,
    'whistl_model.onnx',
    input_names=['features'],
    output_names=['probability'],
    opset_version=13
)

# Convert ONNX to a TensorFlow SavedModel via the onnx-tf bridge
# (TFLiteConverter has no direct ONNX entry point)
tf_rep = prepare(onnx.load('whistl_model.onnx'))
tf_rep.export_graph('whistl_saved_model')

# Convert the SavedModel to TFLite with Float16 quantisation
converter = tf.lite.TFLiteConverter.from_saved_model('whistl_saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Save TFLite model
with open('whistl_impulse_predictor.tflite', 'wb') as f:
    f.write(tflite_model)
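As with Core ML, a quick parity check catches conversion errors early. A minimal sketch using the TFLite Python interpreter:

```python
import numpy as np
import tensorflow as tf

# Run one inference through the converted model and confirm
# the output is a single probability
interpreter = tf.lite.Interpreter(model_path='whistl_impulse_predictor.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

features = np.random.randn(1, 56).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], features)
interpreter.invoke()

probability = interpreter.get_tensor(output_details[0]['index'])
print(probability.shape, float(probability[0][0]))  # expect (1, 1), value in [0, 1]
```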
Model Quantisation
TFLite supports aggressive quantisation for smaller model size:
| Quantisation Type | Model Size | Accuracy Loss | Speed Gain |
|---|---|---|---|
| Float32 (full precision) | 900KB | 0% | 1.0x |
| Float16 (half precision) | 450KB | <0.1% | 1.5x |
| Int8 (full integer) | 225KB | 0.3-0.5% | 2.5x |
Whistl uses Float16 quantisation—50% size reduction with negligible accuracy impact.
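For teams that need the smaller Int8 variant, the sketch below shows the standard full-integer conversion path; the representative-dataset generator here is illustrative, since real calibration requires a sample of actual feature vectors:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Illustrative calibration data; in practice, yield real feature vectors
    for _ in range(100):
        yield [np.random.randn(1, 56).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('whistl_saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer ops (conversion fails if an op cannot be quantised)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

int8_model = converter.convert()
```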
Hardware Acceleration (Android)
TFLite delegates inference to available hardware accelerators:
// TFLite Interpreter with delegates
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.nnapi.NnApiDelegate

val interpreterOptions = Interpreter.Options()

// Try GPU delegate first (fastest for most devices)
try {
    val gpuDelegate = GpuDelegate()
    interpreterOptions.addDelegate(gpuDelegate)
} catch (e: Exception) {
    // GPU not available; fall through
}

// Try NNAPI delegate (Android Neural Networks API)
try {
    val nnapiDelegate = NnApiDelegate()
    interpreterOptions.addDelegate(nnapiDelegate)
} catch (e: Exception) {
    // NNAPI not available; fall through
}

// Fallback to CPU (always available)
val interpreter = Interpreter(modelBuffer, interpreterOptions)
Inference Code (Kotlin)
import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

class ImpulsePredictor(context: Context) {
    private var interpreter: Interpreter

    init {
        val model = FileUtil.loadMappedFile(context, "whistl_impulse_predictor.tflite")
        val options = Interpreter.Options()
        options.setNumThreads(4)     // Use 4 CPU threads
        options.setUseXNNPACK(true)  // Enable XNNPACK delegate
        interpreter = Interpreter(model, options)
    }

    fun predict(features: FloatArray): Float {
        val input = arrayOf(features)            // shape [1][56]
        val output = Array(1) { FloatArray(1) }  // shape [1][1]
        interpreter.run(input, output)
        return output[0][0]
    }

    fun close() {
        interpreter.close()
    }
}
Performance Benchmarks (Android)
| Device | Accelerator | Inference Time | Power Draw |
|---|---|---|---|
| Pixel 8 Pro | Tensor G3 TPU | 4.5ms | 14mW |
| Samsung S24 | Snapdragon 8 Gen 3 | 5.2ms | 16mW |
| Pixel 7 | Tensor G2 TPU | 6.8ms | 19mW |
| OnePlus 11 | Snapdragon 8 Gen 2 | 7.4ms | 21mW |
| Pixel 6 | Tensor G1 TPU | 9.1ms | 24mW |
Core ML vs TensorFlow Lite: Comparison
| Feature | Core ML (iOS) | TensorFlow Lite (Android) |
|---|---|---|
| Model Format | .mlpackage | .tflite |
| Hardware Acceleration | Neural Engine (dedicated) | GPU/NNAPI/TPU (varies) |
| Conversion Complexity | Moderate (direct PyTorch) | Higher (via ONNX) |
| Model Size (Float16) | 450KB | 450KB |
| Avg Inference Time | 5.8ms | 6.6ms |
| Power Efficiency | Excellent (dedicated NPU) | Good (shared GPU/TPU) |
| Offline Support | Full | Full |
| Privacy | On-device only | On-device only |
| Dynamic Updates | App Store required | Play Store or OTA |
| Debugging Tools | Xcode Core ML debugger | TFLite Model Explorer |
Model Update Strategy
Whistl updates ML models through different mechanisms for each platform:
iOS: App Store Updates
- Process: New model bundled with app update
- Frequency: Monthly model improvements
- Advantage: Guaranteed model integrity
- Disadvantage: Requires full app download
Android: OTA Model Downloads
- Process: Play Feature Delivery or custom CDN
- Frequency: Weekly model improvements
- Advantage: Smaller downloads, faster iteration
- Disadvantage: Requires network connectivity
Model Versioning
{
"model_version": "2026.03.01",
"architecture": "feedforward_56_128_64_32_1",
"quantisation": "float16",
"training_date": "2026-02-28",
"accuracy": 0.842,
"min_ios_version": "15.0",
"min_android_api": 26,
"changelog": [
"Improved payday proximity detection",
"Enhanced HRV feature weighting",
"Reduced false positives for shopping"
]
}
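A release pipeline can validate this manifest before publishing a model. The sketch below is illustrative: the field names come from the example manifest above, while the checks themselves are assumptions about what a sensible pipeline would verify.

```python
import json

# Schema fields taken from the example manifest above
REQUIRED_FIELDS = {"model_version", "architecture", "quantisation",
                   "training_date", "accuracy", "min_ios_version",
                   "min_android_api", "changelog"}

def validate_manifest(path: str) -> dict:
    """Reject a model manifest with missing or implausible fields."""
    with open(path) as f:
        manifest = json.load(f)
    missing = REQUIRED_FIELDS - manifest.keys()
    if missing:
        raise ValueError(f"manifest missing fields: {missing}")
    if not 0.0 <= manifest["accuracy"] <= 1.0:
        raise ValueError("accuracy must be a probability")
    return manifest
```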
Federated Learning for Privacy
Whistl uses federated learning to improve models without collecting raw data:
Federated Learning Workflow
- Local training: Each device trains on personal data overnight
- Gradient computation: Calculate weight updates (not raw data)
- Differential privacy: Add calibrated noise to gradients (see the sketch after this list)
- Secure upload: Encrypted gradient transmission to server
- Aggregation: Server averages gradients from thousands of devices
- Global update: Improved model distributed to all users
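A minimal sketch of the gradient-computation and differential-privacy steps, assuming the standard clip-then-add-Gaussian-noise recipe from DP-SGD; the clip norm and noise multiplier shown are illustrative, not Whistl's production parameters:

```python
import numpy as np

def privatise_gradients(grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip the per-device update and add Gaussian noise (DP-SGD style)."""
    flat = np.concatenate([g.ravel() for g in grads])
    # Scale the whole update so its L2 norm is at most clip_norm
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
    noisy = []
    for g in grads:
        clipped = g * scale
        noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=g.shape)
        noisy.append(clipped + noise)
    return noisy  # safe to upload: raw data never leaves the device
```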
Privacy Guarantees
- Raw data never leaves device: Only gradient updates transmitted
- Differential privacy: ε=0.1 privacy budget per update
- Secure aggregation: Server sees only aggregated updates
- Device-level encryption: TLS 1.3 for all communications
Battery Optimisation
On-device ML must be power-efficient. Whistl implements several optimisations:
Batch Processing
Instead of continuous inference, Whistl batches predictions:
- Normal mode: Predict every 5 minutes
- Elevated risk: Predict every 1 minute
- High risk: Predict every 30 seconds
- Sleep mode: Predict every 30 minutes (when the device is stationary overnight)
Adaptive Frequency
func calculateInferenceInterval(riskScore: Float) -> TimeInterval {
    switch riskScore {
    case 0.0..<0.4:
        return 300  // 5 minutes
    case 0.4..<0.6:
        return 60   // 1 minute
    case 0.6..<0.8:
        return 30   // 30 seconds
    default:
        return 10   // 10 seconds (critical)
    }
}
Battery Impact
| Usage Pattern | Daily Battery Impact |
|---|---|
| Normal (low risk) | 2-3% |
| Elevated (moderate risk) | 4-5% |
| High risk (frequent intervention) | 6-8% |
| Continuous monitoring (debug mode) | 15-20% |
Debugging and Monitoring
Production ML requires robust debugging and monitoring:
Model Performance Tracking
- Prediction latency: Log inference time for each prediction
- Output distribution: Track risk score histogram
- Accuracy validation: Compare predictions to actual outcomes
- Drift detection: Alert if the prediction distribution shifts (see the sketch below)
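One common way to implement the drift check is a population stability index (PSI) over binned risk scores. This is a generic sketch rather than Whistl's confirmed method; the 0.2 alert threshold is a conventional rule of thumb, and the sample data is illustrative:

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """PSI between baseline and current risk-score distributions (values in [0, 1])."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)
    # Convert counts to proportions, avoiding zero bins
    expected = np.clip(expected / expected.sum(), 1e-6, None)
    actual = np.clip(actual / actual.sum(), 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

baseline_scores = np.random.beta(2, 5, 10_000)  # illustrative stored baseline
recent_scores = np.random.beta(2, 3, 1_000)     # illustrative recent window

# PSI > 0.2 is a common rule-of-thumb threshold for significant drift
if population_stability_index(baseline_scores, recent_scores) > 0.2:
    print("Drift alert: risk-score distribution has shifted")
```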
Debug Tools
- Xcode Core ML Debugger: Visualise layer activations (iOS)
- TFLite Model Explorer: Inspect model graph (Android)
- Custom logging: Feature importance per prediction
Security Considerations
On-device models must be protected from tampering:
Model Integrity
- Code signing: Models signed with Whistl private key
- Hash verification: SHA-256 checksum validated before loading (see the sketch after this list)
- Runtime attestation: Verify model hasn't been modified
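A minimal sketch of the checksum step; the expected digest is a placeholder, since in practice it would come from the signed manifest:

```python
import hashlib

def verify_model(path: str, expected_sha256: str) -> bool:
    """Compare a model file's SHA-256 digest against the published checksum."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest() == expected_sha256

expected = '<sha256 digest from the signed manifest>'  # placeholder
if not verify_model('whistl_impulse_predictor.tflite', expected):
    raise RuntimeError('Model checksum mismatch: refusing to load')
```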
Reverse Engineering Protection
- Model encryption: Weights encrypted at rest (illustrated after this list)
- Obfuscation: Layer names and structure obfuscated
- Jailbreak detection: Disable ML on compromised devices
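To make the encrypt-at-rest idea concrete, here is a minimal sketch using Fernet (AES-128-CBC with HMAC) from the Python cryptography library. Whistl's actual cipher and key management are not specified; in production the key would live in the platform keystore, not alongside the model.

```python
from cryptography.fernet import Fernet

# Illustrative encrypt-at-rest scheme; key management shown is NOT production-grade
key = Fernet.generate_key()  # in production, store in the platform keystore
cipher = Fernet(key)

with open('whistl_impulse_predictor.tflite', 'rb') as src:
    encrypted = cipher.encrypt(src.read())
with open('whistl_impulse_predictor.tflite.enc', 'wb') as dst:
    dst.write(encrypted)

# Decrypt into memory just before handing the bytes to the interpreter
model_bytes = cipher.decrypt(encrypted)
```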
Conclusion
On-device machine learning is essential for privacy-first financial apps. Core ML and TensorFlow Lite both provide excellent frameworks for running neural networks locally—with dedicated hardware acceleration delivering sub-10ms inference and minimal battery impact.
Whistl's implementation demonstrates that sophisticated AI doesn't require cloud processing. Your data stays on your device, predictions happen in real-time, and protection works even offline.
Experience Privacy-First AI
Whistl's on-device neural networks predict impulses without sending your data to the cloud. Download free and experience private AI.
Download Whistl Free

Related: Neural Networks Explained | AI Financial Coach | Local Storage Encryption