Engineering Case Study: Project Tepati
"Technology is best when it brings people together." — Matt Mullenweg
Context
I built the entire backend system for this project solo ("single-fighter"), from scratch — architecture, APIs, message brokers, database, sync engine, and deployment — against a 1-month MVP target. There were no other backend engineers; the frontend (React Native) was handled by a separate team.
Project Tepati is a micro-finance process digitalization system for one of the largest Islamic banks in Indonesia. This system replaces paper-based manual processes with an end-to-end digital application used by 5,000+ Community Officers across Indonesia.
This article dissects the most challenging features from both engineering and business perspectives.
🏔️ 1. The Challenge
Field officers (Community Officers) work in remote areas — from the corners of Sumatra to villages in Nusa Tenggara. Internet connection is a luxury, not a certainty.
Conditions Before Tepati
Before Tepati was built, the entire customer acquisition process was done using Microsoft Power Apps. Field officers filled out data via Power Apps forms, sent them via Teams, and waited for manual approval. This process was extremely slow because:
- ❌ No reliable offline capability — Power Apps requires a stable internet connection for data synchronization.
- 🗃️ Data scattered across various files and chat threads — hard to track and prone to loss.
- ⏳ Approval bottleneck — supervisors had to open attachments one by one in Teams.
- 🙈 No real-time visibility — management could not monitor acquisition progress live.
Engineering Challenges
- 🛡️ Data Integrity: Customer financial data (income, expense, cashflow) must not be lost when signals are intermittent.
- 🧩 Complex State: Financing application forms have 200+ fields with complex cross-field validation logic.
- 🔄 Conflict Resolution: Data changed on device A (while offline) and device B (while online) simultaneously must be resolved without data loss.
- 🏢 Multi-Level Approval: Each application must pass 6 approval levels (CO → BM → SBM → BC → DH1 → DH2) whose approvers are geographically dispersed.
- ⚖️ Regulatory Compliance: All transactions must comply with OJK regulations and Sharia principles.
- ⏱️ Solo Backend Engineer, Tight Deadline: I had to deliver the entire backend myself in 1 month for the MVP — with no other backend engineers.
Business Impact
Before Tepati, 23% of applications failed due to lost or incomplete data. Every failure meant officers had to return to the customer — requiring on average 2-3 additional days and impacting customer trust.
⚔️ 2. Single-Fighter Backend
Many ask: how can one person build an entire backend system of this size in a month?
The answer: ruthless prioritization + technology leverage.
- 🏗️ Week 1: Infrastructure setup (Golang services, Kafka, AMQ, database schema, CI/CD pipeline).
- ⚙️ Week 2: Core sync engine backend + API endpoints for dynamic form engine. This was 80% of backend complexity.
- 🚀 Week 3: Approval pipeline (Kafka consumers + AMQ), IIR simulation API, document upload service.
- 🧪 Week 4: Integration testing with frontend team, bug fixing, deployment to staging & production.
The key was that I was already deeply familiar with Golang and the message-broker ecosystem (Kafka + AMQ + MQTT), so there was no learning curve slowing me down. Every backend architecture decision I made was aimed at maximizing velocity without sacrificing reliability. The frontend team simply consumed the API contracts I provided.
🛠️ 3. Technology Stack
Backend
- 🐹 Golang: High-concurrency microservices, capable of handling 15,000 concurrent connections with minimal memory footprint.
- 📨 Apache Kafka: Event streaming for data ingestion from the field — throughput 50,000 events/second at peak hours.
- 📬 AMQ (ActiveMQ): Message broker for approval workflow orchestration between levels.
- 🍃 MongoDB: Primary database for storing dynamic form data (JSON) from the field.
- 🗂️ Microsoft SQL Server: Relational database for the organizational structure and approval hierarchy.
- ⚡ Redis: Distributed caching and session management.
Frontend (Mobile)
- ⚛️ React Native: Cross-platform mobile app deployed to 5,000+ Android devices.
- 🗄️ SQLite: Local database — storing up to 10,000 records per device without performance degradation.
- 📡 MQTT (Mosquitto): Lightweight protocol for real-time notifications over intermittent links below 50 kbps.
🔍 4. Feature Deep Dives
Feature #1: Offline-First Sync Engine
Problem: Officers are often in areas with no signal for 4-6 hours.
Solution: I built an offline-first architecture in which the app always works against local storage and synchronizes only when a signal is available.
Key Engineering Decisions:
- 📦 Delta Sync Protocol: Send only the changes, not the entire payload, saving an average of 78% bandwidth.
- ⚔️ Conflict Resolution: Last-Write-Wins (LWW) for non-financial data, server-authoritative merge for critical financial data.
- 🔄 Retry with Exponential Backoff: Max 5 retries with jitter to avoid thundering herd.
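That retry policy can be captured in a small helper. Below is a minimal sketch, assuming a generic `op` callback that wraps one sync attempt; the production client-side code is not shown in this write-up.

```go
// Minimal retry-with-exponential-backoff sketch (illustrative only).
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// retryWithBackoff runs op up to maxRetries times, doubling the delay
// after each failure and adding random jitter so thousands of devices
// regaining signal together don't stampede the server (thundering herd).
func retryWithBackoff(maxRetries int, op func() error) error {
	base := 500 * time.Millisecond
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		backoff := base << attempt // 0.5s, 1s, 2s, 4s, 8s for 5 retries
		jitter := time.Duration(rand.Int63n(int64(backoff)/2 + 1))
		time.Sleep(backoff + jitter)
	}
	return fmt.Errorf("sync failed after %d retries: %w", maxRetries, lastErr)
}
```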
```go
// Simplified Sync Consumer (Golang)
func (s *SyncService) ProcessSyncBatch(ctx context.Context, batch []SyncEvent) error {
	for _, event := range batch {
		existing, err := s.repo.FindByDeviceID(ctx, event.DeviceID, event.RecordID)
		if err != nil {
			return fmt.Errorf("failed to find record: %w", err)
		}
		if existing != nil && existing.UpdatedAt.After(event.Timestamp) {
			// Server version is newer — skip, notify client
			s.notifier.SendConflict(ctx, event.DeviceID, existing)
			continue
		}
		if err := s.repo.Upsert(ctx, event.ToRecord()); err != nil {
			return fmt.Errorf("failed to upsert: %w", err)
		}
		s.kafka.Publish("sync.completed", event)
	}
	return nil
}
```
Results:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Data loss rate | 23% | 0% | ↓ 100% |
| Avg sync time | 45 seconds | 3.2 seconds | ↓ 93% |
| Bandwidth usage per sync | 2.4 MB | 520 KB | ↓ 78% |
| App crash rate (offline) | 12% | 0.3% | ↓ 97.5% |
Feature #2: Dynamic Form Engine
Problem: Financing application forms have 200+ fields that change based on customer business type, location, and branch policy. Every form change required an app update (re-deploy to 5,000 devices).
Solution: Server-Driven Form Engine — form definitions are downloaded as JSON schema from the server and rendered dynamically on the client.
```typescript
// Form Schema (simplified)
interface FormField {
  id: string;
  type: "text" | "number" | "select" | "photo" | "signature";
  label: string;
  rules: ValidationRule[];
  dependencies?: {
    field: string;
    condition: "equals" | "gt" | "lt";
    value: any;
    action: "show" | "hide" | "require";
  }[];
}
```
Cross-Field Validation Examples:
- 🌾 If `business_type === "agriculture"` → field `business_duration` minimum 2 years.
- 💰 If `income_busy_day < installment * 3` → auto-reject and show an IIR warning.
- 💍 If `marital_status === "married"` → uploading `marriage_cert_url` and `spouse_id_url` becomes mandatory.
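The server can re-check the same dependency rules before accepting a submission. Here is a minimal Go sketch, assuming a hypothetical `Dependency` type that mirrors the JSON schema above; the actual validator is not shown in this write-up.

```go
// Hypothetical server-side mirror of a dependency rule.
type Dependency struct {
	Field     string      `json:"field"`
	Condition string      `json:"condition"` // "equals" | "gt" | "lt"
	Value     interface{} `json:"value"`
	Action    string      `json:"action"` // "show" | "hide" | "require"
}

// appliesTo reports whether the rule's condition holds for the submitted
// answers (field ID -> raw value). Assumes comparable JSON scalars.
func (d Dependency) appliesTo(answers map[string]interface{}) bool {
	v, ok := answers[d.Field]
	if !ok {
		return false
	}
	switch d.Condition {
	case "equals":
		return v == d.Value
	case "gt", "lt":
		a, aok := toFloat(v)
		b, bok := toFloat(d.Value)
		if !aok || !bok {
			return false
		}
		if d.Condition == "gt" {
			return a > b
		}
		return a < b
	}
	return false
}

// toFloat normalizes the numeric types encoding/json produces.
func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case int:
		return float64(n), true
	}
	return 0, false
}
```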
Results:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Form update cycle | 2-3 weeks (app release) | Real-time | ↓ 100% |
| Incomplete submission | 34% | 4.2% | ↓ 88% |
| Avg form completion time | 45 minutes | 18 minutes | ↓ 60% |
Feature #3: Real-Time Multi-Level Approval Pipeline
Problem: Financing applications must pass 6 approval levels (CO → BM → SBM → BC → DH1 → DH2). Previously, approval was done via WhatsApp group — prone to human error and not auditable.
Solution: Event-driven approval pipeline using Kafka + AMQ that I designed myself.
Key Engineering Decisions (a consumer sketch follows this list):
- 🏎️ Each approval level is an independent Kafka consumer — if one level is down, the others keep running.
- 🔒 AMQ is used for DH-level approval because it requires guaranteed delivery with transaction support.
- 🔔 MQTT push notification to approver devices — response time dropped from hours to minutes.
- 📝 Complete audit trail: who approved, when, from which device, at what GPS coordinates.
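To make the fan-out concrete, here is a sketch of one approval level running as its own consumer group. This is illustrative only: it assumes the segmentio/kafka-go client and hypothetical topic names (`approval.BM`, etc.), neither of which is specified in this write-up.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// runApprovalConsumer processes one level independently; if another
// level's consumer is down, this one keeps draining its own topic.
func runApprovalConsumer(ctx context.Context, level string) {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"},
		GroupID: "approval-" + level, // each level = its own consumer group
		Topic:   "approval." + level, // hypothetical naming, e.g. approval.BM
	})
	defer r.Close()

	for {
		m, err := r.FetchMessage(ctx)
		if err != nil {
			return // context cancelled or fatal reader error
		}
		// ... evaluate the approval, record the audit trail
		// (who/when/device/GPS), forward to the next level's topic,
		// and push an MQTT notification to the approver ...
		if err := r.CommitMessages(ctx, m); err != nil {
			log.Printf("commit failed: %v", err)
		}
	}
}
```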
Results:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Approval cycle time | 5-7 days | < 24 hours | ↓ 85% |
| Bottleneck detection | Manual | Real-time dashboard | — |
| Approval fraud rate | ~2.1% | 0.08% | ↓ 96% |
Feature #4: IIR Simulation Engine
Problem: Officers had to calculate Installment to Income Ratio (IIR) manually. Calculation errors caused rejections at upper levels, wasting everyone's time.
Solution: Real-time IIR calculator running on-device (offline-capable), with tenor and margin tables synced from the server.
```go
// IIR Calculation Engine (Golang - mirrored in React Native)
type IIRSimulation struct {
	Tenor      int     `json:"tenor"`
	Angsuran   int64   `json:"angsuran"` // monthly installment (IDR)
	TotalIIR   float64 `json:"total_iir"`
	Margin     float64 `json:"margin"`
	IsEligible bool    `json:"is_eligible"`
}

func SimulateIIR(totalIncome, existingInstallment, requestedAmount int64, tenors []int) []IIRSimulation {
	var results []IIRSimulation
	for _, tenor := range tenors {
		angsuran := calculateAngsuran(requestedAmount, tenor, getMarginRate(tenor))
		totalIIR := float64(angsuran+existingInstallment) / float64(totalIncome) * 100
		results = append(results, IIRSimulation{
			Tenor:      tenor,
			Angsuran:   angsuran,
			TotalIIR:   totalIIR,
			Margin:     getMarginRate(tenor),
			IsEligible: totalIIR <= 30.0, // OJK threshold
		})
	}
	return results
}
```
Results:
| Metric | Before | After | Improvement |
|---|---|---|---|
| IIR calculation error | 18% | 0.1% (rounding only) | ↓ 99.4% |
| Rejection at upper level (due to IIR) | 22% | 3% | ↓ 86% |
| Time to simulate | 5-10 minutes (manual) | < 1 second | ↓ 99% |
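The `calculateAngsuran` and `getMarginRate` helpers are elided above. Purely for intuition, here is a hypothetical flat-margin version; the real rates came from server-synced margin tables, and the numbers below are invented for illustration.

```go
// Hypothetical flat-margin helpers, illustrative assumptions only.
func getMarginRate(tenorMonths int) float64 {
	if tenorMonths <= 12 {
		return 0.14 // assumed 14% p.a. flat for short tenors
	}
	return 0.16 // assumed 16% p.a. flat for longer tenors
}

// calculateAngsuran spreads principal plus total flat margin evenly
// across the tenor (angsuran = monthly installment).
func calculateAngsuran(principal int64, tenorMonths int, annualFlatRate float64) int64 {
	totalMargin := float64(principal) * annualFlatRate * float64(tenorMonths) / 12.0
	return (principal + int64(totalMargin)) / int64(tenorMonths)
}
```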
Feature #5: Secure Document Capture & Upload
Problem: Officers took photos of ID cards, Family Cards, Marriage Certificates using personal cameras and sent them via WhatsApp. Issues: inconsistent quality, unencrypted, unauditable.
Solution: In-app camera module with:
- 📸 Auto-detect document boundaries (OpenCV-based edge detection).
- 👁️ Image quality validator — reject if blur score > threshold.
- 🔐 AES-256 encryption at rest in SQLite, decrypted only upon upload (see the sketch after this list).
- 📤 Progressive upload — images sent in chunks when signal is available.
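For the encryption step, a standard construction is AES-256-GCM with a random nonce prepended to the sealed blob. The sketch below is my assumption, written in Go for consistency with the rest of this write-up; the original specifies AES-256 but not the mode, key management, or the on-device implementation.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"io"
)

// encryptDocument seals an image with AES-256-GCM (key must be 32 bytes).
// The random nonce is prepended so the uploader can decrypt later.
func encryptDocument(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptDocument reverses encryptDocument just before upload.
func decryptDocument(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```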
Results:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Document re-capture rate | 31% | 5% | ↓ 84% |
| Avg upload success rate | 67% | 99.2% | ↑ 48% |
| Security incidents (data leak) | 3 per year | 0 | ↓ 100% |
🚀 5. Overall Business Impact
The overall implementation of Tepati changed the fundamentals of micro-finance operations:
Key Numbers
- 📈 Customers served: from 2.8 million → 4.1 million (+46%) in 18 months
- 💰 Cost per acquisition: down 62% year-on-year
- ⚡ Turnaround time: from 7 days → < 24 hours (↓85%)
- 📉 NPL (Non-Performing Loan): down from 1.8% → 1.2% due to more accurate underwriting data
- 👩‍💼 CO productivity: up 2.3x (from 8 customers/day → 18 customers/day)
🏢 6. Post-Launch: Organizational Challenges
After the MVP was successfully delivered, there were several non-technical challenges that significantly impacted subsequent development velocity. I document these as lessons learned:
6.1 Protocol Migration: Intermittent AMQ/MQTT → HTTP
The initial architecture used AMQ and MQTT to handle offline-first connections. However, server infrastructure constraints caused connections to the broker to drop frequently mid-session. Post-launch, the protocol was changed to HTTP to overcome this instability and to align with the bank's already mature enterprise backend ecosystem.
Although MQTT is more bandwidth-efficient, maintaining a stateful connection broker at enterprise scale carries high operational overhead. The transition to HTTP was chosen to ensure long-term maintainability by an operations team already expert in REST/HTTP environments, even at the cost of some bandwidth efficiency.
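For illustration, the HTTP replacement can be as simple as a batched, stateless sync endpoint the app calls whenever connectivity returns. The endpoint path and payload shape below are assumptions, not the actual post-migration Tepati API.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// SyncEvent mirrors the batch items from the sync engine (assumed shape).
type SyncEvent struct {
	DeviceID  string          `json:"device_id"`
	RecordID  string          `json:"record_id"`
	Payload   json.RawMessage `json:"payload"`
	Timestamp int64           `json:"timestamp"`
}

// syncHandler accepts a whole offline batch in one stateless request,
// so no long-lived broker connection has to survive a flaky link.
func syncHandler(w http.ResponseWriter, r *http.Request) {
	var batch []SyncEvent
	if err := json.NewDecoder(r.Body).Decode(&batch); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	// ... persist the batch (same upsert/conflict logic as the Kafka path) ...
	w.WriteHeader(http.StatusAccepted) // client marks local rows as synced
}

func main() {
	http.HandleFunc("/v1/sync", syncHandler)
	http.ListenAndServe(":8080", nil)
}
```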
Engineering Insight
This is a lesson about "Organizational-Fit". The best architecture is not just the most technically advanced one, but one that can be well supported by the organization's current operational capabilities.
6.2 CI/CD Pipeline Needed Optimization
One significant bottleneck in development iteration was deployment pipeline speed. At that time:
| Stage | Duration | Notes |
|---|---|---|
| Build → Beta | ~15 mins | Too long for rapid iteration |
| Build → Production | ~30 mins | Excluding rollback on failure |
| Hotfix cycle | 45-60 mins | Including re-test and re-deploy |
In comparison, industry standards for modern pipelines are < 5 minutes for build-to-deploy. This slow pipeline hindered the feedback loop and slowed down bug fix cycles in the field.
6.3 Security Compliance & Developer Experience
As the project matured, enterprise security standards were applied in full, including standardized devices with EDR (Endpoint Detection & Response) and corporate VPNs.
The challenge that arose was balancing Compliance with Velocity:
- Resource Constraints: Enterprise security tooling often consumes significant CPU/IO, noticeably slowing compilation (especially for Golang/React Native) compared to unrestricted development environments.
- Workflow Adaptation: Developers need to adapt to a more restricted environment to maintain customer data security.
This is a reasonable trade-off in the banking industry: Security is non-negotiable, but the challenge is how to keep the development feedback loop fast within those constraints.
6.4 The Efficiency Dilemma: Balancing Scope & Focus
Another important reflection is regarding engineer resource allocation in the maintenance phase.
1. Multitasking vs Deep Work: In the early phase (zero-to-one), a hybrid role (Backend + Infra + Support) is very effective for speed. In the steady-state phase, however, these roles need to be specialized. Expecting one engineer to handle business logic as well as infrastructure issues splits focus through constant context switching, which risks long-term quality.
Sustainability Note
Team efficiency is not only measured by "how many roles one person handles", but by how focused the team can be on solving specific problems without excessive operational distractions.
2. Optimizing Processes: Deployment processes and security policies are vital parts of governance. The challenge for engineering leadership is making these processes efficient (e.g., through better CI/CD automation or whitelisting policies) so engineers spend their time creating solutions (build time) rather than waiting on processes (wait time).
3. Conclusion: True efficiency is when support systems (people, process, tools) allow the engineering team to work with minimal friction, balancing innovation speed with regulatory compliance.
☠️ 6.5 The Silent Killer: Technical & Intellectual Bankruptcy
Extreme speed at the beginning is often addictive. But letting one person hold the keys to the entire system, with no second engineer alongside them, is a fatal managerial mistake.
The Illusion of Speed: When you have a "Hero Developer", you feel safe because features ship fast. In reality, you are accumulating compound interest on organizational debt. This dependency is a time bomb.
Critical Risk Assessment
Dependency on a single individual is the biggest Single Point of Failure (SPOF).
If you don't distribute knowledge now, be prepared for a bitter reality when the transition happens:
- Total Paralysis: A new engineer needs 3-4 sprints (1-2 months) just to read and absorb the mental model of that "genius" code, without daring to touch it.
- The Inevitable Rewrite: Often, the complexity of "single-fighter" code is so high and so idiosyncratic that the replacement engineer will conclude: "This must be rewritten from scratch."
- Double Cost: You are not only paying the new engineer's salary, but also the cost of lost business momentum during months of development stagnation.
Stop relying on one person. Break the knowledge silos today, or you will be forced to reboot your product in the future.
🎓 7. Lessons Learned
What Went Well
- 🧠 Offline-first mindset from day one — not an afterthought.
- 🐹 Golang proved ideal for high-concurrency sync workloads.
- 📡 MQTT (before migration) was much more efficient than HTTP polling for low-bandwidth conditions.
- 🦸‍♂️ Single-fighter backend allowed fast architectural decisions without meeting-and-approval bottlenecks. The frontend team simply consumed the API contracts I defined at the start.
- 💬 Replacing Microsoft Teams workflow with a dedicated system increased efficiency drastically.
What I'd Do Differently
- 📝 Start with CRDTs (Conflict-free Replicated Data Types) from the beginning, rather than LWW that later had to be migrated.
- 🤖 Invest earlier in automated E2E testing for offline scenarios — hard to coordinate between backend (me) and the frontend team under tight deadlines.
- 📦 Use Protocol Buffers from the start — JSON parsing overhead was noticeable on low-end devices.
- 📢 Advocate more strongly for keeping the message-broker architecture, backed by better runbooks and monitoring for the operations team.
- ⏩ Push for better CI/CD infrastructure from the start — slow pipelines hinder the whole team, not just me.
Remaining Technical Debt
- 🔄 Migration from SQLite to WatermelonDB for better lazy loading.
- ⚠️ Standardizing error codes between Kafka events and REST API responses.
- 🔌 Implementing circuit breaker pattern in Sync Manager.
- ⏩ Optimizing CI/CD pipeline — target under 5 minutes per deployment stage.
🎉 Conclusion
This project proves that a backend engineer who has mastered distributed systems — from API design and message brokers to sync engines — can deliver an entire backend infrastructure alone in a very limited time.
The hardest features aren't always about sophisticated AI algorithms. Sometimes, engineering excellence is about empathy — understanding that our users are standing in the middle of a rice field with 1 bar of signal, and ensuring technology continues to serve them resiliently.
Personal Note
What makes me proud: good technology is invisible technology. Officers don't need to know about Kafka, MQTT, or delta sync. All they know is that the app just works, even in the middle of a forest — and I built the entire backend supporting it myself in 30 days.