Building Reactive Systems - From Manifesto to Practice

Phuoc Nguyen, January 20, 2024 (7 min read)

The Reactive Manifesto

On September 16, 2014, Jonas Bonér and his co-authors published version 2.0 of The Reactive Manifesto, a document defining the core characteristics of a Reactive system.

Reactive Systems are systems that are highly responsive, elastically scalable, fault-tolerant, and built on a message-driven architecture.


The 4 Pillars of Reactive Systems

Pillar | Meaning | Role
RESPONSIVE | Fast response | Ultimate goal: good UX
RESILIENT | Recovery capability | Stay responsive when failures occur
ELASTIC | Flexible scaling | Stay responsive when load changes
MESSAGE DRIVEN | Message-oriented | Technical foundation for everything

Relationship: Message Driven → (Resilient + Elastic) → Responsive


1. Responsive

Systems must respond in a timely manner if at all possible. Responsiveness is the foundation of usability.

Characteristics of Responsive Systems:

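Characteristic | Description
Consistent response time | Response times are predictable and stay within a reliable upper bound
Simplified error handling | Problems are detected quickly and dealt with effectively
User confidence | Consistent behavior builds trust with users
Encourage interaction | A fast system invites further interaction and growth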

Key insight: Responsiveness is the result of properly applying resilience and elasticity.


2. Resilient

Systems must remain responsive when failures occur. Any system that is not resilient will be unresponsive after failure.

Resilience is achieved through:

Technique | Description
Replication | Replicating data/services for failover
Containment | Containing failures, preventing spread
Isolation | Separating components, reducing coupling
Delegation | Delegating recovery handling to other components

3. Elastic

Systems must remain responsive under varying workload. They can increase or decrease the resources allocated to them based on demand.

Requirements for Elasticity:

Requirement | Explanation
No central bottlenecks | No single component that every request must pass through
No contention points | No shared resource that components must compete for
Shard/Replicate components | Components can be sharded and replicated
Distribute inputs | Inputs are distributed across those shards and replicas
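
To make "shard components and distribute inputs" concrete, here is a minimal sketch of hash-based routing. The function name shard_for and the key format are illustrative assumptions, not part of any system described in this post.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    # sha256 is used instead of Python's built-in hash(), which is salted per process
    # for strings and would route the same key differently after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# The same key always lands on the same shard; different keys spread out.
counts = [0] * 4
for i in range(1000):
    counts[shard_for(f"user-{i}", 4)] += 1
```

Plain modulo routing like this remaps most keys when shard_count changes. Where shards are added and removed often, consistent hashing is the usual alternative.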

4. Message Driven

Reactive Systems rely on asynchronous message-passing to establish boundaries between components.

Benefits of Message-Driven:

  • Loose coupling - Components don't depend directly on each other
  • Isolation - Clear boundaries between components
  • Location transparency - No need to know physical location
  • Error delegation - Errors are passed as messages
  • Back-pressure - Flow control when overloaded
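
Back-pressure is the least intuitive item in this list, so here is a small sketch using a bounded asyncio.Queue. The names producer, consumer and run_pipeline and the queue size are assumptions chosen for illustration. A real deployment would put a message broker or a reactive-streams library in this role.

```python
import asyncio

async def producer(queue: asyncio.Queue, n_messages: int) -> None:
    for i in range(n_messages):
        # put() suspends the producer while the queue is full.
        # That suspension is the back-pressure signal: the sender slows down
        # to the pace of the receiver instead of piling up messages in memory.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more messages

async def consumer(queue: asyncio.Queue, processed: list) -> None:
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        processed.append(message)

async def run_pipeline(n_messages: int = 10, max_pending: int = 2) -> list:
    queue = asyncio.Queue(maxsize=max_pending)  # the bound is what creates back-pressure
    processed: list = []
    await asyncio.gather(producer(queue, n_messages), consumer(queue, processed))
    return processed
```

Calling asyncio.run(run_pipeline()) delivers every message in order while never holding more than max_pending messages in the queue.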

Commands vs Events

Characteristic | Commands | Events
Send to | Unicast (1 target) | Broadcast/Multicast
Purpose | Request specific action | Notify something happened
Response | Expect response | Don't expect response
Example | "Transfer $100 to User X" | "Transaction ABC completed"
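
The distinction in the table can be sketched in a few lines. The Bus class and the message types below are hypothetical and run synchronously in one process to keep the example short. In a message-driven system both would travel asynchronously over a broker.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransferMoney:
    """Command: imperative name, one target, the sender expects a reply."""
    to_user: str
    amount: int

@dataclass(frozen=True)
class TransactionCompleted:
    """Event: past-tense name, any number of listeners, no reply expected."""
    transaction_id: str

class Bus:
    def __init__(self):
        self._handlers = {}     # command type -> exactly one handler
        self._subscribers = {}  # event type -> list of listeners

    def register(self, command_type, handler):
        if command_type in self._handlers:
            raise ValueError("a command has exactly one handler")
        self._handlers[command_type] = handler

    def subscribe(self, event_type, listener):
        self._subscribers.setdefault(event_type, []).append(listener)

    def send(self, command):
        # Unicast: routed to the single handler, whose result is returned.
        return self._handlers[type(command)](command)

    def publish(self, event):
        # Broadcast: every listener is notified, nothing is returned.
        for listener in self._subscribers.get(type(event), []):
            listener(event)
```

Registering a second handler for the same command raises an error, while an event with zero listeners is simply dropped. That asymmetry is the practical difference between the two.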

Non-Blocking I/O with Netty

The Problem with Blocking I/O

Blocking I/O | Non-Blocking I/O
1 thread per connection | A few threads serve thousands of connections
Thread is blocked while waiting for I/O | Threads are not blocked while waiting
10K connections = 10K threads | 10K connections = a few threads
High memory cost | Low memory cost
Context-switching overhead | Minimal context switching

Netty Architecture

Netty Architecture by Layer:

Layer | Component | Role
1 | Channels | Represent connections (conn1, conn2, conn3...)
2 | Selector | Multiplexing: monitors multiple channels simultaneously
3 | Event Loop | A single thread handles all events for its channels

Key insight: With Netty, 1 Event Loop thread can manage thousands of connections thanks to non-blocking I/O and multiplexing.
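
The same channel / selector / event loop layering can be sketched with Python's standard selectors module. This is not Netty's API, only the underlying idea: one thread waits on many non-blocking sockets at once. The function name and the uppercase-echo behavior are assumptions made for the example.

```python
import selectors
import socket

def echo_upper_event_loop(connections: int = 3) -> list:
    """Serve several connections from a single thread by multiplexing them."""
    selector = selectors.DefaultSelector()
    clients = []
    for i in range(connections):
        # socketpair() gives a connected pair, standing in for a real client connection.
        client, channel = socket.socketpair()
        channel.setblocking(False)  # the "channel" never blocks the loop
        selector.register(channel, selectors.EVENT_READ)
        client.sendall(f"hello from conn{i}".encode())
        clients.append(client)

    pending = connections
    while pending:
        # select() waits until at least one registered channel is readable.
        ready = selector.select(timeout=1.0)
        if not ready:
            break  # nothing arrived in time; stop instead of spinning
        for key, _events in ready:
            channel = key.fileobj
            channel.sendall(channel.recv(1024).upper())
            selector.unregister(channel)
            channel.close()
            pending -= 1

    replies = []
    for client in clients:
        replies.append(client.recv(1024).decode())
        client.close()
    selector.close()
    return replies
```

Every connection is handled on the calling thread. Adding more connections adds registrations with the selector, not threads.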


Resilience Patterns

When building payment systems at MoMo, we applied the following patterns:

1. Retry Pattern

Purpose: Retry operations when transient failures occur.

Step | State | Wait Time
1 | Request → Fail | -
2 | Wait | 1s
3 | Retry → Fail | -
4 | Wait (longer) | 2s
5 | Retry → Fail | -
6 | Wait (even longer) | 4s
7 | Retry → Success | -

Exponential Backoff: Wait time doubles after each failure

Best Practices:

  • Use exponential backoff (1s → 2s → 4s → 8s)
  • Limit retry attempts (max 3-5)
  • Distinguish between retryable and non-retryable errors
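
The three practices above fit in one small helper. This is a sketch, not the code used in production. TransientError is a hypothetical marker for retryable failures, and the sleep parameter exists only so the waiting can be replaced in tests.

```python
import time

class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (timeouts, 503s...)."""

def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure
            # The delay doubles after each failure: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
        # Any other exception is treated as non-retryable and propagates immediately.
```

With the defaults, a call that fails three times and then succeeds waits 1s, 2s and 4s, which matches the table above. Adding random jitter to each delay is a common refinement when many clients retry at once.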

2. Circuit Breaker Pattern

Purpose: Prevent continuous calls to a failing service.

Circuit Breaker State Flow:

From State | Condition | To State
CLOSED | Multiple consecutive failures | OPEN
OPEN | After timeout | HALF-OPEN
HALF-OPEN | Test request succeeds | CLOSED
HALF-OPEN | Test request fails | OPEN

State | Behavior
Closed | Normal operation, counting failures
Open | Reject requests immediately, don't call downstream
Half-Open | Allow test requests to check recovery
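
The state flow above translates almost line by line into code. The class below is a single-threaded sketch with assumed names and thresholds. A production breaker would need locking and usually tracks a failure rate rather than a simple count.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting downstream."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay OPEN before a test request
        self.clock = clock                  # injectable so tests can control time
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("rejected without calling downstream")
            self.state = "HALF_OPEN"  # timeout elapsed: let one test request through
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed test request, or too many consecutive failures, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "CLOSED"
        self.failures = 0
```

One instance would wrap one downstream dependency, so a failing dependency trips only its own breaker.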

3. Rate Limiter Pattern

Purpose: Control the number of requests within a time period.

Algorithms:

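Algorithm | Description | Use case
Token Bucket | Tokens refill at a fixed rate; each request consumes 1 token, which allows short bursts | API rate limiting
Leaky Bucket | Requests are processed at a fixed rate | Traffic shaping
Fixed Window | Counts requests within a fixed time period | Simple counting
Sliding Window | Counts requests over a rolling time window, avoiding spikes at fixed-window boundaries | Smooth limiting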
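
Of the four algorithms, token bucket is the one most often implemented by hand. Here is a minimal single-threaded sketch. The class name, the parameters and the injectable clock are assumptions for illustration.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        """Return True and consume one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily, based on the time since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with capacity 2 and a refill rate of 1 token per second accepts two requests back to back, rejects the third, and accepts one more after a second has passed.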

4. Bulkhead Pattern

Purpose: Isolate parts of the system so failures don't spread.

Like ship compartments:

  • Compartments (bulkheads) are separated
  • Water entering one compartment doesn't sink the entire ship

Application with Bulkhead Pattern:

Thread Pool | Function | Isolation
Pool A | Bank Integration | Separate
Pool B | Payment Processing | Separate
Pool C | User Service | Separate

Each pool is completely isolated - if Bank Integration is overloaded, Payment and User still work normally.
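
A rough sketch of the pool-per-function layout, using Python's ThreadPoolExecutor. The Bulkheads class, the pool names and the pool sizes are assumptions. The example simulates an overloaded bank integration and shows a payment task still completing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Bulkheads:
    """One bounded thread pool per function, so a saturated pool cannot starve the others."""

    def __init__(self, sizes):
        self._pools = {
            name: ThreadPoolExecutor(max_workers=size, thread_name_prefix=name)
            for name, size in sizes.items()
        }

    def submit(self, name, task, *args):
        return self._pools[name].submit(task, *args)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)

bulkheads = Bulkheads({"bank": 2, "payment": 2, "user": 2})

# Simulate an overloaded bank integration: both bank workers are stuck waiting
# and two more tasks are queued behind them.
bank_recovered = threading.Event()
stuck = [bulkheads.submit("bank", bank_recovered.wait) for _ in range(4)]

# Payment processing has its own workers, so it completes regardless.
payment = bulkheads.submit("payment", lambda: "paid").result(timeout=2)

bank_recovered.set()  # let the bank tasks finish so the pools can shut down
bulkheads.shutdown()
```

ThreadPoolExecutor queues excess work without limit, so a stricter bulkhead would also cap the queue and reject tasks once it is full.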

5. Fallback Pattern

Purpose: Provide alternative values when primary operation fails.

Examples:

  • Return cached data when database is unavailable
  • Use default values when external service is down
  • Redirect to backup service
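
The first example, serving cached data when the database is unavailable, might look like the sketch below. The helper with_fallback, the in-memory cache and the simulated outage are all hypothetical.

```python
def with_fallback(primary, fallback, handled=(Exception,)):
    """Return primary(); if it raises one of the handled errors, return fallback()."""
    try:
        return primary()
    except handled:
        return fallback()

# Hypothetical last-known values, kept fresh by earlier successful reads.
cache = {"balance:42": 1500}

def load_balance_from_db(user_id):
    raise ConnectionError("database unavailable")  # simulated outage

def get_balance(user_id):
    return with_fallback(
        primary=lambda: load_balance_from_db(user_id),
        fallback=lambda: cache.get(f"balance:{user_id}"),
        handled=(ConnectionError,),  # only fall back on errors that mean "unavailable"
    )
```

Limiting handled to availability errors matters here: falling back on every exception would hide programming errors behind stale data.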

6. Timeout Pattern

Purpose: Set time limits to avoid waiting indefinitely.

Timeout | Effect
Too short | False positives: requests are canceled before a healthy service can respond
Too long | Resources are held too long, which can cause cascading failures
Appropriate | Based on SLA and historical data
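
In asynchronous code a timeout is usually a wrapper around the call. Here is a small sketch with asyncio.wait_for, where slow_downstream and the delay values are made-up stand-ins for a real remote call.

```python
import asyncio

async def slow_downstream(delay_s: float) -> str:
    """Stand-in for a remote call that takes delay_s seconds to answer."""
    await asyncio.sleep(delay_s)
    return "response"

async def call_with_timeout(delay_s: float, timeout_s: float) -> str:
    try:
        # wait_for cancels the pending call once timeout_s elapses,
        # so the caller stops holding resources for it.
        return await asyncio.wait_for(slow_downstream(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timed out"
```

A call that answers in 10 ms under a 1 s limit returns "response". The same call with a 10 ms limit and a 1 s delay returns "timed out" after roughly 10 ms instead of waiting the full second.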

Real Experience at MoMo

Applied Architecture:

1. Message-Driven Architecture:

  • Apache Kafka as message broker
  • Event Sourcing for transaction history
  • CQRS to separate read/write operations

2. Resilience Patterns:

  • Circuit Breaker for bank integrations (30+ banks)
  • Retry with exponential backoff
  • Bulkhead to isolate critical payment flows

3. Non-Blocking I/O:

  • Vert.x for core services
  • Reactive streams with back-pressure
  • Connection pooling with non-blocking drivers

Results Achieved:

Metric | Before | After
Throughput | 10K TPS | 100K+ TPS
Latency P99 | 500ms | 50ms
Availability | 99.9% | 99.99%
Resource usage | High | Optimized

Key Takeaways

  1. Reactive is not just a technical choice: it is an architectural decision that affects the entire system design
  2. The 4 pillars must be applied together: missing one reduces overall effectiveness
  3. Resilience patterns are mandatory: in distributed systems, failure is normal, not an exception
  4. Message-driven is the foundation: it enables loose coupling and location transparency
  5. Non-blocking I/O is the technical enabler for high throughput
