When Developers Learn to Save
The Beginning Question
In October 2024, during a team meeting, someone put it plainly: "We need to find a more powerful framework to replace our current approach."
At that time, our system was running on Vert.x - a Java toolkit famous for its performance with an event-loop non-blocking model. For Dependency Injection, we used Dagger - Google's compile-time DI. The codebase had been built over many years, with dozens of services handling millions of transactions daily.
But in the world of technology, standing still means falling behind.
The team discovered Quarkus - a framework designed for Kubernetes and cloud-native applications. What's interesting is that Quarkus's core is actually Vert.x, meaning all our knowledge about reactive programming would still be utilized. But Quarkus goes further with outstanding advantages: extremely fast startup time, small memory footprint, and the ability to build native code with GraalVM.
The question wasn't "Is Quarkus good?" but "Is the transformation worth it?"
The Cost of Change
Switching from Vert.x + Dagger to Quarkus isn't as simple as upgrading a library version. It means changing the entire philosophy of how we write code.
Dependency Injection had to shift from Dagger (compile-time, annotation processing) to CDI (Jakarta EE standard). Two completely different approaches.
Programming model had to shift from callbacks to Mutiny (Uni/Multi). Although both are reactive, the syntax and mindset differ significantly.
```java
// V2 Pattern - Callback-based
public void processTransfer(TransferData input, Handler<TransferData> whenDone) {
    validateTask.exec(input, validatedData -> {
        coreTask.exec(validatedData, coreResult -> {
            persistTask.exec(coreResult, whenDone);
        });
    });
}

// V3 Pattern - Mutiny reactive
public Uni<RequestMsg> processTransfer(RequestMsg input) {
    return validateTask.exec(input)
            .flatMap(coreTask::exec)
            .flatMap(persistTask::exec);
}
```

And most importantly: all common libraries had to be rebuilt from scratch. Database connections, the message broker, Redis caching, the workflow engine, task scheduling - everything had to be rewritten for the V3 platform.
Looking at the scope of work, we asked ourselves: "Is it really worth it?"
Unexpected Inspiration
One day, by chance, we came across an article about Capital One's journey from Java to Golang. Their Credit Offers API was completely rewritten from Java to Go. Many people assumed they switched because Golang was "cooler." But no. The real results: a 70% performance gain and 90% cost savings - incredible numbers.
Java running on JVM consumes quite a lot of resources. Each service needs several hundred MB of RAM just to start. When you have thousands of microservices, that number multiplies into enormous costs.
That's when the team's mindset changed.
Platform transformation isn't just about "upgrading technology for fun." It can bring real business value: reduced infrastructure costs, reduced resource consumption, increased system efficiency.
And from there, a clear goal was set: cut resource usage by 30%.
Two Parallel Approaches
Realizing we couldn't wait to finish building the new platform before optimizing, the team decided to split into two workstreams running in parallel.
Workstream 1: Right-sizing - Optimizing What We Have
Before thinking about changing platforms, let's look at what we're currently using.
The team began reviewing the entire resource configuration of all services. And we discovered an embarrassing truth: many pods were only using 0.2% to 0.5% of their requested CPU.
Imagine renting a 100m² apartment but only using 1m². That's what we were doing with our infrastructure.
Why was this happening? The answer is simple: lack of knowledge about Kubernetes resource management.
When configuring resources for a service, developers typically "leave extra to be safe." Request 2 CPU cores while only using 0.1. Request 4GB RAM while only using 500MB. The "better safe than sorry" mentality leads to systematic waste.
After researching with the DevOps team, we developed a standard formula for resource configuration:
Request Resource (what K8s guarantees you'll have):

CPU Request = Peak Usage / (HPA threshold - 20%)
Memory Request = Peak Usage / (HPA threshold - 20%)

Limit Resource (maximum allowed threshold):

CPU Limit = CPU Request × 2 to 4 (depending on workload)
Memory Limit = Memory Request × 1.2

How to determine Peak Usage:
- Open Grafana, view metrics from the past 7 days
- Identify resource usage at peak traffic times
- Apply the formula above

Example: Service A has a peak CPU usage of 0.48 cores. With an HPA threshold of 80%:

CPU Request = 0.48 / 0.6 = 0.8 cores
CPU Limit = 0.8 × 3 = 2.4 cores

Instead of requesting 2 cores like before, we now need only 0.8 cores. A 60% savings from just one service.
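The formula is simple enough to encode as a sanity-check helper. A minimal sketch - the class name, method names, and rounding are ours for illustration, not part of any Kubernetes tooling:

```java
import java.util.Locale;

// Encodes the right-sizing formula: inputs are peak usage from Grafana,
// the HPA threshold (e.g. 0.8 for 80%), and a limit multiplier
// (2-4 for CPU, 1.2 for memory).
public class RightSizing {
    // Request = Peak Usage / (HPA threshold - 20%)
    static double request(double peakUsage, double hpaThreshold) {
        return peakUsage / (hpaThreshold - 0.2);
    }
    // Limit = Request x multiplier
    static double limit(double request, double multiplier) {
        return request * multiplier;
    }
    public static void main(String[] args) {
        double cpuRequest = request(0.48, 0.8); // 0.48 / 0.6 = 0.8 cores
        double cpuLimit = limit(cpuRequest, 3); // 0.8 * 3 = 2.4 cores
        System.out.println(String.format(Locale.ROOT,
                "request=%.2f limit=%.2f", cpuRequest, cpuLimit));
    }
}
```

Running this reproduces the Service A example above: request=0.80, limit=2.40.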
Workstream 2: Platform V3 - Building the New Foundation
In parallel with right-sizing, the team began building the V3 platform with Quarkus.
V3 Architecture was designed with the following principles:
| Component | V2 (Legacy) | V3 (Modern) |
|---|---|---|
| Framework | Vert.x 4.x | Quarkus 3.15.1 |
| Java | 17 | 21 |
| DI Container | Dagger | CDI (Jakarta EE) |
| Async Model | Callbacks | Mutiny (Uni/Multi) |
| HTTP | Vert.x HTTP Server | JAX-RS (RESTEasy Reactive) |
| Build | Maven Shade (Fat JAR) | Quarkus Maven + Native |
V3 Common Libraries were built completely new:
- lib_v3-scaffold: Core framework with Task, WorkFlow patterns
- lib_v3-http-server: REST API with JWT authentication
- lib_v3-jdbc: Reactive database access
- lib_v3-redis: Caching with multi-instance support
- lib_v3-kafka: Event streaming
- lib_v3-rabbit: RabbitMQ RPC communication
Each library was designed with a Reactive First mindset - all operations are non-blocking, all returns are Uni or Multi.
```java
// lib_v3-jdbc interface
public interface ReactiveJDBCClient {
    <T> Uni<T> querySingle(String query, Class<T> tClass);
    <T> Multi<T> query(String query, Class<T> tClass);
    Uni<Integer> updateWithParams(String query, List<Object> params);
}

// lib_v3-redis interface
public interface ReactiveRedisClient {
    Uni<String> get(String key);
    Uni<Void> setWithTTLSeconds(String key, String value, Long ttlSeconds);
    Uni<Boolean> hset(String key, String field, String value);
}
```

Expensive Lessons
The implementation didn't go as smoothly as planned. And each difficulty brought a lesson.
Lesson 1: High Memory Services - When Code is the Culprit
During the resource review, the team discovered some services with abnormal memory usage: high memory consumption even when spread across many pods, with some restarting due to memory spikes.
Initially, we thought this was a configuration issue. It wasn't. After investigating, we found the root cause in the code itself, not the infrastructure: patterns that caused memory leaks or held resources longer than necessary.
Lesson: Right-sizing is just the first step. Sometimes you need to invest additional resources to optimize code before optimizing infrastructure.
Lesson 2: Big Bang Migration Isn't Feasible
The original plan was: build the complete V3 platform, then migrate all services.
Reality: the team's main resources had to stay focused on business projects. We didn't have enough bandwidth to maintain V2, build V3, and migrate simultaneously.
Solution: Soft Migration Strategy
- New modules: Use V3 framework from the start
- Critical old modules: Keep the tech stack (Vert.x + Dagger), only migrate to V3 project to sync versions and dependencies
- Non-critical old modules: Gradually migrate to V3 framework when resources allow
This strategy reduces risk and allows the team to move forward without needing an "all-in" migration.
Lesson 3: GraalVM Native - Difficult but Worthwhile
One of Quarkus's promises is the ability to build native executables with GraalVM. Native code doesn't need a JVM, startup is nearly instant, and memory footprint is extremely small.
But building native isn't simple. The team encountered and overcame many challenges:
| Risk | Impact | Solution |
|---|---|---|
| Reflection issues | High | Use @RegisterForReflection, configure reflect-config.json |
| Third-party libraries not compatible | High | Check Quarkus extensions first, fallback to JVM mode if needed |
| Long build time (10-15 mins) | Medium | CI caching, parallel builds |
| Learning curve | Medium | Start with simple modules, create detailed documentation |
| Regression bugs | High | Comprehensive testing, phased rollouts |
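The reflection row in the table deserves a concrete illustration. GraalVM's closed-world analysis only keeps what it can see at build time, so a class looked up by name at runtime disappears from the native image unless it is registered (via Quarkus's `@RegisterForReflection` or a `reflect-config.json` entry). The plain-Java sketch below runs fine on the JVM; the dynamic lookup it performs is exactly the kind of call that fails in an unconfigured native build. The class it looks up is chosen arbitrarily for the demo:

```java
import java.util.Arrays;

// Why native images need reflection metadata: this by-name lookup is
// invisible to GraalVM's static analysis, so in a native image it throws
// ClassNotFoundException unless the class is explicitly registered.
public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        Class<?> c = Class.forName("java.time.LocalDate"); // dynamic lookup
        boolean hasNow = Arrays.stream(c.getMethods())
                .anyMatch(m -> m.getName().equals("now"));
        System.out.println(c.getSimpleName() + " hasNow=" + hasNow);
    }
}
```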
The team initially struggled because GitLab runners didn't have GraalVM. After DevOps helped install GraalVM on the runners, building native became feasible.
One important tip: don't try to native-ize everything at once. Start with small, stateless modules with minimal dependencies. Once the team is familiar with the pitfalls, move on to more complex modules.
Impact: Not Just Promises
Right-sizing Results
After applying the right-sizing formula to all services:
| Metric | Original | Saved | Percentage |
|---|---|---|---|
| CPU | 419.2 cores | 184.8 cores | 44.08% |
| Memory | 590,951 MB | 22,756 MB | 3.85% |
44% CPU saved. This number far exceeded our initial 30% target.
What's worth reflecting on: this wasn't from optimizing code or changing architecture. This was just from configuring correctly what we actually need.
Additionally, the team:
- Built Grafana dashboards monitoring all resource usage
- Set up alert rules for abnormal thresholds
- Organized training sessions for developers on Kubernetes resource management
Platform Migration Results
For a complex system with dozens of modules, the migration progressed under the soft migration strategy as follows:
| Stack | Percentage | Notes |
|---|---|---|
| V3 Native (GraalVM) | ~10% | Simple, stateless modules |
| V3 JVM (Quarkus) | ~15% | More complex modules |
| V2 (Vert.x + Dagger) | ~75% | Gradually migrating |
Results achieved:
- 100% of new services written on V3 framework
- Complete documentation for onboarding
- Phased rollout minimizing risk
Native Code: From 250MB to 25MB
And here's the most exciting part.
The team experimented with building native code for an internal service - one of the first modules running Native in production.
Actual measured results:
| Metric | JVM Mode | Native Mode | Improvement |
|---|---|---|---|
| Memory Usage | ~250 MB | ~25 MB | 10x |
| Startup Time | ~5 seconds | ~50 ms | 100x |
| Container Size | ~200 MB | ~50 MB | 4x |
| Pod Ready Time | 10-15 seconds | < 1 second | 15x |
Why do these numbers matter?
With Kubernetes, startup time determines scaling speed. When traffic spikes suddenly, HPA triggers scale-out. With JVM, each new pod needs 10-15 seconds to be ready. With Native, it takes under 1 second. This means the system can react 15 times faster to traffic changes.
A memory footprint 5-10x smaller means: on the same Kubernetes node, you can schedule more pods. Or use smaller nodes with lower costs.
The service has been running stable in production. This isn't theory - this is reality happening right now.
Regrets and Pride
After nearly a year of implementation, we have some thoughts about this journey.
First regret: All this time, the team had been using resources wastefully without realizing it. Those 0.2% CPU usage numbers had existed for a long time, but no one noticed.
Second regret: Lack of infrastructure knowledge. As backend developers, we write code running on Kubernetes every day, but we didn't understand how Kubernetes works. Didn't understand what request/limit means. Didn't understand what conditions trigger HPA.
First pride: At least we realized in time to put things on the right track. Just one step of looking back, optimizing what we're using, is already a big step forward. 44% CPU saved didn't come from anywhere far - it came from understanding correctly and configuring correctly.
Second pride: Native code actually works. 250MB down to 25MB isn't marketing material - it's measurable reality in production.
What's Next?
The journey isn't over. With the results achieved, the path forward is clear:
1. Expand Native Code Coverage
Starting from simple modules, gradually expanding to more complex ones. Each successful native module is a step forward in performance and cost efficiency.
2. Complete V3 Migration
Continue migrating remaining modules to the V3 framework, ensuring complete:
- Unit tests and integration tests
- Fault tolerance (Rate limit, Bulkhead, Circuit breaker)
- Observability (Health checks, Metrics, Tracing)
3. Maintain Optimization Culture
Right-sizing isn't a one-time task. Regular review of resource usage is needed, with alerts for services using abnormal resources (too high or too low).
Closing Thoughts
The 44% journey taught us one thing: optimization isn't the infrastructure team's job - it's everyone's job.
When developers understand resource management, they write better code. When developers understand container limits, they configure more correctly. When developers understand native compilation, they have one more powerful tool in their arsenal.
And sometimes, the biggest step forward doesn't come from building something new. It comes from looking back and optimizing what you already have.
44% CPU saved. 10x memory reduction with native code. These numbers are proof of a simple truth: when developers learn to save, the whole system benefits.
Keep going, keep growing.
Appendix
Right-sizing Formula Reference
| Parameter | Formula | Example |
|---|---|---|
| CPU Request | Peak Usage / 0.6 | 0.48 / 0.6 = 0.8 cores |
| Memory Request | Peak Usage / 0.6 | 1.5GB / 0.6 = 2.5GB |
| CPU Limit | Request × 2-4 | 0.8 × 3 = 2.4 cores |
| Memory Limit | Request × 1.2 | 2.5 × 1.2 = 3GB |
| Min Pods | Normal Peak / Pod Capacity | 30 rps / 10 = 3 pods |
| Max Pods | Peak Traffic / Pod Capacity | 100 rps / 10 = 10 pods |
Note: Default HPA threshold is 80%, so divisor = 0.8 - 0.2 = 0.6