When Developers Learn to Save
Platform · Quarkus · GraalVM · Optimization · Architecture

Phuoc Nguyen · January 20, 2025 · 12 min read

The Beginning Question

In October 2024, during a team meeting, a challenge was put on the table: "We need to find a more powerful framework to replace our current approach."

At that time, our system was running on Vert.x - a Java toolkit famous for its performance with an event-loop non-blocking model. For Dependency Injection, we used Dagger - Google's compile-time DI. The codebase had been built over many years, with dozens of services handling millions of transactions daily.

But in the world of technology, standing still means falling behind.

The team discovered Quarkus - a framework designed for Kubernetes and cloud-native applications. What's interesting is that Quarkus's core is actually Vert.x, meaning all our knowledge about reactive programming would still be utilized. But Quarkus goes further with outstanding advantages: extremely fast startup time, small memory footprint, and the ability to build native code with GraalVM.

The question wasn't "Is Quarkus good?" but "Is the transformation worth it?"

Decision

The Cost of Change

Switching from Vert.x + Dagger to Quarkus isn't as simple as upgrading a library version. It's changing the entire philosophy of how we code.

Dependency Injection had to shift from Dagger (compile-time, annotation processing) to CDI (Jakarta EE standard). Two completely different approaches.

Programming model had to shift from callbacks to Mutiny (Uni/Multi). Although both are reactive, the syntax and mindset differ significantly.

// V2 Pattern - Callback-based
public void processTransfer(TransferData input, Handler<TransferData> whenDone) {
    validateTask.exec(input, validatedData -> {
        coreTask.exec(validatedData, coreResult -> {
            persistTask.exec(coreResult, whenDone);
        });
    });
}

// V3 Pattern - Mutiny reactive
public Uni<RequestMsg> processTransfer(RequestMsg input) {
    return validateTask.exec(input)
        .flatMap(coreTask::exec)
        .flatMap(persistTask::exec);
}
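The V2 snippet nests each step inside the previous callback; the V3 snippet flattens the same pipeline into a chain. The same flattening can be illustrated with plain `CompletableFuture` from the JDK - used here only as a stand-in for Mutiny's `Uni`, so the sketch runs without any framework (the step names and string payloads are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class TransferPipeline {
    // Each step is an async stage (a stand-in for a V3 Task)
    static CompletableFuture<String> validate(String input) {
        return CompletableFuture.completedFuture(input + ":validated");
    }
    static CompletableFuture<String> core(String input) {
        return CompletableFuture.completedFuture(input + ":core");
    }
    static CompletableFuture<String> persist(String input) {
        return CompletableFuture.completedFuture(input + ":persisted");
    }

    // V3-style chaining: thenCompose plays the role of Mutiny's flatMap
    public static CompletableFuture<String> processTransfer(String input) {
        return validate(input)
            .thenCompose(TransferPipeline::core)
            .thenCompose(TransferPipeline::persist);
    }

    public static void main(String[] args) {
        System.out.println(processTransfer("tx-1").join());
    }
}
```

The point of the refactor is the shape: the pipeline reads top-to-bottom as a sequence instead of drifting rightward with every nested callback, and error propagation comes for free from the composition operator.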

And most importantly: all common libraries had to be rebuilt from scratch. Database connection, message broker, Redis caching, workflow engine, task scheduling - everything had to be rewritten for the V3 platform.

Looking at the scope of work, we asked ourselves: "Is it really worth it?"

Unexpected Inspiration

One day, by chance, we came across an article about Capital One's journey from Java to Golang. Their Credit Offers API was completely rewritten from Java to Go. Many people assumed they switched because Golang was "cooler." But no. The real results: a 70% performance gain and 90% cost savings - incredible numbers.

Java running on JVM consumes quite a lot of resources. Each service needs several hundred MB of RAM just to start. When you have thousands of microservices, that number multiplies into enormous costs.

That's when the team's mindset changed.

Platform transformation isn't just about "upgrading technology for fun." It can bring real business value: reduced infrastructure costs, reduced resource consumption, increased system efficiency.

And from there, a clear goal was set: cut resource consumption by 30%.

Target

Two Parallel Approaches

Realizing we couldn't wait to finish building the new platform before optimizing, the team decided to split into two workstreams running in parallel.

Workstream 1: Right-sizing - Optimizing What We Have

Before thinking about changing platforms, let's look at what we're currently using.

The team began reviewing the entire resource configuration of all services. And we discovered an embarrassing truth: many pods were only using 0.2% to 0.5% of their requested CPU.

Imagine renting a 100m² apartment but only using 1m². That's what we were doing with our infrastructure.

Why was this happening? The answer is simple: lack of knowledge about Kubernetes resource management.

When configuring resources for a service, developers typically "leave extra to be safe." Request 2 CPU cores while only using 0.1. Request 4GB RAM while only using 500MB. The "better safe than sorry" mentality leads to systematic waste.

After researching with the DevOps team, we developed a standard formula for resource configuration:

Request Resource (what K8s guarantees you'll have):

CPU Request = Peak Usage / (HPA threshold - 20%)
Memory Request = Peak Usage / (HPA threshold - 20%)

Limit Resource (maximum allowed threshold):

CPU Limit = CPU Request × 2 to 4 (depending on workload)
Memory Limit = Memory Request × 1.2

How to determine Peak Usage:

  1. Open Grafana, view metrics from the past 7 days
  2. Identify resource usage at peak traffic times
  3. Apply the formula above

Example: Service A has peak CPU usage of 0.48 cores. With HPA threshold 80%:

CPU Request = 0.48 / 0.6 = 0.8 cores
CPU Limit = 0.8 × 3 = 2.4 cores

Instead of requesting 2 cores like before, now we only need 0.8 cores. A 60% savings with just one service.
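The arithmetic above can be sketched as a small helper. This is only the formula from the text turned into code - the 20% headroom and the 80% HPA threshold are the values the formula already uses, not new tuning advice:

```java
public class ResourceCalculator {
    // Headroom subtracted from the HPA threshold, per the formula above
    static final double HEADROOM = 0.20;

    // Request = Peak Usage / (HPA threshold - 20%)
    public static double request(double peakUsage, double hpaThreshold) {
        return peakUsage / (hpaThreshold - HEADROOM);
    }

    // CPU Limit = Request × 2 to 4 (factor depends on workload)
    public static double cpuLimit(double request, double factor) {
        return request * factor;
    }

    // Memory Limit = Request × 1.2
    public static double memoryLimit(double request) {
        return request * 1.2;
    }

    public static void main(String[] args) {
        double cpuReq = request(0.48, 0.80); // Service A: 0.48 peak, 80% threshold
        System.out.printf("CPU request: %.2f cores%n", cpuReq);
        System.out.printf("CPU limit:   %.2f cores%n", cpuLimit(cpuReq, 3));
    }
}
```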

Right-sizing

Workstream 2: Platform V3 - Building the New Foundation

In parallel with right-sizing, the team began building the V3 platform with Quarkus.

V3 Architecture was designed with the following principles:

Component      V2 (Legacy)             V3 (Modern)
Framework      Vert.x 4.x              Quarkus 3.15.1
Java           17                      21
DI Container   Dagger                  CDI (Jakarta EE)
Async Model    Callbacks               Mutiny (Uni/Multi)
HTTP           Vert.x HTTP Server      JAX-RS (RESTEasy Reactive)
Build          Maven Shade (Fat JAR)   Quarkus Maven + Native

V3 Common Libraries were built completely new:

  • lib_v3-scaffold: Core framework with Task, WorkFlow patterns
  • lib_v3-http-server: REST API with JWT authentication
  • lib_v3-jdbc: Reactive database access
  • lib_v3-redis: Caching with multi-instance support
  • lib_v3-kafka: Event streaming
  • lib_v3-rabbit: RabbitMQ RPC communication

Each library was designed with a Reactive First mindset - all operations are non-blocking, all returns are Uni or Multi.

// lib_v3-jdbc interface
public interface ReactiveJDBCClient {
    <T> Uni<T> querySingle(String query, Class<T> tClass);
    <T> Multi<T> query(String query, Class<T> tClass);
    Uni<Integer> updateWithParams(String query, List<Object> params);
}

// lib_v3-redis interface
public interface ReactiveRedisClient {
    Uni<String> get(String key);
    Uni<Void> setWithTTLSeconds(String key, String value, Long ttlSeconds);
    Uni<Boolean> hset(String key, String field, String value);
}

Expensive Lessons

The implementation didn't go as smoothly as planned. And each difficulty brought a lesson.

Lesson 1: High Memory Services - When Code is the Culprit

During the resource review, the team discovered some services with abnormal memory usage: many pods but still high memory, some even restarting due to memory peaks.

Initially, we thought this was a configuration issue. It wasn't. After investigating, we found the root cause in the code itself - not the infrastructure: patterns that leaked memory or held onto resources longer than necessary.

Lesson: Right-sizing is just the first step. Sometimes you need to invest additional resources to optimize code before optimizing infrastructure.

Lesson 2: Big Bang Migration Isn't Feasible

The original plan was: build the complete V3 platform, then migrate all services.

Reality: main resources must focus on business projects. The team didn't have enough bandwidth to maintain V2, build V3, and migrate simultaneously.

Solution: Soft Migration Strategy

  • New modules: Use V3 framework from the start
  • Critical old modules: Keep the tech stack (Vert.x + Dagger), only migrate to V3 project to sync versions and dependencies
  • Non-critical old modules: Gradually migrate to V3 framework when resources allow

This strategy reduces risk and allows the team to move forward without needing an "all-in" migration.

Lesson 3: GraalVM Native - Difficult but Worthwhile

One of Quarkus's promises is the ability to build native executables with GraalVM. Native code doesn't need a JVM, startup is nearly instant, and memory footprint is extremely small.

But building native isn't simple. The team encountered and overcame many challenges:

Risk                                   Impact   Solution
Reflection issues                      High     Use @RegisterForReflection, configure reflect-config.json
Third-party libraries not compatible   High     Check Quarkus extensions first, fall back to JVM mode if needed
Long build time (10-15 mins)           Medium   CI caching, parallel builds
Learning curve                         Medium   Start with simple modules, create detailed documentation
Regression bugs                        High     Comprehensive testing, phased rollouts
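For the reflection issue, registration can be done in code with @RegisterForReflection, or declaratively. A minimal sketch of a reflect-config.json entry is below - the class name is hypothetical, and the GraalVM reachability-metadata documentation describes the full schema:

```json
[
  {
    "name": "com.example.payment.TransferData",
    "allDeclaredFields": true,
    "allDeclaredMethods": true,
    "allDeclaredConstructors": true
  }
]
```

Each entry tells the native-image compiler to keep reflective access to that class, which would otherwise be stripped by closed-world analysis.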

The team initially struggled because GitLab runners didn't have GraalVM. After DevOps helped install GraalVM on the runners, building native became feasible.

One important tip: don't try to native-ize everything at once. Start with small, stateless modules with minimal dependencies. Once the team is familiar with the pitfalls, move on to more complex modules.

Native Build

Impact: Not Just Promises

Right-sizing Results

After applying the right-sizing formula to all services:

Metric    Original      Saved         Percentage
CPU       419.2 cores   184.8 cores   44.08%
Memory    590,951 MB    22,756 MB     3.85%

44% CPU saved. This number far exceeded our initial 30% target.

What's worth reflecting on: this wasn't from optimizing code or changing architecture. This was just from configuring correctly what we actually need.

Additionally, the team:

  • Built Grafana dashboards monitoring all resource usage
  • Set up alert rules for abnormal thresholds
  • Organized training sessions for developers on Kubernetes resource management

Platform Migration Results

For a complex system with dozens of modules, migration progress following the soft migration strategy:

Stack                  Percentage   Notes
V3 Native (GraalVM)    ~10%         Simple, stateless modules
V3 JVM (Quarkus)       ~15%         More complex modules
V2 (Vert.x + Dagger)   ~75%         Gradually migrating

Results achieved:

  • 100% of new services written on V3 framework
  • Complete documentation for onboarding
  • Phased rollout minimizing risk

Native Code: From 250MB to 25MB

And here's the most exciting part.

The team experimented with building native code for an internal service - one of the first modules running Native in production.

Actual measured results:

Metric           JVM Mode        Native Mode   Improvement
Memory Usage     ~250 MB         ~25 MB        10x
Startup Time     ~5 seconds      ~50 ms        100x
Container Size   ~200 MB         ~50 MB        4x
Pod Ready Time   10-15 seconds   < 1 second    15x

Why do these numbers matter?

With Kubernetes, startup time determines scaling speed. When traffic spikes suddenly, HPA triggers scale-out. With JVM, each new pod needs 10-15 seconds to be ready. With Native, it takes under 1 second. This means the system can react 15 times faster to traffic changes.
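For reference, the scale-out described here is driven by a HorizontalPodAutoscaler targeting CPU utilization. A minimal manifest might look like the following - the service name and replica counts are illustrative, not our actual configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-a
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # the 80% HPA threshold used in the formulas
```

Utilization here is measured against the CPU *request*, which is why getting the request right (the right-sizing formula) directly determines when scaling fires.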

A memory footprint 5-10x smaller means: on the same Kubernetes node, you can schedule more pods. Or use smaller nodes with lower costs.

The service has been running stable in production. This isn't theory - this is reality happening right now.

Results

Regrets and Pride

After nearly a year of implementation, we have some thoughts about this journey.

First regret: All this time, the team had been using resources wastefully without realizing it. Those 0.2% CPU usage numbers had existed for a long time, but no one noticed.

Second regret: Lack of infrastructure knowledge. As backend developers, we write code running on Kubernetes every day, but we didn't understand how Kubernetes works. Didn't understand what request/limit means. Didn't understand what conditions trigger HPA.

First pride: At least we realized in time to put things on the right track. Just one step of looking back, optimizing what we're using, is already a big step forward. 44% CPU saved didn't come from anywhere far - it came from understanding correctly and configuring correctly.

Second pride: Native code actually works. 250MB down to 25MB isn't marketing material - it's measurable reality in production.

What's Next?

The journey isn't over. With the results achieved, the path forward is clear:

1. Expand Native Code Coverage

Starting from simple modules, gradually expanding to more complex ones. Each successful native module is a step forward in performance and cost efficiency.

2. Complete V3 Migration

Continue migrating remaining modules to the V3 framework, ensuring complete:

  • Unit tests and integration tests
  • Fault tolerance (Rate limit, Bulkhead, Circuit breaker)
  • Observability (Health checks, Metrics, Tracing)

3. Maintain Optimization Culture

Right-sizing isn't a one-time task. Regular review of resource usage is needed, with alerts for services using abnormal resources (too high or too low).


Closing Thoughts

The 44% journey taught us one thing: optimization isn't the infrastructure team's job - it's everyone's job.

When developers understand resource management, they write better code. When developers understand container limits, they configure more correctly. When developers understand native compilation, they have one more powerful tool in their arsenal.

And sometimes, the biggest step forward doesn't come from building something new. It comes from looking back and optimizing what you already have.

44% CPU saved. 10x memory reduction with native code. These numbers are proof of a simple truth: when developers learn to save, the whole system benefits.

Keep going, keep growing.


Appendix

Right-sizing Formula Reference

Parameter        Formula                       Example
CPU Request      Peak Usage / 0.6              0.48 / 0.6 = 0.8 cores
Memory Request   Peak Usage / 0.6              1.5GB / 0.6 = 2.5GB
CPU Limit        Request × 2-4                 0.8 × 3 = 2.4 cores
Memory Limit     Request × 1.2                 2.5 × 1.2 = 3GB
Min Pods         Normal Peak / Pod Capacity    30 rps / 10 = 3 pods
Max Pods         Peak Traffic / Pod Capacity   100 rps / 10 = 10 pods

Note: Default HPA threshold is 80%, so divisor = 0.8 - 0.2 = 0.6
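Translated into a container spec, the example values above come out roughly as follows (an illustrative fragment, with GB converted to Mi):

```yaml
resources:
  requests:
    cpu: "800m"       # 0.48 peak / 0.6 = 0.8 cores
    memory: "2560Mi"  # 1.5GB peak / 0.6 = 2.5GB
  limits:
    cpu: "2400m"      # request × 3
    memory: "3072Mi"  # request × 1.2 = 3GB
```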
