Replacing Ingress-NGINX: Comparing Kubernetes Gateway API Implementations

by Anika Shah - Technology

Beyond Ingress: Navigating the Migration to Kubernetes Gateway API

For years, the Ingress API served as the standard for managing external HTTP/HTTPS traffic in Kubernetes. It was simple, widely adopted, and “just worked” for most teams. But as microservices grew in complexity, the limitations of Ingress—specifically its heavy reliance on implementation-specific annotations and limited feature set—became a bottleneck. When the announcement hit that Ingress-NGINX was being retired, many organizations found themselves at a crossroads: stick with another Ingress controller or leap forward to the Kubernetes Gateway API.


Migrating to the Gateway API isn’t just a version upgrade; it’s a fundamental shift in how networking is handled. By introducing better role separation and a more expressive configuration model, the Gateway API aims to solve the “annotation hell” that plagued early Kubernetes networking. However, moving a production workload requires more than just reading the documentation—it requires rigorous benchmarking and a clear understanding of how different implementations handle real-world stress.
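The role separation is visible in the resource model itself: a platform operator owns the Gateway, while application teams attach HTTPRoutes to it from their own namespaces. A minimal sketch (all names, namespaces, and the gateway class are hypothetical):

```yaml
# Owned by the platform/infrastructure team
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: infra
spec:
  gatewayClassName: istio        # implementation chosen by the operator
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All              # let application namespaces attach routes
---
# Owned by an application team, in its own namespace
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: shop-route
  namespace: shop
spec:
  parentRefs:
    - name: shared-gateway
      namespace: infra
  hostnames: ["shop.example.com"]
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: shop-api
          port: 8080
```

Because routing intent lives in typed fields rather than annotations, the same manifests work across conformant implementations with only the gatewayClassName changing.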

Defining the Selection Criteria

When evaluating a replacement for a retired ingress solution, the goal is to avoid vendor lock-in while ensuring the new system can handle existing traffic patterns. To narrow the field, the selection process focused on three primary requirements:

  • Full Conformance: The implementation had to appear on the list of fully-conformant Gateway API implementations to guarantee a baseline of standard behavior.
  • Cloud Agnostic: Because the infrastructure spans both GCP and Azure, cloud-specific solutions were eliminated to maintain portability.
  • Feature Depth: Analysis of the Gateway API 1.4 feature matrix and third-party benchmarks helped shortlist the top contenders.

This process narrowed the field to three primary Gateway implementations: NGINX Gateway Fabric, Traefik, and Istio. While other options like HAProxy and F5 NGINX were considered, they were dropped due to either stale implementation status at the time of testing or a reliance on implementation-specific resources rather than standard Kubernetes types.

The Testing Framework: Use Cases and Benchmarks

To ensure the new gateway could handle production traffic, the testing phase was split into functional use cases and scalability benchmarks. The team used Claude to analyze existing production YAML files, sorting them into “use case buckets” to ensure every routing quirk was accounted for.


Functional Validation

The team utilized two distinct backends for testing:

  • HTTPBin: Used for introspecting requests and responses, specifically to test dynamic host header overwrites via the /headers endpoint.
  • Custom Go Web Server: Deployed to simulate high-performance responses and adjustable latency, allowing the team to see how the gateway handled a pile-up of active requests.
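The host-header override case from the HTTPBin bucket maps to the standard URLRewrite filter, whose hostname field replaces the Host header before the request reaches the backend. A sketch, with hypothetical resource and service names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-host-rewrite
spec:
  parentRefs:
    - name: shared-gateway
  hostnames: ["test.example.com"]
  rules:
    - filters:
        - type: URLRewrite
          urlRewrite:
            hostname: internal.backend.svc   # overwrites the Host header
      backendRefs:
        - name: httpbin
          port: 80
```

Requesting /headers through the gateway should then show the rewritten Host value in HTTPBin's echoed request headers.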

Scalability and Performance

The performance benchmark targeted 10,000 requests per second (RPS) to provide a comfortable headroom above steady-state traffic. The environment consisted of four replicas of each gateway running on GCP e2-standard-4 nodes, with a K6 test client running on an Azure Standard_DC8as_cc_v5 instance.

The Results: Performance vs. Stability

While all three implementations—Istio, Traefik, and NGINX Gateway Fabric—handled basic use cases, the differences emerged during high-load and configuration-change tests.


1. Feature Depth and Complexity

Istio supported the broadest set of Gateway API features, while Traefik supported the fewest. A notable finding was that some standard features, such as header modification in HTTPRoute, only handle static values; dynamic regex-based rewrites required falling back on implementation-specific extensions. All three offered workable extension mechanisms, though Istio’s filters were syntactically more complex than those of NGINX or Traefik.
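The static-value limitation shows up in the standard RequestHeaderModifier filter, which can only set, add, or remove fixed header values; anything derived from the incoming request (such as a regex capture) needs an implementation-specific extension. A sketch of the standard filter, with hypothetical names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: header-mods
spec:
  parentRefs:
    - name: shared-gateway
  rules:
    - filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            set:
              - name: X-Env
                value: production    # static values only; no regex or templating
            remove: ["X-Debug"]
      backendRefs:
        - name: app
          port: 8080
```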

2. Route Convergence

A critical test involved applying 5,000 HTTPRoutes concurrently. While NGINX and Istio converged on these routes in approximately 42 seconds, Traefik took significantly longer, exceeding a five-minute timeout before finally loading all routes.

3. The “Update Spike” Phenomenon

The most revealing test occurred when modifying routes while under a 10k RPS load. While Istio and Traefik remained stable, NGINX experienced significant latency spikes whenever a single HTTPRoute was updated (even with only 1,000 routes configured). This indicated a potential stability issue during frequent configuration changes.

Final Comparison Summary

Metric            | NGINX Gateway Fabric | Traefik              | Istio
Feature Set       | Moderate             | Basic                | Advanced
Route Convergence | Fast (~42 s)         | Slow (timeout risk)  | Fast (~42 s)
Update Stability  | Latency spikes       | Stable               | Stable
Configuration     | Standard             | Requires entrypoints | Complex syntax

The Verdict: Why Istio Won

After exhaustive testing, the decision was made to migrate to Istio. While any of the three could have functioned, Istio proved to be the most solid option due to its stability and performance across all benchmarks. Its ability to handle route updates without inducing latency spikes and its comprehensive feature set made it the safest bet for a high-traffic production environment.


For teams considering a similar move, the takeaway is clear: don’t rely solely on feature matrices. Real-world performance—especially how a gateway handles configuration updates under load—is the most critical factor in ensuring a seamless migration from Ingress to the Gateway API.

Key Takeaways for Architects

  • Avoid “Annotation Bloat”: Use the migration to Gateway API to move away from implementation-specific annotations and toward standardized HTTPRoute resources.
  • Test Convergence: If your environment requires thousands of routes, test how long it takes for the gateway to converge after a bulk update.
  • Monitor Update Latency: Ensure that pushing a new routing rule doesn’t cause a temporary latency spike for existing users.
  • Validate Extensions: Remember that standard HTTPRoute features may be limited; identify where you’ll need implementation-specific extensions (like regex header mods) early in the process.
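As a concrete instance of the first takeaway, a path rewrite that previously lived in an ingress-nginx annotation maps to a portable HTTPRoute filter (resource names below are hypothetical):

```yaml
# Before: implementation-specific annotation on an Ingress
# metadata:
#   annotations:
#     nginx.ingress.kubernetes.io/rewrite-target: /
#
# After: the same intent expressed as a standard HTTPRoute filter
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: legacy-app
spec:
  parentRefs:
    - name: shared-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /legacy
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: legacy-app
          port: 8080
```

Because the rewrite is a typed field rather than an annotation string, any conformant implementation must honor it identically.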
