Cloud Native Hybrid Networking: Elevating the Network for the Application Layer
Table of Contents
- Cloud Native Hybrid Networking: Elevating the Network for the Application Layer
- The Tragedy of Networking: Why Knowing “Who” is Harder Than Knowing “Where”
- The Perils of the “Big Proxy” Architecture
- The Future of Policy Enforcement: From Infrastructure to Identity
- The Problem of Disconnected Security Policies & the Rise of Policy-as-Code
Louis Ryan
We’re going to talk about cloud native hybrid networking – or whatever that actually means. I don’t think anybody really knows exactly what it means. I want to share patterns I’ve seen, where people are coming from, what they’re trying to get to, and the struggles they have. Maybe with a little philosophical rant thrown in for entertainment.
Let me give you some background. I ran Google’s API management platform for about a decade and helped create gRPC in open source. I also work on Istio, an open-source project in the cloud native hybrid networking/service mesh space. Currently, I work for Solo. But the most important point is: I am not a networking person.
My background is in applications, APIs, and the communication patterns between those things. I’m not about configuring routers or firewalls, though I’ve done my share of that in the past. I don’t consider myself a networking infrastructure person. However, I do have opinions about what networks can and should be doing, and I’ve been working on that for a long time. That’s what I’ll focus on today.
Outline
If I’m not a networking guy, why am I here? What do I care about? I believe networks need to evolve in three key ways. First, they need to elevate their functionality. Second, they should compose. And third, those compositions should be repeatable.
Networks have gotten incredibly good at shoving bits around at incredible speeds over the last 50-odd years. But we haven’t improved much in other areas. Networks and their abstractions are a necessity to the application, yet not a lot is going into making the network more useful to the application.
Things are being done, but often outside the network itself. We’ll get into that. The goal is to build higher-level systems and capabilities. These patterns should be repeatable. If I do something in one place, I should be able to do the same thing elsewhere. If I have to do something different in two different places, it drives up costs – in people, skills, compensating systems, and potentially even different systems to achieve the same effect. That’s not ideal. We want consistency, especially in a hybrid universe with two clouds and an on-premise system, particularly when evolving a legacy setup.
Elevate
Let’s start with elevation: the network needs to be more useful to the application, not the other way around. Too much time is spent in networking, or application development, compensating for the limitations of the network. The network is infrastructure. Its job is to be more useful to the application layer, not just to network admins. I don’t think enough effort is going into this.
If you run operations or make buying decisions, consider this: when you acquire something, does it provide utility up to the application layer? After all, the application layer is what delivers value to your business.
Think about the big problems. This isn’t necessarily a jab at F5, though maybe it is a little. It’s more about the concept. The IP address dominates how people think about networking. How many do I have? How do I get one? Who do I have to ask? What does it mean when I don’t have one?
The Tragedy of Networking: Why Knowing “Who” is Harder Than Knowing “Where”
The story of Romeo and Juliet isn’t just a tale of star-crossed lovers; it’s a surprisingly apt metaphor for many production system outages. Often, when things go wrong, we focus on the immediate symptom (a firewall rule, a DNS issue) without actually going back and looking at what the real problem was. That is not an uncommon situation in production systems. I have had this happen to me, with inadvertent dependencies causing outages, including a really big one about 15 years ago at Google that took out every public-facing API. The post-mortem was a little better than the one at the end of Romeo and Juliet, but this is roughly the play.
I said it was a tragedy about DNS. We all remember this lovely line, and this is what happens in tied networks. Wherefore art thou Romeo? Romeo is at this IP address. We’re talking to a name. Names are good. Names are an abstraction of the infrastructure. It’s what we want. The problem is that DNS is asymmetric. When Juliet calls Romeo, how does Romeo know that Juliet is calling him? He’s listening on a socket, or on a balcony in this case, or below a balcony. He’s going to use an API to try to figure out who’s calling him. The problem is that this API returns an IP address. It does not say Juliet. It could be anybody talking to Romeo, as far as he’s concerned. When we think about security, we want to know who’s talking to us. Who art thou?
If the answer is an IP address, we don’t know. Not really. Not in modern networking. I see people who go, I’m just going to reverse look up the IP on my corporate network to figure out who this person is. They may be across a NAT, or you may have split-horizon DNS, or all manner of things that prevent you from doing this. I’m sure it will come as a shock to nobody, but there is no RFC, standard, or anything else in networking today that, when I call an API on a socket and ask who my peer is, will reliably give me a name. Fifty years doing the same thing, and such an API does not exist in POSIX. Why? We solved so many other problems in networking, but we haven’t solved that one. This is the fabric, the very basis of modern application networking, the socket API, and yet we can’t do that one thing. Hard problem.
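To make the socket point concrete, here is a minimal Python sketch (not from the talk) of what a listener can learn about its caller. It uses only the standard `socket` module over loopback; the function name `observe_peer` is made up for this illustration.

```python
import socket


def observe_peer():
    """Connect a client to a local listener and return what the server sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    try:
        # getpeername() is the "who is calling me?" API: it answers with an
        # address and a port, never with a name or a verified identity.
        return conn.getpeername(), client.getsockname()
    finally:
        conn.close()
        client.close()
        listener.close()


peer, client_addr = observe_peer()
print(peer)  # an (ip, port) tuple; the port is ephemeral and varies per run
```

The server side learns where the call came from, not who made it. Anything beyond that has to be layered on top of the socket.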
Going back and looking at the firewall rule, this is what they had in production. Everything in the Capulet network can’t talk to the Montagues, and Montagues can’t talk to Capulets. Looks good. Except those four stars are doing a lot of heavy lifting here: at runtime, in a firewall system, they turn into a list of IP addresses. There are lots of ways things can be talking that don’t use the network the way you would expect. You have a ton of IP churn, which makes those rules unstable. You could be using a serverless system like Lambda where they don’t have IP addresses. They could be going through messaging tunnels, all kinds of messaging tunnels. Squid proxies here, there, everywhere. You could have the right code but run it in the wrong place. Maybe I accidentally run Romeo in Juliet’s network. Plain old misconfiguration. We’re all familiar with that one. Just a ton of complexity. It’s really hard to maintain these networks and keep the rules up to date. This could be thousands of rules, tens of thousands of rules.
My personal favorite, and the most likely culprit in most cases, is that the policy was right once upon a time, but it is not right now. It’s really hard to know that, because the policy involves translation. That translation depends on an understanding of the infrastructure as it existed at the time the policy was written. If the infrastructure changes and those assumptions become invalid, the policy has rotted. What’s the solution? Maybe we should give things identities. Imagine what networking and network security would be like if everything that ran on a network had an identity that was verifiable and provable by everything it talked to, and vice versa. Think about it for a second. The entire firewall industry would be different. How applications and authorization policies are written would be different. Yet we haven’t done this in networking. It’s shocking.
It’s not like the universe is sat still. People have built solutions to these things, they just haven’t built them into the network in a way that makes it easy for the applications to consume it. The networking layer doesn’t think it’s its responsibility to do this. I think it is. We have PKI, X5
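As a rough illustration of policy rot (a sketch under assumptions, not the production system described above), the Python below expands a name-pattern deny rule into IP pairs at one point in time and then lets a workload’s address change. The workload names, addresses, and helper functions are all hypothetical, and the identity check simply assumes the names have already been verified by some other mechanism.

```python
from fnmatch import fnmatch


def compile_deny_rule(src_pattern, dst_pattern, inventory):
    """Expand name patterns into the IP pairs a firewall would match on."""
    srcs = {ip for name, ip in inventory.items() if fnmatch(name, src_pattern)}
    dsts = {ip for name, ip in inventory.items() if fnmatch(name, dst_pattern)}
    return {(s, d) for s in srcs for d in dsts}


def ip_rule_blocks(compiled, src_ip, dst_ip):
    """Evaluate the compiled (snapshot) rule against the addresses on the wire."""
    return (src_ip, dst_ip) in compiled


def identity_rule_blocks(src_pattern, dst_pattern, src_name, dst_name):
    """Evaluate the same intent against names (assumed already verified)."""
    return fnmatch(src_name, src_pattern) and fnmatch(dst_name, dst_pattern)


# Hypothetical inventory at the time the policy was written.
inventory = {"montague/romeo": "10.0.1.5", "capulet/juliet": "10.0.2.7"}
compiled = compile_deny_rule("montague/*", "capulet/*", inventory)
blocked_at_first = ip_rule_blocks(compiled, "10.0.1.5", "10.0.2.7")

# IP churn: romeo is rescheduled and comes back with a new address.
inventory["montague/romeo"] = "10.0.1.99"

blocked_after_churn = ip_rule_blocks(
    compiled, inventory["montague/romeo"], inventory["capulet/juliet"]
)
blocked_by_identity = identity_rule_blocks(
    "montague/*", "capulet/*", "montague/romeo", "capulet/juliet"
)
print(blocked_at_first, blocked_after_churn, blocked_by_identity)  # True False True
```

The written policy never changed, yet the compiled form silently stopped enforcing it once the infrastructure moved. The name-based check keeps working only because it does not depend on the snapshot.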
The Perils of the “Big Proxy” Architecture
A common architectural pattern sees organizations funneling all network traffic (ingress and internal, though often not egress) through a central proxy, coupled with a comprehensive policy store. This approach offers the allure of complete visibility and control, satisfying demands for robust security and governance. All traffic is inspected, authenticated, authorized, and subject to quotas, rate limits, and auditing.
However, this “big proxy” model isn’t without its drawbacks. These large proxies represent single points of failure (SPOFs), and outages can be notable. This is often exacerbated by technological choices, like hosting the proxy within a Virtual Private Cloud (VPC) and routing on-premise traffic through the cloud and back, leading to increased costs. Organizations frequently leverage powerful, and expensive, API management products and load balancers, often functional overkill for the policies being enforced, resulting in disproportionate expenses per request.
Furthermore, this architecture can inadvertently shape organizational structure, giving rise to dedicated “big proxy” and “big policy store” teams. Interaction with these teams often devolves into a ticket-based system, mirroring the inefficiencies of conventional network and firewall operations, but potentially with added complexity.
This lack of composability becomes problematic when simple tasks require complex integration with the monolithic proxy. An engineer needing to serve 5 terabytes of data faces a significant impedance mismatch between their goal and the perceived engineering effort required. This friction breeds organizational resistance and incentivizes workarounds, potentially bypassing internal controls and creating entrenched dependencies that are difficult to address later. Ultimately, this creates constant backpressure against controls and a persistent struggle for organizations.
The Future of Policy Enforcement: From Infrastructure to Identity
The need for robust policy controls is exploding, driven by new challenges like managing AI consumption and the ever-increasing complexity of modern infrastructure. Organizations are grappling with a long list of controls – from cost management for AI tokens to PCI compliance and API management – and the desire to apply these consistently across hybrid and multi-cloud environments.
Currently, achieving this consistency often relies on powerful but complex systems like Envoy proxy. While capable, these tools require significant effort to configure and maintain, a job that often falls to infrastructure teams acting as “human compilers” translating organizational policies into infrastructure-specific rules. This manual process is prone to errors and struggles to keep pace with evolving business needs.
The ideal solution shifts the focus from how policy is enforced (infrastructure) to who it applies to (identity). Identity-oriented policy roots controls in the structure of the organization – “Louis at Solo reports to the CEO” – rather than technical details like IP addresses or VPN connections. This approach allows policies to be defined in business terms, making them easier to understand, audit, and adapt as the organization evolves.
This requires a new pattern: expressing policy in organizational terms and then automatically compiling it into the underlying infrastructure. Currently, this compilation is done manually, but the future lies in automation. Leveraging identity as a fundamental network feature bridges the gap between policy definition and enforcement, streamlining the process and improving consistency.
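One way to picture “express policy in organizational terms, then compile it” is the small Python sketch below. It is only an illustration under assumptions: the directory entries, team names, service name, and the `OrgPolicy` and `compile_allowlists` helpers are all hypothetical, and a real system would compile to whatever its enforcement points actually consume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str  # an identity the network is assumed to be able to verify
    team: str  # where that identity sits in the organization


@dataclass(frozen=True)
class OrgPolicy:
    team: str     # who, in organizational terms
    service: str  # what they may reach


def compile_allowlists(policies, directory):
    """Turn organizational policies into per-service sets of allowed identities."""
    allow = {}
    for policy in policies:
        members = {p.name for p in directory if p.team == policy.team}
        allow.setdefault(policy.service, set()).update(members)
    return allow


def is_allowed(allow, identity, service):
    return identity in allow.get(service, set())


policies = [OrgPolicy(team="finance", service="billing-api")]
directory = [Principal("louis@solo", "engineering"), Principal("sam@solo", "finance")]

allow = compile_allowlists(policies, directory)
before_move = is_allowed(allow, "louis@solo", "billing-api")

# The organization changes; the policy text does not. Recompiling picks it up.
directory = [Principal("louis@solo", "finance"), Principal("sam@solo", "finance")]
allow = compile_allowlists(policies, directory)
after_move = is_allowed(allow, "louis@solo", "billing-api")
print(before_move, after_move)  # False True
```

The policy line stays the same across the reorganization. Only the compiled output changes, which is the part a machine rather than a “human compiler” should own.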
However, a critical consideration is avoiding Single Points of Failure (SPOFs). Centralized policy enforcement, while tempting for its simplicity, creates a significant risk. A single outage can bring down entire systems, as demonstrated by a major insurer impacted during a hurricane. The key is to distribute policy enforcement throughout the network, aligning failure domains with individual application failures. This ensures that the failure of one component doesn’t cascade into a global disruption.
The Problem of Disconnected Security Policies & the Rise of Policy-as-Code
A significant challenge in enterprise security is the disconnect between different security mechanisms (firewall rules, TLS/JWT configurations for application communication, and authorization policies) and the resulting fragility of the overall security posture. The issues and the proposed solution break down as follows:
The Current State: Siloed Security
* Independent Policy Creation: Organizations often treat firewall rules, TLS/JWT configurations, and application authorization policies as entirely separate entities. These are written by different teams, at different times, and with little to no coordination.
* Lack of Holistic View: This leads to a situation where no single person or team has a comprehensive understanding of how these policies interact. Dependencies between them are implied but not explicitly defined or validated.
* Broad Firewall Rules & Boundary-Focused Security: Traditional firewall rules tend to be overly permissive, acting as broad “boundary controls” (e.g., preventing communication between entire VPCs) rather than granular, context-aware access controls. This reflects a lack of Zero Trust principles in many organizations’ networking approaches.
* Policy Rot: As these policies are created and maintained in isolation, they become outdated and inconsistent over time (“policy rot”). This is a major contributor to security breaches and lateral movement attacks. The lack of a central, reasoned approach makes updates and maintenance difficult.
The Core Problem: The lack of a unified, consistent way to express security relationships between systems.
The Proposed Solution: Policy-as-Code
* Centralized Policy Language: The key idea is to define security policies in a dedicated, high-level policy language. This language would describe the relationships organizations want to enforce between systems, rather than focusing on the specific implementation details.
* Compilation to Infrastructure: This policy language would then be used to automatically generate the necessary configurations for various security systems:
  * Firewall Rules: The policy language could compile down to specific firewall rules (e.g., Palo Alto Networks).
  * Authorization Policies: It could also generate authorization policies for API management systems or other access control mechanisms.
* Sustainability & Reduced Risk: By centralizing policy definition, organizations can:
  * Reduce Policy Rot: Changes are made in one place (the policy language) and automatically propagated to all relevant systems.
  * Improve Security: A consistent, well-defined policy reduces the risk of misconfigurations and vulnerabilities.
  * Increase Agility: Faster policy updates and deployments.
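The list above can be sketched as a toy compiler with one source of truth and two outputs. This is a minimal illustration under assumptions: the `Relationship` type, the service names, the addresses, and both output formats are invented for the example and do not correspond to any vendor’s rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str       # logical name of the calling system
    destination: str  # logical name of the system being called
    port: int
    action: str       # "allow" or "deny"


def to_firewall_rules(rel, inventory):
    """Render a relationship as generic rule lines, one per address pair."""
    return [
        f"{rel.action} tcp {src} -> {dst}:{rel.port}"
        for src in sorted(inventory[rel.source])
        for dst in sorted(inventory[rel.destination])
    ]


def to_authz_policy(rel):
    """Render the same relationship as an application-level authorization entry."""
    return {
        "service": rel.destination,
        "principals": [rel.source],
        "port": rel.port,
        "effect": rel.action,
    }


# One statement of intent, written once.
rel = Relationship(source="web", destination="orders", port=8443, action="allow")

# Hypothetical address inventory, needed only by the firewall target.
inventory = {"web": ["10.0.1.10", "10.0.1.11"], "orders": ["10.0.2.20"]}

firewall_rules = to_firewall_rules(rel, inventory)
authz_policy = to_authz_policy(rel)
print(firewall_rules)
print(authz_policy)
```

Because both outputs derive from the same `Relationship`, a change to the intent regenerates the firewall rules and the authorization entry together, so the two cannot drift apart the way independently written policies do.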
Real-World Example:
A bank in Australia has implemented this approach, using a custom policy language to generate Palo Alto firewall rules multiple times per day. While the speed of deployment is notable (and potentially carries its own risks), the core benefit is the sustainability and consistency of their security policies.
In essence, the argument is for shifting from a configuration-centric approach to security to a policy-centric approach, leveraging automation to enforce those policies across the entire infrastructure. This is often referred to as “Policy-as-Code.”