Watch Out for This Istio Proxy Sidecar Memory Pitfall

Pranay Singhal · Published in Geek Culture · Mar 15, 2021


We encountered many perplexing issues throughout our ongoing service mesh journey with Istio at my company, and many of them left us wishing we had known about the pitfalls in advance. This is one such pitfall that you might encounter when using Istio in large Kubernetes clusters. By sharing my experience with this issue, I hope to help you avoid it on your own Istio journey.

The issue — excessive memory consumption by Istio proxy sidecars

Envoy proxy sidecars are the cornerstone of the Istio service mesh architecture. Envoy proxies perform all of the traffic management functions for the services in the mesh, such as routing, mTLS, circuit breaking, authorization, retries, etc.

When we first deployed Istio in smaller Kubernetes clusters dedicated to Istio experimentation, things worked in textbook style. We learned and experimented with its features, and built up our confidence to move forward. We then decided to deploy Istio in our larger development Kubernetes cluster. This cluster had already been in use for some time, actively used by several teams across multiple lines of business to deploy and test their containerized workloads. The number of unique services deployed in this cluster was on the order of several hundred.

As soon as we deployed our test Istio app in this cluster, we noticed that the proxy sidecars in each pod, which had been consuming an average of 60–70 MB in the smaller cluster, were suddenly consuming 700 MB to 1.2 GB each in the larger cluster! It was the same application, with no changes to its configuration. Why was it suddenly consuming so much memory?

Excessive memory consumption by Envoy proxy sidecars

Quotas of 1 GB per pod were simply not feasible. More importantly, it was clear that something was not quite right, so we decided to investigate.

The root cause

In order to perform its traffic management function, each proxy sidecar needs to be aware of the services ecosystem in the cluster. This information is fed to the proxy sidecars by the “pilot” component of the Istio control plane (pilot is now part of the single “istiod” service in the Istio control plane).

Pilot feeds metadata about deployed cluster services to each proxy sidecar (Source: Istio.io)

Here’s the catch: by default, pilot assumes that each proxy could potentially need to route traffic to any service in the cluster, so it pushes metadata about every service in the cluster to every proxy. Proxies hold this metadata in memory. As a result, the amount of metadata pushed by the Istio control plane (and therefore the amount of data held in memory by each proxy) is directly proportional to the number of services deployed in the cluster. In our case, that number was in the several hundreds, and the consequence was exactly what we observed immediately after deploying the first mesh service in this cluster: sidecars consuming about 1 GB of memory each!

You can use the istioctl tool to find out how much configuration the sidecar proxy in a particular pod is loaded with:

istioctl proxy-config clusters <name-of-pod> -n <namespace-where-pod-is-deployed>

In our case, it was a lot!
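If you want a rough breakdown of where all that config comes from, you can tally the cluster entries per namespace. Here is a small sketch (not from the original article): it assumes you have captured the output of `istioctl proxy-config clusters <pod> -n <ns> -o json`, which in recent Istio versions is a JSON array of Envoy cluster objects whose `name` field looks like `outbound|80||reviews.default.svc.cluster.local`.

```python
import json
from collections import Counter

def summarize_clusters(config_json: str) -> Counter:
    """Count Envoy cluster entries per Kubernetes namespace.

    Assumes `config_json` holds the JSON output of
    `istioctl proxy-config clusters <pod> -n <ns> -o json`:
    a JSON array of cluster objects with a "name" field such as
    "outbound|80||reviews.default.svc.cluster.local".
    """
    clusters = json.loads(config_json)
    counts = Counter()
    for cluster in clusters:
        name = cluster.get("name", "")
        # For service-backed clusters the hostname is the last
        # "|"-separated field; its second dotted label is the namespace.
        host_labels = name.split("|")[-1].split(".")
        if name.endswith("svc.cluster.local") and len(host_labels) >= 5:
            counts[host_labels[1]] += 1
        else:
            # Built-in clusters like "BlackHoleCluster" have no namespace.
            counts["(other)"] += 1
    return counts
```

Running this against a dump from an unscoped sidecar in a busy cluster should show entries for nearly every namespace in the cluster, which is exactly the bloat described above.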

The simple fix

Fortunately, there is a simple way to fix this default behavior that causes the runaway proxy sidecar memory consumption issue in clusters with a large number of services. Using the Sidecar custom resource, you can easily limit the namespaces for which the Istio control plane will push information to your proxy sidecars. In our case, we deployed the following Sidecar resource:

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: istio-system
spec:
  egress:
  - hosts:
    - "./*"
    - "istio-system/*"

The above Sidecar limits the traffic-management scope of the Istio proxy sidecars deployed in the cluster to the services in the proxy’s own namespace, plus the services in the istio-system namespace, where the Istio control plane and the ingress/egress gateway services are deployed. Along with imposing other traffic management restrictions, this Sidecar limits how much config data is pushed to and cached by the proxies.

Note that the “default” Sidecar resource needs to be deployed in the “rootNamespace” as configured via MeshConfig. By default, this is the “istio-system” namespace, but if you have overridden this in your cluster, your default Sidecar would need to be deployed in whichever namespace you have designated as the “rootNamespace” for Istio config.
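For reference, a MeshConfig fragment that overrides the root namespace might look like the sketch below (the namespace name "istio-config" is a hypothetical example, not from the original article); it could live in the istio ConfigMap or under spec.meshConfig of an IstioOperator resource:

```yaml
# Hypothetical MeshConfig fragment: the cluster-wide default Sidecar
# resource would then need to be deployed in "istio-config" instead
# of "istio-system".
meshConfig:
  rootNamespace: istio-config
```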

Once we deployed this Sidecar resource and redeployed our Istio app, we immediately noticed the difference in memory consumption by the proxy sidecars.

Reduction in proxy sidecar memory consumption after namespace restriction via “Sidecar” resource

Memory consumption went down from about 1 GB to around 74 MB per sidecar, a significant improvement. We verified the proxies using the istioctl tool and confirmed that, after the change, they held only namespace-specific config information.

What if the proxy sidecars in a particular namespace need to communicate with services in some other namespaces in the cluster? You can override the default Sidecar resource by deploying finer-tuned Sidecar resources in individual namespaces that expand the egress scope of the proxies in that namespace.
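As an illustration, a namespace-wide Sidecar like the sketch below (the "orders" and "payments" namespace names are hypothetical, not from the original article) overrides the cluster-wide default for all workloads in its namespace, because a Sidecar named "default" with no workloadSelector applies namespace-wide:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: orders          # hypothetical namespace needing wider reach
spec:
  egress:
  - hosts:
    - "./*"                  # services in the orders namespace itself
    - "istio-system/*"       # control plane and gateways
    - "payments/*"           # hypothetical extra namespace these proxies may call
```

Proxies in this namespace now also receive config for the additional namespace, while the rest of the cluster keeps the tighter default scope.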

Conclusion

The characteristics of the Kubernetes cluster in which you deploy Istio have a clear impact on its resource consumption. It is very important to monitor the resource metrics of the various Istio control plane and data plane components, especially when you move service mesh apps into a new cluster or change the architecture and size of your current cluster. The default behavior may not be the right one for you. Fortunately, there are techniques available to adjust it; in this case, the resource consumption of the proxy sidecars.

Check out this short video by Christian Posta (co-author of the upcoming book Istio in Action), which explains the problem and the solution quite well: https://www.youtube.com/watch?v=JcfLUHdntN4
