To manage a large number of outbound connections from your K8s cluster to an external service, you may need to change your setup. Each of your 30 nodes runs many small pods that connect to a single external service over TCP, for a total of about 60 connections per node, and these pods are taking a long time to connect.
Potential causes of slow connections:
A large number of connections per node can strain node resources and delay connection establishment. Because each pod opens its own connections, the connection setup cost is paid separately by every pod, which adds overhead.
Yes, the recommended method is a reverse proxy:
Using a reverse proxy is the generally recommended approach for managing a large number of outgoing connections efficiently. It centralizes connection management and reduces overhead on individual pods. Refer to the Radware article on reverse proxies for more details.
- Install a reverse proxy, such as HAProxy or NGINX, on every node as a DaemonSet.
- Configure the pods to connect to the node-local reverse proxy rather than directly to the external service.
- The reverse proxy manages the connection pool to the remote service, which reduces the total number of connections and improves efficiency.
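As a rough sketch of the second step, a pod could look up the node-local proxy from its environment instead of hard-coding the external service address. The names here are assumptions, not part of any standard: `NODE_IP` is a hypothetical variable you would populate yourself through the Downward API (`fieldRef: status.hostIP`), `PROXY_PORT` and the default port 6000 are placeholders, and the proxy is assumed to be reachable on the node's IP (for example via a host port).

```python
import os
import socket

# Hypothetical default; use whatever port your proxy actually listens on.
DEFAULT_PROXY_PORT = 6000


def proxy_address(env=os.environ):
    """Return (host, port) of the node-local proxy, read from environment variables.

    NODE_IP and PROXY_PORT are assumed names chosen for this sketch.
    """
    host = env["NODE_IP"]
    port = int(env.get("PROXY_PORT", DEFAULT_PROXY_PORT))
    return host, port


def connect_via_proxy(timeout=5.0, env=os.environ):
    """Open a TCP connection to the node-local proxy rather than the external service."""
    return socket.create_connection(proxy_address(env), timeout=timeout)
```

With this in place the application code only changes its connection target; the proxy is what holds the connections to the external service.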
An alternative method is connection pooling:
Consider using libraries within your pods that implement connection pooling for the specific service you are connecting to. These libraries can reuse existing connections, reducing the need for frequent new connections. Refer to Michael Aboagye's Stack Overflow blog post on connection pooling for more details.
If you are using connection pooling libraries, ensure they are compatible with your programming language and the service you are connecting to.
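To illustrate the idea behind connection pooling, here is a minimal sketch of a pool for raw TCP sockets. It is not a replacement for a real pooling library for your service: the class name `TCPConnectionPool` and its parameters are made up for this example, it only caches idle connections (it does not cap the total number open), and it does not check whether a cached connection is still alive.

```python
import queue
import socket
from contextlib import contextmanager


class TCPConnectionPool:
    """Keeps up to max_size idle TCP connections to one address for reuse."""

    def __init__(self, host, port, max_size=4, timeout=5.0):
        self._address = (host, port)
        self._timeout = timeout
        # LIFO so the most recently used (least likely stale) connection is reused first.
        self._idle = queue.LifoQueue(maxsize=max_size)

    def acquire(self):
        """Return an idle connection if one exists, otherwise open a new one."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return socket.create_connection(self._address, timeout=self._timeout)

    def release(self, conn):
        """Return a connection to the pool, or close it if the pool is full."""
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            conn.close()

    @contextmanager
    def connection(self):
        """Borrow a connection; it is closed instead of reused if the body raises."""
        conn = self.acquire()
        try:
            yield conn
        except Exception:
            conn.close()
            raise
        else:
            self.release(conn)
```

Usage would look like `with pool.connection() as conn: conn.sendall(...)`. After the first request, later requests skip the TCP handshake because they reuse the cached socket.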
Monitor your cluster's performance and connection times, and analyze metrics related to network traffic and resource usage to gauge how effective the change is.
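If you do not have connection-time metrics yet, a small script run from inside a pod can give you a before/after comparison. This is only a sketch: the function names `time_connect` and `summarize` are invented for this example, and it measures nothing beyond the time to complete a TCP connect.

```python
import socket
import statistics
import time


def time_connect(host, port, timeout=5.0):
    """Return the seconds taken to establish (and then close) one TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


def summarize(samples):
    """Return min, median and max of a non-empty list of connect times."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }
```

For example, collect a few dozen samples with `time_connect` against the external service and against the node-local proxy, then compare the two `summarize` results.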