I am facing an issue where a gRPC client making a bidirectional streaming call to a server behind an AWS NLB and an nginx ingress controller sometimes throws the error "close rpc error: code = Internal desc = unexpected EOF".
Here is my setup:
- A Golang gRPC server pod (1), 1 replica, running on an EKS cluster (in Singapore).
- The server is exposed through an nginx ingress behind an AWS NLB (the nginx controller pods are deployed with 3 replicas on 3 different on-demand nodes, and the AWS NLB only targets on-demand nodes, not spot nodes).
- The client (2) is also written in Golang and runs on 3 Digital Ocean droplets, 1 instance per droplet (in Singapore). It connects to the server through a domain pointed at the NLB.
- The client is also a gRPC server that pushes data to a Socket server (3) by bidirectional streaming. (3) runs in the same Digital Ocean VPC as (2), over a private connection, on 3 droplets with 2 instances per droplet.
The client (2) occasionally fails with one of the two errors below:
close rpc error: code = Internal desc = unexpected EOF
close rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR
Logs in nginx:
[error] 25#25: *599548 recv() failed (104: Connection reset by peer) while sending to client
I also found that the error only comes from the (2) instances connected to (3) instances that have users connected over socket connections. The other (2) instances, connected to (3) instances with no clients connected, did not throw any error. However, when I pointed the domain at those (3) instances to move the clients over to them, they started getting the errors as well.
I also tried both disabling and enabling proxy_buffering in the nginx ConfigMap, but it made no difference. Here is my nginx config:
proxy_buffering off;
....
location ...
client_max_body_size 0;
proxy_connect_timeout 60s;
proxy_send_timeout 3600s;
proxy_read_timeout 3600s;
proxy_buffering off;
proxy_buffer_size 64k;
proxy_buffers 4 64k;
proxy_max_temp_file_size 1024m;
proxy_request_buffering on;
proxy_http_version 1.1;
proxy_cookie_domain off;
proxy_cookie_path off;
proxy_next_upstream error timeout;
proxy_next_upstream_timeout 0;
proxy_next_upstream_tries 3;
grpc_pass grpc://upstream_balancer;
proxy_redirect off;
Has anyone faced this error, or does anyone have an idea what causes it? Thank you!
[Update] I tried connecting to server (1) through a Kubernetes NodePort service instead, and it worked. So I can now confirm that the problem comes from the nginx ingress. Is there anything wrong with my nginx configuration?