
I have a gRPC client making a bidirectional streaming call to a server behind an AWS NLB and an nginx ingress controller. The client sometimes fails with the error "close rpc error: code = Internal desc = unexpected EOF".

Here is my setup:

  • Golang gRPC server (1), one pod replica, running on an EKS cluster (in Singapore).
  • The server is exposed through an nginx ingress behind an AWS NLB. The nginx controller pods are deployed with 3 replicas on 3 different on-demand nodes, and the AWS NLB targets only on-demand nodes, not spot nodes.
  • The client (2) is also written in Golang and runs on 3 Digital Ocean droplets, 1 instance per droplet (in Singapore). It connects to the server through a domain pointed at the NLB.
  • The client is also a gRPC server that pushes data by bidirectional streaming to a socket server (3): 3 droplets, 2 instances per droplet, in the same Digital Ocean VPC as (2), over a private connection.

The error is thrown by (2) intermittently, as one of the two errors below:

close rpc error: code = Internal desc = unexpected EOF

close rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: INTERNAL_ERROR

Logs in nginx:

[error] 25#25: *599548 recv() failed (104: Connection reset by peer) while sending to client

I also found that the error only occurs on the (2) instances whose (3) has users connected through socket connections. The other (2) instances, connected to a (3) with no clients, did not throw any error. But when I pointed the domain at those (3) instances to move the users over to them, they started getting the errors too.

I also tried turning proxy_buffering off and on in the nginx ConfigMap, but nothing changed. Here is my nginx config:

proxy_buffering off;
....
location ...
                    client_max_body_size                    0;
                    proxy_connect_timeout                   60s;
                    proxy_send_timeout                      3600s;
                    proxy_read_timeout                      3600s;

                    proxy_buffering                         off;
                    proxy_buffer_size                       64k;
                    proxy_buffers                           4 64k;

                    proxy_max_temp_file_size                1024m;

                    proxy_request_buffering                 on;
                    proxy_http_version                      1.1;

                    proxy_cookie_domain                     off;
                    proxy_cookie_path                       off;

                    proxy_next_upstream                     error timeout;
                    proxy_next_upstream_timeout             0;
                    proxy_next_upstream_tries               3;

                    grpc_pass grpc://upstream_balancer;

                    proxy_redirect                          off;

Has anyone faced this error, or does anyone have an idea what causes it? Thank you!

[Update] I tried connecting to server (1) through a Kubernetes NodePort service, and it worked. So I can now confirm that the problem comes from the nginx ingress. Is there anything wrong with my nginx configuration?

1 Answer


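Have you tried changing grpc_send_timeout and grpc_read_timeout? The ingress-nginx notes on streaming say that streams expected to stay open longer than 60 seconds need these timeouts raised (plus client_body_timeout when the client streams requests). See https://kubernetes.github.io/ingress-nginx/examples/grpc/#notes-on-using-responserequest-streams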

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
    – Dave M
    Commented Nov 13, 2023 at 14:53
  • Yeah, I tried that, but it did not work. Then I started checking whether any client (3) connected to (2) had a large backlog of delayed messages (I measure the buffer in server (2)). I now clear the delayed-message buffer for such a client, and the error no longer happens as frequently. But I'm still confused about why fixing the connection from (3) to (2) resolved the EOF error on the connection from (2) to (1).
    – Tristan
    Commented Nov 14, 2023 at 2:51
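The "clear the delay buffer" approach from the comment above can be sketched as a bounded per-client queue that drops the oldest pending message when a slow client falls behind, so the backlog in (2) cannot grow without limit. This is a minimal sketch under assumptions, not the code used in (2): the boundedQueue type and its methods are hypothetical, it assumes a single producer goroutine, and whether an unbounded backlog is what actually triggers the reset in nginx is not confirmed in this thread.

```go
package main

import "fmt"

// boundedQueue is a hypothetical per-client outbound buffer. When it is
// full, the oldest pending message is dropped so that a slow client cannot
// make the backlog grow without bound.
type boundedQueue struct {
	ch      chan []byte
	dropped int // number of messages discarded so far
}

// newBoundedQueue creates a queue holding at most capacity messages.
// A capacity below 1 is raised to 1 so that push can always make progress.
func newBoundedQueue(capacity int) *boundedQueue {
	if capacity < 1 {
		capacity = 1
	}
	return &boundedQueue{ch: make(chan []byte, capacity)}
}

// push enqueues msg without blocking. It must be called from a single
// producer goroutine (for example, the loop reading the upstream stream).
func (q *boundedQueue) push(msg []byte) {
	for {
		select {
		case q.ch <- msg:
			return
		default:
			// Buffer full: discard the oldest message, then retry the send.
			select {
			case <-q.ch:
				q.dropped++
			default:
				// A consumer drained the queue in the meantime; just retry.
			}
		}
	}
}

func main() {
	q := newBoundedQueue(3)
	for i := 1; i <= 5; i++ {
		q.push([]byte(fmt.Sprintf("msg-%d", i)))
	}
	close(q.ch)
	for m := range q.ch {
		fmt.Println(string(m)) // prints msg-3, msg-4, msg-5
	}
	fmt.Println("dropped:", q.dropped) // prints "dropped: 2"
}
```

Dropping the oldest message keeps the receive loop in (2) from blocking on one slow client; if every message must be delivered, disconnecting the slow client instead would be the alternative.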
