
I'm inheriting this old Django project hosted on an EC2 instance. It used to run on Heroku and used a Proximo proxy in front of gunicorn. Now it just runs a systemd script with the following:

ExecStart=/bin/sh -c "bin/proximo gunicorn myApp.wsgi --config config/gunicorn.cfg"

Now, for the most part, this system seems to be stable, but every now and then, we'll get 503 errors on the site, even when the instance is duplicated and behind a load balancer.

The only related errors I can find are in the Apache error log, around the time the site goes down:

[Wed Dec 27 18:01:20.487596 2023] [proxy:error] [pid 5912:tid 139805448980224] (2)No such file or directory: AH02454: HTTP: attempt to connect to Unix domain socket /var/run/rpc/xmlrpc.sock (localhost) failed
[Wed Dec 27 18:01:20.487636 2023] [proxy_http:error] [pid 5912:tid 139805448980224] [client 172.31.25.252:41564] AH01114: HTTP: failed to make connection to backend: httpd-UDS

So, naturally, I thought this was something to do with the proxy and simply removed bin/proximo before gunicorn in the script. However, the site has gone down multiple times since then, and we can see a persistently unhealthy target in the load balancer. I guess it's trying to restart over and over? It eventually resolves itself and the site comes back online, but this is a real head-scratcher to diagnose. I've tried logging all requests to CloudWatch to see if anything stands out, but nothing does. It looks like business as usual, and there are no 500 errors in the Django layer of the application. It all seems to happen around the load balancer level.

One idea I did have was to install XMLRPC on the server. This seems to be a PHP library and I'm not sure what the consequences of installing it would be, but it's one idea I have, although it doesn't really explain why the error is occurring.

One side note which I'm not sure is important is that there was this old Procfile that was being used for the Heroku instance which still had the proximo command to run the server. I recently removed it just to rule it out but I don't understand how a Procfile would cause something like this on our EC2 instance as I don't think anything is pointing to it.

Anyway, if anyone out there has any ideas and can clue me in on something, please share! I have no concrete answers for the client other than that we're monitoring requests and the server. This IS an old Django instance (1.8, I believe), so things are just holding on, and I can't really migrate to a newer Django + AWS environment without basically rewriting huge parts of the application (which is currently in progress).

Thanks for reading.

1 Answer


I've found that this error can be triggered by a security scanner (in my case, scans from Zanshin.TenchiSecurity.com), likely testing for CVE-2021-40438 and, as a side effect, also inducing the problems that the vulnerability causes. I found that Zanshin sends a request like "POST /service/?unix:/../../../../var/run/rpc/xmlrpc.sock|http://EQKk/wsrpc HTTP/1.1". The trace logs show that such a request causes the proxy worker to rewrite the URL to use the Unix domain socket:

[Tue Apr 09 10:05:45.818935 2024] [proxy:trace2] [pid 13231] proxy_util.c(2098): [client 127.0.0.1:44300] *: rewrite of url due to UDS(/var/run/rpc/xmlrpc.sock): http://EQKk/wsrpc (proxy:http://EQKk/wsrpc)
[Tue Apr 09 10:05:45.818937 2024] [proxy:debug] [pid 13231] mod_proxy.c(1264): [client 127.0.0.1:44300] AH01143: Running scheme http handler (attempt 0)
[Tue Apr 09 10:05:45.818941 2024] [proxy_fcgi:debug] [pid 13231] mod_proxy_fcgi.c(1021): [client 127.0.0.1:44300] AH01076: url: http://EQKk/wsrpc proxyname: (null) proxyport: 0
[Tue Apr 09 10:05:45.818944 2024] [proxy_fcgi:debug] [pid 13231] mod_proxy_fcgi.c(1024): [client 127.0.0.1:44300] AH01077: declining URL http://EQKk/wsrpc
[Tue Apr 09 10:05:45.818947 2024] [proxy_wstunnel:debug] [pid 13231] mod_proxy_wstunnel.c(322): [client 127.0.0.1:44300] AH02450: declining URL http://EQKk/wsrpc
[Tue Apr 09 10:05:45.818952 2024] [proxy_http:trace1] [pid 13231] mod_proxy_http.c(1971): [client 127.0.0.1:44300] HTTP: serving URL http://EQKk/wsrpc
[Tue Apr 09 10:05:45.818963 2024] [proxy:debug] [pid 13231] proxy_util.c(2313): AH00942: HTTP: has acquired connection for (127.0.0.1)
[Tue Apr 09 10:05:45.818968 2024] [proxy:debug] [pid 13231] proxy_util.c(2366): [client 127.0.0.1:44300] AH00944: connecting http://EQKk/wsrpc to EQKk:80
[Tue Apr 09 10:05:45.818973 2024] [proxy:debug] [pid 13231] proxy_util.c(2403): [client 127.0.0.1:44300] AH02545: http: has determined UDS as /var/run/rpc/xmlrpc.sock
[Tue Apr 09 10:05:45.819029 2024] [proxy:debug] [pid 13231] proxy_util.c(2575): [client 127.0.0.1:44300] AH00947: connected /wsrpc to httpd-UDS:0
[Tue Apr 09 10:05:45.819088 2024] [proxy:error] [pid 13231] (2)No such file or directory: AH02454: HTTP: attempt to connect to Unix domain socket /var/run/rpc/xmlrpc.sock (127.0.0.1) failed

The proxy worker in this child process is then left in a bad state, so later requests handled by the same child process also fail, because the worker keeps trying to use that Unix domain socket:

[Tue Apr 09 10:07:22.634729 2024] [proxy_wstunnel:debug] [pid 13231] mod_proxy_wstunnel.c(322): [client 127.0.0.1:50022] AH02450: declining URL http://127.0.0.1:8080/favicon.ico, referer: http://localhost/test/
[Tue Apr 09 10:07:22.634733 2024] [proxy_http:trace1] [pid 13231] mod_proxy_http.c(1971): [client 127.0.0.1:50022] HTTP: serving URL http://127.0.0.1:8080/favicon.ico, referer: http://localhost/test/
[Tue Apr 09 10:07:22.634736 2024] [proxy:debug] [pid 13231] proxy_util.c(2313): AH00942: HTTP: has acquired connection for (127.0.0.1)
[Tue Apr 09 10:07:22.634740 2024] [proxy:debug] [pid 13231] proxy_util.c(2366): [client 127.0.0.1:50022] AH00944: connecting http://127.0.0.1:8080/favicon.ico to 127.0.0.1:8080, referer: http://localhost/test/
[Tue Apr 09 10:07:22.745178 2024] [proxy:error] [pid 13231] [client 127.0.0.1:50022] AH00898: DNS lookup failure for: httpd-UDS returned by /favicon.ico, referer: http://localhost/test/
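
To check whether your own outages follow this pattern, a rough sketch like the one below could group the relevant error-log entries by httpd child pid. The helper name `poisoned_children` is my own, the sample lines are copied from the logs above, and it assumes the error-log format shown there; treat it as a starting point rather than a finished tool.

```python
import re
from collections import defaultdict

# Error codes seen in the logs above: AH02454 (UDS connect failure) and
# AH00898 (DNS lookup failure for httpd-UDS).
MARKERS = ("AH02454", "AH00898")
# Matches "[pid 13231]" as well as "[pid 5912:tid 139805448980224]".
PID_RE = re.compile(r"\[pid (\d+)(?::tid \d+)?\]")


def poisoned_children(lines):
    """Return {pid: {marker: count}} for log lines containing a marker.

    Hypothetical helper for illustration, not part of any Apache tooling.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = PID_RE.search(line)
        if not match:
            continue
        for marker in MARKERS:
            if marker in line:
                counts[match.group(1)][marker] += 1
    return {pid: dict(per_marker) for pid, per_marker in counts.items()}


sample = [
    "[Tue Apr 09 10:05:45.819088 2024] [proxy:error] [pid 13231] (2)No such file or directory: AH02454: HTTP: attempt to connect to Unix domain socket /var/run/rpc/xmlrpc.sock (127.0.0.1) failed",
    "[Tue Apr 09 10:07:22.745178 2024] [proxy:error] [pid 13231] [client 127.0.0.1:50022] AH00898: DNS lookup failure for: httpd-UDS returned by /favicon.ico, referer: http://localhost/test/",
    "[Tue Apr 09 10:07:22.634736 2024] [proxy:debug] [pid 13231] proxy_util.c(2313): AH00942: HTTP: has acquired connection for (127.0.0.1)",
]
result = poisoned_children(sample)
print(result)
```

If the same pid shows an AH02454 entry followed by AH00898 entries for unrelated URLs, that would be consistent with the child process being stuck as described. To run it on a real log, pass an open file object instead of `sample`.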

So I presume you may be running an older httpd version (2.4.48 or earlier) with a mod_proxy configuration that is susceptible to CVE-2021-40438. Updating httpd is the important fix, but on a susceptible version you could use rewrite rules like the ones below to reject requests that try to inject the Unix domain socket through the URI or query string:

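RewriteEngine On
# Reject requests whose URL path contains "unix:"
RewriteRule ^.*unix:.* - [F,L]
# Reject requests whose query string contains "unix:"
RewriteCond %{QUERY_STRING} ^.*unix:.*
RewriteRule .* - [F,L]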
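
To see whether such probes line up with the times the site went down, something like the sketch below could scan access-log lines for the injection marker. The function names and the sample log lines are made up for illustration (only the request path in the first sample comes from the scan described above). Decoding percent-escapes and ignoring case are cautious choices of mine, so expect the occasional false positive.

```python
from urllib.parse import unquote


def is_uds_probe(line):
    """True if the line contains "unix:" once percent-escapes are decoded.

    Hypothetical helper. Lowercasing is a cautious choice made here; the
    rewrite rules above match case-sensitively.
    """
    return "unix:" in unquote(line).lower()


def find_uds_probes(lines):
    """Return the lines (without trailing newline) that look like UDS probes."""
    return [line.rstrip("\n") for line in lines if is_uds_probe(line)]


# Made-up access-log lines; client addresses, timestamps, status codes and
# sizes are placeholders.
sample = [
    '198.51.100.7 - - [09/Apr/2024:10:05:45 +0000] "POST /service/?unix:/../../../../var/run/rpc/xmlrpc.sock|http://EQKk/wsrpc HTTP/1.1" 503 299',
    '198.51.100.8 - - [09/Apr/2024:10:07:22 +0000] "GET /favicon.ico HTTP/1.1" 200 1150',
]
probes = find_uds_probes(sample)
for line in probes:
    print(line)
```

To run it against a real log, pass an open file object for your access log instead of `sample`, then compare the timestamps of any hits with the AH02454 entries in the error log.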
