We are using WebSockets that hold connections for a long time, possibly weeks.
Apache 2.4.41 with the event MPM sometimes decides to stop a process, for example (1) when load drops, (2) when a user reloads the configuration, or (3) when logrotate reloads the configuration after rotating logs (the Ubuntu default). Once a process is stopping, it never seems to resume accepting connections when capacity is needed again, even if the configuration has not changed (same generation, no configuration reload).
The WebSocket connections prevent the process from stopping. We can see the connections in /server-status, where the threads are marked with G for "Gracefully finishing".
We had a situation where 8 processes were fully loaded and we got 'scoreboard is full', while the remaining 8 processes were 'stopping' and had threads in state 'G' (Gracefully finishing).
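For anyone reproducing this, the status page mentioned above can be exposed with a fragment along these lines. This is a minimal sketch that assumes mod_status is loaded and restricts access to localhost; it is not necessarily our exact configuration.

```apache
# Assumes mod_status is loaded; the path and access rule are illustrative.
<IfModule status_module>
    # Show per-connection details (client, request) for each scoreboard slot
    ExtendedStatus On
    <Location "/server-status">
        SetHandler server-status
        # Only allow requests from the local machine
        Require local
    </Location>
</IfModule>
```

In the scoreboard on that page, each slot of a stopping process that still holds a connection shows up as G.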
Question 1:
Is it true that once Apache decides to stop a process because load has dropped and it is no longer needed as per MinSpareThreads, it cannot change its mind about that process and make it accept connections again? If the only reason our processes get stuck is that they belong to an old generation (due to an Apache reload), a fine solution would be to simply remove logrotate. According to my tests this is true: Apache never resumes accepting connections on a process that is 'stopping' (but confirmation would be appreciated).
Question 2:
Is it possible to add a timeout so that when Apache needs to start a new process (aka server) but cannot because ServerLimit has been reached, it forcefully kills processes that are in the stopping state but still waiting on open connections, thus freeing up slots and force-closing connections only when required?
I should note that we found the GracefulShutdownTimeout setting, but after discovering graceful-stop it's clear that this timeout refers to gracefully stopping the entire Apache server, not a single process (aka server, according to the documentation).
I am aware of the possible solutions of not using logrotate (which causes all processes to enter the stopping state every night) and of changing our WebSockets to reconnect frequently. The latter is not easy in our case, and the former does not resolve the problem of fluctuating load leading to stopping processes.
Our mpm_event configuration looked like this:
<IfModule mpm_event_module>
    StartServers 3
    ServerLimit 16
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadLimit 64
    ThreadsPerChild 25
    MaxRequestWorkers 400
    MaxConnectionsPerChild 0
</IfModule>
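
For reference, these values give ServerLimit x ThreadsPerChild = 16 x 25 = 400, which is exactly MaxRequestWorkers, so the scoreboard has no spare process slots for processes that are still finishing gracefully. A variant with extra slots might look like the sketch below. It assumes, per my reading of the mpm_event documentation for 2.4.24 and later, that MaxRequestWorkers and ThreadsPerChild bound the active processes while ServerLimit also counts processes that are gracefully finishing. The value 32 is an arbitrary, untested illustration.

```apache
<IfModule mpm_event_module>
    StartServers 3
    # Assumption: 32 slots = 16 active processes (400 / 25)
    # plus up to 16 slots for processes stuck in graceful finish.
    ServerLimit 32
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadLimit 64
    ThreadsPerChild 25
    # Unchanged: still caps active worker threads at 400
    MaxRequestWorkers 400
    MaxConnectionsPerChild 0
</IfModule>
```

This would not bound how long a stopping process lingers. It would only leave room for new processes while old ones wait on their WebSocket connections, at the cost of more memory for the extra processes.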