
I'll try to keep this brief, but I have a lot of code examples to show. Please let me know if you need more context!

We recently updated our Flask server to handle WebSocket connections using Flask-SocketIO. There was a major performance hit (we use k6 to load test the prod server, and have CloudWatch logs in Elasticsearch for individual APIs). That made sense, because with Flask-SocketIO each Gunicorn server can only use 1 worker (we used 4 previously).

To solve this, we adjusted the nginx configuration to load balance across multiple Socket.IO servers. This helped, and overall performance certainly improved. But certain pages now take anywhere from 15 to 45 seconds to load. This happens on pages that hit a large number of APIs (50+), and I'm guessing it is related to this error in Sentry, which we never saw before adding Flask-SocketIO:

General Exception TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/13/3o7r) while setting user

Based on this error, it seems that when too many requests are made at once and the QueuePool limit is reached, the server hangs, which causes these extremely slow load times. So increasing the pool size and max overflow should help to alleviate the problem.
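To put rough numbers on that, here is a back-of-the-envelope sketch (an illustration only, not code from our app). It assumes every in-flight request holds one database connection at the same time, and that all of a page's API calls land on the same node, since `ip_hash` pins a client IP to one upstream. The helper names, the `extra` headroom value, and the 50-request figure are hypothetical. The keys `pool_size`, `max_overflow`, and `pool_timeout` are the standard SQLAlchemy `create_engine` pool arguments.

```python
# Pool limits as reported in the Sentry error message.
POOL_SIZE = 5
MAX_OVERFLOW = 10


def max_connections(pool_size: int, max_overflow: int) -> int:
    """Most connections one QueuePool will hand out at the same time."""
    return pool_size + max_overflow


def requests_left_waiting(concurrent: int, pool_size: int, max_overflow: int) -> int:
    """Requests that must queue for a connection (assumes one connection each)."""
    return max(0, concurrent - max_connections(pool_size, max_overflow))


def engine_options(concurrent: int, extra: int = 10) -> dict:
    """Hypothetical helper: pool settings sized for a target concurrency.

    The result is meant to be passed as keyword arguments to
    sqlalchemy.create_engine(); `extra` is an arbitrary headroom guess.
    """
    target = concurrent + extra
    pool_size = target // 2
    return {
        "pool_size": pool_size,
        "max_overflow": target - pool_size,
        "pool_timeout": 30,  # seconds; same value as in the error message
    }


def total_db_connections(processes: int, options: dict) -> int:
    """Worst-case connections opened against the database by all server processes."""
    return processes * max_connections(options["pool_size"], options["max_overflow"])


if __name__ == "__main__":
    print(max_connections(POOL_SIZE, MAX_OVERFLOW))
    print(requests_left_waiting(50, POOL_SIZE, MAX_OVERFLOW))
    options = engine_options(50)
    print(options)
    print(total_db_connections(3, options))
```

With the current limits, a burst of 50 requests on one node would leave 35 of them queuing for a connection, and any that wait longer than the 30-second timeout would raise the error above. Raising the limits on all three processes also multiplies the worst-case number of connections opened against the database, so the database's own connection limit would need to be checked as well.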

Question: How can I optimize our Flask-SocketIO setup to eliminate the QueuePool overflow errors and regain our previous performance levels? Are there architectural changes or advanced techniques beyond increasing resource limits and adding server nodes that I should consider?

Here is the nginx conf

upstream socketio_nodes {
    # This needs to be enabled for web sockets to work
    ip_hash;

    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    # to scale the app, just add more nodes here!
}

server {
    listen 80;
    server_name *.our-site.com;

    location / {
        include proxy_params;
        proxy_pass http://socketio_nodes;
    }

    location /socket.io {
        include proxy_params;
        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_pass http://socketio_nodes/socket.io;
    }
    # Consider blocking access to source maps for security reasons. These will be uploaded to sentry during build.
    # location ~ \.map$ {
    #     deny all;
    # }
}

Here is the server service

[Unit]
Description=Gunicorn instance to serve our website
After=network.target

[Service]
User=ubuntu
Group=www-data
WorkingDirectory=/home/ubuntu/briefcase
Environment="PATH=/home/ubuntu/briefcaseEnv/bin:/usr/bin:/bin"
ExecStart=/home/ubuntu/briefcaseEnv/bin/gunicorn -k "geventwebsocket.gunicorn.workers.GeventWebSocketWorker" -w 1 --bind 0.0.0.0:%i --log-level=warning wsgi:app

And here in the deployment is where we spin up three servers

sudo systemctl restart [email protected]
sudo systemctl restart [email protected]
sudo systemctl restart [email protected]

If it helps, here is the flask socketio related logic

The socket instance

from flask_socketio import SocketIO, emit
import os


# configure cors_allowed_origins
if os.environ.get('FLASK_ENV') == 'production':
    origins = [
        # app url here
    ]
else:
    # allow all origins for development
    origins = "*"

# initialize your socket instance
# TODO: do we need the async_mode specified? How will this work in production?
socketio = SocketIO(async_mode='gevent', cors_allowed_origins=origins)

In app.py

# monkey patch at the top of the file
from gevent import monkey
monkey.patch_all()
from libs.Sockets.socket_instance import socketio

    socketio.init_app(app, message_queue='amqp://')

    if parsed_args.url_map:
        logging.info('\n\n########################################\n\n')
        logging.info(app.url_map)
        logging.info(
            '\n\n########################################\n^^^ App URL Map ^^^\n')
    # port is a keyword argument of socketio.run(), not a second call
    socketio.run(app, port=5000)

Let me know if any more context is needed (e.g. k6 summaries).
