Resolved
Both API servers were down
Today at around 6:30 UTC, both API servers handling incoming requests began returning 503 errors because their underlying processes had gone down.
As soon as we were notified of the outage, we started investigating the issue and worked to bring the servers back up as quickly as possible.
The outage lasted a little under three hours from start to resolution.
With the service restored, we dug deeper to understand exactly what happened and to make sure this issue doesn't come up again.
We found that the root cause was our automated update system, which updated our local Redis store. The update caused our servers to lose their connection in a way that prevented them from reconnecting, which blocked them from accepting further requests.
We have taken the necessary steps to ensure that this package is only updated when required, and only while we are monitoring the process.
The service is now back to normal.