API Outage
Started 28 Feb at 11:55am CET, resolved 28 Feb at 06:42pm CET.
Resolved
The service now appears to be back up and working correctly.
We'll keep a close eye on the service, and we are already working on a plan to prevent a recurrence.
Our objective now is to set up an architecture that can tolerate an AWS failure and still remain operational for all our users.
Updated
We are starting to see positive results: the current deployment seems to handle the issue. We'll keep monitoring to make sure it stays that way.
Updated
A new error is occurring at the AWS SNS endpoint...
Updated
We are currently deploying an update intended to improve communication with AWS SQS while their service is failing. The deployment is in progress, and the API should be updated in about 20 minutes.
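For context, the mitigation is along these lines: wrapping calls to SQS in retries with exponential backoff and jitter, so transient AWS failures don't stall request handling. This is a minimal sketch of the general technique, not our actual code; the function names and parameters below are illustrative.

```python
import random
import time


def send_with_backoff(send_fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Call send_fn(), retrying with exponential backoff and full jitter
    when it raises an exception (e.g. an SQS connection error)."""
    for attempt in range(max_attempts):
        try:
            return send_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter spreads retries out over time, which avoids many clients hammering a recovering service at the same instant.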
Updated
There is definitely an issue going on at AWS SQS: Sentry is reporting 12 new issues today, all of them related to problems connecting to AWS SQS.
We are working on a fix to mitigate this right now.
Created
Starting at 10:55 am today, our API responses became increasingly slow, causing dropped requests and long processing times.
We are currently investigating, but the errors we are receiving tend to point toward AWS having issues with some services we rely on (mostly SQS).
We are working on a way to mitigate this and will keep you updated.