Global outage | PDFShift

Resolved

The service was fully back up ten hours after the incident started.
We rolled out a new load balancing system to better handle the load and provide redundancy in case of server failure.

Posted 1 Feb at 08:30pm CET.

Updated

The service began to recover about three hours after the outage started, but it still had intermittent issues.

Posted 1 Feb at 01:30pm CET.

Created

On Monday at 9:30am UTC, we started an upgrade of our database to resolve some connection issues we had been seeing (rarely, but still). The work consisted only of switching to a more powerful database that was already prepared and up to date.

Unfortunately, this caused our API, which runs on AWS Beanstalk, to stop communicating with the database, even though everything was set up correctly. The issue escalated when we found we could not revert to the old database either, for reasons we could not determine because of how Beanstalk works.

This forced us to migrate the API to a new load balancer at AWS with its own set of configured servers, which took some time to set up.

Once this was done, the service started working again, but only intermittently, with downtime occurring too often. This was at around 1pm.

For the rest of the afternoon, we worked to stabilize the service and stop the recurring downtime, which we managed to do at around 10pm.

To make sure this doesn't happen again, we have added redundancy to the load balancer. We are also in the process of hiring an AWS expert to take a fresh look at our infrastructure and code, tighten everything up, and improve the speed, reliability, and uptime of PDFShift.

Posted 1 Feb at 10:30am CET.