Postmortem -
Feb 14, 03:27 UTC
Resolved -
Service has been fully restored. All impacted jobs have been requeued and are currently processing normally. We will be publishing a public post-mortem with additional details about this incident.
Feb 13, 01:53 UTC
Monitoring -
The revert of the change helped, and most metrics are back to pre-incident levels. We are requeuing failed jobs and monitoring to make sure the issue doesn't recur.
Feb 13, 01:26 UTC
Identified -
We identified an internal networking configuration change that may have caused the incident. We have since reverted that change, and services appear to be recovering.
Feb 13, 01:04 UTC
Update -
We are still investigating the root cause of this incident. The us-east-2 region is not receiving any network traffic at this point. We are also seeing some API request errors in other US regions, though not at the levels seen in us-east-2.
Feb 12, 23:58 UTC
Update -
We continue to see elevated levels of 500 errors across the US-West and US-East regions. Our engineering team is investigating the issue.
Feb 12, 22:57 UTC
Update -
We have identified the issue as a problem in US-West with some impact in US-East, and the impact appears to be primarily on reads rather than writes.
Feb 12, 22:37 UTC
Investigating -
We have identified increasing 500 errors in some US regions and are actively investigating the cause.
Feb 12, 21:32 UTC