Timeline
During the outage window, there were intermittent 2–5 minute periods in which authentication briefly recovered as AWS updates were applied. However, most users experienced continuous disruption throughout the event.
Root Cause
The incident was caused by an internal configuration error during routine preparation of new infrastructure. A single AWS configuration flag was set incorrectly, which applied non-production-ready changes to the production environment. Because those changes propagated across all resources, our regional failover did not activate as expected.
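To illustrate the class of safeguard under review, here is a minimal sketch of a pre-deployment guard, assuming a hypothetical `production_ready` marker on each change set and an `environment` flag; none of these names reflect our actual tooling:

```python
# Hypothetical pre-deployment guard; the ChangeSet structure and flag
# names are illustrative assumptions, not our real deployment system.
from dataclasses import dataclass

@dataclass
class ChangeSet:
    name: str
    production_ready: bool  # set by CI once a change is validated

def validate_deployment(environment: str, changes: list[ChangeSet]) -> None:
    """Refuse to apply non-production-ready changes to production."""
    if environment != "production":
        return  # non-production environments may carry unvalidated changes
    not_ready = [c.name for c in changes if not c.production_ready]
    if not_ready:
        raise RuntimeError(
            "Blocked production deployment; change sets not marked "
            "production-ready: " + ", ".join(not_ready)
        )

# A mis-set flag like the one in this incident would be caught here:
# validate_deployment("production", [ChangeSet("auth-config", False)])
```

A guard of this shape fails closed: a single incorrect flag halts the rollout before changes can propagate across resources.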
Why Resolution Took Time
We identified the root cause within minutes and initiated recovery procedures. However, AWS resource errors prevented restoration from completing successfully, and the repeated recovery attempts significantly extended the recovery timeline. Following AWS guidance, we fully rebuilt the affected resources and databases, which restored authentication at 5:24 PM ET and allowed full service recovery shortly after.
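As an illustration of the rebuild-and-verify step, a recovery script along these lines could wait for each rebuilt database to report healthy before restoring traffic; the boto3 waiter is a real AWS API, but the region and instance identifiers below are placeholders:

```python
# Illustrative recovery check using boto3's built-in RDS waiter.
# Instance identifiers and region are placeholders, not our real resources.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

def wait_for_rebuild(instance_ids: list[str]) -> None:
    """Block until every rebuilt RDS instance reports 'available'."""
    waiter = rds.get_waiter("db_instance_available")
    for instance_id in instance_ids:
        # Polls roughly every 30 seconds, up to 60 attempts (boto3 defaults).
        waiter.wait(DBInstanceIdentifier=instance_id)
        print(f"{instance_id} is available; safe to restore traffic.")

# wait_for_rebuild(["auth-db-primary", "auth-db-replica"])  # placeholders
```

Verifying each resource before cutover avoids repeating a failed recovery attempt against a database that has not finished rebuilding.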
Mitigation & Next Steps
We are conducting a full review of both the deployment safeguards and the recovery processes involved in this incident. We have already engaged AWS infrastructure experts to assist with this process.
Our focus is on hardening these deployment safeguards and recovery procedures to prevent a recurrence.