This incident was caused by an un-optimized database query, created by an edge-case in API logic. A distinct kind of API request caused several of these un-optimized queries to run, eating up the database workers' CPU. This, in turn, caused a reduced capacity of the database to handle other queries, in turn raising the API error rates to 2-3%. We have remedied this problem at both the database level (by imposing stricter query time limits) and the application level (by accounting for the edge case and preventing it from generating an un-optimized query).
Posted Sep 13, 2019 - 13:37 EDT
API performance has returned to optimal conditions.
Posted Sep 13, 2019 - 11:05 EDT
We have terminated the long running queries and performance is stabilizing. Error rates below 0.5%.
Posted Sep 13, 2019 - 10:56 EDT
We have identified several long running queries that are causing the performance degradation.
Posted Sep 13, 2019 - 10:46 EDT
We are experiencing elevated error rates spiking periodically at 4%. This appears to be a load issue. We are investigating.