API Performance Degraded

Incident Report for Intrinio

Postmortem

This incident was caused by an unoptimized database query generated by an edge case in the API logic. A particular kind of API request triggered several of these unoptimized queries, which consumed the database workers' CPU. That reduced the database's capacity to handle other queries, which in turn raised API error rates to 2-3%. We have remedied the problem at both the database level (by imposing stricter query time limits) and the application level (by handling the edge case so that it no longer generates an unoptimized query).
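
As an illustration of the database-level remedy, here is a minimal sketch of a per-query time limit. This report does not name the database or the mechanism actually used, so the sketch substitutes Python's built-in sqlite3 module as a stand-in; the function name run_with_time_limit and its parameters are hypothetical.

```python
import sqlite3
import time


def run_with_time_limit(conn, sql, params=(), limit_seconds=1.0):
    """Run a query and abort it if it runs longer than limit_seconds.

    Hypothetical helper for illustration only; SQLite stands in for the
    production database, which the report does not identify.
    """
    deadline = time.monotonic() + limit_seconds

    def past_deadline():
        # A non-zero return value tells SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Ask SQLite to call the handler every 1000 virtual machine instructions.
    conn.set_progress_handler(past_deadline, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Clear the handler so later queries on this connection are unaffected.
        conn.set_progress_handler(None, 1)
```

A query that exceeds the limit is interrupted and raises sqlite3.OperationalError, so a single runaway query fails on its own instead of holding CPU that other queries need. On a server database the equivalent limit would normally be a server-side setting (for example, statement_timeout in PostgreSQL) rather than client code.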

Posted Sep 13, 2019 - 13:37 EDT

Resolved

API performance has returned to optimal conditions.

Posted Sep 13, 2019 - 11:05 EDT

Monitoring

We have terminated the long-running queries and performance is stabilizing. Error rates are now below 0.5%.

Posted Sep 13, 2019 - 10:56 EDT

Identified

We have identified several long-running queries that are causing the performance degradation.

Posted Sep 13, 2019 - 10:46 EDT

Investigating

We are experiencing elevated error rates, periodically spiking to 4%. This appears to be a load issue. We are investigating.

Posted Sep 13, 2019 - 10:21 EDT

This incident affected: Web APIs (APIv2, APIv1).