How a JSON response took down our backend

Our backend recently started crashing because one API was returning a JSON response containing a couple of million objects. It wasn't a traffic spike or a bad deployment; the server was simply running out of memory trying to build and serialise an absurdly large payload.

Areeb ur Rub

A few days ago, one of our analytics APIs started crashing our backend. No traffic spike. No bad deployment. No database outage. Just one endpoint that would slowly eat up memory until the process died.

Like every developer, we blamed the SQL query first. We checked the joins, the indexes, and the execution plan. Everything looked perfectly fine.

Then we looked at the response. It was huge. The endpoint powers a time-series engagement graph. Since we don't store data for days where nothing happened, the backend fills in those missing days with zeroes before returning the response.
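To make the gap-filling step concrete, here is a minimal sketch of what that kind of logic tends to look like. It is not our actual code: the function name `fill_missing_days`, the `date` and `count` field names, and the shape of the input are all assumptions made for illustration.

```python
from datetime import date, timedelta


def fill_missing_days(counts: dict[date, int], start: date, end: date) -> list[dict]:
    """Return one entry per day from start to end (inclusive).

    Days that are missing from `counts` are filled in with 0.
    All names here are hypothetical, chosen only for this sketch.
    """
    series = []
    day = start
    while day <= end:
        series.append({"date": day.isoformat(), "count": counts.get(day, 0)})
        day += timedelta(days=1)
    return series


# Two days with data, two empty days in between.
stored = {date(2024, 1, 1): 5, date(2024, 1, 4): 2}
graph = fill_missing_days(stored, date(2024, 1, 1), date(2024, 1, 4))
```

Notice that the size of the output depends only on the distance between `start` and `end`, not on how much data is actually stored. That is harmless for a sensible date range and dangerous for anything else.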

For a normal campaign, that means a few hundred objects. For this campaign, it was trying to generate millions. After digging through the data, we found the reason. One content item had somehow been saved with this publication date: 8 November 8390

Yep. The year 8390.

Our backend didn't think twice. It simply started generating daily entries to span the gap to that date, filling in the missing days like it always does. By the end of it, the response wasn't a few kilobytes anymore. It had ballooned into a JSON payload containing millions of objects, large enough to consume hundreds of megabytes of memory just to build and serialise.
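A quick back-of-the-envelope check shows where the millions come from. The start date below is an assumption (the real campaign start isn't important, any recent date gives a similar number), and the entry shape is a hypothetical stand-in for one point on the graph.

```python
import json
from datetime import date

# Assumed start of the series; any recent date gives a similar result.
assumed_start = date(2024, 1, 1)
bad_publication_date = date(8390, 11, 8)

# One entry per day, inclusive of both ends.
days = (bad_publication_date - assumed_start).days + 1

# Hypothetical shape of a single zero-filled entry.
sample_entry = {"date": assumed_start.isoformat(), "count": 0}
bytes_per_entry = len(json.dumps(sample_entry))

# Rough size of the serialised text alone, ignoring brackets and commas
# between entries and any in-memory overhead of the objects themselves.
approx_json_bytes = days * bytes_per_entry

print(f"{days:,} daily entries, roughly {approx_json_bytes / 1_000_000:.0f} MB of JSON text")
```

With these assumptions the count lands a little above 2.3 million entries. The serialised text is only part of the cost: the in-memory objects built before serialisation typically take several times more space than the final string.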

The database query itself wasn't the problem. The backend was crashing because it was trying to send an absurdly large JSON response. What I found funny is that there wasn't really a bug in the code. The logic was correct. The backend did exactly what we told it to do.

The real bug was the assumption that no content item would ever have a publication year of 8390. We fixed the bad data, added validation for incoming dates, and put limits on how much data a single request can generate.
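The two guards can be sketched roughly like this. The thresholds, function names, and error type are all hypothetical; the right limits depend on what your product considers a sane date and a sane graph.

```python
from datetime import date, timedelta

# Hypothetical limits, picked only for this sketch.
MAX_FUTURE_DAYS = 365   # how far ahead a publication date may be
MAX_POINTS = 3660       # roughly ten years of daily points per request


class RangeTooLargeError(ValueError):
    """Raised when a request would generate more points than allowed."""


def validate_publication_date(published: date, today: date) -> date:
    """Reject publication dates that are implausibly far in the future."""
    if published > today + timedelta(days=MAX_FUTURE_DAYS):
        raise ValueError(f"publication date {published.isoformat()} is too far in the future")
    return published


def check_point_count(start: date, end: date) -> int:
    """Return the number of daily points, or raise before any are generated."""
    if end < start:
        raise ValueError("end date is before start date")
    points = (end - start).days + 1
    if points > MAX_POINTS:
        raise RangeTooLargeError(f"{points:,} points requested, limit is {MAX_POINTS:,}")
    return points
```

The first check stops bad data at the door. The second is the safety net: even if a strange date slips through some other path, the request fails fast with a clear error instead of taking the process down with it.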

It was a nice reminder that sometimes production incidents aren't caused by complicated systems. Sometimes all it takes is one weird value in your database and a backend that's trying a little too hard to be helpful.