
Boring is better: Pushing 67k telemetry events per second into Postgres

We collect a tremendous amount of telemetry data. Prompts, usage metrics, session data, insights from every AI code assistant and desktop app our customers’ engineers use. All of it funnelling from thousands of users into our insights platform.

At Flowstate, we use OpenTelemetry to measure AI spend. Every API call, every coding session, every model invocation, connected back to teams, projects, and cost centres. The volume is significant and it never stops flowing.

This means we need to store a lot of data. Not “analytics dashboard” data. Not “monthly report” data. Every span, every metric, every log line from every user across every customer, persisted so our analysis pipeline can chew on it later.

Everyone’s first instinct is the same: “You need a data warehouse.”

I tried. I really tried.

Shopping around

To be clear, the products I looked at are all excellent at what they’re designed for. They’re built for organisations with large-scale data teams and complex analytical workloads. Our problem was simpler: write a lot of telemetry data fast, query it later. For that, most of these solutions were more than we needed.

There was also a practical concern that kept nagging me. I should be able to develop on an airplane. Not that I literally do, but it’s the essence of the problem. If a component of my stack requires an internet connection to function, I can’t build, test and iterate locally. I can’t spin it up in Docker and throw data at it on a Saturday morning. A lot of these data warehouses felt like they were trying to solve problems that a well-tuned Postgres could handle.

BigQuery was the first stop. Great for querying at scale, less ideal for high-volume continuous writes. The streaming insert API has per-row costs that add up quickly at our volume, and the latency profile doesn’t suit near-real-time ingestion. Phenomenal analytics engine, just not the right fit for a write-heavy telemetry pipeline.

Snowflake is a powerful data platform, but the pricing model is complex (compute credits, storage, data transfer) and for what we needed, it was significantly more infrastructure than the problem required. When the alternative is a well-tuned Postgres, the cost comparison is stark.

Databricks is impressive if you have a dedicated data engineering function. The lakehouse architecture and Spark integration are genuinely powerful. But we don’t have a team of twelve data engineers, and introducing a platform of that complexity for what is fundamentally “write rows, query rows” felt like bringing a predator drone to a knife fight.

Amazon Redshift, Azure Synapse, and other managed analytical offerings are all solid products, but each introduces operational overhead that didn’t match where we were: another system to monitor, another set of credentials, another vendor relationship.

ClickHouse was the most compelling option. Column-oriented, purpose-built for write-heavy analytical workloads, open source, and genuinely fast. I liked it a lot. The reason I moved away from it wasn’t the technology, it was deployment. Getting ClickHouse running reliably in a managed environment was more friction than I wanted. ClickHouse Cloud exists, but that’s another vendor dependency when I already have managed Postgres on Google Cloud SQL. And you can do analytics querying in Postgres anyway, so ClickHouse just felt like extra steps.

Which brought me back to the database I was already running.

I’ve built some wild stuff on Postgres over the years, and the question always comes up: “Are we sure one database is enough?” I can tell you from experience, one database can take a beating. There are stories of people running Postgres at petabyte scale. It can be done, with difficulty, but it can be done. So let’s see what we can do for our humble use case.

I already have Postgres. It already works. Let’s find out where it stops working.

The failures

What follows is a condensed timeline of me learning things the hard way.

Attempt 1: Just write to the production database

The first version was exactly as stupid as it sounds. Every time a telemetry event came in, we’d fire an INSERT at the production Postgres instance. One row at a time. Individually. Like posting letters.

It worked at low volume. It worked at medium volume. It stopped working at 3am on a Tuesday when the proxy service started handling a traffic spike and suddenly we were doing 10,000 individual INSERTs per second against a database that was also trying to serve the actual application.

I literally couldn’t get a write lock to run a migration. The connection pool was saturated, every available connection was blocked on an INSERT, and the migration runner was sitting there waiting for a lock that would never come. I ended up killing connections manually to get the migration through. At 3am. On a Tuesday.

Attempt 2: Batching (getting warmer)

The obvious fix: stop writing one row at a time. Buffer events in memory, flush to the database in batches of 500-1000 rows using multi-row INSERTs.

This was dramatically better. Instead of 10,000 transactions per second, you get 10-20 larger ones. The database breathes. The connection pool empties. Migrations run.
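In Go (the language the benchmark is written in), building the multi-row statement is mostly placeholder bookkeeping. A minimal sketch, assuming a spans table with trace_id, name and attributes columns; buildBatchInsert is a hypothetical helper, not our production code:

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert turns a batch of rows into one parameterised
// multi-row INSERT. Assumes every row has exactly three fields
// (trace_id, name, attributes) so placeholder numbering stays uniform.
func buildBatchInsert(rows [][]any) (string, []any) {
	const cols = 3
	placeholders := make([]string, 0, len(rows))
	args := make([]any, 0, len(rows)*cols)
	for i, row := range rows {
		ps := make([]string, cols)
		for j := range row {
			ps[j] = fmt.Sprintf("$%d", i*cols+j+1)
			args = append(args, row[j])
		}
		placeholders = append(placeholders, "("+strings.Join(ps, ", ")+")")
	}
	sql := "INSERT INTO spans (trace_id, name, attributes) VALUES " +
		strings.Join(placeholders, ", ")
	return sql, args
}

func main() {
	sql, args := buildBatchInsert([][]any{
		{"t1", "GET /api", `{"model":"gpt-4o"}`},
		{"t2", "POST /chat", `{"tokens":421}`},
	})
	fmt.Println(sql)
	// INSERT INTO spans (trace_id, name, attributes) VALUES ($1, $2, $3), ($4, $5, $6)
	fmt.Println(len(args))
	// 6
}
```

One statement, one transaction, one round trip per 500 rows instead of 500 of each.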

But we had a new problem: what happens when the application crashes between batches? You lose whatever’s in the buffer.

Attempt 3: Redis as a buffer

So we added Redis as an intermediate buffer. Events come in, get appended to a Redis Stream (XADD), and a background worker consumes the stream (XREADGROUP) and flushes batches to Postgres.

This actually worked, and Redis is still in the final architecture. But not as a data store. It stores pointers to incoming objects, keeps track of what’s arrived and what’s been flushed, and lets us batch intelligently without the application needing to hold state. If the firehose has water pressure so high it would rip your skin off, Redis is the valve that turns it into something manageable.

The key insight: Redis is not the database here. It’s a coordination layer. The telemetry data itself goes straight to Postgres; Redis just tells us what’s waiting and what’s been processed.
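The consumer’s flush decision boils down to two thresholds: batch is full, or the oldest buffered event has waited too long. A minimal sketch (hypothetical names and numbers; in the real worker the buffer fills from XREADGROUP and entries are acknowledged with XACK only after the Postgres write commits):

```go
package main

import "fmt"

// flushDue decides when the stream consumer should write its buffered
// batch to Postgres. Size bound keeps batches efficient; time bound
// keeps end-to-end latency predictable at low traffic.
func flushDue(buffered, maxBatch, elapsedMs, maxWaitMs int) bool {
	if buffered == 0 {
		return false // nothing to write
	}
	return buffered >= maxBatch || elapsedMs >= maxWaitMs
}

func main() {
	fmt.Println(flushDue(500, 500, 10, 1000)) // true: batch full
	fmt.Println(flushDue(3, 500, 1500, 1000)) // true: waited too long
	fmt.Println(flushDue(0, 500, 5000, 1000)) // false: buffer empty
}
```

Because the policy is pure, it tests in microseconds without a Redis or Postgres anywhere in sight.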

The question

At this point I had a system that worked. Postgres for storage, Redis for coordination, batched COPY writes for throughput. But I didn’t actually know how much Postgres could take. Could it handle 10x the current load? 100x? What’s the actual ceiling?

There’s only one way to find out.

Let’s just… test it

I decided to do what any reasonable person would do: spin up Postgres in a Docker container on my laptop and throw increasingly absurd amounts of telemetry at it until something broke.

The setup:

  • Postgres 16 running in Docker with 512MB shared buffers and 500 max connections
  • pgBouncer in transaction pooling mode for the connection pooling tests
  • An OpenTelemetry-style schema: spans, metrics, and logs tables with JSONB attributes
  • A telemetry generator that creates realistic traces: ~1,900 events per simulated user at ~3KB each, spread across ~100 traces of 5-12 spans each, plus associated metrics and logs
  • Four write strategies, tested across increasing user counts
  • For the extreme tests (10K+ users), we run 30-second burst tests because at ~200MB/sec sustained writes, my 512GB SSD would fill up fast. Any longer and I’d be benchmarking my laptop’s storage failure mode, not Postgres.

All of this running on a 2022 M2 MacBook Air with 24GB of RAM and a 512GB SSD. Not a server. A laptop.

The four strategies

Naive INSERT: One INSERT statement per telemetry event. 50 concurrent connections. The “please don’t do this” baseline, and exactly what I was doing in production at 3am on that Tuesday.

Batched INSERT: Accumulate events into batches of 500 rows, write them in a single multi-row INSERT. 20 connections in a pool.

COPY protocol: Postgres has a built-in bulk data loading protocol called COPY. It streams tab-separated data directly into the table, bypassing the SQL parser entirely. This is what ETL tools use. Batches of 5,000 rows piped through pg-copy-streams.
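COPY’s text format is simple but strict: fields are tab-separated, rows are newline-terminated, backslash, tab, newline and carriage return must be escaped, and NULL is spelled \N. A sketch of the row encoding in Go (copyEscape and copyRow are hypothetical helpers, not part of any library):

```go
package main

import (
	"fmt"
	"strings"
)

// copyEscape escapes one field for COPY ... FROM STDIN in text format.
// A nil pointer encodes SQL NULL as \N.
func copyEscape(field *string) string {
	if field == nil {
		return `\N`
	}
	r := strings.NewReplacer(
		`\`, `\\`,
		"\t", `\t`,
		"\n", `\n`,
		"\r", `\r`,
	)
	return r.Replace(*field)
}

// copyRow joins escaped fields with tabs; each row ends with a newline.
func copyRow(fields []*string) string {
	out := make([]string, len(fields))
	for i, f := range fields {
		out[i] = copyEscape(f)
	}
	return strings.Join(out, "\t") + "\n"
}

func main() {
	name := "GET /api\nretry"
	fmt.Printf("%q\n", copyRow([]*string{&name, nil}))
	// "GET /api\\nretry\t\\N\n"
}
```

Stream rows like this into COPY and Postgres skips the SQL parser entirely, which is where most of the throughput comes from.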

Pooled + Batched: Same batching strategy, but routed through pgBouncer in transaction pooling mode. Tests whether connection pooling adds meaningful throughput at high concurrency.

The numbers

At 1,000 simulated users, each generating ~1,900 events at ~3KB, we’re writing approximately 1.8 million rows of realistic telemetry across three tables. Real JSONB payloads with model names, token counts, cost data, session IDs. The kind of data you’d actually see in a production AI telemetry pipeline.

Here’s what happened.

Writes per second

At low volume, everything looks roughly the same. You can’t tell the difference between strategies when you’re writing a couple thousand rows.

But scale up and the lines diverge. The naive approach plateaus at about 14,000 writes/sec and stays there. You’re bottlenecked on per-statement overhead and connection contention.

COPY protocol hits 67,000 writes per second. On a laptop. In a Docker container. With max_wal_size at 4GB and default checkpoint settings. With 3KB payloads, that’s roughly 200MB/sec of sustained write throughput. If we’d tuned checkpoint timeouts and WAL compression, we probably could have pushed it higher.

One caveat on COPY: it’s all-or-nothing. If one row in a 5,000-row batch has malformed JSON or violates a constraint, the entire batch fails. Batched INSERT lets you use ON CONFLICT clauses to handle duplicates and messy data gracefully. In production, we pre-validate before COPY and fall back to batched INSERT for anything that looks dodgy.
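That pre-validation step can be as simple as partitioning each batch before it reaches COPY. A hedged sketch in Go, assuming a hypothetical event type with a raw JSONB payload (production checks would cover constraints too, not just JSON validity):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type event struct {
	TraceID    string
	Attributes []byte // raw JSONB payload
}

// partition splits a batch into rows safe for the all-or-nothing COPY
// path and rows routed through the batched-INSERT fallback, where
// ON CONFLICT can absorb duplicates and messy data.
func partition(batch []event) (copyable, fallback []event) {
	for _, e := range batch {
		if e.TraceID != "" && json.Valid(e.Attributes) {
			copyable = append(copyable, e)
		} else {
			fallback = append(fallback, e)
		}
	}
	return copyable, fallback
}

func main() {
	good := event{TraceID: "t1", Attributes: []byte(`{"tokens":42}`)}
	bad := event{TraceID: "t2", Attributes: []byte(`{"tokens":`)}
	c, f := partition([]event{good, bad})
	fmt.Println(len(c), len(f))
	// 1 1
}
```

The fast path stays fast, and one malformed payload can no longer sink a 5,000-row batch.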

Batched INSERT lands in the middle at ~34K writes/sec. Solid and practical — you don’t need to learn a new API to do it.

The pooled strategy through pgBouncer came in at ~43-49K writes/sec, faster than raw batching because pgBouncer reuses connections more efficiently than our application-level pool.

Head-to-head at 1,000 users

At 1,000 users (1.8M events), COPY is doing 67K writes/sec while naive is stuck at 14K. That’s nearly 5x faster for the same data. At these payload sizes, COPY wrote 916K events in 13.6 seconds. Naive took over a minute.

But here’s what surprised me: none of the strategies fell over. I expected Postgres to start choking. Connection errors, OOM kills, WAL bloat. None of it happened. Postgres just handled it.

So naturally, I decided to turn it up.

A quick reality check

Let’s be clear: we do not currently have 10 million concurrent users sending 1,900 telemetry events a day. If we did, we’d be making enough money to hire an entire department to worry about database architecture.

Right now, our actual production volume is easily handled by our current Postgres setup without breaking a sweat. So why push the simulation up to millions of users generating hundreds of megabytes per second?

Strictly to prove a point.

I wanted to know what happens when the “boring” choice finally hits the wall. And what I found is that even at absurd, theoretical scales where Postgres does start to slow down and the queries degrade, it doesn’t just die. It degrades gracefully. And more importantly, when it does slow down, you have standard, boring levers you can pull to get the speed back.

The real test: writes AND reads at the same time

Here’s the thing about write benchmarks in isolation. They’re lying to you.

In production, you don’t get to pause reads while you ingest data. Our analysis pipeline runs aggregate queries against this data while it’s being written. Percentile calculations across millions of spans. Service dependency maps. Error rate breakdowns. Queries that make Postgres think hard.

So I did the obvious thing: ran streaming COPY writes and 8 analytical queries simultaneously, scaling from 1,000 users all the way up to 10 million. For the larger tests, I used 30-second burst windows because at these write rates, my 512GB MacBook SSD would physically fill up in under 10 minutes.

At 1,000 users (full write, 1.8M events), COPY sustained 39K writes/sec while simultaneously serving over 7,000 analytical queries with a p95 of 9ms. The slowest query, the dependency map JOIN, took 1.2 seconds.

At 10 million simulated users, running as a 30-second burst, Postgres maintained 16,763 writes/sec while serving analytical queries with a p95 under 200ms. In 30 seconds, it wrote 514,536 rows of 3KB telemetry data. Across all scale points, the dependency map query (a self-join across the spans table) was consistently the slowest, peaking at about 1.2 seconds.

The write throughput does degrade as the table grows and reads compete for I/O. That’s expected. But Postgres never crashed, never OOM’d, never corrupted anything. It got slower and it kept going. Every query returned correct results. Every write was committed.

And remember: this was running with only 512MB of shared_buffers. The database was significantly larger than its cache. If I’d given it 4GB of shared_buffers on this 24GB Mac, the entire working set would have lived in RAM and those queries would have been much faster. I deliberately didn’t, because production databases don’t always get to keep everything in memory.

Okay but can we fix the slow reads?

That dependency map query was bugging me. So I added indexes and re-ran the test against ~920K rows (500 users worth of telemetry).

Seven targeted indexes: service name lookups, status code filters, trace ID joins, parent span relationships, composite metric lookups and severity distributions. Then ANALYZE to update the query planner’s statistics.

The standout result:

Recent errors query: 51ms → 1.2ms. A 43x speedup. To be clear, status_code and start_time are top-level columns in the schema, not buried in the JSONB payload. The JSONB attributes column stores the flexible stuff (HTTP headers, token counts, model names, cost data). The fields we query frequently are extracted columns with proper types. The partial index on status_code = 2 combined with the start_time DESC index let Postgres skip the full table scan entirely. It just walks the index and grabs the top 50.
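One way to express that pair of ideas as a single index (the index name is hypothetical; status_code = 2 is the OpenTelemetry error status mentioned above):

```sql
-- Errors only, pre-sorted newest-first: the planner can walk this index
-- and stop after 50 rows instead of scanning the table.
CREATE INDEX idx_spans_recent_errors
    ON spans (start_time DESC)
    WHERE status_code = 2;

ANALYZE spans;

-- The query it serves:
SELECT *
FROM spans
WHERE status_code = 2
ORDER BY start_time DESC
LIMIT 50;
```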

Log severity distribution: 65ms → 33ms. The composite index on (severity, service_name) turns a sequential scan into an index-only scan. 2x faster.

The dependency map query dropped from 456ms to 320ms. A 1.4x improvement. Better, but not transformative. That query is doing a self-join across hundreds of thousands of rows, and no single index can eliminate the fundamental cost of correlating parent-child spans at that scale.

But that’s fine. There are clear scaling paths when you need them.

The scaling roadmap (when you actually need it)

This is the part where I’m going to argue against over-optimising early. What we’ve built works. It handles our current load comfortably, and it’ll handle 10x without breaking a sweat. But if we ever get to the point where we’re processing hundreds of millions of events, there are two clear paths forward, both still using Postgres.

Path 1: Read replicas

The simplest scaling move. Set up a streaming replication replica and point all analytical queries at it. Writes go to the primary, reads go to the replica. The primary can focus entirely on ingestion, and the replica can chew on complex JOINs without impacting write throughput.

This is a Tuesday afternoon change. Google Cloud SQL, AWS RDS, and Azure all support read replicas natively. You add a connection string and a routing rule. Your write throughput goes back to the standalone 67K writes/sec range because it’s not fighting with reads anymore, and your analytical queries can take as long as they want on the replica without anyone noticing. On proper server hardware with more CPU cores and faster storage, that number would be significantly higher.

For our use case, where the analysis doesn’t need to be real-time, a replica with a few seconds of replication lag is perfectly fine.

Path 2: Table partitioning

Time-based partitioning splits your tables into chunks. One partition per day, per week or per month. Queries that filter by time only scan the relevant partitions instead of the entire table. That 1.2-second dependency map query? If you’re only looking at the last 24 hours instead of the full history, you’re scanning a fraction of the data. The query drops from seconds to milliseconds.

Partitioning also makes data lifecycle management trivial. Want to drop data older than 90 days? DROP TABLE spans_2025_q4. No vacuum, no bloat, no locked table. Instant.
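A sketch of what that looks like with Postgres’s declarative partitioning, with hypothetical column names in the spirit of the schema described earlier:

```sql
-- Range-partition spans by start time.
CREATE TABLE spans (
    trace_id   text        NOT NULL,
    start_time timestamptz NOT NULL,
    status_code int,
    attributes jsonb
) PARTITION BY RANGE (start_time);

CREATE TABLE spans_2025_q4 PARTITION OF spans
    FOR VALUES FROM ('2025-10-01') TO ('2026-01-01');

-- Retention becomes a metadata operation, not a vacuum marathon:
DROP TABLE spans_2025_q4;
```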

The mega-scale option

If we ever needed to go truly bonkers — hundreds of millions of users, billions of telemetry events — Postgres has an answer for that too. Citus is a Postgres extension (fully open-sourced by Microsoft) that distributes your data across multiple Postgres nodes using hash-based sharding. You CREATE EXTENSION citus; and you’re off. Your schema stays the same. Your queries stay the same. You just have more nodes doing the work.

Shard the spans table by trace_id and each node handles a slice of the total dataset. A query for a specific trace hits one node. An aggregate query fans out across all nodes and merges the results. Linear horizontal scaling while keeping everything in the Postgres ecosystem. Same tooling, same monitoring, same expertise.
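The setup really is that small. A sketch, assuming the spans schema from the benchmark:

```sql
CREATE EXTENSION citus;

-- Hash-shard the spans table across worker nodes by trace_id,
-- so any single-trace lookup lands on exactly one node.
SELECT create_distributed_table('spans', 'trace_id');
```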

We don’t need this. We probably won’t need this for a very long time. But the fact that the path exists, without leaving Postgres, is exactly why starting simple was the right call.

Don’t over-optimise early

I want to be really clear about this: the architecture we’re running right now is not the most optimal architecture for this problem. It’s the most appropriate one.

We have a data pipeline. We can perform complex queries while spamming writes at it. It handles concurrent analytical workloads. It doesn’t fall over. So why would I need a different database product for this?

Sure, disk I/O might become a limit at some point. But there are definitive scaling options, and we know exactly what they are because we’ve measured where the bottlenecks live. That’s the advantage of starting simple: when you need to scale, you know what to scale.

We could have started with ClickHouse. We could have set up Citus from day one. We could have built a Lambda architecture with Kafka and a stream processor and a serving layer and a batch layer and a… you get the point.

But all of that complexity has a cost. Every additional system is another thing to monitor, another thing that can fail at 3am, another thing your team needs to understand. If you don’t have the scale to justify it, you’re paying the complexity tax without getting the scalability benefit.

Postgres on Cloud SQL, with Redis as a coordination layer, using COPY protocol for batch ingestion. That’s our architecture. It handles our current load. It’ll handle 10x our current load. And when it can’t handle 100x, we’ll know exactly where the bottlenecks are because we’ve measured them.

The good news is we don’t need this service to be fast. The telemetry data is processed in batches for analysis, not served in real-time to users. A query that takes 4 seconds instead of 40 milliseconds is completely fine for our use case.

But if we ever did need it to be fast… well, you’ve seen the numbers. There’s a lot of headroom.

What I actually learned

  • Never use individual INSERTs for high-throughput writes. Batch or COPY. Always.
  • The COPY protocol exists for a reason. It’s not just for initial data loads. It’s a legitimate production ingestion strategy.
  • Connection pooling is not optional at scale. pgBouncer in transaction mode. No excuses.
  • Test writes AND reads together. Write-only benchmarks are misleading. The real performance profile is what happens when your database is doing both at once.
  • Indexes matter, but not for everything. A well-placed partial index can give you 43x speedup on targeted queries. But aggregate queries over millions of rows are going to be slow no matter what. That’s when you reach for replicas and partitioning.
  • Don’t over-optimise early. Start with the simplest architecture that works. Measure where the bottlenecks are. Scale the parts that need scaling. Not everything, not all at once.
  • Postgres can handle more than people give it credit for. 67K writes/sec of 3KB telemetry events. 39K writes/sec while simultaneously serving analytical queries. Scaled to 10M simulated users and it still didn’t crash. On a laptop. In Docker.
  • Test your assumptions. I spent weeks reading blog posts about data warehouse comparisons. I could have spent an afternoon with Docker and a script and had actual numbers. The numbers told a better story.

Oh, and one more thing. We spent this entire post putting Postgres through absolute hell. Millions of rows, concurrent reads, self-joins across massive spans. And we didn’t even think about the Redis sitting in front of it. The Redis that’s coordinating all of this, handling the firehose of incoming telemetry, tracking pointers, managing batch state. It didn’t even flinch. We didn’t benchmark it because there was nothing to benchmark. It just worked.

Most problems really can be solved with a web server, a Redis and a Postgres.

The benchmark code is written in Go and lives at github.com/willhackett/bench-postgres. Run docker-compose up -d, then go run . -mode=full for the write strategies, go run . -mode=chaos for the combined write+read chaos test, and go run . -mode=optimize for the indexing comparison.