Hacker News on databases
What is big?
> By Jan 2024, our largest table had roughly 100 million rows.
I did a double take at this. At the outset of the article, the fact that they're using a distributed database, plus the mention of a "mid six figure" DB bill, made me assume they had some obscenely large database, far beyond what a single node could do. They don't detail the Postgres setup that replaced it, so I assume it's a pretty standard single primary, and a 100 million row table is well within its abilities; I have a 150 million row table happily plugging along on a 2 vCPU + 16 GB instance. Apples and oranges, perhaps, but people shouldn't underestimate what a single modern server can do.
– luhn (2025)
[…] Devs usually do a double take when I tell them that their table with 100K rows is not in fact big, or even medium. Everyone’s experiences are different, of course, but to me, big is somewhere in the high hundreds of millions range. After a billion it doesn’t really matter; the difference between 5 billion and 1 billion isn’t important, because it’s exceedingly unlikely that (a) your working set is that large, or (b) your server could possibly cope with all of it at once. I hope you have partitions.
– sgarland (2025)
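That closing line about partitions is the load-bearing one. As a rough sketch of what declarative partitioning looks like in Postgres (the events table and its columns are hypothetical, for illustration only):

```sql
-- Hypothetical append-heavy table, range-partitioned by month so the hot
-- working set (recent partitions and their indexes) stays small even as
-- the total row count climbs into the billions.
CREATE TABLE events (
    id         bigserial,
    user_id    bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (id, created_at)  -- the partition key must be part of the PK
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
```

Queries that filter on created_at touch only the matching partitions, and old months can be detached or dropped as a metadata operation instead of a row-by-row delete.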
Yeah, 100M is really not that much. I worked on a 10B-row table on an RDS r6g.4xl, and Postgres handled it fine, even with 20+ indexes. Not ideal, and I'd rather have had fewer indexes and sharded the table, but Postgres dealt with it.
– williamdclt (2025)
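Before arguing about whether a table is big, it helps to measure it. A quick Postgres sketch (foo_records is a hypothetical table name):

```sql
-- Planner's row estimate plus total on-disk size, indexes included.
SELECT c.reltuples::bigint                            AS approx_rows,
       pg_size_pretty(pg_total_relation_size(c.oid))  AS total_size
FROM pg_class AS c
WHERE c.relname = 'foo_records';
```

A 100-million-row table often turns out to be a few dozen gigabytes, well within a single node's comfort zone.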
Scaling a single node
[…] You don’t need quorum writes with traditional Postgres or MySQL HA setups. In most cases, writes go to a single primary, and replication to standbys is asynchronous or semi-synchronous, depending on your tolerance for potential data loss.
It’s all about finding the right balance. With modern vertically scalable hardware and fast SSDs, a single-node setup with failover can handle quite a lot of load before hitting real limits.
– sreekanth850 (2025)
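For the asynchronous/semi-synchronous trade-off mentioned above, Postgres exposes the knob as two settings on the primary. A minimal sketch (the standby names are hypothetical):

```sql
-- Commit waits for any one of the named standbys to confirm the WAL
-- (semi-synchronous-style); set synchronous_standby_names = '' for fully
-- asynchronous replication, accepting a small data-loss window on failover.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_a, standby_b)';
ALTER SYSTEM SET synchronous_commit = on;
SELECT pg_reload_conf();
```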
Schema design
Call me old fashioned, but when record counts reach the 100 million range, it's usually an indication that your dataset is either too wide (consider sharding) or too deep (consider time-based archival) to fit into a monolithic schema. For context, I dealt with multiple systems that generated this volume of data between 2003 and 2013 (mostly capital markets, but also some govt/compliance work), with databases and hardware from that era, and we rarely had an issue that could not be solved by query optimization, caching, sharding, or archival, usually in that order.
Secondly, we did most of these things using SQL, Bash scripts, cron jobs, and some I/O logic built directly into the application code. They were robust enough to handle some extremely mission-critical systems (a failure could bring down a US primary market, and if it was bad enough, you'd hear about it on the news).
– hliyan (2025)
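"Time-based archival" of the kind described here can be as plain as a cron-driven SQL job. A minimal sketch in modern Postgres (trades, trades_archive, and the 13-month cutoff are all hypothetical):

```sql
-- Move rows past the retention window into an archive table atomically;
-- the data-modifying CTE deletes and re-inserts in a single statement.
BEGIN;
WITH moved AS (
    DELETE FROM trades
    WHERE trade_date < now() - interval '13 months'
    RETURNING *
)
INSERT INTO trades_archive
SELECT * FROM moved;
COMMIT;
```

For large batches you would chunk the delete, but the shape of the job is the same one a nightly cron entry can run.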
ORMs
Every ORM is bad. Especially the "any DB" ORMs. Because they trick you into thinking about your data patterns in terms of writing application code, instead of writing code for the database. And most of the time their features and APIs are abstracted in a way that basically means you can only use the least-common-denominator of all the database backends that they can support.
I've sworn off ORMs entirely. My application is a Postgres application first and foremost. I use PG-specific features extensively. Why would I sacrifice all the power that Postgres offers me just for some conveniences in Python, or Ruby, or whatever?
Nah. Just write good code for your database.
– VWWHFSfQ (2025)
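A concrete instance of the Postgres-specific power being argued for here, using a hypothetical user_settings table with a unique constraint on user_id: a single-round-trip upsert, the sort of statement least-common-denominator ORM APIs have historically struggled to express.

```sql
-- Insert-or-update in one statement, returning the final row; requires a
-- unique constraint or index on user_id for ON CONFLICT to target.
INSERT INTO user_settings (user_id, theme)
VALUES (42, 'dark')
ON CONFLICT (user_id)
DO UPDATE SET theme = EXCLUDED.theme
RETURNING user_id, theme;
```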
[…] The most prolific backend frameworks are all built on ORMs, for good reason. The best ones can deserialize inputs, validate them, place those objects directly into the DB, retrieve them later as objects, and then serialize them again, all from essentially just a schema definition, to name a few advantages. Teams that take velocity seriously should use ORMs. As with any library choice, you need to carefully vet them, though.
– evantbyrne (2025)
Every ORM except Active Record is awful. Active Record is amazing.
– reillyse
MySQL vs. Postgres
Because MySQL got a (rightfully so) bad rap before it adopted InnoDB as the default storage engine, and then tech influencers happened. I love Postgres, but I also love MySQL, and 99% of the time I see people gushing about Postgres, they aren’t using any features that MySQL doesn’t have.
The single biggest thing for MySQL that should be a huge attraction for devs without RDBMS administration experience is that MySQL, by and large, doesn’t need much care and feeding. You’re not going to get paged for txid wraparound because you didn’t know autovacuum wasn’t keeping up on your larger tables. Unfortunately, the Achilles heel of MySQL (technically InnoDB) is also its greatest strength: its clustering index. This is fantastic for range queries, IFF you design your schema to exploit it, and don’t use a non-k-sortable PK like UUIDv4. Need to grab every FooRecord for a given user? If your PK is (user_id, …) then congrats, they’re all physically colocated on the same DB pages, and the DB only has to walk the B+tree once from the root node, then it just follows a linked list.
– sgarland (2025)
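A sketch of the composite-PK layout described above, in MySQL/InnoDB (foo_records and its columns are hypothetical, filling in the second PK column the quote elides):

```sql
-- InnoDB clusters rows by primary key, so all of one user's records sit
-- on adjacent B+tree leaf pages.
CREATE TABLE foo_records (
    user_id       BIGINT UNSIGNED NOT NULL,
    foo_record_id BIGINT UNSIGNED NOT NULL,
    payload       JSON,
    PRIMARY KEY (user_id, foo_record_id)
) ENGINE=InnoDB;

-- One descent from the root, then a linked-list walk along the leaves:
SELECT * FROM foo_records WHERE user_id = 123;
```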
Integration with programming languages
[…] The separate datastore is the problem to be solved here: databases, especially relational databases, are extremely poorly integrated into programming languages, and this makes it really painful to develop anything that uses them. You can just about use them as a place to dump serialized data to and from (not suitable for large systems because they're not properly distributed), but if you actually want to operate on data you need it to be in memory where you're running the code, and you want it to be tightly integrated with your language and IDE and so on. […]
– lmm (2021)
Funny thing is, databases were tightly integrated into programming languages all the way back in the '80s; that's exactly what dBase was, and why it became so popular. FoxBASE/FoxPro, Clipper, Paradox, etc. were all similar in that respect.
And yes, it made for some very powerful high-level tooling. I actually learned to code on FoxPro for DOS, and the efficiency with which you could crank out even fairly complicated line-of-business data-centric apps was amazing, and is not something I've seen in any tech stack since.
– int_19h (2021)
Articles
- Migrating to Postgres (2025-05-14)