Re: [GENERAL] 4B row limit for CLOB tables - Mailing list pgsql-hackers
From | Matthew Kelly
---|---
Subject | Re: [GENERAL] 4B row limit for CLOB tables
Date |
Msg-id | 5696A620-74A7-4C4D-937B-A4D5695CC9BE@tripadvisor.com
In response to | Re: Fwd: [GENERAL] 4B row limit for CLOB tables (José Luis Tallón <jltallon@adv-solutions.net>)
Responses | Re: [GENERAL] 4B row limit for CLOB tables
List | pgsql-hackers
> Hmm.... 2^32 times aprox. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~9e13) bytes
> ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it's not _that_ large (the arithmetic is sketched in the P.S. below).

At TripAdvisor we've been using a NoSQL solution to do session storage. We are looking to probably swap that out to be Postgres (every other db backing the site is Postgres). Essentially, what I'm building is a system with 1 logical table that maps session id to a 2KB+ grab bag of ever-changing session attributes, partially normalized and partially JSON. 315 million uniques a month multiplied by the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones). Additionally, most HTTP calls can update the session, so between maintenance windows I expect to take around 20 billion 'upserts'. Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test that demonstrated that a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload. Well, it can for 38 hours… until you wrap xids on the toast table. :P

I'll be the first to admit that isn't the normal use case, though. I'm happy to have found this thread, however, because I'm going to have to build around the global oid counter, explicitly to prevent the problem I explain below regarding clustering.

> Anybody actually reaching this limit out there?

Well, it's not the 4 billion row limit that concerns me; it's the global shared counter, in conjunction with pg_restore/clustering, that is actually pretty concerning.

I just checked through all of TripAdvisor's normal databases, and the max tuples I see in a single toast table is 17,000,000, so that is still a couple of orders of magnitude too small. (However, it is close enough that it'll be a concern in a few years.)

However, I do have active databases where the current oid is between 1 billion and 2 billion. They were last dump-restored for a hardware upgrade a couple of years ago and at the time were a bit more than half their current size. I therefore can imagine that I have tables which are keyed by ~8,000,000 consecutive oids.

I would argue that when the counter wraps, there will be a single insert that will probably block for 2-5 minutes while it tries to accomplish ~8,000,000 index scans inside of GetNewOidWithIndex (a toy model of that loop is in the P.S. below). Even partitioning doesn't protect you from this potential problem.

What's even more weird is that this issue can be triggered by consuming too many oids in a different database in the same cluster (i.e. by creating large numbers of temp tables).

> The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the implementation slower for most.... though this might be actually not that important.

Well, you aren't doubling the storage. Even if you have to store the key in 4 places, you are adding 16 bytes per TOAST tuple. If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by roughly 0.8% (again, see the P.S. below). I'm sure there are more hidden costs, but we are really only talking about a low single-digit percent increase. In exchange, you get to drop one index scan per toast insert; an index scan looking in the only hot part of the index.

That being said, I'd be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely mitigates the risk of an unexpected lock-up on reasonably sized tables/partitions, and provides a functional workaround for those of us with larger-than-average installs.

- Matt K
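P.S. A few quick sketches (Python, purely back-of-the-envelope) to back up the numbers above. First, the quoted size estimate, assuming the ~2 kB-per-TOAST-row heuristic from the quote:

```python
# Back-of-the-envelope limit on a single TOAST table, assuming ~2 kB of
# payload per TOAST row (the ~4-rows-per-heap-page heuristic quoted above)
# and one 32-bit oid key per row.

oid_space = 2 ** 32               # distinct 32-bit key values available
row_bytes = 2 * 1024              # ~2 kB per TOAST row (rough heuristic)

max_bytes = oid_space * row_bytes
print(max_bytes)                  # 8796093022208 bytes, i.e. ~8.8e12
print(max_bytes // 2 ** 30)       # 8192 one-GB segments
print(max_bytes // 2 ** 40)       # 8 TB for the whole TOAST table
```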
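Second, a toy model of the retry loop I'm worried about. This is not the backend C code, just a sketch of the behaviour: GetNewOidWithIndex takes candidates from the cluster-wide counter and pays one index probe per candidate until it finds a free oid, so a dense run of ~8,000,000 consecutive oids left behind by a dump/restore turns one unlucky insert into ~8,000,000 index scans:

```python
# Toy model (not the actual backend code) of the retry loop inside
# GetNewOidWithIndex(): grab the next value from the cluster-wide oid
# counter, probe the table's oid index for a collision, and repeat until
# an unused value turns up.  Every retry costs one index scan.

def allocate_oid(counter_start, used_oids):
    """Return (allocated_oid, index_probes) for a single insert."""
    candidate = counter_start
    probes = 0
    while candidate in used_oids:      # one index probe per taken candidate
        probes += 1
        candidate += 1                 # the shared counter just keeps advancing
    return candidate, probes + 1       # final probe finds a free oid

# A dump/restore can leave a TOAST table keyed by millions of consecutive
# oids.  Once the shared counter wraps back into that run, the very next
# insert has to probe its way across the whole run before it can finish.
dense_run = range(1_000_000_000, 1_008_000_000)    # ~8M consecutive oids
oid, probes = allocate_oid(dense_run.start, dense_run)
print(oid, probes)   # 1008000000 allocated after 8000001 index probes
```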
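And finally the storage cost of a 64-bit key, under the same ~2 kB estimate and my guess above that the key ends up stored in about 4 places per TOAST tuple:

```python
# Rough cost of widening the TOAST key from 32 to 64 bits, assuming the
# extra 4 bytes show up in ~4 places per tuple and ~2 kB per TOAST tuple.

extra_bytes_per_tuple = 4 * (8 - 4)   # 16 extra bytes per TOAST tuple
tuple_bytes = 2 * 1024                # ~2 kB per TOAST tuple

print(f"{extra_bytes_per_tuple / tuple_bytes:.1%}")   # 0.8% extra storage
```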