Re: [GENERAL] 4B row limit for CLOB tables - Mailing list pgsql-hackers

From Matthew Kelly
Subject Re: [GENERAL] 4B row limit for CLOB tables
Msg-id 5696A620-74A7-4C4D-937B-A4D5695CC9BE@tripadvisor.com
In response to Re: Fwd: [GENERAL] 4B row limit for CLOB tables  (José Luis Tallón <jltallon@adv-solutions.net>)
Responses Re: [GENERAL] 4B row limit for CLOB tables
List pgsql-hackers
> Hmm.... 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap page) is 8796093022208 (~9e12) bytes
> ... which results in 8192 1GB segments :O

8192 1GB segments is just 8TB; it's not _that_ large.  At TripAdvisor we've been using a NoSQL solution for session
storage.  We are looking at swapping that out for Postgres (every other db backing the site is Postgres).
Essentially, what I'm building is a system with one logical table that maps a session id to a 2KB+ grab bag of
ever-changing session attributes, partially normalized and partially json.  315 million uniques a month multiplied by
the retention policy means I need to hold 2-4 billion session objects (and somehow expire old ones).  Additionally,
most http calls can update the session, so between maintenance windows I expect to take around 20 billion upserts.
Obviously, I will have to shard and partition the table in practice, but this weekend I ran a test demonstrating that
a single table on a 9.4 server + logical replication + Dell 730xd can handle 4x that workload.  Well, it can for 38
hours, until you wrap xids on the toast table.  :P  I'll be the first to admit that isn't the normal use case, though.
I'm happy to have found this thread, however, because I'm going to have to build around the global oid counter,
explicitly to prevent the problem I explain below regarding clustering.
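
For concreteness, a minimal sketch of the shape of that logical table (all names here are invented, and ON CONFLICT
is 9.5+ syntax; on 9.4 the upsert has to be emulated):

    CREATE TABLE session_store (
        session_id  bigint PRIMARY KEY,
        attributes  jsonb NOT NULL,         -- the 2KB+ grab bag; large values land in TOAST
        updated_at  timestamptz NOT NULL DEFAULT now()
    );

    -- Most http calls translate into roughly this:
    INSERT INTO session_store (session_id, attributes)
    VALUES ($1, $2)
    ON CONFLICT (session_id) DO UPDATE
        SET attributes = EXCLUDED.attributes,
            updated_at = now();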
 

> Anybody actually reaching this limit out there?

Well, it's not the 4 billion row limit that concerns me; it's the global shared counter in conjunction with
pg_restore/clustering that is actually pretty concerning.
 

I just checked through all of TripAdvisor's normal databases, and the max tuple count I see in a single toast table
is 17,000,000, so that is still a couple of orders of magnitude too small (though close enough that it'll be a
concern in a few years).
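
Something like the following finds the biggest toast tables, if anyone wants to run the same check (reltuples is
the estimate maintained by VACUUM/ANALYZE, not an exact count):

    SELECT c.relname, c.reltuples::bigint AS approx_tuples
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'pg_toast'
      AND c.relkind = 't'
    ORDER BY c.reltuples DESC
    LIMIT 10;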
 

However, I do have active databases where the current oid is between 1 billion and 2 billion.  They were last
dump-restored for a hardware upgrade a couple of years ago, when they were a bit more than half their current size.
I can therefore imagine that I have tables which are keyed by ~8,000,000 consecutive oids.
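
(To see where a cluster's OID counter sits: on 9.6 and later it is exposed in SQL, and on older versions
pg_controldata reports it as "Latest checkpoint's NextOID".)

    SELECT next_oid FROM pg_control_checkpoint();  -- 9.6+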
 

I would argue that when the counter wraps, there will be a single insert that will probably block for 2-5 minutes
while it tries to perform ~8,000,000 index scans inside GetNewOidWithIndex.  Even partitioning doesn't protect you
from this potential problem.
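
In SQL terms, every candidate OID drawn from the shared counter costs roughly one probe like this against the toast
table's chunk_id index (table name and value hypothetical), and the loop only exits once a probe comes back empty:

    SELECT 1 FROM pg_toast.pg_toast_16384   -- hypothetical toast table
    WHERE chunk_id = 1234567::oid;          -- candidate from the global counter

With ~8,000,000 consecutive oids already taken, that is ~8,000,000 probes before the first free value.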
 

What's even weirder is that this issue can be triggered by consuming too many oids in a different database in the
same cluster (e.g. by creating large numbers of temp tables).
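
This is easy to see for yourself: each temp table burns OIDs from the shared counter, from any database in the
cluster:

    CREATE TEMP TABLE oid_burner ();
    SELECT 'oid_burner'::regclass::oid;  -- climbs each time you repeat this;
                                         -- each create consumes several OIDs
                                         -- (relation, rowtype, array type)
    DROP TABLE oid_burner;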
 

> The problem with changing the id from 32 to 64 bits is that the storage *for everybody else* doubles, making the
> implementation slower for most.... though this might be actually not that important.
 

Well, you aren't doubling the storage.  Even if you have to store the key in 4 places, you are adding 16 bytes per
TOAST tuple.  If we work off the 2KB estimate for each TOAST tuple, then you are only increasing the storage by
about 0.8%.  I'm sure there are more hidden costs, but we are really only talking about a low single-digit percent
increase.  In exchange, you get to drop one index scan per toast insert: an index scan looking in the only hot part
of the index.
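
The arithmetic, for the record (a 4-byte key widened to 8 bytes in four places is 16 extra bytes per ~2KB chunk):

    SELECT round(100.0 * 16 / 2048, 2) AS pct_increase;  -- => 0.78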
 

That being said, I'd be perfectly happy merely giving each TOAST table its own sequence, as that almost entirely
mitigates the risk of an unexpected lock-up on reasonably sized tables/partitions, and provides a functional
workaround for those of us with larger-than-average installs.
 

- Matt K
