Hi,
On 2020-11-18 19:02:34 -0500, Tom Lane wrote:
> Having said that, I had imagined that we might never have to fix it,
> because if your table is big enough that it has a problem of this
> ilk then likely you want to partition it anyway. And partitioning
> solves the problem since each partition has its own toast table.
A few billion rows isn't that much anymore, and partitioning has a fair
number of restrictions (plus this only needs 4 billion toasted fields,
not 4 billion rows). More importantly, to hit problems around this, one
doesn't even have to have all those rows in a table at once -
performance will suffer once the oid counter has wrapped around,
because every new toast id then has to be probed against the toast
table's index until an unused one is found, which gets especially bad
if there are longer runs of already-assigned values.
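
For reference, here's a condensed paraphrase of that probing (the real
loop is in GetNewOidWithIndex() in src/backend/catalog/catalog.c;
declarations and error handling elided):

    /* keep drawing oids from the global counter until one doesn't
     * collide with an existing chunk_id in the toast table's index */
    do
    {
        CHECK_FOR_INTERRUPTS();
        newOid = GetNewObjectId();

        ScanKeyInit(&key, oidcolumn,
                    BTEqualStrategyNumber, F_OIDEQ,
                    ObjectIdGetDatum(newOid));
        scan = systable_beginscan(relation, indexId, true,
                                  SnapshotAny, 1, &key);
        collides = HeapTupleIsValid(systable_getnext(scan));
        systable_endscan(scan);
    } while (collides);

Every attempt costs an index descent, so long runs of already-used ids
translate directly into per-insertion overhead.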
> Yeah. If we're going to put work into this, widening the IDs used
> to identify toast values seems like the right work to be doing.
To outline it, here are what I think are the two major pieces to get
there:
1) Make toast oid assignment independent of the oid counter. The
   easiest way is likely to create a sequence alongside each toast
   table and use that (see the first sketch below). That alone
   improves the situation considerably, because it takes much longer
   to wrap around within each toast table.
The overhead of the additional WAL records isn't nothing, but
compared to maintaining a btree it's not likely to be measurable.
2) Widen non-pg-upgraded toast tables to have a 64bit chunk_id field.
   To reference chunks >= 2^32, add VARTAG_ONDISK64, which is only
   used when needed (see the second sketch below).
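
For 1), here's a minimal sketch of what the assignment could look
like. GetNewToastValueId() and the reltoastseqid field are made-up
names for illustration - they don't exist today; nextval_internal() is
the existing sequence machinery:

    /* Hypothetical replacement for the GetNewOidWithIndex() call in
     * toast_save_datum(): draw the value id from a per-toast-table
     * sequence.  A sequence can't hand out duplicates, so no index
     * probing is needed at all. */
    static Oid
    GetNewToastValueId(Relation toastrel)
    {
        int64       val;

        /* hypothetical pg_class column pointing at the toast table's
         * dedicated sequence */
        val = nextval_internal(toastrel->rd_rel->reltoastseqid, false);

        return (Oid) val;
    }

The extra WAL is just the sequence advancing - by default one record
per 32 nextval() calls - which is the overhead referred to in 1).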
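
And for 2), a sketch of the widened pointer next to the existing
varatt_external in src/include/postgres.h. varatt_external64 and
VARTAG_ONDISK64 are the names proposed above, not existing code, and
the tag value 19 is an arbitrary unused pick:

    /* like varatt_external, but with a 64bit value id; stored
     * unaligned, same as the 32bit variant */
    typedef struct varatt_external64
    {
        int32       va_rawsize;     /* original data size (includes header) */
        int32       va_extsize;     /* external saved size (without header) */
        uint64      va_valueid;     /* 64bit chunk_id within the toast table */
        Oid         va_toastrelid;  /* RelationID of the toast table */
    } varatt_external64;

    /* addition to the vartag_external enum */
    VARTAG_ONDISK64 = 19,

Readers would dispatch on the vartag just like VARTAG_ONDISK is
handled today, so existing on-disk pointers keep their current format.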
Greetings,
Andres Freund