Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Lowering the ever-growing heap->pd_lower
Date
Msg-id 20220408210601.ny6maqqnfdipq7tx@alap3.anarazel.de
Whole thread Raw
In response to Re: Lowering the ever-growing heap->pd_lower  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Lowering the ever-growing heap->pd_lower  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2022-04-08 09:17:40 -0400, Robert Haas wrote:
> I agree that the value of 291 is pretty much accidental, but it also
> seems fairly generous to me. The bigger you make it, the more space
> you can waste. I must have missed (or failed to understand) previous
> discussions about why raising it would be a good idea.

It's not hard to hit scenarios where pages are effectively unusable, because
they have close to 291 dead items, without autovacuum triggering (or
autovacuum just taking a while). You basically just need updates / deletes to
concentrate in a certain range of the table and have indexing that prevents
HOT updates. Because the overall percentage of dead tuples is low, no
autovacuum is triggered, yet a range of the table contains little but dead
items.  At which point you basically waste 7k bytes (1164 bytes for dead items
IIRC) until a vacuum finally kicks in - way more than what what you'd waste if
the number of line items were limited at e.g. 2 x MaxHeapTuplesPerPage

This has become a bit more pronounced with vacuum skipping index cleanup when
there's "just a few" dead items - if all your updates concentrate in a small
region, 2% of the whole relation size isn't actually that small.


I wonder if we could reduce the real-world space wastage of the line pointer
array, if we changed the the logic about which OffsetNumbers to use during
inserts / updates and and made a few tweaks to to pruning.

1) It's kind of OK for heap-only tuples to get a high OffsetNumber - we can
   reclaim them during pruning once they're dead. They don't leave behind a
   dead item that's unreclaimable until the next vacuum with an index cleanup
   pass.

2) Arguably the OffsetNumber of a redirect target can be changed. It might
   break careless uses of WHERE ctid = ... though (which likely are already
   broken, just harder to hit).

These leads me to a few potential improvements:

a) heap_page_prune_prune() should take the number of used items into account
   when deciding whether to prune. Right now we trigger hot pruning based on
   the number of items only if PageGetMaxOffsetNumber(page) >=
   MaxHeapTuplesPerPage. But because it requires a vacuum to reclaim an ItemId
   used for a root tuple, we should trigger HOT pruning when it might lower
   which OffsetNumber get used.

b) heap_page_prune_prune() should be triggered in more paths. E.g. when
   inserting / updating, we should prune if it allows us to avoid using a high
   OffsetNumber.

c) What if we left some percentage of ItemIds unused, when looking for the
   OffsetNumber of a new HOT row version? That'd make it more likely for
   non-HOT updates and inserts to fit onto the page, without permanently
   increasing the size of the line pointer array.

d) If we think 2) is acceptable, we could move the targets of redirects to
   make space for new root tuples, without increasing the permanent size of
   the line pointer array.

Crazy?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Pre-allocating WAL files
Next
From: Andres Freund
Date:
Subject: Re: Lowering the ever-growing heap->pd_lower