Re: Disabling Heap-Only Tuples - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Disabling Heap-Only Tuples
Msg-id: 20230921231852.lfaueewbrv5lz6nr@awork3.anarazel.de
In response to: Re: Disabling Heap-Only Tuples (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List: pgsql-hackers

Hi,

On 2023-09-19 20:20:06 +0200, Matthias van de Meent wrote:
> Mostly agreed, but I think there's a pitfall here. You seem to assume
> we have a perfect oracle that knows the optimal data size, but we
> already know that our estimates can be significantly off. I don't
> quite trust the statistics enough to do any calculations based on the
> number of tuples in the relation. That also ignores the fact that we
> don't actually have any good information about the average size of the
> tuples in the table. So with current statistics, any automated "this
> is how large the table should be" decisions would result in an
> automated footgun, instead of the current patch's where the user has
> to decide to configure it to an explicit value.

The proposed patch already relies on the FSM being reasonably up to date, no?
If the FSM doesn't know about free space, the patch won't be able to place
tuples earlier in the relation. And if the FSM wrongly thinks there's lots of
free space, it'll make updates very expensive.

We obviously don't want to scan the whole FSM on an ongoing basis, but
visiting the top-level FSM pages and/or having vacuum/analyze update some
statistic based on a more thorough analysis of the FSM doesn't seem insane.
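
To make that concrete, here's a naive sketch of what the vacuum/analyze side
could look like, just summing GetRecordedFreeSpace() over the heap - a real
version would presumably look at the FSM's upper-level pages instead, and the
statistic the result ends up in doesn't exist yet:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "utils/rel.h"

/*
 * Sketch only: total free space the FSM currently knows about for "rel".
 * Iterating every heap block is the naive approach; walking the FSM's
 * upper levels would avoid the per-block lookups.
 */
static Size
estimate_fsm_free_space(Relation rel)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    Size        total = 0;

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
        total += GetRecordedFreeSpace(rel, blkno);

    /* would be stored in some new per-relation statistic */
    return total;
}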


A related issue is that an accurate tuple size and an accurate number of
tuples aren't really sufficient - if tuples are wide, there can be plenty of
space on pages without updates being able to reuse that space. And tuple
widths don't have to be evenly distributed, so a simple approach of
calculating how many tuples of the average width fit on a page, and then
using that to derive the overall number of required pages, isn't necessarily
accurate either.
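
(To make the second point concrete with made-up numbers: with ~8kB of usable
space per page and an average tuple width of 400 bytes, one would conclude
that ~20 tuples fit per page. But if the table is actually a mix of 100 byte
and 700 byte tuples, a page that ended up with mostly wide tuples can have a
few hundred bytes free and still not fit the next wide update, so the page
count derived from the average undershoots.)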


> But about that: I'm not sure what the "footgun" is that you've
> mentioned recently?
> The issue with excessive bloat (when the local_update_limit is set too
> small and fillfactor is low) was fixed in the latest patch nearly
> three weeks ago, so the only remaining issue with misconfiguration is
> slower updates.

There seem to be plenty of footguns. Just to name a few:

- The user has to determine a good value for local_update_limit, without
  really any good way of doing so.

- A "too low" local_update_limit will often succeed in finding some space in
  earlier pages, without that providing useful progress on compaction -
  e.g. because subsequently tuples on the earlier page will be updated and
  there's now no space anymore. Leading to index bloat.

- Configuring local_update_limit as a fixed size will be fragile when the data
  actually grows, leading to lots of pointless out-of-page updates.


I think a minimal working approach could be to base the configuration on the
relation size vs. the space known to the FSM. If the target block of an
update is higher than ((relation_size - fsm_free_space) *
new_reloption_or_guc), try finding the target block via the FSM, even if
there's space on the current page.
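
In pseudo-C, something like the below - the helper, "limit_fraction" (the new
reloption/GUC) and "fsm_free_blocks" (an FSM-derived estimate maintained by
vacuum/analyze) are all made up, of course:

/*
 * Sketch only: should this UPDATE ask the FSM for an earlier target block,
 * even though the current page still has room?
 */
static bool
prefer_fsm_placement(Relation rel, BlockNumber target_blkno,
                     double limit_fraction, BlockNumber fsm_free_blocks)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    BlockNumber dense_blocks;

    /* estimated size of the relation if it were fully compacted */
    dense_blocks = (fsm_free_blocks < nblocks) ?
        nblocks - fsm_free_blocks : 0;

    /* beyond the estimated dense size, plus the configured slack? */
    return target_blkno >= (BlockNumber) (dense_blocks * limit_fraction);
}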


> Sure, that's not great, but in my opinion not a
> "footgun": performance returns immediately after resetting
> local_update_limit, and no space was lost.

I think there are plenty of ways to get pointless out-of-page updates, and
therefore index bloat, with local_update_limit as proposed (see earlier in
the email). Once you have such pointless out-of-page updates, disabling
local_update_limit won't bring performance back immediately - the index bloat
remains, costing space as well as lookup performance due to the additional
index entries.


> Updating the reloption after relation truncation implies having the
> same lock as relation truncation, i.e. AEL (if the vacuum docs are to
> be believed).

Aside: We really need to get rid of the AEL for relation truncation - it's
quite painful for hot standby workloads...

Thomas has been talking about a patch (and perhaps even posted it) that adds
infrastructure providing a "shared smgrrelation". Once we have that, I think
we could lower the required lock level for truncation by storing both the
filesystem size and the "valid" size. There are a few potential models:

- Vacuum truncation could lower the valid size in-memory, end its transaction,
  wait for concurrent accesses to the relation to finish, check whether/where
  the relation has been extended since, acquire the extension lock and
  truncate down to the "valid" size.

  The danger with that is that the necessary waiting can be long, threatening
  to starve autovacuum of workers.

- Instead of making a single vacuum wait, we could have one vacuum update the
  valid size of the relation and also store an xid horizon. Later vacuums can
  truncate the physical size down to the valid size if there are no snapshot
  conflicts with said xid anymore.


If we had such a shared smgrrel, we could also make relation extension a lot
more efficient, because we would not need to pin all pages that a relation
extension "covers" - the reason we need to pin the to-be-extended pages is to
prevent concurrent scans from reading "new" blocks while the extension is in
progress, as otherwise such a buffer could be dirtied and written out,
potentially leading to lost writes and other fun issues. But with the shared
smgrrel, we can store the size-currently-being-extended-to separately from
the filesystem size. If the block range covered by an in-progress extension
may not be read into s_b, the race doesn't exist anymore.
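
As a strawman, the shared bookkeeping could be as little as a few fields on
the (not yet existing) shared smgrrelation - all names made up:

typedef struct SharedSMgrRelSizes
{
    BlockNumber     fs_nblocks;     /* size of the relation on disk */
    BlockNumber     valid_nblocks;  /* highest block readers may access */
    BlockNumber     extend_target;  /* size an in-progress extension will
                                     * reach; blocks in [fs_nblocks,
                                     * extend_target) must not be read into
                                     * s_b yet */
    TransactionId   truncate_xid;   /* horizon a later vacuum has to wait out
                                     * before physically truncating down to
                                     * valid_nblocks */
} SharedSMgrRelSizes;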

Greetings,

Andres Freund


