Re: Disabling Heap-Only Tuples - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Disabling Heap-Only Tuples
Date
Msg-id CAEze2WhmE=Vv-deUS83S9qcdOz-Z4zOsk7HrL4GBNM=a+TO3jw@mail.gmail.com
Whole thread Raw
In response to Re: Disabling Heap-Only Tuples  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Disabling Heap-Only Tuples
Re: Disabling Heap-Only Tuples
List pgsql-hackers
On Tue, 19 Sept 2023 at 18:52, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Sep 19, 2023 at 12:30 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > I was thinking something vaguely like "a table size that's roughly what
> > an optimal autovacuuming schedule would leave the table at" assuming 0.2
> > vacuum_scale_factor.  You would determine the absolute minimum size for
> > the table given the current live tuples in the table, then add 20% to
> > account for a steady state of dead tuples and vacuumed space.  So it's
> > not 1.2x of the "current" table size at the time the local_update_limit
> > feature is installed, but 1.2x of the optimal table size.
>
> Right, that would be great. And honestly if that's something we can
> figure out, then why does the parameter even need to be an integer
> instead of a Boolean? If the system knows the optimal table size, then
> the user can just say "try to compact this table" and need not say to
> what size. The 1.2 multiplier is probably situation dependent and
> maybe the multiplier should indeed be a configuration parameter, but
> we would be way better off if the absolute size didn't need to be.

Mostly agreed, but I think there's a pitfall here. You seem to assume
we have a perfect oracle that knows the optimal data size, but we
already know that our estimates can be significantly off. I don't
quite trust the statistics enough to do any calculations based on the
number of tuples in the relation. That also ignores the fact that we
don't actually have any good information about the average size of the
tuples in the table. So with current statistics, any automated "this
is how large the table should be" decisions would result in an
automated footgun, instead of the current patch's where the user has
to decide to configure it to an explicit value.

But about that: I'm not sure what the "footgun" is that you've
mentioned recently?
The issue with excessive bloat (when the local_update_limit is set too
small and fillfactor is low) was fixed in the latest patch nearly
three weeks ago, so the only remaining issue with misconfiguration is
slower updates. Sure, that's not great, but in my opinion not a
"footgun": performance returns immediately after resetting
local_update_limit, and no space was lost.

> > This makes me think that maybe the logic needs to be a little more
> > complex to avoid the problem you describe: if an UPDATE is prevented
> > from being HOT because of this setting, but then it goes and consults
> > FSM and it gives the update a higher block number than the tuple's
> > current block (or it fails to give a block number at all so it is forced
> > to extend the relation), then the update should give up on that strategy
> > and use a HOT update after all.  (I have not read the actual patch;
> > maybe it already does this?  It sounds kinda obvious.)
>
> +1 to all of that. Anything we can do to reduce the chance of the
> parameter doing the opposite of what it's intended to do is, IMHO,
> really, really valuable. If you're in the situation where you really
> need something like this, you're probably having a pretty bad day
> already.

Yes, it does that with the latest patch, from not quite 3 weeks ago.

> Just to be more clear about my position, I don't think that having
> some kind of a feature along these lines is a bad idea.

Thanks for clarifying.

> I do think
> that this is one of those cases where the perfect is the enemy of the
> good, and we can fall into the trap of saying that since we can't do
> the perfect thing let's not do anything at all. At the same time, just
> because we need to do something doesn't mean we should do exactly the
> first thing that anybody thought up, or that we shouldn't try as hard
> as we can to mitigate the downsides. If we add something like this I
> bet it will get a lot of use. Even a minor improvement to the design
> that removes one pitfall of many could turn out to help a lot of
> people.

100% agreed.

> > Having to set AEL is not nice for sure, but wouldn't
> > ShareUpdateExclusiveLock be sufficient?  We have a bunch of reloptions
> > for which that is sufficient.
>
> Hmm, yeah, I think you're right.

Updating the reloption after relation truncation implies having the
same lock as relation truncation, i.e. AEL (if the vacuum docs are to
be believed). So the AEL is not reqiored for updating the storage
option (that would require SUEL), but for the block truncation
operation operation.

Kind regards,

Matthias van de Meent
Neon (http://neon.tech)



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: GenBKI emits useless open;close for catalogs without rows
Next
From: Robert Haas
Date:
Subject: Re: Disabling Heap-Only Tuples