Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Lowering the ever-growing heap->pd_lower
Date
Msg-id CAH2-Wzm8VxpGS7MvW3bttddcOfqRt_uqf_SC0XQJwo5aK8L-wg@mail.gmail.com
Whole thread Raw
In response to Re: Lowering the ever-growing heap->pd_lower  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: Lowering the ever-growing heap->pd_lower  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: Lowering the ever-growing heap->pd_lower  (Robert Haas <robertmhaas@gmail.com>)
Re: Lowering the ever-growing heap->pd_lower  (Simon Riggs <simon.riggs@enterprisedb.com>)
List pgsql-hackers
On Mon, Aug 2, 2021 at 11:57 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> 1. Allow same thing as PageTruncateLinePointerArray() during HOT cleanup
> That is going to have a clear benefit for HOT workloads, which by
> their nature will use a lot of line pointers.

Why do you say that?

> Many applications are updated much more frequently than they are vacuumed.
> Peter - what is your concern about doing this more frequently? Why
> would we *not* do this?

What I meant before was that this argument worked back when we limited
the technique to VACUUM's second heap pass. Doing line pointer array
truncation at that point alone meant that it only ever happened
outside of VACUUM proper. Prior to that point we literally did nothing
about LP_UNUSED items at the end of each line pointer array, so we
were going from doing nothing to doing something.

This time it's quite different: we're truncating the line pointer
array during pruning. Pruning often occurs opportunistically, during
regular query processing. In fact I'd say that it's far more common
than pruning by VACUUM. So the chances of line pointer array
truncation hurting rather than helping seems higher. Plus now we might
break things like DDL, that would naturally not have been affected
before because we were only doing line pointer truncation during
VACUUM proper.

Of course it might well be true that there is a significant benefit to
this patch. I don't think that we should assume that that's the case,
though. We have yet to see a test case showing any benefit. Maybe
that's an easy thing to produce, and maybe Matthias has assumed that I
must already know what to look at. But I don't. It's not obvious to me
how to assess this patch now.

I don't claim to have any insight about what we should or should not
do. At least not right now. When I committed the original (commit
3c3b8a4b), I did so because I thought that it was very likely to
improve certain cases and very unlikely to hurt any other cases.
Nothing more.

> 2. Reduce number of line pointers to 0 in some cases.
> Matthias - I don't think you've made a full case for doing this, nor
> looked at the implications.
> The comment clearly says "it seems like a good idea to avoid leaving a
> PageIsEmpty()" page behind.
> So I would be inclined to remove that from the patch and consider that
> as a separate issue, or close this.

This was part of that earlier commit because of sheer paranoia;
nothing more. I actually think that it's highly unlikely to protect us
from bugs in practice. Though I am, in a certain sense, likely to be
wrong about "PageIsEmpty() defensiveness", it does not bother me in
the slightest. It seems like the right approach in the absence of new
information about a significant downside. If my paranoia really did
turn out to be justified, then I would expect that there'd be a
subtle, nasty bug. That possibility is what I was really thinking of.
And so it almost doesn't matter to me how unlikely we might think such
a bug is now, unless and until somebody can demonstrate a real
practical downside to my defensive approach.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: make MaxBackends available in _PG_init
Next
From: Masahiko Sawada
Date:
Subject: Re: Failed transaction statistics to measure the logical replication progress