Thread: Maximize page freezing

Maximize page freezing

From
Simon Riggs
Date:
Starting new thread with updated patch to avoid confusion, as
mentioned by David Steele on the original thread:
Original messageid: 20201118020418.GA13408@alvherre.pgsql
On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2020-Nov-17, Simon Riggs wrote:
>
> > As an additional optimization, if we do find a row that needs freezing
> > on a data block, we should simply freeze *all* row versions on the
> > page, not just the ones below the selected cutoff. This is justified
> > since writing the block is the biggest cost and it doesn't make much
> > sense to leave a few rows unfrozen on a block that we are dirtying.
>
> Yeah.  We've had earlier proposals to use high and low watermarks: if any
> tuple is past the high watermark, then freeze all tuples that are past
> the low watermark.  However this is ancient thinking (prior to
> HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> from zero, since the original xid is retained anyway.
>
> So +1 for this idea.

Updated patch attached.
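
A minimal sketch of the idea, not the attached patch itself: once any
tuple on a page falls below the freeze cutoff, the page will be dirtied
anyway, so every eligible tuple on it is frozen. SketchTuple,
sketch_freeze_page, freeze_cutoff and oldest_xmin are hypothetical
names, and plain integer comparison stands in for the wraparound-aware
xid comparison a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tuple; real heap tuples carry far more state. */
typedef struct SketchTuple
{
    uint32_t xmin;      /* inserting transaction id */
    bool     frozen;    /* stands in for the HEAP_XMIN_FROZEN bits */
} SketchTuple;

/*
 * Freeze tuples on one page and return how many were newly frozen.
 * Assumption: xids never wrap, so "<" is a valid ordering here.
 */
static size_t
sketch_freeze_page(SketchTuple *tuples, size_t ntuples,
                   uint32_t freeze_cutoff, uint32_t oldest_xmin)
{
    bool   must_freeze = false;
    size_t nfrozen = 0;

    /* The cutoff can never be newer than what is visible to everyone. */
    assert(freeze_cutoff <= oldest_xmin);

    /* Pass 1: does any unfrozen tuple fall below the cutoff? */
    for (size_t i = 0; i < ntuples; i++)
    {
        if (!tuples[i].frozen && tuples[i].xmin < freeze_cutoff)
        {
            must_freeze = true;
            break;
        }
    }

    if (!must_freeze)
        return 0;       /* nothing forces a write; leave the page alone */

    /* Pass 2: the page gets dirtied anyway, so freeze all eligible tuples. */
    for (size_t i = 0; i < ntuples; i++)
    {
        if (!tuples[i].frozen && tuples[i].xmin < oldest_xmin)
        {
            tuples[i].frozen = true;
            nfrozen++;
        }
    }
    return nfrozen;
}
```

Deciding in a separate first pass means that tuples located before the
first to-be-frozen tuple are treated the same as those after it.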

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Maximize page freezing

From
Matthias van de Meent
Date:
On Thu, 28 Jul 2022 at 15:36, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> Starting new thread with updated patch to avoid confusion, as
> mentioned by David Steele on the original thread:
> Original messageid: 20201118020418.GA13408@alvherre.pgsql
> On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2020-Nov-17, Simon Riggs wrote:
> >
> > > As an additional optimization, if we do find a row that needs freezing
> > > on a data block, we should simply freeze *all* row versions on the
> > > page, not just the ones below the selected cutoff. This is justified
> > > since writing the block is the biggest cost and it doesn't make much
> > > sense to leave a few rows unfrozen on a block that we are dirtying.
> >
> > Yeah.  We've had earlier proposals to use high and low watermarks: if any
> > tuple is past the high watermark, then freeze all tuples that are past
> > the low watermark.  However this is ancient thinking (prior to
> > HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> > from zero, since the original xid is retained anyway.
> >
> > So +1 for this idea.
>
> Updated patch attached.

Great idea, yet this patch seems to only freeze those tuples that are
located after the first to-be-frozen tuple. It should probably
re-visit earlier live tuples to potentially freeze those as well.

Kind regards,

Matthias van de Meent



Re: Maximize page freezing

From
Peter Geoghegan
Date:
On Thu, Jul 28, 2022 at 6:56 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Great idea, yet this patch seems to only freeze those tuples that are
> located after the first to-be-frozen tuple. It should probably
> re-visit earlier live tuples to potentially freeze those as well.

I have a big patch set pending that does this (which I dubbed
"page-level freezing"), plus a bunch of other things that control the
overhead, although the basic idea of freezing all of the tuples on a
page together appears in earlier patches that were posted. These were
things that didn't make it into Postgres 15.

I should be able to post something in a couple of weeks.

-- 
Peter Geoghegan



Re: Maximize page freezing

From
Simon Riggs
Date:
On Thu, 28 Jul 2022 at 20:57, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jul 28, 2022 at 6:56 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > Great idea, yet this patch seems to only freeze those tuples that are
> > located after the first to-be-frozen tuple. It should probably
> > re-visit earlier live tuples to potentially freeze those as well.
>
> I have a big patch set pending that does this (which I dubbed
> "page-level freezing"), plus a bunch of other things that control the
> overhead, although the basic idea of freezing all of the tuples on a
> page together appears in earlier patches that were posted. These were
> things that didn't make it into Postgres 15.

Yes, my patch from 2020 was never reviewed, which is why I was
resubmitting here.

> I should be able to post something in a couple of weeks.

How do you see that affecting this thread?

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Maximize page freezing

From
Simon Riggs
Date:
On Thu, 28 Jul 2022 at 14:55, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Thu, 28 Jul 2022 at 15:36, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > Starting new thread with updated patch to avoid confusion, as
> > mentioned by David Steele on the original thread:
> > Original messageid: 20201118020418.GA13408@alvherre.pgsql
> > On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > On 2020-Nov-17, Simon Riggs wrote:
> > >
> > > > As an additional optimization, if we do find a row that needs freezing
> > > > on a data block, we should simply freeze *all* row versions on the
> > > > page, not just the ones below the selected cutoff. This is justified
> > > > since writing the block is the biggest cost and it doesn't make much
> > > > sense to leave a few rows unfrozen on a block that we are dirtying.
> > >
> > > Yeah.  We've had earlier proposals to use high and low watermarks: if any
> > > tuple is past the high watermark, then freeze all tuples that are past
> > > the low watermark.  However this is ancient thinking (prior to
> > > HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> > > from zero, since the original xid is retained anyway.
> > >
> > > So +1 for this idea.
> >
> > Updated patch attached.
>
> Great idea, yet this patch seems to only freeze those tuples that are
> located after the first to-be-frozen tuple. It should probably
> re-visit earlier live tuples to potentially freeze those as well.

Like this?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Maximize page freezing

From
Peter Geoghegan
Date:
On Fri, Jul 29, 2022 at 5:55 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> > I should be able to post something in a couple of weeks.
>
> How do you see that affecting this thread?

Well, it's clearly duplicative, at least in part. That in itself
doesn't mean much, but there are some general questions (that apply to
any variant of proactive/batched freezing), particularly around the
added overhead, and the question of whether or not we get to advance
relfrozenxid substantially in return for that cost. Those parts are
quite tricky.

I have every intention of addressing these thorny questions in my
upcoming patch set, which actually does far more than change the rules
about when and how we freeze -- changing the mechanism itself is very
much the easy part. I'm taking a holistic approach that involves
making an up-front decision about freezing strategy based on the
observed characteristics of the table, driven by what we see in the
visibility map at the start.

Similar questions will also apply to this patch, even though it isn't
as aggressive (your patch doesn't trigger freezing when a page is
about to be set all-visible in order to make sure that it can be set
all-frozen instead). You still want to give the user a clear benefit
for any added overhead. It needs a great deal of performance
validation, too.

-- 
Peter Geoghegan



Re: Maximize page freezing

From
Matthias van de Meent
Date:
On Fri, 29 Jul 2022 at 16:38, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 28 Jul 2022 at 14:55, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > Great idea, yet this patch seems to only freeze those tuples that are
> > located after the first to-be-frozen tuple. It should probably
> > re-visit earlier live tuples to potentially freeze those as well.
>
> Like this?

That wasn't quite what I imagined. In your patch, heap_page_prune is
disabled after the first frozen tuple, which makes the retry mechanism
around the HTSV (HeapTupleSatisfiesVacuum) check loop forever, because
it expects that tuple to be vacuumed.

I was thinking more along the lines of "do a backtrack in a specialized
code block when entering max_freeze_page mode" (without using
'retry'), though I'm not sure whether that's the best option
available.
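
A rough sketch of that backtrack, under heavy simplification: the loop
walks the page once, and on entering max-freeze mode it revisits the
tuples it has already passed in a dedicated block, with no 'retry'
jump. BtTuple, bt_freeze_page, freeze_cutoff and oldest_xmin are
hypothetical names; pruning, HTSV and xid wraparound are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tuple used only for this sketch. */
typedef struct BtTuple
{
    uint32_t xmin;      /* inserting transaction id */
    bool     frozen;    /* already frozen? */
} BtTuple;

/* Returns the number of tuples newly frozen on this page. */
static size_t
bt_freeze_page(BtTuple *tuples, size_t ntuples,
               uint32_t freeze_cutoff, uint32_t oldest_xmin)
{
    bool   max_freeze = false;  /* set once a tuple requires freezing */
    size_t nfrozen = 0;

    /* Guarantees the triggering tuple is itself eligible below. */
    assert(freeze_cutoff <= oldest_xmin);

    for (size_t i = 0; i < ntuples; i++)
    {
        if (tuples[i].frozen)
            continue;

        if (!max_freeze && tuples[i].xmin < freeze_cutoff)
        {
            max_freeze = true;

            /* Backtrack once over the tuples skipped before this point. */
            for (size_t j = 0; j < i; j++)
            {
                if (!tuples[j].frozen && tuples[j].xmin < oldest_xmin)
                {
                    tuples[j].frozen = true;
                    nfrozen++;
                }
            }
        }

        /* In max-freeze mode, freeze every eligible tuple from here on. */
        if (max_freeze && tuples[i].xmin < oldest_xmin)
        {
            tuples[i].frozen = true;
            nfrozen++;
        }
    }
    return nfrozen;
}
```

The backtrack runs at most once per page, so the extra work is bounded
by one additional pass over the tuples that precede the trigger.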

Kind regards,

Matthias van de Meent