Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Date
Msg-id 1d2e0f4a-6ab8-10f1-c177-af97475df001@postgrespro.ru
Whole thread Raw
In response to Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 30.10.2020 03:42, Tomas Vondra wrote:
> Hi,
>
> I might be somewhat late to the party, but I've done a bit of
> benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
> different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
> current master and with the patch.
>
> So I don't really observe any measurable slowdowns in the COPY part (in
> fact there seems to be a tiny speedup, but it might be just noise). In
> the VACUUM part, there's clear speedup when the data was already frozen
> by COPY (Yes, those are zeroes, because it took less than 1 second.)
>
> So that looks pretty awesome, I guess.
>
> For the record, these tests were run on a server with NVMe SSD, so
> hopefully reliable and representative numbers.
>
Thank you for the review.

> A couple minor comments about the code:
>
> 2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
> of personal preference, but I'd just use a single level of nesting, i.e.
> something like this:
>
>     /* if everything frozen, the whole page has to be visible */
>     Assert(!(all_frozen_set && !PageIsAllVisible(page)));
>
>     /*
>      * If we've frozen everything on the page, and if we're already
>      * holding pin on the vmbuffer, record that in the visibilitymap.
>      * If we're not holding the pin, it's OK to skip this - it's just
>      * an optimization.
>      *
>      * It's fine to use InvalidTransactionId here - this is only used
>      * when HEAP_INSERT_FROZEN is specified, which intentionally
>      * violates visibility rules.
>      */
>     if (all_frozen_set &&
>         visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
>     visibilitymap_set(...);
>
> IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
> to say what we're doing etc. And I've moved the comment from inside the
> if block into the main comment - that was making it harder to read too.
>
I agree that it's a matter of taste. I've updated comments and left 
nesting unchanged to keep assertions simple.

>
> 3) I see RelationGetBufferForTuple does this:
>
>     /*
>      * This is for COPY FREEZE needs. If page is empty,
>      * pin vmbuffer to set all_frozen bit
>      */
>     if ((options & HEAP_INSERT_FROZEN) &&
>         (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
>         visibilitymap_pin(relation, BufferGetBlockNumber(buffer), 
> vmbuffer);
>
> so is it actually possible to get into the (all_frozen_set) without
> holding a pin on the visibilitymap? I haven't investigated this so
> maybe there are other ways to get into that code block. But the new
> additions to hio.c get the pin too.
>
I was thinking that GetVisibilityMapPins() can somehow unset the pin. I 
gave it a second look. And now I don't think it's possible to get into 
this code block without a pin.  So, I converted this check into an 
assertion.

>
> 4) In heap_xlog_multi_insert we now have this:
>
>     if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
>         PageClearAllVisible(page);
>     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
>         PageSetAllVisible(page);
>
> IIUC it makes no sense to have both flags at the same time, right? So
> what about adding
>
>     Assert(!(XLH_INSERT_ALL_FROZEN_SET && 
> XLH_INSERT_ALL_VISIBLE_CLEARED));
>
> to check that?
>
Agree.

I placed this assertion to the very beginning of the function. It also 
helped to simplify the code a bit.
I also noticed, that we were not updating visibility map for all_frozen 
from heap_xlog_multi_insert. Fixed.

>
> I wonder what to do about the heap_insert - I know there are concerns it
> would negatively impact regular insert, but is it really true? I suppose
> this optimization would be valuable even for cases where multi-insert is
> not possible.
>
Do we have something like INSERT .. FREEZE? I only see 
TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations. Can 
you explain, what use-case are we trying to optimize by extending this 
patch to heap_insert()?

The new version is attached.
I've also fixed a typo in the comment by Tatsuo Ishii suggestion.
Also, I tested this patch with replication and found no issues.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Collation versioning
Next
From: Magnus Hagander
Date:
Subject: Re: -O switch