Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Date
Msg-id 20201030004252.d63rwclys4bniluu@development
Whole thread Raw
In response to Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Responses Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
Hi,

I might be somewhat late to the party, but I've done a bit of
benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
current master and with the patch.

The results look like this (the columns say what combination of COPY and
VACUUM was used - e.g. -/FREEZE means plain COPY and VACUUM FREEZE)

master:

             - / -    FREEZE / -     - / FREEZE    FREEZE / FREEZE
   ----------------------------------------------------------------
      COPY    2471          2477           2486               2484
    VACUUM     228           209            453                206

patched:

             - / -    FREEZE / -     - / FREEZE    FREEZE / FREEZE
   ----------------------------------------------------------------
       COPY   2459          2445           2458               2446
     VACUUM    227             0            467                  0

So I don't really observe any measurable slowdowns in the COPY part (in
fact there seems to be a tiny speedup, but it might be just noise). In
the VACUUM part, there's clear speedup when the data was already frozen
by COPY (Yes, those are zeroes, because it took less than 1 second.)

So that looks pretty awesome, I guess.

For the record, these tests were run on a server with NVMe SSD, so
hopefully reliable and representative numbers.


A couple minor comments about the code:

1) Maybe add a comment before the block setting xlrec->flags in
heap_multi_insert. It's not very complex, but it used to be a bit
simpler, and all the other pieces around have comments, so it won't
hurt.


2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
of personal preference, but I'd just use a single level of nesting, i.e.
something like this:

     /* if everything frozen, the whole page has to be visible */
     Assert(!(all_frozen_set && !PageIsAllVisible(page)));

     /*
      * If we've frozen everything on the page, and if we're already
      * holding pin on the vmbuffer, record that in the visibilitymap.
      * If we're not holding the pin, it's OK to skip this - it's just
      * an optimization.
      *
      * It's fine to use InvalidTransactionId here - this is only used
      * when HEAP_INSERT_FROZEN is specified, which intentionally
      * violates visibility rules.
      */
     if (all_frozen_set &&
         visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
    visibilitymap_set(...);

IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
to say what we're doing etc. And I've moved the comment from inside the
if block into the main comment - that was making it harder to read too.


3) I see RelationGetBufferForTuple does this:

     /*
      * This is for COPY FREEZE needs. If page is empty,
      * pin vmbuffer to set all_frozen bit
      */
     if ((options & HEAP_INSERT_FROZEN) &&
         (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
         visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);

so is it actually possible to get into the (all_frozen_set) without
holding a pin on the visibilitymap? I haven't investigated this so
maybe there are other ways to get into that code block. But the new
additions to hio.c get the pin too.


4) In heap_xlog_multi_insert we now have this:

     if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
         PageClearAllVisible(page);
     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
         PageSetAllVisible(page);

IIUC it makes no sense to have both flags at the same time, right? So
what about adding

     Assert(!(XLH_INSERT_ALL_FROZEN_SET && XLH_INSERT_ALL_VISIBLE_CLEARED));

to check that?


5) Not sure we need to explicitly say this is for COPY FREE in all the
blocks added to hio.c. IMO it's sufficient to use HEAP_INSERT_FROZEN in
the condition, at this level of abstraction.


I wonder what to do about the heap_insert - I know there are concerns it
would negatively impact regular insert, but is it really true? I suppose
this optimization would be valuable even for cases where multi-insert is
not possible.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Next
From: Ashutosh Sharma
Date:
Subject: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8