Re: Performance degradation of REFRESH MATERIALIZED VIEW - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Performance degradation of REFRESH MATERIALIZED VIEW
Date
Msg-id CAD21AoCnp2dVVFiVvhypDXus4hJhCbOyydFj9PWc+NgctqLieQ@mail.gmail.com
Whole thread Raw
In response to Re: Performance degradation of REFRESH MATERIALIZED VIEW  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Performance degradation of REFRESH MATERIALIZED VIEW  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Fri, Apr 16, 2021 at 12:16 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 12 Apr 2021 15:20:41 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> > .
> >
> > On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > While discussing freezing tuples during CTAS[1], we found that
> > > heap_insert() with HEAP_INSERT_FROZEN brings performance degradation.
> > > For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS,
> > > it took 12 sec whereas the code without the patch took 10 sec with the
> > > following query:
> > >
> > > create table t1 (a, b, c, d) as select i,i,i,i from
> > > generate_series(1,20000000) i;
> > >
> > > I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
> > > following queries:
> > >
> > > create table source as select generate_series(1, 50000000);
> > > create materialized view mv as select * from source;
> > > refresh materialized view mv;
> > >
> > > The execution time of REFRESH MATERIALIZED VIEW are:
> > >
> > > w/ HEAP_INSERT_FROZEN flag : 42 sec
> > > w/o HEAP_INSERT_FROZEN flag : 33 sec
> > >
> > > After investigation, I found that such performance degradation happens
> > > on only HEAD code. It seems to me that commit 39b66a91b (and
> > > 7db0cd2145) is relevant that has heap_insert() set VM bits and
> > > PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas
> > > Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the
> > > page when inserting a tuple for the first time on the page (around
> > > L2133 in heapam.c), every subsequent heap_insert() on the page reads
> > > and pins a VM buffer (see RelationGetBufferForTuple()).
> >
> > IIUC RelationGetBufferForTuple() pins vm buffer if the page is
> > all-visible since the caller might clear vm bit during operation. But
> > it's not necessarily true in HEAP_FROZEN_INSERT case. When inserting
> > HEAP_FROZEN_INSERT, we might set PD_ALL_VISIBLE flag and all-visible
> > bit but never clear those flag and bit during insertion. Therefore to
> > fix this issue, I think we can have RelationGetBufferForTuple() not to
> > pin vm buffer if we're inserting a frozen tuple (i.g.,
> > HEAP_FROZEN_INSERT case) and the target page is already all-visible.
>
> It seems right to me.
>
> > In HEAP_FROZEN_INSERT, the cases where we need to pin vm buffer would
> > be the table is empty. That way, we will pin vm buffer only for the
> > first time of inserting frozen tuple into the empty page, then set
> > PD_ALL_VISIBLE to the page and all-frozen bit on vm. Also set
> > XLH_INSERT_ALL_FROZEN_SET to WAL. At further insertions, we would not
> > pin vm buffer as long as we’re inserting a frozen tuple into the same
> > page.
> >
> > If the target page is neither empty nor all-visible we will not pin vm
> > buffer, which is fine because if the page has non-frozen tuple we
> > cannot set bit on vm during heap_insert(). If all tuples on the page
> > are already frozen but PD_ALL_VISIBLE is not set for some reason, we
> > would be able to set all-frozen bit on vm but it seems not a good idea
> > since it requires checking during insertion if all existing tuples are
> > frozen or not.
> >
> > The attached patch does the above idea. With this patch, the same
> > performance tests took 33 sec.

Thank you for the comments.

>
> Great! The direction of the patch looks fine to me.
>
> +                * If we're inserting frozen entry into empty page, we will set
> +                * all-visible to page and all-frozen on visibility map.
> +                */
> +               if (PageGetMaxOffsetNumber(page) == 0)
>                         all_frozen_set = true;
>
> AFAICS the page is always empty when RelationGetBufferForTuple
> returned a valid vmbuffer.  So the "if" should be an "assert" instead.

There is a chance that RelationGetBufferForTuple() returns a valid
vmbuffer but the page is not empty, since RelationGetBufferForTuple()
checks without a lock if the page is empty. But when it comes to
HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now
since only one process inserts tuples into the relation. Will fix.

>
> And, the patch changes the value of all_frozen_set to false when the
> page was already all-frozen (thus not empty). It would be fine since
> we don't need to change the visibility of the page in that case but
> the variable name is no longer correct.  set_all_visible or such?

It seems to me that the variable name all_frozen_set corresponds to
XLH_INSERT_ALL_FROZEN_SET but I see your point. How about
set_all_frozen instead since we set all-frozen bits (also implying
setting all-visible)?

BTW I found the following description of XLH_INSERT_ALL_FROZEN_SET but
there is no all_visible_set anywhere:

/* all_frozen_set always implies all_visible_set */
#define XLH_INSERT_ALL_FROZEN_SET               (1<<5)

I'll update those comments as well.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Replication slot stats misgivings
Next
From: Bharath Rupireddy
Date:
Subject: Re: New Table Access Methods for Multi and Single Inserts