Re: Performance degradation of REFRESH MATERIALIZED VIEW - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Performance degradation of REFRESH MATERIALIZED VIEW
Date
Msg-id CAD21AoBQYoKQpq2kdZhsjOio2rzP3ubc=B2reS47yXuZNM4itw@mail.gmail.com
Whole thread Raw
In response to Re: Performance degradation of REFRESH MATERIALIZED VIEW  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Performance degradation of REFRESH MATERIALIZED VIEW  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Tue, Apr 20, 2021 at 11:04 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Apr 19, 2021 at 7:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I’ve updated the patch including the above comment.
>
> Thanks for the patch.
>
> I was trying to understand below statements:
> +                 * we check without a buffer lock if the page is empty but the
> +                 * caller doesn't need to recheck that since we assume that in
> +                 * HEAP_INSERT_FROZEN case, only one process is inserting a
> +                 * frozen tuple into this relation.
> +                 *
>
> And earlier comments from upthread:
>
> >> AFAICS the page is always empty when RelationGetBufferForTuple
> >> returned a valid vmbuffer.  So the "if" should be an "assert" instead.
>
> > There is a chance that RelationGetBufferForTuple() returns a valid
> > vmbuffer but the page is not empty, since RelationGetBufferForTuple()
> > checks without a lock if the page is empty. But when it comes to
> > HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now
> > since only one process inserts tuples into the relation. Will fix."
>
> I'm not sure whether it is safe to assume "at least for now since only
> one process inserts tuples into the relation". What if we allow
> parallel inserts for HEAP_INSERT_FROZEN cases, I don't know whether we
> can do that or not. Correct me if I'm wrong.

I think if my assumption is wrong or we allow parallel insert for
HEAP_INSERT_FROZEN cases in the future, we need to deal with the case
where frozen tuples are concurrently inserted into the same page. For
example, we can release vmbuffer when we see the page is no longer
empty, or we can return a valid buffer but require the caller to
re-check if the page is still empty. The previous version patch took
the former approach. More concretely, heap_insert() rechecked if the
page is still empty in HEAP_INSERT_FROZEN case and set all_frozen_set
if so. But AFAICS concurrently inserting frozen tuples into the same
page doesn’t happen for now (COPY FREEZE and REFRESH MATERIALIZED VIEW
are users of HEAP_INSERT_FROZEN), also pointed out by Horiguchi-san.
So I added comments and assertions rather than addressing the case
that never happens with the current code. If concurrently inserting
frozen tuples into the same page happens, we should get the assertion
failure that I added in RelationGetBufferForTuple().

>
> While we are modifying something in heap_insert:
> 1) Can we adjust the comment below in heap_insert to the 80char limit?
>       * If we're inserting frozen entry into an empty page,
>       * set visibility map bits and PageAllVisible() hint.
> 2) I'm thinking whether we can do page = BufferGetPage(buffer); after
> RelationGetBufferForTuple and use in all the places where currently
> BufferGetPage(buffer) is being used:
> if (PageIsAllVisible(BufferGetPage(buffer)),
> PageClearAllVisible(BufferGetPage(buffer)); and we could even remove
> the local variable page of if (RelationNeedsWAL(relation)).
> 3) We could as well get the block number once and use it in all the
> places in heap_insert, thus we can remove extra calls of
> BufferGetBlockNumber(buffer).

All points are reasonable to me. I'll incorporate them in the next version.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Table refer leak in logical replication
Next
From: Pavel Stehule
Date:
Subject: Re: 2 questions about volatile attribute of pg_proc.