Re: Performance degradation of REFRESH MATERIALIZED VIEW - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Performance degradation of REFRESH MATERIALIZED VIEW |
Date | |
Msg-id | CAD21AoCnp2dVVFiVvhypDXus4hJhCbOyydFj9PWc+NgctqLieQ@mail.gmail.com |
In response to | Re: Performance degradation of REFRESH MATERIALIZED VIEW (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses | Re: Performance degradation of REFRESH MATERIALIZED VIEW |
List | pgsql-hackers |
On Fri, Apr 16, 2021 at 12:16 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Mon, 12 Apr 2021 15:20:41 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> >
> > On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > While discussing freezing tuples during CTAS[1], we found that
> > > heap_insert() with HEAP_INSERT_FROZEN brings a performance degradation.
> > > For instance, with Paul's patch that sets HEAP_INSERT_FROZEN for CTAS,
> > > it took 12 sec whereas the code without the patch took 10 sec with the
> > > following query:
> > >
> > > create table t1 (a, b, c, d) as select i,i,i,i from
> > > generate_series(1,20000000) i;
> > >
> > > I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
> > > following queries:
> > >
> > > create table source as select generate_series(1, 50000000);
> > > create materialized view mv as select * from source;
> > > refresh materialized view mv;
> > >
> > > The execution times of REFRESH MATERIALIZED VIEW are:
> > >
> > > w/ HEAP_INSERT_FROZEN flag : 42 sec
> > > w/o HEAP_INSERT_FROZEN flag : 33 sec
> > >
> > > After investigation, I found that this performance degradation happens
> > > only on HEAD. It seems to me that commit 39b66a91b (and 7db0cd2145) is
> > > relevant, which has heap_insert() set VM bits and PD_ALL_VISIBLE if
> > > HEAP_INSERT_FROZEN is specified (so CCing Tomas Vondra and authors).
> > > Since heap_insert() sets PD_ALL_VISIBLE on the page when inserting a
> > > tuple for the first time on the page (around L2133 in heapam.c), every
> > > subsequent heap_insert() on the page reads and pins a VM buffer (see
> > > RelationGetBufferForTuple()).
> >
> > IIUC RelationGetBufferForTuple() pins the vm buffer if the page is
> > all-visible, since the caller might clear the vm bit during the
> > operation. But that's not necessarily true in the HEAP_INSERT_FROZEN
> > case. When inserting with HEAP_INSERT_FROZEN, we might set the
> > PD_ALL_VISIBLE flag and the all-visible bit, but we never clear that
> > flag and bit during insertion. Therefore, to fix this issue, I think
> > we can have RelationGetBufferForTuple() not pin the vm buffer if we're
> > inserting a frozen tuple (i.e., the HEAP_INSERT_FROZEN case) and the
> > target page is already all-visible.
>
> It seems right to me.
>
> > In the HEAP_INSERT_FROZEN case, the only time we need to pin the vm
> > buffer is when the table is empty. That way, we will pin the vm buffer
> > only the first time we insert a frozen tuple into an empty page, then
> > set PD_ALL_VISIBLE on the page and the all-frozen bit on the vm, and
> > also set XLH_INSERT_ALL_FROZEN_SET in the WAL record. For further
> > insertions, we would not pin the vm buffer as long as we're inserting
> > frozen tuples into the same page.
> >
> > If the target page is neither empty nor all-visible, we will not pin
> > the vm buffer, which is fine because if the page has a non-frozen tuple
> > we cannot set the bit on the vm during heap_insert(). If all tuples on
> > the page are already frozen but PD_ALL_VISIBLE is not set for some
> > reason, we would be able to set the all-frozen bit on the vm, but that
> > seems not a good idea since it requires checking during insertion
> > whether all existing tuples are frozen or not.
> >
> > The attached patch does the above idea. With this patch, the same
> > performance tests took 33 sec.

Thank you for the comments.
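To make the idea above concrete, here is a rough sketch of the check (not the attached patch; the helper name needs_vm_pin and its placement are invented here for illustration, and in the patch the logic lives inside RelationGetBufferForTuple() itself):

/* Sketch only -- assumes access/heapam.h (HEAP_INSERT_FROZEN) and
 * storage/bufpage.h (PageIsAllVisible, PageGetMaxOffsetNumber). */
static bool
needs_vm_pin(Page page, int options)
{
    /*
     * Normal insert: inserting will clear the all-visible bit, so the vm
     * buffer must be pinned whenever the target page is all-visible.
     */
    if ((options & HEAP_INSERT_FROZEN) == 0)
        return PageIsAllVisible(page);

    /*
     * Frozen insert: we only ever set PD_ALL_VISIBLE and the all-frozen
     * bit, never clear them, so the vm buffer is needed only for an empty
     * page, where those bits may get set for the first time.
     */
    return PageGetMaxOffsetNumber(page) == 0;
}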
> Great! The direction of the patch looks fine to me.
>
> + * If we're inserting frozen entry into empty page, we will set
> + * all-visible to page and all-frozen on visibility map.
> + */
> + if (PageGetMaxOffsetNumber(page) == 0)
> all_frozen_set = true;
>
> AFAICS the page is always empty when RelationGetBufferForTuple
> returned a valid vmbuffer. So the "if" should be an "assert" instead.

There is a chance that RelationGetBufferForTuple() returns a valid
vmbuffer even though the page is not empty, since
RelationGetBufferForTuple() checks whether the page is empty without
holding a lock. But in the HEAP_INSERT_FROZEN case that doesn't actually
happen, at least for now, since only one process inserts tuples into the
relation. Will fix.

> And, the patch changes the value of all_frozen_set to false when the
> page was already all-frozen (thus not empty). It would be fine since
> we don't need to change the visibility of the page in that case, but
> the variable name is no longer correct. set_all_visible or such?

It seems to me that the variable name all_frozen_set corresponds to
XLH_INSERT_ALL_FROZEN_SET, but I see your point. How about
set_all_frozen instead, since we set the all-frozen bit (which also
implies setting all-visible)?

BTW, I found the following description of XLH_INSERT_ALL_FROZEN_SET, but
there is no all_visible_set anywhere:

/* all_frozen_set always implies all_visible_set */
#define XLH_INSERT_ALL_FROZEN_SET (1<<5)

I'll update those comments as well.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/