Re: Performance degradation of REFRESH MATERIALIZED VIEW - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Performance degradation of REFRESH MATERIALIZED VIEW |
Date | |
Msg-id | CAD21AoCQfKv9miiiFwVNwvtm8gZ2koSis5OoCEysukmroNFBVg@mail.gmail.com Whole thread Raw |
Responses |
Re: Performance degradation of REFRESH MATERIALIZED VIEW
|
List | pgsql-hackers |
. On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > While discussing freezing tuples during CTAS[1], we found that > heap_insert() with HEAP_INSERT_FROZEN brings performance degradation. > For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS, > it took 12 sec whereas the code without the patch took 10 sec with the > following query: > > create table t1 (a, b, c, d) as select i,i,i,i from > generate_series(1,20000000) i; > > I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the > following queries: > > create table source as select generate_series(1, 50000000); > create materialized view mv as select * from source; > refresh materialized view mv; > > The execution time of REFRESH MATERIALIZED VIEW are: > > w/ HEAP_INSERT_FROZEN flag : 42 sec > w/o HEAP_INSERT_FROZEN flag : 33 sec > > After investigation, I found that such performance degradation happens > on only HEAD code. It seems to me that commit 39b66a91b (and > 7db0cd2145) is relevant that has heap_insert() set VM bits and > PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas > Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the > page when inserting a tuple for the first time on the page (around > L2133 in heapam.c), every subsequent heap_insert() on the page reads > and pins a VM buffer (see RelationGetBufferForTuple()). IIUC RelationGetBufferForTuple() pins vm buffer if the page is all-visible since the caller might clear vm bit during operation. But it's not necessarily true in HEAP_FROZEN_INSERT case. When inserting HEAP_FROZEN_INSERT, we might set PD_ALL_VISIBLE flag and all-visible bit but never clear those flag and bit during insertion. Therefore to fix this issue, I think we can have RelationGetBufferForTuple() not to pin vm buffer if we're inserting a frozen tuple (i.g., HEAP_FROZEN_INSERT case) and the target page is already all-visible. In HEAP_FROZEN_INSERT, the cases where we need to pin vm buffer would be the table is empty. That way, we will pin vm buffer only for the first time of inserting frozen tuple into the empty page, then set PD_ALL_VISIBLE to the page and all-frozen bit on vm. Also set XLH_INSERT_ALL_FROZEN_SET to WAL. At further insertions, we would not pin vm buffer as long as we’re inserting a frozen tuple into the same page. If the target page is neither empty nor all-visible we will not pin vm buffer, which is fine because if the page has non-frozen tuple we cannot set bit on vm during heap_insert(). If all tuples on the page are already frozen but PD_ALL_VISIBLE is not set for some reason, we would be able to set all-frozen bit on vm but it seems not a good idea since it requires checking during insertion if all existing tuples are frozen or not. The attached patch does the above idea. With this patch, the same performance tests took 33 sec. Also, I've measured the number of page read during REFRESH MATERIALIZED VIEW using by pg_stat_statements. There were big different on shared_blks_hit on pg_stat_statements: 1. w/ HEAP_INSERT_FROZEN flag (HEAD) : 50221781 2. w/ HEAP_INSERT_FROZEN flag (HEAD) : 221782 3. Patched: 443014 Since the 'source' table has 50000000 and each heap_insert() read vm buffer, test 1 read pages as many as the number of insertion tuples. The value of test 3 is about twice as much as the one of test 2. This is because heap_insert() read the vm buffer for each first insertion to the page. The table has 221239 blocks. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachment
pgsql-hackers by date: