Thread: Re: Performance degradation of REFRESH MATERIALIZED VIEW
On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> While discussing freezing tuples during CTAS[1], we found that
> heap_insert() with HEAP_INSERT_FROZEN brings performance degradation.
> For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS,
> it took 12 sec whereas the code without the patch took 10 sec with the
> following query:
>
> create table t1 (a, b, c, d) as select i,i,i,i from
> generate_series(1,20000000) i;
>
> I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
> following queries:
>
> create table source as select generate_series(1, 50000000);
> create materialized view mv as select * from source;
> refresh materialized view mv;
>
> The execution times of REFRESH MATERIALIZED VIEW are:
>
> w/ HEAP_INSERT_FROZEN flag : 42 sec
> w/o HEAP_INSERT_FROZEN flag : 33 sec
>
> After investigation, I found that this performance degradation happens
> only on HEAD. It seems to me that commit 39b66a91b (and 7db0cd2145) is
> relevant, which has heap_insert() set VM bits and PD_ALL_VISIBLE if
> HEAP_INSERT_FROZEN is specified (so CCing Tomas Vondra and the
> authors). Since heap_insert() sets PD_ALL_VISIBLE on the page when
> inserting a tuple for the first time on the page (around L2133 in
> heapam.c), every subsequent heap_insert() on the page reads and pins a
> VM buffer (see RelationGetBufferForTuple()).

IIUC RelationGetBufferForTuple() pins the vm buffer if the page is
all-visible, since the caller might clear the vm bit during the
operation. But that is not necessarily true in the HEAP_INSERT_FROZEN
case: when inserting with HEAP_INSERT_FROZEN we might set the
PD_ALL_VISIBLE flag and the all-visible bit, but we never clear them
during insertion. Therefore, to fix this issue, I think we can have
RelationGetBufferForTuple() not pin the vm buffer if we're inserting a
frozen tuple (i.e., the HEAP_INSERT_FROZEN case) and the target page is
already all-visible.

With HEAP_INSERT_FROZEN, the only case where we need to pin the vm
buffer is when the target page is empty. That way, we pin the vm buffer
only the first time we insert a frozen tuple into an empty page, then
set PD_ALL_VISIBLE on the page and the all-frozen bit on the vm, and
also set XLH_INSERT_ALL_FROZEN_SET in the WAL record. For further
insertions we do not pin the vm buffer as long as we're inserting
frozen tuples into the same page.

If the target page is neither empty nor all-visible we do not pin the
vm buffer, which is fine because if the page has a non-frozen tuple we
cannot set the bit on the vm during heap_insert(). If all tuples on the
page are already frozen but PD_ALL_VISIBLE is not set for some reason,
we would in principle be able to set the all-frozen bit on the vm, but
it doesn't seem like a good idea since it requires checking during
insertion whether all existing tuples are frozen.

The attached patch implements the above idea. With this patch, the same
performance test took 33 sec.

Also, I've measured the number of page reads during REFRESH MATERIALIZED
VIEW using pg_stat_statements. There was a big difference in
shared_blks_hit:

1. w/ HEAP_INSERT_FROZEN flag (HEAD) : 50221781
2. w/o HEAP_INSERT_FROZEN flag (HEAD) : 221782
3. Patched : 443014

Since the 'source' table has 50000000 tuples and each heap_insert()
reads the vm buffer, test 1 reads about as many pages as the number of
inserted tuples. The value of test 3 is about twice that of test 2;
this is because heap_insert() reads the vm buffer for the first
insertion into each page. The table has 221239 blocks.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
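The check being proposed can be sketched as follows (an illustration of
the idea described above, not the attached patch; the option and
variable names follow the existing RelationGetBufferForTuple() code in
hio.c):

    /*
     * Illustrative sketch, not the attached patch: pin the visibility map
     * page only when the caller may need to update it.  For a frozen
     * insertion that is only when the target page is still empty; if the
     * page is already all-visible, the frozen insertion never clears the
     * bit, so the vm buffer is not needed.
     */
    if (options & HEAP_INSERT_FROZEN)
    {
        if (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)
            visibilitymap_pin(relation, targetBlock, vmbuffer);
    }
    else if (PageIsAllVisible(BufferGetPage(buffer)))
        visibilitymap_pin(relation, targetBlock, vmbuffer);

Under this sketch a frozen bulk load pins the vm page once per newly
used heap page, which matches the shared_blks_hit numbers reported
above for the patched case.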
At Mon, 12 Apr 2021 15:20:41 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> IIUC RelationGetBufferForTuple() pins the vm buffer if the page is
> all-visible, since the caller might clear the vm bit during the
> operation. But that is not necessarily true in the HEAP_INSERT_FROZEN
> case: when inserting with HEAP_INSERT_FROZEN we might set the
> PD_ALL_VISIBLE flag and the all-visible bit, but we never clear them
> during insertion. Therefore, to fix this issue, I think we can have
> RelationGetBufferForTuple() not pin the vm buffer if we're inserting a
> frozen tuple (i.e., the HEAP_INSERT_FROZEN case) and the target page is
> already all-visible.

It seems right to me.

> The attached patch implements the above idea. With this patch, the same
> performance test took 33 sec.

Great! The direction of the patch looks fine to me.

+        * If we're inserting frozen entry into empty page, we will set
+        * all-visible to page and all-frozen on visibility map.
+        */
+       if (PageGetMaxOffsetNumber(page) == 0)
            all_frozen_set = true;

AFAICS the page is always empty when RelationGetBufferForTuple
returned a valid vmbuffer. So the "if" should be an "assert" instead.

And, the patch changes the value of all_frozen_set to false when the
page was already all-frozen (thus not empty). It would be fine since we
don't need to change the visibility of the page in that case, but the
variable name is no longer correct. set_all_visible or such?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Apr 16, 2021 at 12:16 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > The attached patch implements the above idea. With this patch, the same
> > performance test took 33 sec.

Thank you for the comments.

> Great! The direction of the patch looks fine to me.
>
> +        * If we're inserting frozen entry into empty page, we will set
> +        * all-visible to page and all-frozen on visibility map.
> +        */
> +       if (PageGetMaxOffsetNumber(page) == 0)
>             all_frozen_set = true;
>
> AFAICS the page is always empty when RelationGetBufferForTuple
> returned a valid vmbuffer. So the "if" should be an "assert" instead.

There is a chance that RelationGetBufferForTuple() returns a valid
vmbuffer but the page is not empty, since RelationGetBufferForTuple()
checks without a lock whether the page is empty. But in the
HEAP_INSERT_FROZEN case that doesn't actually happen, at least for now,
since only one process inserts tuples into the relation. Will fix.

> And, the patch changes the value of all_frozen_set to false when the
> page was already all-frozen (thus not empty). It would be fine since we
> don't need to change the visibility of the page in that case, but the
> variable name is no longer correct. set_all_visible or such?

It seems to me that the variable name all_frozen_set corresponds to
XLH_INSERT_ALL_FROZEN_SET, but I see your point. How about
set_all_frozen instead, since we set the all-frozen bit (which also
implies setting all-visible)?

BTW I found the following description of XLH_INSERT_ALL_FROZEN_SET, but
there is no all_visible_set anywhere:

/* all_frozen_set always implies all_visible_set */
#define XLH_INSERT_ALL_FROZEN_SET       (1<<5)

I'll update those comments as well.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
At Mon, 19 Apr 2021 13:32:31 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> There is a chance that RelationGetBufferForTuple() returns a valid
> vmbuffer but the page is not empty, since RelationGetBufferForTuple()
> checks without a lock whether the page is empty. But in the
> HEAP_INSERT_FROZEN case that doesn't actually happen, at least for now,
> since only one process inserts tuples into the relation. Will fix.

Yes. It seems to me that it is cleaner that RelationGetBufferForTuple
returns a vmbuffer only when the caller needs to change vm state.
Thanks.

> It seems to me that the variable name all_frozen_set corresponds to
> XLH_INSERT_ALL_FROZEN_SET, but I see your point. How about
> set_all_frozen instead, since we set the all-frozen bit (which also
> implies setting all-visible)?

Right. However, "if (set_all_frozen) then set all_visible" looks like a
bug^^;. all_frozen_set looks better in that context than
set_all_frozen. So I withdraw the comment.

> BTW I found the following description of XLH_INSERT_ALL_FROZEN_SET, but
> there is no all_visible_set anywhere:
>
> /* all_frozen_set always implies all_visible_set */
> #define XLH_INSERT_ALL_FROZEN_SET       (1<<5)
>
> I'll update those comments as well.

FWIW, it seems like shorthand for "ALL_FROZEN_SET implies ALL_VISIBLE
being set together". The current comment works for me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Apr 19, 2021 at 5:04 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> FWIW, it seems like shorthand for "ALL_FROZEN_SET implies ALL_VISIBLE
> being set together". The current comment works for me.

Okay, I've updated the patch accordingly. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Apr 19, 2021 at 1:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Okay, I've updated the patch accordingly. Please review it.

I was reading the patch and just found some typos: it should be "a
frozen tuple", not "an frozen tuple".

+    * Also pin visibility map page if we're inserting an frozen tuple into

+    * If we're inserting an frozen entry into empty page, pin the

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 19, 2021 at 8:04 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I was reading the patch and just found some typos: it should be "a
> frozen tuple", not "an frozen tuple".
>
> +    * Also pin visibility map page if we're inserting an frozen tuple into
> +    * If we're inserting an frozen entry into empty page, pin the

Thank you for the comment. I’ve updated the patch including the above
comment.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Apr 19, 2021 at 7:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I’ve updated the patch including the above comment.

Thanks for the patch.

I was trying to understand the statements below:

+    * we check without a buffer lock if the page is empty but the
+    * caller doesn't need to recheck that since we assume that in
+    * HEAP_INSERT_FROZEN case, only one process is inserting a
+    * frozen tuple into this relation.
+    *

And the earlier comments from upthread:

>> AFAICS the page is always empty when RelationGetBufferForTuple
>> returned a valid vmbuffer. So the "if" should be an "assert" instead.

> There is a chance that RelationGetBufferForTuple() returns a valid
> vmbuffer but the page is not empty, since RelationGetBufferForTuple()
> checks without a lock whether the page is empty. But in the
> HEAP_INSERT_FROZEN case that doesn't actually happen, at least for now,
> since only one process inserts tuples into the relation. Will fix.

I'm not sure whether it is safe to assume "at least for now, since only
one process inserts tuples into the relation". What if we allow
parallel inserts for HEAP_INSERT_FROZEN cases? I don't know whether we
can do that or not; correct me if I'm wrong.

While we are modifying something in heap_insert:

1) Can we adjust the comment below in heap_insert to the 80-char limit?
    * If we're inserting frozen entry into an empty page,
    * set visibility map bits and PageAllVisible() hint.
2) I'm wondering whether we can do page = BufferGetPage(buffer); after
RelationGetBufferForTuple and use it in all the places where
BufferGetPage(buffer) is currently used, e.g.
if (PageIsAllVisible(BufferGetPage(buffer))) and
PageClearAllVisible(BufferGetPage(buffer)); we could even remove the
local variable page of if (RelationNeedsWAL(relation)).
3) We could as well get the block number once and use it in all the
places in heap_insert, thus removing extra calls of
BufferGetBlockNumber(buffer).

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 20, 2021 at 11:04 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm not sure whether it is safe to assume "at least for now, since only
> one process inserts tuples into the relation". What if we allow
> parallel inserts for HEAP_INSERT_FROZEN cases? I don't know whether we
> can do that or not; correct me if I'm wrong.

I think if my assumption is wrong, or if we allow parallel insert for
HEAP_INSERT_FROZEN cases in the future, we need to deal with the case
where frozen tuples are concurrently inserted into the same page. For
example, we can release the vmbuffer when we see the page is no longer
empty, or we can return a valid buffer but require the caller to
re-check whether the page is still empty. The previous version of the
patch took the former approach; more concretely, heap_insert()
rechecked whether the page was still empty in the HEAP_INSERT_FROZEN
case and set all_frozen_set if so. But AFAICS concurrently inserting
frozen tuples into the same page doesn’t happen for now (COPY FREEZE
and REFRESH MATERIALIZED VIEW are the users of HEAP_INSERT_FROZEN), as
also pointed out by Horiguchi-san. So I added comments and assertions
rather than addressing a case that never happens with the current code.
If concurrently inserting frozen tuples into the same page ever
happens, we should get the assertion failure that I added in
RelationGetBufferForTuple().

> While we are modifying something in heap_insert:
> 1) Can we adjust the comment below in heap_insert to the 80-char limit?
> 2) I'm wondering whether we can do page = BufferGetPage(buffer); after
> RelationGetBufferForTuple and use it in all the places where
> BufferGetPage(buffer) is currently used [...]
> 3) We could as well get the block number once and use it in all the
> places in heap_insert, thus removing extra calls of
> BufferGetBlockNumber(buffer).

All points are reasonable to me. I'll incorporate them in the next
version.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Apr 20, 2021 at 11:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think if my assumption is wrong, or if we allow parallel insert for
> HEAP_INSERT_FROZEN cases in the future, we need to deal with the case
> where frozen tuples are concurrently inserted into the same page.
> [...]
> If concurrently inserting frozen tuples into the same page ever
> happens, we should get the assertion failure that I added in
> RelationGetBufferForTuple().

Upon thinking further, concurrent insertions into the same page are not
possible while we are in heap_insert between RelationGetBufferForTuple
and UnlockReleaseBuffer(buffer). RelationGetBufferForTuple locks the
buffer in exclusive mode, see LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE)
and the comment "* Returns pinned and exclusive-locked buffer of a page
in given relation". Even if parallel insertions are allowed in
HEAP_INSERT_FROZEN cases, each worker will separately acquire pages,
insert into them, and skip getting the visibility map page pin if the
page was set all-visible by another worker.

Some more comments on the v3 patch:

1) Isn't it good to specify here what we gain by avoiding pinning the
visibility map page, something like: gain a few seconds / avoid extra
function calls / or some other better wording?

+        * If the page already is non-empty and all-visible, we skip to
+        * pin on a visibility map buffer since we never clear and set
+        * all-frozen bit on visibility map during inserting a frozen
+        * tuple.
+        */

2) Isn't it better to put PageIsAllVisible(BufferGetPage(buffer)) in
the if clause instead of the else-if clause? Since that branch is going
to be hit most of the time, we could avoid the page-empty check on
every call.

+       if (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)
+           visibilitymap_pin(relation, targetBlock, vmbuffer);
+       else if (PageIsAllVisible(BufferGetPage(buffer)))
+           skip_vmbuffer = true;

3) I found another typo in v3 - it is "will set", not "will sets":

+ * In HEAP_INSERT_FROZEN cases, we handle the possibility that the caller will
+ * sets all-frozen bit on the visibility map page. We pin on the visibility

4) I think a commit message can be added to the upcoming patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,

I took a look at this today, as I committed 39b66a91b back in January.
I can reproduce the issue: with just 1M rows, the before/after timings
are roughly 480ms and 620ms on my hardware.

Unfortunately, the v3 patch does not really fix the issue for me. The
timing with it applied is ~610ms, so the improvement is only minimal.

I'm not sure what to do about this :-( I don't have any ideas about how
to eliminate this overhead, so the only option I see is reverting the
changes in heap_insert. Unfortunately, that'd mean inserts into TOAST
tables won't be frozen ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2021-04-26 15:31:02 +0200, Tomas Vondra wrote:
> I'm not sure what to do about this :-( I don't have any ideas about how
> to eliminate this overhead, so the only option I see is reverting the
> changes in heap_insert. Unfortunately, that'd mean inserts into TOAST
> tables won't be frozen ...

ISTM that the fundamental issue here is not that we acquire pins that
we shouldn't, but that we do so at a much higher frequency than needed.

It's probably too invasive for 14, but I think it might be worth
exploring passing down a BulkInsertState in nodeModifyTable.c's
table_tuple_insert() iff the input will be more than one row.

And then add the vm buffer of the target page to BulkInsertState, so
that hio.c can avoid re-pinning the buffer.

Greetings,

Andres Freund
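In code terms, the suggestion amounts to something like the following
(an illustrative sketch, not the patch posted later in the thread; the
current_vmbuf field is a hypothetical addition):

    /* heapam.h sketch: remember the pinned vm page across insertions */
    typedef struct BulkInsertStateData
    {
        BufferAccessStrategy strategy;  /* our BULKWRITE strategy object */
        Buffer      current_buf;        /* current insertion target page */
        Buffer      current_vmbuf;      /* hypothetical: vm page pinned for
                                         * current_buf, or InvalidBuffer */
    } BulkInsertStateData;

hio.c could then check whether the already-pinned vm page still covers
the target heap block before pinning a new one, instead of starting
from InvalidBuffer on every call.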
On 4/26/21 9:27 PM, Andres Freund wrote:
> ISTM that the fundamental issue here is not that we acquire pins that
> we shouldn't, but that we do so at a much higher frequency than needed.
>
> It's probably too invasive for 14, but I think it might be worth
> exploring passing down a BulkInsertState in nodeModifyTable.c's
> table_tuple_insert() iff the input will be more than one row.
>
> And then add the vm buffer of the target page to BulkInsertState, so
> that hio.c can avoid re-pinning the buffer.

Yeah. The question still is what to do about 14, though. Shall we leave
the code as it is now, or should we change it somehow? It seems a bit
unfortunate that a COPY FREEZE optimization should negatively influence
other (more) common use cases, so I guess we can't just keep the
current code ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2021-04-26 23:59:17 +0200, Tomas Vondra wrote:
> Yeah. The question still is what to do about 14, though. Shall we leave
> the code as it is now, or should we change it somehow? It seems a bit
> unfortunate that a COPY FREEZE optimization should negatively influence
> other (more) common use cases, so I guess we can't just keep the
> current code ...

I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c
and see whether that fixes the regression. If it does, then we can
analyze whether that's possibly the best way forward. Or whether we
revert, live with the regression or find yet another path.

Greetings,

Andres Freund
On Mon, Apr 26, 2021 at 10:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I took a look at this today, as I committed 39b66a91b back in January.
> I can reproduce the issue: with just 1M rows, the before/after timings
> are roughly 480ms and 620ms on my hardware.
>
> Unfortunately, the v3 patch does not really fix the issue for me. The
> timing with it applied is ~610ms, so the improvement is only minimal.

Since reading the vmbuffer is likely to hit the shared buffers while
inserting frozen tuples, I think the improvement would not be visible
with a few million tuples, depending on hardware. But it might not be
as fast as before commit 39b66a91b, since we read the vmbuffer at least
once per insertion.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Apr 27, 2021 at 8:07 AM Andres Freund <andres@anarazel.de> wrote:
> I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c
> and see whether that fixes the regression.

Is this idea to have RelationGetBufferForTuple() skip re-pinning the
vmbuffer? If so, is this essentially the same as the one in the v3
patch?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On 4/27/21 7:34 AM, Masahiko Sawada wrote:
> On Tue, Apr 27, 2021 at 8:07 AM Andres Freund <andres@anarazel.de> wrote:
>> I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c
>> and see whether that fixes the regression.
>
> Is this idea to have RelationGetBufferForTuple() skip re-pinning the
> vmbuffer? If so, is this essentially the same as the one in the v3
> patch?

I don't think it is the same approach - it's a bit hard to follow what
exactly happens in RelationGetBufferForTuple, but AFAICS it always
starts with vmbuffer = InvalidBuffer, so it may pin the vmbuffer quite
often, no?

What Andres is suggesting (I think) is to modify ExecInsert() to pass a
valid bistate to table_tuple_insert, instead of just NULL, and store
the vmbuffer in it. Not sure how to identify when inserting more than
just a single row, though ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
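For concreteness, the ExecInsert() side of that suggestion might look
roughly like this (a sketch only, assuming a hypothetical bistate field
in ModifyTableState; the current code simply passes NULL):

    /* nodeModifyTable.c, ExecInsert(): hypothetical use of a bulk-insert
     * state instead of the hard-coded NULL; it would have to be freed
     * with FreeBulkInsertState() when the node shuts down. */
    if (mtstate->bistate == NULL)
        mtstate->bistate = GetBulkInsertState();

    table_tuple_insert(resultRelationDesc, slot,
                       estate->es_output_cid,
                       0,                   /* options */
                       mtstate->bistate);   /* was: NULL */

The patches quoted in the next two messages use a GUC-driven tuple
count threshold to decide when to switch to a bulk-insert state, which
is one way to answer the "more than a single row" question.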
On Tue, Apr 27, 2021 at 7:13 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> What Andres is suggesting (I think) is to modify ExecInsert() to pass a
> valid bistate to table_tuple_insert, instead of just NULL, and store
> the vmbuffer in it. Not sure how to identify when inserting more than
> just a single row, though ...

I think the thread "should INSERT SELECT use a BulkInsertState?" [1]
has a simple dynamic mechanism (with a GUC defining the threshold
number of tuples) to switch over to using BulkInsertState. Maybe it's
worth having a look at the patch
0001-INSERT-SELECT-to-use-BulkInsertState-and-multi_i.patch:

+    /* Use bulk insert after a threshold number of tuples */
+    // XXX: maybe this should only be done if it's not a partitioned table or
+    // if the partitions don't support miinfo, which uses its own bistates
+    mtstate->ntuples++;
+    if (mtstate->bistate == NULL &&
+            mtstate->operation == CMD_INSERT &&
+            mtstate->ntuples > bulk_insert_ntuples &&
+            bulk_insert_ntuples >= 0)
+    {
+        elog(DEBUG1, "enabling bulk insert");
+        mtstate->bistate = GetBulkInsertState();
+    }

[1] https://www.postgresql.org/message-id/20210222030158.GS14772%40telsasoft.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 27, 2021 at 03:43:07PM +0200, Tomas Vondra wrote:
> What Andres is suggesting (I think) is to modify ExecInsert() to pass a
> valid bistate to table_tuple_insert, instead of just NULL, and store
> the vmbuffer in it. Not sure how to identify when inserting more than
> just a single row, though ...

Maybe this is relevant.
https://commitfest.postgresql.org/33/2553/
| INSERT SELECT: BulkInsertState and table_multi_insert

The bistate part is small - Simon requested to also use
table_multi_insert, which makes the patch much bigger, and there are
probably lots of restrictions I haven't even thought to check.

This uses a GUC threshold for bulk insert, but I'm still not sure it's
really problematic to use a bistate for a single row.

        /* Use bulk insert after a threshold number of tuples */
        // XXX: maybe this should only be done if it's not a partitioned table or
        // if the partitions don't support miinfo, which uses its own bistates
        mtstate->ntuples++;
        if (mtstate->bistate == NULL &&
                mtstate->ntuples > bulk_insert_ntuples &&
                bulk_insert_ntuples >= 0)
        {
            elog(DEBUG1, "enabling bulk insert");
            mtstate->bistate = GetBulkInsertState();

--
Justin
On Tue, Apr 27, 2021 at 10:43 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I don't think it is the same approach - it's a bit hard to follow what
> exactly happens in RelationGetBufferForTuple, but AFAICS it always
> starts with vmbuffer = InvalidBuffer, so it may pin the vmbuffer quite
> often, no?

With that patch, we pin the vmbuffer only when inserting a frozen tuple
into an empty page. That is, when inserting a frozen tuple into an
empty page, we pin the vmbuffer, and heap_insert() will mark the page
all-visible and set the all-frozen bit on the vm. From the next
insertion (into the same page) until the page gets full, since the page
is already all-visible, we skip pinning the vmbuffer. IOW, if the
target page is not empty but all-visible, we skip pinning the vmbuffer.
We pin the vmbuffer only once per heap page used during insertion.

> What Andres is suggesting (I think) is to modify ExecInsert() to pass a
> valid bistate to table_tuple_insert, instead of just NULL, and store
> the vmbuffer in it.

Understood. This approach keeps using the same vmbuffer until we need
another vm page corresponding to the target heap page, which seems
better.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Apr 28, 2021 at 12:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > What Andres is suggesting (I think) is to modify ExecInsert() to pass a
> > valid bistate to table_tuple_insert, instead of just NULL, and store
> > the vmbuffer in it.
>
> Understood. This approach keeps using the same vmbuffer until we need
> another vm page corresponding to the target heap page, which seems
> better.

But how is ExecInsert() related to REFRESH MATERIALIZED VIEW?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On 4/27/21 5:44 PM, Masahiko Sawada wrote:
> But how is ExecInsert() related to REFRESH MATERIALIZED VIEW?

TBH I haven't looked into the details, but Andres talked about
nodeModifyTable and table_tuple_insert, and ExecInsert is the only
place calling it. But maybe I'm just confused and Andres meant
something else?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2021-04-28 00:44:47 +0900, Masahiko Sawada wrote:
> But how is ExecInsert() related to REFRESH MATERIALIZED VIEW?

I was thinking of the CONCURRENTLY path for REFRESH MATERIALIZED VIEW I
think. Or something.

That actually makes it easier - we already pass in a bistate in the
relevant paths. So if we add a current_vmbuf to BulkInsertStateData, we
can avoid needing to pin so often. It seems that'd end up with a good
bit cleaner and less risky code than the
skip_vmbuffer_for_frozen_tuple_insertion_v3.patch approach.

The current RelationGetBufferForTuple() interface / how it's used in
heapam.c doesn't make this quite as trivial as it could be... Attached
is a quick hack implementing this. For me it reduces the overhead
noticeably:

REFRESH MATERIALIZED VIEW mv;
before:
Time: 26542.333 ms (00:26.542)
after:
Time: 23105.047 ms (00:23.105)

Greetings,

Andres Freund
On 4/27/21 8:22 PM, Andres Freund wrote:
> That actually makes it easier - we already pass in a bistate in the
> relevant paths. So if we add a current_vmbuf to BulkInsertStateData, we
> can avoid needing to pin so often.
>
> The current RelationGetBufferForTuple() interface / how it's used in
> heapam.c doesn't make this quite as trivial as it could be... Attached
> is a quick hack implementing this. For me it reduces the overhead
> noticeably:
>
> REFRESH MATERIALIZED VIEW mv;
> before:
> Time: 26542.333 ms (00:26.542)
> after:
> Time: 23105.047 ms (00:23.105)

Thanks, that looks promising. I repeated the tests I did on 26/4, and
the results look like this:

old (0c7d3bb99): 497ms
master:          621ms
patched:         531ms

So yeah, that's a big improvement - it does not remove the regression
entirely, but +5% is much better than +25%.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote: > Thanks, that looks promising. I repeated the tests I did on 26/4, and the > results look like this: > > old (0c7d3bb99): 497ms > master: 621ms > patched: 531ms > > So yeah, that's a bit improvement - it does not remove the regression > entirely, but +5% is much better than +25%. Hmm. Is that really something we should do after feature freeze? A 25% degradation for matview refresh may be a problem for a lot of users and could be an upgrade stopper. Another thing we could do is also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a proper solution for this performance problem for matviews for 15~. Thoughts? -- Michael
On Tue, May 11, 2021 at 4:37 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote: > > Thanks, that looks promising. I repeated the tests I did on 26/4, and the > > results look like this: > > > > old (0c7d3bb99): 497ms > > master: 621ms > > patched: 531ms > > > > So yeah, that's a bit improvement - it does not remove the regression > > entirely, but +5% is much better than +25%. > > Hmm. Is that really something we should do after feature freeze? A > 25% degradation for matview refresh may be a problem for a lot of > users and could be an upgrade stopper. Another thing we could do is > also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a > proper solution for this performance problem for matviews for 15~. I think the approach proposed by Andres eliminates the extra vmbuffer reads as much as possible. But even with the patch, there still is 5% degradation (and there is no way to disable inserting frozen tuples at matview refresh). Which could be a problem for some users. I think it’s hard to completely eliminate the overhead so we might need to consider another approach like having matview refresh use heap_multi_insert() instead of heap_insert(). I think the changes for heap_multi_insert() are fine so we can revert only heap_insert() part if we revert something from the v14 tree, although we will end up not inserting frozen tuples into toast tables. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
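For reference, switching the refresh path to multi-inserts as suggested above would roughly mean buffering slots in matview.c's transient-relation DestReceiver and flushing them through the table AM's multi-insert entry point, much as COPY does. The sketch below only illustrates that direction; the slot buffer, its size, and the flush helper are assumptions, not existing code.

#include "postgres.h"

#include "access/hio.h"             /* BulkInsertState */
#include "access/tableam.h"         /* table_multi_insert() */
#include "executor/tuptable.h"
#include "utils/rel.h"

#define SKETCH_MAX_BUFFERED_SLOTS 1000  /* illustrative batch size */

/*
 * Stand-in for matview.c's DR_transientrel, extended with a hypothetical
 * slot buffer; only the fields used below are shown.
 */
typedef struct SketchTransientRel
{
    Relation        transientrel;   /* transient heap being filled */
    CommandId       output_cid;
    int             ti_options;     /* includes TABLE_INSERT_FROZEN */
    BulkInsertState bistate;
    TupleTableSlot *slots[SKETCH_MAX_BUFFERED_SLOTS];  /* hypothetical */
    int             nslots;                            /* hypothetical */
} SketchTransientRel;

/* Flush buffered rows with one multi-insert call instead of per-row inserts. */
static void
sketch_transientrel_flush(SketchTransientRel *myState)
{
    if (myState->nslots == 0)
        return;

    table_multi_insert(myState->transientrel,
                       myState->slots, myState->nslots,
                       myState->output_cid, myState->ti_options,
                       myState->bistate);
    myState->nslots = 0;
}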
On Tue, May 11, 2021 at 2:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I think the approach proposed by Andres eliminates the extra vmbuffer > reads as much as possible. But even with the patch, there still is 5% > degradation (and there is no way to disable inserting frozen tuples at > matview refresh). Which could be a problem for some users. I think > it’s hard to completely eliminate the overhead so we might need to > consider another approach like having matview refresh use > heap_multi_insert() instead of heap_insert(). I may not have understood what's being discussed here completely, but if you want to use multi inserts for refresh matview code, maybe the "New Table Access Methods for Multi and Single Inserts" patches at [1], can help. [1] - https://www.postgresql.org/message-id/CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On 5/11/21 12:58 PM, Bharath Rupireddy wrote: > On Tue, May 11, 2021 at 2:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> I think the approach proposed by Andres eliminates the extra vmbuffer >> reads as much as possible. But even with the patch, there still is 5% >> degradation (and there is no way to disable inserting frozen tuples at >> matview refresh). Which could be a problem for some users. I think >> it’s hard to completely eliminate the overhead so we might need to >> consider another approach like having matview refresh use >> heap_multi_insert() instead of heap_insert(). > > I may not have understood what's being discussed here completely, but > if you want to use multi inserts for refresh matview code, maybe the > "New Table Access Methods for Multi and Single Inserts" patches at > [1], can help. > Maybe, but I think the main question is what to do for v14, so the uncommitted patch is kinda irrelevant. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5/11/21 11:04 AM, Masahiko Sawada wrote: > On Tue, May 11, 2021 at 4:37 PM Michael Paquier <michael@paquier.xyz> wrote: >> >> On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote: >>> Thanks, that looks promising. I repeated the tests I did on 26/4, and the >>> results look like this: >>> >>> old (0c7d3bb99): 497ms >>> master: 621ms >>> patched: 531ms >>> >>> So yeah, that's a bit improvement - it does not remove the regression >>> entirely, but +5% is much better than +25%. >> >> Hmm. Is that really something we should do after feature freeze? A >> 25% degradation for matview refresh may be a problem for a lot of >> users and could be an upgrade stopper. Another thing we could do is >> also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a >> proper solution for this performance problem for matviews for 15~. > > I think the approach proposed by Andres eliminates the extra vmbuffer > reads as much as possible. But even with the patch, there still is 5% > degradation (and there is no way to disable inserting frozen tuples at > matview refresh). Which could be a problem for some users. I think > it’s hard to completely eliminate the overhead so we might need to > consider another approach like having matview refresh use > heap_multi_insert() instead of heap_insert(). > I think it's way too late to make such significant change (switching to heap_multi_insert) for v14 :-( Moreover, I doubt it affects just matview refresh - why wouldn't it affect other similar use cases? More likely it's just the case that was discovered. > I think the changes for heap_multi_insert() are fine so we can revert > only heap_insert() part if we revert something from the v14 tree, > although we will end up not inserting frozen tuples into toast tables. > I'd be somewhat unhappy about reverting just this bit, because it'd mean that we freeze rows in the main table but not rows in the TOAST tables (that was kinda why we concluded we need the heap_insert part too). I'm still a bit puzzled where does the extra overhead (in cases when freeze is not requested) come from, TBH. Intuitively, I'd hope there's a way to eliminate that entirely, and only pay the cost when requested (with the expectation that it's cheaper than freezing it that later). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 11, 2021 at 11:07 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 5/11/21 11:04 AM, Masahiko Sawada wrote: > > On Tue, May 11, 2021 at 4:37 PM Michael Paquier <michael@paquier.xyz> wrote: > >> > >> On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote: > >>> Thanks, that looks promising. I repeated the tests I did on 26/4, and the > >>> results look like this: > >>> > >>> old (0c7d3bb99): 497ms > >>> master: 621ms > >>> patched: 531ms > >>> > >>> So yeah, that's a bit improvement - it does not remove the regression > >>> entirely, but +5% is much better than +25%. > >> > >> Hmm. Is that really something we should do after feature freeze? A > >> 25% degradation for matview refresh may be a problem for a lot of > >> users and could be an upgrade stopper. Another thing we could do is > >> also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a > >> proper solution for this performance problem for matviews for 15~. > > > > I think the approach proposed by Andres eliminates the extra vmbuffer > > reads as much as possible. But even with the patch, there still is 5% > > degradation (and there is no way to disable inserting frozen tuples at > > matview refresh). Which could be a problem for some users. I think > > it’s hard to completely eliminate the overhead so we might need to > > consider another approach like having matview refresh use > > heap_multi_insert() instead of heap_insert(). > > > > I think it's way too late to make such significant change (switching to > heap_multi_insert) for v14 :-( Right. > Moreover, I doubt it affects just matview > refresh - why wouldn't it affect other similar use cases? More likely > it's just the case that was discovered. I've not tested yet but I guess COPY FROM … FREEZE using heap_insert would similarly be affected since it also uses heap_insert() with TABLE_INSERT_FROZEN. > > > I think the changes for heap_multi_insert() are fine so we can revert > > only heap_insert() part if we revert something from the v14 tree, > > although we will end up not inserting frozen tuples into toast tables. > > > > I'd be somewhat unhappy about reverting just this bit, because it'd mean > that we freeze rows in the main table but not rows in the TOAST tables > (that was kinda why we concluded we need the heap_insert part too). > > I'm still a bit puzzled where does the extra overhead (in cases when > freeze is not requested) come from, TBH. Which cases do you mean? Doesn't matview refresh always request to freeze tuples even after applying the patch proposed on this thread? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Hi, On 2021-05-11 16:07:44 +0200, Tomas Vondra wrote: > On 5/11/21 11:04 AM, Masahiko Sawada wrote: > > I think the changes for heap_multi_insert() are fine so we can revert > > only heap_insert() part if we revert something from the v14 tree, > > although we will end up not inserting frozen tuples into toast tables. > > > > I'd be somewhat unhappy about reverting just this bit, because it'd mean > that we freeze rows in the main table but not rows in the TOAST tables (that > was kinda why we concluded we need the heap_insert part too). Is there a reason not to apply a polished version of my proposal? And then to look at the remaining difference? > I'm still a bit puzzled where does the extra overhead (in cases when freeze > is not requested) come from, TBH. Intuitively, I'd hope there's a way to > eliminate that entirely, and only pay the cost when requested (with the > expectation that it's cheaper than freezing it that later). I'd like to see a profile comparison between those two cases. Best with both profiles done in master, just once with the freeze path disabled... Greetings, Andres Freund
On 5/11/21 5:56 PM, Masahiko Sawada wrote: > On Tue, May 11, 2021 at 11:07 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 5/11/21 11:04 AM, Masahiko Sawada wrote: >>> On Tue, May 11, 2021 at 4:37 PM Michael Paquier <michael@paquier.xyz> wrote: >>>> >>>> On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote: >>>>> Thanks, that looks promising. I repeated the tests I did on 26/4, and the >>>>> results look like this: >>>>> >>>>> old (0c7d3bb99): 497ms >>>>> master: 621ms >>>>> patched: 531ms >>>>> >>>>> So yeah, that's a bit improvement - it does not remove the regression >>>>> entirely, but +5% is much better than +25%. >>>> >>>> Hmm. Is that really something we should do after feature freeze? A >>>> 25% degradation for matview refresh may be a problem for a lot of >>>> users and could be an upgrade stopper. Another thing we could do is >>>> also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a >>>> proper solution for this performance problem for matviews for 15~. >>> >>> I think the approach proposed by Andres eliminates the extra vmbuffer >>> reads as much as possible. But even with the patch, there still is 5% >>> degradation (and there is no way to disable inserting frozen tuples at >>> matview refresh). Which could be a problem for some users. I think >>> it’s hard to completely eliminate the overhead so we might need to >>> consider another approach like having matview refresh use >>> heap_multi_insert() instead of heap_insert(). >>> >> >> I think it's way too late to make such significant change (switching to >> heap_multi_insert) for v14 :-( > > Right. > >> Moreover, I doubt it affects just matview >> refresh - why wouldn't it affect other similar use cases? More likely >> it's just the case that was discovered. > > I've not tested yet but I guess COPY FROM … FREEZE using heap_insert > would similarly be affected since it also uses heap_insert() with > TABLE_INSERT_FROZEN. > I'd say that's somewhat acceptable, as it's a trade-off between paying a bit of time during COPY vs. paying much more later (when freezing the rows eventually). From my POV the problem here is we've not asked to freeze the rows (unless I'm missing something and REFRESH freezes them?), but it's still a bit slower. However, 5% might also be just noise due to changes in layout of the binary. >> >>> I think the changes for heap_multi_insert() are fine so we can revert >>> only heap_insert() part if we revert something from the v14 tree, >>> although we will end up not inserting frozen tuples into toast tables. >>> >> >> I'd be somewhat unhappy about reverting just this bit, because it'd mean >> that we freeze rows in the main table but not rows in the TOAST tables >> (that was kinda why we concluded we need the heap_insert part too). >> >> I'm still a bit puzzled where does the extra overhead (in cases when >> freeze is not requested) come from, TBH. > > Which cases do you mean? Doesn't matview refresh always request to > freeze tuples even after applying the patch proposed on this thread? > Oh, I didn't realize that! That'd make this much less of an issue, I'd say, because if we're intentionally freezing the rows it's reasonable to pay a bit of time (in exchange for not having to do it later). The original +25% was a bit too much, of course, but +5% seems reasonable. FWIW I'm on vacation until the end of this week, I can't do much testing at the moment. Sorry. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5/11/21 7:25 PM, Andres Freund wrote: > Hi, > > On 2021-05-11 16:07:44 +0200, Tomas Vondra wrote: >> On 5/11/21 11:04 AM, Masahiko Sawada wrote: >>> I think the changes for heap_multi_insert() are fine so we can revert >>> only heap_insert() part if we revert something from the v14 tree, >>> although we will end up not inserting frozen tuples into toast tables. >>> >> >> I'd be somewhat unhappy about reverting just this bit, because it'd mean >> that we freeze rows in the main table but not rows in the TOAST tables (that >> was kinda why we concluded we need the heap_insert part too). > > Is there a reason not to apply a polished version of my proposal? And > then to look at the remaining difference? > Probably not, I was just a little bit confused what exactly is going on, unsure what to do about it. But if RMV freezes the rows, that probably explains it and your patch is the way to go. > >> I'm still a bit puzzled where does the extra overhead (in cases when freeze >> is not requested) come from, TBH. Intuitively, I'd hope there's a way to >> eliminate that entirely, and only pay the cost when requested (with the >> expectation that it's cheaper than freezing it that later). > > I'd like to see a profile comparison between those two cases. Best with > both profiles done in master, just once with the freeze path disabled... > OK. I'm mostly afk at the moment, I'll do that once I get back home, sometime over the weekend / maybe early next week. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2021-May-11, Michael Paquier wrote: > Hmm. Is that really something we should do after feature freeze? A > 25% degradation for matview refresh may be a problem for a lot of > users and could be an upgrade stopper. Another thing we could do is > also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a > proper solution for this performance problem for matviews for 15~. > > Thoughts? My main thought while reading this thread is about the rules of feature freeze. I mean, we are indeed in feature freeze, so no new features should be added. But that doesn't mean we are in code freeze. For the period starting now and until RC (which is a couple of months away still) we should focus on ensuring that the features we do have are in as good a shape as possible. If that means adding more code to fix problems/bugs/performance problems in the existing code, so be it. I mean, reverting is not the only tool we have. Yes, reverting has its place. Moreover, threats of reversion have their place. People should definitely be working towards finding solutions to the problems in their commits lest they be reverted. However, freezing *people* by saying that no fixes are acceptable other than reverts ... is not good. So I agree with what Andres is saying downthread: let's apply the fix he proposed (it's not even that invasive anyway), and investigate the remaining 5% and see if we can find a solution. If by the end of the beta process we can definitely find no solution to the problem, we can revert the whole lot then. -- Álvaro Herrera 39°49'30"S 73°17'W
On 5/11/21 2:23 PM, Alvaro Herrera wrote: > On 2021-May-11, Michael Paquier wrote: > >> Hmm. Is that really something we should do after feature freeze? A >> 25% degradation for matview refresh may be a problem for a lot of >> users and could be an upgrade stopper. Another thing we could do is >> also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a >> proper solution for this performance problem for matviews for 15~. >> >> Thoughts? > My main thought while reading this thread is about the rules of feature > freeze. I mean, we are indeed in feature freeze, so no new features > should be added. But that doesn't mean we are in code freeze. For the > period starting now and until RC (which is a couple of months away > still) we should focus on ensuring that the features we do have are in > as good a shape as possible. If that means adding more code to fix > problems/bugs/performance problems in the existing code, so be it. > I mean, reverting is not the only tool we have. > > Yes, reverting has its place. Moreover, threats of reversion have their > place. People should definitely be working towards finding solutions to > the problems in their commits lest they be reverted. However, freezing > *people* by saying that no fixes are acceptable other than reverts ... > is not good. > > So I agree with what Andres is saying downthread: let's apply the fix he > proposed (it's not even that invasive anyway), and investigate the > remaining 5% and see if we can find a solution. If by the end of the > beta process we can definitely find no solution to the problem, we can > revert the whole lot then. > I agree with all of this. Right now I'm only concerned if there isn't work apparently being done on some issue. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, May 11, 2021 at 02:46:35PM -0400, Andrew Dunstan wrote: > On 5/11/21 2:23 PM, Alvaro Herrera wrote: >> Yes, reverting has its place. Moreover, threats of reversion have their >> place. People should definitely be working towards finding solutions to >> the problems in their commits lest they be reverted. However, freezing >> *people* by saying that no fixes are acceptable other than reverts ... >> is not good. Well, that's an option on the table and a possibility, so I am listing it as a possible exit path as a potential solution, as much as a different optimization is another exit path to take care of this item :) >> So I agree with what Andres is saying downthread: let's apply the fix he >> proposed (it's not even that invasive anyway), and investigate the >> remaining 5% and see if we can find a solution. If by the end of the >> beta process we can definitely find no solution to the problem, we can >> revert the whole lot then. > > I agree with all of this. Right now I'm only concerned if there isn't > work apparently being done on some issue. If that's the consensus reached, that's fine by me as long as we don't keep a 25% performance regression. Now, looking at the patch proposed, I have to admit that this looks like some redesign of an existing feature, so that stresses me a bit in a period when we are aiming at making things stable, because this has a risk of making a part of the code more unstable. And I've had my share of calls over the last years in such situations, not only with Postgres, FWIW, so I may just sound like a conservative guy with a conservative hat. -- Michael
Hi, On 2021-05-13 11:12:43 +0900, Michael Paquier wrote: > If that's the consensus reached, that's fine by me as long as we don't > keep a 25% performance regression. Now, looking at the patch > proposed, I have to admit that this looks like some redesign of an > existing feature, so that stresses me a bit in a period when we are > aiming at making things stable, because this has a risk of making a > part of the code more unstable. You're referencing tracking the vm page in the bulk insert state? I don't see how you get a less invasive fix that's not architecturally worse than this. If that's over your level of comfort, I don't see an alternative but to revert. But I also don't think it's particularly invasive? Greetings, Andres Freund
On Tue, May 11, 2021 at 11:46 AM Andrew Dunstan <andrew@dunslane.net> wrote: > > Yes, reverting has its place. Moreover, threats of reversion have their > > place. People should definitely be working towards finding solutions to > > the problems in their commits lest they be reverted. However, freezing > > *people* by saying that no fixes are acceptable other than reverts ... > > is not good. > > > > So I agree with what Andres is saying downthread: let's apply the fix he > > proposed (it's not even that invasive anyway), and investigate the > > remaining 5% and see if we can find a solution. If by the end of the > > beta process we can definitely find no solution to the problem, we can > > revert the whole lot then. > > > > > I agree with all of this. Right now I'm only concerned if there isn't > work apparently being done on some issue. +1. While reverting a patch is always on the table, it must be the option of last resort. I don't have any specific reason to believe that that's the point we're at just yet. -- Peter Geoghegan
On Wed, May 12, 2021 at 2:32 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > On 5/11/21 5:56 PM, Masahiko Sawada wrote: > > On Tue, May 11, 2021 at 11:07 PM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >> On 5/11/21 11:04 AM, Masahiko Sawada wrote: > >>> On Tue, May 11, 2021 at 4:37 PM Michael Paquier <michael@paquier.xyz> wrote: > >>>> > >>>> On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote: > >>>>> Thanks, that looks promising. I repeated the tests I did on 26/4, and the > >>>>> results look like this: > >>>>> > >>>>> old (0c7d3bb99): 497ms > >>>>> master: 621ms > >>>>> patched: 531ms > >>>>> > >>>>> So yeah, that's a bit improvement - it does not remove the regression > >>>>> entirely, but +5% is much better than +25%. > >>>> > >>>> Hmm. Is that really something we should do after feature freeze? A > >>>> 25% degradation for matview refresh may be a problem for a lot of > >>>> users and could be an upgrade stopper. Another thing we could do is > >>>> also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a > >>>> proper solution for this performance problem for matviews for 15~. > >>> > >>> I think the approach proposed by Andres eliminates the extra vmbuffer > >>> reads as much as possible. But even with the patch, there still is 5% > >>> degradation (and there is no way to disable inserting frozen tuples at > >>> matview refresh). Which could be a problem for some users. I think > >>> it’s hard to completely eliminate the overhead so we might need to > >>> consider another approach like having matview refresh use > >>> heap_multi_insert() instead of heap_insert(). > >>> > >> > >> I think it's way too late to make such significant change (switching to > >> heap_multi_insert) for v14 :-( > > > > Right. > > > >> Moreover, I doubt it affects just matview > >> refresh - why wouldn't it affect other similar use cases? More likely > >> it's just the case that was discovered. > > > > I've not tested yet but I guess COPY FROM … FREEZE using heap_insert > > would similarly be affected since it also uses heap_insert() with > > TABLE_INSERT_FROZEN. > > > > I'd say that's somewhat acceptable, as it's a trade-off between paying a > bit of time during COPY vs. paying much more later (when freezing the > rows eventually). > > From my POV the problem here is we've not asked to freeze the rows > (unless I'm missing something and REFRESH freezes them?), but it's still > a bit slower. However, 5% might also be just noise due to changes in > layout of the binary. > > >> > >>> I think the changes for heap_multi_insert() are fine so we can revert > >>> only heap_insert() part if we revert something from the v14 tree, > >>> although we will end up not inserting frozen tuples into toast tables. > >>> > >> > >> I'd be somewhat unhappy about reverting just this bit, because it'd mean > >> that we freeze rows in the main table but not rows in the TOAST tables > >> (that was kinda why we concluded we need the heap_insert part too). > >> > >> I'm still a bit puzzled where does the extra overhead (in cases when > >> freeze is not requested) come from, TBH. > > > > Which cases do you mean? Doesn't matview refresh always request to > > freeze tuples even after applying the patch proposed on this thread? > > > > Oh, I didn't realize that! That'd make this much less of an issue, I'd > say, because if we're intentionally freezing the rows it's reasonable to > pay a bit of time (in exchange for not having to do it later). 
The > original +25% was a bit too much, of course, but +5% seems reasonable. Yes. It depends on how much the matview refresh gets slower but I think the problem here is that users always are forced to pay the cost for freezing tuple during refreshing the matview. There is no way to disable it unlike FREEZE option of COPY command. I’ve done benchmarks for matview refresh on my machine (FreeBSD 12.1, AMD Ryzen 5 PRO 3400GE, 24GB RAM) with four codes: HEAD, HEAD + Andres’s patch, one before 39b66a91b, and HEAD without TABLE_INSERT_FROZEN. The workload is to refresh the matview that simply selects 50M tuples (about 1.7 GB). Here are the average execution times of three trials for each code: 1) head: 42.263 sec 2) head w/ Andres’s patch: 40.194 sec 3) before 39b66a91b commit: 38.143 sec 4) head w/o freezing tuples: 32.413 sec I also observed 5% degradation by comparing 1 and 2 but am not sure where the overhead came from. I agree with Andres’s proposal. It’s a straightforward approach. I think it’s a reasonable degradation compared to the cost of freezing tuples later. But I’m a bit concerned that we force all users to pay that cost during matview refresh without any choice. So we need to find the remaining differences after applying a polished version of the patch. FYI I’ve attached flame graphs for each evaluation. Looking at 1_head.svg, we can see that the CPU spent much time in visibilitymap_pin(), and that it disappears in 2_head_w_Andreas_patch.svg. There is no big difference at a glance between 2_head_w_Andreas_patch.svg and 3_before_39b66a91b.svg. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Hi,

On 2021-05-18 11:20:07 +0900, Masahiko Sawada wrote: > Yes. It depends on how much the matview refresh gets slower but I > think the problem here is that users always are forced to pay the cost > for freezing tuple during refreshing the matview. There is no way to > disable it unlike FREEZE option of COPY command. > > I’ve done benchmarks for matview refresh on my machine (FreeBSD 12.1, > AMD Ryzen 5 PRO 3400GE, 24GB RAM) with four codes: HEAD, HEAD + > Andres’s patch, one before 39b66a91b, and HEAD without > TABLE_INSERT_FROZEN. > > The workload is to refresh the matview that simply selects 50M tuples > (about 1.7 GB). Here are the average execution times of three trials > for each code: > > 1) head: 42.263 sec > 2) head w/ Andres’s patch: 40.194 sec > 3) before 39b66a91b commit: 38.143 sec > 4) head w/o freezing tuples: 32.413 sec

I don't see such a big difference between andres-freeze/non-freeze. Is there any chance there's some noise in there? I found that I need to disable autovacuum and ensure that there's a checkpoint just before the REFRESH to get halfway meaningful numbers, as well as a min/max_wal_size ensuring that only recycled WAL is used.

> I also observed 5% degradation by comparing 1 and 2 but am not sure > where the overhead came from. I agree with Andres’s proposal. It’s a > straightforward approach.

What degradation are you referencing here?

I compared your case 2 with 4 - as far as I can see the remaining performance difference is from the difference in WAL records emitted:

freeze-andres:

Type                      N        (%)    Record size    (%)    FPI size    (%)    Combined size    (%)
----                      -        ---    -----------    ---    --------    ---    -------------    ---
XLOG/CHECKPOINT_ONLINE    1  ( 0.00)    114  ( 0.00)    0  ( 0.00)    114  ( 0.00)
Transaction/COMMIT        1  ( 0.00)    949  ( 0.00)    0  ( 0.00)    949  ( 0.00)
Storage/CREATE            1  ( 0.00)    42  ( 0.00)    0  ( 0.00)    42  ( 0.00)
Standby/LOCK              3  ( 0.00)    138  ( 0.00)    0  ( 0.00)    138  ( 0.00)
Standby/RUNNING_XACTS     2  ( 0.00)    104  ( 0.00)    0  ( 0.00)    104  ( 0.00)
Heap2/VISIBLE             44248  ( 0.44)    2610642  ( 0.44)    16384  ( 14.44)    2627026  ( 0.44)
Heap2/MULTI_INSERT        5  ( 0.00)    1125  ( 0.00)    6696  ( 5.90)    7821  ( 0.00)
Heap/INSERT               9955755  ( 99.12)    587389836  ( 99.12)    5128  ( 4.52)    587394964  ( 99.10)
Heap/DELETE               13  ( 0.00)    702  ( 0.00)    0  ( 0.00)    702  ( 0.00)
Heap/UPDATE               2  ( 0.00)    202  ( 0.00)    0  ( 0.00)    202  ( 0.00)
Heap/HOT_UPDATE           1  ( 0.00)    65  ( 0.00)    4372  ( 3.85)    4437  ( 0.00)
Heap/INSERT+INIT          44248  ( 0.44)    2610632  ( 0.44)    0  ( 0.00)    2610632  ( 0.44)
Btree/INSERT_LEAF         33  ( 0.00)    2030  ( 0.00)    80864  ( 71.28)    82894  ( 0.01)
                          --------    --------    --------    --------
Total                     10044313    592616581  [99.98%]    113444  [0.02%]    592730025  [100%]

nofreeze:

Type                      N        (%)    Record size    (%)    FPI size    (%)    Combined size    (%)
----                      -        ---    -----------    ---    --------    ---    -------------    ---
XLOG/NEXTOID              1  ( 0.00)    30  ( 0.00)    0  ( 0.00)    30  ( 0.00)
Transaction/COMMIT        1  ( 0.00)    949  ( 0.00)    0  ( 0.00)    949  ( 0.00)
Storage/CREATE            1  ( 0.00)    42  ( 0.00)    0  ( 0.00)    42  ( 0.00)
Standby/LOCK              3  ( 0.00)    138  ( 0.00)    0  ( 0.00)    138  ( 0.00)
Standby/RUNNING_XACTS     1  ( 0.00)    54  ( 0.00)    0  ( 0.00)    54  ( 0.00)
Heap2/MULTI_INSERT        5  ( 0.00)    1125  ( 0.00)    7968  ( 7.32)    9093  ( 0.00)
Heap/INSERT               9955755  ( 99.56)    587389836  ( 99.56)    5504  ( 5.06)    587395340  ( 99.54)
Heap/DELETE               13  ( 0.00)    702  ( 0.00)    0  ( 0.00)    702  ( 0.00)
Heap/UPDATE               2  ( 0.00)    202  ( 0.00)    0  ( 0.00)    202  ( 0.00)
Heap/HOT_UPDATE           1  ( 0.00)    65  ( 0.00)    5076  ( 4.67)    5141  ( 0.00)
Heap/INSERT+INIT          44248  ( 0.44)    2610632  ( 0.44)    0  ( 0.00)    2610632  ( 0.44)
Btree/INSERT_LEAF         32  ( 0.00)    1985  ( 0.00)    73476  ( 67.54)    75461  ( 0.01)
Btree/INSERT_UPPER        1  ( 0.00)    61  ( 0.00)    1172  ( 1.08)    1233  ( 0.00)
Btree/SPLIT_L             1  ( 0.00)    1549  ( 0.00)    7480  ( 6.88)    9029  ( 0.00)
Btree/DELETE              1  ( 0.00)    59  ( 0.00)    8108  ( 7.45)    8167  ( 0.00)
Btree/REUSE_PAGE          1  ( 0.00)    50  ( 0.00)    0  ( 0.00)    50  ( 0.00)
                          --------    --------    --------    --------
Total                     10000067    590007479  [99.98%]    108784  [0.02%]    590116263  [100%]

I.e. the additional Heap2/VISIBLE records show up.

It's not particularly surprising that emitting an additional WAL record for every page isn't free. It's particularly grating / unnecessary because this is the REGBUF_WILL_INIT path - it's completely unnecessary to emit a separate record.

I dimly remember that we explicitly discussed that we do *not* want to emit WAL records here?

Greetings,

Andres Freund
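The observation that a separate record is unnecessary translates into roughly the following fragment in heap_insert(). This is a sketch of the idea only, close in spirit to the patch posted later in the thread, not committed code: it assumes visibilitymap_set() is first relaxed to accept a caller-supplied LSN outside recovery (as is suggested a few messages below), and all_frozen_set is an illustrative local flag.

    /*
     * Fragment, as it might sit in heap_insert() after the XLOG_HEAP_INSERT
     * record has been built; the surrounding locals (relation, buffer, page,
     * vmbuffer, info) are assumed from that context.
     */
    recptr = XLogInsert(RM_HEAP_ID, info);
    PageSetLSN(page, recptr);

    if (all_frozen_set)
        visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
                          recptr,   /* reuse the insert record's LSN, so no
                                     * separate Heap2/VISIBLE record is
                                     * emitted */
                          vmbuffer, InvalidTransactionId,
                          VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);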
On 5/18/21 4:20 AM, Masahiko Sawada wrote: > ... >>>> >>>>> I think the changes for heap_multi_insert() are fine so we can revert >>>>> only heap_insert() part if we revert something from the v14 tree, >>>>> although we will end up not inserting frozen tuples into toast tables. >>>>> >>>> >>>> I'd be somewhat unhappy about reverting just this bit, because it'd mean >>>> that we freeze rows in the main table but not rows in the TOAST tables >>>> (that was kinda why we concluded we need the heap_insert part too). >>>> >>>> I'm still a bit puzzled where does the extra overhead (in cases when >>>> freeze is not requested) come from, TBH. >>> >>> Which cases do you mean? Doesn't matview refresh always request to >>> freeze tuples even after applying the patch proposed on this thread? >>> >> >> Oh, I didn't realize that! That'd make this much less of an issue, I'd >> say, because if we're intentionally freezing the rows it's reasonable to >> pay a bit of time (in exchange for not having to do it later). The >> original +25% was a bit too much, of course, but +5% seems reasonable. > > Yes. It depends on how much the matview refresh gets slower but I > think the problem here is that users always are forced to pay the cost > for freezing tuple during refreshing the matview. There is no way to > disable it unlike FREEZE option of COPY command. > Yeah, I see your point. I agree it's unfortunate there's no way to disable freezing during REFRESH MV. For most users that trade-off is probably fine, but for some cases (matviews refreshed often, or cases where it's fine to pay more but later) it may be an issue. From this POV, however, it may not be enough to optimize the current freezing code - it's always going to be a bit slower than before. So the only *real* solution may be adding a FREEZE option to the REFRESH MATERIALIZED VIEW command. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5/18/21 8:08 PM, Andres Freund wrote: > Hi, > > On 2021-05-18 11:20:07 +0900, Masahiko Sawada wrote: >> Yes. It depends on how much the matview refresh gets slower but I >> think the problem here is that users always are forced to pay the cost >> for freezing tuple during refreshing the matview. There is no way to >> disable it unlike FREEZE option of COPY command. >> >> I’ve done benchmarks for matview refresh on my machine (FreeBSD 12.1, >> AMD Ryzen 5 PRO 3400GE, 24GB RAM) with four codes: HEAD, HEAD + >> Andres’s patch, one before 39b66a91b, and HEAD without >> TABLE_INSERT_FROZEN. >> >> The workload is to refresh the matview that simply selects 50M tuples >> (about 1.7 GB). Here are the average execution times of three trials >> for each code: >> >> 1) head: 42.263 sec >> 2) head w/ Andres’s patch: 40.194 sec >> 3) before 39b66a91b commit: 38.143 sec >> 4) head w/o freezing tuples: 32.413 sec > > I don't see such a big difference between andres-freeze/non-freeze. Is > there any chance there's some noise in there? I found that I need to > disable autovacuum and ensure that there's a checkpoint just before the > REFRESH to get halfway meaningful numbers, as well as a min/max_wal_size > ensuring that only recycled WAL is used. > > >> I also observed 5% degradation by comparing 1 and 2 but am not sure >> where the overhead came from. I agree with Andres’s proposal. It’s a >> straightforward approach. > > What degradation are you referencing here? > > > I compared your case 2 with 4 - as far as I can see the remaining > performance difference is from the the difference in WAL records > emitted: > > freeze-andres: > > Type N (%) Record size (%) FPI size (%) Combined size (%) > ---- - --- ----------- --- -------- --- ------------- --- > XLOG/CHECKPOINT_ONLINE 1 ( 0.00) 114 ( 0.00) 0 ( 0.00) 114 ( 0.00) > Transaction/COMMIT 1 ( 0.00) 949 ( 0.00) 0 ( 0.00) 949 ( 0.00) > Storage/CREATE 1 ( 0.00) 42 ( 0.00) 0 ( 0.00) 42 ( 0.00) > Standby/LOCK 3 ( 0.00) 138 ( 0.00) 0 ( 0.00) 138 ( 0.00) > Standby/RUNNING_XACTS 2 ( 0.00) 104 ( 0.00) 0 ( 0.00) 104 ( 0.00) > Heap2/VISIBLE 44248 ( 0.44) 2610642 ( 0.44) 16384 ( 14.44) 2627026 ( 0.44) > Heap2/MULTI_INSERT 5 ( 0.00) 1125 ( 0.00) 6696 ( 5.90) 7821 ( 0.00) > Heap/INSERT 9955755 ( 99.12) 587389836 ( 99.12) 5128 ( 4.52) 587394964 ( 99.10) > Heap/DELETE 13 ( 0.00) 702 ( 0.00) 0 ( 0.00) 702 ( 0.00) > Heap/UPDATE 2 ( 0.00) 202 ( 0.00) 0 ( 0.00) 202 ( 0.00) > Heap/HOT_UPDATE 1 ( 0.00) 65 ( 0.00) 4372 ( 3.85) 4437 ( 0.00) > Heap/INSERT+INIT 44248 ( 0.44) 2610632 ( 0.44) 0 ( 0.00) 2610632 ( 0.44) > Btree/INSERT_LEAF 33 ( 0.00) 2030 ( 0.00) 80864 ( 71.28) 82894 ( 0.01) > -------- -------- -------- -------- > Total 10044313 592616581 [99.98%] 113444 [0.02%] 592730025 [100%] > > nofreeze: > > Type N (%) Record size (%) FPI size (%) Combined size (%) > ---- - --- ----------- --- -------- --- ------------- --- > XLOG/NEXTOID 1 ( 0.00) 30 ( 0.00) 0 ( 0.00) 30 ( 0.00) > Transaction/COMMIT 1 ( 0.00) 949 ( 0.00) 0 ( 0.00) 949 ( 0.00) > Storage/CREATE 1 ( 0.00) 42 ( 0.00) 0 ( 0.00) 42 ( 0.00) > Standby/LOCK 3 ( 0.00) 138 ( 0.00) 0 ( 0.00) 138 ( 0.00) > Standby/RUNNING_XACTS 1 ( 0.00) 54 ( 0.00) 0 ( 0.00) 54 ( 0.00) > Heap2/MULTI_INSERT 5 ( 0.00) 1125 ( 0.00) 7968 ( 7.32) 9093 ( 0.00) > Heap/INSERT 9955755 ( 99.56) 587389836 ( 99.56) 5504 ( 5.06) 587395340 ( 99.54) > Heap/DELETE 13 ( 0.00) 702 ( 0.00) 0 ( 0.00) 702 ( 0.00) > Heap/UPDATE 2 ( 0.00) 202 ( 0.00) 0 ( 0.00) 202 ( 0.00) > Heap/HOT_UPDATE 1 ( 0.00) 65 ( 0.00) 5076 ( 4.67) 5141 ( 0.00) > Heap/INSERT+INIT 44248 ( 
0.44) 2610632 ( 0.44) 0 ( 0.00) 2610632 ( 0.44) > Btree/INSERT_LEAF 32 ( 0.00) 1985 ( 0.00) 73476 ( 67.54) 75461 ( 0.01) > Btree/INSERT_UPPER 1 ( 0.00) 61 ( 0.00) 1172 ( 1.08) 1233 ( 0.00) > Btree/SPLIT_L 1 ( 0.00) 1549 ( 0.00) 7480 ( 6.88) 9029 ( 0.00) > Btree/DELETE 1 ( 0.00) 59 ( 0.00) 8108 ( 7.45) 8167 ( 0.00) > Btree/REUSE_PAGE 1 ( 0.00) 50 ( 0.00) 0 ( 0.00) 50 ( 0.00) > -------- -------- -------- -------- > Total 10000067 590007479 [99.98%] 108784 [0.02%] 590116263 [100%] > > I.e. the additional Heap2/VISIBLE records show up. > > It's not particularly surprising that emitting an additional WAL record > for every page isn't free. It's particularly grating / unnecessary > because this is the REGBUF_WILL_INIT path - it's completely unnecessary > to emit a separate record. > Yeah, emitting WAL is not exactly cheap, although it's just a little bit more (0.44%). I haven't looked into the details, but I wonder why it has such disproportionate impact (although, the 32 vs. 40 sec may be off). > I dimly remember that we explicitly discussed that we do *not* want to > emit WAL records here? > Ummm, in which thread? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2021-05-18 20:34:08 +0200, Tomas Vondra wrote: > Yeah, I see your point. I agree it's unfortunate there's no way to disable > freezing during REFRESH MV. For most users that trade-off is probably fine, > but for some cases (matviews refreshed often, or cases where it's fine to > pay more but later) it may be an issue. > > From this POV, however, it may not be enough to optimize the current > freezing code - it's always going to be a bit slower than before. But the intrinsic overhead is *tiny*. Setting a few bits, with the other costs amortized over a lot of pages. As far as I can tell the measurable overhead is that the increased WAL logging - which is not necessary. Greetings, Andres Freund
Hi, On 2021-05-18 20:43:41 +0200, Tomas Vondra wrote: > Yeah, emitting WAL is not exactly cheap, although it's just a little bit > more (0.44%). I haven't looked into the details, but I wonder why it has > such disproportionate impact (although, the 32 vs. 40 sec may be off). I couldn't reproduce this large a performance difference - I saw more like 10% instead of 25%. > > I dimly remember that we explicitly discussed that we do *not* want to > > emit WAL records here? > Ummm, in which thread? https://postgr.es/m/20190408010427.4l63qr7h2fjcyp77%40alap3.anarazel.de On 2019-04-07 18:04:27 -0700, Andres Freund wrote: > This avoids an extra WAL record for setting empty pages to all visible, > by adding XLH_INSERT_ALL_VISIBLE_SET & XLH_INSERT_ALL_FROZEN_SET, and > setting those when appropriate in heap_multi_insert. Unfortunately > currently visibilitymap_set() doesn't really properly allow to do this, > as it has embedded WAL logging for heap. > > I think we should remove the WAL logging from visibilitymap_set(), and > move it to a separate, heap specific, function. It'd probably be sufficient for the current purpose to change visibilitymap_set()'s documentation to say that recptr can also be passed in if the action is already covered by a WAL record, and that it's the callers responsibility to think through the correctness issues. Here it's easy, because any error will just throw the relation away. We do need to to include all-visible / FSM change in the WAL, so crash-recovery / standbys end up with the same result as a primary running normally. We already have the information, via XLH_INSERT_ALL_FROZEN_SET. I think all we need to do is to add a visibilitymap_set() in the redo routines if XLH_INSERT_ALL_FROZEN_SET. Greetings, Andres Freund
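On the redo side, that suggestion amounts to something like the following inside heap_xlog_insert(), assuming the XLH_INSERT_ALL_FROZEN_SET flag from the patch under discussion; target_node, blkno and lsn stand for the locals the redo routine already derives from the WAL record. A sketch, not committed code:

    /*
     * Fragment, as it might sit in heap_xlog_insert() after the heap page
     * itself has been replayed.
     */
    if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
    {
        Relation    reln = CreateFakeRelcacheEntry(target_node);
        Buffer      vmbuffer = InvalidBuffer;

        /* Mirror the primary: set the VM bits using the record's own LSN. */
        visibilitymap_pin(reln, blkno, &vmbuffer);
        visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
                          InvalidTransactionId,
                          VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
        ReleaseBuffer(vmbuffer);
        FreeFakeRelcacheEntry(reln);
    }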
On Wed, May 19, 2021 at 3:08 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-05-18 11:20:07 +0900, Masahiko Sawada wrote: > > Yes. It depends on how much the matview refresh gets slower but I > > think the problem here is that users always are forced to pay the cost > > for freezing tuple during refreshing the matview. There is no way to > > disable it unlike FREEZE option of COPY command. > > > > I’ve done benchmarks for matview refresh on my machine (FreeBSD 12.1, > > AMD Ryzen 5 PRO 3400GE, 24GB RAM) with four codes: HEAD, HEAD + > > Andres’s patch, one before 39b66a91b, and HEAD without > > TABLE_INSERT_FROZEN. > > > > The workload is to refresh the matview that simply selects 50M tuples > > (about 1.7 GB). Here are the average execution times of three trials > > for each code: > > > > 1) head: 42.263 sec > > 2) head w/ Andres’s patch: 40.194 sec > > 3) before 39b66a91b commit: 38.143 sec > > 4) head w/o freezing tuples: 32.413 sec > > I don't see such a big difference between andres-freeze/non-freeze. Is > there any chance there's some noise in there? I found that I need to > disable autovacuum and ensure that there's a checkpoint just before the > REFRESH to get halfway meaningful numbers, as well as a min/max_wal_size > ensuring that only recycled WAL is used. I've ran the same benchmarks with the following parameters: shared_buffers = 10GB max_wal_size = 50GB min_wal_size = 50GB checkpoint_timeout = 1h maintenance_work_mem = 1GB work_mem = 512MB autovacuum = off 1) head: 42.397 sec 2) head w/ Andres’s patch: 34.857 sec 3) before 39b66a91b commit: 32.556 sec 4) head w/o freezing tuples: 32.752 sec There is 6% degradation between 2 and 4 but 2 is much better than the previous tests. > > > > I also observed 5% degradation by comparing 1 and 2 but am not sure > > where the overhead came from. I agree with Andres’s proposal. It’s a > > straightforward approach. > > What degradation are you referencing here? Sorry, I meant comparing 2 to 3 and 4. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On 5/11/21 7:35 PM, Tomas Vondra wrote: > > > On 5/11/21 7:25 PM, Andres Freund wrote: >> Hi, >> >> On 2021-05-11 16:07:44 +0200, Tomas Vondra wrote: >>> On 5/11/21 11:04 AM, Masahiko Sawada wrote: >>>> I think the changes for heap_multi_insert() are fine so we can revert >>>> only heap_insert() part if we revert something from the v14 tree, >>>> although we will end up not inserting frozen tuples into toast tables. >>>> >>> >>> I'd be somewhat unhappy about reverting just this bit, because it'd mean >>> that we freeze rows in the main table but not rows in the TOAST >>> tables (that >>> was kinda why we concluded we need the heap_insert part too). >> >> Is there a reason not to apply a polished version of my proposal? And >> then to look at the remaining difference? >> > > Probably not, I was just a little bit confused what exactly is going on, > unsure what to do about it. But if RMV freezes the rows, that probably > explains it and your patch is the way to go. > >> >>> I'm still a bit puzzled where does the extra overhead (in cases when >>> freeze >>> is not requested) come from, TBH. Intuitively, I'd hope there's a way to >>> eliminate that entirely, and only pay the cost when requested (with the >>> expectation that it's cheaper than freezing it that later). >> >> I'd like to see a profile comparison between those two cases. Best with >> both profiles done in master, just once with the freeze path disabled... >> > > OK. I'm mostly afk at the moment, I'll do that once I get back home, > sometime over the weekend / maybe early next week. > OK, so here are the flamegraphs, for all three cases - current master, 0c7d3bb99 (i.e. before heap_insert changes) and with the pinning patch applied. I did this using the same test case as before (50M table), but with -fno-omit-frame-pointer to get better profiles. It may add some overhead, but hopefully that applies to all cases equally. The first 10 runs for each case look like this: old master patched ---------------------- 55045 74284 58246 53927 74283 57273 54090 74114 57336 54194 74059 57223 54189 74186 57287 54090 74113 57278 54095 74036 57176 53896 74215 57303 54101 74060 57524 54062 74021 57278 ---------------------- 54168 74137 57392 1.36x 1.05x which is mostly in line with previous findings (the master overhead is a bit worse, possibly due to the frame pointers). Attached are the flame graphs for all three cases. The change in master is pretty clearly visible, but I don't see any clear difference between old and patched code :-( regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2021-05-21 18:17:01 +0200, Tomas Vondra wrote: > OK, so here are the flamegraphs, for all three cases - current master, > 0c7d3bb99 (i.e. before heap_insert changes) and with the pinning patch > applied. I did this using the same test case as before (50M table), but with > -fno-omit-frame-pointer to get better profiles. It may add some overhead, > but hopefully that applies to all cases equally. > > The first 10 runs for each case look like this: > > old master patched > ---------------------- > 55045 74284 58246 > 53927 74283 57273 > 54090 74114 57336 > 54194 74059 57223 > 54189 74186 57287 > 54090 74113 57278 > 54095 74036 57176 > 53896 74215 57303 > 54101 74060 57524 > 54062 74021 57278 > ---------------------- > 54168 74137 57392 > 1.36x 1.05x > > which is mostly in line with previous findings (the master overhead is a bit > worse, possibly due to the frame pointers). > > Attached are the flame graphs for all three cases. The change in master is > pretty clearly visible, but I don't see any clear difference between old and > patched code :-( I'm pretty sure it's the additional WAL records? Greetings, Andres Freund
On 5/21/21 6:43 PM, Andres Freund wrote: > Hi, > > ... > >> Attached are the flame graphs for all three cases. The change in master is >> pretty clearly visible, but I don't see any clear difference between old and >> patched code :-( > > I'm pretty sure it's the additional WAL records? > Not sure. If I understand what you suggested elsewhere in the thread, it should be fine to modify heap_insert to pass the page recptr to visibilitymap_set, roughly per the attached patch. I'm not sure it's correct, but it does eliminate the Heap2/VISIBILITY records for me (when applied on top of your patch). Funnily enough it does make it a wee bit slower: patch #1: 56941.505 patch #2: 58099.788 I wonder if this might be due to -fno-omit-frame-pointer, though, as without it I get these timings: 0c7d3bb99: 25540.417 master: 31868.236 patch #1: 26566.199 patch #2: 26487.943 So without the frame pointers there's no slowdown, but there's no clear improvement after removal of the WAL records either :-( regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, May 22, 2021 at 3:10 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 5/21/21 6:43 PM, Andres Freund wrote: > > Hi, > > > > ... > > > >> Attached are the flame graphs for all three cases. The change in master is > >> pretty clearly visible, but I don't see any clear difference between old and > >> patched code :-( > > > > I'm pretty sure it's the additional WAL records? > > > > Not sure. If I understand what you suggested elsewhere in the thread, it > should be fine to modify heap_insert to pass the page recptr to > visibilitymap_set, roughly per the attached patch. > > I'm not sure it's correct, but it does eliminate the Heap2/VISIBILITY > records for me (when applied on top of your patch). Funnily enough it > does make it a wee bit slower: > > patch #1: 56941.505 > patch #2: 58099.788 > > I wonder if this might be due to -fno-omit-frame-pointer, though, as > without it I get these timings: > > 0c7d3bb99: 25540.417 > master: 31868.236 > patch #1: 26566.199 > patch #2: 26487.943 > > So without the frame pointers there's no slowdown, but there's no clear > improvement after removal of the WAL records either :-( Can we verify that the additional WAL records are the cause of this difference by making the matview unlogged by manually updating relpersistence = 'u'? Here are the results of benchmarks with unlogged matviews on my environment: 1) head: 22.927 sec 2) head w/ Andres’s patch: 16.629 sec 3) before 39b66a91b commit: 15.377 sec 4) head w/o freezing tuples: 14.551 sec And here are the results of logged matviews ICYMI: 1) head: 42.397 sec 2) head w/ Andres’s patch: 34.857 sec 3) before 39b66a91b commit: 32.556 sec 4) head w/o freezing tuples: 32.752 sec There seems no difference in the tendency. Which means the additional WAL is not the culprit? Interestingly, my previously proposed patch[1] was a better performance. With the patch, we skip all VM-related work on all insertions except for when inserting a tuple into a page for the first time. logged matviews: 31.591 sec unlogged matviews: 15.317 sec Regards, [1] https://www.postgresql.org/message-id/CAD21AoAaiPcgGRyJ7vpg05%3DNWqr6Vhaay_SEXyZBboQcZC8sFA%40mail.gmail.com Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On 5/24/21 9:53 AM, Masahiko Sawada wrote: > On Sat, May 22, 2021 at 3:10 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 5/21/21 6:43 PM, Andres Freund wrote: >>> Hi, >>> >> > ... >> > >>>> Attached are the flame graphs for all three cases. The change in master is >>>> pretty clearly visible, but I don't see any clear difference between old and >>>> patched code :-( >>> >>> I'm pretty sure it's the additional WAL records? >>> >> >> Not sure. If I understand what you suggested elsewhere in the thread, it >> should be fine to modify heap_insert to pass the page recptr to >> visibilitymap_set, roughly per the attached patch. >> >> I'm not sure it's correct, but it does eliminate the Heap2/VISIBILITY >> records for me (when applied on top of your patch). Funnily enough it >> does make it a wee bit slower: >> >> patch #1: 56941.505 >> patch #2: 58099.788 >> >> I wonder if this might be due to -fno-omit-frame-pointer, though, as >> without it I get these timings: >> >> 0c7d3bb99: 25540.417 >> master: 31868.236 >> patch #1: 26566.199 >> patch #2: 26487.943 >> >> So without the frame pointers there's no slowdown, but there's no clear >> improvement after removal of the WAL records either :-( > > Can we verify that the additional WAL records are the cause of this > difference by making the matview unlogged by manually updating > relpersistence = 'u'? > > Here are the results of benchmarks with unlogged matviews on my environment: > > 1) head: 22.927 sec > 2) head w/ Andres’s patch: 16.629 sec > 3) before 39b66a91b commit: 15.377 sec > 4) head w/o freezing tuples: 14.551 sec > > And here are the results of logged matviews ICYMI: > > 1) head: 42.397 sec > 2) head w/ Andres’s patch: 34.857 sec > 3) before 39b66a91b commit: 32.556 sec > 4) head w/o freezing tuples: 32.752 sec > > There seems no difference in the tendency. Which means the additional > WAL is not the culprit? > Yeah, I agree the WAL does not seem to be the culprit here. The patch I posted skips the WAL logging entirely (verified by pg_waldump, although I have not mentioned that), and there's no clear improvement. (FWIW I'm not sure the patch is 100% correct, but it does eliminate the the extra WAL.) The patch however does not skip the whole visibilitymap_set, it still does the initial error checks. I wonder if that might play a role ... Another option might be changes in the binary layout - 5% change is well within the range that could be attributed to this, but it feels very hand-wavy and more like an excuse than real analysis. > Interestingly, my previously proposed patch[1] was a better > performance. With the patch, we skip all VM-related work on all > insertions except for when inserting a tuple into a page for the first > time. > > logged matviews: 31.591 sec > unlogged matviews: 15.317 sec > Hmmm, thanks for reminding us that patch. Why did we reject that approach in favor of the current one? I think at this point we have these two options: 1) Revert the freeze patches, either completely or just the heap_insert part, which is what seems to be causing issues. And try again in PG15, perhaps using a different approach, allow disabling freezing in refresh, or something like that. 2) Polish and commit the pinning patch from Andres, which does reduce the slowdown quite a bit. And either call it a day, or continue with the investigation / analysis regarding the remaining ~5% (but I personally have no idea what might be the problem ...). 
I'd like to keep the improvement, but I find the 5% regression rather annoying and hard to defend, considering how much we fight for every little improvement. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2021-05-24 12:37:18 +0200, Tomas Vondra wrote: > Another option might be changes in the binary layout - 5% change is well > within the range that could be attributed to this, but it feels very > hand-wavy and more like an excuse than real analysis. I don't think 5% is likely to be explained by binary layout unless you look for an explicitly adverse layout. > Hmmm, thanks for reminding us that patch. Why did we reject that approach in > favor of the current one? Don't know about others, but I think it's way too fragile. Greetings, Andres Freund
On 5/24/21 8:21 PM, Andres Freund wrote: > Hi, > > On 2021-05-24 12:37:18 +0200, Tomas Vondra wrote: >> Another option might be changes in the binary layout - 5% change is well >> within the range that could be attributed to this, but it feels very >> hand-wavy and more like an excuse than real analysis. > > I don't think 5% is likely to be explained by binary layout unless you > look for an explicitly adverse layout. > Yeah, true. But I'm out of ideas what might be causing the regression and how to fix it :-( > >> Hmmm, thanks for reminding us that patch. Why did we reject that approach in >> favor of the current one? > > Don't know about others, but I think it's way too fragile. > Is it really that fragile? Any particular risks you have in mind? Maybe we could protect against that somehow ... Anyway, that change would certainly be for PG15. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Based on the investigation and (lack of) progress so far, I'll revert part of the COPY FREEZE improvements shortly. I'll keep the initial 7db0cd2145 changes, tweaking heap_multi_insert, and remove most of 39b66a91bd (except for the heap_xlog_multi_insert bit). This should address the small 5% regression in refresh matview. I have no other ideas how to fix that, short of adding a user-level option to REFRESH MATERIALIZED VIEW command so that the users can opt out/in. Attached is the revert patch - I'll get it committed in the next day or two, once the tests complete (running with CCA so it takes time). regards On 5/25/21 12:30 AM, Tomas Vondra wrote: > On 5/24/21 8:21 PM, Andres Freund wrote: >> Hi, >> >> On 2021-05-24 12:37:18 +0200, Tomas Vondra wrote: >>> Another option might be changes in the binary layout - 5% change is well >>> within the range that could be attributed to this, but it feels very >>> hand-wavy and more like an excuse than real analysis. >> >> I don't think 5% is likely to be explained by binary layout unless you >> look for an explicitly adverse layout. >> > > Yeah, true. But I'm out of ideas what might be causing the regression > and how to fix it :-( > >> >>> Hmmm, thanks for reminding us that patch. Why did we reject that approach in >>> favor of the current one? >> >> Don't know about others, but I think it's way too fragile. >> > > Is it really that fragile? Any particular risks you have in mind? Maybe > we could protect against that somehow ... Anyway, that change would > certainly be for PG15. > > > regards > -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
OK, As mentioned in the previous message, I've reverted most of 39b66a91bd. It took a bit longer to test, because the revert patch I shared a couple days ago was actually incorrect/buggy in one place. I'm not entirely happy about the end result (as it does not really help with TOAST tables), so hopefully we'll be able to do something about that soon. I'm not sure what, though - we've spent quite a bit of time trying to address the regression, and I don't envision some major breakthrough. As for the regression example, I think in practice the impact would be much lower, because the queries are likely much more complex (not just a seqscan from a table), so the query execution will be a much bigger part of execution time. I do think the optimization would be a win in most cases where freezing is desirable. From this POV the problem is rather that REFRESH MV does not allow not freezing the result, so it has to pay the price always. So perhaps the way forward is to add "NO FREEZE" option to REFRESH MV, or something like that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 3, 2021 at 8:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > OK, > > As mentioned in the previous message, I've reverted most of 39b66a91bd. > It took a bit longer to test, because the revert patch I shared a couple > days ago was actually incorrect/buggy in one place. > > I'm not entirely happy about the end result (as it does not really help > with TOAST tables), so hopefully we'll be able to do something about > that soon. Me too and +1 for addressing the problem soon for PG15. > I'm not sure what, though - we've spent quite a bit of time > trying to address the regression, and I don't envision some major > breakthrough. > > As for the regression example, I think in practice the impact would be > much lower, because the queries are likely much more complex (not just a > seqscan from a table), so the query execution will be a much bigger part > of execution time. > > I do think the optimization would be a win in most cases where freezing > is desirable. From this POV the problem is rather that REFRESH MV does > not allow not freezing the result, so it has to pay the price always. So > perhaps the way forward is to add "NO FREEZE" option to REFRESH MV, or > something like that. That could be an option. Is it worth analyzing the cause of overhead and why my patch seemed to avoid it? If we can resolve the performance problem by fixing heap_insert() and related codes, we can use HEAP_INSERT_FROZEN for CREATE TABLE AS as well. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > As mentioned in the previous message, I've reverted most of 39b66a91bd. Should this topic be removed from the open-items list now? regards, tom lane
On 6/3/21 7:30 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> As mentioned in the previous message, I've reverted most of 39b66a91bd. > Should this topic be removed from the open-items list now? > > Yep. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com