Thread: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication

[BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication

From: "tanghy.fnst@fujitsu.com"
Hi

I met an assertion failure at the publisher in lazy_scan_heap() when running
synchronous logical replication. Could someone please take a look at it?

Here's what I did to produce the problem.

First, use './configure --enable-cassert' to build PostgreSQL.
Then I created multiple publications at the publisher and multiple subscriptions at the subscriber.
Then I set synchronous_standby_names and reloaded, putting the subscriptions into synchronous commit mode. After that, an assertion failed at the publisher when I ran COMMIT and ROLLBACK in concurrent transactions:

>TRAP: FailedAssertion("!all_visible_according_to_vm || prunestate.all_visible", File: "vacuumlazy.c", Line: 1347, PID: 1274675)

BTW, in asynchronous mode the same problem can also happen, but at a low frequency. (I tried many times, but the problem happened only 2 times.)
As for synchronous mode, I found it easier to reproduce the problem with "autovacuum_naptime = 1".
But even then it can't be reproduced 100% of the time. (I tested it 5 times; 3 of them reproduced it.)
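
Roughly, the setup looks like this (just a sketch with one publication and
one subscription; the object names are made up and the attached script is
the authoritative version):

-- publisher
CREATE TABLE tbl (a int PRIMARY KEY);
CREATE PUBLICATION pub1 FOR TABLE tbl;

-- subscriber
CREATE TABLE tbl (a int PRIMARY KEY);
CREATE SUBSCRIPTION sub1 CONNECTION 'host=... dbname=postgres' PUBLICATION pub1;

-- publisher postgresql.conf, followed by pg_ctl reload
synchronous_standby_names = 'sub1'
autovacuum_naptime = 1

Several publisher sessions then insert rows and COMMIT/ROLLBACK concurrently
while autovacuum runs.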

The script and the log are attached. It took about 6 minutes to run (without hitting the problem) on my machine, and less than 6 minutes if the server crashed.

Regards
Tang

On Thu, Apr 29, 2021 at 02:34:21AM +0000, tanghy.fnst@fujitsu.com wrote:
> I met an assertion failure at the publisher in lazy_scan_heap() when
> running synchronous logical replication. Could someone please take a
> look at it?

This assertion is new as of 7136bf34.  Peter?
--
Michael

On Wed, Apr 28, 2021 at 7:34 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
> >TRAP: FailedAssertion("!all_visible_according_to_vm || prunestate.all_visible", File: "vacuumlazy.c", Line: 1347, PID: 1274675)
>
> BTW, in asynchronous mode the same problem can also happen, but at a low frequency. (I tried many times, but the problem happened only 2 times.)
> As for synchronous mode, I found it easier to reproduce the problem with "autovacuum_naptime = 1".
> But even then it can't be reproduced 100% of the time. (I tested it 5 times; 3 of them reproduced it.)

Is setting all_visible_according_to_vm false as below enough to avoid
the assertion failure?

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c3fc12d76c..76c17e063e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1146,6 +1146,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
             {
                 ReleaseBuffer(vmbuffer);
                 vmbuffer = InvalidBuffer;
+                all_visible_according_to_vm = false;
             }

             /* Remove the collected garbage tuples from table and indexes */


-- 
Peter Geoghegan



RE: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication

From: "tanghy.fnst@fujitsu.com"
On Thursday, April 29, 2021 1:22 PM, Peter Geoghegan <pg@bowt.ie> wrote:

>Is setting all_visible_according_to_vm false as below enough to avoid
>the assertion failure?
>
>diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
>index c3fc12d76c..76c17e063e 100644
>--- a/src/backend/access/heap/vacuumlazy.c
>+++ b/src/backend/access/heap/vacuumlazy.c
>@@ -1146,6 +1146,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
>             {
>                 ReleaseBuffer(vmbuffer);
>                 vmbuffer = InvalidBuffer;
>+                all_visible_according_to_vm = false;
>             }
>
>             /* Remove the collected garbage tuples from table and indexes */

Thanks for your reply.
I tried your patch, but the problem doesn't seem to be fixed.

Regards,
Tang

On Thu, Apr 29, 2021 at 1:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 28, 2021 at 7:34 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> > >TRAP: FailedAssertion("!all_visible_according_to_vm || prunestate.all_visible", File: "vacuumlazy.c", Line: 1347, PID: 1274675)
> >
> > BTW, in asynchronous mode the same problem can also happen, but at a low frequency. (I tried many times, but the problem happened only 2 times.)
> > As for synchronous mode, I found it easier to reproduce the problem with "autovacuum_naptime = 1".
> > But even then it can't be reproduced 100% of the time. (I tested it 5 times; 3 of them reproduced it.)

Thanks for reporting the issue!

>
> Is setting all_visible_according_to_vm false as below enough to avoid
> the assertion failure?
>
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index c3fc12d76c..76c17e063e 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1146,6 +1146,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
>              {
>                  ReleaseBuffer(vmbuffer);
>                  vmbuffer = InvalidBuffer;
> +                all_visible_according_to_vm = false;
>              }
>
>              /* Remove the collected garbage tuples from table and indexes */

Since we set all_visible_according_to_vm before acquiring the buffer
lock, it can happen that the page gets modified and its all-visible
bit is cleared after all_visible_according_to_vm has been set to true.
The assertion failure can easily be reproduced by adding a delay before
taking the buffer lock and invoking autovacuum frequently:

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c3fc12d76c..76f067a7e4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1180,6 +1180,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
        buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vacrel->bstrategy);

+       pg_usleep(100000);
+
        /*
         * We need buffer cleanup lock so that we can prune HOT chains and
         * defragment the page.

So we should also recheck the visibility map bit there, but I think we
can just remove this assertion, since later code already does that
recheck and doesn't treat this case as should-not-happen:

        /*
         * As of PostgreSQL 9.2, the visibility map bit should never be set if
         * the page-level bit is clear.  However, it's possible that the bit
         * got cleared after we checked it and before we took the buffer
         * content lock, so we must recheck before jumping to the conclusion
         * that something bad has happened.
         */
        else if (all_visible_according_to_vm && !PageIsAllVisible(page)
                 && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
        {
            elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
                 vacrel->relname, blkno);
            visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
                                VISIBILITYMAP_VALID_BITS);
        }

I've attached a patch removing the assertion.
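
In essence it just drops the failing assertion from lazy_scan_heap()
(sketch only; the attached patch has the actual hunk):

-        Assert(!all_visible_according_to_vm || prunestate.all_visible);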

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

On Thu, May 6, 2021 at 6:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Since we set all_visible_according_to_vm before acquiring the buffer
> lock, it can happen that the page gets modified and its all-visible
> bit is cleared after all_visible_according_to_vm has been set to true.
> The assertion failure can easily be reproduced by adding a delay before
> taking the buffer lock and invoking autovacuum frequently:
>

Your analysis seems correct to me: someone can modify the page before
we acquire it.  So this assertion is not correct.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Hi,

On 2021-05-06 21:40:18 +0900, Masahiko Sawada wrote:
> Since we set all_visible_according_to_vm before acquiring the buffer
> lock, it can happen that the page gets modified and its all-visible
> bit is cleared after all_visible_according_to_vm has been set to true.
> The assertion failure can easily be reproduced by adding a delay before
> taking the buffer lock and invoking autovacuum frequently:

> So we should also recheck the visibility map bit there, but I think we
> can just remove this assertion, since later code already does that
> recheck and doesn't treat this case as should-not-happen:
>
>         /*
>          * As of PostgreSQL 9.2, the visibility map bit should never be set if
>          * the page-level bit is clear.  However, it's possible that the bit
>          * got cleared after we checked it and before we took the buffer
>          * content lock, so we must recheck before jumping to the conclusion
>          * that something bad has happened.
>          */
>         else if (all_visible_according_to_vm && !PageIsAllVisible(page)
>                  && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
>         {
>             elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
>                  vacrel->relname, blkno);
>             visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
>                                 VISIBILITYMAP_VALID_BITS);
>         }
>
> I've attached a patch removing the assertion.

I think it'd be a good idea to audit the other uses of
all_visible_according_to_vm to make sure there are no issues there. Can't
this e.g. make us miss setting all-visible in

        /*
         * Handle setting visibility map bit based on what the VM said about
         * the page before pruning started, and using prunestate
         */
        if (!all_visible_according_to_vm && prunestate.all_visible)

Perhaps we should update all_visible_according_to_vm after locking the
buffer, if PageIsAllVisible(page) is true?
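
One way to spell that (an untested sketch, placed right after the cleanup
lock on the buffer has been taken):

        /*
         * The VM was consulted before the buffer was locked, so only keep
         * trusting that lookup if the page-level bit still agrees.
         */
        if (all_visible_according_to_vm && !PageIsAllVisible(page))
            all_visible_according_to_vm = false;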

Greetings,

Andres Freund



On Thu, May 6, 2021 at 5:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached a patch removing the assertion.

Pushed that just now -- thanks!

-- 
Peter Geoghegan



On Thu, May 6, 2021 at 12:32 PM Andres Freund <andres@anarazel.de> wrote:
> I think it'd be a good idea to audit the other uses of
> all_visible_according_to_vm to make sure there are no issues there. Can't
> this e.g. make us miss setting all-visible in
>
>                 /*
>                  * Handle setting visibility map bit based on what the VM said about
>                  * the page before pruning started, and using prunestate
>                  */
>                 if (!all_visible_according_to_vm && prunestate.all_visible)

I don't think so, because it's the inverse case -- the condition that
you quote is concerned with the case where we found the VM all_visible
bit to not be set earlier, and then found that we could set it now.

The assertion failed because the VM's all_visible bit was set
initially, but concurrently unset by some other backend. The
all_visible_according_to_vm tracking variable became stale, so it
wasn't correct to expect current information from prunestate to agree
that the page is still all_visible.

High level philosophical observation: This reminds me of another way
in which things are too tightly coupled in VACUUM. It's really a pity
that the visibility map's all_visible bit serves two purposes -- it
remembers pages that VACUUM doesn't have to visit (except perhaps if
it's an aggressive VACUUM), and is also used for index-only scans. If
it was just used for index-only scans then I don't think it would be
necessary for a HOT update to unset a page's all_visible bit. Since a
HOT chain's members are always versions of the same logical row, there
is no reason why an index-only scan needs to care which precise
version is actually visible to its MVCC snapshot (once we know that
there must be exactly one version from each HOT chain).

-- 
Peter Geoghegan



Hi,

On 2021-05-06 13:35:56 -0700, Peter Geoghegan wrote:
> On Thu, May 6, 2021 at 12:32 PM Andres Freund <andres@anarazel.de> wrote:
> > I think it'd be a good idea to audit the other uses of
> > all_visible_according_to_vm to make sure there are no issues there. Can't
> > this e.g. make us miss setting all-visible in
> >
> >                 /*
> >                  * Handle setting visibility map bit based on what the VM said about
> >                  * the page before pruning started, and using prunestate
> >                  */
> >                 if (!all_visible_according_to_vm && prunestate.all_visible)
> 
> I don't think so, because it's the inverse case -- the condition that
> you quote is concerned with the case where we found the VM all_visible
> bit to not be set earlier, and then found that we could set it now.

Uh, yes, that is exactly my point. Because all_visible_according_to_vm
is "falsely true", we'll not enter this branch, even though we actually
would want to mark it all visible again. If we did update
all_visible_according_to_vm after acquiring the content lock, we would
have entered this branch, no?

Greetings,

Andres Freund