Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication
Date
Msg-id 20210506193229.nspubcypujg26scq@alap3.anarazel.de
Whole thread Raw
In response to Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication
List pgsql-hackers
Hi,

On 2021-05-06 21:40:18 +0900, Masahiko Sawada wrote:
> Since we set all_visible_according_to_vm before acquiring the buffer
> lock it's likely to happen that the page gets modified and all-visible
> bit is cleared after setting true to all_visible_according_to_vm. This
> assertion can easily be reproduced by adding a delay before the buffer
> lock and invoking autovacuums frequently:

> So we should recheck also visibility map bit there but I think we can
> remove this assertion since we already do that in later code and we
> don’t treat this case as a should-not-happen case:
>
>         /*
>          * As of PostgreSQL 9.2, the visibility map bit should never be set if
>          * the page-level bit is clear.  However, it's possible that the bit
>          * got cleared after we checked it and before we took the buffer
>          * content lock, so we must recheck before jumping to the conclusion
>          * that something bad has happened.
>          */
>         else if (all_visible_according_to_vm && !PageIsAllVisible(page)
>                  && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
>         {
>             elog(WARNING, "page is not marked all-visible but
> visibility map bit is set in relation \"%s\" page %u",
>                  vacrel->relname, blkno);
>             visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
>                                 VISIBILITYMAP_VALID_BITS);
>         }
>
> I've attached a patch removing the assertion.

I think it'd be a good idea to audit the other uses of
all_visible_according_to_vm to make sure there's no issues there. Can't
this e.g. make us miss setting all-visible in

        /*
         * Handle setting visibility map bit based on what the VM said about
         * the page before pruning started, and using prunestate
         */
        if (!all_visible_according_to_vm && prunestate.all_visible)

Perhaps we should update all_visible_according_to_vm after locking the
buffer, if PageIsAllVisible(page) is true?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Printing backtrace of postgres processes
Next
From: Andres Freund
Date:
Subject: Re: use `proc->pgxactoff` as the value of `index` in `ProcArrayRemove()`