Re: PANIC: wrong buffer passed to visibilitymap_clear - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: PANIC: wrong buffer passed to visibilitymap_clear
Date
Msg-id CAH2-WznPo0D2t9fBgK5jKjprdLxsJvy_PnnFxUeF2ftFxXstsg@mail.gmail.com
In response to Re: PANIC: wrong buffer passed to visibilitymap_clear  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PANIC: wrong buffer passed to visibilitymap_clear  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Apr 12, 2021 at 9:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I think we have to stick with the current basic design, and just
> tighten things up to make sure that visibility pins are accounted for
> in the places that are missing it.
>
> Hence, I propose the attached.  It passes check-world, but that proves
> absolutely nothing of course :-(.  I wonder if there is any way to
> exercise these code paths deterministically.

This approach seems reasonable to me. At least you've managed to
structure the visibility map page pin check as concomitant with the
existing space recheck.

> (I have realized BTW that I was exceedingly fortunate to reproduce
> the buildfarm report here --- I have run hundreds of additional
> cycles of the same test scenario without getting a second failure.)

In the past I've had luck with rr's chaos mode (most notably with the
Jepsen SSI bug). That didn't work for me here, though I might just
not have persisted with it for long enough. I should probably come up
with a shell script that runs the same thing hundreds of times or more
in chaos mode, while making sure that useless recordings don't
accumulate.

The feature is described here:

https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mode.html

You only have to be lucky once. Once that happens, you're left with a
recording to review and re-review at your leisure. This includes all
Postgres backends, maybe even pg_regress and other scaffolding (if
that's what you're after).
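
A rough sketch of the run-it-hundreds-of-times loop, written in Python
rather than shell. The `rr record --chaos` and `-o` (output trace
directory) options are rr's own; the function name, the `run-NNNN`
directory layout, and the injectable `run` parameter are made up for
illustration. It also assumes that rr exits with the recorded command's
exit status, so that a nonzero status marks the run worth keeping.

```python
import shutil
import subprocess
from pathlib import Path


def record_until_failure(cmd, attempts, trace_root, run=subprocess.run):
    """Run cmd under `rr record --chaos` up to `attempts` times.

    Returns the trace directory of the first failing run, or None if
    every run succeeded. Traces of successful runs are deleted.
    """
    trace_root = Path(trace_root)
    trace_root.mkdir(parents=True, exist_ok=True)
    for i in range(1, attempts + 1):
        trace_dir = trace_root / f"run-{i:04d}"
        # Let rr create the trace directory itself; clear any stale leftover.
        shutil.rmtree(trace_dir, ignore_errors=True)
        result = run(["rr", "record", "--chaos", "-o", str(trace_dir), *cmd])
        if result.returncode != 0:
            return trace_dir  # the one lucky recording: keep it
        # Useless recording of a passing run: throw it away.
        shutil.rmtree(trace_dir, ignore_errors=True)
    return None
```

Something like `record_until_failure(["make", "check"], 500, "rr-traces")`
would then leave at most one trace behind, ready for `rr replay`. The
`make check` command is only a placeholder for whatever test scenario
triggers the failure.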

But that's for debugging, not testing. The only way that we'll ever be
able to test stuff like this is with something like Alexander
Korotkov's stop events patch [1]. That infrastructure should be added
sooner rather than later.

[1] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com
--
Peter Geoghegan
