Re: Incorrect result of bitmap heap scan. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Incorrect result of bitmap heap scan.
Date
Msg-id vdoqkxlh3apghactvvfw6ftgeog4hqnc5sinudx2qj4kbxyab4@sk6nalz5xxti
Whole thread Raw
Responses Re: Incorrect result of bitmap heap scan.
Re: Incorrect result of bitmap heap scan.
List pgsql-hackers
Hi,

On 2024-12-02 16:08:02 +0100, Matthias van de Meent wrote:
> Concurrency timeline:
> 
> Session 1. amgetbitmap() gets snapshot of index contents, containing
> references to dead tuples on heap P1.

IIUC, an unstanted important aspect here is that these dead tuples are *not*
visible to S1. Otherwise the VACUUM in the next step could not remove the dead
tuples.


> Session 2. VACUUM runs on the heap, removes TIDs for P1 from the
> index, deletes those TIDs from the heap pages, and finally sets heap
> pages' VM bits to ALL_VISIBLE, including the now cleaned page P1
> Session 1. Executes the bitmap heap scan that uses the bitmap without
> checking tuples on P1 due to ALL_VISIBLE and a lack of output columns.

Ugh :/

Pretty depressing that we've only found this now, ~6 years after it's been
released. We're lacking some tooling to find this stuff in a more automated
fashion.


> PS.
> I don't think the optimization itself is completely impossible, and we
> can probably re-enable an optimization like that if (or when) we find
> a way to reliably keep track of when to stop using the optimization. I
> don't think that's an easy fix, though, and definitely not for
> backbranches.

One way to make the optimization safe could be to move it into the indexam. If
an indexam would check the VM bit while blocking removal of the index entries,
it should make it safe. Of course that has the disadvantage of needing more VM
lookups than before, because it would not yet have been deduplicated...


Another issue with bitmap index scans is that it currently can't use
killtuples. I've seen several production outages where performance would
degrade horribly over time whenever estimates lead to important queries to
switch to bitmap scans, because lots of dead tuples would get accessed over
and over.

It seems pretty much impossible to fix that with the current interaction
between nodeBitmap* and indexam. I wonder if we could, at least in some cases,
and likely with some heuristics / limits, address this by performing some
visibility checks during the bitmap build.  I'm bringing it up here because I
suspect that some of the necessary changes would overlap with what's needed to
repair bitmap index-only scans.


> The solution I could think to keep most of this optimization requires
> the heap bitmap scan to notice that a concurrent process started
> removing TIDs from the heap after amgetbitmap was called; i.e.. a
> "vacuum generation counter" incremented every time heap starts the
> cleanup run.

I'm not sure this is a good path. We can already clean up pages during index
accesses and we are working on being able to set all-visible during "hot
pruning"/page-level-vacuuming. That'd probably actually be safe (because we
couldn't remove dead items without a real vacuum), but it's starting to get
pretty complicated...



> This is quite non-trivial, however, as we don't have much in place regarding
> per-relation shared structures which we could put such a value into, nor a
> good place to signal changes of the value to bitmap heap-scanning backends.

It doesn't have to be driven of table state, it could be state in
indexes. Most (all?) already have a metapage.

We could also add that state to pg_class? We already update pg_class after
most vacuums, so I don't think that'd be an issue.

Thomas had a WIP patch to add a shared-memory table of all open
relations. Which we need for quite a few features by now (e.g. more efficient
buffer mapping table, avoiding the relation size lookup during planning /
execution, more efficient truncation of relations, ...).


> From 07739e5af83664b67ea02d3db7a461a106d74040 Mon Sep 17 00:00:00 2001
> From: Matthias van de Meent <boekewurm+postgres@gmail.com>
> Date: Mon, 2 Dec 2024 15:59:35 +0100
> Subject: [PATCH v1] Disable BitmapScan's skip_fetch optimization
> 
> The optimization doesn't take concurrent vacuum's removal of TIDs into
> account, which can remove dead TIDs and make ALL_VISIBLE pages for which
> we have pointers to those dead TIDs in the bitmap.
> 
> The optimization itself isn't impossible, but would require more work
> figuring out that vacuum started removing TIDs and then stop using the
> optimization. However, we currently don't have such a system in place,
> making the optimization unsound to keep around.

Unfortunately I don't see a better path either.

I think it'd be good if we added a test that shows the failure mechanism so
that we don't re-introduce this in the future. Evidently this failure isn't
immediately obvious...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: CREATE SCHEMA ... CREATE DOMAIN support
Next
From: Wolfgang Walther
Date:
Subject: Re: Proposal: Role Sandboxing for Secure Impersonation