Thread: Re: Incorrect result of bitmap heap scan.

Re: Incorrect result of bitmap heap scan.

From
Andres Freund
Date:
Hi,

On 2024-12-02 16:08:02 +0100, Matthias van de Meent wrote:
> Concurrency timeline:
> 
> Session 1. amgetbitmap() gets snapshot of index contents, containing
> references to dead tuples on heap P1.

IIUC, an unstated but important aspect here is that these dead tuples are *not*
visible to S1. Otherwise the VACUUM in the next step could not remove the dead
tuples.


> Session 2. VACUUM runs on the heap, removes TIDs for P1 from the
> index, deletes those TIDs from the heap pages, and finally sets heap
> pages' VM bits to ALL_VISIBLE, including the now cleaned page P1
> Session 1. Executes the bitmap heap scan that uses the bitmap without
> checking tuples on P1 due to ALL_VISIBLE and a lack of output columns.

Ugh :/

Pretty depressing that we've only found this now, ~6 years after it's been
released. We're lacking some tooling to find this stuff in a more automated
fashion.


> PS.
> I don't think the optimization itself is completely impossible, and we
> can probably re-enable an optimization like that if (or when) we find
> a way to reliably keep track of when to stop using the optimization. I
> don't think that's an easy fix, though, and definitely not for
> backbranches.

One way to make the optimization safe could be to move it into the indexam. If
an indexam checked the VM bit while blocking removal of the index entries,
that should make it safe. Of course that has the disadvantage of needing more
VM lookups than before, because the TIDs would not yet have been deduplicated...


Another issue with bitmap index scans is that they currently can't use
killtuples. I've seen several production outages where performance would
degrade horribly over time whenever estimates caused important queries to
switch to bitmap scans, because lots of dead tuples would get accessed over
and over.

It seems pretty much impossible to fix that with the current interaction
between nodeBitmap* and indexam. I wonder if we could, at least in some cases,
and likely with some heuristics / limits, address this by performing some
visibility checks during the bitmap build.  I'm bringing it up here because I
suspect that some of the necessary changes would overlap with what's needed to
repair bitmap index-only scans.


> The solution I could think to keep most of this optimization requires
> the heap bitmap scan to notice that a concurrent process started
> removing TIDs from the heap after amgetbitmap was called; i.e. a
> "vacuum generation counter" incremented every time heap starts the
> cleanup run.

I'm not sure this is a good path. We can already clean up pages during index
accesses and we are working on being able to set all-visible during "hot
pruning"/page-level-vacuuming. That'd probably actually be safe (because we
couldn't remove dead items without a real vacuum), but it's starting to get
pretty complicated...



> This is quite non-trivial, however, as we don't have much in place regarding
> per-relation shared structures which we could put such a value into, nor a
> good place to signal changes of the value to bitmap heap-scanning backends.

It doesn't have to be driven off table state, it could be state in
indexes. Most (all?) already have a metapage.

We could also add that state to pg_class? We already update pg_class after
most vacuums, so I don't think that'd be an issue.

Thomas had a WIP patch to add a shared-memory table of all open
relations, which we need for quite a few features by now (e.g. more efficient
buffer mapping table, avoiding the relation size lookup during planning /
execution, more efficient truncation of relations, ...).


> From 07739e5af83664b67ea02d3db7a461a106d74040 Mon Sep 17 00:00:00 2001
> From: Matthias van de Meent <boekewurm+postgres@gmail.com>
> Date: Mon, 2 Dec 2024 15:59:35 +0100
> Subject: [PATCH v1] Disable BitmapScan's skip_fetch optimization
> 
> The optimization doesn't take concurrent vacuum's removal of TIDs into
> account: vacuum can remove dead TIDs and mark pages ALL_VISIBLE while
> the bitmap still contains pointers to those dead TIDs.
> 
> The optimization itself isn't impossible, but would require more work
> figuring out that vacuum started removing TIDs and then stop using the
> optimization. However, we currently don't have such a system in place,
> making the optimization unsound to keep around.

Unfortunately I don't see a better path either.

I think it'd be good if we added a test that shows the failure mechanism so
that we don't re-introduce this in the future. Evidently this failure isn't
immediately obvious...

Greetings,

Andres Freund



Re: Incorrect result of bitmap heap scan.

From
Peter Geoghegan
Date:
On Mon, Dec 2, 2024 at 11:32 AM Andres Freund <andres@anarazel.de> wrote:
> On 2024-12-02 16:08:02 +0100, Matthias van de Meent wrote:
> > Concurrency timeline:
> >
> > Session 1. amgetbitmap() gets snapshot of index contents, containing
> > references to dead tuples on heap P1.
>
> IIUC, an unstated but important aspect here is that these dead tuples are *not*
> visible to S1. Otherwise the VACUUM in the next step could not remove the dead
> tuples.

I would state the same thing slightly differently, FWIW: I would say
that the assumption that has been violated is that a TID is a stable
identifier for a given index tuple/logical row/row version (stable for
the duration of the scan).

This emphasis/definition seems slightly more useful, because it makes
it clear why this is hard to hit: you have to be fairly unlucky for a
dead-to-everyone TID to be returned by your scan (no LP_DEAD bit can
be set) and set in the scan's bitmap, only to later be concurrently
set LP_UNUSED in the heap -- all without VACUUM randomly being
prevented from setting the same page all-visible due to some unrelated
not-all-visible heap item making it unsafe.

> Ugh :/
>
> Pretty depressing that we've only found this now, ~6 years after it's been
> released. We're lacking some tooling to find this stuff in a more automated
> fashion.

FWIW I have suspicions about similar problems with index-only scans
that run in hot standby, and about all GiST index-only scans:

https://postgr.es/m/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5qxWQ@mail.gmail.com

In general there seems to be a lack of rigor in this area. I'm hoping
that Tomas Vondra's work can tighten things up here in passing.

> It seems pretty much impossible to fix that with the current interaction
> between nodeBitmap* and indexam. I wonder if we could, at least in some cases,
> and likely with some heuristics / limits, address this by performing some
> visibility checks during the bitmap build.  I'm bringing it up here because I
> suspect that some of the necessary changes would overlap with what's needed to
> repair bitmap index-only scans.

This seems like it plays into some of the stuff I've discussed with
Tomas, about caching visibility information in local state, as a means
to avoiding holding onto pins in the index AM. For things like
mark/restore support.

> > This is quite non-trivial, however, as we don't have much in place regarding
> > per-relation shared structures which we could put such a value into, nor a
> > good place to signal changes of the value to bitmap heap-scanning backends.
>
> It doesn't have to be driven off table state, it could be state in
> indexes. Most (all?) already have a metapage.

FWIW GiST doesn't have a metapage (a historical oversight).

> Unfortunately I don't see a better path either.
>
> I think it'd be good if we added a test that shows the failure mechanism so
> that we don't re-introduce this in the future. Evidently this failure isn't
> immediately obvious...

Maybe a good use for injection points?

--
Peter Geoghegan



Re: Incorrect result of bitmap heap scan.

From
Peter Geoghegan
Date:
On Mon, Dec 2, 2024 at 3:56 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I took what you wrote, and repurposed it to prove my old theory about
> GiST index-only scans being broken due to the lack of an appropriate
> interlock against concurrent TID recycling. See the attached patch.

BTW, if you change the test case to use the default B-Tree index AM
(by removing "USING GIST"), you'll see that VACUUM blocks on acquiring
a cleanup lock (and so the test just times out). The problem is that
GiST VACUUM just doesn't care about cleanup locks/TID recycling safety
-- though clearly it should.

--
Peter Geoghegan



Re: Incorrect result of bitmap heap scan.

From
Peter Geoghegan
Date:
On Mon, Dec 2, 2024 at 3:56 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I took what you wrote, and repurposed it to prove my old theory about
> GiST index-only scans being broken due to the lack of an appropriate
> interlock against concurrent TID recycling. See the attached patch.

I've moved discussion of this GiST bug over to the old 2021 thread
where I first raised concerns about the issue:

https://postgr.es/m/CAH2-Wz=jjiNL9FCh8C1L-GUH15f4WFTWub2x+_NucngcDDcHKw@mail.gmail.com

The GiST bug is actually causally unrelated to the bitmap index scan
bug under discussion, despite the high-level similarity. Seems best to
keep discussion of GiST on its own thread, for that reason.

--
Peter Geoghegan



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Mon, 2 Dec 2024 at 17:31, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-12-02 16:08:02 +0100, Matthias van de Meent wrote:
> > Concurrency timeline:
> >
> > Session 1. amgetbitmap() gets snapshot of index contents, containing
> > references to dead tuples on heap P1.
>
> IIUC, an unstated but important aspect here is that these dead tuples are *not*
> visible to S1. Otherwise the VACUUM in the next step could not remove the dead
> tuples.

Correct, this is about TIDs that are dead to all transactions, and so
might already be LP_DEAD in the heap.

> > PS.
> > I don't think the optimization itself is completely impossible, and we
> > can probably re-enable an optimization like that if (or when) we find
> > a way to reliably keep track of when to stop using the optimization. I
> > don't think that's an easy fix, though, and definitely not for
> > backbranches.
>
> One way to make the optimization safe could be to move it into the indexam. If
> an indexam checked the VM bit while blocking removal of the index entries,
> that should make it safe. Of course that has the disadvantage of needing more
> VM lookups than before, because the TIDs would not yet have been deduplicated...

I'm -0.5 on adding visibility checking to the index AM's contract. The
most basic contract that an index AM must implement is currently:

1. Store the index tuples provided by aminsert()
2.a With amgettuple, return at least all stored TIDs that might match
the scankey (optionally marked with recheck when the AM isn't sure
about the TID matching the scankeys, or when returning TIDs not
provided by aminsert()), or
2.b With amgetbitmap, return a bitmap containing at least the TIDs
that match the description of 2.a (i.e. allows an index to have an
optimized approach to adding outputs of 2.a into a bitmap)
3. Remove all the bulkdelete-provided TIDs from its internal structures

Note that visibility checking is absent. Any visibility or dead tuple
hints (through e.g. kill_prior_tuple, or calling
table_index_delete_tuples) are optimizations which are not required
for basic index AM operations, and indeed they are frequently not
implemented in index AMs. Adding a requirement for index AMs to do
visibility checks would IMO significantly change the current
API/layering contracts.

> Another issue with bitmap index scans is that they currently can't use
> killtuples. I've seen several production outages where performance would
> degrade horribly over time whenever estimates caused important queries to
> switch to bitmap scans, because lots of dead tuples would get accessed over
> and over.
>
> It seems pretty much impossible to fix that with the current interaction
> between nodeBitmap* and indexam. I wonder if we could, at least in some cases,
> and likely with some heuristics / limits, address this by performing some
> visibility checks during the bitmap build.

It's an interesting approach worth exploring. I'm a bit concerned
about the IO patterns this would create, though, especially when this
relates to BitmapAnd nodes: This node would create VM check IOs on the
order of O(sum_matching_tuple_pages), instead of
O(intersect_matching_tuple_pages). Additionally, it's wasted IO if
we're planning to go to the heap anyway, so this VM precheck would
have to be conditional on the bitmap scan not wanting any table data
(e.g. row_number, count()).

> I'm bringing it up here because I
> suspect that some of the necessary changes would overlap with what's needed to
> repair bitmap index-only scans.

I'd call this "bitmap-only scans" instead, as there might be multiple
indexes involved, but indeed, this also does sound like a viable
approach.

> > The solution I could think to keep most of this optimization requires
> > the heap bitmap scan to notice that a concurrent process started
> > removing TIDs from the heap after amgetbitmap was called; i.e. a
> > "vacuum generation counter" incremented every time heap starts the
> > cleanup run.
>
> I'm not sure this is a good path. We can already clean up pages during index
> accesses and we are working on being able to set all-visible during "hot
> pruning"/page-level-vacuuming. That'd probably actually be safe (because we
> couldn't remove dead items without a real vacuum), but it's starting to get
> pretty complicated...

I imagine this solution to happen in the executor/heapAM bitmapscan
code, not in the index AM's amgetbitmap. It'd note down the 'vacuum
generation counter' before extracting the index's bitmap, and then,
after every VM lookup during the bitmap heap scan, it would validate
that the generation counter hasn't changed (and thus that we can use
that VM bit as authoritative for the visibility of the TIDs we got). I
don't think that this interaction specifically is very complicated, but
it would indeed add to the overall complexity of the system.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Tue, 7 Jan 2025 at 18:46, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> Hi,
>
> I've rebased my earlier patch to fix some minor conflicts with the
> work done on bitmaps in December last year. I've also included Andres'
> cursor-based isolation test as 0002; which now passes.

Rebased again, now on top of head due to f7a8fc10.

> The patches for the back-branches didn't need updating, as those
> branches have not diverged enough for those patches to have gotten
> stale. They're still available in my initial mail over at [0].

Same again.

> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>
> [0] https://www.postgresql.org/message-id/CAEze2Wg1Q4gWzm9RZ0yXydm23Mj3iScu8LA__Zz3JJEgpnoGPQ%40mail.gmail.com

Attachment

Re: Incorrect result of bitmap heap scan.

From
Peter Geoghegan
Date:
On Fri, Feb 28, 2025 at 12:53 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Rebased again, now on top of head due to f7a8fc10.

Is everybody in agreement about committing and back patching this fix,
which simply disables the optimization altogether?

I myself don't see a better way, but thought I'd ask before proceeding
with review and commit.

--
Peter Geoghegan



Re: Incorrect result of bitmap heap scan.

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> Is everybody in agreement about committing and back patching this fix,
> which simply disables the optimization altogether?
> I myself don't see a better way, but thought I'd ask before proceeding
> with review and commit.

If you don't see a clear path forward, then "disable" is the only
reasonable choice for the back branches.  Maybe we'll find a fix
in future, but it seems unlikely that it'd be back-patchable.

            regards, tom lane



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Sun, 2 Mar 2025 at 01:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Geoghegan <pg@bowt.ie> writes:
> > Is everybody in agreement about committing and back patching this fix,
> > which simply disables the optimization altogether?
> > I myself don't see a better way, but thought I'd ask before proceeding
> > with review and commit.
>
> If you don't see a clear path forward, then "disable" is the only
> reasonable choice for the back branches.  Maybe we'll find a fix
> in future, but it seems unlikely that it'd be back-patchable.

Agreed.

Here's patch v5 for the master branch (now up to f4694e0f), with no
interesting changes other than fixing apply conflicts caused by
bfe56cdf.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment

Re: Incorrect result of bitmap heap scan.

From
Andres Freund
Date:
Hi,

On 2025-03-01 19:35:15 -0500, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > Is everybody in agreement about committing and back patching this fix,
> > which simply disables the optimization altogether?
> > I myself don't see a better way, but thought I'd ask before proceeding
> > with review and commit.
> 
> If you don't see a clear path forward, then "disable" is the only
> reasonable choice for the back branches.  Maybe we'll find a fix
> in future, but it seems unlikely that it'd be back-patchable.

I don't think there's a realistic way to fix it in the backbranches. And even
on master, I doubt that much of the current code would survive.

Greetings,

Andres Freund



Re: Incorrect result of bitmap heap scan.

From
vignesh C
Date:
On Wed, 5 Mar 2025 at 16:43, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Sun, 2 Mar 2025 at 01:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Peter Geoghegan <pg@bowt.ie> writes:
> > > Is everybody in agreement about committing and back patching this fix,
> > > which simply disables the optimization altogether?
> > > I myself don't see a better way, but thought I'd ask before proceeding
> > > with review and commit.
> >
> > If you don't see a clear path forward, then "disable" is the only
> > reasonable choice for the back branches.  Maybe we'll find a fix
> > in future, but it seems unlikely that it'd be back-patchable.
>
> Agreed.
>
> Here's patch v5 for the master branch (now up to f4694e0f), with no
> interesting changes other than fixing apply conflicts caused by
> bfe56cdf.

I noticed that Andres's comment from [1] is not yet addressed, so I'm
changing the commitfest entry status to Waiting on Author; please
address the comment and change it back to Needs review.
[1] - https://www.postgresql.org/message-id/tf5pp2o2a5x5qjcseq354bd26ya4o7p2vjzm5z4w57ca3vy6bc@ep7enrljvdkr

Regards,
Vignesh



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Sun, 16 Mar 2025 at 13:55, vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 5 Mar 2025 at 16:43, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> >
> > On Sun, 2 Mar 2025 at 01:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > Peter Geoghegan <pg@bowt.ie> writes:
> > > > Is everybody in agreement about committing and back patching this fix,
> > > > which simply disables the optimization altogether?
> > > > I myself don't see a better way, but thought I'd ask before proceeding
> > > > with review and commit.
> > >
> > > If you don't see a clear path forward, then "disable" is the only
> > > reasonable choice for the back branches.  Maybe we'll find a fix
> > > in future, but it seems unlikely that it'd be back-patchable.
> >
> > Agreed.
> >
> > Here's patch v5 for the master branch (now up to f4694e0f), with no
> > interesting changes other than fixing apply conflicts caused by
> > bfe56cdf.
>
> I noticed that Andres's comment from [1] is not yet addressed,
> changing the commitfest entry status to Waiting on Author, please
> address the comment and change it back to Needs review.
> [1] - https://www.postgresql.org/message-id/tf5pp2o2a5x5qjcseq354bd26ya4o7p2vjzm5z4w57ca3vy6bc@ep7enrljvdkr

As I understand it, Andres agrees that the feature is unsalvageable in
the backbranches ("don't think there's a realistic way to fix it").
Andres then elaborates that even if it were fixed on master, not much
of the current code would survive.

My patch removes the feature altogether in master and disables the
feature in the backbranches in the patches that were built against the
backbranches, which seems to match Andres' comments.

@vignesh, could you elaborate on what about Andres' comments I need to
address in my patch?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Incorrect result of bitmap heap scan.

From
Andres Freund
Date:
Hi,

On 2025-03-17 16:17:03 +0100, Matthias van de Meent wrote:
> As I understand it, Andres agrees that the feature is unsalvageable in
> backbranches ("don't think there's a realistic way to fix it"). Andres
> then continues to elaborate that even if the feature were salvageable,
> it wouldn't contain much of the current code.

Correct.


> My patch removes the feature altogether in master and disables the
> feature in the backbranches in the patches that were built against the
> backbranches, which seems to match Andres' comments.

Agreed.


> @vignesh, could you elaborate on what about Andres' comments I need to
> address in my patch?

I don't think there are any...

Greetings,

Andres Freund



Re: Incorrect result of bitmap heap scan.

From
Andres Freund
Date:
Hi,

Matthias, any chance you can provide a rebased version for master? If not,
I'll do the rebase.  Either way I'm planning to push this fairly soon.

Greetings,

Andres Freund



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Wed, 2 Apr 2025 at 17:37, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Matthias, any chance you can provide a rebased version for master?

Sure, I'll try to have it in your inbox later today CEST.

> Either way I'm planning to push this fairly soon.

OK, thanks!

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Wed, 2 Apr 2025 at 18:20, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Wed, 2 Apr 2025 at 17:37, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > Matthias, any chance you can provide a rebased version for master?
>
> Sure, I'll try to have it in your inbox later today CEST.

And here it is, v6 of the patchset, rebased up to master @ 764d501d.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment

Re: Incorrect result of bitmap heap scan.

From
Andres Freund
Date:
Hi,

On 2025-04-02 18:58:11 +0200, Matthias van de Meent wrote:
> And here it is, v6 of the patchset, rebased up to master @ 764d501d.

Thanks!

Does anybody have an opinion about how non-invasive to be in the
back-branches? The minimal version is something like this diff:

diff --git i/src/backend/executor/nodeBitmapHeapscan.c w/src/backend/executor/nodeBitmapHeapscan.c
index 6b48a6d8350..0bd8b3404c1 100644
--- i/src/backend/executor/nodeBitmapHeapscan.c
+++ w/src/backend/executor/nodeBitmapHeapscan.c
@@ -187,6 +187,19 @@ BitmapHeapNext(BitmapHeapScanState *node)
         {
             bool        need_tuples = false;

+            /*
+             * Unfortunately it turns out that the below optimization does not
+             * take the removal of TIDs by a concurrent vacuum into
+             * account. The concurrent vacuum can remove dead TIDs and make
+             * pages ALL_VISIBLE while those dead TIDs are referenced in the
+             * bitmap. This would lead to a !need_tuples scan returning too
+             * many tuples.
+             *
+             * In the back-branches, we therefore simply disable the
+             * optimization. Removing all the relevant code would be too
+             * invasive (and a major backpatching pain).
+             */
+#ifdef NOT_ANYMORE
             /*
              * We can potentially skip fetching heap pages if we do not need
              * any columns of the table, either for checking non-indexable
@@ -197,6 +210,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
              */
             need_tuples = (node->ss.ps.plan->qual != NIL ||
                            node->ss.ps.plan->targetlist != NIL);
+#endif


But it seems a bit weird to continue checking SO_NEED_TUPLES (which is what
need_tuples ends up as) in other parts of the code. But it's less work to
backpatch that way.  Obviously we can't remove the relevant struct fields in
the backbranches.



Other notes:

- Should we commit the test showing that the naive implementation of
  index-only-bitmap-heapscan is broken, in case somebody wants to re-introduce
  it?

  If so, I think we should not backpatch the test. If it turns out to not be
  stable, it's a pain to fix test stability issues over multiple branches.

  And the patch would need a bit of comment editing to explain that, easy
  enough.


- We have some superfluous includes in nodeBitmapHeapscan.c - but I think
  that's not actually the fault of this patch. Seems the read-stream patch
  should have removed at least the includes of visibilitymap.h, bufmgr.h
  and spccache.h?  And b09ff53667f math.h...


Greetings,

Andres Freund



Re: Incorrect result of bitmap heap scan.

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Does anybody have an opinion about how non-invasive to be in the
> back-branches? The minimal version is something like this diff:

Minimal is good -- less chance of breaking anything.

> - Should we commit the test showing that the naive implementation of
>   index-only-bitmap-heapscan is broken, in case somebody wants to re-introduce
>   it?

Seems like a good idea.  Agreed on HEAD-only for that.

> - We have some superfluous includes in nodeBitmapHeapscan.c - but I think
>   that's not actually the fault of this patch. Seems the read-stream patch
>   should have removed at least the includes of visibilitymap.h, bufmgr.h
>   and spccache.h?  And b09ff53667f math.h...

Meh, let's leave that for the next round of IWYU hacking.

            regards, tom lane



Re: Incorrect result of bitmap heap scan.

From
Matthias van de Meent
Date:
On Wed, 2 Apr 2025 at 19:48, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-04-02 18:58:11 +0200, Matthias van de Meent wrote:
> > And here it is, v6 of the patchset, rebased up to master @ 764d501d.
>
> Thanks!
>
> Does anybody have an opinion about how non-invasive to be in the
> back-branches? The minimal version is something like this diff:
[...]
> But it seems a bit weird to continue checking SO_NEED_TUPLES (which is what
> need_tuples ends up as) in other parts of the code. But it's less work to
> backpatch that way.  Obviously we can't remove the relevant struct fields in
> the backbranches.

A minimal version seems fine to me.

> Other notes:
>
> - Should we commit the test showing that the naive implementation of
>   index-only-bitmap-heapscan is broken, in case somebody wants to re-introduce
>   it?

I'd prefer that, yes.

>   If so, I think we should not backpatch the test. If it turns out to not be
>   stable, it's a pain to fix test stability issues over multiple branches.

That's fair. But if the reason for not adding a test is potential
instability we could just as well add it now and remove it if it
actually proves to be unstable.

> - We have some superfluous includes in nodeBitmapHeapscan.c - but I think
>   that's not actually the fault of this patch. Seems the read-stream patch
>   should have removed at least the includes of visibilitymap.h, bufmgr.h
>   and spccache.h?  And b09ff53667f math.h...

I have no strong opinion about this.

Thank you for picking this up!

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Incorrect result of bitmap heap scan.

From
Andres Freund
Date:
Hi,

On 2025-04-02 14:17:01 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Does anybody have an opinion about how non-invasive to be in the
> > back-branches? The minimal version is something like this diff:
> 
> Minimal is good -- less chance of breaking anything.
> 
> > - Should we commit the test showing that the naive implementation of
> >   index-only-bitmap-heapscan is broken, in case somebody wants to re-introduce
> >   it?
> 
> Seems like a good idea.  Agreed on HEAD-only for that.

Pushed that way.

Thanks for the report and the fix!

Greetings,

Andres Freund