Re: Qual push down to table AM - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Qual push down to table AM
Date
Msg-id CA+TgmoZhzRxZ9RWsTpUQiw+zNGUtNAFsFQDqr+W1dn2BJom7TQ@mail.gmail.com
Whole thread Raw
In response to Re: Qual push down to table AM  (Julien Tachoires <julien@tachoires.me>)
Responses Re: Qual push down to table AM
List pgsql-hackers
On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
> Thank you for this quick feedback.
>
> One potential approach to solve this in heapgettup() would be:
> 1. hold the buffer lock
> 2. get the tuple from the buffer
> 3. if the tuple is not visible, move to the next tuple, back to 2.
> 4. release the buffer lock
> 5. if the tuple does not satisfy the scan keys, take the buffer lock,
>    move to the next tuple, back to 2.
> 6. return the tuple
>
> Do you see something fundamentally wrong here?

I spent a bit of time this afternoon looking at v4-0001. I noticed a
few spelling mistakes (abritrary x2, statisfied x1). As far as the
basic approach is concerned, I don't see how there can be a safety
problem here. If it's safe to release the buffer lock when we find a
tuple that matches the quals, for the purposes of returning that tuple
to the caller, then it seems like it must also be safe to release it
to evaluate a proposed qual.

Potentially, there could be a performance problem. Imagine that we
have some code right now that uses this code path and it's safe
because the qual that we're evaluating is something super-simple like
the integer less-than operator, so calling it under the buffer lock
doesn't create a stability hazard. Well, with the patch, we'd
potentially take and release the buffer lock a lot more times than we
do right now. Imagine that there are lots of tuples on each page but
only 1 or very few of them satisfy the qual: then we lock and unlock
the buffer a whole bunch of times instead of just once.

However, I don't think this really happens in practice. I believe it's
possible to take this code path if you set ignore_system_indexes=on,
because that turns index scans --- which, not surprisingly, have
scankeys --- into sequential scans which then end up also having
scankeys. Many of those scans use catalog snapshots so there's no
issue, but a little bit of debugging code seems to show that
systable_beginscan() can also be called with snapshot->snapshot_type
set to SNAPSHOT_ANY or SNAPSHOT_DIRTY. For example, see
GetNewOidWithIndex(). However, even if ignore_system_indexes=on gets a
little slower as a result of this or some other patch, I don't think
we really care, and without that setting, this code doesn't seem to
get exercised at all.

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [PATCH] VACUUM: avoid pre-creation transactions holding back cleanup of newly created relations
Next
From: Melanie Plageman
Date:
Subject: Re: Trying out read streams in pgvector (an extension)