Home > mailing lists

Re: Optimize single tuple fetch from nbtree index - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Optimize single tuple fetch from nbtree index
Date	August 24, 2019 00:58:17
Msg-id	CAH2-Wz=m674-RKQdCG+jCD9QGzN1Kcg-FOdYw4-j+5_PfcHbpQ@mail.gmail.com Whole thread Raw
In response to	Re: Optimize single tuple fetch from nbtree index (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List	pgsql-hackers

Tree view

On Fri, Aug 23, 2019 at 10:14 AM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> Though, I got interested in the comment inconsistency you have found.
> I added debug message into this code branch of the patch and was able to
> see it in regression.diffs after 'make check':
> Speaking of your patch, it seems that the buffer was unpinned and pinned
> again between two reads,
> and the condition of holding it continuously has not been met.

See commit 2ed5b87f. The code is supposed to do that, but it might do
it more often than is truly necessary. We don't want to block VACUUM
by holding a buffer pin for a very long time, which is theoretically
possible here. Do you think that it is actually unnecessary here? In
other words, do you think that we can fix this without breaking cases
that commit 2ed5b87f cares about?

I have been suspicious of this commit all along. For example, I
noticed that it can cause the kill_prior_tuple mechanism to be
ineffective in a way that didn't happen prior to Postgres 9.5:

https://postgr.es/m/CAH2-Wz=SfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA@mail.gmail.com

That particular complaint was never addressed. I meant to do more on
commit 2ed5b87f.

> I didn't dig into the code, but this line looks suspicious (see my
> findings about BTScanPosIsPinned below):
>
>          /* bump pin on current buffer for assignment to mark buffer */
>          if (BTScanPosIsPinned(so->currPos))
>              IncrBufferRefCount(so->currPos.buf);
>
>
> While reading the code to answer your question, I noticed that
> BTScanPosIsPinned macro name is misleading.
> It calls BufferIsValid(), not BufferIsPinned() as one could expect.
> And BufferIsValid in bufmgr.h comment explicitly states that it
> shouldn't be confused with BufferIsPinned.
> The same goes for BTScanPosUnpinIfPinned().

I have always hated this macro. I think that code like the specific
code you quoted might be correct, kind of, but it looks like the
author was trying to change as little as possible about the code as it
existed in 2015, rather than changing things so that everything made
sense. It looks like a messy accretion.

Let me see if I can get it straight:

We're incrementing the ref count on the buffer if and only if it is
pinned (by which we mean valid), though only when the scan is valid
(which is not the same as pinned). Whether or not we increment the
count of a valid scan doesn't affect anything else we do (i.e. we
still restore a marked position either way).

This is just awful.

> I propose that we update BTScanPosIsPinned macro. Or, at least write a
> comment, why its current behavior is fine.
> There are a few existing callers, that are definitely expecting that
> this macro checks a pin, which it doesn't do.
> I don't quite understand if that already causes any subtle bug, or the
> current algorithm is fine.

I think that you're right -- at a minimum, this requires more
documentation. This code is a few years old, but I still wouldn't be
surprised if it turned out to be slightly wrong in a way that was
important. We still have no way of detecting if a buffer is accessed
without a pin. There have been numerous bugs like that before. (We
have talked about teaching Valgrind to detect the case, but that never
actually happened.)

--
Peter Geoghegan

pgsql-hackers by date:

From: Ibrar Ahmed
Date: 23 August 2019, 22:43:27
Subject: Re: Email to hackers for test coverage

From: Thomas Munro
Date: 24 August 2019, 01:01:41
Subject: Re: Asymmetric partition-wise JOIN

Re: Optimize single tuple fetch from nbtree index - Mailing list pgsql-hackers

Previous

Next