Various silliness in heap_getnext and related routines - Mailing list pgsql-hackers

From Tom Lane
Subject Various silliness in heap_getnext and related routines
Date
Msg-id 12086.992036108@sss.pgh.pa.us
Whole thread Raw
Responses Re: Various silliness in heap_getnext and related routines  (bruc@stone.congenomics.com (Robert E. Bruccoleri))
List pgsql-hackers
Robert Bruccoleri (bruc@stone.congen.com) wrote:
> It's not clear to me why the spinlock needs be grabbed at the
> beginning of RelationGetBufferWithBuffer,

I believe you are right: the spinlock doesn't need to be grabbed,
because if a valid buffer is passed in, it must already be pinned
(since the returned buffer is expected to be pinned).  Hence the check
for same-buffer could be done without first grabbing the spinlock.

In fact, closer analysis reveals a lot of silliness here.  There is an
inherent error in RelationGetBufferWithBuffer: if it allocates a fresh
buffer then that buffer gets pinned, but if it returns the passed-in
buffer then no extra pin is added.  Since the caller does not know which
case applies, there must be either a leaked pin or a bogus pin count
decrement later on.  The reason we see no failure is that the only use
of this routine is from heapgettup, and it turns out that the passed
buffer is *always* either invalid or the target buffer.  (I have run the
regression tests with Asserts in place that confirm this deduction.)
What's more, heapgettup conveniently forgets to decrement the pin count
of the buffer it passes, thus cancelling out
RelationGetBufferWithBuffer's failure to increment the pin count.  If
the target page were ever different from the passed buffer, these errors
would not cancel out and we'd have a buffer leak.

I think that we ought to get rid of RelationGetBufferWithBuffer
entirely, since it is really a broken variant of ReleaseAndReadBuffer.
We should add a test to ReleaseAndReadBuffer to short-circuit the work
if the passed buffer is the same as the requested page, and change the
calls in heapgettup to call ReleaseAndReadBuffer instead, which makes
the heapam.c code correct not to do its own decrements of the pin count.
(The handwaving comments in heap_getnext's calls of heapgettup could
then be fixed.)

While I'm looking at this, I can't help noticing that heapam.c goes to a
large amount of trouble to maintain "previous" and "next" tuple pointers
and pins in a seqscan, not only "current".  These pointers are useless
except in the very unusual case where one steps forward and then back
in a sequential scan (for example, "FETCH 1; FETCH BACKWARD 1;" in a
cursor).  It seems to me that this is wrongheaded.  We could simplify
and speed up the normal case by maintaining only a "current" pointer,
which would be well worth the extra work in the forward/back step case.

Comments, objections?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Is this a bug?
Next
From: Tom Lane
Date:
Subject: Re: make check fails