Thread: heap_hot_search_buffer refactoring

heap_hot_search_buffer refactoring

From

Robert Haas

Date:

06 June 2011, 15:03:46

The attached patch refactors heap_hot_search_buffer() so that
index_getnext() can use it, and modifies index_getnext() to do so.

The idea is based on one of Heikki's index-only scan patches, from 2009:

http://archives.postgresql.org/pgsql-hackers/2009-07/msg00676.php

I believe, however, that this portion of that patch stands alone,
completely independently of index-only scans.  At present, we have two
copies of the logic to traverse HOT chains floating around, which
means they can get out of sync, possibly resulting in bugs.  This has
already happened once:

http://archives.postgresql.org/pgsql-hackers/2011-05/msg00733.php

As a nice bonus, the new logic is both simpler and, I believe, more
correct than the current logic.  In IndexScanDescData, xs_hot_dead,
xs_next_hot, and xs_prev_xmax get replaced by a single boolean
xs_continue_hot.  There is a small behavioral difference: in the
current code, when use a non-MVCC snapshot with index_getnext() and
walk a HOT chain, each call remembers the *next* TID that should be
examined.  With this change, we instead remember the TID that we most
recently returned, and compute the next TID when index_getnext() is
called again.  It seems to me that this is really what we should have
been doing all along.  Imagine, for example, that we have a HOT chain
A -> B, where B has not yet committed.  We return A and remember that
we next intend to look at B.  Before index_getnext() is called, B
aborts and a new HOT update produces a HOT chain A -> C.  The next
call to index_getnext() will nonetheless look at B and conclude that
it's reached the end of the HOT chain.  This doesn't actually matter a
bit for current uses of index_getnext(), because - at least according
to Heikki's old notes - the only place we use a non-MVCC snapshot with
this function is during CLUSTER.  And at that point, we have the whole
table locked down, so nothing is happening concurrently.  I'm not sure
there's any possibility of us ever using this function in a way that
would break under the current implementation, but what I'm proposing
here does seem more robust to me.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

heap-hot-search-refactoring.patch

Re: heap_hot_search_buffer refactoring

From

Jeff Davis

Date:

19 June 2011, 14:49:55

On Mon, 2011-06-06 at 14:03 -0400, Robert Haas wrote:
> The attached patch refactors heap_hot_search_buffer() so that
> index_getnext() can use it, and modifies index_getnext() to do so.

Attached is a version of the patch that applies cleanly to master.

Regards,
    Jeff Davis

Attachment

heap-hot-search-buffer.patch.gz

Re: heap_hot_search_buffer refactoring

From

Jeff Davis

Date:

19 June 2011, 15:01:09

On Sun, 2011-06-19 at 10:50 -0700, Jeff Davis wrote:
> On Mon, 2011-06-06 at 14:03 -0400, Robert Haas wrote:
> > The attached patch refactors heap_hot_search_buffer() so that
> > index_getnext() can use it, and modifies index_getnext() to do so.
> 
> Attached is a version of the patch that applies cleanly to master.

In heap_hot_search_buffer:
 +       /* If this is not the first call, previous call returned            a (live!) tuple */         if (all_dead) -
             *all_dead = true; +               *all_dead = !first_call;
 

I think that's a typo: it should be:
 +               *all_dead = first_call;

Right?

Regards,Jeff Davis

Re: heap_hot_search_buffer refactoring

From

Robert Haas

Date:

19 June 2011, 15:17:09

On Sun, Jun 19, 2011 at 2:01 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sun, 2011-06-19 at 10:50 -0700, Jeff Davis wrote:
>> On Mon, 2011-06-06 at 14:03 -0400, Robert Haas wrote:
>> > The attached patch refactors heap_hot_search_buffer() so that
>> > index_getnext() can use it, and modifies index_getnext() to do so.
>>
>> Attached is a version of the patch that applies cleanly to master.
>
> In heap_hot_search_buffer:
>
>  +       /* If this is not the first call, previous call returned
>             a (live!) tuple */
>          if (all_dead)
>  -               *all_dead = true;
>  +               *all_dead = !first_call;
>
> I think that's a typo: it should be:
>
>  +               *all_dead = first_call;

Yikes.  I think you are right.  It's kind of scary that the regression
tests passed with that mistake.

New patch attached, with that one-line change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

heap-hot-search-refactoring-v2.patch

Re: heap_hot_search_buffer refactoring

From

Tom Lane

Date:

19 June 2011, 15:20:40

Robert Haas <robertmhaas@gmail.com> writes:
> Yikes.  I think you are right.  It's kind of scary that the regression
> tests passed with that mistake.

Can we add a test that exposes that mistake?
        regards, tom lane

Re: heap_hot_search_buffer refactoring

From

Robert Haas

Date:

19 June 2011, 15:42:06

On Sun, Jun 19, 2011 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yikes.  I think you are right.  It's kind of scary that the regression
>> tests passed with that mistake.
>
> Can we add a test that exposes that mistake?

Not sure.  We'd have to figure out how to reliably tickle it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: heap_hot_search_buffer refactoring

From

Robert Haas

Date:

19 June 2011, 17:51:50

On Sun, Jun 19, 2011 at 2:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Jun 19, 2011 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Yikes.  I think you are right.  It's kind of scary that the regression
>>> tests passed with that mistake.
>>
>> Can we add a test that exposes that mistake?
>
> Not sure.  We'd have to figure out how to reliably tickle it.

*thinks a bit*

When using an MVCC snapshot, we always have first_call = true, so the
effect of this mistake was just to disable the opportunistic killing
of dead tuples, which doesn't affect correctness.

When using a non-MVCC snapshot, we call heap_hot_search_buffer()
repeatedly until it returns false.  For so long as it returns true, it
does not matter how all_dead is set, because index_getnext() will
return the tuple without examining all_dead.  So only the final call
matters.  If the final call happens to also be the first call, then
all_dead might end up being false when it really ought to be true, but
that will once again just miss killing a dead tuple.  If the final
call isn't the first call, then we've got a problem, because now
all_dead will be true when it really ought to be false, and we'll nuke
an index tuple that we shouldn't nuke.

But if this is happening in the context of CLUSTER, then there might
still be no user-visible failure, because we're going to rebuild the
indexes anyway.  There could be a problem if CLUSTER aborts part-way
though.

A system catalog might get scanned with SnapshotNow, but to exercise
the bug you'd need to HOT update a system catalog and then have the
updating transaction commit between the time it sees the first row and
the time it sees the second one.

So I don't quite see how to construct a test case, ATM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: heap_hot_search_buffer refactoring

From

Robert Haas

Date:

24 June 2011, 16:32:32

On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> New patch attached, with that one-line change.

Jeff, are you planning to review this further?  Do you think it's OK to commit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: heap_hot_search_buffer refactoring

From

Jeff Davis

Date:

25 June 2011, 07:24:36

On Fri, 2011-06-24 at 15:32 -0400, Robert Haas wrote:
> On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > New patch attached, with that one-line change.
> 
> Jeff, are you planning to review this further?  Do you think it's OK to commit?

1. Patch does not apply to master cleanly, and it's in unified format
(so I can't compare it against the old patch very easily). This review
is for the first patch, disregarding the "skip = !first_call" issue that
you already fixed. If you had other changes in the latest version,
please repost the patch.

2. Comment above heap_hot_search_buffer should be updated to document
that heapTuple is an out-parameter and document the behavior of
first_call

3. The logic around "skip" is slightly confusing to me. Here's my
description: if it's not an MVCC snapshot and it's not the first call,
then you don't actually want to fetch the tuple with the given tid or a
later one in the chain -- you want to fetch the _next_ tuple in the
chain or a later one in the chain. Some wording of that description in a
comment (either in the function's comment or near the use of "skip")
would help a lot. Also, if skip is true, then the tid _must_ be visible
according to the (non-MVCC) snapshot, correct? It might help if that was
apparent from the code/comments.

Other than that, it looks good.

Regards,Jeff Davis

Re: heap_hot_search_buffer refactoring

From

Robert Haas

Date:

27 June 2011, 11:32:43

On Sat, Jun 25, 2011 at 6:24 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2011-06-24 at 15:32 -0400, Robert Haas wrote:
>> On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > New patch attached, with that one-line change.
>>
>> Jeff, are you planning to review this further?  Do you think it's OK to commit?
>
> 1. Patch does not apply to master cleanly, and it's in unified format
> (so I can't compare it against the old patch very easily). This review
> is for the first patch, disregarding the "skip = !first_call" issue that
> you already fixed. If you had other changes in the latest version,
> please repost the patch.

That is strange, because it applies for me.  But I had no other changes.

> 2. Comment above heap_hot_search_buffer should be updated to document
> that heapTuple is an out-parameter and document the behavior of
> first_call
>
> 3. The logic around "skip" is slightly confusing to me. Here's my
> description: if it's not an MVCC snapshot and it's not the first call,
> then you don't actually want to fetch the tuple with the given tid or a
> later one in the chain -- you want to fetch the _next_ tuple in the
> chain or a later one in the chain. Some wording of that description in a
> comment (either in the function's comment or near the use of "skip")
> would help a lot. Also, if skip is true, then the tid _must_ be visible
> according to the (non-MVCC) snapshot, correct? It might help if that was
> apparent from the code/comments.
>
> Other than that, it looks good.

OK, I've applied this with some additional comment changes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company