Thread: index-only scans versus serializable transactions

index-only scans versus serializable transactions

From: Kevin Grittner
Date:
By not visiting the heap page for tuples, index-only scans fail to
acquire all of the necessary predicate locks for correct behavior at
the serializable transaction isolation level.  The tag for the
tuple-level predicate locks includes the xmin, to avoid possible
problems with TID reuse.  (This was not covered in initial
pre-release versions of SSI, and testing actually hit the problem.)
When an "index-only" scan does need to look at the heap because the
visibility map doesn't indicate that the tuple is visible to all
transactions, the tuple-level predicate lock is acquired.  The best
we can do without visiting the heap is a page-level lock on the heap
page, so that is what the attached patch does.
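The granularity trade-off can be made concrete with a minimal C sketch. The types `PageLockTag` and `TupleLockTag` and the helpers `tuple_tag`, `tuple_tags_match`, and `page_lock_covers` are hypothetical simplifications, not PostgreSQL's actual predicate-lock target tags. They only illustrate why a tuple-level tag needs the xmin, which can be read only from the heap tuple, while a page-level tag does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified predicate-lock tags (not PostgreSQL's layout). */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
typedef uint32_t TransactionId;

typedef struct {
    Oid relation;
    BlockNumber block;          /* heap page; no tuple data needed */
} PageLockTag;

typedef struct {
    Oid relation;
    BlockNumber block;
    OffsetNumber offset;
    TransactionId xmin;         /* known only after reading the heap tuple */
} TupleLockTag;

static TupleLockTag tuple_tag(Oid rel, BlockNumber blk, OffsetNumber off,
                              TransactionId xmin)
{
    assert(off >= 1);           /* heap line pointer offsets are 1-based */
    return (TupleLockTag){rel, blk, off, xmin};
}

/* Tags match only if TID *and* xmin match, so a lock on an old tuple does
 * not silently apply to a new tuple that later reuses the same TID. */
static bool tuple_tags_match(TupleLockTag a, TupleLockTag b)
{
    return a.relation == b.relation && a.block == b.block &&
           a.offset == b.offset && a.xmin == b.xmin;
}

/* A page-level lock covers every tuple on that heap page, whatever its
 * xmin, which is why it can be taken without visiting the heap. */
static bool page_lock_covers(PageLockTag page, TupleLockTag tuple)
{
    return page.relation == tuple.relation && page.block == tuple.block;
}
```

In this model, tags for TID (7,3) with xmin 1000 and xmin 2000 do not match, while a page tag for block 7 covers both. The cost of the coarser lock is that it also covers tuples the scan never read, which can produce more false-positive serialization failures.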

If there are no objections, I will apply to HEAD and 9.2.

-Kevin

Attachment

Re: index-only scans versus serializable transactions

From: Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> By not visiting the heap page for tuples, index-only scans fail to
> acquire all of the necessary predicate locks for correct behavior at
> the serializable transaction isolation level.  The tag for the
> tuple-level predicate locks includes the xmin, to avoid possible
> problems with tid re-use.  (This was not covered in initial
> pre-release versions of SSI, and testing actually hit the problem.) 
> When an "index-only" scan does need to look at the heap because the
> visibility map doesn't indicate that the tuple is visible to all
> transactions, the tuple-level predicate lock is acquired.  The best
> we can do without visiting the heap is a page level lock on the heap
> page, so that is what the attached patch does.

> If there are no objections, I will apply to HEAD and 9.2.

This isn't right in detail: there are paths through the loop where
"tuple" is not NULL at the beginning of the next iteration
(specifically, consider failure of a lossy-qual recheck).  I think
that only results in wasted work, but it's still not operating as
intended.  I'd suggest moving the declaration/initialization of the
"tuple" variable to inside the while loop, since there's no desire
for its value to carry across iterations.
        regards, tom lane
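Tom's suggestion can be illustrated with a small, self-contained C simulation of the scan loop. Everything here (`IndexEntry`, `HeapTupleStub`, `ScanStats`, `scan`) is hypothetical and only mimics the control flow under discussion. The loop fetches from the heap when the visibility-map bit is clear. It uses `continue` when no visible tuple is found or when a lossy-qual recheck fails. It takes a page-level lock only when `tuple` is still NULL. Because `tuple` is declared and initialized inside the loop body, each index entry starts with `tuple == NULL` regardless of how the previous iteration ended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one index entry seen by an index-only scan;
 * not PostgreSQL's executor API. */
typedef struct {
    unsigned block;        /* heap page the index entry points at */
    bool all_visible;      /* visibility-map bit for that heap page */
    bool heap_visible;     /* would a heap fetch find a visible tuple? */
    bool recheck_passes;   /* outcome of rechecking a lossy index qual */
} IndexEntry;

typedef struct {
    unsigned block;
} HeapTupleStub;           /* stands in for a fetched heap tuple */

typedef struct {
    int tuple_locks;       /* tuple-level locks taken by heap fetches */
    int page_locks;        /* page-level locks taken without a heap visit */
    int rows;              /* rows returned to the caller */
} ScanStats;

static ScanStats scan(const IndexEntry *entries, size_t n)
{
    ScanStats st = {0, 0, 0};

    for (size_t i = 0; i < n; i++)
    {
        /* Declared inside the loop, so every entry starts from NULL and
         * no value carries over from the previous iteration. */
        const HeapTupleStub *tuple = NULL;
        HeapTupleStub fetched;

        if (!entries[i].all_visible)
        {
            /* Visibility map can't vouch for the page: visit the heap. */
            if (!entries[i].heap_visible)
                continue;          /* no visible tuple; next index entry */
            fetched.block = entries[i].block;
            tuple = &fetched;
            st.tuple_locks++;      /* the fetch locks the tuple (TID + xmin) */
        }

        if (!entries[i].recheck_passes)
            continue;              /* lossy-qual recheck failed */

        /* Heap not visited: the best we can do is lock the heap page. */
        if (tuple == NULL)
            st.page_locks++;

        /* Any tuple seen here was fetched for this entry, not an earlier one. */
        assert(tuple == NULL || tuple->block == entries[i].block);
        st.rows++;
    }
    return st;
}
```

For example, an entry whose recheck fails right after a heap fetch, followed by an entry on an all-visible page, yields one tuple-level lock for the first and one page-level lock for the second.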