Re: Do we need so many hint bits? - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Do we need so many hint bits?
Date
Msg-id CA+U5nM+pfN15VQi2HyobFpqbd5=WsB6HZ7pM9ehyLKrTHRwPbA@mail.gmail.com
In response to Re: Do we need so many hint bits?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Do we need so many hint bits?  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 17 November 2012 21:20, Jeff Davis <pgsql@j-davis.com> wrote:

>> ISTM that we should tune that specifically by performing a VM lookup
>> for next 32 pages (or more), so we reduce the lookups well below 1 per
>> page. That way the overhead of using the VM will be similar to using
>> the PD_ALL_VISIBLE.
>
> That's another potential way to mitigate the effects during a scan, but
> it does add a little complexity. Right now, it share locks a buffer, and
> uses an array with one element for each tuple in the page. If
> PD_ALL_VISIBLE is set, then it marks all of the tuples *currently
> present* on the page as visible in the array, and then releases the
> share lock. Then, when reading the page, if another tuple is added
> (because we released the share lock and only have a pin), it doesn't
> matter because it's already invisible according to the array.
>
> With this approach, we'd need to keep a larger array to represent many
> pages. And it sounds like we'd need to share lock the pages ahead, and
> find out which items are currently present, in order to properly fill in
> the array. Not quite sure what to do there, but would require some more
> thought.

Hmm, that's too much and not really what I was thinking, but I concede
my description was a little vague. No need for bigger arrays etc.

If we check the VM for the next N blocks, then we know that all completed
transactions are committed. Yes, the VM can change, but that is not a
problem.

What I mean is that we keep an array of boolean[N] that simply tracks
what the VM said last time we checked it. If that is true for a block
then we do special processing, similar to the current all-visible path
and yet different, described below.
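
A minimal sketch of the boolean[N] idea, for illustration only. Nothing
here is PostgreSQL code: VmLookahead, its functions, and the plain byte
array standing in for the visibility map are all hypothetical, and N = 32
is taken from the "next 32 pages" suggestion quoted above. The point it
shows is that a sequential scan consults the map once per N blocks rather
than once per block, and only remembers what the map said at that moment.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_LOOKAHEAD 32  /* N: blocks covered by one VM check (assumed) */

/* Hypothetical cache of "what the VM said last time we checked it". */
typedef struct VmLookahead {
    const uint8_t *vm_bits;     /* stand-in for the VM: one bit per heap block */
    uint32_t rel_nblocks;       /* number of heap blocks in the relation */
    uint32_t first_blk;         /* first block covered by the cache */
    uint32_t nvalid;            /* number of valid entries in all_visible[] */
    unsigned refills;           /* how many times the VM was consulted */
    bool all_visible[VM_LOOKAHEAD];
} VmLookahead;

static void vm_lookahead_init(VmLookahead *c, const uint8_t *vm_bits,
                              uint32_t rel_nblocks)
{
    c->vm_bits = vm_bits;
    c->rel_nblocks = rel_nblocks;
    c->first_blk = 0;
    c->nvalid = 0;              /* empty cache: first lookup forces a refill */
    c->refills = 0;
}

/* Read the map for up to VM_LOOKAHEAD blocks starting at 'start'.
 * No pin or lock is modelled: we just read the bits and let go. */
static void vm_lookahead_refill(VmLookahead *c, uint32_t start)
{
    uint32_t n = 0;

    c->first_blk = start;
    while (n < VM_LOOKAHEAD && start < c->rel_nblocks &&
           n < c->rel_nblocks - start) {
        uint32_t blk = start + n;

        c->all_visible[n] = (c->vm_bits[blk / 8] >> (blk % 8)) & 1;
        n++;
    }
    c->nvalid = n;
    c->refills++;
}

/* Return what the VM said about blkno when it was last checked. */
static bool vm_lookahead_all_visible(VmLookahead *c, uint32_t blkno)
{
    if (blkno < c->first_blk || blkno - c->first_blk >= c->nvalid)
        vm_lookahead_refill(c, blkno);
    if (c->nvalid == 0)
        return false;           /* blkno is past the end of the relation */
    return c->all_visible[blkno - c->first_blk];
}
```

A true entry only means the block was all-visible at the time of the
check; per-tuple checks are still needed for such blocks.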

What we want is a HeapTupleVisibility check that does not rely
on tuple hints AND yet avoids all clog access. So when we scan a
buffer in page mode and we know the VM said it was all visible, we
still check each tuple's visibility. If the tuple's xmin is below the
snapshot's xmin then the xid is known committed and the tuple is visible
to this scan (not necessarily all scans). We know this because the VM
said this page was all-visible AFTER our snapshot was taken. If the
tuple's xmin is within the snapshot or greater than snapshot xmax then
the tuple is invisible to our snapshot and we don't need to check clog.
So once we know the VM said the page was all visible we do not need to
check clog to establish visibility, we only need to check the tuple xmin
against our snapshot xmin.
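
The rule above can be sketched roughly as follows. This is an
illustration under stated assumptions, not an existing PostgreSQL
function: Xid, xid_precedes, VisResult and check_visibility_no_clog are
hypothetical names, the wraparound-aware comparison is a simplified
stand-in for the real xid comparison, and tuple deletion (the tuple's
xmax) is ignored because the description only covers the xmin test.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;               /* hypothetical stand-in for a transaction id */
#define FIRST_NORMAL_XID 3u         /* ids below this are treated as permanent */

typedef enum VisResult {
    VIS_VISIBLE,                    /* visible to this snapshot, no clog needed */
    VIS_INVISIBLE,                  /* invisible to this snapshot, no clog needed */
    VIS_NEED_FULL_CHECK             /* VM gave no guarantee: use the normal path */
} VisResult;

/* Wraparound-aware "a is older than b" (simplified, assumed semantics). */
static bool xid_precedes(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;               /* permanent ids sort before normal ones */
    return (int32_t)(a - b) < 0;    /* modulo-2^32 comparison */
}

/* Clog-less check for a tuple on a block the VM reported all-visible
 * after the snapshot was taken. Only the snapshot's xmin is consulted. */
static VisResult check_visibility_no_clog(Xid tuple_xmin, Xid snapshot_xmin,
                                          bool vm_said_all_visible)
{
    if (!vm_said_all_visible)
        return VIS_NEED_FULL_CHECK;

    /* Older than every transaction the snapshot considers in progress:
     * known committed, so visible to this scan. */
    if (xid_precedes(tuple_xmin, snapshot_xmin))
        return VIS_VISIBLE;

    /* Within the snapshot or beyond its xmax: invisible to this scan. */
    return VIS_INVISIBLE;
}
```

Neither branch reads clog or sets a hint bit, so the check does not dirty
the page.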

So the VM can change under us and it doesn't matter. We don't need a
pin or lock on the VM, we just read it and let go. No race conditions,
no fuss.

The difference here is that we still need to check visibility of each
tuple, but that can be a very cheap check and never involves clog, nor
does it dirty the page. Tuple access is reasonably expensive in
comparison with a clog-less check on tuple xmin against snapshot xmin,
so the extra work is negligible.

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


