Re: Page at a time index scan - Mailing list pgsql-patches

From Tom Lane
Subject Re: Page at a time index scan
Date
Msg-id 16108.1146665863@sss.pgh.pa.us
Whole thread Raw
In response to Re: Page at a time index scan  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Page at a time index scan  (Simon Riggs <simon@2ndquadrant.com>)
Re: Page at a time index scan  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
Simon Riggs <simon@2ndquadrant.com> writes:
> AFAICS we will need to return to the page for a backward scan, so we
> could just keep the pin the whole way. It's not possible to cache the
> left page pointer because block splits to our immediate left can update
> them even after we read the page contents.

Sure we can cache the left pointer: it's not any different from the
normal walk-left problem, because as soon as you drop pin on the page
you are leaving, all the same hazards apply.  This algorithm just has a
slightly wider window between dropping one pin and acquiring the next.
Walk-left is painful and (rarely) expensive, but it's a solved problem.

> We need never return to a page that *could* be deleted. While scanning
> in either direction, if the complete page contains nothing but dead
> items we can simply move straight onto the next page, having updated the
> page status to half-dead.

This is unnecessary and probably wrong.

> - we have a minimum threshold of deleted tuples. Otherwise we might
> re-dirty the page for just a single hint bit, so we end up writing the
> page out hundreds of times. (Guess: that should be 2 or 3)

You are optimizing the wrong thing here.  If we choose not to mark an
entry dead then we will pay for that omission on every future scan of
the same entry.  I don't think that outweighs the (doubtless rare)
situation where we expend an extra page fetch to reload the page.

It's worth noting that all of this stuff is predicated on the assumption
that index items never move across pre-existing page boundaries, in
either direction.  We are therefore going to be permanently giving up
any prospect of index space reclamation by merging partly-filled pages
(unless maybe in VACUUM FULL).  We didn't know how to do that anyway,
so I don't feel too bad about it, but if indexscans don't make any
attempt to explicitly re-locate their positions then that certainly
goes out the window.

            regards, tom lane

pgsql-patches by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Page at a time index scan
Next
From: Simon Riggs
Date:
Subject: Re: Page at a time index scan