Re: Page at a time index scan - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Page at a time index scan
Msg-id 1147100795.3468.316.camel@localhost.localdomain
In response to Re: Page at a time index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Page at a time index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
On Mon, 2006-05-08 at 10:18 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I read your earlier post about needing to lock everything and spent some
> > time thinking about this. The issue of needing to lock everything means
> > that we would never be able to do a partial vacuum of an index i.e.
> > remove one page without a scan. I'm more concerned about making partial
> > vacuum work than I am about speeding up an all-block vacuum.
>
> [ shrug... ] That's an illusory goal anyway.  Retail tuple removal is
> just too inefficient.  (No, I don't believe in that proposed patch.)

Current VACUUM optimizes for the case where updates/deletes are spread
randomly across the table. If the dead tuples are instead concentrated in
hotspots, scanning the whole relation still costs O(N), or even O(N^2) if
we have to scan the indexes multiple times.

We think we have a way to improve heap VACUUMs (bitmaps...) but we are
still looking for an equivalent for indexes.

> > My thinking was to write the blockid of the original left hand page, so
> > as to record the original ancestor since split. Thus if multiple splits
> > occurred, then the original ancestor blockid would remain on record
> > until VACUUM. In more detail: When we split a page if the ancestor
> > blockid is not set, then we set it to be the blockid of the old page
> > (new left hand page). If the ancestor blockid is already set then we
> > copy that to the new right hand page. Every split will write a value to
> > BTPageOpaqueData, though the values to use are already available without
> > extra work.
>
> Doesn't work, at least not for making it possible to vacuum part of the
> index.  The conflicting indexscan could have stopped on a page, and then
> that page could have split, before your "partial vacuum" ever started.
> So tracing back only as far as the data has split since vacuum started
> is not enough to prevent conflict.

That wasn't the proposal. Every split would be marked and stay marked
until those blocks were VACUUMed. The data used to mark is readily
available and doesn't rely on whether or not VACUUM is running.
IMHO this does work.
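
To make the marking rule concrete, here is a minimal sketch in Python rather than the backend's C. It is a toy model, not nbtree code: the `Page` and `Index` classes, the `ancestor` field and the `vacuum_clear` helper are all hypothetical names invented for illustration. It also assumes one reading of the rule, namely that on the first split both halves record the old page's blockid.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    blkno: int
    # Hypothetical field: blockid of the original ancestor since the last
    # VACUUM of this page, or None if the page is unmarked.
    ancestor: Optional[int] = None


class Index:
    """Toy model of an index as a set of numbered pages."""

    def __init__(self) -> None:
        self.pages: Dict[int, Page] = {}
        self.next_blkno = 0

    def new_page(self) -> Page:
        page = Page(self.next_blkno)
        self.pages[page.blkno] = page
        self.next_blkno += 1
        return page

    def split(self, blkno: int) -> Page:
        """Split page blkno; the old page stays as the new left-hand page."""
        left = self.pages[blkno]
        right = self.new_page()
        if left.ancestor is None:
            # First split since VACUUM: the old page is its own ancestor.
            left.ancestor = left.blkno
        # In either case the right-hand page inherits the ancestor.
        right.ancestor = left.ancestor
        return right

    def descendants(self, ancestor: int) -> List[int]:
        """All pages still marked as descending from the given ancestor."""
        return sorted(p.blkno for p in self.pages.values()
                      if p.ancestor == ancestor)

    def vacuum_clear(self, blkno: int) -> None:
        """Model of VACUUM resetting the mark once it has processed a page."""
        self.pages[blkno].ancestor = None


idx = Index()
root = idx.new_page()          # block 0
r1 = idx.split(root.blkno)     # block 1, split from block 0
r2 = idx.split(r1.blkno)       # block 2, split from block 1
print(idx.descendants(0))      # every page that came from block 0
```

The point of the model is that the mark is written at split time from values already in hand, and it survives any number of later splits until the pages are vacuumed. Nothing in `split` asks whether a VACUUM is running.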

> (The other little problem is that we'd have to enlarge the BTOpaque
> overhead, because a block id doesn't fit in the available 16 bits.)

ISTM it is indeed a little problem. After CREATE INDEX, most index pages
don't stay completely full, so worrying about alignment wastage might
very occasionally save an I/O, but not enough for us to care.
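
As a rough check on the size of that overhead, here is a small Python calculation. The field layout is an assumption for illustration (three 32-bit fields plus two 16-bit fields in the opaque area, with one more 32-bit block id added), as are the 8192-byte block size and 8-byte maximum alignment; none of these numbers come from this thread.

```python
import struct

BLCKSZ = 8192          # assumed default block size
MAXIMUM_ALIGNOF = 8    # assumed maximum alignment on the platform


def maxalign(n: int, align: int = MAXIMUM_ALIGNOF) -> int:
    """Round n up to the next multiple of align (align is a power of two)."""
    return (n + align - 1) & ~(align - 1)


# Assumed opaque-area layouts, packed with standard sizes and no padding:
# three uint32 fields and two uint16 fields, then the same plus one uint32.
CURRENT_LAYOUT = "=IIIHH"
WIDENED_LAYOUT = "=IIIHHI"

current = maxalign(struct.calcsize(CURRENT_LAYOUT))
widened = maxalign(struct.calcsize(WIDENED_LAYOUT))
extra = widened - current

print(current, widened, extra, f"{extra / BLCKSZ:.2%}")
```

Under these assumptions the aligned opaque area grows from 16 to 24 bytes, which is about 0.1% of a page.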

> > I'm not very happy about an extra lock during page splitting, which adds
> > a performance hit even for tables that never will need regular vacuuming
> > (apart from occasional wrap-around avoidance).
>
> While I'd rather not have done that, I don't believe that it makes for
> any material performance degradation.  Normal splits all take the lock
> in shared mode and hence suffer no contention.  Your proposal wouldn't
> make for less locking anyway, since it still assumes that there's a way
> to tell whether vacuum is active for a given index, which is just about
> the same amount of overhead as the code-as-committed.

The proposed scheme doesn't rely on knowing whether a VACUUM is active
or not.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

