Re: pgstattuple extension for indexes - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: pgstattuple extension for indexes
Date
Msg-id 20060817195419.GD21363@pervasive.com
Whole thread Raw
In response to Re: pgstattuple extension for indexes  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: pgstattuple extension for indexes
List pgsql-hackers
On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout wrote:
> On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
> > I think this condition should be regarded as full-fragmented,
> > but pgstatindex reports that the leaf_fragmentation is 50%.
> > 
> > Presently, fragmentation factor is computed as the code below:
> > 
> >     if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> >         stat->fragments++;
> > 
> > But the method has the above problem. So I suggest to use whether
> > the right link points to the next adjacent page or not.
> > 
> >     if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> >         stat->fragments++;
> > 
> > Do you think which method is better? Or do you have other ideas?
> 
> If we do it your way, then every index will probably get a
> fragmentation of nearly 100%. That's not very useful. I don't agree
> that your example is fully fragmented, since on the first read the OS
> will read the next four (or more) blocks so all the others are
> zero-cost.

Ok, fine... expand the example out to an index that's not trivial in
size. Even with read-ahead, once you get to a few megs (which is
obviously not that big), you're seeking.

> A more useful definition of fragmentation would be: if you're scanning
> forward through an index, how often do you have to jump backwards
> again. The current calculation seems to get that fairly right...

Well, what really matters is how often you'll need to seek (either
intra-track or inter-track). Any time you need to seek you're hit with a
pretty serious penalty. And until we have an asyncronous prefetch
process, a forward seek will most likely be just as expensive as a
backwards seek, because by the time the data winds it's way from the
drive back to PostgreSQL and the next read request winds it's way back
down to the drive the data you wanted has probably flown past the head.
Granted, I'm ignoring OS read-ahead here, but in a heavily fragmented
index that you're actually reading off disk (ie: it's not trivially
small), that read-ahead isn't likely to help you too much.

Given all that, I'd argue that it's best to consider any page that isn't
in-order as another fragment.

On another note, now that scans happen at a per-page level, does that
make some kind of online index clustering command a possibility? Another
thought that comes to mind is putting enough brains in the indexes or
the FSM to request free pages that are in a specific region of the index
file. That would allow things to stay less fragmented. Of course a
similar method could be used to try and maintain a table heap in cluster
order, and I suspect that method would probably be a lot easier to
impliment.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: BugTracker (Was: Re: 8.2 features status)
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Going for "all green" buildfarm results