Re: Performance TODO items - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance TODO items
Msg-id 22293.996530010@sss.pgh.pa.us
In response to Performance TODO items  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Performance TODO items  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have thought of a few new TODO performance items:
> 1)  Someone at O'Reilly suggested that we order our duplicate index
> entries by tid so if we are hitting the heap for lots of duplicates, the
> hits will be on sequential pages.  Seems like a nice idea.

A more general solution is for indexscan to collect up a bunch of TIDs
from the index, sort them in-memory by TID order, and then probe into
the heap with those TIDs.  This is better than ordering duplicate index
entries by TID because you get nice ordering of the heap accesses across
multiple key values, not just among the tuples with the same key.  (In a
unique or near-unique index, ordering duplicates by TID is nearly
worthless.)
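
To illustrate the effect of sorting by TID across key values, here is a
minimal Python sketch (not PostgreSQL code).  It assumes a TID can be
modeled as a (block, offset) pair, and the helper name
count_block_visits and the sample data are invented for illustration.
It counts how often consecutive heap probes land on a different block.

```python
from typing import List, Tuple

# Hypothetical model of a TID: (heap block number, line offset in block).
TID = Tuple[int, int]

def count_block_visits(tids: List[TID]) -> int:
    """Count how often a probe lands on a different block than the previous
    probe did (the very first probe counts as one visit)."""
    visits = 0
    prev_block = None
    for block, _offset in tids:
        if block != prev_block:
            visits += 1
            prev_block = block
    return visits

# TIDs as an index scan might return them: ordered by key, so the heap
# blocks appear in no particular order (made-up sample data).
index_order: List[TID] = [(7, 1), (2, 3), (7, 2), (2, 1), (5, 4), (2, 2)]

# Sorting the tuples orders them by block first, then by offset.
tid_order = sorted(index_order)

print(count_block_visits(index_order))
print(count_block_visits(tid_order))
```

With this sample, probing in index order moves to a different block on
every probe, while probing in TID order touches each block only once.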

In the best case there are few enough TIDs retrieved from the index that
you can just do this once, but even if there are lots of TIDs, it should
be a win to do this in batches of a few thousand TIDs.  Essentially we
decouple indexscans into separate index-access and heap-access phases.
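
The batching idea could be sketched roughly as follows, again in Python
rather than backend C.  Everything named here is an assumption made for
illustration: batched_heap_scan, the fetch_heap_tuple callback, and the
default batch size of 4096 (standing in for "a few thousand") do not
correspond to actual PostgreSQL functions or settings.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical model of a TID: (heap block number, line offset in block).
TID = Tuple[int, int]

def batched_heap_scan(
    index_tids: Iterable[TID],
    fetch_heap_tuple: Callable[[TID], object],
    batch_size: int = 4096,  # assumed value, not a real setting
) -> Iterator[object]:
    """Index-access phase: pull up to batch_size TIDs from the index.
    Heap-access phase: sort that batch by TID, then probe the heap."""
    it = iter(index_tids)
    while True:
        batch = list(islice(it, batch_size))  # collect one batch of TIDs
        if not batch:
            return                            # index is exhausted
        batch.sort()                          # order by (block, offset)
        for tid in batch:
            yield fetch_heap_tuple(tid)       # heap probes in TID order
```

If the index returns fewer TIDs than batch_size, the loop body runs
once, which is the best case described above.  Note that in this sketch
results come out in TID order within each batch, not in index key order.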

One big problem is that this doesn't interact well with concurrent VACUUM:
our present solution for concurrent VACUUM assumes that indexscans hold
a pin on an index page until they've finished fetching the pointed-to
heap tuples.  Another objection is that we'd have a harder time
implementing the TODO item of marking an indextuple dead when its
associated heaptuple is dead.  Anyone see a way around these problems?
        regards, tom lane

