Re: B-Tree index builds, CLUSTER, and sortsupport - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: B-Tree index builds, CLUSTER, and sortsupport
Date
Msg-id CAM3SWZReiNJ1P-GXSLzqnLvxmaoCw48G8wBHc84qWaOe_-_KCQ@mail.gmail.com
Whole thread Raw
In response to Re: B-Tree index builds, CLUSTER, and sortsupport  (Andreas Karlsson <andreas@proxel.se>)
Responses Re: B-Tree index builds, CLUSTER, and sortsupport  (Andreas Karlsson <andreas@proxel.se>)
Re: B-Tree index builds, CLUSTER, and sortsupport  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Thanks for the review.

On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson <andreas@proxel.se> wrote:
> I looked at the changes to the code. The new code is clean and there is more
> code re-use and improved readability. On possible further improvement would
> be to move the preparation of SortSupport to a common function since this is
> done three time in the code.

The idea there is to have more direct control of sortsupport. With the
abbreviated keys patch, abbreviation occurs based on a decision made
by tuplesort.c. I can see why you'd say that, but I prefer to keep
initialization of sortsupport structs largely concentrated in
tuplesort.c, and more or less uniform regardless of the tuple-type
being sorted.

> I did some simple benchmarks by adding indexes to temporary tables and could
> see improvements of around 10% in index build time. So it gives a nice, but
> not amazing, performance improvement.

Cool.

> Is there any case where we should expect any greater performance
> improvement?

The really compelling case is abbreviated keys - as you probably know,
there is a patch that builds on this patch, and the abbreviated keys
patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
That doesn't really have much to do with this patch, though. The
important point is that heap tuple sorting (with a query that has no
client overhead, and involves one big sort) and B-Tree index creation
both have their tuplesort as a totally dominant cost. The improvements
for each case should be quite comparable, which is good (except, as
noted in my opening e-mail, when heap/datum tuplesorting can use the
onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: Repeatable read and serializable transactions see data committed after tx start
Next
From: Robert Haas
Date:
Subject: Re: [BUGS] ltree::text not immutable?