Home > mailing lists

Re: B-Tree index builds, CLUSTER, and sortsupport - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: B-Tree index builds, CLUSTER, and sortsupport
Date	November 6, 2014 04:30:53
Msg-id	CAM3SWZReiNJ1P-GXSLzqnLvxmaoCw48G8wBHc84qWaOe_-_KCQ@mail.gmail.com Whole thread Raw
In response to	Re: B-Tree index builds, CLUSTER, and sortsupport (Andreas Karlsson <andreas@proxel.se>)
Responses	Re: B-Tree index builds, CLUSTER, and sortsupport (Andreas Karlsson <andreas@proxel.se>) Re: B-Tree index builds, CLUSTER, and sortsupport (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Tree view

Thanks for the review.

On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson <andreas@proxel.se> wrote:
> I looked at the changes to the code. The new code is clean and there is more
> code re-use and improved readability. On possible further improvement would
> be to move the preparation of SortSupport to a common function since this is
> done three time in the code.

The idea there is to have more direct control of sortsupport. With the
abbreviated keys patch, abbreviation occurs based on a decision made
by tuplesort.c. I can see why you'd say that, but I prefer to keep
initialization of sortsupport structs largely concentrated in
tuplesort.c, and more or less uniform regardless of the tuple-type
being sorted.

> I did some simple benchmarks by adding indexes to temporary tables and could
> see improvements of around 10% in index build time. So it gives a nice, but
> not amazing, performance improvement.

Cool.

> Is there any case where we should expect any greater performance
> improvement?

The really compelling case is abbreviated keys - as you probably know,
there is a patch that builds on this patch, and the abbreviated keys
patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
That doesn't really have much to do with this patch, though. The
important point is that heap tuple sorting (with a query that has no
client overhead, and involves one big sort) and B-Tree index creation
both have their tuplesort as a totally dominant cost. The improvements
for each case should be quite comparable, which is good (except, as
noted in my opening e-mail, when heap/datum tuplesorting can use the
onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).

-- 
Peter Geoghegan

pgsql-hackers by date:

From: Jim Nasby
Date: 06 November 2014, 04:06:41
Subject: Re: Repeatable read and serializable transactions see data committed after tx start

From: Robert Haas
Date: 06 November 2014, 04:38:42
Subject: Re: [BUGS] ltree::text not immutable?

Re: B-Tree index builds, CLUSTER, and sortsupport - Mailing list pgsql-hackers

Previous

Next