Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZQHJS=Bg9esOZo7V3nXTjEiAMyjVmjJyk-feLz13m6b8g@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Sun, Dec 6, 2015 at 3:59 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> My column has the format of ABC-123-456-789-0
>
> The name-space identifier ("ABC-")  is the same in 99.99% of the
> cases.  And to date, as well as in my extrapolation, the first two
> digits of the numeric part are leading zeros and the third one is
> mostly 0,1,2.  So the first 8 bytes really have less than 2 bits worth
> of information.  So yeah, not surprising abbreviation was not useful.

I think that given you're using the "C" collation, abbreviation should
still go ahead. I posted a patch to do that, which I need to further
justify per Robert's request (currently, we do nothing special based
on collation). Abbreviation should help in surprisingly marginal
cases, since far fewer memory accesses will be required in the early
stages of the sort with only (say) 5 distinct abbreviated keys. Once
abbreviated comparisons start to not help at all (with quicksort, at
some partition), there's a good chance that the full keys can be
reused to some extent, before being evicted from CPU caches.

>> BTW, roughly what does this CREATE INDEX look like? Is it a composite
>> index, for example?
>
> Nope, just a single column index.  In the extrapolated data set, each
> distinct value shows up a couple hundred times on average.  I'm
> thinking of converting it to a btree_gin index once I've tested them a
> bit more, as the compression benefits are substantial.

Unfortunately, that cannot use tuplesort.c at all.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Using quicksort for every external sort run
Next
From: Peter Geoghegan
Date:
Subject: Re: Using quicksort for every external sort run