Re: A qsort template - Mailing list pgsql-hackers

From John Naylor
Subject Re: A qsort template
Date
Msg-id CAFBsxsGzW5zTEGP6HPPh9vxkxJd_6kYN1-mRi6bKy+nM+oNpdA@mail.gmail.com
Whole thread Raw
In response to Re: A qsort template  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: A qsort template
Re: A qsort template
Re: A qsort template
List pgsql-hackers

On Sun, Jul 4, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Since you are experimenting with tuplesort and likely thinking similar
> thoughts, here's a patch I've been using to explore that area.  I've
> seen it get, for example, ~1.18x speedup for simple index builds in
> favourable winds (YMMV, early hacking results only).  Currently, it
> kicks in when the leading column is of type int4, int8, timestamp,
> timestamptz, date or text + friends (when abbreviatable, currently
> that means "C" and ICU collations only), while increasing the
> executable by only 8.5kB (Clang, amd64, -O2, no debug).
>
> These types are handled with just three specialisations.  Their custom
> "fast" comparators all boiled down to comparisons of datum bits,
> varying only in signedness and width, so I tried throwing them away
> and using 3 new common routines.  Then I extended
> tuplesort_sort_memtuples()'s pre-existing specialisation dispatch to
> recognise qualifying users of those and select 3 corresponding sort
> specialisations.

I got around to getting a benchmark together to serve as a starting point. I based it off something I got from the archives, but don't remember where (I seem to remember Tomas Vondra wrote the original, but not sure). To start I just used types that were there already -- int, text, numeric. The latter two won't be helped by this patch, but I wanted to keep something like that so we can see what kind of noise variation there is. I'll probably cut text out in the future and just keep numeric for that purpose.

I've attached both the script and a crude spreadsheet. I'll try to figure out something nicer for future tests, and maybe some graphs. The "comparison" sheet has the results side by side (min of five). There are 6 distributions of values:
- random
- sorted
- "almost sorted"
- reversed
- organ pipe (first half ascending, second half descending)
- rotated (sorted but then put the smallest at the end)
- random 0s/1s

I included both "select a" and "select *" to make sure we have the recent datum sort optimization represented. The results look pretty good for ints -- about the same speed up master gets going from tuple sorts to datum sorts, and those got faster in turn also.

Next I think I'll run microbenchmarks on int64s with the test harness you attached earlier, and experiment with the qsort parameters a bit.

I'm also attaching your tuplesort patch so others can see what exactly I'm comparing.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: Slightly improve initdb --sync-only option's help message
Next
From: Michael Paquier
Date:
Subject: Re: Replace l337sp34k in comments.