Home > mailing lists

Re: A qsort template - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: A qsort template
Date	July 30, 2021 00:34:01
Msg-id	CAFBsxsGzW5zTEGP6HPPh9vxkxJd_6kYN1-mRi6bKy+nM+oNpdA@mail.gmail.com Whole thread
In response to	Re: A qsort template (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: A qsort template Re: A qsort template Re: A qsort template
List	pgsql-hackers

Tree view

On Sun, Jul 4, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Since you are experimenting with tuplesort and likely thinking similar
> thoughts, here's a patch I've been using to explore that area. I've
> seen it get, for example, ~1.18x speedup for simple index builds in
> favourable winds (YMMV, early hacking results only). Currently, it
> kicks in when the leading column is of type int4, int8, timestamp,
> timestamptz, date or text + friends (when abbreviatable, currently
> that means "C" and ICU collations only), while increasing the
> executable by only 8.5kB (Clang, amd64, -O2, no debug).
>
> These types are handled with just three specialisations. Their custom
> "fast" comparators all boiled down to comparisons of datum bits,
> varying only in signedness and width, so I tried throwing them away
> and using 3 new common routines. Then I extended
> tuplesort_sort_memtuples()'s pre-existing specialisation dispatch to
> recognise qualifying users of those and select 3 corresponding sort
> specialisations.

I got around to getting a benchmark together to serve as a starting point. I based it off something I got from the archives, but don't remember where (I seem to remember Tomas Vondra wrote the original, but not sure). To start I just used types that were there already -- int, text, numeric. The latter two won't be helped by this patch, but I wanted to keep something like that so we can see what kind of noise variation there is. I'll probably cut text out in the future and just keep numeric for that purpose.

I've attached both the script and a crude spreadsheet. I'll try to figure out something nicer for future tests, and maybe some graphs. The "comparison" sheet has the results side by side (min of five). There are 6 distributions of values:

- random

- sorted

- "almost sorted"

- reversed

- organ pipe (first half ascending, second half descending)

- rotated (sorted but then put the smallest at the end)

- random 0s/1s

I included both "select a" and "select *" to make sure we have the recent datum sort optimization represented. The results look pretty good for ints -- about the same speed up master gets going from tuple sorts to datum sorts, and those got faster in turn also.

Next I think I'll run microbenchmarks on int64s with the test harness you attached earlier, and experiment with the qsort parameters a bit.

I'm also attaching your tuplesort patch so others can see what exactly I'm comparing.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

From: "Bossart, Nathan"
Date: 29 July 2021, 23:37:56
Subject: Re: Slightly improve initdb --sync-only option's help message

From: Michael Paquier
Date: 30 July 2021, 00:48:47
Subject: Re: Replace l337sp34k in comments.

Re: A qsort template - Mailing list pgsql-hackers

Attachment

Previous

Next