Re: A qsort template - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: A qsort template
Date
Msg-id CAEudQAozhKrf278ye1B43mkqy02rso6dnCyeQum1uftPRVR2Sw@mail.gmail.com
Whole thread Raw
In response to Re: A qsort template  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: A qsort template  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
Em qui., 29 de jul. de 2021 às 21:34, John Naylor <john.naylor@enterprisedb.com> escreveu:

On Sun, Jul 4, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Since you are experimenting with tuplesort and likely thinking similar
> thoughts, here's a patch I've been using to explore that area.  I've
> seen it get, for example, ~1.18x speedup for simple index builds in
> favourable winds (YMMV, early hacking results only).  Currently, it
> kicks in when the leading column is of type int4, int8, timestamp,
> timestamptz, date or text + friends (when abbreviatable, currently
> that means "C" and ICU collations only), while increasing the
> executable by only 8.5kB (Clang, amd64, -O2, no debug).
>
> These types are handled with just three specialisations.  Their custom
> "fast" comparators all boiled down to comparisons of datum bits,
> varying only in signedness and width, so I tried throwing them away
> and using 3 new common routines.  Then I extended
> tuplesort_sort_memtuples()'s pre-existing specialisation dispatch to
> recognise qualifying users of those and select 3 corresponding sort
> specialisations.

I got around to getting a benchmark together to serve as a starting point. I based it off something I got from the archives, but don't remember where (I seem to remember Tomas Vondra wrote the original, but not sure). To start I just used types that were there already -- int, text, numeric. The latter two won't be helped by this patch, but I wanted to keep something like that so we can see what kind of noise variation there is. I'll probably cut text out in the future and just keep numeric for that purpose.

I've attached both the script and a crude spreadsheet. I'll try to figure out something nicer for future tests, and maybe some graphs. The "comparison" sheet has the results side by side (min of five). There are 6 distributions of values:
- random
- sorted
- "almost sorted"
- reversed
- organ pipe (first half ascending, second half descending)
- rotated (sorted but then put the smallest at the end)
- random 0s/1s

I included both "select a" and "select *" to make sure we have the recent datum sort optimization represented. The results look pretty good for ints -- about the same speed up master gets going from tuple sorts to datum sorts, and those got faster in turn also.

Next I think I'll run microbenchmarks on int64s with the test harness you attached earlier, and experiment with the qsort parameters a bit.

I'm also attaching your tuplesort patch so others can see what exactly I'm comparing.
The patch attached does not apply cleanly,
please can fix it?

error: patch failed: src/backend/utils/sort/tuplesort.c:4776
error: src/backend/utils/sort/tuplesort.c: patch does not apply

regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: pg_upgrade does not upgrade pg_stat_statements properly
Next
From: Zhihong Yu
Date:
Subject: Re: Segment fault when excuting SPI function On PG with commit 41c6a5be