Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [HACKERS] [PATCH] Incremental sort
Date
Msg-id CAPpHfdvB0pvgvEhaBVv9cZu-5MR_HcM7Gwiri8vH-X7FpiSTHw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH] Incremental sort  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: [HACKERS] [PATCH] Incremental sort  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Apr 27, 2017 at 5:23 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Apr 27, 2017 at 5:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 26, 2017 at 11:39 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> But I'd like to make incremental sort not slower than quicksort in case of
> presorted data.  New idea about it comes to my mind.  Since cause of
> incremental sort slowness in this case is too frequent reset of tuplesort,
> then what if we would artificially put data in larger groups.  Attached
> revision of patch implements this: it doesn't stop to accumulate tuples to
> tuplesort until we have MIN_GROUP_SIZE tuples.
>
> Now, incremental sort is not slower than quicksort.  And this seems to be
> cool.
> However, in the LIMIT case we will pay the price of fetching some extra
> tuples from outer node.  But, that doesn't seem to hurt us too much.
>
> Any thoughts?

Nice idea.

Cool.
Than I'm going to make a set of synthetic performance tests in order to ensure that there is no regression.

Next revision of patch is attached.
This revision contains one important optimization. I found that it's not necessary to make every tuple go through prevTuple slot.  It's enough to save single sample tuple per sort group in order to compare skip columns with it.  This optimization allows to evade regression on large sort groups which I have observed.

I'm also attaching python script (incsort_test.py) which I use for synthetic performance benchmarking.  This script runs benchmarks which are similar to one posted by Heikki, but with some variations.  These benchmarks are aimed to check if there are cases when incremental sort is slower than plain sort.

This script generates tables with structure described in 'tables' array.  For generation of texts, md5 function is used.  For first GroupedCols number of table columns, groups of GroupSize equal values are generated.  Then there are columns which values are just sequential. In the last column have PreorderedFrac fraction of sequential values and rest of values are random. Therefore, we can measure influence of presorted optimization in qsort with various fractions of presorted data.  Also there is btree index which covers all the columns of that table.

The benchmark query select contents of generated table order by grouped columns and by last column.  Index only scan outputs tuples ordered by grouped columns, and incremental sort have to perform sorting inside those groups.  Plain sort case is forced to also use index only scans, in order to compare sort methods not scan methods.

Results are also attached (results.csv).  Last column contains difference between incremental and plain sort time in percents.  Negative value mean that incremental sort is faster in this case.

Incremental sort is faster in vast majority of cases.  It appears to be slower only when whose dataset is one sort group.  In this case incremental sort is useless, and it should be considered as misuse of incremental sort.  Slowdown is related to the fact that we anyway have to do extra comparisons, unless we somehow push our comparison result into qsort itself and save some cpu cycles (but that would be unreasonable break of encapsulation).  Thus, in such cases regression seems to be inevitable anyway.  I think we could evade this regression during query planning.  If we see that there would be only few groups, we should choose plain sort instead of incremental sort.

Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw