An implementation of multi-key sort - Mailing list pgsql-hackers

From Wang Yao
Subject An implementation of multi-key sort
Date
Msg-id PH7P220MB1533DA211DF219996760CBB7D9EB2@PH7P220MB1533.NAMP220.PROD.OUTLOOK.COM
Whole thread Raw
Responses Re: An implementation of multi-key sort
List pgsql-hackers
Hi hackers,

I'd submit an implementation of multi-key sort for review. Please see the
code as attachment. Thanks for your reponse in advance.


Overview
--------

MKsort (multi-key sort) is an alternative of standard qsort algorithm,
which has better performance for particular sort scenarios, i.e. the data
set has multiple keys to be sorted.

The implementation is based on the paper:
Jon L. Bentley and Robert Sedgewick, "Fast Algorithms for Sorting and
Searching Strings", Jan 1997 [1]

MKsort is applied only for tuple sort by the patch. Theoretically it can
be applied for general-purpose sort scenario when there are multiple sort
keys available, but it is relatively difficult in practice because kind of
unique interface is needed to manipulate the keys. So I limit the usage of
mksort to sort SortTuple.

Comparing to classic quick sort, it can get significant performance
improvement once multiple keys are available. A rough test shows it got
~129% improvement than qsort for ORDER BY on 6 keys, and ~52% for CREATE
INDEX on the same data set. (See more details in section "Performance
Test")

Author: Yao Wang <yaowangm@outlook.com>
Co-author: Hongxu Ma <interma@outlook.com>

Scope
-----

The interface of mksort is pretty simple: in tuplesort_sort_memtuples(),
mksort_tuple() is invoked instead of qsort_tuple() if mksort is applicable.
The major logic in mksort_tuple() is to apply mksort algorithm on
SortTuple, and kind of callback mechanism is used to handle
sort-variant-specific issue, e.g. comparing different datums, like
qsort_tuple() does. It also handles the complexity of "abbreviated keys".

A small difference from classic mksort algorithm is: for IndexTuple, when
all the columns are equal, an additional comparing based on ItemPointer
is performed to determine the order. It is to make the result consistent
to existing qsort.

I did consider about implementing mksort by the approach of kind of
template mechanism like qsort (see sort_template.h), but it seems
unnecessary because all concrete tuple types need to be handled are
derived from SortTuple. Use callback to isolate type specific features
is good enough.

Note that not all tuple types are supported by mksort. Please see the
comments inside tuplesort_sort_memtuples().

Test Cases
----------

The changes of test cases include:

* Generally, mksort should generate result exactly same to qsort. However
some test cases don't. The reason is that SQL doesn't specify order on
all possible columns, e.g. "select c1, c2 from t1 order by c1" will
generate different results between mksort/qsort when c1 values are equal,
and the solution is to order c2 as well ("select c1, c2 from t1 order by
c1, c2"). (e.g. geometry)
* Some cases need to be updated to display the new sort method "multi-key
sort" in explain result. (e.g. incremental_sort)
* regress/tuplesort was updated with new cases to cover some scenarios of
mksort.

Performance Test
----------------

The script I used to configure the build:

CFLAGS="-O3 -fargument-noalias-global -fno-omit-frame-pointer -g"
./configure --prefix=$PGHOME --with-pgport=5432 --with-perl --with-openssl
--with-python --with-pam --with-blocksize=16 --with-wal-blocksize=16
--with-perl --enable-tap-tests --with-gssapi --with-ldap

I used the script for a rough test for ORDER BY:

\timing on
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100));
insert into t1 values (generate_series(1,499999), 0, 0, 0, 0,
  'aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbb');
update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3;
update t1 set c6 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbb'
|| (c1 % 5)::text;

-- Use a large work mem to ensure the entire sort happens in memory
set work_mem='1GB';

-- switch between qsort/mksort
set enable_mk_sort=off;

explain analyze select c1 from t1 order by c6, c5, c4, c3, c2, c1;

Results:

mksort:
1341.283 ms (00:01.341)
1379.098 ms (00:01.379)
1369.868 ms (00:01.370)

qsort:
3137.277 ms (00:03.137)
3147.771 ms (00:03.148)
3131.887 ms (00:03.132)

The perf improvement is ~129%.

Another perf test for CREATE INDEX:

create index idx_t1_mk on t3 (c6, c5, c4, c3, c2, c1);

Results:

mksort:
1147.207 ms (00:01.147)
1200.501 ms (00:01.201)
1235.657 ms (00:01.236)

Qsort:
1852.957 ms (00:01.853)
1824.209 ms (00:01.824)
1808.781 ms (00:01.809)

The perf improvement is ~52%.

Another test is to use one of queries of TPC-H:

set work_mem='1GB';

-- query rewritten from TPCH-Q1, and there are 6001215 rows in lineitem
explain analyze select
 l_returnflag,l_linestatus,l_quantity,l_shipmode
from
 lineitem
where
 l_shipdate <= date'1998-12-01' - interval '65 days'
order by
 l_returnflag,l_linestatus,l_quantity,l_shipmode;

Result:

Qsort:
14582.626 ms
14524.188 ms
14524.111 ms

mksort:
11390.891 ms
11647.065 ms
11546.791 ms

The perf improvement is ~25.8%.

[1] https://www.cs.tufts.edu/~nr/cs257/archive/bob-sedgewick/fast-strings.pdf
[2] https://www.tpc.org/tpch/


Thanks,

Yao Wang
Attachment

pgsql-hackers by date:

Previous
From: Nikita Malakhov
Date:
Subject: Re: Shared detoast Datum proposal
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: zlib detection in Meson on Windows broken?