Re: tuple radix sort - Mailing list pgsql-hackers

From Chao Li
Subject Re: tuple radix sort
Date
Msg-id DDA507E1-56DB-4D76-8B81-41660ED932B9@gmail.com
Whole thread Raw
In response to tuple radix sort  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: tuple radix sort
List pgsql-hackers

> On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote:
>
> I suspect the challenge
> will be multikey sorts when the first key has low cardinality.

As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that proves
that:

```
evantest=# drop table if exists test_multi;
evantest=# create unlogged table test_multi (category int, name text);
— first column has only 4 distinct values
evantest=# insert into test_multi select (random() * 4)::int as category, md5(random()::text) || md5(random()::text) as
namefrom generate_series(1, 1000000); 
evantest=# vacuum freeze test_multi;
evantest=# select count(*) from test_multi;
evantest=# set work_mem = '64MB’;

evantest-# \timing on
Timing is on.
evantest=# set wip_radix_sort = 'off';
Time: 0.403 ms
evantest=# \o /dev/null
evantest=# select * from test_multi order by category, name;
Time: 5607.336 ms (00:05.607)
evantest=# select * from test_multi order by category, name;
Time: 5703.555 ms (00:05.704)
evantest=# select * from test_multi order by category, name;
Time: 5692.644 ms (00:05.693)

evantest=# set wip_radix_sort = 'on';
Time: 0.859 ms
evantest=# select * from test_multi order by category, name;
Time: 5822.979 ms (00:05.823)
evantest=# select * from test_multi order by category, name;
Time: 5881.256 ms (00:05.881)
evantest=# select * from test_multi order by category, name;
Time: 5976.351 ms (00:05.976)
```

Roughly 5% slower for this corner case.

However, when I recreate the test table with high cardinality first column, wip_radix_sort seems still slower:

```
evantest=# \o
evantest=# drop table if exists test_multi;
DROP TABLE
evantest=# create unlogged table test_multi (category int, name text);
CREATE TABLE
evantest=# insert into test_multi
evantest-# select (random() * 1000000)::int as category,  md5(random()::text) || md5(random()::text) as name from
generate_series(1,1000000); 
INSERT 0 1000000
evantest=# vacuum freeze test_multi;
VACUUM
evantest=# select count(*) from test_multi;
  count
---------
 1000000
(1 row)

evantest=# select * from test_multi limit 5;
 category |                               name
----------+------------------------------------------------------------------
   607050 | c555126a5afea9f5ffe3880248c89944d211bc378f8c3b6d125b4360fe8619b7
   843579 | 69b5a1dba76f52ff238566a3f88315a7425116d5d271fb54701b6e49d4afd8ce
   106298 | a96e8674db219e12463ecdbb405b99c631767972e489093045c97976c17c6561
   621860 | 5e6739ea9f533f9cdb0b8db76e3d4ce63be6b2b612c8aff06c4b80451f8f2edc
   290110 | 56944320e5abd3a854fffdd185b969727e8d414448d440725a392cda4c6355c4
(5 rows)

evantest=# \timing on
Timing is on.

evantest=# \o /dev/null
evantest=# set wip_radix_sort = 'off';
Time: 0.904 ms
evantest=# select * from test_multi limit 5;
Time: 0.983 ms
evantest=# select * from test_multi order by category, name;
Time: 593.578 ms
evantest=# select * from test_multi order by category, name;
Time: 597.329 ms
evantest=# select * from test_multi order by category, name;
Time: 600.050 ms

evantest=# set wip_radix_sort = 'on';
Time: 0.737 ms
evantest=# select * from test_multi order by category, name;
Time: 611.604 ms
evantest=# select * from test_multi order by category, name;
Time: 613.115 ms
evantest=# select * from test_multi order by category, name;
Time: 615.003 ms
```

This seems like a real regression.

Then I tried to only sort on the first column, yes, now radix is faster:

```
evantest=# set wip_radix_sort = 'off’;
evantest=# select * from test_multi order by category;
Time: 445.498 ms
evantest=# select * from test_multi order by category;
Time: 451.834 ms
evantest=# select * from test_multi order by category;
Time: 454.531 ms

evantest=# set wip_radix_sort = 'on';
Time: 0.329 ms
evantest=# select * from test_multi order by category;
Time: 402.829 ms
evantest=# select * from test_multi order by category;
Time: 408.014 ms
evantest=# select * from test_multi order by category;
Time: 415.340 ms
evantest=# select * from test_multi order by category;
Time: 413.969 ms
```

Hope the test helps. (The test was run a MacBook M4. )

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: MSVC: Improve warning options set
Next
From: John Naylor
Date:
Subject: Re: [PATCH] Refactor bytea_sortsupport(), take two