Re: tuple radix sort - Mailing list pgsql-hackers
| From | John Naylor |
|---|---|
| Subject | Re: tuple radix sort |
| Date | |
| Msg-id | CANWCAZYUsV1no_JejqD6na-6adj6e2SD2-aQEWzRgV=LtpsR5g@mail.gmail.com Whole thread Raw |
| In response to | Re: tuple radix sort (Chao Li <li.evan.chao@gmail.com>) |
| Responses |
Re: tuple radix sort
|
| List | pgsql-hackers |
On Wed, Oct 29, 2025 at 3:25 PM Chao Li <li.evan.chao@gmail.com> wrote: > > On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote: > > > > I suspect the challenge > > will be multikey sorts when the first key has low cardinality. > > As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that provesthat: > > ``` > evantest=# drop table if exists test_multi; > evantest=# create unlogged table test_multi (category int, name text); > — first column has only 4 distinct values Thanks for testing. Note it's actually 5 because of rounding. Your text also seems to have em-dashes and unicode apostrophes where it should have dashes / single quotes. That's not great if you expect others to try to reproduce. I'm also not thrilled about having to remove your psql prompt. drop table if exists test_multi; create unlogged table test_multi (category int, name text); insert into test_multi select (random() * 4)::int as category, md5(random()::text) || md5(random()::text) as name from generate_series(1, 1000000); vacuum freeze test_multi; Anyway, because this table is larger than my first example, the input no longer fits into 64MB of work_mem and it switches to an external merge sort. Normally I set work_mem to 1GB for testing sorts so I don't have to think about it, but neglected to in my first email. I don't know if that explains the disparity, but I've made that change for my quick tests below. > evantest=# \o /dev/null > evantest=# select * from test_multi order by category, name; [...] > Roughly 5% slower for this corner case. Seems fine for me on this old Intel laptop (min of 5 runs): set wip_radix_sort = 'off'; 2368.369 set wip_radix_sort = 'on'; 2290.654 It's close enough that I'll want to test more closely at a range of low-cardinality inputs. I haven't done any rigorous scripted testing yet, so take this with a grain of salt. > However, when I recreate the test table with high cardinality first column, wip_radix_sort seems still slower: drop table if exists test_multi; create unlogged table test_multi (category int, name text); insert into test_multi select (random() * 1000000)::int as category, md5(random()::text) || md5(random()::text) as name from generate_series(1, 1000000); vacuum freeze test_multi; > evantest=# set wip_radix_sort = 'off'; > Time: 0.904 ms > evantest=# select * from test_multi order by category, name; > Time: 593.578 ms > evantest=# select * from test_multi order by category, name; > Time: 597.329 ms > evantest=# select * from test_multi order by category, name; > Time: 600.050 ms > > evantest=# set wip_radix_sort = 'on'; > Time: 0.737 ms > evantest=# select * from test_multi order by category, name; > Time: 611.604 ms > evantest=# select * from test_multi order by category, name; > Time: 613.115 ms > evantest=# select * from test_multi order by category, name; > Time: 615.003 ms > ``` > > This seems like a real regression. It's better for me here (min of 5 again), although the time scanning the table probably dominates: off: 536.257 on: 471.345 -- John Naylor Amazon Web Services
pgsql-hackers by date: