Re: tuple radix sort - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: tuple radix sort |
| Date | |
| Msg-id | 72B9451B-0121-491E-A702-BE8AAEFDB80E@gmail.com Whole thread Raw |
| In response to | Re: tuple radix sort (John Naylor <johncnaylorls@gmail.com>) |
| List | pgsql-hackers |
> On Oct 29, 2025, at 19:29, John Naylor <johncnaylorls@gmail.com> wrote: > > On Wed, Oct 29, 2025 at 3:25 PM Chao Li <li.evan.chao@gmail.com> wrote: >>> On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls@gmail.com> wrote: >>> >>> I suspect the challenge >>> will be multikey sorts when the first key has low cardinality. >> >> As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that provesthat: >> >> ``` >> evantest=# drop table if exists test_multi; >> evantest=# create unlogged table test_multi (category int, name text); >> — first column has only 4 distinct values > > Thanks for testing. Note it's actually 5 because of rounding. Yes, 0-4, totally 5. > Your > text also seems to have em-dashes and unicode apostrophes where it > should have dashes / single quotes. That's not great if you expect > others to try to reproduce. I just copied the content from psql (running in iTerm). I did a Google search, and found that was because of Mac Mail’s “smartquotes” substitution. Looks like even I manually type in a pair of single quotes, it still does the substitution. Iwill try to see how to disable that, but I don’t want to switch to another mail app. > I'm also not thrilled about having to > remove your psql prompt. > I just wanted to show my entire test process, so I simply copied all contents from psql. In future, I will remove psql promptsfrom reproduce procedure. > drop table if exists test_multi; > create unlogged table test_multi (category int, name text); > insert into test_multi select (random() * 4)::int as category, > md5(random()::text) || md5(random()::text) as name from > generate_series(1, 1000000); > vacuum freeze test_multi; > > Anyway, because this table is larger than my first example, the input > no longer fits into 64MB of work_mem and it switches to an external > merge sort. Normally I set work_mem to 1GB for testing sorts so I > don't have to think about it, but neglected to in my first email. I changed work_men to 1GB and reran the test. As the high cardinality data are still there, so I first reran with data: ``` evantest=# set work_mem = '1GB'; Time: 0.301 ms evantest=# evantest=# select * from test_multi order by category, name; Time: 575.247 ms evantest=# select * from test_multi order by category, name; Time: 554.351 ms evantest=# select * from test_multi order by category, name; Time: 565.100 ms evantest=# evantest=# set wip_radix_sort = 'on'; Time: 0.752 ms evantest=# select * from test_multi order by category, name; Time: 558.057 ms evantest=# select * from test_multi order by category, name; Time: 565.542 ms evantest=# select * from test_multi order by category, name; Time: 559.973 ms ``` With radix_sort on and off, execution time are almost the same. Then I restore the data to low cardinality, off is still faster than on: ``` evantest=# set wip_radix_sort = ‘off'; Time: 0.549 ms evantest=# select * from test_multi order by category, name; Time: 5509.075 ms (00:05.509) evantest=# select * from test_multi order by category, name; Time: 5553.566 ms (00:05.554) evantest=# select * from test_multi order by category, name; Time: 5598.595 ms (00:05.599) evantest=# set wip_radix_sort = ‘on'; Time: 0.786 ms evantest=# evantest=# select * from test_multi order by category, name; Time: 5770.964 ms (00:05.771) evantest=# select * from test_multi order by category, name; Time: 5779.755 ms (00:05.780) evantest=# select * from test_multi order by category, name; Time: 5851.134 ms (00:05.851) evantest=# evantest=# set work_mem = '2GB’; # increasing work_mem to 2GB doesn’t help Time: 0.404 ms evantest=# evantest=# select * from test_multi order by category, name; Time: 5781.005 ms (00:05.781) evantest=# select * from test_multi order by category, name; Time: 5826.025 ms (00:05.826) evantest=# select * from test_multi order by category, name; Time: 5937.919 ms (00:05.938) ``` Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
pgsql-hackers by date: