Re: Sort functions with specialized comparators - Mailing list pgsql-hackers

From Stepan Neretin
Subject Re: Sort functions with specialized comparators
Date
Msg-id CAN-sa+Dug7-WmT+8Gnoesk8j_pHpE_Ebm59zjOhTvQtiKusnyg@mail.gmail.com
Whole thread Raw
In response to Re: Sort functions with specialized comparators  (Stepan Neretin <sncfmgg@gmail.com>)
Responses Re: Sort functions with specialized comparators
List pgsql-hackers


On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com> wrote:
Hello all.

I am interested in the proposed patch and would like to propose some additional changes that would complement it. My changes would introduce similar optimizations when working with a list of integers or object identifiers. Additionally, my patch includes an extension for benchmarking, which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
                                                 bench_oid_sort                                                
----------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort: 80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
                                                 bench_int_sort                                                
----------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 118168506 ns, Time taken by list_int_sort: 80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

In a thread about sorting comparators[0] Andres noted that we have infrastructure to help compiler optimize sorting. PFA attached PoC implementation. I've checked that it indeed works on the benchmark from that thread.

postgres=# CREATE TABLE arrays_to_sort AS
   SELECT array_shuffle(a) arr
   FROM
       (SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
       generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid etc. But this sorting routines never show up in perf top or something like that.

Seems like in most cases we do not spend much time in sorting. But specialization does not cost us much too, only some CPU cycles of a compiler. I think we can further improve speedup by converting inline comparator to value extractor: more compilers will see what is actually going on. But I have no proofs for this reasoning.

What do you think?


Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/20240209184014.sobshkcsfjix6u4r%40awork3.anarazel.de#fc23df2cf314bef35095b632380b4a59

Hello all.

I have decided to explore more areas in which I can optimize and have added two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
                                                bench_int16_sort                                                
-----------------------------------------------------------------------------------------------------------------
 Time taken by usual sort: 66354981 ns, Time taken by optimized sort: 52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
                                                bench_float8_sort                                                
------------------------------------------------------------------------------------------------------------------
 Time taken by usual sort: 121475231 ns, Time taken by optimized sort: 74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#
 
Best regards, Stepan Neretin.
Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Improve the granularity of PQsocketPoll's timeout parameter?
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: relfilenode statistics