Re: Sort functions with specialized comparators - Mailing list pgsql-hackers

From Stepan Neretin
Subject Re: Sort functions with specialized comparators
Date
Msg-id CAN-sa+DxYFpzo99sFiRoBXnA2-oVu=JnK=8MPvV31EsVSPa-1A@mail.gmail.com
Whole thread Raw
In response to Re: Sort functions with specialized comparators  (Stepan Neretin <sncfmgg@gmail.com>)
List pgsql-hackers


On Mon, Jul 15, 2024 at 5:47 PM Stepan Neretin <sncfmgg@gmail.com> wrote:


On Mon, Jul 15, 2024 at 4:52 PM Stepan Neretin <sncfmgg@gmail.com> wrote:


On Mon, Jul 15, 2024 at 12:23 PM Антуан Виолин <violin.antuan@gmail.com> wrote:

Hello all.

I have decided to explore more areas in which I can optimize and have added
two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)


 Hello all
We would like to see the relationship between the length of the sorted array and the performance gain, perhaps some graphs. We also want to see to set a "worst case" test, sorting the array in ascending order when it is initially descending

Best, regards, Antoine Violin

postgres=#


On Mon, Jul 15, 2024 at 10:32 AM Stepan Neretin <sncfmgg@gmail.com> wrote:


On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com> wrote:
Hello all.

I am interested in the proposed patch and would like to propose some additional changes that would complement it. My changes would introduce similar optimizations when working with a list of integers or object identifiers. Additionally, my patch includes an extension for benchmarking, which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
                                                 bench_oid_sort                                                
----------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort: 80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
                                                 bench_int_sort                                                
----------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 118168506 ns, Time taken by list_int_sort: 80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

In a thread about sorting comparators[0] Andres noted that we have infrastructure to help compiler optimize sorting. PFA attached PoC implementation. I've checked that it indeed works on the benchmark from that thread.

postgres=# CREATE TABLE arrays_to_sort AS
   SELECT array_shuffle(a) arr
   FROM
       (SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
       generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid etc. But this sorting routines never show up in perf top or something like that.

Seems like in most cases we do not spend much time in sorting. But specialization does not cost us much too, only some CPU cycles of a compiler. I think we can further improve speedup by converting inline comparator to value extractor: more compilers will see what is actually going on. But I have no proofs for this reasoning.

What do you think?


Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/20240209184014.sobshkcsfjix6u4r%40awork3.anarazel.de#fc23df2cf314bef35095b632380b4a59

Hello all.

I have decided to explore more areas in which I can optimize and have added two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
                                                bench_int16_sort                                                
-----------------------------------------------------------------------------------------------------------------
 Time taken by usual sort: 66354981 ns, Time taken by optimized sort: 52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
                                                bench_float8_sort                                                
------------------------------------------------------------------------------------------------------------------
 Time taken by usual sort: 121475231 ns, Time taken by optimized sort: 74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#
 
Best regards, Stepan Neretin.


I run benchmark with my patches:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.609 ms
initial connection time = 24.080 ms
tps = 6214.244789 (without initial connection time)

and without:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.731 ms
initial connection time = 15.177 ms
tps = 5776.173285 (without initial connection time)

tps with my patches increase. What do you think?

Best regards, Stepan Neretin.

I implement reverse benchmarks:

postgres=# SELECT bench_oid_reverse_sort(1000);
                                          bench_oid_reverse_sort                                          
----------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 182557 ns, Time taken by list_oid_sort: 85864 ns, Percentage difference: 52.97%
(1 row)

Time: 2,291 ms
postgres=# SELECT bench_oid_reverse_sort(100000);
                                           bench_oid_reverse_sort                                            
-------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 9064163 ns, Time taken by list_oid_sort: 4313448 ns, Percentage difference: 52.41%
(1 row)

Time: 17,146 ms
postgres=# SELECT bench_oid_reverse_sort(1000000);
                                            bench_oid_reverse_sort                                            
---------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 61990395 ns, Time taken by list_oid_sort: 23703380 ns, Percentage difference: 61.76%
(1 row)

postgres=# SELECT bench_int_reverse_sort(1000000);
                                            bench_int_reverse_sort                                            
---------------------------------------------------------------------------------------------------------------
 Time taken by list_sort: 50712416 ns, Time taken by list_int_sort: 24120417 ns, Percentage difference: 52.44%
(1 row)

Time: 89,359 ms

postgres=# SELECT bench_float8_reverse_sort(1000000);
                                            bench_float8_reverse_sort                                            
-----------------------------------------------------------------------------------------------------------------
 Time taken by usual sort: 57447775 ns, Time taken by optimized sort: 25214023 ns, Percentage difference: 56.11%
(1 row)

Time: 92,308 ms

Hello again. I want to show you the graphs of when we increase the length vector/array sorting time (ns). What do you think about graphs?

Best regards, Stepan Neretin.

Hello again :) I made a mistake in the benchmarks code. I am attaching new corrected benchmarks for int sorting as example. And my stupid, simple python script for making benchs and draw graphs. What do you think about this graphs?


Best regards, Stepan Neretin.
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: gcc 12.1.0 warning
Next
From: Andres Freund
Date:
Subject: Re: Upgrade Debian CI images to Bookworm