RE: Proposal for optimizations with simd enabled sort - Mailing list pgsql-hackers

From Giacchino, Luca
Subject RE: Proposal for optimizations with simd enabled sort
Date
Msg-id CO6PR11MB5620FEE1EEF32F5E1242962F9540A@CO6PR11MB5620.namprd11.prod.outlook.com
Whole thread Raw
In response to Proposal for optimizations with simd enabled sort  ("R, Rakshit" <rakshit.r@intel.com>)
List pgsql-hackers
Hi All,

We'd like to share updated performance results for tuple sort after improving the benchmark to better stress sort. We
addedan offset to the query to eliminate the contribution of data movement to the client (referring to
https://www.postgresql.org/message-id/flat/CANWCAZbAmaZ7P%2BARjS97sJLXsBB5CPZyzFgqNDiqe-L%2BBqXzug%40mail.gmail.com).
Withthis change, flame graphs confirm that the sort contribution is much larger than with our previous setup (e.g., for
100krows/100k distinct values using quicksort, tuplesort_performsort is now ~60% of the cycles compared to ~10%
before).We also added data points at 10k and 50k rows. 

Setup:
- Randomly generated integers, with varying row count and cardinality (controlled by number of distinct values)
- ORDER BY query with OFFSET equal to number of rows
- Query executed by pgbench (1 job, 1 client) measuring tps (transactions per second)
- Data collected on AWS m7i.metal-24xl, Ubuntu 24.04, gcc13

The table below reports measured tps gains for SIMD-enabled sort vs quicksort

 10k rows,  10k distinct values:  +92.0%
 10k rows,   1k distinct values:  +61.5%
 10k rows, 100  distinct values:  +31.3%
 10k rows,  10  distinct values:  +10.9%

 50k rows,  50k distinct values: +101.2%
 50k rows,  10k distinct values:  +85.8%
 50k rows, 100  distinct values:  +33.3%
 50k rows,  10  distinct values:  +10.7%

100k rows, 100k distinct values:  +92.0%
100k rows,  10k distinct values:  +67.8%
100k rows, 100  distinct values:  +16.1%
100k rows,  10  distinct values:   -5.8%

  1M rows,   1M distinct values:  +49.1%
  1M rows,  10k distinct values:  +12.1%
  1M rows, 100  distinct values:  -17.5%
  1M rows,  10  distinct values:  -31.2%

In summary, no regressions are observed up to 50k rows. At 100k rows, some regression is observed for the very
low-cardinalitycase. The trend continues to 1M rows. 
It is easy to add dispatching logic at query execution time based on row count (to select SIMD sort or quicksort).
Conditionsbased on cardinality may be more complex to implement. We'd appreciate any recommendations on planning-time
orexecution-time statistics we could use. We propose to start with the row count condition. 
We are developing additional optimizations to improve performance for the 1M+ rows cases. We'll also explore different
datadistributions and share results as they are available. 

Thanks!

Luca Giacchino



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Cross-type index comparison support in contrib/btree_gin
Next
From: Álvaro Herrera
Date:
Subject: Re: Inconsistent LSN format in pg_waldump output