Thread: SIMD optimization for list_sort
Hi All,
We propose enabling SIMD-based sort for list_sort using the x86-simd-sort library (https://github.com/intel/x86-simd-sort).
The existing list_sort takes a comparator function to compare pairs of ListCell. On the other hand, x86-simd-sort requires an array of numeric values to sort, and it returns an array of sorted indices. To enable x86-simd-sort, we add new list_sort_simd functions that take an extractor function. The function extracts a value (float or uint32) from a ListCell. Those values are then collected into an array for x86-simd-sort to work on. A comparator function can still be passed to be used as tie-breaker.
typedef float (*list_sort_extractor_float)(const ListCell *a);
typedef uint32 (*list_sort_extractor_uint32)(const ListCell *a);
void list_sort_simd_float(List *list, list_sort_extractor_float extract, list_sort_comparator cmp);
void list_sort_simd_uint32(List *list, list_sort_extractor_uint32 extract, list_sort_comparator cmp);
These functions will exist alongside the current list_sort. Existing list_sort use cases in Postgres or extensions will not be affected by default, and they can be converted to list_sort_simd functions where it makes sense in terms of performance.
This feature introduces the x86-simd-sort library as a dependency. We plan to make the dependency optional, using a configuration option, such as --with-x86-simd-sort. If building without this option, list_sort_simd functions could fall back to the existing list_sort (e.g., by creating a comparator function that combines the extracted value and the tie-breaker comparator function). Alternatively (or in addition), we could add a function to report whether x86-simd-sort is available.
We identified a first use case for list_sort_simd_float in pgvector. As part of HNSW index construction, pgvector uses list_sort to sort candidate vectors by distance. Using list_sort_simd_float, we observed reduction in index build time in some scenarios. For example, we observed 7% reduction in index build time with the gist-960 dataset and 10% with the dbpedia-openai-1000k dataset (ANN-Benchmarks, HNSW index with halfvec, m=80). We are also looking into microbenchmarks to measure list_sort performance independently.
We’d appreciate feedback on this approach. In the meantime, we will complete the patch to share. We also plan to extend SIMD-based sort to tuple sort in the future.
Thanks!
Luca Giacchino
On 22/11/2024 01:27, Giacchino, Luca wrote: > The existing list_sort takes a comparator function to compare pairs of > ListCell. On the other hand, x86-simd-sort requires an array of numeric > values to sort, and it returns an array of sorted indices. To enable > x86-simd-sort, we add new list_sort_simd functions that take an > extractor function. The function extracts a value (float or uint32) from > a ListCell. Those values are then collected into an array for x86-simd- > sort to work on. A comparator function can still be passed to be used as > tie-breaker. > > typedef float (*list_sort_extractor_float)(const ListCell *a); > > typedef uint32 (*list_sort_extractor_uint32)(const ListCell *a); > > void list_sort_simd_float(List *list, list_sort_extractor_float extract, > list_sort_comparator cmp); > > void list_sort_simd_uint32(List *list, list_sort_extractor_uint32 > extract, list_sort_comparator cmp); > > These functions will exist alongside the current list_sort. Existing > list_sort use cases in Postgres or extensions will not be affected by > default, and they can be converted to list_sort_simd functions where it > makes sense in terms of performance. I'd suggest targeting pg_qsort() directly, instead of list_sort(). list_sort() is not used in very performance critical parts. > We identified a first use case for list_sort_simd_float in pgvector. As > part of HNSW index construction, pgvector uses list_sort to sort > candidate vectors by distance. Using list_sort_simd_float, we observed > reduction in index build time in some scenarios. For example, we > observed 7% reduction in index build time with the gist-960 dataset and > 10% with the dbpedia-openai-1000k dataset (ANN-Benchmarks, HNSW index > with halfvec, m=80). We are also looking into microbenchmarks to measure > list_sort performance independently. That's interesting. I'd suggest proposing this to the pgvector project directly, since pgvector would immediately benefit. > We’d appreciate feedback on this approach. In the meantime, we will > complete the patch to share. We also plan to extend SIMD-based sort to > tuple sort in the future. If you could use this to speed up tuple sorting, that would be much more interesting for PostgreSQL itself. -- Heikki Linnakangas Neon (https://neon.tech)
On Fri, Nov 22, 2024 at 06:01:22PM +0200, Heikki Linnakangas wrote: > On 22/11/2024 01:27, Giacchino, Luca wrote: >> We´d appreciate feedback on this approach. In the meantime, we will >> complete the patch to share. We also plan to extend SIMD-based sort to >> tuple sort in the future. > > If you could use this to speed up tuple sorting, that would be much more > interesting for PostgreSQL itself. +1 -- nathan
On Fri, Nov 22, 2024 at 6:27 AM Giacchino, Luca <luca.giacchino@intel.com> wrote: > We’d appreciate feedback on this approach. In the meantime, we will complete the patch to share. We also plan to extendSIMD-based sort to tuple sort in the future. Coincidentally, I'll be prototyping a tuple sort that will take better advantage of keys that are integers (whether authoritative or abbreviated), using scalar, portable branch-free techniques. This will require some re-architecting and I believe that will also make it easier for you to experiment with sorting networks there. -- John Naylor Amazon Web Services
On 22/11/2024 06:27, Giacchino, Luca wrote: > We’d appreciate feedback on this approach. In the meantime, we will > complete the patch to share. We also plan to extend SIMD-based sort to > tuple sort in the future. Nice! I continually see performance reports when sorting and group order impact performance. Because of that, efforts have been made to optimise sorts with recombination of clause order [1,2]. It would be nice to review your Sort operator optimisation. I hope, it might improve performance of the cases provided in these threads. [1] POC: GROUP BY optimization https://www.postgresql.org/message-id/flat/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru [2] Consider the number of columns in the sort cost model https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com -- regards, Andrei Lepikhov