> Note that our current implementation is highly optimized for low-cardinality inputs.
> This is needed for aggregate queries. I found this write-up of a couple scalar and
> vectorized sorts, and they show this library doing very poorly on very-low
> cardinality inputs. I would look into that before trying to get something in shape to
> share.
>
> https://github.com/Voultapher/sort-research-rs/blob/main/writeup/intel_avx512/text.md
That write-up is fairly old, and those perf problems have since been fixed. See
https://github.com/intel/x86-simd-sort/pull/127 and https://github.com/intel/x86-simd-sort/pull/168. I still suggest
measuring perf here for thoroughness.
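
A rough sketch of such a measurement, in case it helps: the harness below times a sort over inputs of varying cardinality. The names make_input, time_sort and run are hypothetical, and the built-in sorted is only a stand-in baseline; the assumption is that whatever binding of the library under test is available gets passed in as sort_fn.

```python
import random
import time


def make_input(n, cardinality, seed=0):
    """Return n integers drawn from at most `cardinality` distinct values."""
    rng = random.Random(seed)
    return [rng.randrange(cardinality) for _ in range(n)]


def time_sort(sort_fn, data, repeats=5):
    """Best-of-`repeats` wall time in seconds for sort_fn on a fresh copy."""
    expected = sorted(data)  # reference result, computed outside the timed region
    best = float("inf")
    for _ in range(repeats):
        copy = list(data)
        start = time.perf_counter()
        result = sort_fn(copy)
        elapsed = time.perf_counter() - start
        assert result == expected, "sort_fn returned an incorrectly sorted result"
        best = min(best, elapsed)
    return best


def run(n=100_000, cardinalities=(2, 16, 256, 100_000), sort_fn=sorted):
    """Map each cardinality to the best observed sort time in seconds."""
    return {k: time_sort(sort_fn, make_input(n, k)) for k in cardinalities}


if __name__ == "__main__":
    # sorted() is a placeholder; swap in the sort implementation being evaluated.
    for k, secs in run().items():
        print(f"cardinality={k:>7}  best={secs * 1e3:.2f} ms")
```

Comparing the low-cardinality rows (2, 16) against the high-cardinality row for each candidate implementation should show whether the regression described in the write-up still reproduces.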
>
> There is also the question of hardware support. It seems AVX-512 is not
> supported well on client side, where most developers work. And availability of
> any flavor is not guaranteed on server either.
> Something to keep in mind.
x86-simd-sort also works on AVX2, which is widely available. I would suggest benchmarking on one of the client laptops to
measure the perf.
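
Before benchmarking on a given machine, it may be worth confirming which instruction sets it reports. The sketch below parses /proc/cpuinfo-style text (Linux only). The function names are hypothetical, and keying the AVX-512 path on the avx512f flag alone is a simplifying assumption; the library's actual requirements per data type may be stricter.

```python
def simd_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 or non-Linux)


def pick_sort_path(flags):
    """Choose the widest SIMD path the flags allow (simplified assumption)."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # file is unavailable outside Linux
    print(pick_sort_path(simd_flags(text)))
```

A client laptop that prints "avx2" here would exercise the AVX2 code path rather than the AVX-512 one, so results from the two kinds of machines should be reported separately.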
Raghuveer