On Tue, Sep 27, 2022 at 4:39 PM Benjamin Coutu <
ben.coutu@zeyos.com> wrote:
>
> Getting back to improvements for small sort runs, it might make sense to consider using in-register based sorting via sorting networks for very small runs.
> It looks like some of the commercial DBMSs do this very successfully.
"Looks like"?
> They use 4 512bit registers (AVX-512) in this example, which could technically store 4 * 4 sort-elements (given that the sorting key is 64 bit and the tuple pointer is 64 bit). I wonder whether this could also be done with just SSE (instead of AVX), which the project now readily supports thanks to your recent efforts?
SortTuples are currently 24 bytes, and supported vector registers are 16 bytes, so not sure how you think that would work.
More broadly, the more invasive a change is, the more risk is involved, and the more effort to test and evaluate. If you're serious about trying to improve insertion sort performance, the simple idea we discussed at the start of the thread is a much more modest step that has a good chance of justifying the time put into it. That is not to say it's easy, however, because testing is a non-trivial amount of work.