> On 26/01/23 07:40, David Rowley wrote:
> I just put together a benchmark script to try to help paint a picture
> of under what circumstances reducing the number of sorts slows down
> performance. work_mem was 4GB for each, so the sorts never went to disk.
I tried same test suit albeit work_mem = 1 MB to check how sort behaves with frequent
disk IO.
For 1 million row test, patch version shows some promise but performance degrades in 10 million row test.
There is also an anomalies in middle range for 100/1000 random value.
I have also added results of benchmark with sorting fix enabled and table with 3 columns
and sorting done on > 2 columns. I don't see much improvement vs master here.
Again, these results are for work_mem = 1 MB.
> Sorting in smaller batches that better fit into CPU cache.
More reading yielded that we are looking for cache-oblivious
sorting algorithm.
One the papers[1] mentions that in quick sort, whenever we reach size which can fit in cache,
instead of partitioning it further, we can do insertion sort there itself.
> Memory-tuned quicksort uses insertion sort to sort small subsets while they are in
> the cache, instead of partitioning further. This increases the instruction count,
> but reduces the cache misses
If this looks step in right direction, I can give it a try and do more reading and experimentation.
[1] http://www.ittc.ku.edu/~jsv/Papers/WAC01.sorting_using_registers.pdf
Thanks,
Ankit