Re: Insertion Sort Improvements - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: Insertion Sort Improvements
Date	August 29, 2022 05:18:05
Msg-id	CAFBsxsFKs46nskQc4dt2XxDhtNA86NR=fOXXeg2Kf=6D_ej0CA@mail.gmail.com
In response to	Re: Insertion Sort Improvements (Benjamin Coutu <ben.coutu@zeyos.com>)
Responses	Re: Insertion Sort Improvements
List	pgsql-hackers

On Fri, Aug 26, 2022 at 9:06 PM Benjamin Coutu <ben.coutu@zeyos.com> wrote:
>
> Another idea could be to run a binary insertion sort and use a much higher threshold. This could significantly cut
> down on comparisons (especially with presorted runs, which are quite common in real workloads).

Comparisons that must go to the full tuple are expensive enough that
this idea might have merit in some cases, but that would be a research
project.
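
For readers unfamiliar with the quoted idea, here is a minimal sketch of a
binary insertion sort over plain ints. It is not PostgreSQL code: the
names binary_insertion_sort_int, cmp_int and n_comparisons are
hypothetical, and the "already in place" fast path is an assumption
added so that presorted input costs one comparison per element rather
than a full binary search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counter so a caller can observe how many comparisons ran. */
static size_t n_comparisons = 0;

/* Stand-in for an expensive full-tuple comparator. */
static int
cmp_int(const int *a, const int *b)
{
	n_comparisons++;
	return (*a > *b) - (*a < *b);
}

/* Sort a[0..n) in place; stable, O(n log n) comparisons, O(n^2) moves. */
static void
binary_insertion_sort_int(int *a, size_t n)
{
	for (size_t i = 1; i < n; i++)
	{
		int			key = a[i];
		size_t		lo = 0;
		size_t		hi = i - 1;

		/* Fast path (assumption): element is already in position. */
		if (cmp_int(&a[i - 1], &key) <= 0)
			continue;

		/*
		 * a[i - 1] > key, so the insertion point lies in [0, i - 1].
		 * Find the first slot whose element is greater than key.
		 */
		while (lo < hi)
		{
			size_t		mid = lo + (hi - lo) / 2;

			if (cmp_int(&a[mid], &key) <= 0)
				lo = mid + 1;
			else
				hi = mid;
		}

		/* Shift a[lo..i-1] one slot right and drop key into the gap. */
		memmove(&a[lo + 1], &a[lo], (i - lo) * sizeof(int));
		a[lo] = key;
	}
}
```

Note that only the comparison count improves; the number of element
moves is the same as in a linear insertion sort.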

> If full binary search turned out to be an issue regarding cache locality, we could do it in smaller chunks,

The main issue with binary search is poor branch prediction. Also, if
large chunks are bad for cache locality, isn't that a strike against
using a "much higher threshold"?
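
To illustrate the branch prediction point: in a textbook binary search
each comparison decides which way to jump, and on unordered input that
outcome is close to a coin flip. One commonly used mitigation is to
write the search so the comparison result feeds an address computation
instead of a jump. The sketch below is only an illustration under that
assumption; upper_bound_branchless is a hypothetical name, and whether
the compiler actually emits a conditional move is not guaranteed.

```c
#include <stddef.h>

/*
 * Return the number of elements in sorted a[0..n) that are <= key,
 * i.e. the index at which key would be inserted to keep the sort stable.
 */
static size_t
upper_bound_branchless(const int *a, size_t n, int key)
{
	const int  *base = a;
	size_t		len = n;

	while (len > 1)
	{
		size_t		half = len / 2;

		/* Select the new base arithmetically rather than with if/else. */
		base += (base[half - 1] <= key) ? half : 0;
		len -= half;
	}

	/* At most one candidate element is left; n == 0 never dereferences. */
	return (size_t) (base - a) + (len == 1 && base[0] <= key);
}
```

The loop trip count depends only on n, so the loop branch itself is
easy to predict; the data-dependent decision is the select inside it.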

> With less comparisons we should start keeping track of swaps and use that as an efficient way to determine
> presortedness. We could change the insertion sort threshold to a swap threshold and do insertion sort until we hit the
> swap threshold. I suspect that would make the current presorted check obsolete (as binary insertion sort without or even
> with a few swaps should be faster than the current presorted-check).

The thread you linked to discusses partial insertion sort as a
replacement for the pre-sorted check, along with benchmark results and
graphs IIRC. I think it's possibly worth doing, but needs more
investigation to make sure the (few) regressions I saw either: 1. were
just noise or 2. can be ameliorated. As I said in the dual pivot
thread, this would be great for dual pivot since we could reuse
partial insertion sort for choosing the pivots, reducing binary space.
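
As a rough sketch of what "partial insertion sort" means here: run an
ordinary insertion sort, but keep a running count of element moves and
give up once a budget is exceeded. This is not the patch from the
linked thread; partial_insertion_sort_int and the move_limit parameter
are hypothetical, and the budget value a caller would pick is an
assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Try to sort a[0..n) with insertion sort. Returns true if the array is
 * fully sorted on exit. Returns false if more than move_limit element
 * moves were needed; the array is then a partially sorted permutation of
 * the input and the caller should fall back to the regular sort.
 */
static bool
partial_insertion_sort_int(int *a, size_t n, size_t move_limit)
{
	size_t		moves = 0;

	for (size_t i = 1; i < n; i++)
	{
		int			key;
		size_t		j = i;

		/* Budget check comes first so a finished array still reports true. */
		if (moves > move_limit)
			return false;

		key = a[i];
		while (j > 0 && a[j - 1] > key)
		{
			a[j] = a[j - 1];
			j--;
		}
		a[j] = key;

		moves += i - j;
	}
	return true;
}
```

On presorted input this costs n - 1 comparisons and no moves, which is
the same work as a separate presorted check, while nearly sorted input
gets fixed up in the same pass.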

--
John Naylor
EDB: http://www.enterprisedb.com
