Re: Insertion Sort Improvements - Mailing list pgsql-hackers

From John Naylor
Subject Re: Insertion Sort Improvements
Date
Msg-id CAFBsxsHUwd97Wxx+cmyAFZDj4fZpzJmBSyPkCCJFjd0fk1FvWg@mail.gmail.com
Whole thread Raw
In response to Re: Insertion Sort Improvements  (Benjamin Coutu <ben.coutu@zeyos.com>)
Responses Re: Insertion Sort Improvements
List pgsql-hackers

On Tue, May 23, 2023 at 4:10 PM Benjamin Coutu <ben.coutu@zeyos.com> wrote:
>
> Greetings,
>
> I would like to revisit the discussion and concur with John's perspective that incremental progress through small, successive modifications is the appropriate approach to move forward. Therefore, I would like to propose two distinct ideas aimed at enhancing the functionality of insertion sort:
>
> 1. Implementation of a more efficient variant (as described in the introductory part of this thread):

That's worth trying out. It might also then be worth trying to push both unordered values -- the big one up / the small one down. I've seen other implementations do that, but don't remember where, or what it's called.

> 2. It appears that there is a consensus against disregarding the presorted check, despite its questionable value. In light of this, an alternative suggestion is to integrate the presorted check into the insertion sort path by utilizing an unbounded insertion sort.

"Unbounded" means no bounds check on the array. That's not possible in its current form, so I think you misunderstood something.

> Only after a maximum number of swaps have occurred should we abandon the sorting process. 

I only remember implementations tracking loop iterations, not swaps. You'd need evidence that this is better.

> If insertion sort is executed on the entire array without any swaps, we can simply return. If not, and we continue with quicksort after the swap limit has been reached, we at least have left the array in a more sorted state, which may likely be advantageous for subsequent iterations.

An important part not mentioned yet: This might only be worth doing if the previous recursion level performed no swaps during partitioning and the current pivot candidates are ordered. That's a bit of work and might not be convenient now -- it'd be trivial with dual-pivot, but I've not looked at that in a while. (Fittingly, dual-pivot requires a higher insertion sort threshold so it goes both ways.)

> I would greatly appreciate assistance in benchmarking these proposed changes. Your collaboration in this matter would be invaluable.

I advise looking in the archives for scripts from previous benchmarks. No need to reinvent the wheel -- it's enough work as it is. A key thing here for #1 is that improving insertion sort requires increasing the threshold to show the true improvement. It's currently 7, but should really be something like 10. A script that repeats tests for, say, 7 through 18 should show a concave-up shape in the times. The bottom of the bowl should shift to higher values, and that minimum is what should be compared.

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Docs: Encourage strong server verification with SCRAM
Next
From: Andres Freund
Date:
Subject: Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call