Thread: some aspects of our qsort might not be ideal
While reviewing Thomas Munro's patchset to consider expanding the uses of specialized qsort [1], I wondered about some aspects of the current qsort implementation. For background, ours is based on Bentley & McIlroy's "Engineering a Sort Function" [2], which is a classic paper worth studying.

1) The check for pre-sorted input at the start of each and every recursion

This has been discussed before [3]. It's great if the input is sorted, and mostly harmless if the current partition is badly sorted enough that the check fails quickly, but it's not hard to imagine that this is mostly a waste of cycles. There have been proposals to base the pre-sorted check on input size [4] or to do it only once at the beginning, but that strikes me as looking for the keys under the lamppost because that's where the light is.

The logical criterion for proceeding to check whether the input is sorted is whether we *think* the input could be sorted. That may sound like a tautology, but consider the case where the partitioning phase didn't perform any swaps. Then it makes sense to check, but we can go further. What if we do the check, but towards the end that check fails? If just a couple of elements are out of place, does it really make sense to give up, partition, and recurse? If just a few iterations of insertion sort are all that is needed to finish, why not just do that? This way, we *dynamically* decide to optimistically start an insertion sort, in the hope that it will occasionally prevent a recursion, and in the worst case the input is slightly more sorted for the next recursion. All we need is a lightweight way to detect that the partitioning phase didn't do any swaps. More on this later.

2) Cardinality issues can cancel abbreviated keys, but our qsort is not optimized for that

Since comparison is very expensive in this case, it stands to reason that qsort could profitably be optimized for a low number of unique keys. The Bentley & McIlroy scheme does take great pains to prevent quadratic behavior from lots of duplicates, but I've found something that might have stronger guarantees: pattern-defeating quicksort (paper: [5], C++ implementation: [6]) claims worst-case runtime of O(nk) for inputs with k distinct elements. This is achieved via an asymmetric partitioning scheme. It's not complex, but it is subtle, so I won't try to describe it here; I recommend reading section 3 of the paper for details. Rust and Boost use PDQuicksort in some form, so it has some production use.

The partitioning scheme just mentioned also provides an easy way to detect when no swaps have been done, thus solving #1 above. There is one requirement that is a double-edged sword: recursion to the partitions must happen in order. This has an additional advantage that in every case but one, insertion sort doesn't need to guard against running off the beginning of the array. The downside for us is that we currently recurse to a partition based on its size as a stack-guarding measure, so that guard would have to be replaced. The C++ implementation of PDQuicksort falls back to heap sort as a last resort to bound runtime, and that would be just as effective at guarding the stack. That's a bit of a snag for making it production-ready, but not enough to prevent testing the actual qsort part.

Does anyone see a reason not to put in the necessary work to try it out?
[1] https://www.postgresql.org/message-id/CA%2BhUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg%40mail.gmail.com
[2] https://cs.fit.edu/~pkc/classes/writing/samples/bentley93engineering.pdf
[3] https://www.postgresql.org/message-id/flat/87F42982BF2B434F831FCEF4C45FC33E5BD369DF%40EXCHANGE.corporate.connx.com#e69718293c45d89555020bd0922ad055
[4] https://www.postgresql.org/message-id/CA%2BTgmobKFcb6ajVEN8eSnBa78sB3FSwuEWTHXd2x9JC5DOh5OA%40mail.gmail.com
[5] https://arxiv.org/pdf/2106.05123.pdf
[6] https://github.com/orlp/pdqsort

--
John Naylor
EDB: http://www.enterprisedb.com
> [3] https://www.postgresql.org/message-id/flat/87F42982BF2B434F831FCEF4C45FC33E5BD369DF%40EXCHANGE.corporate.connx.com#e69718293c45d89555020bd0922ad055

The top of that thread is where I meant to point to:

https://www.postgresql.org/message-id/flat/CAEYLb_Xn4-6f1ofsf2qduf24dDCVHbQidt7JPpdL_RiT1zBJ6A%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com
> This way, we *dynamically*
> decide to optimistically start an insertion sort, in the hopes that it
> will occasionally prevent a recursion, and worst case the input is
> slightly more sorted for the next recursion.

I should mention one detail -- we put a limit on how many iterations we attempt before we give up and partition/recurse. The author's implementation chooses 8 for this limit. The paper mentions this technique in section 5.2, but is not the origin of it.

--
John Naylor
EDB: http://www.enterprisedb.com
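To make that concrete, here is a minimal sketch of the idea on plain ints (hypothetical names, not the eventual patch; the limit of 8 is the one mentioned above):

#include <stdbool.h>
#include <stddef.h>

#define PARTIAL_SORT_LIMIT 8

/*
 * Insertion sort that gives up once more than a handful of elements have
 * turned out to be out of place.  Returns true if the range ended up fully
 * sorted; on false, the caller partitions and recurses as usual, with the
 * input now slightly more sorted than before.
 */
static bool
partial_insertion_sort(int *a, size_t n)
{
    size_t  out_of_place = 0;

    for (size_t i = 1; i < n; i++)
    {
        int     val = a[i];
        size_t  j = i;

        /* sift a[i] down to its place within the sorted prefix a[0..i-1] */
        while (j > 0 && val < a[j - 1])
        {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = val;

        /* give up if too many elements have needed moving */
        if (j != i && ++out_of_place > PARTIAL_SORT_LIMIT)
            return false;
    }
    return true;
}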
On Wed, Feb 16, 2022 at 2:29 AM John Naylor <john.naylor@enterprisedb.com> wrote:
> Does anyone see a reason not to put in the necessary work to try it out?

Seems reasonable to me. It's always a bit difficult, I feel, to know what test cases to use - almost any idea is going to have some case where it's worse than what we do today, and there can always be some user who does that exact thing 100% of the time. Moreover, it's hard to be certain that test cases we construct - say, ordered data, reverse ordered data, randomly ordered data, almost ordered data with a single element out of place, etc. - are actually covering all of the interesting cases. At the same time, I don't think anyone would seriously disagree with what you say in the subject line, and we won't make any progress by NOT trying things that are recommended in the academic literature.

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,

Attached is a draft series that implements some but not all features of pattern-defeating quicksort, namely the ones I thought were interesting for us. Recently this quicksort variant got committed for the next release of the Go language 1.19 [1] (which in turn was based on that of Rust [2]), and that implementation was a helpful additional example. The bottom line is that replacing the partitioning scheme this way is likely not worth doing because of our particular use cases, but along the way I found some other things that might be worth doing, so some good may come out of this.

0001 is a test module for running a microbenchmark, based on earlier work by Thomas Munro.

0002 is just preliminary refactoring. (From here on some regression tests change from patch to patch, since tests that rely on a sort order might have duplicates in the sort key.)

0003 starts a "partial insertion sort" if partitioning didn't do any swaps, and bails if it doesn't finish quickly. A footnote to what I said earlier:

> All we need is a
> lightweight way to detect that the partitioning phase didn't do any
> swaps.

Actually, Postgres did detect this until a3f0b3d68 [3], but wasn't very bright about it -- it attempted a complete insertion sort. Limiting the iterations is more sensible. In addition to the swap count that I borrowed from ancient PG code, this patch checks the root level as we do today (but now in a different code path), and there are a couple of additional heuristics to limit wasted work:

- Only try it if the pivot candidates are in order.
- If the input is presorted we always return, but if it isn't we will only proceed with swapping elements if the input was large enough that we checked the order of nine pivot candidates.
- Only try partial insertion sort if the partitions are not highly different in size, since bad pivots will lead to fewer swaps in general, causing the swap count heuristic to be unhelpful. (Future work: look into randomizing pivot candidates when encountering such skewed partitions, as some other implementations do.)

0004 builds upon the previous and preemptively reverses the input if there are nine pivot candidates and they are in descending order.

0005 implements the PDQuicksort partitioning scheme. In working on this, I found I need to correct something I said earlier:

> There is one requirement that is a double-edged sword: Recursion to
> the partitions must happen in order. This has an additional advantage
> that in every case but one, insertion sort doesn't need to guard
> against running off the beginning of the array. The downside for us is
> that we currently recurse to a partition based on its size as a
> stack-guarding measure, so that guard would have to be replaced.

This is not true actually (I read too much into an implementation detail), and with a tiny bit of bookkeeping we can retain our practice of recursing to the smaller partition.

A very brief description of 0005: There are two partitioning functions, left and right. Partition-right puts elements >= pivot in the right partition. This is the default. When many duplicates end up in the right partition, the pivot chosen for the next iteration there could be the same as the previous pivot. In that case, we switch to partition-left: put elements equal to the pivot in the left partition. Since all those elements are the same, we are done and can iterate on the right side. With this scheme, only 2-way comparators are needed.
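To make that description concrete, here is my reading of the scheme, sketched on plain ints with explicit bounds checks (the real implementations avoid some of these with sentinels); this is a sketch, not the code in the attached patch. Note that every comparison below can be phrased in terms of a single "less than" test, which is why only 2-way comparators are needed.

/*
 * partition_right: elements >= pivot go to the right partition.  Also
 * reports whether no swaps were needed, i.e. the range was already
 * partitioned around the pivot.
 */
static size_t
partition_right(int *a, size_t n, bool *already_partitioned)
{
    int     pivot = a[0];       /* caller moved the chosen pivot here */
    size_t  lo = 1;
    size_t  hi = n;
    bool    swapped = false;
    int     tmp;

    for (;;)
    {
        while (lo < hi && a[lo] < pivot)
            lo++;
        while (hi > lo && a[hi - 1] >= pivot)
            hi--;
        if (lo >= hi)
            break;

        /* a[lo] >= pivot and a[hi - 1] < pivot: swap them */
        tmp = a[lo];
        a[lo] = a[hi - 1];
        a[hi - 1] = tmp;
        swapped = true;
        lo++;
        hi--;
    }

    /* place the pivot between the two partitions */
    a[0] = a[lo - 1];
    a[lo - 1] = pivot;

    *already_partitioned = !swapped;
    return lo - 1;              /* final pivot position */
}

/*
 * partition_left: used when the pivot equals the element just before this
 * range, i.e. there are likely many duplicates.  Elements equal to the
 * pivot end up in the left partition, which then needs no further work.
 */
static size_t
partition_left(int *a, size_t n)
{
    int     pivot = a[0];
    size_t  lo = 1;
    size_t  hi = n;
    int     tmp;

    for (;;)
    {
        while (lo < hi && a[lo] <= pivot)
            lo++;
        while (hi > lo && a[hi - 1] > pivot)
            hi--;
        if (lo >= hi)
            break;

        tmp = a[lo];
        a[lo] = a[hi - 1];
        a[hi - 1] = tmp;
        lo++;
        hi--;
    }

    a[0] = a[lo - 1];
    a[lo - 1] = pivot;
    return lo - 1;              /* iteration continues to the right of this */
}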
Also, the previous counting of swaps during partitioning is replaced by a simple pointer check once per iteration.

--
Tests:

I have two flavors of tests:

1) ORDER BY queries with 10 million records: Select a single value for three types:
- bigint (so can use datum sort)
- 16-byte abbreviatable text
- 16-byte non-abbreviatable text
(median of 5 runs, script attached)

2) a microbenchmark sorting 1 million int32 datums with two ways to vary comparison schemes: direct C function/fmgr function and runtime/specialized, so four different combinations. The idea here is to get an easy way to test comparators of different "heaviness", not that these actual cases are all necessarily relevant. (minimum of 10 runs)

The input distributions can be put into three categories:
- Varying amounts of randomness and presortedness (random, sort50/90/99, dither)
- Long runs of ascending and descending sequences (ascending, descending, organ, merge)
- Low cardinality (dup8, dupsq, mod100/8)

To get the microbenchmark output in the same format as the query script I ran

psql -c 'select test_sort_cmp_weight(1 * 1024*1024)' 2> result.txt
perl -n -e 'print "$2\t$3\t$1\t$4\n" if /NOTICE: \[(.+)\] num=(\d+), order=(\w+), time=(\d+\.\d+)/;' result.txt > result.csv

--
Summary of results:

Partial insertion sort:
- For small types sometimes possibly a bit faster, sometimes significantly faster.
- With queries, significantly slower for some partially-presorted distributions, otherwise mostly no change.
- Could be further optimized for types known at compile time by using a "sift" algorithm rather than swapping at every iteration. We could also try basing the iteration limit on type size.

Optimistically reversing input:
- 20-25x faster on descending input, which is nice. It's not clear if it would benefit "mostly descending" input.

PDQuicksort:
- With light or moderate types/comparators, it seems significantly faster than B&M.
- With heavier comparators (including all queries), it's slightly faster at best, and regresses noticeably with low cardinality inputs.

This latter finding was somewhat surprising. The O(nk) behavior for low cardinality input is academically interesting, but doesn't translate to gains for some of our key use cases.

--
Conclusion:

The partial insertion sort seems useful if the regressions can be worked out. The reverse-input optimization is a small addition on top of that and seems worth doing. The PDQuicksort partitioning scheme is a win in some cases I tested but a loss in others, so at this point I'm inclined not to pursue that part further.

I'd be happy to hear any thoughts anyone might have.

[1] https://github.com/golang/go/blob/72e77a7f41bbf45d466119444307fd3ae996e257/src/sort/zsortfunc.go
[2] https://github.com/rust-lang/rust/blob/415c9f95884caebd0be9b837ef0885d1ffde1b9b/library/core/src/slice/sort.rs
[3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649;hp=570b726533fb561bc5a35ba50ea944f139a250d5
Attachment
- v1-0003-Optimistically-attempt-insertion-sort-on-pre-part.patch
- v1-0002-Create-internal-workhorse-for-ST_SORT.patch
- v1-0001-Add-sort-test-module.patch
- v1-0004-Optimize-sorts-on-likely-reversed-input.patch
- v1-0005-Replace-qsort-s-partitioning-scheme.patch
- v1-results-query-graph.ods
- v1-results-microbenchmark-graph.ods
- sort-bench-pdqsort-sql-20220529.sh
On Thu, Jun 2, 2022 at 8:33 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> Attached is a draft series that implements some but not all features
> of pattern-defeating quicksort, namely the ones I thought were
> interesting for us. Recently this quicksort variant got committed for
> the next release of the Go language 1.19 [1] (which in turn was based
> on that of Rust [2]), and that implementation was a helpful additional
> example. The bottom line is that replacing the partitioning scheme
> this way is likely not worth doing because of our particular use
> cases, but along the way I found some other things that might be worth
> doing, so some good may come out of this.

What about dual-pivot quicksort, which is used in Java 7+? That is the de facto successor to Bentley & McIlroy. In fact, Jon Bentley himself collaborated with its author, and provided benchmarking input. The underlying philosophy is essentially the same as the original -- it is supposed to be an "industrial strength" quicksort, with various adversarial cases considered directly. See:

https://www.wild-inter.net/publications/wild-2018b.pdf
https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf

At one point quite a few years back I planned on investigating it myself, but never followed through.

--
Peter Geoghegan
On Fri, Jun 3, 2022 at 10:44 AM Peter Geoghegan <pg@bowt.ie> wrote:
> What about dual-pivot quicksort, which is used in Java 7+? That is the
> de facto successor to Bentley & McIlroy. In fact, Jon Bentley himself
> collaborated with its author, and provided benchmarking input. The
> underlying philosophy is essentially the same as the original -- it
> is supposed to be an "industrial strength" quicksort, with various
> adversarial cases considered directly.

I had heard of it but not looked into it deeply. I did read that Java 7 uses dual pivot quicksort for primitives and timsort for objects. I wasn't sure if dual pivot was not good for objects (which could have possibly-complex comparators) or if timsort was just simply good for Java's use cases. It seems accessible to try doing, so I'll look into that.

--
John Naylor
EDB: http://www.enterprisedb.com
On Thu, Jun 2, 2022 at 9:24 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> I had heard of it but not looked into it deeply. I did read that Java
> 7 uses dual pivot quicksort for primitives and timsort for objects. I
> wasn't sure if dual pivot was not good for objects (which could have
> possibly-complex comparators) or if timsort was just simply good for
> Java's use cases. It seems accessible to try doing, so I'll look into
> that.

I think that Timsort is most useful with cases where individual comparisons are inherently very expensive (perhaps even implemented in Python), which might be common with objects due to the indirection that Python (and even Java) impose on objects in general.

I would imagine that this is a lot less relevant in Postgres/tuplesort, because we have optimizations like abbreviated keys. And because we're mostly just sorting tuples on one or more attributes of scalar types. The abbreviated keys optimization is very much something that comes from the world of databases, not the world of sorting. It's pretty much a domain-specific technique. That seems relevant to me.

--
Peter Geoghegan
On Fri, Jun 3, 2022 at 10:44 AM Peter Geoghegan <pg@bowt.ie> wrote:
> What about dual-pivot quicksort, which is used in Java 7+? That is the
> de facto successor to Bentley & McIlroy. In fact, Jon Bentley himself
> collaborated with its author, and provided benchmarking input. The
> underlying philosophy is essentially the same as the original -- it
> is supposed to be an "industrial strength" quicksort, with various
> adversarial cases considered directly.

Here is a *rough* first pass at dual-pivot quicksort. I haven't changed the regression tests to adjust for qsort being an unstable sort, and there is some dead code. I looked to a couple JDKs for examples of design decisions as a first approximation. It includes a module with a few debugging printouts, run like so:

select debug_qsort_int(array[7,6,5,4,3,2,1]);

Pivot selection: A common way is to pick five candidates (here: mid, +/- 1/7, +/- 2/7), sort them in place, then pick the 2nd and 4th of them, or "tertile of five". If they are all evenly spaced, we can do insertion sort.

Fall back to single-pivot: If any of the pivot candidates are equal, JDK assumes there could be a large number of duplicates and falls back to single-pivot, using the median of the five. I've done the same here. Despite having two code paths in all of our template instances, the VM binary is only about 5kb bigger, since MED3 is no longer built.

Performance testing: Not started yet. I'm thinking an apples-to-apples comparison is not insightful enough, since the current insertion sort threshold 7 is already a bit constrained for single-pivot, and would be even more so for dual pivot, especially since 5 of the elements are pre-sorted to choose the pivots. My plan is to retest the threshold on HEAD using my latest tests, then use that as a starting point to test thresholds with dual-pivot.

--
John Naylor
EDB: http://www.enterprisedb.com
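For reference, the candidate selection and fallback described above amount to something like the following sketch on plain ints (hypothetical name, not the attached patch; assumes n is comfortably above the insertion sort threshold):

/*
 * Pick five evenly spaced candidates around the midpoint, sort those five
 * elements in place, and use the 2nd and 4th as pivots ("tertile of five").
 * If any candidates compare equal, return false and let the caller fall
 * back to single-pivot quicksort with the median candidate a[c[2]].
 */
static bool
choose_dual_pivots(int *a, size_t n, size_t *pivot1_idx, size_t *pivot2_idx)
{
    size_t  seventh = n / 7;
    size_t  mid = n / 2;
    size_t  c[5] = {
        mid - 2 * seventh, mid - seventh, mid, mid + seventh, mid + 2 * seventh
    };

    /* insertion-sort the five candidate elements in place */
    for (int i = 1; i < 5; i++)
        for (int j = i; j > 0 && a[c[j]] < a[c[j - 1]]; j--)
        {
            int     tmp = a[c[j]];

            a[c[j]] = a[c[j - 1]];
            a[c[j - 1]] = tmp;
        }

    /* equal candidates hint at many duplicates: fall back to single pivot */
    if (a[c[0]] == a[c[1]] || a[c[1]] == a[c[2]] ||
        a[c[2]] == a[c[3]] || a[c[3]] == a[c[4]])
        return false;

    *pivot1_idx = c[1];
    *pivot2_idx = c[3];
    return true;
}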
Attachment
On Thu, Jun 23, 2022 at 6:13 AM John Naylor <john.naylor@enterprisedb.com> wrote: > Here is a *rough* first pass at dual-pivot quicksort. I haven't > changed the regression tests to adjust for qsort being an unstable > sort, ... Hmm. I thought we had some reasons for preferring a stable sort algorithm. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 23 Jun 2022 at 15:52, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 23, 2022 at 6:13 AM John Naylor
> <john.naylor@enterprisedb.com> wrote:
> > Here is a *rough* first pass at dual-pivot quicksort. I haven't
> > changed the regression tests to adjust for qsort being an unstable
> > sort, ...
>
> Hmm. I thought we had some reasons for preferring a stable sort algorithm.

I think that mostly has to do with reliability / stability of ORDER BY in combination with LIMIT and OFFSET, even though right now we cannot fully guarantee such stability due to unstable results from underlying plan nodes. As an example, a table scan node under a sort node can start its scan at an arbitrary point in the table (using synchronize_seqscans), and because Sort nodes only sort MinimalTuple-s, each set of tuples that have an equal sort value will be ordered by TID + y (mod tablesize), with arbitrary values for y.

- Matthias
On Thu, Jun 23, 2022 at 7:51 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> I think that mostly has to do with reliability / stability of ORDER BY
> in combination with LIMIT and OFFSET, even though right now we cannot
> fully guarantee such stability due to unstable results from underlying
> plan nodes.

The current qsort isn't stable. While quicksort is never stable, our particular implementation has fairly significant optimizations that strongly rely on not using a stable sort. In particular, the B&M optimizations for duplicate elements. These optimizations make things like datum tuplesorts for 'SELECT(DISTINCT mycol) ...' style queries on low cardinality columns extremely fast. We're not really sorting so much as bucketing. This is based on Dijkstra's Dutch national flag problem.

--
Peter Geoghegan
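(For reference, the bucketing described above is in the spirit of the following three-way partitioning sketch. This illustrates the Dutch national flag idea only, not the actual B&M code, which gathers equal elements at both ends and swaps them into the middle afterwards.)

/*
 * One pass splits the array into <, ==, and > regions.  With low
 * cardinality input the == region is large and needs no further sorting,
 * so most of the work is bucketing rather than comparing.
 */
static void
three_way_partition(int *a, size_t n, int pivot,
                    size_t *lt_end, size_t *gt_begin)
{
    size_t  lt = 0;             /* a[0 .. lt-1]  <  pivot */
    size_t  i = 0;              /* a[lt .. i-1]  == pivot */
    size_t  gt = n;             /* a[gt .. n-1]  >  pivot */
    int     tmp;

    while (i < gt)
    {
        if (a[i] < pivot)
        {
            tmp = a[lt]; a[lt] = a[i]; a[i] = tmp;
            lt++;
            i++;
        }
        else if (a[i] > pivot)
        {
            gt--;
            tmp = a[i]; a[i] = a[gt]; a[gt] = tmp;
        }
        else
            i++;
    }

    *lt_end = lt;               /* recurse on a[0 .. lt-1] ... */
    *gt_begin = gt;             /* ... and on a[gt .. n-1]; the middle is done */
}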
On Thu, 23 Jun 2022 at 17:04, Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Jun 23, 2022 at 7:51 AM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > I think that mostly has to do with reliability / stability of ORDER BY > > in combination with LIMIT and OFFSET, even though right now we cannot > > fully guarantee such stability due to unstable results from underlying > > plan nodes. > > The current qsort isn't stable. Then I misunderstood Robert's comment, thanks for correcting me. - Matthias
I've run the microbenchmarks only so far, since they complete much faster than queries, and I wanted to tell quickly if this is a good avenue to pursue. Later I'll repeat for queries. Methodology is similar to what I did earlier in the thread, so I won't belabor that.

Takeaways:

- Contrary to some testing on single pivot I did some months ago with fewer input distributions, in this test neither single nor dual pivot benefit much from a large insertion sort threshold (i.e. 20 or so). I picked 12 for both to do the head-to-head comparison, since this seems to give a slight boost to specialized sorts on the slowest inputs. They don't have to be the same, of course, but it seems about right with these results. (Of course, it's easy to have specialized sorts define their own threshold, as Thomas Munro has shown.)

- Generic qsort (type length and comparator passed at runtime) sees no benefit from dual-pivot in this test, but specialized qsorts do get a decent speedup.

- Descending inputs get a huge boost when specialized. This is surprising to me, enough that I'm skeptical and want to double check the test. If it's legitimate, I'm happy about it.

I've attached three spreadsheets with graphs, two for threshold tests for single- and dual-pivot, and one comparing single and dual for threshold=12.

0001 is the test module and rough script (not for commit). Note: The server won't build, since the threshold is passed via CPPFLAGS.
0002 is trivial and mostly for assert builds.
0003 is a mostly cleaned up patch for dual pivot, with passing regression tests and a default insertion sort threshold of 12.

I'll add a CF entry for this.

TODO: add results for queries.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
I wrote:
> TODO: add results for queries.

Attached are three spreadsheets for queries, similar to the earlier ones for microbenchmarks. There are three datatypes:

- Bigint: by selecting just the value, these can use Datum sorts
- Text: the same number as bigint, but in lpadded text form, followed by 8 identical characters, so 16 bytes.
- Text no-abbrev: also 16 bytes of text derived from bigint, but with the 8 characters at the beginning to defeat abbreviation.

For the single-vs-dual comparison, this time I chose the insertion sort threshold to be 10 for single pivot and 18 for dual-pivot.

I see a big speedup in descending values here as well, completing in 20-50% less time. Everything else is the same, slightly faster, or slightly slower, at least on this machine (Comet Lake, gcc 12.1). The slightly slower cases are ones with duplicates. It would make sense that the duplicate-detection heuristic lags behind where we would ideally be using single-pivot, but the difference is small. In any case, there doesn't seem to be a compelling benefit from dual-pivot for most cases.

I suspect the branches in the sorttuple comparators for null handling are negating the benefits from better d-cache and TLB behavior. In that case, a useful thing to try would be to pre-partition the sorttuple array between null and non-null values as it's being populated, then sort each of those in turn. (The null portion would only need sorting if there was more than one sort key.) That would automatically take care of nulls-first vs nulls-last upfront, save a bunch of binary space used for branches (worth doing on its own), and the "isnull" member would not be needed anymore. That wouldn't save data space because of alignment, but the compiler wouldn't need to generate a load/store instruction for it. gcc does, but clang and msvc just treat the 24-byte struct as 3 8-byte integers:

https://godbolt.org/z/d8j1EvroP

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
On Tue, Jul 5, 2022 at 12:14 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> I suspect the branches in the sorttuple comparators for null handling
> are negating the benefits from better d-cache and TLB behavior. In
> that case, a useful thing to try would be to pre-partition the
> sorttuple array between null and non-null values as it's being
> populated, then sort each of those in turn. (The null portion would
> only need sorting if there was more than one sort key.) That would
> automatically take care of nulls-first vs nulls-last upfront, save a
> bunch of binary space used for branches (worth doing on its own),
To see if this idea had any legs, I simply ripped out the null handling from the specialized sorttuple comparators. This works for my tests since all the values are not null. Doing this much seems to be more effective than dual pivot in every case but purely-descending input. The combination of the two seems a bit faster still on bigint, but not on text. From this test, it seems the conclusions for tuple sorts with dual-pivot haven't changed: impressive speedup on descending input, not so much elsewhere. (spreadsheet attached, note the insertion sort thresholds are reused from previous testing, namely 10 and 18 for single / dual pivot)
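The branches in question look roughly like this (a simplified illustration, not the actual sorttuple comparator):

/*
 * Simplified shape of the null handling that every comparison normally
 * pays for, whether or not any NULLs are present.  The experiment keeps
 * only the final "real" comparison.
 */
static inline int
demo_compare_datum(long a, bool a_isnull, long b, bool b_isnull,
                   bool nulls_first)
{
    if (a_isnull)
    {
        if (b_isnull)
            return 0;                       /* NULL vs. NULL */
        return nulls_first ? -1 : 1;
    }
    if (b_isnull)
        return nulls_first ? 1 : -1;

    /* the comparison we actually care about */
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}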
> and
> the "isnull" member would not be needed anymore. That wouldn't save
> data space because of alignment, but the compiler wouldn't need to
> generate a load/store instruction for it.
This piece might not be workable, or would make external merges more difficult. That said, doing the null-handling up front when forming sort tuples seems worth doing, and I intend to come back to it at some point.
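A rough sketch of how that pre-partitioning might look, with made-up names and a simplified tuple struct (not tuplesort.c's actual code), assuming the array is sized up front:

typedef struct DemoSortTuple
{
    long    datum1;             /* first-column or abbreviated key */
    void   *tuple;              /* no "isnull" member needed anymore */
} DemoSortTuple;

typedef struct DemoSortState
{
    DemoSortTuple *tuples;
    size_t      capacity;
    size_t      n_notnull;      /* non-null keys fill [0, n_notnull) */
    size_t      n_null;         /* null keys fill [capacity - n_null, capacity) */
} DemoSortState;

/* While loading, keep tuples with a NULL first key at the far end of the
 * array, so the comparators used for the main sort need no null branches. */
static void
demo_puttuple(DemoSortState *state, long datum1, bool isnull, void *tuple)
{
    DemoSortTuple *slot;

    if (isnull)
        slot = &state->tuples[state->capacity - (++state->n_null)];
    else
        slot = &state->tuples[state->n_notnull++];

    slot->datum1 = datum1;
    slot->tuple = tuple;
}

/* To sort: run the null-free comparator over [0, n_notnull); sort the null
 * region only if there is more than one sort key; emit the two regions in
 * whichever order NULLS FIRST/LAST dictates. */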
Regarding binary size, removing the null handling from the comparators shrinks the binary by about 4.3kB. Adding dual pivot on top of that expands it to about 3.5kB larger than HEAD, but I'm pretty sure I could get that down to net zero by replacing the presorted check with a partial insertion sort (as discussed before), and reusing that for the pivot selection as well. Being able to add partial insertion sort with net ~zero binary increase could be another small point in favor of dual-pivot.
I've attached the patches for this experiment for completeness, but withdrawing the CF entry for now.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
- v3-0005-Set-insertion-sort-threshold-to-18.patch
- v3-0006-Revert-Remove-null-comparisons-from-sort-comparat.patch
- v3-0004-Implement-dual-pivot-quicksort.patch
- remove-null-cmp-sp10-dp18.ods
- v3-0003-Create-internal-workhorse-for-ST_SORT.patch
- v3-0002-Remove-null-comparisons-from-sort-comparators.patch
- v3-0001-Set-insertion-sort-threshold-to-10.patch