Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From: James Coleman
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Msg-id: CAAaqYe_PXXHnOavS_X+C+yJnWesQzWeCv80jsp33SW4osGRUWg@mail.gmail.com
In response to: Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>)
Responses:
  Re: [PATCH] Incremental sort (was: PoC: Partial sort)
  Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List: pgsql-hackers
On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote:
> I think the first thing to do is get some concrete numbers on
> performance if we:
>
> 1. Only sort one group at a time.
> 2. Update the costing to prefer traditional sort unless we have very
>    high confidence we'll win with incremental sort.
>
> It'd be nice not to have to add additional complexity if at all possible.

I've been focusing my efforts so far on seeing how much we can eliminate performance penalties (relative to traditional sort). It seems that if we can improve things enough there, we'd limit the amount of adjustment needed to costing. We'd still need to consider cases where the lower startup cost results in picking significantly different plans in a broad sense (presumably due to lower startup cost and the ability to short circuit on a limit). But I'm hopeful we might then be able to avoid having to consult MCV lists (which we wouldn't have available in all cases anyway).

As I see it, the two most concerning cases right now are:

1. Very large batches (in particular where the batch is effectively all of the matching rows, such that we're really just doing a standard sort).
2. Many very small batches.

(From reading the whole thread through again, there's a third case: skewed group sizes. I'm ignoring that for now because my intuition is that if the sort can reasonably handle the above two cases *in execution, not just planning*, the skew problem will become a non-issue.)

For (1), I'd really like to find a way to still benefit from knowing we have prefix columns already sorted. Currently the patch doesn't do that, because the optimization for small batches precludes knowing that all prefixes in a batch are indeed equal. As such, I've temporarily removed that optimization in my testing to see how much of a gain we get from not needing to sort prefix columns. Intuitively it seems plausible that this would always beat a standard sort, since a.)
equality is sometimes much cheaper than ordering, and b.) it reduces the likelihood of spilling to disk. In addition, my fix enabling top-n heapsort improves things. The one thing I'm not sure about here is that I haven't seen parallel incremental sort happen at all, and a parallel regular sort can sometimes beat a single-process incremental sort (though at the cost of higher system resource usage, of course). As noted upthread, though, I haven't really investigated the parallel component yet at all.

I did confirm a measurable speedup with suffix-only sorting (roughly 5%, relative to the current patch).

For (2), I'd love to have an optimization for the "batches are a single tuple" case (indeed, in my most common real-world use case that's virtually guaranteed). But that optimization is also incompatible with the current minimum batch size optimization. There was previously reference to the cost of copying the tuple slot being the problem factor with small batches, and I've confirmed there is a performance penalty for small batches if you handle every prefix as a separate batch (relative to the current patch). What I plan to investigate next is whether that penalty is primarily due to the bookkeeping of frequently copying a pivot tuple or due to running the sort algorithm itself so frequently. If it's just the overhead of copying the pivot tuple, then I wonder whether adding functionality to tuplesort to allow peeking at the first inserted tuple might be worth it.

Alternatively, I've been thinking about ways to achieve a hybrid approach: using a single-batch suffix-only sort for large batches and a multi-batch full sort for very small batches. For example, I could imagine maintaining two different tuplesorts (one for full sort and one for suffix-only sort), inserting tuples into the full sorter until the minimum batch size, and then (like now) checking every tuple to see when we've finished the batch.
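To make the accumulate-then-decide flow concrete, here's a rough sketch of how the two sorters could hand off. This is purely illustrative Python, not the patch: `hybrid_sort`, the `2 * min_batch` threshold, and plain lists standing in for the full and suffix-only tuplesorts are all my invention.

```python
def hybrid_sort(rows, prefix_len, min_batch):
    """Toy model of the hybrid idea: rows already arrive sorted on their
    first prefix_len columns; min_batch mirrors the minimum batch size."""
    out = []
    i, n = 0, len(rows)
    while i < n:
        # Accumulate blindly into the "full sorter" up to min_batch tuples.
        batch = rows[i:i + min_batch]
        i += len(batch)
        pivot = batch[-1][:prefix_len]
        # Then check every tuple, stopping at a new batch or 2 * min_batch.
        while i < n and len(batch) < 2 * min_batch and rows[i][:prefix_len] == pivot:
            batch.append(rows[i])
            i += 1
        if i >= n or rows[i][:prefix_len] != pivot:
            # Found the batch boundary in time: just do the full sort.
            out.extend(sorted(batch))
        elif all(t[:prefix_len] == pivot for t in batch):
            # Everything accumulated so far is one big batch: move it all to
            # the suffix-only sorter and keep adding rows until a new batch.
            while i < n and rows[i][:prefix_len] == pivot:
                batch.append(rows[i])
                i += 1
            out.extend(sorted(batch, key=lambda t: t[prefix_len:]))
        else:
            # Mixed: full-sort, return tuples up to the current batch, then
            # move the current batch's tuples into the suffix-only sorter.
            earlier = sorted(t for t in batch if t[:prefix_len] != pivot)
            current = [t for t in batch if t[:prefix_len] == pivot]
            out.extend(earlier)
            while i < n and rows[i][:prefix_len] == pivot:
                current.append(rows[i])
                i += 1
            out.extend(sorted(current, key=lambda t: t[prefix_len:]))
    return out
```

The sketch glosses over everything that makes this hard in the executor (tuple copying, spilling, bounded sorts), but it shows the three exits the prose below walks through.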
If we finish the batch by, say, 2 * minimum batch size, then we perform the full sort. If we don't find a new batch by that point, we go back and check whether the tuples we've accumulated so far are actually all in the same batch. If so, we move all of them to the suffix-only sorter and continue adding rows until we hit a new batch. If not, we perform the full sort, but only return tuples out of the full sorter until we encounter the current batch, and then move the remainder into the suffix-only sorter.

Using this kind of approach, you can also imagine further optimizations, like checking the current tuple against the pivot tuple only every 10 processed tuples or so. That adds another level of complication, obviously, but could be interesting.

Unrelated: there was some discussion upthread about whether abbreviated column support could/should be supported. The comments around tuplesort_set_bound in tuplesort.c claim that abbreviated keys are not useful for bounded sort, and, in fact, the code there disables them. Given that the single largest use case for this patch is when we have a LIMIT, I think we can conclude that not supporting abbreviated keys is reasonable.

One other (meta) question: do we care mostly about "real" queries on synthetic data (and comparing them to master)? Or do we care more about comparing solely the difference in sorting cost between the two approaches? I think some of both is necessary, but I'm curious what you all think.

James Coleman
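P.S. For anyone skimming, the suffix-only idea in (1) reduces to this toy sketch (illustrative Python, not the patch; `suffix_only_sort` is my name, and groupby stands in for the executor's batch detection):

```python
from itertools import groupby

def suffix_only_sort(rows, prefix_len):
    """Rows already arrive sorted on their first prefix_len columns, so we
    only need to compare the remaining columns within each prefix group,
    and each group can be sorted (and spilled, if needed) independently."""
    out = []
    for _, group in groupby(rows, key=lambda t: t[:prefix_len]):
        out.extend(sorted(group, key=lambda t: t[prefix_len:]))
    return out

rows = [(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "a")]
print(suffix_only_sort(rows, 1))
# → [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'c'), (3, 'a')]
```

Per-group sorts over suffix columns are what makes both the cheaper comparisons and the lower spill likelihood plausible.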