Home > mailing lists

Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date	June 14, 2019 17:37:25
Msg-id	20190614173725.vu6xrkx4iu4ghjdz@development Whole thread
In response to	Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>)
List	pgsql-hackers

Tree view

On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote:
>On Wed, Jun 5, 2019 at 12:14 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
>> > > 2) Provide some fallback at execution time. For example, we might watch
>> > > the size of the group, and if we run into an unexpectedly large one we
>> > > might abandon the incremental sort and switch to a "full sort" mode.
>> >
>> > Are there good examples of our doing this in other types of nodes
>> > (whether the fallback is an entirely different algorithm/node type)? I
>> > like this idea in theory, but I also think it's likely it would add a
>> > very significant amount of complexity. The other problem is knowing
>> > where to draw the line: you end up creating these kinds of cliffs
>> > where pulling one more tuple through the incremental sort would give
>> > you your batch and result in not having to pull many more tuples in a
>> > regular sort node, but the fallback logic kicks in anyway.
>> >
>> What about having some simple mechanism for this, like if we encounter
>> the group with more tuples than the one estimated then simply switch
>> to normal sort for the remaining tuples, as the estimates does not
>> hold true anyway. Atleast this will not give issues of having
>> regressions of incremental sort being too bad than the normal sort.
>> I mean having something like this, populate the tuplesortstate and
>> keep on counting the number of tuples in a group, if they are within
>> the budget call tuplesort_performsort, otherwise put all the tuples in
>> the tuplesort and then call tuplesort_performsort. We may have an
>> additional field in IncrementalSortState to save the estimated size of
>> each group. I am assuming that we use MCV lists to approximate better
>> the group sizes as suggested above by Tomas.
>
>I think the first thing to do is get some concrete numbers on performance if we:
>
>1. Only sort one group at a time.
>2. Update the costing to prefer traditional sort unless we have very
>high confidence we'll win with incremental sort.
>
>It'd be nice not to have to add additional complexity if at all possible.
>

+1 to that

>> > Unrelated to all of the above: if I read the patch properly it
>> > intentionally excludes backwards scanning. I don't see any particular
>> > reason why that ought to be the case, and it seems like an odd
>> > limitation for the feature should it be merged. Should that be a
>> > blocker to merging?
>>
>> Regarding this, I came across this,
>> /*
>>   * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>>   * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we hold only current
>>   * bucket in tuplesortstate.
>>   */
>> I think that is quite understandable. How are you planning to support
>> backwards scan for this? In other words, when will incremental sort be
>> useful for backward scan.
>
>For some reason I was thinking we'd need it to support backwards scans
>to be able to handle DESC sort on the index, but I've tested and
>confirmed that already works. I suppose that's because the index scan
>provides that ordering and the sort node doesn't need to reverse the
>direction of what's provided to it. That's not particularly obvious to
>someone newer to the codebase; I'm not sure if that's documented
>anywhere.
>

Yeah, backward scans are not about ASC/DESC, it's about being able to walk
back through the result. And we can't do that with incremental sorts
without materialization.

>> On a different note, I can't stop imagining this operator on the lines
>> similar to parallel-append, wherein multiple workers can sort the
>> different groups independently at the same time.
>
>That is an interesting idea. I suppose it'd be particularly valuable
>if somehow there a node that was generating each batch in parallel
>already, though I'm not sure at first though what kind of query or
>node would result in that. I also wonder if (assuming that weren't the
>case) it would be much of an improvement since a single thread would
>have to generate each batch anyway; I'm not sure if the overhead of
>farming each batch out to a worker would actually be useful or if the
>real blocker is the base scan.
>
>At the very least it's an interesting idea.
>

I kinda doubt that'd be very valuable. Or more precisely, we kinda already
have that capability because we can do things like this:

   -> Gather Merge
      -> Sort
         -> ... scan ...

so I imagine we'd just do an Incremental Sort here. Granted, it's not
distributed by prefix groups (I assume that's what you mean by batches
here), but I don't think that's a big problem.

>---
>
>I've been writing down notes here, and I realized that my test case
>from far upthread is actually a useful setup to see how much overhead
>is involved in sorting each batch individually, since it sets up data
>with each batch only containing 1 tuple. That particular case is one
>we could easily optimize anyway in the code and skip sorting
>altogether -- might be a useful enhancement.
>
>I hope to do some more testing and then report back with the results.
>
>James Coleman

OK.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Jesper Pedersen
Date: 14 June 2019, 17:24:20
Subject: Re: Index Skip Scan

From: Tomas Vondra
Date: 14 June 2019, 18:11:35
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

Previous

Next