Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20200324230831.5zvjpgayh35bxedp@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Tue, Mar 24, 2020 at 06:26:11PM -0400, James Coleman wrote: >On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera ><alvherre@2ndquadrant.com> wrote: >> >> On 2020-Mar-23, James Coleman wrote: >> >> > 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk >> > is suspect. I've mentioned previously I don't have a great mental >> > model of how rescan works and its invariants (IIRC someone said it was >> > about moving around a result set in a cursor). Regardless I'm pretty >> > sure this code just doesn't work correctly. >> >> I don't think that's the whole of it. My own vague understanding of >> ReScan is that it's there to support running a node again, possibly with >> different parameters. For example if you have a join of an indexscan >> on the outer side and an incremental sort on the inner side, and the >> values from the index are used as parameters to the incremental sort, >> then the incremental sort is going to receive ReScan calls for each of >> the values that the index returns. Sometimes the index could give you >> the same values as before (because there's a dupe in the index), so you >> can just return the same values from the incremental sort; but other >> times it's going to return different values so you need to reset the >> incremental sort to "start from scratch" using the new values as >> parameters. >> >> Now, if you have a cursor reading from the incremental sort and fetch >> all tuples, then rewind completely and fetch all again, then that's >> going to be a rescan as well. >> >> I agree with you that the code doesn't seem to implement that. > >I grepped the codebase for rescan, and noted this relevant info in >src/backend/executor/README: > >* Rescan command to reset a node and make it generate its output sequence >over again. > >* Parameters that can alter a node's results. After adjusting a parameter, >the rescan command must be applied to that node and all nodes above it. >There is a moderately intelligent scheme to avoid rescanning nodes >unnecessarily (for example, Sort does not rescan its input if no parameters >of the input have changed, since it can just reread its stored sorted data). > >That jives pretty well with what you're saying. > >The interesting thing with incremental sort, as the comments in the >patch already note, is that even if the params haven't changed, we >can't regenerate the same values again *unless* we know that we're >still in the same batch, or, have only processed a single full batch >(and the tuples are still in the full sort state) or we've >transitioned to prefix mode and have only transferred tuples from the >full sort state for a single prefix key group. > >That's a pretty narrow range of applicability of not needing to >re-execute the entire node, at least based on my assumptions about >when rescanning will typically happen. > >So, two followup questions: > >1. Given the narrow applicability, might it make sense to just say >"we're only going to do a total reset and rescan and not try to >implement a smart 'don't rescan if we don't have to'"? > I think that's a sensible approach. >2. What would be a typical or good way to test this? Should I >basically repeat many of the existing implementation tests but with a >cursor and verify that rescanning produces the same results? That's >probably the path I'm going to take if there are no objections. Of >course we would need even more testing if we wanted to have the "smart >rescan" functionality. > I haven't checked, but how are we testing it for the other nodes? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: