Re: Incremental sorts and EXEC_FLAG_REWIND - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: Incremental sorts and EXEC_FLAG_REWIND |
Date | |
Msg-id | CAAaqYe_r=hXjt79Z0dj+oef-jwsZ=XSMoQt3d3sDg2-B7c=uQg@mail.gmail.com Whole thread Raw |
In response to | Re: Incremental sorts and EXEC_FLAG_REWIND (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Incremental sorts and EXEC_FLAG_REWIND
|
List | pgsql-hackers |
On Fri, May 8, 2020 at 7:14 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote: > >On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote: > >> > >> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote: > >> > > >> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote: > >> > > > >> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote: > >> > > > > >> > > > Hi, > >> > > > > >> > > > When initializing an incremental sort node, we have the following as > >> > > > of ExecInitIncrementalSort(): > >> > > > /* > >> > > > * Incremental sort can't be used with either EXEC_FLAG_REWIND, > >> > > > * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort > >> > > > * batches in the current sort state. > >> > > > */ > >> > > > Assert((eflags & (EXEC_FLAG_BACKWARD | > >> > > > EXEC_FLAG_MARK)) == 0); > >> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here > >> > > > to begin with (because incremental sorts don't support rescans without > >> > > > parameter changes, right?), the comment and the assertion are telling > >> > > > a different story. > >> > > > >> > > I remember changing this assertion in response to an issue I'd found > >> > > which led to rewriting the rescan implementation, but I must have > >> > > missed updating the comment. > >> > > >> > All right, here are the most relevant messages: > >> > > >> > [1]: Here I'd said: > >> > ---------- > >> > While working on finding a test case to show rescan isn't implemented > >> > properly yet, I came across a bug. At the top of > >> > ExecInitIncrementalSort, we assert that eflags does not contain > >> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins > >> > disabled) breaks that assertion: > >> > > >> > select * from t join (select * from t order by a, b) s on s.a = t.a > >> > where t.a in (1,2); > >> > > >> > The comments about this flag in src/include/executor/executor.h say: > >> > > >> > * REWIND indicates that the plan node should try to efficiently support > >> > * rescans without parameter changes. (Nodes must support ExecReScan calls > >> > * in any case, but if this flag was not given, they are at liberty to do it > >> > * through complete recalculation. Note that a parameter change forces a > >> > * full recalculation in any case.) > >> > > >> > Now we know that except in rare cases (as just discussed recently up > >> > thread) we can't implement rescan efficiently. > >> > > >> > So is this a planner bug (i.e., should we try not to generate > >> > incremental sort plans that require efficient rewind)? Or can we just > >> > remove that part of the assertion and know that we'll implement the > >> > rescan, albeit inefficiently? We already explicitly declare that we > >> > don't support backwards scanning, but I don't see a way to declare the > >> > same for rewind. > >> > ---------- > >> > > >> > So it seems to me that we can't disallow REWIND, and we have to > >> > support rescan, but, we could try to mitigate the effects (without a > >> > param change) with a materialize node, as noted below. > >> > > >> > [2]: Here, in response to my questioning above if this was a planner > >> > bug, I'd said: > >> > ---------- > >> > Other nodes seem to get a materialization node placed above them to > >> > support this case "better". Is that something we should be doing? > >> > ---------- > >> > > >> > I never got any reply on this point; if we _did_ introduce a > >> > materialize node here, then it would mean we could start disallowing > >> > REWIND again. See the email for full details of a specific plan that I > >> > encountered that reproduced this. > >> > > >> > Thoughts? > >> > > >> > > In the meantime, your question is primarily about making sure the > >> > > code/comments/etc. are consistent and not a behavioral problem or > >> > > failure you've seen in testing? > >> > > >> > Still want to confirm this is the case. > >> > > >> > James > >> > > >> > [1]: https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com > >> > [2]: https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com > >> > >> Looking at this more, I think this is definitely suspect. The current > >> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK -- > >> the former is definitely fine because we declare that we don't support > >> backwards scans. The latter seems like the same reasoning would apply, > >> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've > >> attached a patch to do that. > >> > >> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given > >> the comments/docs seem to suggest it's a hint for efficiency rather > >> than something that has to work or be declared as not implemented, so > >> it seems like one of the following should be the outcome: > >> > >> 1. "Disallow" it by only generating materialize nodes above the > >> incremental sort node if REWIND will be required. I'm not sure if this > >> would mean that incremental sort just wouldn't be useful in that case? > >> 2. Keep the existing implementation where we basically ignore REWIND > >> and use our more inefficient implementation. In this case, I believe > >> we need to stop shielding child nodes from REWIND, though, since we we > >> aren't actually storing the full result set and will instead be > >> re-executing the child nodes. > >> > >> I've attached a patch to take course (2), since it's the easiest to > >> implement. But I'd still like feedback on what we should do here, > >> because I don't feel like I actually know what the semantics expected > >> of the executor/planner are on this point. If we do go with this > >> approach, someone should verify my comments additions about > >> materialize nodes is correct. > > > > IMO this is more a comment issue than a code issue, i.e. the comment in > ExecInitIncrementalSort should not mention the REWIND flag at all, as > it's merely a suggestion that cheaper rescans would be nice, but it's > just that - AFAICS it's entirely legal to just ignore the flag and do > full recalc. That's exactly what various other nodes do, I think. > > The BACKWARD/MARK flags are different, because we explicitly check if a > node supports that (ExecSupportsBackwardScan/ExecSupportsMarkRestore) > and we say 'no' in both cases for incremental sort. > > So I think it's OK to leave the assert as it is and just remove the > REWIND flag from the comment. > > Regarding child nodes, I think it's perfectly fine to continue passing > the REWIND flag to them, even if incremental sort has to start from > scratch - we'll still have to read all the input from scratch, but if > the child node can make that cheaper, why not? > > I plan to apply something along the lines of v2-0002, with some comment > tweaks (e.g. the comment still says we're shielding child nodes from > REWIND, but that's no longer the case). Thanks. > >I also happened to noticed that in rescan we are always setting > >node->bounded = false. I was under the impression that > >ExecSetTupleBound would be called *after* ExecReScanIncrementalSort, > >but looking at both ExecSetTupleBound and ExecReScanSort, but it seems > >that the inverse is true. Therefore if we set this to false each time, > >then we lose any possibility of using the bounded optimization for all > >rescans. > > > >I've added a tiny patch (minus one line) to the earlier patch series > >to fix that. > > > > Yeah, that seems like a legit bug. Yep. > As for v2-0001, I don't quite understand why we needs this? AFAICS the > ExecSupportsMarkRestore function already returns "false" for incremental > sort, and we only explicitly list nodes that may return "true" in some > cases. Ah, yes, the default is already false, so we don't need to explicitly do that. Thanks, James
pgsql-hackers by date: