Re: Incremental sorts and EXEC_FLAG_REWIND - Mailing list pgsql-hackers
From | Jonathan S. Katz |
---|---|
Subject | Re: Incremental sorts and EXEC_FLAG_REWIND |
Date | |
Msg-id | a2c22860-9f1c-51f6-3997-3d04cf882dc7@postgresql.org Whole thread Raw |
In response to | Re: Incremental sorts and EXEC_FLAG_REWIND (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Incremental sorts and EXEC_FLAG_REWIND
|
List | pgsql-hackers |
On 4/24/20 6:57 PM, Tomas Vondra wrote: > On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote: >> On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote: >>> >>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote: >>> > >>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> >>> wrote: >>> > > >>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier >>> <michael@paquier.xyz> wrote: >>> > > > >>> > > > Hi, >>> > > > >>> > > > When initializing an incremental sort node, we have the >>> following as >>> > > > of ExecInitIncrementalSort(): >>> > > > /* >>> > > > * Incremental sort can't be used with either >>> EXEC_FLAG_REWIND, >>> > > > * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only >>> one of many sort >>> > > > * batches in the current sort state. >>> > > > */ >>> > > > Assert((eflags & (EXEC_FLAG_BACKWARD | >>> > > > EXEC_FLAG_MARK)) == 0); >>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be >>> allowed here >>> > > > to begin with (because incremental sorts don't support rescans >>> without >>> > > > parameter changes, right?), the comment and the assertion are >>> telling >>> > > > a different story. >>> > > >>> > > I remember changing this assertion in response to an issue I'd found >>> > > which led to rewriting the rescan implementation, but I must have >>> > > missed updating the comment. >>> > >>> > All right, here are the most relevant messages: >>> > >>> > [1]: Here I'd said: >>> > ---------- >>> > While working on finding a test case to show rescan isn't implemented >>> > properly yet, I came across a bug. At the top of >>> > ExecInitIncrementalSort, we assert that eflags does not contain >>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins >>> > disabled) breaks that assertion: >>> > >>> > select * from t join (select * from t order by a, b) s on s.a = t.a >>> > where t.a in (1,2); >>> > >>> > The comments about this flag in src/include/executor/executor.h say: >>> > >>> > * REWIND indicates that the plan node should try to efficiently >>> support >>> > * rescans without parameter changes. (Nodes must support ExecReScan >>> calls >>> > * in any case, but if this flag was not given, they are at liberty >>> to do it >>> > * through complete recalculation. Note that a parameter change >>> forces a >>> > * full recalculation in any case.) >>> > >>> > Now we know that except in rare cases (as just discussed recently up >>> > thread) we can't implement rescan efficiently. >>> > >>> > So is this a planner bug (i.e., should we try not to generate >>> > incremental sort plans that require efficient rewind)? Or can we just >>> > remove that part of the assertion and know that we'll implement the >>> > rescan, albeit inefficiently? We already explicitly declare that we >>> > don't support backwards scanning, but I don't see a way to declare the >>> > same for rewind. >>> > ---------- >>> > >>> > So it seems to me that we can't disallow REWIND, and we have to >>> > support rescan, but, we could try to mitigate the effects (without a >>> > param change) with a materialize node, as noted below. >>> > >>> > [2]: Here, in response to my questioning above if this was a planner >>> > bug, I'd said: >>> > ---------- >>> > Other nodes seem to get a materialization node placed above them to >>> > support this case "better". Is that something we should be doing? >>> > ---------- >>> > >>> > I never got any reply on this point; if we _did_ introduce a >>> > materialize node here, then it would mean we could start disallowing >>> > REWIND again. See the email for full details of a specific plan that I >>> > encountered that reproduced this. >>> > >>> > Thoughts? >>> > >>> > > In the meantime, your question is primarily about making sure the >>> > > code/comments/etc. are consistent and not a behavioral problem or >>> > > failure you've seen in testing? >>> > >>> > Still want to confirm this is the case. >>> > >>> > James >>> > >>> > [1]: >>> https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com >>> >>> > [2]: >>> https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com >>> >>> >>> Looking at this more, I think this is definitely suspect. The current >>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK -- >>> the former is definitely fine because we declare that we don't support >>> backwards scans. The latter seems like the same reasoning would apply, >>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've >>> attached a patch to do that. >>> >>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given >>> the comments/docs seem to suggest it's a hint for efficiency rather >>> than something that has to work or be declared as not implemented, so >>> it seems like one of the following should be the outcome: >>> >>> 1. "Disallow" it by only generating materialize nodes above the >>> incremental sort node if REWIND will be required. I'm not sure if this >>> would mean that incremental sort just wouldn't be useful in that case? >>> 2. Keep the existing implementation where we basically ignore REWIND >>> and use our more inefficient implementation. In this case, I believe >>> we need to stop shielding child nodes from REWIND, though, since we we >>> aren't actually storing the full result set and will instead be >>> re-executing the child nodes. >>> >>> I've attached a patch to take course (2), since it's the easiest to >>> implement. But I'd still like feedback on what we should do here, >>> because I don't feel like I actually know what the semantics expected >>> of the executor/planner are on this point. If we do go with this >>> approach, someone should verify my comments additions about >>> materialize nodes is correct. >> >> I also happened to noticed that in rescan we are always setting >> node->bounded = false. I was under the impression that >> ExecSetTupleBound would be called *after* ExecReScanIncrementalSort, >> but looking at both ExecSetTupleBound and ExecReScanSort, but it seems >> that the inverse is true. Therefore if we set this to false each time, >> then we lose any possibility of using the bounded optimization for all >> rescans. >> >> I've added a tiny patch (minus one line) to the earlier patch series >> to fix that. >> > > Thanks. I'll take a look at the issue and fixes. With Beta 1 just around the corner[1], I wanted to check in to see if this was closer to being committed so we could close off the open item[2] prior the beta release. Thanks! Jonathan [1] https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e6d6@postgresql.org [2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
Attachment
pgsql-hackers by date: