Re: Incremental sorts and EXEC_FLAG_REWIND - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Incremental sorts and EXEC_FLAG_REWIND
Date
Msg-id 20200508231417.ivhr2anwj2yp5t5h@development
Whole thread Raw
In response to Re: Incremental sorts and EXEC_FLAG_REWIND  (James Coleman <jtc331@gmail.com>)
Responses Re: Incremental sorts and EXEC_FLAG_REWIND
List pgsql-hackers
On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
>On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
>>
>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
>> >
>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
>> > >
>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > When initializing an incremental sort node, we have the following as
>> > > > of ExecInitIncrementalSort():
>> > > >     /*
>> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
>> > > >      * batches in the current sort state.
>> > > >      */
>> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
>> > > >                        EXEC_FLAG_MARK)) == 0);
>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
>> > > > to begin with (because incremental sorts don't support rescans without
>> > > > parameter changes, right?), the comment and the assertion are telling
>> > > > a different story.
>> > >
>> > > I remember changing this assertion in response to an issue I'd found
>> > > which led to rewriting the rescan implementation, but I must have
>> > > missed updating the comment.
>> >
>> > All right, here are the most relevant messages:
>> >
>> > [1]: Here I'd said:
>> > ----------
>> > While working on finding a test case to show rescan isn't implemented
>> > properly yet, I came across a bug. At the top of
>> > ExecInitIncrementalSort, we assert that eflags does not contain
>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
>> > disabled) breaks that assertion:
>> >
>> > select * from t join (select * from t order by a, b) s on s.a = t.a
>> > where t.a in (1,2);
>> >
>> > The comments about this flag in src/include/executor/executor.h say:
>> >
>> > * REWIND indicates that the plan node should try to efficiently support
>> > * rescans without parameter changes. (Nodes must support ExecReScan calls
>> > * in any case, but if this flag was not given, they are at liberty to do it
>> > * through complete recalculation. Note that a parameter change forces a
>> > * full recalculation in any case.)
>> >
>> > Now we know that except in rare cases (as just discussed recently up
>> > thread) we can't implement rescan efficiently.
>> >
>> > So is this a planner bug (i.e., should we try not to generate
>> > incremental sort plans that require efficient rewind)? Or can we just
>> > remove that part of the assertion and know that we'll implement the
>> > rescan, albeit inefficiently? We already explicitly declare that we
>> > don't support backwards scanning, but I don't see a way to declare the
>> > same for rewind.
>> > ----------
>> >
>> > So it seems to me that we can't disallow REWIND, and we have to
>> > support rescan, but, we could try to mitigate the effects (without a
>> > param change) with a materialize node, as noted below.
>> >
>> > [2]: Here, in response to my questioning above if this was a planner
>> > bug, I'd said:
>> > ----------
>> > Other nodes seem to get a materialization node placed above them to
>> > support this case "better". Is that something we should be doing?
>> > ----------
>> >
>> > I never got any reply on this point; if we _did_ introduce a
>> > materialize node here, then it would mean we could start disallowing
>> > REWIND again. See the email for full details of a specific plan that I
>> > encountered that reproduced this.
>> >
>> > Thoughts?
>> >
>> > > In the meantime, your question is primarily about making sure the
>> > > code/comments/etc. are consistent and not a behavioral problem or
>> > > failure you've seen in testing?
>> >
>> > Still want to confirm this is the case.
>> >
>> > James
>> >
>> > [1]:
https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
>> > [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>>
>> Looking at this more, I think this is definitely suspect. The current
>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
>> the former is definitely fine because we declare that we don't support
>> backwards scans. The latter seems like the same reasoning would apply,
>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
>> attached a patch to do that.
>>
>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
>> the comments/docs seem to suggest it's a hint for efficiency rather
>> than something that has to work or be declared as not implemented, so
>> it seems like one of the following should be the outcome:
>>
>> 1. "Disallow" it by only generating materialize nodes above the
>> incremental sort node if REWIND will be required. I'm not sure if this
>> would mean that incremental sort just wouldn't be useful in that case?
>> 2. Keep the existing implementation where we basically ignore REWIND
>> and use our more inefficient implementation. In this case, I believe
>> we need to stop shielding child nodes from REWIND, though, since we we
>> aren't actually storing the full result set and will instead be
>> re-executing the child nodes.
>>
>> I've attached a patch to take course (2), since it's the easiest to
>> implement. But I'd still like feedback on what we should do here,
>> because I don't feel like I actually know what the semantics expected
>> of the executor/planner are on this point. If we do go with this
>> approach, someone should verify my comments additions about
>> materialize nodes is correct.
>

IMO this is more a comment issue than a code issue, i.e. the comment in
ExecInitIncrementalSort should not mention the REWIND flag at all, as
it's merely a suggestion that cheaper rescans would be nice, but it's
just that - AFAICS it's entirely legal to just ignore the flag and do
full recalc. That's exactly what various other nodes do, I think.

The BACKWARD/MARK flags are different, because we explicitly check if a
node supports that (ExecSupportsBackwardScan/ExecSupportsMarkRestore)
and we say 'no' in both cases for incremental sort.

So I think it's OK to leave the assert as it is and just remove the
REWIND flag from the comment.

Regarding child nodes, I think it's perfectly fine to continue passing
the REWIND flag to them, even if incremental sort has to start from
scratch - we'll still have to read all the input from scratch, but if
the child node can make that cheaper, why not?

I plan to apply something along the lines of v2-0002, with some comment
tweaks (e.g. the comment still says we're shielding child nodes from
REWIND, but that's no longer the case).

>I also happened to noticed that in rescan we are always setting
>node->bounded = false. I was under the impression that
>ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
>but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
>that the inverse is true. Therefore if we set this to false each time,
>then we lose any possibility of using the bounded optimization for all
>rescans.
>
>I've added a tiny patch (minus one line) to the earlier patch series
>to fix that.
>

Yeah, that seems like a legit bug.


As for v2-0001, I don't quite understand why we needs this? AFAICS the
ExecSupportsMarkRestore function already returns "false" for incremental
sort, and we only explicitly list nodes that may return "true" in some
cases.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: [PATCH] Fix division by zero (explain.c)
Next
From: Tomas Vondra
Date:
Subject: Re: [PATCH] Fix division by zero (explain.c)