Thread: Incremental sorts and EXEC_FLAG_REWIND

Incremental sorts and EXEC_FLAG_REWIND

From
Michael Paquier
Date:
Hi,

When initializing an incremental sort node, we have the following as
of ExecInitIncrementalSort():
    /*
     * Incremental sort can't be used with either EXEC_FLAG_REWIND,
     * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
     * batches in the current sort state.
     */
     Assert((eflags & (EXEC_FLAG_BACKWARD |
                       EXEC_FLAG_MARK)) == 0);
While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
to begin with (because incremental sorts don't support rescans without
parameter changes, right?), the comment and the assertion are telling
a different story.  And I can see that child nodes of an
IncrementalSort one use a set of eflags where these three are removed:
    /*
     * Initialize child nodes.
     *
     * We shield the child node from the need to support REWIND, BACKWARD, or
     * MARK/RESTORE.
     */
     eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);

I can also spot one case in the regression tests where we actually pass
down a REWIND flag (see incremental_sort.sql) when initializing an
IncrementalSort node:
-- We force the planner to choose a plan with incremental sort on the right side
-- of a nested loop join node. That way we trigger the rescan code path.
set local enable_hashjoin = off;
set local enable_mergejoin = off;
set local enable_material = off;
set local enable_sort = off;
explain (costs off) select * from t left join (select * from (select *
from t order by a) v order by a, b) s on s.a = t.a where t.a in (1,
2);
select * from t left join (select * from (select * from t order by a)
v order by a, b) s on s.a = t.a where t.a in (1, 2);

Alexander, Tomas, any thoughts?
--
Michael

Attachment

Re: Incremental sorts and EXEC_FLAG_REWIND

From
James Coleman
Date:
On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi,
>
> When initializing an incremental sort node, we have the following as
> of ExecInitIncrementalSort():
>     /*
>      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
>      * batches in the current sort state.
>      */
>      Assert((eflags & (EXEC_FLAG_BACKWARD |
>                        EXEC_FLAG_MARK)) == 0);
> While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
> to begin with (because incremental sorts don't support rescans without
> parameter changes, right?), the comment and the assertion are telling
> a different story.

I remember changing this assertion in response to an issue I'd found
which led to rewriting the rescan implementation, but I must have
missed updating the comment.

> And I can see that child nodes of an
> IncrementalSort one use a set of eflags where these three are removed:
>     /*
>      * Initialize child nodes.
>      *
>      * We shield the child node from the need to support REWIND, BACKWARD, or
>      * MARK/RESTORE.
>      */
>      eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
>
> I can also spot one case in the regression tests where we actually pass
> down a REWIND flag (see incremental_sort.sql) when initializing an
> IncrementalSort node:
> -- We force the planner to choose a plan with incremental sort on the right side
> -- of a nested loop join node. That way we trigger the rescan code path.
> set local enable_hashjoin = off;
> set local enable_mergejoin = off;
> set local enable_material = off;
> set local enable_sort = off;
> explain (costs off) select * from t left join (select * from (select *
> from t order by a) v order by a, b) s on s.a = t.a where t.a in (1,
> 2);
> select * from t left join (select * from (select * from t order by a)
> v order by a, b) s on s.a = t.a where t.a in (1, 2);
>
> Alexander, Tomas, any thoughts?

I'll try to respond more fully later today when I can dig up the
specific change.

In the meantime, your question is primarily about making sure the
code/comments/etc. are consistent and not a behavioral problem or
failure you've seen in testing?

James



Re: Incremental sorts and EXEC_FLAG_REWIND

From
James Coleman
Date:
On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > Hi,
> >
> > When initializing an incremental sort node, we have the following as
> > of ExecInitIncrementalSort():
> >     /*
> >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
> >      * batches in the current sort state.
> >      */
> >      Assert((eflags & (EXEC_FLAG_BACKWARD |
> >                        EXEC_FLAG_MARK)) == 0);
> > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
> > to begin with (because incremental sorts don't support rescans without
> > parameter changes, right?), the comment and the assertion are telling
> > a different story.
>
> I remember changing this assertion in response to an issue I'd found
> which led to rewriting the rescan implementation, but I must have
> missed updating the comment.

All right, here are the most relevant messages:

[1]: Here I'd said:
----------
While working on finding a test case to show rescan isn't implemented
properly yet, I came across a bug. At the top of
ExecInitIncrementalSort, we assert that eflags does not contain
EXEC_FLAG_REWIND. But the following query (with merge and hash joins
disabled) breaks that assertion:

select * from t join (select * from t order by a, b) s on s.a = t.a
where t.a in (1,2);

The comments about this flag in src/include/executor/executor.h say:

* REWIND indicates that the plan node should try to efficiently support
* rescans without parameter changes. (Nodes must support ExecReScan calls
* in any case, but if this flag was not given, they are at liberty to do it
* through complete recalculation. Note that a parameter change forces a
* full recalculation in any case.)

Now we know that except in rare cases (as just discussed recently up
thread) we can't implement rescan efficiently.

So is this a planner bug (i.e., should we try not to generate
incremental sort plans that require efficient rewind)? Or can we just
remove that part of the assertion and know that we'll implement the
rescan, albeit inefficiently? We already explicitly declare that we
don't support backwards scanning, but I don't see a way to declare the
same for rewind.
----------

So it seems to me that we can't disallow REWIND, and we have to
support rescan, but, we could try to mitigate the effects (without a
param change) with a materialize node, as noted below.

[2]: Here, in response to my questioning above if this was a planner
bug, I'd said:
----------
Other nodes seem to get a materialization node placed above them to
support this case "better". Is that something we should be doing?
----------

I never got any reply on this point; if we _did_ introduce a
materialize node here, then it would mean we could start disallowing
REWIND again. See the email for full details of a specific plan that I
encountered that reproduced this.

Thoughts?

> In the meantime, your question is primarily about making sure the
> code/comments/etc. are consistent and not a behavioral problem or
> failure you've seen in testing?

Still want to confirm this is the case.

James

[1]: https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com



Re: Incremental sorts and EXEC_FLAG_REWIND

From
James Coleman
Date:
On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > Hi,
> > >
> > > When initializing an incremental sort node, we have the following as
> > > of ExecInitIncrementalSort():
> > >     /*
> > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
> > >      * batches in the current sort state.
> > >      */
> > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
> > >                        EXEC_FLAG_MARK)) == 0);
> > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
> > > to begin with (because incremental sorts don't support rescans without
> > > parameter changes, right?), the comment and the assertion are telling
> > > a different story.
> >
> > I remember changing this assertion in response to an issue I'd found
> > which led to rewriting the rescan implementation, but I must have
> > missed updating the comment.
>
> All right, here are the most relevant messages:
>
> [1]: Here I'd said:
> ----------
> While working on finding a test case to show rescan isn't implemented
> properly yet, I came across a bug. At the top of
> ExecInitIncrementalSort, we assert that eflags does not contain
> EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> disabled) breaks that assertion:
>
> select * from t join (select * from t order by a, b) s on s.a = t.a
> where t.a in (1,2);
>
> The comments about this flag in src/include/executor/executor.h say:
>
> * REWIND indicates that the plan node should try to efficiently support
> * rescans without parameter changes. (Nodes must support ExecReScan calls
> * in any case, but if this flag was not given, they are at liberty to do it
> * through complete recalculation. Note that a parameter change forces a
> * full recalculation in any case.)
>
> Now we know that except in rare cases (as just discussed recently up
> thread) we can't implement rescan efficiently.
>
> So is this a planner bug (i.e., should we try not to generate
> incremental sort plans that require efficient rewind)? Or can we just
> remove that part of the assertion and know that we'll implement the
> rescan, albeit inefficiently? We already explicitly declare that we
> don't support backwards scanning, but I don't see a way to declare the
> same for rewind.
> ----------
>
> So it seems to me that we can't disallow REWIND, and we have to
> support rescan, but, we could try to mitigate the effects (without a
> param change) with a materialize node, as noted below.
>
> [2]: Here, in response to my questioning above if this was a planner
> bug, I'd said:
> ----------
> Other nodes seem to get a materialization node placed above them to
> support this case "better". Is that something we should be doing?
> ----------
>
> I never got any reply on this point; if we _did_ introduce a
> materialize node here, then it would mean we could start disallowing
> REWIND again. See the email for full details of a specific plan that I
> encountered that reproduced this.
>
> Thoughts?
>
> > In the meantime, your question is primarily about making sure the
> > code/comments/etc. are consistent and not a behavioral problem or
> > failure you've seen in testing?
>
> Still want to confirm this is the case.
>
> James
>
> [1]: https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
> [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com

Looking at this more, I think this is definitely suspect. The current
code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
the former is definitely fine because we declare that we don't support
backwards scans. The latter seems like the same reasoning would apply,
but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
attached a patch to do that.

The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
the comments/docs seem to suggest it's a hint for efficiency rather
than something that has to work or be declared as not implemented, so
it seems like one of the following should be the outcome:

1. "Disallow" it by only generating materialize nodes above the
incremental sort node if REWIND will be required. I'm not sure if this
would mean that incremental sort just wouldn't be useful in that case?
2. Keep the existing implementation where we basically ignore REWIND
and use our more inefficient implementation. In this case, I believe
we need to stop shielding child nodes from REWIND, though, since we we
aren't actually storing the full result set and will instead be
re-executing the child nodes.

I've attached a patch to take course (2), since it's the easiest to
implement. But I'd still like feedback on what we should do here,
because I don't feel like I actually know what the semantics expected
of the executor/planner are on this point. If we do go with this
approach, someone should verify my comments additions about
materialize nodes is correct.

James

Attachment

Re: Incremental sorts and EXEC_FLAG_REWIND

From
James Coleman
Date:
On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
> > > >
> > > > Hi,
> > > >
> > > > When initializing an incremental sort node, we have the following as
> > > > of ExecInitIncrementalSort():
> > > >     /*
> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
> > > >      * batches in the current sort state.
> > > >      */
> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
> > > >                        EXEC_FLAG_MARK)) == 0);
> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
> > > > to begin with (because incremental sorts don't support rescans without
> > > > parameter changes, right?), the comment and the assertion are telling
> > > > a different story.
> > >
> > > I remember changing this assertion in response to an issue I'd found
> > > which led to rewriting the rescan implementation, but I must have
> > > missed updating the comment.
> >
> > All right, here are the most relevant messages:
> >
> > [1]: Here I'd said:
> > ----------
> > While working on finding a test case to show rescan isn't implemented
> > properly yet, I came across a bug. At the top of
> > ExecInitIncrementalSort, we assert that eflags does not contain
> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> > disabled) breaks that assertion:
> >
> > select * from t join (select * from t order by a, b) s on s.a = t.a
> > where t.a in (1,2);
> >
> > The comments about this flag in src/include/executor/executor.h say:
> >
> > * REWIND indicates that the plan node should try to efficiently support
> > * rescans without parameter changes. (Nodes must support ExecReScan calls
> > * in any case, but if this flag was not given, they are at liberty to do it
> > * through complete recalculation. Note that a parameter change forces a
> > * full recalculation in any case.)
> >
> > Now we know that except in rare cases (as just discussed recently up
> > thread) we can't implement rescan efficiently.
> >
> > So is this a planner bug (i.e., should we try not to generate
> > incremental sort plans that require efficient rewind)? Or can we just
> > remove that part of the assertion and know that we'll implement the
> > rescan, albeit inefficiently? We already explicitly declare that we
> > don't support backwards scanning, but I don't see a way to declare the
> > same for rewind.
> > ----------
> >
> > So it seems to me that we can't disallow REWIND, and we have to
> > support rescan, but, we could try to mitigate the effects (without a
> > param change) with a materialize node, as noted below.
> >
> > [2]: Here, in response to my questioning above if this was a planner
> > bug, I'd said:
> > ----------
> > Other nodes seem to get a materialization node placed above them to
> > support this case "better". Is that something we should be doing?
> > ----------
> >
> > I never got any reply on this point; if we _did_ introduce a
> > materialize node here, then it would mean we could start disallowing
> > REWIND again. See the email for full details of a specific plan that I
> > encountered that reproduced this.
> >
> > Thoughts?
> >
> > > In the meantime, your question is primarily about making sure the
> > > code/comments/etc. are consistent and not a behavioral problem or
> > > failure you've seen in testing?
> >
> > Still want to confirm this is the case.
> >
> > James
> >
> > [1]:
https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
> > [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>
> Looking at this more, I think this is definitely suspect. The current
> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
> the former is definitely fine because we declare that we don't support
> backwards scans. The latter seems like the same reasoning would apply,
> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
> attached a patch to do that.
>
> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
> the comments/docs seem to suggest it's a hint for efficiency rather
> than something that has to work or be declared as not implemented, so
> it seems like one of the following should be the outcome:
>
> 1. "Disallow" it by only generating materialize nodes above the
> incremental sort node if REWIND will be required. I'm not sure if this
> would mean that incremental sort just wouldn't be useful in that case?
> 2. Keep the existing implementation where we basically ignore REWIND
> and use our more inefficient implementation. In this case, I believe
> we need to stop shielding child nodes from REWIND, though, since we we
> aren't actually storing the full result set and will instead be
> re-executing the child nodes.
>
> I've attached a patch to take course (2), since it's the easiest to
> implement. But I'd still like feedback on what we should do here,
> because I don't feel like I actually know what the semantics expected
> of the executor/planner are on this point. If we do go with this
> approach, someone should verify my comments additions about
> materialize nodes is correct.

I also happened to noticed that in rescan we are always setting
node->bounded = false. I was under the impression that
ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
that the inverse is true. Therefore if we set this to false each time,
then we lose any possibility of using the bounded optimization for all
rescans.

I've added a tiny patch (minus one line) to the earlier patch series
to fix that.


James

Attachment

Re: Incremental sorts and EXEC_FLAG_REWIND

From
Tomas Vondra
Date:
On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
>On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
>>
>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
>> >
>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
>> > >
>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > When initializing an incremental sort node, we have the following as
>> > > > of ExecInitIncrementalSort():
>> > > >     /*
>> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
>> > > >      * batches in the current sort state.
>> > > >      */
>> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
>> > > >                        EXEC_FLAG_MARK)) == 0);
>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
>> > > > to begin with (because incremental sorts don't support rescans without
>> > > > parameter changes, right?), the comment and the assertion are telling
>> > > > a different story.
>> > >
>> > > I remember changing this assertion in response to an issue I'd found
>> > > which led to rewriting the rescan implementation, but I must have
>> > > missed updating the comment.
>> >
>> > All right, here are the most relevant messages:
>> >
>> > [1]: Here I'd said:
>> > ----------
>> > While working on finding a test case to show rescan isn't implemented
>> > properly yet, I came across a bug. At the top of
>> > ExecInitIncrementalSort, we assert that eflags does not contain
>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
>> > disabled) breaks that assertion:
>> >
>> > select * from t join (select * from t order by a, b) s on s.a = t.a
>> > where t.a in (1,2);
>> >
>> > The comments about this flag in src/include/executor/executor.h say:
>> >
>> > * REWIND indicates that the plan node should try to efficiently support
>> > * rescans without parameter changes. (Nodes must support ExecReScan calls
>> > * in any case, but if this flag was not given, they are at liberty to do it
>> > * through complete recalculation. Note that a parameter change forces a
>> > * full recalculation in any case.)
>> >
>> > Now we know that except in rare cases (as just discussed recently up
>> > thread) we can't implement rescan efficiently.
>> >
>> > So is this a planner bug (i.e., should we try not to generate
>> > incremental sort plans that require efficient rewind)? Or can we just
>> > remove that part of the assertion and know that we'll implement the
>> > rescan, albeit inefficiently? We already explicitly declare that we
>> > don't support backwards scanning, but I don't see a way to declare the
>> > same for rewind.
>> > ----------
>> >
>> > So it seems to me that we can't disallow REWIND, and we have to
>> > support rescan, but, we could try to mitigate the effects (without a
>> > param change) with a materialize node, as noted below.
>> >
>> > [2]: Here, in response to my questioning above if this was a planner
>> > bug, I'd said:
>> > ----------
>> > Other nodes seem to get a materialization node placed above them to
>> > support this case "better". Is that something we should be doing?
>> > ----------
>> >
>> > I never got any reply on this point; if we _did_ introduce a
>> > materialize node here, then it would mean we could start disallowing
>> > REWIND again. See the email for full details of a specific plan that I
>> > encountered that reproduced this.
>> >
>> > Thoughts?
>> >
>> > > In the meantime, your question is primarily about making sure the
>> > > code/comments/etc. are consistent and not a behavioral problem or
>> > > failure you've seen in testing?
>> >
>> > Still want to confirm this is the case.
>> >
>> > James
>> >
>> > [1]:
https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
>> > [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>>
>> Looking at this more, I think this is definitely suspect. The current
>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
>> the former is definitely fine because we declare that we don't support
>> backwards scans. The latter seems like the same reasoning would apply,
>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
>> attached a patch to do that.
>>
>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
>> the comments/docs seem to suggest it's a hint for efficiency rather
>> than something that has to work or be declared as not implemented, so
>> it seems like one of the following should be the outcome:
>>
>> 1. "Disallow" it by only generating materialize nodes above the
>> incremental sort node if REWIND will be required. I'm not sure if this
>> would mean that incremental sort just wouldn't be useful in that case?
>> 2. Keep the existing implementation where we basically ignore REWIND
>> and use our more inefficient implementation. In this case, I believe
>> we need to stop shielding child nodes from REWIND, though, since we we
>> aren't actually storing the full result set and will instead be
>> re-executing the child nodes.
>>
>> I've attached a patch to take course (2), since it's the easiest to
>> implement. But I'd still like feedback on what we should do here,
>> because I don't feel like I actually know what the semantics expected
>> of the executor/planner are on this point. If we do go with this
>> approach, someone should verify my comments additions about
>> materialize nodes is correct.
>
>I also happened to noticed that in rescan we are always setting
>node->bounded = false. I was under the impression that
>ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
>but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
>that the inverse is true. Therefore if we set this to false each time,
>then we lose any possibility of using the bounded optimization for all
>rescans.
>
>I've added a tiny patch (minus one line) to the earlier patch series
>to fix that.
>

Thanks. I'll take a look at the issue and fixes.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Incremental sorts and EXEC_FLAG_REWIND

From
"Jonathan S. Katz"
Date:
On 4/24/20 6:57 PM, Tomas Vondra wrote:
> On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
>> On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
>>>
>>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
>>> >
>>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com>
>>> wrote:
>>> > >
>>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier
>>> <michael@paquier.xyz> wrote:
>>> > > >
>>> > > > Hi,
>>> > > >
>>> > > > When initializing an incremental sort node, we have the
>>> following as
>>> > > > of ExecInitIncrementalSort():
>>> > > >     /*
>>> > > >      * Incremental sort can't be used with either
>>> EXEC_FLAG_REWIND,
>>> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only
>>> one of many sort
>>> > > >      * batches in the current sort state.
>>> > > >      */
>>> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
>>> > > >                        EXEC_FLAG_MARK)) == 0);
>>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be
>>> allowed here
>>> > > > to begin with (because incremental sorts don't support rescans
>>> without
>>> > > > parameter changes, right?), the comment and the assertion are
>>> telling
>>> > > > a different story.
>>> > >
>>> > > I remember changing this assertion in response to an issue I'd found
>>> > > which led to rewriting the rescan implementation, but I must have
>>> > > missed updating the comment.
>>> >
>>> > All right, here are the most relevant messages:
>>> >
>>> > [1]: Here I'd said:
>>> > ----------
>>> > While working on finding a test case to show rescan isn't implemented
>>> > properly yet, I came across a bug. At the top of
>>> > ExecInitIncrementalSort, we assert that eflags does not contain
>>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
>>> > disabled) breaks that assertion:
>>> >
>>> > select * from t join (select * from t order by a, b) s on s.a = t.a
>>> > where t.a in (1,2);
>>> >
>>> > The comments about this flag in src/include/executor/executor.h say:
>>> >
>>> > * REWIND indicates that the plan node should try to efficiently
>>> support
>>> > * rescans without parameter changes. (Nodes must support ExecReScan
>>> calls
>>> > * in any case, but if this flag was not given, they are at liberty
>>> to do it
>>> > * through complete recalculation. Note that a parameter change
>>> forces a
>>> > * full recalculation in any case.)
>>> >
>>> > Now we know that except in rare cases (as just discussed recently up
>>> > thread) we can't implement rescan efficiently.
>>> >
>>> > So is this a planner bug (i.e., should we try not to generate
>>> > incremental sort plans that require efficient rewind)? Or can we just
>>> > remove that part of the assertion and know that we'll implement the
>>> > rescan, albeit inefficiently? We already explicitly declare that we
>>> > don't support backwards scanning, but I don't see a way to declare the
>>> > same for rewind.
>>> > ----------
>>> >
>>> > So it seems to me that we can't disallow REWIND, and we have to
>>> > support rescan, but, we could try to mitigate the effects (without a
>>> > param change) with a materialize node, as noted below.
>>> >
>>> > [2]: Here, in response to my questioning above if this was a planner
>>> > bug, I'd said:
>>> > ----------
>>> > Other nodes seem to get a materialization node placed above them to
>>> > support this case "better". Is that something we should be doing?
>>> > ----------
>>> >
>>> > I never got any reply on this point; if we _did_ introduce a
>>> > materialize node here, then it would mean we could start disallowing
>>> > REWIND again. See the email for full details of a specific plan that I
>>> > encountered that reproduced this.
>>> >
>>> > Thoughts?
>>> >
>>> > > In the meantime, your question is primarily about making sure the
>>> > > code/comments/etc. are consistent and not a behavioral problem or
>>> > > failure you've seen in testing?
>>> >
>>> > Still want to confirm this is the case.
>>> >
>>> > James
>>> >
>>> > [1]:
>>> https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
>>>
>>> > [2]:
>>> https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>>>
>>>
>>> Looking at this more, I think this is definitely suspect. The current
>>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
>>> the former is definitely fine because we declare that we don't support
>>> backwards scans. The latter seems like the same reasoning would apply,
>>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
>>> attached a patch to do that.
>>>
>>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
>>> the comments/docs seem to suggest it's a hint for efficiency rather
>>> than something that has to work or be declared as not implemented, so
>>> it seems like one of the following should be the outcome:
>>>
>>> 1. "Disallow" it by only generating materialize nodes above the
>>> incremental sort node if REWIND will be required. I'm not sure if this
>>> would mean that incremental sort just wouldn't be useful in that case?
>>> 2. Keep the existing implementation where we basically ignore REWIND
>>> and use our more inefficient implementation. In this case, I believe
>>> we need to stop shielding child nodes from REWIND, though, since we we
>>> aren't actually storing the full result set and will instead be
>>> re-executing the child nodes.
>>>
>>> I've attached a patch to take course (2), since it's the easiest to
>>> implement. But I'd still like feedback on what we should do here,
>>> because I don't feel like I actually know what the semantics expected
>>> of the executor/planner are on this point. If we do go with this
>>> approach, someone should verify my comments additions about
>>> materialize nodes is correct.
>>
>> I also happened to noticed that in rescan we are always setting
>> node->bounded = false. I was under the impression that
>> ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
>> but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
>> that the inverse is true. Therefore if we set this to false each time,
>> then we lose any possibility of using the bounded optimization for all
>> rescans.
>>
>> I've added a tiny patch (minus one line) to the earlier patch series
>> to fix that.
>>
>
> Thanks. I'll take a look at the issue and fixes.

With Beta 1 just around the corner[1], I wanted to check in to see if
this was closer to being committed so we could close off the open
item[2] prior the beta release.

Thanks!

Jonathan

[1]
https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e6d6@postgresql.org
[2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items


Attachment

Re: Incremental sorts and EXEC_FLAG_REWIND

From
James Coleman
Date:
On Thu, May 7, 2020 at 2:57 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> On 4/24/20 6:57 PM, Tomas Vondra wrote:
> > On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
> >> On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
> >>>
> >>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
> >>> >
> >>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com>
> >>> wrote:
> >>> > >
> >>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier
> >>> <michael@paquier.xyz> wrote:
> >>> > > >
> >>> > > > Hi,
> >>> > > >
> >>> > > > When initializing an incremental sort node, we have the
> >>> following as
> >>> > > > of ExecInitIncrementalSort():
> >>> > > >     /*
> >>> > > >      * Incremental sort can't be used with either
> >>> EXEC_FLAG_REWIND,
> >>> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only
> >>> one of many sort
> >>> > > >      * batches in the current sort state.
> >>> > > >      */
> >>> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
> >>> > > >                        EXEC_FLAG_MARK)) == 0);
> >>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be
> >>> allowed here
> >>> > > > to begin with (because incremental sorts don't support rescans
> >>> without
> >>> > > > parameter changes, right?), the comment and the assertion are
> >>> telling
> >>> > > > a different story.
> >>> > >
> >>> > > I remember changing this assertion in response to an issue I'd found
> >>> > > which led to rewriting the rescan implementation, but I must have
> >>> > > missed updating the comment.
> >>> >
> >>> > All right, here are the most relevant messages:
> >>> >
> >>> > [1]: Here I'd said:
> >>> > ----------
> >>> > While working on finding a test case to show rescan isn't implemented
> >>> > properly yet, I came across a bug. At the top of
> >>> > ExecInitIncrementalSort, we assert that eflags does not contain
> >>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> >>> > disabled) breaks that assertion:
> >>> >
> >>> > select * from t join (select * from t order by a, b) s on s.a = t.a
> >>> > where t.a in (1,2);
> >>> >
> >>> > The comments about this flag in src/include/executor/executor.h say:
> >>> >
> >>> > * REWIND indicates that the plan node should try to efficiently
> >>> support
> >>> > * rescans without parameter changes. (Nodes must support ExecReScan
> >>> calls
> >>> > * in any case, but if this flag was not given, they are at liberty
> >>> to do it
> >>> > * through complete recalculation. Note that a parameter change
> >>> forces a
> >>> > * full recalculation in any case.)
> >>> >
> >>> > Now we know that except in rare cases (as just discussed recently up
> >>> > thread) we can't implement rescan efficiently.
> >>> >
> >>> > So is this a planner bug (i.e., should we try not to generate
> >>> > incremental sort plans that require efficient rewind)? Or can we just
> >>> > remove that part of the assertion and know that we'll implement the
> >>> > rescan, albeit inefficiently? We already explicitly declare that we
> >>> > don't support backwards scanning, but I don't see a way to declare the
> >>> > same for rewind.
> >>> > ----------
> >>> >
> >>> > So it seems to me that we can't disallow REWIND, and we have to
> >>> > support rescan, but, we could try to mitigate the effects (without a
> >>> > param change) with a materialize node, as noted below.
> >>> >
> >>> > [2]: Here, in response to my questioning above if this was a planner
> >>> > bug, I'd said:
> >>> > ----------
> >>> > Other nodes seem to get a materialization node placed above them to
> >>> > support this case "better". Is that something we should be doing?
> >>> > ----------
> >>> >
> >>> > I never got any reply on this point; if we _did_ introduce a
> >>> > materialize node here, then it would mean we could start disallowing
> >>> > REWIND again. See the email for full details of a specific plan that I
> >>> > encountered that reproduced this.
> >>> >
> >>> > Thoughts?
> >>> >
> >>> > > In the meantime, your question is primarily about making sure the
> >>> > > code/comments/etc. are consistent and not a behavioral problem or
> >>> > > failure you've seen in testing?
> >>> >
> >>> > Still want to confirm this is the case.
> >>> >
> >>> > James
> >>> >
> >>> > [1]:
> >>> https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
> >>>
> >>> > [2]:
> >>> https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
> >>>
> >>>
> >>> Looking at this more, I think this is definitely suspect. The current
> >>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
> >>> the former is definitely fine because we declare that we don't support
> >>> backwards scans. The latter seems like the same reasoning would apply,
> >>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
> >>> attached a patch to do that.
> >>>
> >>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
> >>> the comments/docs seem to suggest it's a hint for efficiency rather
> >>> than something that has to work or be declared as not implemented, so
> >>> it seems like one of the following should be the outcome:
> >>>
> >>> 1. "Disallow" it by only generating materialize nodes above the
> >>> incremental sort node if REWIND will be required. I'm not sure if this
> >>> would mean that incremental sort just wouldn't be useful in that case?
> >>> 2. Keep the existing implementation where we basically ignore REWIND
> >>> and use our more inefficient implementation. In this case, I believe
> >>> we need to stop shielding child nodes from REWIND, though, since we we
> >>> aren't actually storing the full result set and will instead be
> >>> re-executing the child nodes.
> >>>
> >>> I've attached a patch to take course (2), since it's the easiest to
> >>> implement. But I'd still like feedback on what we should do here,
> >>> because I don't feel like I actually know what the semantics expected
> >>> of the executor/planner are on this point. If we do go with this
> >>> approach, someone should verify my comments additions about
> >>> materialize nodes is correct.
> >>
> >> I also happened to noticed that in rescan we are always setting
> >> node->bounded = false. I was under the impression that
> >> ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
> >> but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
> >> that the inverse is true. Therefore if we set this to false each time,
> >> then we lose any possibility of using the bounded optimization for all
> >> rescans.
> >>
> >> I've added a tiny patch (minus one line) to the earlier patch series
> >> to fix that.
> >>
> >
> > Thanks. I'll take a look at the issue and fixes.
>
> With Beta 1 just around the corner[1], I wanted to check in to see if
> this was closer to being committed so we could close off the open
> item[2] prior the beta release.
>
> Thanks!
>
> Jonathan
>
> [1]
> https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e6d6@postgresql.org
> [2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items


Tangential, I know, but I think we should consider [1] (includes a
very minor patch) part of the open items on incremental sort also:

James

[1]: https://www.postgresql.org/message-id/CAEudQAqxf+mbirkO7pAdL61Qw8U8cF_QnEaL101L0tbBUocoQg@mail.gmail.com



Re: Incremental sorts and EXEC_FLAG_REWIND

From
Tomas Vondra
Date:
On Thu, May 07, 2020 at 03:58:47PM -0400, James Coleman wrote:
>On Thu, May 7, 2020 at 2:57 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>>
>> ...
>>
>> With Beta 1 just around the corner[1], I wanted to check in to see if
>> this was closer to being committed so we could close off the open
>> item[2] prior the beta release.
>>
>> Thanks!
>>
>> Jonathan
>>
>> [1]
>> https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e6d6@postgresql.org
>> [2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
>
>
>Tangential, I know, but I think we should consider [1] (includes a
>very minor patch) part of the open items on incremental sort also:
>

Yes, I do plan to get both those issues fixed in the next coupld of days
(certainly in time for beta 1).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Incremental sorts and EXEC_FLAG_REWIND

From
"Jonathan S. Katz"
Date:
On 5/7/20 5:07 PM, Tomas Vondra wrote:
> On Thu, May 07, 2020 at 03:58:47PM -0400, James Coleman wrote:
>> On Thu, May 7, 2020 at 2:57 PM Jonathan S. Katz <jkatz@postgresql.org>
>> wrote:
>>>
>>> ...
>>>
>>> With Beta 1 just around the corner[1], I wanted to check in to see if
>>> this was closer to being committed so we could close off the open
>>> item[2] prior the beta release.
>>>
>>> Thanks!
>>>
>>> Jonathan
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/b782a4ec-5e8e-21a7-f628-624be683e6d6@postgresql.org
>>>
>>> [2] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
>>
>>
>> Tangential, I know, but I think we should consider [1] (includes a
>> very minor patch) part of the open items on incremental sort also:
>>
>
> Yes, I do plan to get both those issues fixed in the next coupld of days
> (certainly in time for beta 1).

Excellent - thank you!

Jonathan


Attachment

Re: Incremental sorts and EXEC_FLAG_REWIND

From
Tomas Vondra
Date:
On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
>On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
>>
>> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
>> >
>> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
>> > >
>> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > When initializing an incremental sort node, we have the following as
>> > > > of ExecInitIncrementalSort():
>> > > >     /*
>> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
>> > > >      * batches in the current sort state.
>> > > >      */
>> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
>> > > >                        EXEC_FLAG_MARK)) == 0);
>> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
>> > > > to begin with (because incremental sorts don't support rescans without
>> > > > parameter changes, right?), the comment and the assertion are telling
>> > > > a different story.
>> > >
>> > > I remember changing this assertion in response to an issue I'd found
>> > > which led to rewriting the rescan implementation, but I must have
>> > > missed updating the comment.
>> >
>> > All right, here are the most relevant messages:
>> >
>> > [1]: Here I'd said:
>> > ----------
>> > While working on finding a test case to show rescan isn't implemented
>> > properly yet, I came across a bug. At the top of
>> > ExecInitIncrementalSort, we assert that eflags does not contain
>> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
>> > disabled) breaks that assertion:
>> >
>> > select * from t join (select * from t order by a, b) s on s.a = t.a
>> > where t.a in (1,2);
>> >
>> > The comments about this flag in src/include/executor/executor.h say:
>> >
>> > * REWIND indicates that the plan node should try to efficiently support
>> > * rescans without parameter changes. (Nodes must support ExecReScan calls
>> > * in any case, but if this flag was not given, they are at liberty to do it
>> > * through complete recalculation. Note that a parameter change forces a
>> > * full recalculation in any case.)
>> >
>> > Now we know that except in rare cases (as just discussed recently up
>> > thread) we can't implement rescan efficiently.
>> >
>> > So is this a planner bug (i.e., should we try not to generate
>> > incremental sort plans that require efficient rewind)? Or can we just
>> > remove that part of the assertion and know that we'll implement the
>> > rescan, albeit inefficiently? We already explicitly declare that we
>> > don't support backwards scanning, but I don't see a way to declare the
>> > same for rewind.
>> > ----------
>> >
>> > So it seems to me that we can't disallow REWIND, and we have to
>> > support rescan, but, we could try to mitigate the effects (without a
>> > param change) with a materialize node, as noted below.
>> >
>> > [2]: Here, in response to my questioning above if this was a planner
>> > bug, I'd said:
>> > ----------
>> > Other nodes seem to get a materialization node placed above them to
>> > support this case "better". Is that something we should be doing?
>> > ----------
>> >
>> > I never got any reply on this point; if we _did_ introduce a
>> > materialize node here, then it would mean we could start disallowing
>> > REWIND again. See the email for full details of a specific plan that I
>> > encountered that reproduced this.
>> >
>> > Thoughts?
>> >
>> > > In the meantime, your question is primarily about making sure the
>> > > code/comments/etc. are consistent and not a behavioral problem or
>> > > failure you've seen in testing?
>> >
>> > Still want to confirm this is the case.
>> >
>> > James
>> >
>> > [1]:
https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
>> > [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>>
>> Looking at this more, I think this is definitely suspect. The current
>> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
>> the former is definitely fine because we declare that we don't support
>> backwards scans. The latter seems like the same reasoning would apply,
>> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
>> attached a patch to do that.
>>
>> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
>> the comments/docs seem to suggest it's a hint for efficiency rather
>> than something that has to work or be declared as not implemented, so
>> it seems like one of the following should be the outcome:
>>
>> 1. "Disallow" it by only generating materialize nodes above the
>> incremental sort node if REWIND will be required. I'm not sure if this
>> would mean that incremental sort just wouldn't be useful in that case?
>> 2. Keep the existing implementation where we basically ignore REWIND
>> and use our more inefficient implementation. In this case, I believe
>> we need to stop shielding child nodes from REWIND, though, since we we
>> aren't actually storing the full result set and will instead be
>> re-executing the child nodes.
>>
>> I've attached a patch to take course (2), since it's the easiest to
>> implement. But I'd still like feedback on what we should do here,
>> because I don't feel like I actually know what the semantics expected
>> of the executor/planner are on this point. If we do go with this
>> approach, someone should verify my comments additions about
>> materialize nodes is correct.
>

IMO this is more a comment issue than a code issue, i.e. the comment in
ExecInitIncrementalSort should not mention the REWIND flag at all, as
it's merely a suggestion that cheaper rescans would be nice, but it's
just that - AFAICS it's entirely legal to just ignore the flag and do
full recalc. That's exactly what various other nodes do, I think.

The BACKWARD/MARK flags are different, because we explicitly check if a
node supports that (ExecSupportsBackwardScan/ExecSupportsMarkRestore)
and we say 'no' in both cases for incremental sort.

So I think it's OK to leave the assert as it is and just remove the
REWIND flag from the comment.

Regarding child nodes, I think it's perfectly fine to continue passing
the REWIND flag to them, even if incremental sort has to start from
scratch - we'll still have to read all the input from scratch, but if
the child node can make that cheaper, why not?

I plan to apply something along the lines of v2-0002, with some comment
tweaks (e.g. the comment still says we're shielding child nodes from
REWIND, but that's no longer the case).

>I also happened to noticed that in rescan we are always setting
>node->bounded = false. I was under the impression that
>ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
>but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
>that the inverse is true. Therefore if we set this to false each time,
>then we lose any possibility of using the bounded optimization for all
>rescans.
>
>I've added a tiny patch (minus one line) to the earlier patch series
>to fix that.
>

Yeah, that seems like a legit bug.


As for v2-0001, I don't quite understand why we needs this? AFAICS the
ExecSupportsMarkRestore function already returns "false" for incremental
sort, and we only explicitly list nodes that may return "true" in some
cases.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Incremental sorts and EXEC_FLAG_REWIND

From
James Coleman
Date:
On Fri, May 8, 2020 at 7:14 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
> >On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
> >>
> >> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
> >> >
> >> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
> >> > >
> >> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > When initializing an incremental sort node, we have the following as
> >> > > > of ExecInitIncrementalSort():
> >> > > >     /*
> >> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
> >> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
> >> > > >      * batches in the current sort state.
> >> > > >      */
> >> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
> >> > > >                        EXEC_FLAG_MARK)) == 0);
> >> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
> >> > > > to begin with (because incremental sorts don't support rescans without
> >> > > > parameter changes, right?), the comment and the assertion are telling
> >> > > > a different story.
> >> > >
> >> > > I remember changing this assertion in response to an issue I'd found
> >> > > which led to rewriting the rescan implementation, but I must have
> >> > > missed updating the comment.
> >> >
> >> > All right, here are the most relevant messages:
> >> >
> >> > [1]: Here I'd said:
> >> > ----------
> >> > While working on finding a test case to show rescan isn't implemented
> >> > properly yet, I came across a bug. At the top of
> >> > ExecInitIncrementalSort, we assert that eflags does not contain
> >> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> >> > disabled) breaks that assertion:
> >> >
> >> > select * from t join (select * from t order by a, b) s on s.a = t.a
> >> > where t.a in (1,2);
> >> >
> >> > The comments about this flag in src/include/executor/executor.h say:
> >> >
> >> > * REWIND indicates that the plan node should try to efficiently support
> >> > * rescans without parameter changes. (Nodes must support ExecReScan calls
> >> > * in any case, but if this flag was not given, they are at liberty to do it
> >> > * through complete recalculation. Note that a parameter change forces a
> >> > * full recalculation in any case.)
> >> >
> >> > Now we know that except in rare cases (as just discussed recently up
> >> > thread) we can't implement rescan efficiently.
> >> >
> >> > So is this a planner bug (i.e., should we try not to generate
> >> > incremental sort plans that require efficient rewind)? Or can we just
> >> > remove that part of the assertion and know that we'll implement the
> >> > rescan, albeit inefficiently? We already explicitly declare that we
> >> > don't support backwards scanning, but I don't see a way to declare the
> >> > same for rewind.
> >> > ----------
> >> >
> >> > So it seems to me that we can't disallow REWIND, and we have to
> >> > support rescan, but, we could try to mitigate the effects (without a
> >> > param change) with a materialize node, as noted below.
> >> >
> >> > [2]: Here, in response to my questioning above if this was a planner
> >> > bug, I'd said:
> >> > ----------
> >> > Other nodes seem to get a materialization node placed above them to
> >> > support this case "better". Is that something we should be doing?
> >> > ----------
> >> >
> >> > I never got any reply on this point; if we _did_ introduce a
> >> > materialize node here, then it would mean we could start disallowing
> >> > REWIND again. See the email for full details of a specific plan that I
> >> > encountered that reproduced this.
> >> >
> >> > Thoughts?
> >> >
> >> > > In the meantime, your question is primarily about making sure the
> >> > > code/comments/etc. are consistent and not a behavioral problem or
> >> > > failure you've seen in testing?
> >> >
> >> > Still want to confirm this is the case.
> >> >
> >> > James
> >> >
> >> > [1]:
https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
> >> > [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
> >>
> >> Looking at this more, I think this is definitely suspect. The current
> >> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
> >> the former is definitely fine because we declare that we don't support
> >> backwards scans. The latter seems like the same reasoning would apply,
> >> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
> >> attached a patch to do that.
> >>
> >> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
> >> the comments/docs seem to suggest it's a hint for efficiency rather
> >> than something that has to work or be declared as not implemented, so
> >> it seems like one of the following should be the outcome:
> >>
> >> 1. "Disallow" it by only generating materialize nodes above the
> >> incremental sort node if REWIND will be required. I'm not sure if this
> >> would mean that incremental sort just wouldn't be useful in that case?
> >> 2. Keep the existing implementation where we basically ignore REWIND
> >> and use our more inefficient implementation. In this case, I believe
> >> we need to stop shielding child nodes from REWIND, though, since we we
> >> aren't actually storing the full result set and will instead be
> >> re-executing the child nodes.
> >>
> >> I've attached a patch to take course (2), since it's the easiest to
> >> implement. But I'd still like feedback on what we should do here,
> >> because I don't feel like I actually know what the semantics expected
> >> of the executor/planner are on this point. If we do go with this
> >> approach, someone should verify my comments additions about
> >> materialize nodes is correct.
> >
>
> IMO this is more a comment issue than a code issue, i.e. the comment in
> ExecInitIncrementalSort should not mention the REWIND flag at all, as
> it's merely a suggestion that cheaper rescans would be nice, but it's
> just that - AFAICS it's entirely legal to just ignore the flag and do
> full recalc. That's exactly what various other nodes do, I think.
>
> The BACKWARD/MARK flags are different, because we explicitly check if a
> node supports that (ExecSupportsBackwardScan/ExecSupportsMarkRestore)
> and we say 'no' in both cases for incremental sort.
>
> So I think it's OK to leave the assert as it is and just remove the
> REWIND flag from the comment.
>
> Regarding child nodes, I think it's perfectly fine to continue passing
> the REWIND flag to them, even if incremental sort has to start from
> scratch - we'll still have to read all the input from scratch, but if
> the child node can make that cheaper, why not?
>
> I plan to apply something along the lines of v2-0002, with some comment
> tweaks (e.g. the comment still says we're shielding child nodes from
> REWIND, but that's no longer the case).

Thanks.

> >I also happened to noticed that in rescan we are always setting
> >node->bounded = false. I was under the impression that
> >ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
> >but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
> >that the inverse is true. Therefore if we set this to false each time,
> >then we lose any possibility of using the bounded optimization for all
> >rescans.
> >
> >I've added a tiny patch (minus one line) to the earlier patch series
> >to fix that.
> >
>
> Yeah, that seems like a legit bug.

Yep.

> As for v2-0001, I don't quite understand why we needs this? AFAICS the
> ExecSupportsMarkRestore function already returns "false" for incremental
> sort, and we only explicitly list nodes that may return "true" in some
> cases.

Ah, yes, the default is already false, so we don't need to explicitly do that.

Thanks,
James



Re: Incremental sorts and EXEC_FLAG_REWIND

From
Tomas Vondra
Date:
On Fri, May 08, 2020 at 07:36:38PM -0400, James Coleman wrote:
>On Fri, May 8, 2020 at 7:14 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Fri, Apr 24, 2020 at 04:35:02PM -0400, James Coleman wrote:
>> >On Sun, Apr 19, 2020 at 12:14 PM James Coleman <jtc331@gmail.com> wrote:
>> >>
>> >> On Wed, Apr 15, 2020 at 2:04 PM James Coleman <jtc331@gmail.com> wrote:
>> >> >
>> >> > On Wed, Apr 15, 2020 at 11:02 AM James Coleman <jtc331@gmail.com> wrote:
>> >> > >
>> >> > > On Tue, Apr 14, 2020 at 2:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>> >> > > >
>> >> > > > Hi,
>> >> > > >
>> >> > > > When initializing an incremental sort node, we have the following as
>> >> > > > of ExecInitIncrementalSort():
>> >> > > >     /*
>> >> > > >      * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>> >> > > >      * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we only one of many sort
>> >> > > >      * batches in the current sort state.
>> >> > > >      */
>> >> > > >      Assert((eflags & (EXEC_FLAG_BACKWARD |
>> >> > > >                        EXEC_FLAG_MARK)) == 0);
>> >> > > > While I don't quite follow why EXEC_FLAG_REWIND should be allowed here
>> >> > > > to begin with (because incremental sorts don't support rescans without
>> >> > > > parameter changes, right?), the comment and the assertion are telling
>> >> > > > a different story.
>> >> > >
>> >> > > I remember changing this assertion in response to an issue I'd found
>> >> > > which led to rewriting the rescan implementation, but I must have
>> >> > > missed updating the comment.
>> >> >
>> >> > All right, here are the most relevant messages:
>> >> >
>> >> > [1]: Here I'd said:
>> >> > ----------
>> >> > While working on finding a test case to show rescan isn't implemented
>> >> > properly yet, I came across a bug. At the top of
>> >> > ExecInitIncrementalSort, we assert that eflags does not contain
>> >> > EXEC_FLAG_REWIND. But the following query (with merge and hash joins
>> >> > disabled) breaks that assertion:
>> >> >
>> >> > select * from t join (select * from t order by a, b) s on s.a = t.a
>> >> > where t.a in (1,2);
>> >> >
>> >> > The comments about this flag in src/include/executor/executor.h say:
>> >> >
>> >> > * REWIND indicates that the plan node should try to efficiently support
>> >> > * rescans without parameter changes. (Nodes must support ExecReScan calls
>> >> > * in any case, but if this flag was not given, they are at liberty to do it
>> >> > * through complete recalculation. Note that a parameter change forces a
>> >> > * full recalculation in any case.)
>> >> >
>> >> > Now we know that except in rare cases (as just discussed recently up
>> >> > thread) we can't implement rescan efficiently.
>> >> >
>> >> > So is this a planner bug (i.e., should we try not to generate
>> >> > incremental sort plans that require efficient rewind)? Or can we just
>> >> > remove that part of the assertion and know that we'll implement the
>> >> > rescan, albeit inefficiently? We already explicitly declare that we
>> >> > don't support backwards scanning, but I don't see a way to declare the
>> >> > same for rewind.
>> >> > ----------
>> >> >
>> >> > So it seems to me that we can't disallow REWIND, and we have to
>> >> > support rescan, but, we could try to mitigate the effects (without a
>> >> > param change) with a materialize node, as noted below.
>> >> >
>> >> > [2]: Here, in response to my questioning above if this was a planner
>> >> > bug, I'd said:
>> >> > ----------
>> >> > Other nodes seem to get a materialization node placed above them to
>> >> > support this case "better". Is that something we should be doing?
>> >> > ----------
>> >> >
>> >> > I never got any reply on this point; if we _did_ introduce a
>> >> > materialize node here, then it would mean we could start disallowing
>> >> > REWIND again. See the email for full details of a specific plan that I
>> >> > encountered that reproduced this.
>> >> >
>> >> > Thoughts?
>> >> >
>> >> > > In the meantime, your question is primarily about making sure the
>> >> > > code/comments/etc. are consistent and not a behavioral problem or
>> >> > > failure you've seen in testing?
>> >> >
>> >> > Still want to confirm this is the case.
>> >> >
>> >> > James
>> >> >
>> >> > [1]:
https://www.postgresql.org/message-id/CAAaqYe9%2Bap2SbU_E2WaC4F9ZMF4oa%3DpJZ1NBwaKDMP6GFUA77g%40mail.gmail.com
>> >> > [2]:
https://www.postgresql.org/message-id/CAAaqYe-sOp2o%3DL7nvGZDJ6GsL9%3Db_ztrGE1rhyi%2BF82p3my2bQ%40mail.gmail.com
>> >>
>> >> Looking at this more, I think this is definitely suspect. The current
>> >> code shields lower nodes from EXEC_FLAG_BACKWARD and EXEC_FLAG_MARK --
>> >> the former is definitely fine because we declare that we don't support
>> >> backwards scans. The latter seems like the same reasoning would apply,
>> >> but unfortunately we didn't add it to ExecSupportsMarkRestore, so I've
>> >> attached a patch to do that.
>> >>
>> >> The EXEC_FLAG_REWIND situation though I'm still not clear on -- given
>> >> the comments/docs seem to suggest it's a hint for efficiency rather
>> >> than something that has to work or be declared as not implemented, so
>> >> it seems like one of the following should be the outcome:
>> >>
>> >> 1. "Disallow" it by only generating materialize nodes above the
>> >> incremental sort node if REWIND will be required. I'm not sure if this
>> >> would mean that incremental sort just wouldn't be useful in that case?
>> >> 2. Keep the existing implementation where we basically ignore REWIND
>> >> and use our more inefficient implementation. In this case, I believe
>> >> we need to stop shielding child nodes from REWIND, though, since we we
>> >> aren't actually storing the full result set and will instead be
>> >> re-executing the child nodes.
>> >>
>> >> I've attached a patch to take course (2), since it's the easiest to
>> >> implement. But I'd still like feedback on what we should do here,
>> >> because I don't feel like I actually know what the semantics expected
>> >> of the executor/planner are on this point. If we do go with this
>> >> approach, someone should verify my comments additions about
>> >> materialize nodes is correct.
>> >
>>
>> IMO this is more a comment issue than a code issue, i.e. the comment in
>> ExecInitIncrementalSort should not mention the REWIND flag at all, as
>> it's merely a suggestion that cheaper rescans would be nice, but it's
>> just that - AFAICS it's entirely legal to just ignore the flag and do
>> full recalc. That's exactly what various other nodes do, I think.
>>
>> The BACKWARD/MARK flags are different, because we explicitly check if a
>> node supports that (ExecSupportsBackwardScan/ExecSupportsMarkRestore)
>> and we say 'no' in both cases for incremental sort.
>>
>> So I think it's OK to leave the assert as it is and just remove the
>> REWIND flag from the comment.
>>
>> Regarding child nodes, I think it's perfectly fine to continue passing
>> the REWIND flag to them, even if incremental sort has to start from
>> scratch - we'll still have to read all the input from scratch, but if
>> the child node can make that cheaper, why not?
>>
>> I plan to apply something along the lines of v2-0002, with some comment
>> tweaks (e.g. the comment still says we're shielding child nodes from
>> REWIND, but that's no longer the case).
>
>Thanks.
>
>> >I also happened to noticed that in rescan we are always setting
>> >node->bounded = false. I was under the impression that
>> >ExecSetTupleBound would be called *after* ExecReScanIncrementalSort,
>> >but looking at both ExecSetTupleBound and ExecReScanSort, but it seems
>> >that the inverse is true. Therefore if we set this to false each time,
>> >then we lose any possibility of using the bounded optimization for all
>> >rescans.
>> >
>> >I've added a tiny patch (minus one line) to the earlier patch series
>> >to fix that.
>> >
>>
>> Yeah, that seems like a legit bug.
>
>Yep.
>
>> As for v2-0001, I don't quite understand why we needs this? AFAICS the
>> ExecSupportsMarkRestore function already returns "false" for incremental
>> sort, and we only explicitly list nodes that may return "true" in some
>> cases.
>
>Ah, yes, the default is already false, so we don't need to explicitly do that.
>

I've pushed the fixes, with some minor tweaks. Most importantly I think
we don't need to worry about removing the flags before initializing
child nodes, because (a) we want to pass REWIND and (b) we should not
see the other flags in incremental sort. 


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services