Thread: Doubts about pushing LIMIT to MergeAppendPath

Doubts about pushing LIMIT to MergeAppendPath

From: Antonin Houska

Review of [1] made me think of this optimization, currently used only in
create_merge_append_path():

    /*
     * Apply query-wide LIMIT if known and path is for sole base relation.
     * (Handling this at this low level is a bit klugy.)
     */
    if (bms_equal(rel->relids, root->all_baserels))
        pathnode->limit_tuples = root->limit_tuples;
    else
        pathnode->limit_tuples = -1.0;

Currently it's not a problem because the output of MergeAppend plan is not
likely to be re-sorted, but I don't think data correctness should depend on
cost evaluation. Instead, -1 should be set here if there's any chance that the
output will be sorted again.

I tried to reproduce the issue by applying the "Incremental sort" [2] patch
and running the following example:

CREATE TABLE t(i int, j int);
CREATE TABLE t1() INHERITS (t);
CREATE INDEX ON t1(i, j);
INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
CREATE TABLE t2() INHERITS (t);
CREATE INDEX ON t2(i, j);
INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);

ANALYZE;

SELECT * FROM t ORDER BY i, j DESC LIMIT 1;

I expected the MergeAppend plan to apply the limit and thus prevent the
incremental sort node from receiving the first tuple that it should emit;
however, the query yielded the correct result. I think the reason is that the
MergeAppendPath.limit_tuples field is only used for cost estimates and is not
enforced during execution. Is that intended?
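
If I read create_merge_append_path() correctly, limit_tuples seems to be used
only for costing - in particular the cost_sort() call for the children that
are not suitably ordered - and the MergeAppend plan node itself carries no
limit field. Roughly (abridged):

    foreach(l, subpaths)
    {
        Path       *subpath = (Path *) lfirst(l);

        if (pathkeys_contained_in(pathkeys, subpath->pathkeys))
        {
            /* Subpath is adequately ordered, no explicit sort is needed. */
            input_startup_cost += subpath->startup_cost;
            input_total_cost += subpath->total_cost;
        }
        else
        {
            /*
             * An explicit sort is needed, so estimate its cost; this is
             * where limit_tuples makes the bounded sort look cheaper.
             */
            Path        sort_path;      /* dummy for result of cost_sort */

            cost_sort(&sort_path,
                      root,
                      pathkeys,
                      subpath->total_cost,
                      subpath->parent->tuples,
                      subpath->pathtarget->width,
                      0.0,
                      work_mem,
                      pathnode->limit_tuples);
            input_startup_cost += sort_path.startup_cost;
            input_total_cost += sort_path.total_cost;
        }
    }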

I thought this could be a better approach to the limit push-down:

    if (root->limit_tuples > 0 && root->parse->sortClause == NIL &&
        bms_equal(rel->relids, root->all_baserels))
        pathnode->limit_tuples = root->limit_tuples;
    else
        pathnode->limit_tuples = -1.0;

however, it will stop working as soon as the incremental sort patch (which can
be used below the upper planner) gets applied.
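
Perhaps checking the path's pathkeys against the query ordering would be more
robust. Something like this completely untested sketch is what I have in mind,
though I'm not sure it covers all cases:

    /*
     * Only pass the limit down if this MergeAppend already produces the
     * ordering the query asks for, so that no re-sort (not even an
     * incremental one) can appear above it.
     */
    if (root->limit_tuples > 0 &&
        bms_equal(rel->relids, root->all_baserels) &&
        pathkeys_contained_in(root->query_pathkeys, pathkeys))
        pathnode->limit_tuples = root->limit_tuples;
    else
        pathnode->limit_tuples = -1.0;

That would at least not depend on what the upper planner does with the output,
but it is still specific to MergeAppendPath.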

Finally, I think we should be able to apply the limit to a generic path, not
only to MergeAppendPath. I just don't know how to check when it's correct. Does
anyone have an idea?

[1] https://commitfest.postgresql.org/20/1850/

[2] https://commitfest.postgresql.org/20/1124/

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Tomas Vondra

On 11/01/2018 12:48 PM, Antonin Houska wrote:
> Review of [1] made me think of this optimization, currently used only in
> create_merge_append_path():
> 
>     /*
>      * Apply query-wide LIMIT if known and path is for sole base relation.
>      * (Handling this at this low level is a bit klugy.)
>      */
>     if (bms_equal(rel->relids, root->all_baserels))
>         pathnode->limit_tuples = root->limit_tuples;
>     else
>         pathnode->limit_tuples = -1.0;
> 
> Currently it's not a problem because the output of MergeAppend plan is not
> likely to be re-sorted, but I don't think data correctness should depend on
> cost evaluation. Instead, -1 should be set here if there's any chance that the
> output will be sorted again.
> 

So you're saying we might end up with a plan like this:

    Limit
    -> Sort
        -> MergeAppend
           -> SeqScan on t

In which case we'd pass the wrong limit_tuples to the MergeAppend?

Hmmm, that would depend on us building a MergeAppend node that does not
match the expected pathkeys, and picking it instead of a plain Append node.
I'm not sure that's actually possible, but maybe it is ...

> I tried to reproduce the issue by applying the "Incremental sort" [2] patch
> and running the following example:
> 
> CREATE TABLE t(i int, j int);
> CREATE TABLE t1() INHERITS (t);
> CREATE INDEX ON t1(i, j);
> INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
> CREATE TABLE t2() INHERITS (t);
> CREATE INDEX ON t2(i, j);
> INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
> 
> ANALYZE;
> 
> SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
> 

So, what query plan did this use?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Tomas Vondra

On 11/01/2018 08:11 PM, Tomas Vondra wrote:
> On 11/01/2018 12:48 PM, Antonin Houska wrote:
>> Review of [1] made me think of this optimization, currently used only in
>> create_merge_append_path():
>>
>>     /*
>>      * Apply query-wide LIMIT if known and path is for sole base relation.
>>      * (Handling this at this low level is a bit klugy.)
>>      */
>>     if (bms_equal(rel->relids, root->all_baserels))
>>         pathnode->limit_tuples = root->limit_tuples;
>>     else
>>         pathnode->limit_tuples = -1.0;
>>
>> Currently it's not a problem because the output of MergeAppend plan is not
>> likely to be re-sorted, but I don't think data correctness should depend on
>> cost evaluation. Instead, -1 should be set here if there's any chance that the
>> output will be sorted again.
>>
> 
> So you're saying we might end up with a plan like this:
> 
>     Limit
>     -> Sort
>         -> MergeAppend
>            -> SeqScan on t
> 
> In which case we'd pass the wrong limit_tuples to the MergeAppend?
> 
> Hmmm, that would depend on us building MergeAppend node that does not
> match the expected pathkeys, and pick it instead of plain Append node.
> I'm not sure that's actually possible, but maybe it is ...
> 

OK, so the reason is that when building child paths, we don't keep the
pathkeys unless they match the "interesting" pathkeys.

So, for example, we may have an IndexPath, but with pathkeys=NIL if the
index does not match the ORDER BY we need. In that case we don't even build
the MergeAppend paths, as generate_mergeappend_paths iterates over child
pathkeys (and we don't have any).

We might have two child paths, one with pathkeys (matching the ORDER BY)
and one without pathkeys. But at that point the optimization does not
apply, because it checks for "single base relation".

At least that's my understanding of add_paths_to_append_rel().
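
Roughly - paraphrasing generate_mergeappend_paths() from memory and heavily
simplifying - it does something like this:

    /* all_child_pathkeys only contains orderings some child path provides */
    foreach(lcp, all_child_pathkeys)
    {
        List       *pathkeys = (List *) lfirst(lcp);
        List       *subpaths = NIL;
        ListCell   *lcr;

        foreach(lcr, live_childrels)
        {
            RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
            Path       *subpath;

            /* cheapest child path that already provides this ordering ... */
            subpath = get_cheapest_path_for_pathkeys(childrel->pathlist,
                                                     pathkeys, NULL,
                                                     TOTAL_COST, false);

            /* ... or fall back to cheapest-total, to be sorted below the
             * MergeAppend */
            if (subpath == NULL)
                subpath = childrel->cheapest_total_path;

            subpaths = lappend(subpaths, subpath);
        }

        add_path(rel, (Path *) create_merge_append_path(root, rel, subpaths,
                                                        pathkeys, NULL,
                                                        partitioned_rels));
    }

So the pathkeys of a MergeAppend always come from some child path, and the
children only keep pathkeys that are considered useful.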

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Antonin Houska

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On 11/01/2018 12:48 PM, Antonin Houska wrote:
> > Review of [1] made me think of this optimization, currently used only in
> > create_merge_append_path():
> >
> >     /*
> >      * Apply query-wide LIMIT if known and path is for sole base relation.
> >      * (Handling this at this low level is a bit klugy.)
> >      */
> >     if (bms_equal(rel->relids, root->all_baserels))
> >         pathnode->limit_tuples = root->limit_tuples;
> >     else
> >         pathnode->limit_tuples = -1.0;
> >
> > Currently it's not a problem because the output of MergeAppend plan is not
> > likely to be re-sorted, but I don't think data correctness should depend on
> > cost evaluation. Instead, -1 should be set here if there's any chance that the
> > output will be sorted again.
> >
>
> So you're saying we might end up with a plan like this:
>
>     Limit
>     -> Sort
>         -> MergeAppend
>            -> SeqScan on t
>
> In which case we'd pass the wrong limit_tuples to the MergeAppend?
>
> Hmmm, that would depend on us building MergeAppend node that does not
> match the expected pathkeys, and pick it instead of plain Append node.
> I'm not sure that's actually possible, but maybe it is ...

Yes, if a Sort node were needed above the MergeAppend, it would probably be
cheaper to use Append. So the problem should not happen in practice, but in
theory it seems to be possible.

>
> > I tried to reproduce the issue by applying the "Incremental sort" [2] patch
> > and running the following example:
> >
> > CREATE TABLE t(i int, j int);
> > CREATE TABLE t1() INHERITS (t);
> > CREATE INDEX ON t1(i, j);
> > INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
> > CREATE TABLE t2() INHERITS (t);
> > CREATE INDEX ON t2(i, j);
> > INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
> >
> > ANALYZE;
> >
> > SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
> >
>
> So, what query plan did this use?

 Limit
   ->  Incremental Sort
         Sort Key: t.i, t.j DESC
         Presorted Key: t.i
         ->  Merge Append
               Sort Key: t.i
               ->  Sort
                     Sort Key: t.i
                     ->  Seq Scan on t
               ->  Index Only Scan using t1_i_j_idx on t1
               ->  Index Only Scan using t2_i_j_idx on t2

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Tomas Vondra

On 11/01/2018 08:51 PM, Antonin Houska wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> 
>> On 11/01/2018 12:48 PM, Antonin Houska wrote:
>>> Review of [1] made me think of this optimization, currently used only in
>>> create_merge_append_path():
>>>
>>>     /*
>>>      * Apply query-wide LIMIT if known and path is for sole base relation.
>>>      * (Handling this at this low level is a bit klugy.)
>>>      */
>>>     if (bms_equal(rel->relids, root->all_baserels))
>>>         pathnode->limit_tuples = root->limit_tuples;
>>>     else
>>>         pathnode->limit_tuples = -1.0;
>>>
>>> Currently it's not a problem because the output of MergeAppend plan is not
>>> likely to be re-sorted, but I don't think data correctness should depend on
>>> cost evaluation. Instead, -1 should be set here if there's any chance that the
>>> output will be sorted again.
>>>
>>
>> So you're saying we might end up with a plan like this:
>>
>>     Limit
>>     -> Sort
>>         -> MergeAppend
>>            -> SeqScan on t
>>
>> In which case we'd pass the wrong limit_tuples to the MergeAppend?
>>
>> Hmmm, that would depend on us building MergeAppend node that does not
>> match the expected pathkeys, and pick it instead of plain Append node.
>> I'm not sure that's actually possible, but maybe it is ...
> 
> Yes, if there should be Sort node above MergeAppend, then it'll be probably
> cheaper to use Append. So the problem should not happen in practice, but in
> theory it seems to be possible.
> 

Actually no - see my other response. We don't build child paths with
pathkeys that are not useful (i.e. that do not match the expected ordering),
and we only build MergeAppend for pathkeys taken from child paths.

So I believe we currently won't build a MergeAppend that would then
require an additional Sort node, irrespective of the costing.

>>
>>> I tried to reproduce the issue by applying the "Incremental sort" [2] patch
>>> and running the following example:
>>>
>>> CREATE TABLE t(i int, j int);
>>> CREATE TABLE t1() INHERITS (t);
>>> CREATE INDEX ON t1(i, j);
>>> INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
>>> CREATE TABLE t2() INHERITS (t);
>>> CREATE INDEX ON t2(i, j);
>>> INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
>>>
>>> ANALYZE;
>>>
>>> SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
>>>
>>
>> So, what query plan did this use?
> 
>  Limit
>    ->  Incremental Sort
>          Sort Key: t.i, t.j DESC
>          Presorted Key: t.i
>          ->  Merge Append
>                Sort Key: t.i
>                ->  Sort
>                      Sort Key: t.i
>                      ->  Seq Scan on t
>                ->  Index Only Scan using t1_i_j_idx on t1
>                ->  Index Only Scan using t2_i_j_idx on t2
> 

Interesting. That might be broken, not sure if the LIMIT optimization
triggered or not.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Antonin Houska

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> OK, so the reason is that when building child paths, we don't keep the
> pathkeys unless it matches the "interesting" pathkeys.
>
> So for example we may have an IndexPath, but with pathkeys=NIL if the
> index does not match the ORDER BY we need.

I don't agree that IndexPath will necessarily have pathkeys set to NIL in such
a case. Even if the index ordering does not match the ORDER BY clause of the
query, the path can still be useful, typically for a merge join.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Tomas Vondra

On 11/02/2018 08:16 AM, Antonin Houska wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> 
>> OK, so the reason is that when building child paths, we don't keep
>> the pathkeys unless it matches the "interesting" pathkeys.
>>
>> So for example we may have an IndexPath, but with pathkeys=NIL if
>> the index does not match the ORDER BY we need.
> 
> I don't agree that IndexPath will necessarily have pathkeys set to 
> NIL in such a case. Even if the index ordering does not match ORDER
> BY clause of the query, the path can still be useful, typically for
> merge join.
> 

But wouldn't that mean there's a MergeJoin below the Limit? AFAIK we
don't push limit_tuples to that node.

After looking at this a bit more (after a healthy dose of coffee), I think
that for this issue to happen the Limit would have to be placed immediately
above the MergeAppend node. But if the ordering does not match, there
will be an explicit Sort node in between, and we'll push the limit only
to that node (and not below). Which is probably what's happening in the
incremental sort query, BTW.

I certainly agree correctness must not depend on costing. But I don't
think that's really the case here - what you mentioned is merely one
part of the optimization, but there are other bits that make it work. At
least that's my understanding - if you could construct a counter-example
demonstrating the failure, that'd be a clear proof of course.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Doubts about pushing LIMIT to MergeAppendPath

From: Antonin Houska

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> On 11/02/2018 08:16 AM, Antonin Houska wrote:
> > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> >> OK, so the reason is that when building child paths, we don't keep
> >> the pathkeys unless it matches the "interesting" pathkeys.
> >>
> >> So for example we may have an IndexPath, but with pathkeys=NIL if
> >> the index does not match the ORDER BY we need.
> >
> > I don't agree that IndexPath will necessarily have pathkeys set to
> > NIL in such a case. Even if the index ordering does not match ORDER
> > BY clause of the query, the path can still be useful, typically for
> > merge join.
> >
>
> But wouldn't that mean there's a MergeJoin below the Limit?

I just wanted to say that the pathkeys of a MergeAppendPath do depend on those
of the child paths; however, the MergeAppendPath should expect arbitrary
pathkeys from the children. I mentioned MergeJoin just as an example of a
reason for an IndexPath to have pathkeys unrelated to the ORDER BY clause.

> AFAIK we don't push limit_tuples to that node.

I guess you mean that in a plan like this

Limit -> MergeJoin -> MergeAppend

the limit would not get propagated to the MergeAppend, and thus would not
prevent it from providing sufficient input for the additional sort?

Anyway, the optimization we discuss here is only applied if the MergeAppend
path is at the top of the join tree, so there cannot be any join above it.

> After looking at this a bit more after a healthy dose of coffee, I think
> for this issue to happen the Limit would have to be placed immediately
> above the MergeAppend node. But if the ordering does not match, there
> will be an explicit Sort node in between, and we'll push the limit only
> to that node (and not below). Which is probably what's happening in the
> incremental sort query, BTW.

Yes, the limit should be applied "from above", when we know that it's
appropriate. (Currently it is applied by create_merge_append_path(), where we
don't yet know what will happen to the output.) But the problem with
MergeAppend is that the limit can affect the cost of the child nodes and of
the MergeAppend node itself.

That's not obviously a problem, because if the MergeAppendPath won without the
limit, the limit should not make its cost worse. However, I think the only
correct strategy is to apply the limit to all paths of the given relation
before set_cheapest() is called, because another path can benefit from the
limit even more than the one to which we're trying to apply it. But at that
early stage we don't have the complete plan tree, so we don't know whether the
limit would ever get propagated down, nor what its value would be ...

> I certainly agree correctness must not depend on costing. But I don't
> think that's really the case here - what you mentioned is merely one
> part of the optimization, but there are other bits that make it work.

As I mentioned upthread, I think the problem is currently hidden because the
limit_tuples field of MergeAppendPath is only used for cost estimates, but
the executor is not aware of it.

> At least that's my understanding - if you could construct a counter-example
> demonstrating the failure, that'd be a clear proof of course.

I spent some time on it today, but without success. However, the incremental sort
example I posted upthread should be sufficient reason to do something. I just
don't know what.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com