Thread: Redundant Result node

Redundant Result node

From

Richard Guo

Date:

22 August 2024, 10:34:05

I ran into a query plan where the Result node seems redundant to me:

create table t (a int, b int, c int);
insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (verbose, costs off)
select distinct b, a from t order by a, b;
                       QUERY PLAN
---------------------------------------------------------
 Result
   Output: b, a
   ->  Unique
         Output: a, b
         ->  Index Only Scan using t_a_b_idx on public.t
               Output: a, b
(6 rows)

What I expect is that both the Scan node and the Unique node output
'b, a', and we do not need an additional projection step, something
like:

explain (verbose, costs off)
select distinct b, a from t order by a, b;
                    QUERY PLAN
---------------------------------------------------
 Unique
   Output: b, a
   ->  Index Only Scan using t_a_b_idx on public.t
         Output: b, a
(4 rows)

I looked into this a little bit and found that in function
create_ordered_paths, we decide whether a projection step is needed
based on a simple pointer comparison between sorted_path->pathtarget
and final_target.

    /* Add projection step if needed */
    if (sorted_path->pathtarget != target)
        sorted_path = apply_projection_to_path(root, ordered_rel,
                                               sorted_path, target);

This does not seem right to me, as PathTargets are not canonical, so
we cannot guarantee that two identical PathTargets will have the same
pointer.  Actually, for the query above, the two PathTargets are
identical but have different pointers.

I wonder if we need to invent a function to compare two PathTargets.
Alternatively, in this case, would it suffice to simply compare
PathTarget.exprs?

Thanks
Richard

Re: Redundant Result node

From

David Rowley

Date:

22 August 2024, 14:15:00

On Thu, 22 Aug 2024 at 19:34, Richard Guo <guofenglinux@gmail.com> wrote:
>     /* Add projection step if needed */
>     if (sorted_path->pathtarget != target)
>         sorted_path = apply_projection_to_path(root, ordered_rel,
>                                                sorted_path, target);
>
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer.  Actually, for the query above, the two PathTargets are
> identical but have different pointers.
>
> I wonder if we need to invent a function to compare two PathTargets.
> Alternatively, in this case, would it suffice to simply compare
> PathTarget.exprs?

I think tlist.c would be a good home for such a function. If you go
with the function route, then it's easier to add optimisations such as
checking if the pointers are equal before going to the trouble of
checking if the exprs match.

David

Re: Redundant Result node

From

Peter Eisentraut

Date:

22 August 2024, 14:32:43

On 22.08.24 09:34, Richard Guo wrote:
> I looked into this a little bit and found that in function
> create_ordered_paths, we decide whether a projection step is needed
> based on a simple pointer comparison between sorted_path->pathtarget
> and final_target.
> 
>      /* Add projection step if needed */
>      if (sorted_path->pathtarget != target)
>          sorted_path = apply_projection_to_path(root, ordered_rel,
>                                                 sorted_path, target);
> 
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer.  Actually, for the query above, the two PathTargets are
> identical but have different pointers.
> 
> I wonder if we need to invent a function to compare two PathTargets.

Wouldn't the normal node equal() work?

Re: Redundant Result node

From

David Rowley

Date:

22 August 2024, 15:02:48

On Thu, 22 Aug 2024 at 23:33, Peter Eisentraut <peter@eisentraut.org> wrote:
> > I wonder if we need to invent a function to compare two PathTargets.
>
> Wouldn't the normal node equal() work?

It might. I think has_volatile_expr might be missing a
pg_node_attr(equal_ignore).

David

Re: Redundant Result node

From

Rafia Sabih

Date:

22 August 2024, 23:17:13

On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier.vf@gmail.com> wrote:

Hi.

Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux@gmail.com> escreveu:
I ran into a query plan where the Result node seems redundant to me:

create table t (a int, b int, c int);
insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (verbose, costs off)
select distinct b, a from t order by a, b;
QUERY PLAN
---------------------------------------------------------
Result
Output: b, a
-> Unique
Output: a, b
-> Index Only Scan using t_a_b_idx on public.t
Output: a, b
(6 rows)

What I expect is that both the Scan node and the Unique node output
'b, a', and we do not need an additional projection step, something
like:

explain (verbose, costs off)
select distinct b, a from t order by a, b;
QUERY PLAN
---------------------------------------------------
Unique
Output: b, a
-> Index Only Scan using t_a_b_idx on public.t
Output: b, a
(4 rows)

I looked into this a little bit and found that in function
create_ordered_paths, we decide whether a projection step is needed
based on a simple pointer comparison between sorted_path->pathtarget
and final_target.

/* Add projection step if needed */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);

This does not seem right to me, as PathTargets are not canonical, so
we cannot guarantee that two identical PathTargets will have the same
pointer. Actually, for the query above, the two PathTargets are
identical but have different pointers.
Could memcmp solve this?

With patch attached, using memcmp to compare the pointers.

select distinct b, a from t order by a, b;
QUERY PLAN
----------------------------------
Sort
Output: b, a
Sort Key: t.a, t.b
-> HashAggregate
Output: b, a
Group Key: t.a, t.b
-> Seq Scan on public.t
Output: a, b, c
(8 rows)

attached patch for consideration.

best regards,
Ranier Vilela

+1 for the idea of removing this redundant node.

I had a look in this patch, and I was wondering if we still need sorted_path->pathtarget != target in the condition.

Apart from that,

- if (sorted_path->pathtarget != target)
+ if (sorted_path->pathtarget != target &&
+ memcmp(sorted_path->pathtarget, target, sizeof(PathTarget)) != 0)

An extra space is there, please fix it.

Some regression tests should be added for this.

Regards,
Rafia Sabih

Re: Redundant Result node

From

Richard Guo

Date:

23 August 2024, 05:31:14

On Thu, Aug 22, 2024 at 8:03 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 22 Aug 2024 at 23:33, Peter Eisentraut <peter@eisentraut.org> wrote:
> > > I wonder if we need to invent a function to compare two PathTargets.
> >
> > Wouldn't the normal node equal() work?
>
> It might. I think has_volatile_expr might be missing a
> pg_node_attr(equal_ignore).

Yeah, maybe we can make the node equal() work for PathTarget.  We'll
need to remove the no_equal attribute in PathTarget.  I think we also
need to specify pg_node_attr(equal_ignore) for PathTarget.cost.

BTW, I'm wondering why we specify no_copy for PathTarget, while
meanwhile implementing a separate function copy_pathtarget() in
tlist.c to copy a PathTarget.  Can't we support copyObject() for
PathTarget?

Also the pg_node_attr(array_size(exprs)) attribute for
PathTarget.sortgrouprefs does not seem right to me.  In a lot of cases
sortgrouprefs would just be NULL.  Usually it is valid only for
upper-level Paths.  Hmm, maybe this is why we do not support
copyObject() for PathTarget?

Thanks
Richard

Re: Redundant Result node

From

Richard Guo

Date:

23 August 2024, 06:08:47

On Thu, Aug 22, 2024 at 3:34 PM Richard Guo <guofenglinux@gmail.com> wrote:
>     /* Add projection step if needed */
>     if (sorted_path->pathtarget != target)
>         sorted_path = apply_projection_to_path(root, ordered_rel,
>                                                sorted_path, target);
>
> This does not seem right to me, as PathTargets are not canonical, so
> we cannot guarantee that two identical PathTargets will have the same
> pointer.  Actually, for the query above, the two PathTargets are
> identical but have different pointers.

FWIW, the redundant-projection issue is more common in practice than I
initially thought.  For a simple query as below:

explain (verbose, costs off) select a from t order by 1;
         QUERY PLAN
----------------------------
 Sort
   Output: a
   Sort Key: t.a
   ->  Seq Scan on public.t
         Output: a
(5 rows)

... we'll always make a separate ProjectionPath on top of the SortPath
in create_ordered_paths.  It’s only when we create the plan node for
the projection step in createplan.c that we realize a separate Result
is unnecessary.  This is not efficient.

Thanks
Richard

Re: Redundant Result node

From

Tom Lane

Date:

23 August 2024, 06:19:17

Richard Guo <guofenglinux@gmail.com> writes:
> ... we'll always make a separate ProjectionPath on top of the SortPath
> in create_ordered_paths.  It’s only when we create the plan node for
> the projection step in createplan.c that we realize a separate Result
> is unnecessary.  This is not efficient.

I'm not sure you're considering "efficiency" in the right light.
In my mind, any time we can postpone work from path-creation time
to plan-creation time, we're probably winning because we create
many more paths than plans.  Perhaps that's wrong in this case,
but it's not anywhere near as obvious as you suggest.

            regards, tom lane

Re: Redundant Result node

From

Richard Guo

Date:

23 August 2024, 06:48:37

On Fri, Aug 23, 2024 at 11:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
> > ... we'll always make a separate ProjectionPath on top of the SortPath
> > in create_ordered_paths.  It’s only when we create the plan node for
> > the projection step in createplan.c that we realize a separate Result
> > is unnecessary.  This is not efficient.
>
> I'm not sure you're considering "efficiency" in the right light.
> In my mind, any time we can postpone work from path-creation time
> to plan-creation time, we're probably winning because we create
> many more paths than plans.  Perhaps that's wrong in this case,
> but it's not anywhere near as obvious as you suggest.

I agree that it’s always desirable to postpone work from path-creation
time to plan-creation time.  In this case, however, it’s a little
different.  The projection step could actually be avoided from the
start if we perform the correct check in create_ordered_paths.

Thanks
Richard

Re: Redundant Result node

From

Tom Lane

Date:

23 August 2024, 06:56:15

Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Aug 23, 2024 at 11:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not sure you're considering "efficiency" in the right light.

> I agree that it’s always desirable to postpone work from path-creation
> time to plan-creation time.  In this case, however, it’s a little
> different.  The projection step could actually be avoided from the
> start if we perform the correct check in create_ordered_paths.

Well, the question is how expensive is the "correct check" compared
to what we're doing now.  It might be cheaper than creating an extra
level of path node, or it might not.  An important factor here is
that we'd pay the extra cost of a more complex check every time,
whether it avoids creation of an extra path node or not.

            regards, tom lane

Re: Redundant Result node

From

Ranier Vilela

Date:

23 August 2024, 15:29:03

Hi Rafia.

Em qui., 22 de ago. de 2024 às 17:17, Rafia Sabih <rafia.pghackers@gmail.com> escreveu:

On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier.vf@gmail.com> wrote:
Hi.

Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux@gmail.com> escreveu:
I ran into a query plan where the Result node seems redundant to me:

create table t (a int, b int, c int);
insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (verbose, costs off)
select distinct b, a from t order by a, b;
QUERY PLAN
---------------------------------------------------------
Result
Output: b, a
-> Unique
Output: a, b
-> Index Only Scan using t_a_b_idx on public.t
Output: a, b
(6 rows)

What I expect is that both the Scan node and the Unique node output
'b, a', and we do not need an additional projection step, something
like:

explain (verbose, costs off)
select distinct b, a from t order by a, b;
QUERY PLAN
---------------------------------------------------
Unique
Output: b, a
-> Index Only Scan using t_a_b_idx on public.t
Output: b, a
(4 rows)

I looked into this a little bit and found that in function
create_ordered_paths, we decide whether a projection step is needed
based on a simple pointer comparison between sorted_path->pathtarget
and final_target.

/* Add projection step if needed */
if (sorted_path->pathtarget != target)
sorted_path = apply_projection_to_path(root, ordered_rel,
sorted_path, target);

This does not seem right to me, as PathTargets are not canonical, so
we cannot guarantee that two identical PathTargets will have the same
pointer. Actually, for the query above, the two PathTargets are
identical but have different pointers.
Could memcmp solve this?

With patch attached, using memcmp to compare the pointers.

select distinct b, a from t order by a, b;
QUERY PLAN
----------------------------------
Sort
Output: b, a
Sort Key: t.a, t.b
-> HashAggregate
Output: b, a
Group Key: t.a, t.b
-> Seq Scan on public.t
Output: a, b, c
(8 rows)

attached patch for consideration.

best regards,
Ranier Vilela

+1 for the idea of removing this redundant node.
I had a look in this patch, and I was wondering if we still need sorted_path->pathtarget != target in the condition.

Although the test is unnecessary, it is cheap and avoids a possible call to memcmp.

Thanks.

best regards,

Ranier Vilela

Re: Redundant Result node

From

Peter Eisentraut

Date:

26 August 2024, 15:58:29

On 23.08.24 10:27, Richard Guo wrote:
> On Fri, Aug 23, 2024 at 11:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Richard Guo <guofenglinux@gmail.com> writes:
>>> I agree that it’s always desirable to postpone work from path-creation
>>> time to plan-creation time.  In this case, however, it’s a little
>>> different.  The projection step could actually be avoided from the
>>> start if we perform the correct check in create_ordered_paths.
>>
>> Well, the question is how expensive is the "correct check" compared
>> to what we're doing now.  It might be cheaper than creating an extra
>> level of path node, or it might not.  An important factor here is
>> that we'd pay the extra cost of a more complex check every time,
>> whether it avoids creation of an extra path node or not.
> 
> Fair point.  After looking at the code for a while, I believe it is
> sufficient to compare PathTarget.exprs after we've checked that the
> two targets have different pointers.

-        if (sorted_path->pathtarget != target)
+        if (sorted_path->pathtarget != target &&
+            !equal(sorted_path->pathtarget->exprs, target->exprs))
              sorted_path = apply_projection_to_path(root, ordered_rel,

equal() already checks whether both pointers are equal, so I think this 
could be simplified to just

     if (!equal(sorted_path->pathtarget->exprs, target->exprs))

Re: Redundant Result node

From

Richard Guo

Date:

27 August 2024, 06:43:02

On Thu, Aug 22, 2024 at 9:02 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux@gmail.com> escreveu:
>> This does not seem right to me, as PathTargets are not canonical, so
>> we cannot guarantee that two identical PathTargets will have the same
>> pointer.  Actually, for the query above, the two PathTargets are
>> identical but have different pointers.
>
> Could memcmp solve this?

Hmm, I don't think memcmp works for nodes that contain pointers.

Thanks
Richard

Re: Redundant Result node

From

Ranier Vilela

Date:

27 August 2024, 14:26:15

Em ter., 27 de ago. de 2024 às 00:43, Richard Guo <guofenglinux@gmail.com> escreveu:

On Thu, Aug 22, 2024 at 9:02 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux@gmail.com> escreveu:
>> This does not seem right to me, as PathTargets are not canonical, so
>> we cannot guarantee that two identical PathTargets will have the same
>> pointer. Actually, for the query above, the two PathTargets are
>> identical but have different pointers.
>
> Could memcmp solve this?

Hmm, I don't think memcmp works for nodes that contain pointers.

The first case which memcmp can fail is if both pointers are null.

But considering the current behavior, the cost vs benefit favors memcmp.

best regards,

Ranier Vilela