Re: Redundant Result node - Mailing list pgsql-hackers

From Rafia Sabih
Subject Re: Redundant Result node
Date
Msg-id CA+FpmFdu-ocfiyO4u81k86YHG-Jf-GtC2DToGrcnYUg2EcQ8Kw@mail.gmail.com
Whole thread Raw
In response to Redundant Result node  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Redundant Result node
List pgsql-hackers


On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier.vf@gmail.com> wrote:
Hi.

Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofenglinux@gmail.com> escreveu:
I ran into a query plan where the Result node seems redundant to me:

create table t (a int, b int, c int);
insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (verbose, costs off)
select distinct b, a from t order by a, b;
                       QUERY PLAN
---------------------------------------------------------
 Result
   Output: b, a
   ->  Unique
         Output: a, b
         ->  Index Only Scan using t_a_b_idx on public.t
               Output: a, b
(6 rows)

What I expect is that both the Scan node and the Unique node output
'b, a', and we do not need an additional projection step, something
like:

explain (verbose, costs off)
select distinct b, a from t order by a, b;
                    QUERY PLAN
---------------------------------------------------
 Unique
   Output: b, a
   ->  Index Only Scan using t_a_b_idx on public.t
         Output: b, a
(4 rows)

I looked into this a little bit and found that in function
create_ordered_paths, we decide whether a projection step is needed
based on a simple pointer comparison between sorted_path->pathtarget
and final_target.

    /* Add projection step if needed */
    if (sorted_path->pathtarget != target)
        sorted_path = apply_projection_to_path(root, ordered_rel,
                                               sorted_path, target);

This does not seem right to me, as PathTargets are not canonical, so
we cannot guarantee that two identical PathTargets will have the same
pointer.  Actually, for the query above, the two PathTargets are
identical but have different pointers.
Could memcmp solve this?

With patch attached, using memcmp to compare the pointers.

select distinct b, a from t order by a, b;
            QUERY PLAN
----------------------------------
 Sort
   Output: b, a
   Sort Key: t.a, t.b
   ->  HashAggregate
         Output: b, a
         Group Key: t.a, t.b
         ->  Seq Scan on public.t
               Output: a, b, c
(8 rows)

attached patch for consideration.

best regards,
Ranier Vilela
 
+1 for the idea of removing this redundant node.
I had a look in this patch, and I was wondering if we still need sorted_path->pathtarget != target in the condition.

Apart from that,
-               if (sorted_path->pathtarget != target)
+               if (sorted_path->pathtarget != target &&
+                   memcmp(sorted_path->pathtarget, target, sizeof(PathTarget)) != 0)
An extra space is there, please fix it.

Some regression tests should be added for this.

--

Regards,
Rafia Sabih

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Partial aggregates pushdown
Next
From: Daniel Gustafsson
Date:
Subject: Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?