Re: Wrong results with grouping sets - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Wrong results with grouping sets
Date
Msg-id CAMbWs49rYkurn5z3JnjwBAMQ=UitG3WJMtAvS4g9w30WCYoykw@mail.gmail.com
Whole thread Raw
In response to Re: Wrong results with grouping sets  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Wrong results with grouping sets
List pgsql-hackers
FWIW, in addition to fixing wrong result issues for queries with
grouping sets, the changes in 0001 also improve performance for
queries that have subqueries in the grouping expressions, because
different instances of the same subquery would need to be executed
only once.  As a simple example, consider

create table t (a int, b int);
insert into t select i, i from generate_series(1,10000)i;
analyze t;

-- on patched
explain (analyze, costs off)
select (select t1.b from t t2 where a = t1.a) as s1,
       (select t1.b from t t2 where a = t1.a) as s2,
       (select t1.b from t t2 where a = t1.a) as s3
from t t1
group by a, s1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Group (actual time=20475.028..20480.543 rows=10000 loops=1)
   Group Key: t1.a, ((SubPlan 1))
   ->  Sort (actual time=20475.017..20475.821 rows=10000 loops=1)
         Sort Key: t1.a, ((SubPlan 1))
         Sort Method: quicksort  Memory: 697kB
         ->  Seq Scan on t t1 (actual time=7.435..20468.599 rows=10000 loops=1)
               SubPlan 1
                 ->  Seq Scan on t t2 (actual time=1.022..2.045 rows=1
loops=10000)
                       Filter: (a = t1.a)
                       Rows Removed by Filter: 9999
 Planning Time: 1.561 ms
 Execution Time: 20481.933 ms
(12 rows)

-- on master
explain (analyze, costs off)
select (select t1.b from t t2 where a = t1.a) as s1,
       (select t1.b from t t2 where a = t1.a) as s2,
       (select t1.b from t t2 where a = t1.a) as s3
from t t1
group by a, s1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Group (actual time=20779.318..62233.526 rows=10000 loops=1)
   Group Key: t1.a, ((SubPlan 1))
   ->  Sort (actual time=20775.125..20777.936 rows=10000 loops=1)
         Sort Key: t1.a, ((SubPlan 1))
         Sort Method: quicksort  Memory: 697kB
         ->  Seq Scan on t t1 (actual time=7.492..20770.060 rows=10000 loops=1)
               SubPlan 1
                 ->  Seq Scan on t t2 (actual time=1.037..2.075 rows=1
loops=10000)
                       Filter: (a = t1.a)
                       Rows Removed by Filter: 9999
   SubPlan 2
     ->  Seq Scan on t t2_1 (actual time=1.037..2.071 rows=1 loops=10000)
           Filter: (a = t1.a)
           Rows Removed by Filter: 9999
   SubPlan 3
     ->  Seq Scan on t t2_2 (actual time=1.037..2.071 rows=1 loops=10000)
           Filter: (a = t1.a)
           Rows Removed by Filter: 9999
 Planning Time: 1.286 ms
 Execution Time: 62235.753 ms
(20 rows)

We can see that with the 0001 patch, this query runs ~3 times faster,
which is no surprise because there are 3 instances of the same
subquery in the targetlist.

Thanks
Richard



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: Fujii Masao
Date:
Subject: Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal