Thread: partition wise aggregate wrong rows cost

partition wise aggregate wrong rows cost

From
"bucoo"
Date:
Normal aggregate and partition wise aggregate have a big difference rows cost:

begin;
create table t1(id integer, name text) partition by hash(id);
create table t1_0 partition of t1 for values with(modulus 3, remainder 0);
create table t1_1 partition of t1 for values with(modulus 3, remainder 1);
create table t1_2 partition of t1 for values with(modulus 3, remainder 2);
commit;

normal aggregate rows cost is 200.
explain (verbose)
select count(1) from t1 group by id;
HashAggregate  (cost=106.20..108.20 rows=200 width=12) --here rows is 200
   Output: count(1), t1.id
   Group Key: t1.id
   ->  Append  (cost=0.00..87.15 rows=3810 width=4)
         ->  Seq Scan on public.t1_0 t1_1  (cost=0.00..22.70 rows=1270 width=4)
               Output: t1_1.id
         ->  Seq Scan on public.t1_1 t1_2  (cost=0.00..22.70 rows=1270 width=4)
               Output: t1_2.id
         ->  Seq Scan on public.t1_2 t1_3  (cost=0.00..22.70 rows=1270 width=4)
               Output: t1_3.id

And partition wise aggregate rows cost is 600
set enable_partitionwise_aggregate = on;
explain (verbose)
select count(1) from t1 group by id;
Append  (cost=29.05..96.15 rows=600 width=12) --here rows is 600
   ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)  --this rows looks like same as normal aggregate
         Output: count(1), t1.id
         Group Key: t1.id
         ->  Seq Scan on public.t1_0 t1  (cost=0.00..22.70 rows=1270 width=4)
               Output: t1.id
   ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)
         Output: count(1), t1_1.id
         Group Key: t1_1.id
         ->  Seq Scan on public.t1_1  (cost=0.00..22.70 rows=1270 width=4)
               Output: t1_1.id
   ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)
         Output: count(1), t1_2.id
         Group Key: t1_2.id
         ->  Seq Scan on public.t1_2  (cost=0.00..22.70 rows=1270 width=4)
               Output: t1_2.id

Source code is 15beta1(7fdbdf204920ac279f280d0a8e96946fdaf41aef)




Re: partition wise aggregate wrong rows cost

From
David Rowley
Date:
On Tue, 24 May 2022 at 15:38, bucoo <bucoo@sohu.com> wrote:
>
> Normal aggregate and partition wise aggregate have a big difference rows cost:

> explain (verbose)
> select count(1) from t1 group by id;
> HashAggregate  (cost=106.20..108.20 rows=200 width=12) --here rows is 200

> set enable_partitionwise_aggregate = on;
> explain (verbose)
> select count(1) from t1 group by id;
> Append  (cost=29.05..96.15 rows=600 width=12) --here rows is 600

I wouldn't say this is a bug. Could you not say that they're both
wrong given that your tables are empty?

What's going on here is that estimate_num_groups() is just returning
200, which is what it returns when there are no statistics to give any
indication of a better value. 200 is returned no matter if the
estimate is for a single partition or the partitioned table.  For the
partition-wise aggregate case, the 3 individual 200 estimates are just
summed up by the Append costing code to give 600.

The only way we could really do anything different here would be to
have estimate_num_groups() return a default value based on the number
of input rows. However, that 200 default is pretty long standing. We'd
need to consider quite a bit more than this case before we could
realistically consider changing it.

For tables that are being created and queried quickly after, we
normally tell people to run ANALYZE on the given tables to prevent
this sort of thing.

David



Re: partition wise aggregate wrong rows cost

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 24 May 2022 at 15:38, bucoo <bucoo@sohu.com> wrote:
>> Normal aggregate and partition wise aggregate have a big difference rows cost:

> I wouldn't say this is a bug. Could you not say that they're both
> wrong given that your tables are empty?

We try fairly hard to ensure that the rowcount estimate for a given
relation does not vary across paths, so I concur with the OP that
this is a bug.  Having said that, I'm not sure that the consequences
are significant.  As you say, the estimates seem to get a lot closer
as soon as the table has some statistics.  (But nonetheless, they
are not identical, so it's still a bug.)

            regards, tom lane



re: partition wise aggregate wrong rows cost

From
"bucoo"
Date:
> We try fairly hard to ensure that the row count estimate for a given relation
> does not vary across paths, so I concur with the OP that this is a bug.  Having
> said that, I'm not sure that the consequences are significant.  As you say, the
> estimates seem to get a lot closer as soon as the table has some statistics.
> (But nonetheless, they are not identical, so it's still a bug.)

Yes, the estimates seem to get a lot closer as soon as the table has some statistics.

> I'm not sure that the consequences are significant.
At least it doesn't make any difference to me for now.
I noticed this problem while testing aggregation.
It looks a little weird, so I emailed.

Thanks every one.