Thread: partition wise aggregate wrong rows cost
A normal aggregate and a partition-wise aggregate produce very different row estimates:

begin;
create table t1(id integer, name text) partition by hash(id);
create table t1_0 partition of t1 for values with(modulus 3, remainder 0);
create table t1_1 partition of t1 for values with(modulus 3, remainder 1);
create table t1_2 partition of t1 for values with(modulus 3, remainder 2);
commit;

The normal aggregate's row estimate is 200:

explain (verbose)
select count(1) from t1 group by id;

HashAggregate  (cost=106.20..108.20 rows=200 width=12)   -- here rows is 200
  Output: count(1), t1.id
  Group Key: t1.id
  ->  Append  (cost=0.00..87.15 rows=3810 width=4)
        ->  Seq Scan on public.t1_0 t1_1  (cost=0.00..22.70 rows=1270 width=4)
              Output: t1_1.id
        ->  Seq Scan on public.t1_1 t1_2  (cost=0.00..22.70 rows=1270 width=4)
              Output: t1_2.id
        ->  Seq Scan on public.t1_2 t1_3  (cost=0.00..22.70 rows=1270 width=4)
              Output: t1_3.id

And the partition-wise aggregate's row estimate is 600:

set enable_partitionwise_aggregate = on;
explain (verbose)
select count(1) from t1 group by id;

Append  (cost=29.05..96.15 rows=600 width=12)   -- here rows is 600
  ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)   -- this estimate is the same as the normal aggregate's
        Output: count(1), t1.id
        Group Key: t1.id
        ->  Seq Scan on public.t1_0 t1  (cost=0.00..22.70 rows=1270 width=4)
              Output: t1.id
  ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)
        Output: count(1), t1_1.id
        Group Key: t1_1.id
        ->  Seq Scan on public.t1_1  (cost=0.00..22.70 rows=1270 width=4)
              Output: t1_1.id
  ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)
        Output: count(1), t1_2.id
        Group Key: t1_2.id
        ->  Seq Scan on public.t1_2  (cost=0.00..22.70 rows=1270 width=4)
              Output: t1_2.id

Source code is 15beta1 (7fdbdf204920ac279f280d0a8e96946fdaf41aef)
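As an aside, one quick way to confirm that the planner has no statistics to work with at this point (a minimal sketch; on this PostgreSQL version, reltuples is -1 for relations that have never been vacuumed or analyzed):

-- reltuples = -1 means the relation has never been vacuumed or analyzed,
-- so the planner has no row statistics and falls back on its defaults
select relname, reltuples
from pg_class
where relname in ('t1_0', 't1_1', 't1_2');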
On Tue, 24 May 2022 at 15:38, bucoo <bucoo@sohu.com> wrote:
>
> A normal aggregate and a partition-wise aggregate produce very different row estimates:
> explain (verbose)
> select count(1) from t1 group by id;
> HashAggregate  (cost=106.20..108.20 rows=200 width=12)   -- here rows is 200
> set enable_partitionwise_aggregate = on;
> explain (verbose)
> select count(1) from t1 group by id;
> Append  (cost=29.05..96.15 rows=600 width=12)   -- here rows is 600

I wouldn't say this is a bug.  Couldn't you say that they're both wrong, given that your tables are empty?

What's going on here is that estimate_num_groups() is just returning 200, which is what it returns when there are no statistics to give any indication of a better value.  200 is returned regardless of whether the estimate is for a single partition or for the partitioned table as a whole.  For the partition-wise aggregate case, the 3 individual estimates of 200 are simply summed up by the Append costing code to give 600.

The only way we could really do anything different here would be to have estimate_num_groups() return a default value based on the number of input rows.  However, that 200 default is pretty long-standing.  We'd need to consider quite a bit more than this case before we could realistically consider changing it.

For tables that are created and then queried soon afterwards, we normally tell people to run ANALYZE on the tables in question to prevent this sort of thing.

David
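To illustrate that last suggestion, a minimal sketch using the tables from the original report (the INSERT is only sample data; the exact post-ANALYZE estimates will depend on what is loaded):

-- load some sample data, then gather statistics for t1 and its partitions
insert into t1 select i, 'name_' || i from generate_series(1, 1000) i;
analyze t1;

-- with statistics available, estimate_num_groups() no longer falls back
-- on its 200-group default, and the plain and partition-wise aggregate
-- row estimates should come out much closer to each other
explain (verbose)
select count(1) from t1 group by id;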
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 24 May 2022 at 15:38, bucoo <bucoo@sohu.com> wrote:
>> A normal aggregate and a partition-wise aggregate produce very different row estimates:

> I wouldn't say this is a bug.  Couldn't you say that they're both
> wrong, given that your tables are empty?

We try fairly hard to ensure that the rowcount estimate for a given relation does not vary across paths, so I concur with the OP that this is a bug.  Having said that, I'm not sure that the consequences are significant.  As you say, the estimates seem to get a lot closer as soon as the table has some statistics.  (But nonetheless, they are not identical, so it's still a bug.)

			regards, tom lane
> We try fairly hard to ensure that the rowcount estimate for a given
> relation does not vary across paths, so I concur with the OP that this
> is a bug.  Having said that, I'm not sure that the consequences are
> significant.  As you say, the estimates seem to get a lot closer as
> soon as the table has some statistics.  (But nonetheless, they are not
> identical, so it's still a bug.)

Yes, the estimates get a lot closer as soon as the table has some statistics.

> I'm not sure that the consequences are significant.

At least it doesn't make any difference to me for now.  I noticed this problem while testing aggregation.  It looked a little odd, so I emailed the list.

Thanks, everyone.