Thread: [PERF] Improve Cardinality Estimation for Joins with a Single-Column GROUP BY

Hi Developers,
     Currently, PostgreSQL relies on table statistics, extracted in the examine_simple_variable function, to estimate join selectivity. However, for subqueries with a GROUP BY clause, even a single-column one whose output is necessarily distinct on the grouping column, the planner often falls back to the default assumption of 200 distinct values. This leads to inaccurate cardinality estimates and, potentially, suboptimal join plans.

Problem Example

Consider the following query:

explain select * from t1 left join (select a, max(b) from t2 group by a) t2 on t1.a = t2.a;

The resulting plan predicts a high cardinality for the join, and places the larger dataset on the hash side:

                                     QUERY PLAN                                  
--------------------------------------------------------------------------------
Hash Join  (cost=943037.92..955323.45 rows=6963818 width=16)
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1  (cost=0.00..289.00 rows=20000 width=8)
   ->  Hash  (cost=893538.50..893538.50 rows=3017074 width=8)
         ->  HashAggregate  (cost=777429.49..893538.50 rows=3017074 width=8)
               Group Key: t2.a
               Planned Partitions: 64
               ->  Seq Scan on t2  (cost=0.00..158673.98 rows=11000098 width=8)
(8 rows)

Here, the join cardinality is overestimated, and t2, the larger input, is placed on the hash side even though t1 has far fewer rows.

Proposed Solution:
In subqueries with a GROUP BY clause that has a single grouping column, it is reasonable to assume the result set contains unique values for that column.
Under this assumption, the output of the aggregate node can be treated as unique on the grouping column, so instead of falling back on the default distinct-value count (200), the planner should derive the estimate from the HashAggregate node's row count.
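
For illustration only, here is a rough sketch of how this idea could be hooked into examine_simple_variable() when it drills into a subquery. This is not the attached patch; the local variables subquery, ste, and vardata, and the exact placement, are assumptions. The point is simply that marking the single grouping column as unique lets get_variable_numdistinct() derive ndistinct from the subquery's estimated row count instead of DEFAULT_NUM_DISTINCT (200):

if (subquery->groupClause != NIL &&
    list_length(subquery->groupClause) == 1)
{
    /* The single grouping clause of the sub-select (assumed context). */
    SortGroupClause *sgc = (SortGroupClause *) linitial(subquery->groupClause);

    /*
     * If the output column the outer Var refers to (ste) is that grouping
     * column, every group-key value appears exactly once in the subquery
     * output, so the column can be treated as unique.  With isunique set,
     * get_variable_numdistinct() estimates ndistinct from the relation's
     * row count rather than falling back to DEFAULT_NUM_DISTINCT (200).
     */
    if (ste->ressortgroupref == sgc->tleSortGroupRef)
        vardata->isunique = true;
}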

Execution plan after the patch is applied:

                                  QUERY PLAN                               
--------------------------------------------------------------------------
Hash Join  (cost=777968.49..935762.27 rows=20000 width=16)
   Hash Cond: (t2.a = t1.a)
   ->  HashAggregate  (cost=777429.49..893538.50 rows=3017074 width=8)
         Group Key: t2.a
         Planned Partitions: 64
         ->  Seq Scan on t2  (cost=0.00..158673.98 rows=11000098 width=8)
   ->  Hash  (cost=289.00..289.00 rows=20000 width=8)
         ->  Seq Scan on t1  (cost=0.00..289.00 rows=20000 width=8)
(8 rows)

Can you confirm whether my assumption of leveraging the distinct-row property of a GROUP BY clause with a single grouping column to improve join cardinality estimation is valid? If not, I would appreciate suggestions or corrections regarding this approach.

Thanks & Regards,
Ravi Revathy
Member Technical Staff
ZOHO Corporation


Hi!

On 27.11.2024 16:17, Ravi wrote:
Please find the patch attached below for your review.

Thanks & Regards,
Ravi Revathy
Member Technical Staff
ZOHO Corporation



Maybe I have misunderstood something, but I didn't see a problem with cardinality when I tried to reproduce the issue like this:

alena@postgres=# drop table t1;
DROP TABLE
alena@postgres=# drop table t2;
DROP TABLE
alena@postgres=# create table t1 (a int);
CREATE TABLE
alena@postgres=# create table t1 (x int);
ERROR: relation "t1" already exists
alena@postgres=# create table t2 (a int, b int);
CREATE TABLE
alena@postgres=# insert into t1 select id from generate_series(1,1000) as id;
INSERT 0 1000
alena@postgres=# insert into t2 select id, id%10 from generate_series(991,1900) as id;
INSERT 0 910
alena@postgres=# analyze;
ANALYZE
alena@postgres=# explain analyze select * from t1 left join (select a, max(b) from t2 group by a) t2 on t1.a = t2.a;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=39.12..56.76 rows=1000 width=12) (actual time=2.024..2.731 rows=1000 loops=1)
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.030..0.259 rows=1000 loops=1)
   ->  Hash  (cost=27.75..27.75 rows=910 width=8) (actual time=1.986..1.987 rows=910 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  HashAggregate  (cost=18.65..27.75 rows=910 width=8) (actual time=1.162..1.586 rows=910 loops=1)
               Group Key: t2.a
               Batches: 1  Memory Usage: 169kB
               ->  Seq Scan on t2  (cost=0.00..14.10 rows=910 width=8) (actual time=0.018..0.239 rows=910 loops=1)
 Planning Time: 0.215 ms
 Execution Time: 2.926 ms
(11 rows)

The cardinality is predicted correctly, as far as I can see. Am I missing something?

-- 
Regards,
Alena Rybakina
Postgres Professional

This issue occurs when the Aggregate node produces more rows than the other side of the join. In eqjoinsel_inner, the join selectivity is derived from the larger of the two sides' distinct-value estimates. However, since the number of distinct values for the Aggregate node defaults to 200, the other side's distinct count is often the larger of the two and is used as the divisor, even though the Aggregate node actually has the greater number of rows.
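
To make the effect concrete, here is a back-of-the-envelope calculation using the reproduction below (t1: 1000 rows, all distinct in a; the aggregate over t2: 19010 rows). It assumes eqjoinsel_inner's fallback formula when no MCV lists are available, roughly join_rows ≈ outer_rows * inner_rows / max(nd1, nd2), ignoring null fractions:

Before the patch:  nd1 = 1000 (t1.a), nd2 = 200 (default for the aggregate output)
                   rows ≈ 1000 * 19010 / max(1000, 200) = 19010
After the patch:   nd1 = 1000 (t1.a), nd2 = 19010 (the HashAggregate's row count)
                   rows ≈ 1000 * 19010 / max(1000, 19010) = 1000

These numbers match the rows=19010 and rows=1000 estimates in the plans below.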

Steps to recreate the issue:

postgres=# create table t1(a int);
CREATE TABLE
postgres=# create table t2(a int, b int);
CREATE TABLE
postgres=# insert into t1 select id from generate_series(1, 1000) as id;
INSERT 0 1000
postgres=# insert into t2 select id, id%10 from generate_series(991, 20000) as id;
INSERT 0 19010
postgres=# analyze;
ANALYZE
postgres=# explain analyze select * from t1 left join (select a, max(b) from t2 group by a) t2 on t1.a = t2.a;
                                                       QUERY PLAN                                                      
------------------------------------------------------------------------------------------------------------------------
Hash Left Join  (cost=797.88..815.51 rows=19010 width=12) (actual time=8.613..8.780 rows=1000 loops=1)
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.010..0.058 rows=1000 loops=1)
   ->  Hash  (cost=560.25..560.25 rows=19010 width=8) (actual time=8.589..8.590 rows=19010 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 999kB
         ->  HashAggregate  (cost=370.15..560.25 rows=19010 width=8) (actual time=4.594..6.735 rows=19010 loops=1)
               Group Key: t2.a
               Batches: 1  Memory Usage: 2321kB
               ->  Seq Scan on t2  (cost=0.00..275.10 rows=19010 width=8) (actual time=0.005..0.927 rows=19010 loops=1)
Planning Time: 0.166 ms
Execution Time: 9.399 ms
(11 rows)

After applying the patch, we get the following plan:

postgres=# explain analyze select * from t1 left join (select a, max(b) from t2 group by a) t2 on t1.a = t2.a;
                                                    QUERY PLAN                                                   
------------------------------------------------------------------------------------------------------------------
Hash Right Join  (cost=397.65..669.04 rows=1000 width=12) (actual time=8.003..11.075 rows=1000 loops=1)
   Hash Cond: (t2.a = t1.a)
   ->  HashAggregate  (cost=370.15..560.25 rows=19010 width=8) (actual time=7.301..9.457 rows=19010 loops=1)
         Group Key: t2.a
         Batches: 1  Memory Usage: 2321kB
         ->  Seq Scan on t2  (cost=0.00..275.10 rows=19010 width=8) (actual time=0.006..1.382 rows=19010 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=0.385..0.385 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  Seq Scan on t1  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.023..0.155 rows=1000 loops=1)
Planning Time: 0.197 ms
Execution Time: 12.218 ms
(11 rows)

Thanks & Regards,
Ravi Revathy
Member Technical Staff
ZOHO Corporation

