BUG #16759: Estimation of the planner is wrong for hash join - Mailing list pgsql-bugs
| From | PG Bug reporting form |
|---|---|
| Subject | BUG #16759: Estimation of the planner is wrong for hash join |
| Date | |
| Msg-id | 16759-f0980add03d8f152@postgresql.org Whole thread Raw |
| Responses |
Re: BUG #16759: Estimation of the planner is wrong for hash join
|
| List | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 16759
Logged by: Bertrand Guillaumin
Email address: bertrand.guillaumin@gmail.com
PostgreSQL version: 11.8
Operating system: Linux
Description:
The following query estimated number of lines returned is 1 while it should
be around 67 or more :
explain analyze select * from enterprise where parent_enterprise in (select
enterprise_id from enterprise par where global_attribute15 = 'BEL');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Hash Join (cost=95.60..191.33 rows=1 width=977) (actual time=0.422..0.896
rows=56 loops=1)
Hash Cond: (enterprise.parent_enterprise = par.enterprise_id)
-> Seq Scan on enterprise (cost=0.00..92.87 rows=1087 width=977)
(actual time=0.004..0.095 rows=1087 loops=1)
-> Hash (cost=95.59..95.59 rows=1 width=5) (actual time=0.397..0.398
rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on enterprise par (cost=0.00..95.59 rows=1 width=5)
(actual time=0.251..0.394 rows=1 loops=1)
Filter: (global_attribute15 = 'BEL'::text)
Rows Removed by Filter: 1086
The same issue with a normal join :
explain analyze select * from enterprise ent1, enterprise ent2 where
ent1.parent_enterprise=ent2.enterprise_id and ent2.global_attribute15 =
'BEL';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=95.60..191.33 rows=1 width=1954) (actual time=0.444..0.954
rows=56 loops=1)
Hash Cond: (ent1.parent_enterprise = ent2.enterprise_id)
-> Seq Scan on enterprise ent1 (cost=0.00..92.87 rows=1087 width=977)
(actual time=0.004..0.104 rows=1087 loops=1)
-> Hash (cost=95.59..95.59 rows=1 width=977) (actual time=0.416..0.416
rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on enterprise ent2 (cost=0.00..95.59 rows=1
width=977) (actual time=0.252..0.410 rows=1 loops=1)
Filter: (global_attribute15 = 'BEL'::text)
Rows Removed by Filter: 1086
The statistics for parent_enterprise :
select * from pg_stats where tablename='enterprise' and
attname='parent_enterprise';
schemaname | tablename | attname | inherited | null_frac |
avg_width | n_distinct | most_common_vals
|
most_common_freqs |
histogram_bounds | correlation | most_common_elems |
most_common_elem_freqs | elem_count_histogram
------------+------------+-------------------+-----------+------------+-----------+------------+-----------------------------------------------------+-------------------------------------------------
-------------------------------------------------------------------+-------------------------+-------------+-------------------+------------------------+----------------------
xxxxxxx | enterprise | parent_enterprise | f | 0.00551978 |
4 | 16 | {48,682,6162,6639,6448,46,6630,6796,6553,6812,6854} |
{0.551058,0.184913,0.0818767,0.0772769,0.0515179
,0.0137994,0.00919963,0.00827967,0.00643974,0.00367985,0.00183993} |
{0,6831,6853,6904,6917} | 0.755042 | |
|
(1 row)
The right estimation when using = (not an option here):
explain select * from enterprise where parent_enterprise = (select
enterprise_id from enterprise par where global_attribute15 = 'BEL' limit
1);
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on enterprise (cost=95.59..191.18 rows=68 width=977)
Filter: (parent_enterprise = $0)
InitPlan 1 (returns $0)
-> Seq Scan on enterprise par (cost=0.00..95.59 rows=1 width=5)
Filter: (global_attribute15 = 'BEL'::text)
The wrong estimation leads to issues in bigger queries.
pgsql-bugs by date: