BUG #16759: Estimation of the planner is wrong for hash join - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16759: Estimation of the planner is wrong for hash join |
Date | |
Msg-id | 16759-f0980add03d8f152@postgresql.org Whole thread Raw |
Responses |
Re: BUG #16759: Estimation of the planner is wrong for hash join
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16759 Logged by: Bertrand Guillaumin Email address: bertrand.guillaumin@gmail.com PostgreSQL version: 11.8 Operating system: Linux Description: The following query estimated number of lines returned is 1 while it should be around 67 or more : explain analyze select * from enterprise where parent_enterprise in (select enterprise_id from enterprise par where global_attribute15 = 'BEL'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Hash Join (cost=95.60..191.33 rows=1 width=977) (actual time=0.422..0.896 rows=56 loops=1) Hash Cond: (enterprise.parent_enterprise = par.enterprise_id) -> Seq Scan on enterprise (cost=0.00..92.87 rows=1087 width=977) (actual time=0.004..0.095 rows=1087 loops=1) -> Hash (cost=95.59..95.59 rows=1 width=5) (actual time=0.397..0.398 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on enterprise par (cost=0.00..95.59 rows=1 width=5) (actual time=0.251..0.394 rows=1 loops=1) Filter: (global_attribute15 = 'BEL'::text) Rows Removed by Filter: 1086 The same issue with a normal join : explain analyze select * from enterprise ent1, enterprise ent2 where ent1.parent_enterprise=ent2.enterprise_id and ent2.global_attribute15 = 'BEL'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=95.60..191.33 rows=1 width=1954) (actual time=0.444..0.954 rows=56 loops=1) Hash Cond: (ent1.parent_enterprise = ent2.enterprise_id) -> Seq Scan on enterprise ent1 (cost=0.00..92.87 rows=1087 width=977) (actual time=0.004..0.104 rows=1087 loops=1) -> Hash (cost=95.59..95.59 rows=1 width=977) (actual time=0.416..0.416 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on enterprise ent2 (cost=0.00..95.59 rows=1 width=977) (actual time=0.252..0.410 rows=1 loops=1) Filter: (global_attribute15 = 'BEL'::text) Rows Removed by Filter: 1086 The statistics for parent_enterprise : select * from pg_stats where tablename='enterprise' and attname='parent_enterprise'; schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram ------------+------------+-------------------+-----------+------------+-----------+------------+-----------------------------------------------------+------------------------------------------------- -------------------------------------------------------------------+-------------------------+-------------+-------------------+------------------------+---------------------- xxxxxxx | enterprise | parent_enterprise | f | 0.00551978 | 4 | 16 | {48,682,6162,6639,6448,46,6630,6796,6553,6812,6854} | {0.551058,0.184913,0.0818767,0.0772769,0.0515179 ,0.0137994,0.00919963,0.00827967,0.00643974,0.00367985,0.00183993} | {0,6831,6853,6904,6917} | 0.755042 | | | (1 row) The right estimation when using = (not an option here): explain select * from enterprise where parent_enterprise = (select enterprise_id from enterprise par where global_attribute15 = 'BEL' limit 1); QUERY PLAN ----------------------------------------------------------------------- Seq Scan on enterprise (cost=95.59..191.18 rows=68 width=977) Filter: (parent_enterprise = $0) InitPlan 1 (returns $0) -> Seq Scan on enterprise par (cost=0.00..95.59 rows=1 width=5) Filter: (global_attribute15 = 'BEL'::text) The wrong estimation leads to issues in bigger queries.
pgsql-bugs by date: