BUG #16759: Estimation of the planner is wrong for hash join - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16759: Estimation of the planner is wrong for hash join
Date
Msg-id 16759-f0980add03d8f152@postgresql.org
Whole thread Raw
Responses Re: BUG #16759: Estimation of the planner is wrong for hash join
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16759
Logged by:          Bertrand Guillaumin
Email address:      bertrand.guillaumin@gmail.com
PostgreSQL version: 11.8
Operating system:   Linux
Description:

The following query estimated number of lines returned is 1 while it should
be around 67 or more : 
explain analyze select * from enterprise where parent_enterprise in (select
enterprise_id from enterprise par where global_attribute15 = 'BEL');
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=95.60..191.33 rows=1 width=977) (actual time=0.422..0.896
rows=56 loops=1)
   Hash Cond: (enterprise.parent_enterprise = par.enterprise_id)
   ->  Seq Scan on enterprise  (cost=0.00..92.87 rows=1087 width=977)
(actual time=0.004..0.095 rows=1087 loops=1)
   ->  Hash  (cost=95.59..95.59 rows=1 width=5) (actual time=0.397..0.398
rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on enterprise par  (cost=0.00..95.59 rows=1 width=5)
(actual time=0.251..0.394 rows=1 loops=1)
               Filter: (global_attribute15 = 'BEL'::text)
               Rows Removed by Filter: 1086

The same issue with a normal join : 
explain analyze select * from enterprise  ent1, enterprise ent2 where
ent1.parent_enterprise=ent2.enterprise_id and ent2.global_attribute15 =
'BEL';
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=95.60..191.33 rows=1 width=1954) (actual time=0.444..0.954
rows=56 loops=1)
   Hash Cond: (ent1.parent_enterprise = ent2.enterprise_id)
   ->  Seq Scan on enterprise ent1  (cost=0.00..92.87 rows=1087 width=977)
(actual time=0.004..0.104 rows=1087 loops=1)
   ->  Hash  (cost=95.59..95.59 rows=1 width=977) (actual time=0.416..0.416
rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on enterprise ent2  (cost=0.00..95.59 rows=1
width=977) (actual time=0.252..0.410 rows=1 loops=1)
               Filter: (global_attribute15 = 'BEL'::text)
               Rows Removed by Filter: 1086


The statistics for parent_enterprise : 
select * from pg_stats where tablename='enterprise' and
attname='parent_enterprise';
 schemaname | tablename  |      attname      | inherited | null_frac  |
avg_width | n_distinct |                  most_common_vals
|
most_common_freqs                                                  |
histogram_bounds     | correlation | most_common_elems |
most_common_elem_freqs | elem_count_histogram

------------+------------+-------------------+-----------+------------+-----------+------------+-----------------------------------------------------+-------------------------------------------------

-------------------------------------------------------------------+-------------------------+-------------+-------------------+------------------------+----------------------
xxxxxxx     | enterprise | parent_enterprise | f         | 0.00551978 |
   4 |         16 | {48,682,6162,6639,6448,46,6630,6796,6553,6812,6854} |
{0.551058,0.184913,0.0818767,0.0772769,0.0515179
,0.0137994,0.00919963,0.00827967,0.00643974,0.00367985,0.00183993} |
{0,6831,6853,6904,6917} |    0.755042 |                   |
      |
(1 row)

The right estimation when using = (not an option here): 
explain select * from enterprise where parent_enterprise = (select
enterprise_id from enterprise par where global_attribute15 = 'BEL' limit
1);
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on enterprise  (cost=95.59..191.18 rows=68 width=977)
   Filter: (parent_enterprise = $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on enterprise par  (cost=0.00..95.59 rows=1 width=5)
           Filter: (global_attribute15 = 'BEL'::text)

The wrong estimation leads to issues in bigger queries.


pgsql-bugs by date:

Previous
From: "Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
Subject: RE: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.
Next
From: PG Bug reporting form
Date:
Subject: BUG #16760: Standby database missed records for at least 1 table