Re: Ambigous Plan - Larger Table on Hash Side - Mailing list pgsql-hackers
From:           Ashutosh Bapat
Subject:        Re: Ambigous Plan - Larger Table on Hash Side
Date:
Msg-id:         CAFjFpRfjJy2LS-uBhPR+u58HcqWKXUDkgt+n_=JqkantFe_82Q@mail.gmail.com
In response to: Re: Ambigous Plan - Larger Table on Hash Side (Narendra Pradeep U U <narendra.pradeep@zohocorp.com>)
List:           pgsql-hackers
On Tue, Mar 13, 2018 at 4:32 PM, Narendra Pradeep U U
<narendra.pradeep@zohocorp.com> wrote:
> Hi,
> Thanks everyone for your suggestions. I would like to add explain
> analyze of both the plans so that we can have a broader picture.
>
> I have a work_mem of 1000 MB.
>
> The plan which we get regularly, with the table analyzed:
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                            QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
>    Hash Cond: (tab2.b = tab1.a)
>    ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
>    ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
>          Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
>          ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
>  Planning time: 0.055 ms
>  Execution time: 20472.603 ms
> (8 rows)
>
>
> I reproduced the other plan by not analyzing the smaller table.
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                            QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
>    Hash Cond: (tab1.a = tab2.b)
>    ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
>    ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 3776kB
>          ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
>  Planning time: 0.053 ms
>  Execution time: 7592.688 ms
> (8 rows)

I am surprised to see the estimates so close to the actual values even
without analysing the small table.

>
> The actual plan seems to be slower. The smaller table (tab2) has each
> row duplicated exactly 8 times, and all the rows in the larger table
> (tab1) are distinct. What may be the exact reason, and can we fix this?

After analysing the small table, the first plan is chosen as the
cheapest. This means that the plan with the smaller table being hashed
has a higher cost than the plan with the larger table being hashed. We
need to examine that costing to see what went wrong.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
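
A minimal sketch for reproducing the two plans, assuming DDL and data that
only approximate the description in the thread (the generate_series-based
population is a guess, not the original poster's schema), and noting that
autovacuum may analyze the tables on its own, which can change which plan
the planner picks:

  -- Hypothetical reproduction; table names are taken from the plans above,
  -- the data distribution (roughly 30 million distinct values in tab1 vs.
  -- 78264 rows in tab2 with each value repeated 8 times) follows the
  -- description in the thread.
  SET work_mem = '1000MB';

  CREATE TABLE tab1 AS                 -- ~30 million distinct integers
      SELECT g AS a FROM generate_series(1, 30000000) g;

  CREATE TABLE tab2 AS                 -- 9783 distinct values, each repeated 8 times
      SELECT (g % 9783) AS b FROM generate_series(1, 78264) g;

  ANALYZE tab1;                        -- leave tab2 unanalyzed to see the Hash Right Join plan
  EXPLAIN ANALYZE SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;

  ANALYZE tab2;                        -- with both tables analyzed, tab1 ends up on the hash side
  EXPLAIN ANALYZE SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;

The two runs differ both in the available statistics and in which table is
hashed, so the interesting comparison is the one the planner makes
internally after ANALYZE, between hashing tab1 and hashing tab2.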