Re: Ambiguous Plan - Larger Table on Hash Side - Mailing list pgsql-hackers

From: Ashutosh Bapat
Subject: Re: Ambiguous Plan - Larger Table on Hash Side
Date:
Msg-id: CAFjFpRfjJy2LS-uBhPR+u58HcqWKXUDkgt+n_=JqkantFe_82Q@mail.gmail.com
In response to: Re: Ambiguous Plan - Larger Table on Hash Side  (Narendra Pradeep U U <narendra.pradeep@zohocorp.com>)
List: pgsql-hackers
On Tue, Mar 13, 2018 at 4:32 PM, Narendra Pradeep U U
<narendra.pradeep@zohocorp.com> wrote:
> Hi,
>       Thanks everyone for your suggestions. I would like to add the EXPLAIN
> ANALYZE output of both plans so that we have a broader picture.
>
> I have a work_mem of 1000 MB.
>
> The plan we get regularly, with the tables analyzed:
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                            QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
>    Hash Cond: (tab2.b = tab1.a)
>    ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
>    ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
>          Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
>          ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
>  Planning time: 0.055 ms
>  Execution time: 20472.603 ms
> (8 rows)
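The "Batches: 2" line above indicates that the hash of tab1 did not fit in a single batch within the 1000 MB work_mem. A hypothetical follow-up in the same session (not from the original thread) to check whether a larger work_mem keeps the hash to one batch:

    -- Sketch only: raise work_mem for this session and re-check the batch count.
    SET work_mem = '2GB';
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;
    RESET work_mem;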
>
>
>
> I reproduced the other plan by not analyzing the smaller table.
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
>    Hash Cond: (tab1.a = tab2.b)
>    ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
>    ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 3776kB
>          ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
>  Planning time: 0.053 ms
>  Execution time: 7592.688 ms
> (8 rows)

I am surprised to see that the estimates are very close to the actual
values even without analyzing the small table.
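Even without ANALYZE, the planner typically derives a row estimate from the relation's current size on disk (relpages) and an estimated tuple density, which for a narrow table like tab2 can land very close to the true count. A hypothetical way to inspect what the planner has to work with (same table names as above):

    -- What the planner knows without a fresh ANALYZE.
    SELECT relname, reltuples, relpages
    FROM pg_class
    WHERE relname IN ('tab1', 'tab2');

    -- Per-column statistics; no rows appear here for a never-analyzed table.
    SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename IN ('tab1', 'tab2');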

>
>
> The plan that is actually chosen seems to be slower. In the smaller table
> (tab2) each row is duplicated exactly 8 times, and all the rows in the
> larger table (tab1) are distinct. What may be the exact reason, and can we
> fix this?
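A hypothetical script to reproduce the data distribution described above (table and column names taken from the quoted plans, row counts approximated):

    -- Sketch only: ~30.6M distinct values in tab1; 9783 values,
    -- each repeated 8 times (78264 rows), in tab2.
    CREATE TABLE tab1 AS
        SELECT g AS a FROM generate_series(1, 30667722) g;
    CREATE TABLE tab2 AS
        SELECT v AS b FROM generate_series(1, 9783) v,
                           generate_series(1, 8);
    ANALYZE tab1;
    ANALYZE tab2;   -- leave this out to get the second (faster) plan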

After analyzing the small table, the first plan is chosen as the
cheapest. This means that the plan which hashes the smaller table is
costed higher than the plan which hashes the larger table. We need to
examine that costing to see what went wrong.
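One hedged way to start that examination, assuming the reproduction sketch above, is to compare the bare estimated costs of the join before and after the small table is analyzed, and to look at the column statistics that change between the two plans:

    -- Estimated cost while tab2 has no fresh statistics.
    EXPLAIN SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;

    ANALYZE tab2;

    -- Estimated cost once the planner sees the duplicates in tab2.b.
    EXPLAIN SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;

    -- The statistics that differ between the two cases.
    SELECT tablename, attname, n_distinct, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'tab2' AND attname = 'b';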

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

