Thread: Ambigous Plan - Larger Table on Hash Side

Ambigous Plan - Larger Table on Hash Side

From

Narendra Pradeep U U

Date:

12 March 2018, 19:32:52

Hi ,

Recently I came across a case where the planner choose larger table on hash side. I am not sure whether it is an intended behavior or we are missing something.

I have two tables (a and b) each with single column in it. One table 'a' is large with around 30 million distinct rows and other table 'b' has merely 70,000 rows with one-seventh (10,000) distinct rows. I have analyzed both the table. But while joining both the table I get the larger table on hash side.

tpch=# explain select b from b left join a on a = b;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
Hash Left Join (cost=824863.75..950104.42 rows=78264 width=4)
   Hash Cond: (b.b = a.a)o
   -> Foreign Scan on b (cost=0.00..821.64 rows=78264 width=4)
         CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
         CStore File Size: 314587
   -> Hash (cost=321721.22..321721.22 rows=30667722 width=4)
         -> Foreign Scan on a (cost=0.00..321721.22 rows=30667722 width=4)
               CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
               CStore File Size: 123236206
(9 rows)

I would like to know the reason for choosing this plan and Is there a easy fix to prevent such plans (especially like this one where it choose a larger hash table) ?

Thanks,

Pradeep

Re: Ambigous Plan - Larger Table on Hash Side

From

Tom Lane

Date:

12 March 2018, 19:52:00

Narendra Pradeep U U <narendra.pradeep@zohocorp.com> writes:
>       Recently I came across a case where the planner choose larger table on hash side. I am not sure whether it is
anintended  behavior or we are missing something.  

Probably the reason is that the smaller table has a less uniform
distribution of the hash key.  You don't want to hash with a nonuniform
distribution of the hashtable key; if many keys go into the same bucket
then performance degrades drastically.

            regards, tom lane

Re: Ambigous Plan - Larger Table on Hash Side

From

Andres Freund

Date:

12 March 2018, 20:43:14

On 2018-03-12 12:52:00 -0400, Tom Lane wrote:
> Narendra Pradeep U U <narendra.pradeep@zohocorp.com> writes:
> >       Recently I came across a case where the planner choose larger table on hash side. I am not sure whether it is
anintended  behavior or we are missing something. 

> 
> Probably the reason is that the smaller table has a less uniform
> distribution of the hash key.  You don't want to hash with a nonuniform
> distribution of the hashtable key; if many keys go into the same bucket
> then performance degrades drastically.

Not sure I follow. Unless the values are equivalent (i.e. duplicate key
values), why should non-uniformity in key space translate to hash space?
And if there's duplicates it shouldn't hurt much either, unless doing
a semi/anti-join? All rows are going to be returned and IIRC we quite
cheaply continue a bucket scan?

Greetings,

Andres Freund

Re: Ambigous Plan - Larger Table on Hash Side

From

Tom Lane

Date:

12 March 2018, 21:06:51

Andres Freund <andres@anarazel.de> writes:
> Not sure I follow. Unless the values are equivalent (i.e. duplicate key
> values), why should non-uniformity in key space translate to hash space?

Duplicates are exactly the problem.  See estimate_hash_bucket_stats.

> And if there's duplicates it shouldn't hurt much either, unless doing
> a semi/anti-join? All rows are going to be returned and IIRC we quite
> cheaply continue a bucket scan?

If the bucket containing the MCV is bigger than work_mem, you gotta
problem --- one not necessarily shared by the other relation.

            regards, tom lane

Re: Ambigous Plan - Larger Table on Hash Side

From

Ashutosh Bapat

Date:

13 March 2018, 10:12:12

On Mon, Mar 12, 2018 at 10:02 PM, Narendra Pradeep U U
<narendra.pradeep@zohocorp.com> wrote:
> Hi ,
>
>       Recently I came across a case where the planner choose larger table on
> hash side. I am not sure whether it is an intended  behavior or we are
> missing something.
>
>      I have two tables (a and b) each with  single column in it. One table
> 'a' is large with around 30 million distinct rows and other table 'b' has
> merely 70,000 rows with one-seventh (10,000) distinct rows. I have analyzed
> both the table.  But while joining both the table I get the larger table on
> hash side.
>
> tpch=# explain select b from b left join a on a = b;
>                                                QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=824863.75..950104.42 rows=78264 width=4)
>    Hash Cond: (b.b = a.a)o
>    ->  Foreign Scan on b  (cost=0.00..821.64 rows=78264 width=4)
>          CStore File:
> /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
>          CStore File Size: 314587
>    ->  Hash  (cost=321721.22..321721.22 rows=30667722 width=4)
>          ->  Foreign Scan on a  (cost=0.00..321721.22 rows=30667722 width=4)
>                CStore File:
> /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
>                CStore File Size: 123236206
> (9 rows)
>
>
>
> I would like to know the reason for choosing this plan and Is there a easy
> fix to prevent such plans (especially like this one where it choose a larger
> hash table) ?

A plan with larger table being hashed doesn't necessarily bad
performing one. During partition-wise join analysis I have seen plans
with larger table being hashed perform better than the plans with
smaller table being hashed. But I have seen the other way around as
well. Although, I don't know an easy way to force which side of join
gets hashed. I tried that under the debugger. In your case, if you run
EXPLAIN ANALYZE on this query, produce outputs of two plans: one with
larger table being hashed and second with the smaller one being
hashed, you will see which of them performs better.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Ambigous Plan - Larger Table on Hash Side

From

Narendra Pradeep U U

Date:

13 March 2018, 14:02:36

Hi,

Thanks everyone for your suggestions. I would like to add explain analyze of both the plans so that we can have broader picture.

I have a work_mem of 1000 MB.

The Plan which we get regularly with table being analyzed .

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
   Hash Cond: (tab2.b = tab1.a)
   -> Seq Scan on tab2 (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
   -> Hash (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
         Buckets: 33554432 Batches: 2 Memory Usage: 801126kB
         -> Seq Scan on tab1 (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
Planning time: 0.055 ms
Execution time: 20472.603 ms
(8 rows)

I reproduced the other plan by not analyzing the smaller table.

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
   Hash Cond: (tab1.a = tab2.b)
   -> Seq Scan on tab1 (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
   -> Hash (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
         Buckets: 131072 Batches: 1 Memory Usage: 3776kB
         -> Seq Scan on tab2 (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
Planning time: 0.053 ms
Execution time: 7592.688 ms
(8 rows)

The actual plan seems to be Slower. The smaller table (tab2) has exactly each row duplicated 8 times and all the rows in larger table (tab2) are distinct. what may be the exact reason and can we fix this ?

P.s I have also attached a sql file to reproduce this

---- On Tue, 13 Mar 2018 12:42:12 +0530 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote ----

On Mon, Mar 12, 2018 at 10:02 PM, Narendra Pradeep U U
<narendra.pradeep@zohocorp.com> wrote:
> Hi ,
>
> Recently I came across a case where the planner choose larger table on
> hash side. I am not sure whether it is an intended behavior or we are
> missing something.
>
> I have two tables (a and b) each with single column in it. One table
> 'a' is large with around 30 million distinct rows and other table 'b' has
> merely 70,000 rows with one-seventh (10,000) distinct rows. I have analyzed
> both the table. But while joining both the table I get the larger table on
> hash side.
>
> tpch=# explain select b from b left join a on a = b;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
> Hash Left Join (cost=824863.75..950104.42 rows=78264 width=4)
> Hash Cond: (b.b = a.a)o
> -> Foreign Scan on b (cost=0.00..821.64 rows=78264 width=4)
> CStore File:
> /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
> CStore File Size: 314587
> -> Hash (cost=321721.22..321721.22 rows=30667722 width=4)
> -> Foreign Scan on a (cost=0.00..321721.22 rows=30667722 width=4)
> CStore File:
> /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
> CStore File Size: 123236206
> (9 rows)
>
>
>
> I would like to know the reason for choosing this plan and Is there a easy
> fix to prevent such plans (especially like this one where it choose a larger
> hash table) ?

A plan with larger table being hashed doesn't necessarily bad
performing one. During partition-wise join analysis I have seen plans
with larger table being hashed perform better than the plans with
smaller table being hashed. But I have seen the other way around as
well. Although, I don't know an easy way to force which side of join
gets hashed. I tried that under the debugger. In your case, if you run
EXPLAIN ANALYZE on this query, produce outputs of two plans: one with
larger table being hashed and second with the smaller one being
hashed, you will see which of them performs better.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

join_plan.sql

Re: Ambigous Plan - Larger Table on Hash Side

From

Ashutosh Bapat

Date:

14 March 2018, 07:48:16

On Tue, Mar 13, 2018 at 4:32 PM, Narendra Pradeep U U
<narendra.pradeep@zohocorp.com> wrote:
> Hi,
>       Thanks everyone for your suggestions. I would like to add  explain
> analyze of both the plans so that we can have broader picture.
>
> I have a work_mem of 1000 MB.
>
> The Plan which we get regularly with table being analyzed .
>
> tpch=# explain analyze  select b from tab2 left join tab1 on a = b;
>                                                            QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual
> time=9439.410..20445.620 rows=78264 loops=1)
>    Hash Cond: (tab2.b = tab1.a)
>    ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual
> time=0.006..5.116 rows=78264 loops=1)
>    ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual
> time=9133.593..9133.593 rows=30667722 loops=1)
>          Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
>          ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4)
> (actual time=0.030..3584.652 rows=30667722 loops=1)
>  Planning time: 0.055 ms
>  Execution time: 20472.603 ms
> (8 rows)
>
>
>
> I reproduced the  other plan by not analyzing the smaller table.
>
> tpch=# explain analyze  select b from tab2 left join tab1 on a = b;
>                                                         QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual
> time=15.331..7590.406 rows=78264 loops=1)
>    Hash Cond: (tab1.a = tab2.b)
>    ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4)
> (actual time=0.046..2697.480 rows=30667722 loops=1)
>    ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual
> time=15.133..15.133 rows=78264 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 3776kB
>          ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4)
> (actual time=0.009..5.516 rows=78264 loops=1)
>  Planning time: 0.053 ms
>  Execution time: 7592.688 ms
> (8 rows)

I am surprised to see the estimates to be very close to the actual
values even without analysing the small table.

>
>
> The actual plan seems to be Slower. The smaller table (tab2) has exactly
> each row duplicated 8 times  and all the rows in larger table (tab2) are
> distinct. what may be the exact reason  and  can we fix this ?

After analysing the small table, the first plan is chosen as the
cheapest. This means that the plan with smaller table being hashed has
cost higher than the plan with larger table being hashed. We need to
examine that costing to see what went wrong in costing.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Ambigous Plan - Larger Table on Hash Side

From

Jeff Janes

Date:

14 March 2018, 22:10:27

On Tue, Mar 13, 2018 at 4:02 AM, Narendra Pradeep U U <narendra.pradeep@zohocorp.com> wrote:

Hi,
Thanks everyone for your suggestions. I would like to add explain analyze of both the plans so that we can have broader picture.

I have a work_mem of 1000 MB.

Is it possible to repeat with 2000MB or 3000MB? It would be interesting to see what the estimated cost and what the actual time would be if there were only 1 batch rather than 2.

Also, can you repeat all of these with EXPLAIN (ANALYZE, TIMING OFF) ? Sometimes the act of measuring the times can distort the times by quite a bit. (It will still give an overall execution time, it just won't try to attribute that time to the individual steps)

Cheers,

Jeff

Re: Ambigous Plan - Larger Table on Hash Side

From

Narendra Pradeep U U

Date:

16 March 2018, 09:42:11

Hi Jeff,

I repeated the same query with a work_mem of 2000MB. It is faster than the one with two batches but still slower than hashing the smaller table. So in this case It makes more sense to hash the smaller table (less execution time and reduce hash table size).

Explain analyze with higher work_mem (2 GB)

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=825722.33..830863.00 rows=78264 width=4) (actual time=7876.746..7905.323 rows=78264 loops=1)
   Hash Cond: (tab2.b = tab1.a)
   -> Seq Scan on tab2 (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.007..4.268 rows=78264 loops=1)
   -> Hash (cost=442375.48..442375.48 rows=30667748 width=4) (actual time=7834.214..7834.214 rows=30667722 loops=1)
         Buckets: 33554432 Batches: 1 Memory Usage: 1340307kB
         -> Seq Scan on tab1 (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.047..2267.995 rows=30667722 loops=1)
Planning time: 0.052 ms
Execution time: 7953.913 ms
(8 rows)

Yeah, explain analyze distorted the time, while the actual query is twice faster.

Still, 2GB work_mem is not a feasible idea. Is there a way to avoid these plan (especially avoiding larger table on hash side) or any work around available ?

Thanks,

Pradeep

---- On Thu, 15 Mar 2018 00:40:27 +0530 Jeff Janes <jeff.janes@gmail.com> wrote ----

On Tue, Mar 13, 2018 at 4:02 AM, Narendra Pradeep U U <narendra.pradeep@zohocorp.com> wrote:

Hi,
Thanks everyone for your suggestions. I would like to add explain analyze of both the plans so that we can have broader picture.

I have a work_mem of 1000 MB.

Is it possible to repeat with 2000MB or 3000MB? It would be interesting to see what the estimated cost and what the actual time would be if there were only 1 batch rather than 2.

Also, can you repeat all of these with EXPLAIN (ANALYZE, TIMING OFF) ? Sometimes the act of measuring the times can distort the times by quite a bit. (It will still give an overall execution time, it just won't try to attribute that time to the individual steps)

Cheers,

Jeff