Thread: BUG #6668: hashjoin cost problem

BUG #6668: hashjoin cost problem

From: postgresuser@yahoo.com
Date:
The following bug has been logged on the website:

Bug reference:      6668
Logged by:          Postgres User
Email address:      postgresuser@yahoo.com
PostgreSQL version: 9.1.3
Operating system:   Ubuntu
Description:


work_mem 1MB

create table small(i) as select (g/1000) * 1000 from
generate_series(1,10000) g;

create table large(i) as select generate_series(1,100000000);

vacuum;
vacuum;
vacuum analyze;

explain analyze select * from small inner join large using (i);

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3083103.00..3475328.00 rows=10000 width=4) (actual time=84079.899..84989.419 rows=9001 loops=1)
   Hash Cond: (small.i = large.i)
   ->  Seq Scan on small  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.008..0.588 rows=10000 loops=1)
   ->  Hash  (cost=1442478.00..1442478.00 rows=100000000 width=4) (actual time=84079.741..84079.741 rows=100000000 loops=1)
         Buckets: 4096  Batches: 4096  Memory Usage: 853kB
         ->  Seq Scan on large  (cost=0.00..1442478.00 rows=100000000 width=4) (actual time=0.005..59011.443 rows=100000000 loops=1)
 Total runtime: 84990.270 ms
(7 rows)


It doesn't matter how big the big table is... for this distribution, the
large table is hashed.

Forcing the cost computed by one of the cost_hashjoin calls to 0 (via gdb),
the planner chooses to hash the smaller table, with better results:

explain analyze select * from small inner join large using (i);
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=270.00..0.00 rows=10000 width=4) (actual time=14028.034..16510.598 rows=9001 loops=1)
   Hash Cond: (large.i = small.i)
   ->  Seq Scan on large  (cost=0.00..1442478.00 rows=100000000 width=4) (actual time=0.010..5893.344 rows=100000000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4) (actual time=3.854..3.854 rows=10000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 352kB
         ->  Seq Scan on small  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.005..1.585 rows=10000 loops=1)
 Total runtime: 16510.962 ms
(7 rows)

Digging further in gdb, all of the cost seems to come from:

        run_cost += hash_qual_cost.per_tuple * outer_path_rows *
            clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

(outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * 0.5)
works out to 50 billion, leading to a wild cost. The parent join's row count
is rightly estimated at 10000, so 50 billion comparisons is obviously a bad
estimate.
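
[Editor's note: as a back-of-the-envelope check, not part of the original
report, plugging the planner's row counts into that expression reproduces
the 50 billion figure if innerbucketsize is assumed to be about 0.1, i.e.
each populated bucket holds roughly 1000 of the 10,000 inner rows:

-- assumed inputs: outer_path_rows = 100000000 (large),
-- inner_path_rows = 10000 (small), innerbucketsize ~= 0.1
select 100000000::bigint      -- outer_path_rows
     * 1000                   -- clamp_row_est(10000 * 0.1)
     * 0.5;                   -- the 0.5 derating factor
-- => 50,000,000,000 comparisons, each charged at hash_qual_cost.per_tuple
]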

Re: BUG #6668: hashjoin cost problem

From: Tom Lane
Date:
postgresuser@yahoo.com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);

> It doesn't matter how big the big table is... for this distribution, the
> large table is hashed.

I don't think that's wrong.  If it hashes the small table, there cannot
be less than 1000 entries on each populated hash chain; adding more
work_mem doesn't help.  The planner is designed to avoid hashing such
unfriendly distributions as that.  The fact that you can get a somewhat
smaller runtime by forcing hashing in the other direction suggests that
its cost factors are not quite right for your specific case --- but it's
a long way from that observation to deciding that we should change the
cost factors for everyone.  In any case, the sizes of the tables are not
the only determinant of which one should be hashed.

            regards, tom lane
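
[Editor's note: a quick way to see the distribution described above (this
query is not part of the original thread) is to count the duplicates in
small. Nearly every populated value appears about 1000 times, so any hash
chain that is populated at all holds roughly 1000 entries no matter how
much work_mem is available:

-- check against the tables from the original report
select i, count(*) as dup_count
from small
group by i
order by i;
-- ~11 distinct values, almost all of them with ~1000 rows apiece
]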

Re: BUG #6668: hashjoin cost problem

From: Postgres User
Date:
Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of
output though?

Re: BUG #6668: hashjoin cost problem

From: Tom Lane
Date:
Postgres User <postgresuser@yahoo.com> writes:
> Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?

Well, if it hashes the smaller table, there's 100 million rows on the
outside, and each of them will probe one hash chain in the hash table.
If you're unlucky, each of those probes will hit a populated hash chain
with at least 1000 entries, leading to 100 billion comparisons.  I think
it might derate that worst-case by a factor of 2.  Now if you're lucky,
a lot of the outer tuples hit unpopulated hash chains and so the number
of comparisons is a lot less --- but in non-artificial examples,
that's not a very good bet to make.  The conservative assumption is that
both sides of the join have similar key distributions, so that the more
populated hash chains are also more likely to be probed.  The cost
estimate is therefore designed to discriminate against using an inner
relation with a non-flat distribution.

            regards, tom lane
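
[Editor's note: for what it's worth, and again not part of the original
thread, this artificial data set happens to be the lucky case described
above: only a handful of the 100 million outer values have any partner in
small at all, which is why forcing the small table to be hashed still wins
here even though the planner's conservative estimate says otherwise:

-- how many outer rows have at least one join partner?
select count(*)
from large l
where exists (select 1 from small s where s.i = l.i);
-- => 10 of the 100,000,000 outer rows, which together produce the
--    9001 rows seen in the join output
]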