Thread: BUG #6668: hashjoin cost problem
The following bug has been logged on the website:

Bug reference:      6668
Logged by:          Postgres User
Email address:      postgresuser@yahoo.com
PostgreSQL version: 9.1.3
Operating system:   Ubuntu
Description:

work_mem = 1MB

create table small(i) as select (g/1000) * 1000 from
generate_series(1,10000) g;
create table large(i) as select generate_series(1,100000000);
vacuum; vacuum; vacuum analyze;

explain analyze select * from small inner join large using (i);

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3083103.00..3475328.00 rows=10000 width=4) (actual time=84079.899..84989.419 rows=9001 loops=1)
   Hash Cond: (small.i = large.i)
   ->  Seq Scan on small  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.008..0.588 rows=10000 loops=1)
   ->  Hash  (cost=1442478.00..1442478.00 rows=100000000 width=4) (actual time=84079.741..84079.741 rows=100000000 loops=1)
         Buckets: 4096  Batches: 4096  Memory Usage: 853kB
         ->  Seq Scan on large  (cost=0.00..1442478.00 rows=100000000 width=4) (actual time=0.005..59011.443 rows=100000000 loops=1)
 Total runtime: 84990.270 ms
(7 rows)

It doesn't matter how big the big table is... for this distribution the
large table is hashed.

Forcing (via gdb) the cost in one of the cost_hashjoin calls to 0, it
chooses to hash the smaller table instead, with better results:

explain analyze select * from small inner join large using (i);

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=270.00..0.00 rows=10000 width=4) (actual time=14028.034..16510.598 rows=9001 loops=1)
   Hash Cond: (large.i = small.i)
   ->  Seq Scan on large  (cost=0.00..1442478.00 rows=100000000 width=4) (actual time=0.010..5893.344 rows=100000000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4) (actual time=3.854..3.854 rows=10000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 352kB
         ->  Seq Scan on small  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.005..1.585 rows=10000 loops=1)
 Total runtime: 16510.962 ms
(7 rows)

Digging further in gdb, all of the cost seems to come from:

    run_cost += hash_qual_cost.per_tuple * outer_path_rows *
        clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

(outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * 0.5)
is 50 billion, leading to a wild cost. The join's own output is rightly
estimated at 10000 rows, so 50 billion comparisons is obviously a bad
estimate.
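For reference, the 50-billion figure falls straight out of the quoted term
with the row counts above. This is only a back-of-the-envelope sketch of the
hash-the-small-table path, not the planner's actual numbers: the 0.1
bucket-size fraction stands in for whatever innerbucketsize the planner
computed ("small" has only eleven distinct values, so roughly a tenth of its
rows share any one populated bucket), and 0.0025 (the default
cpu_operator_cost) stands in for hash_qual_cost.per_tuple.

    -- hash-the-small-table path: outer = large (100M rows probing),
    -- inner = small (10K rows hashed)
    select 100000000::bigint * (10000 * 0.1) * 0.5          as est_comparisons, -- 50 billion
           0.0025 * 100000000::bigint * (10000 * 0.1) * 0.5 as est_qual_cost;   -- ~125 million

A hash-qual term of roughly 125 million swamps the ~3.5 million total cost of
the plan that hashes "large", which is why that path wins on cost even though
it loses at runtime.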
postgresuser@yahoo.com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);

> It doesn't matter how big the big table is... for this distribution large
> table is hashed.

I don't think that's wrong.  If it hashes the small table, there cannot be
less than 1000 entries on each populated hash chain; adding more work_mem
doesn't help.  The planner is designed to avoid hashing such unfriendly
distributions as that.  The fact that you can get a somewhat smaller
runtime by forcing hashing in the other direction suggests that its cost
factors are not quite right for your specific case --- but it's a long way
from that observation to deciding that we should change the cost factors
for everyone.  In any case, the sizes of the tables are not the only
determinant of which one should be hashed.

			regards, tom lane
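The skew in question is easy to see directly; a quick check against the
tables from the report (the counts below assume the exact generate_series
setup above):

    select i, count(*) from small group by i order by i;
    -- eleven distinct values, roughly 1000 rows apiece (plus a lone 10000),
    -- so any hash table built on "small" has populated chains on the order
    -- of 1000 entries long, no matter how many buckets it gets.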
Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows
of output though?
Postgres User <postgresuser@yahoo.com> writes:
> Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows
> of output though?

Well, if it hashes the smaller table, there's 100 million rows on the
outside, and each of them will probe one hash chain in the hash table.
If you're unlucky, each of those probes will hit a populated hash chain
with at least 1000 entries, leading to 100 billion comparisons.  I think
it might derate that worst-case by a factor of 2.

Now if you're lucky, a lot of the outer tuples hit unpopulated hash chains
and so the number of comparisons is a lot less --- but in non-artificial
examples, that's not a very good bet to make.  The conservative assumption
is that both sides of the join have similar key distributions, so that the
more populated hash chains are also more likely to be probed.  The cost
estimate is therefore designed to discriminate against using an inner
relation with a non-flat distribution.

			regards, tom lane
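Plugging this data set back into that description shows how lucky the
artificial example is; a rough check (it ignores probes that land on a
populated bucket only through a hash collision between different key
values):

    -- how many of large's 100,000,000 probe values actually carry one of
    -- small's eleven populated keys
    select count(*) from large where i in (select i from small);
    -- 10, versus the 100 million "unlucky" probes the estimate charges for

With a less contrived distribution on the outer side, far more probes would
land on those long chains, which is the case the conservative estimate is
guarding against.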