Re: BUG #6668: hashjoin cost problem

From Postgres User
Subject Re: BUG #6668: hashjoin cost problem
Date
Msg-id 1338443837.61025.YahooMailNeo@web121105.mail.ne1.yahoo.com
In response to Re: BUG #6668: hashjoin cost problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #6668: hashjoin cost problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?
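
A back-of-envelope sketch of where a number that size could come from, assuming the model charges roughly outer_rows * (inner_rows * innerbucketsize) * 0.5 tuple comparisons for the plan that hashes small (the 0.5 factor and the bucket-size fraction here are my reading of costsize.c, not verified):

    outer_rows      = 100,000,000   -- large
    inner_rows      = 10,000        -- small
    innerbucketsize ~ 0.1           -- ~10 distinct values => ~1000 entries per chain
    100,000,000 * (10,000 * 0.1) * 0.5 = 50,000,000,000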



________________________________
 From: Tom Lane <tgl@sss.pgh.pa.us>
To: postgresuser@yahoo.com
Cc: pgsql-bugs@postgresql.org
Sent: Wednesday, May 30, 2012 10:03 PM
Subject: Re: [BUGS] BUG #6668: hashjoin cost problem
postgresuser@yahoo.com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);

> It doesn't matter how big the big table is... for this distribution the
> large table is hashed.

I don't think that's wrong.  If it hashes the small table, there cannot
be less than 1000 entries on each populated hash chain; adding more
work_mem doesn't help.  The planner is designed to avoid hashing such
unfriendly distributions as that.  The fact that you can get a somewhat
smaller runtime by forcing hashing in the other direction suggests that
its cost factors are not quite right for your specific case --- but it's
a long way from that observation to deciding that we should change the
cost factors for everyone.  In any case, the sizes of the tables are not
the only determinant of which one should be hashed.

            regards, tom lane
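
For anyone reproducing this, a minimal SQL sketch to see the skew and which side gets hashed (table and column names from the example above; the exact EXPLAIN output will vary by version):

    -- small holds ~10 distinct values, each duplicated ~1000 times
    SELECT i, count(*) FROM small GROUP BY i ORDER BY i;

    ANALYZE small;
    ANALYZE large;

    -- the relation under the Hash node is the one being hashed;
    -- here the planner picks large despite its size
    EXPLAIN SELECT * FROM small JOIN large USING (i);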
