Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?
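
My best guess at where that figure comes from, assuming the probe side is
charged roughly outer_rows * (inner_rows * bucketsize) * 0.5 comparisons,
with the bucket fraction estimated from the most common value (1000 of the
10000 rows, i.e. 0.1). A back-of-envelope sketch, not the actual
cost_hashjoin code:

select 100000000::bigint * (10000 * 0.1) * 0.5 as probe_comparisons;
-- 50000000000, i.e. roughly the 50 billion in the estimate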
________________________________
From: Tom Lane <tgl@sss.pgh.pa.us>
To: postgresuser@yahoo.com
Cc: pgsql-bugs@postgresql.org
Sent: Wednesday, May 30, 2012 10:03 PM
Subject: Re: [BUGS] BUG #6668: hashjoin cost problem
postgresuser@yahoo.com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);
> It doesn't matter how big the big table is... for this distribution large
> table is hashed.
I don't think that's wrong.  If it hashes the small table, there cannot
be less than 1000 entries on each populated hash chain; adding more
work_mem doesn't help.  The planner is designed to avoid hashing such
unfriendly distributions as that.  The fact that you can get a somewhat
smaller runtime by forcing hashing in the other direction suggests that
its cost factors are not quite right for your specific case --- but it's
a long way from that observation to deciding that we should change the
cost factors for everyone.  In any case, the sizes of the tables are not
the only determinant of which one should be hashed.

            regards, tom lane
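
For the record, counting rows per value in small shows the skew Tom is
describing: nearly all rows fall into about ten heavily repeated values,
so a hash chain that is populated at all tends to hold around 1000 entries.

select i, count(*) from small group by i order by i;
-- eleven values (0, 1000, ..., 10000); the first has 999 rows, the last
-- has 1, and the nine in between have exactly 1000 rows each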