Tom Lane wrote:
> "Maciej Babinski" <maciej+postgres@apathy.killer-robot.net> writes:
>> Hash join of columns with many null fields is very slow unless the null
>> fields are commented out.
>
> I see no bug here. AFAICT your "much faster" query gets that way by
> having eliminated all the candidate join rows on the B side.
>
> regards, tom lane
The additional clause eliminates no rows beyond what the existing clause
would.
Any row eliminated by "b.join_id IS NOT NULL" could not possibly have
satisfied
"a.join_id = b.join_id".
Please note that if the join columns are not null, but still produce no
matches
for the join, the results are fast without the need for an extra clause
in the join:
DROP TABLE a;
DROP TABLE b;
CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);
INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);
ANALYZE a;
ANALYZE b;
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id; /* 14
seconds */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id AND
b.join_id IS NOT NULL; /* 5ms */
UPDATE a SET join_id=1;
UPDATE b SET join_id=2;
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id; /* 72ms */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id AND
b.join_id != 2; /* 48ms */
It seems to me that such a wild disparity in performance due to the
addition of a clause that is
implied by the existing clause should be considered a bug, but if I need
to submit a feature
request for the optimizer, then I'd be happy to. Thanks!
Maciej Babinski