Thread: BUG #2930: Hash join abysmal with many null fields.
The following bug has been logged online:

Bug reference:      2930
Logged by:          Maciej Babinski
Email address:      maciej+postgres@killer-robot.net
PostgreSQL version: 8.2
Operating system:   CentOS 4.4 (linux 2.6.9)
Description:        Hash join abysmal with many null fields.
Details:

Hash join of columns with many null fields is very slow unless the null
fields are filtered out in the join clause.

Steps to reproduce:

CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);
INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);
ANALYZE a;
ANALYZE b;
EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id;
EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id
    AND b.join_id IS NOT NULL;

Here are the results of the two explains:

test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=170.00..1590.01 rows=10000 width=16) (actual time=23.100..9451.778 rows=10000 loops=1)
   Hash Cond: (a.join_id = b.join_id)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.014..11.071 rows=10000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=8) (actual time=21.999..21.999 rows=10000 loops=1)
         ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.011..10.679 rows=10000 loops=1)
 Total runtime: 9460.944 ms
(6 rows)

test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=145.00..340.01 rows=10000 width=16) (actual time=1.970..32.452 rows=10000 loops=1)
   Hash Cond: (a.join_id = b.join_id)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.033..10.326 rows=10000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=1 width=8) (actual time=1.924..1.924 rows=0 loops=1)
         ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=8) (actual time=1.922..1.922 rows=0 loops=1)
               Filter: (join_id IS NOT NULL)
 Total runtime: 41.495 ms
(7 rows)

Note that the two ON clauses are logically identical, but the second query
is much faster than the first. The first query also slows down dramatically
as the row count grows: 200,000 rows yielded a runtime of over ten hours.
Thanks!
"Maciej Babinski" <maciej+postgres@apathy.killer-robot.net> writes:
> Hash join of columns with many null fields is very slow unless the null
> fields are commented out.

I see no bug here.  AFAICT your "much faster" query gets that way by
having eliminated all the candidate join rows on the B side.

			regards, tom lane
Maciej Babinski <maciej@apathy.killer-robot.net> writes:
> Tom Lane wrote:
>> I see no bug here.  AFAICT your "much faster" query gets that way by
>> having eliminated all the candidate join rows on the B side.

> The additional clause eliminates no rows beyond what the existing
> clause would. Any row eliminated by "b.join_id IS NOT NULL" could not
> possibly have satisfied "a.join_id = b.join_id".

Hmm.  You assume that the = operator is strict, which is probably true,
but the hash join code isn't assuming that.

It might be worth checking for the case, though.  What's happening,
since we go ahead and put the null rows into the hash table, is that
they all end up in the same hash chain, because they all get hash code 0.
And then that very long chain gets searched for each null outer row.
If we knew the join operator is strict, we could discard null rows
immediately on both sides.

> Please note that if the join columns are not null, but still produce
> no matches for the join, the results are fast without the need for an
> extra clause in the join:

Yeah, because those rows get reasonably well distributed into different
hash buckets.  The optimizer will avoid a hash if it sees the data is
not well distributed, but IIRC it's not considering nulls when it
makes that decision.

			regards, tom lane
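The hash-chain behavior described above can be sketched with a toy model in
Python. This is an illustration of the bucketing argument, not PostgreSQL's
actual executor code; `hash_join_probes` and its parameters are invented for
the sketch:

```python
# Toy model: every NULL key gets hash code 0, so all NULL inner rows land
# in one hash chain, and every NULL outer row walks that whole chain.
# If the join operator is known to be strict, NULL rows can be discarded
# up front on both sides, since a strict '=' can never return true for NULL.

def hash_key(key):
    # NULLs (None) all get hash code 0, as in the behavior described above.
    return 0 if key is None else hash(key)

def hash_join_probes(outer, inner, nbuckets=1024, strict=False):
    """Return the number of key comparisons this toy hash join would make."""
    buckets = {}
    for key in inner:
        if strict and key is None:
            continue  # strict operator: NULL can never match
        buckets.setdefault(hash_key(key) % nbuckets, []).append(key)
    comparisons = 0
    for key in outer:
        if strict and key is None:
            continue
        comparisons += len(buckets.get(hash_key(key) % nbuckets, []))
    return comparisons

# 10,000 all-NULL rows on each side, as in the bug report:
nulls = [None] * 10000
print(hash_join_probes(nulls, nulls))               # 100000000 comparisons
print(hash_join_probes(nulls, nulls, strict=True))  # 0 comparisons
```

The non-strict case does n_null * n_null comparisons in a single chain, which
matches the quadratic blowup seen in the reported runtimes.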
Tom Lane wrote:
> "Maciej Babinski" <maciej+postgres@apathy.killer-robot.net> writes:
>> Hash join of columns with many null fields is very slow unless the null
>> fields are commented out.
>
> I see no bug here.  AFAICT your "much faster" query gets that way by
> having eliminated all the candidate join rows on the B side.
>
> 			regards, tom lane

The additional clause eliminates no rows beyond what the existing clause
would. Any row eliminated by "b.join_id IS NOT NULL" could not possibly
have satisfied "a.join_id = b.join_id".

Please note that if the join columns are not null, but still produce no
matches for the join, the results are fast without the need for an extra
clause in the join:

DROP TABLE a;
DROP TABLE b;
CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);
INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);
ANALYZE a;
ANALYZE b;
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id;
    /* 14 seconds */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id
    AND b.join_id IS NOT NULL;
    /* 5ms */

UPDATE a SET join_id=1;
UPDATE b SET join_id=2;

EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id;
    /* 72ms */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id
    AND b.join_id != 2;
    /* 48ms */

It seems to me that such a wild disparity in performance due to the
addition of a clause implied by the existing clause should be considered
a bug, but if I need to submit a feature request for the optimizer
instead, I'd be happy to. Thanks!

Maciej Babinski
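The claim above, that "b.join_id IS NOT NULL" is implied by
"a.join_id = b.join_id" because '=' is strict, can be checked with Python's
built-in sqlite3 module, whose NULL comparison semantics match PostgreSQL's
for this case. A small sketch on scaled-down tables, not part of the
original report:

```python
# Strict '=' semantics: NULL = anything yields NULL (unknown), never true,
# so a row with a NULL join_id can never satisfy the equality condition,
# and adding "IS NOT NULL" to the ON clause cannot change the result.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
print(cur.execute("SELECT NULL = NULL").fetchone())  # (None,) -- unknown, not true

cur.execute("CREATE TABLE a (id integer, join_id integer)")
cur.execute("CREATE TABLE b (id integer, join_id integer)")
# As in the repro: ids filled in, join_id left NULL everywhere.
cur.executemany("INSERT INTO a VALUES (?, NULL)", [(i,) for i in range(1, 11)])
cur.executemany("INSERT INTO b VALUES (?, NULL)", [(i,) for i in range(1, 11)])

plain = cur.execute(
    "SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id").fetchall()
filtered = cur.execute(
    "SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id "
    "AND b.join_id IS NOT NULL").fetchall()
print(plain == filtered)  # True: the extra clause eliminates nothing
```

Both queries return the same ten left-join rows; only the plan (and, in
PostgreSQL 8.2, the runtime) differs.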
Tom Lane wrote:
> Maciej Babinski <maciej@apathy.killer-robot.net> writes:
>> Tom Lane wrote:
>>> I see no bug here.  AFAICT your "much faster" query gets that way by
>>> having eliminated all the candidate join rows on the B side.
>
>> The additional clause eliminates no rows beyond what the existing
>> clause would. Any row eliminated by "b.join_id IS NOT NULL" could not
>> possibly have satisfied "a.join_id = b.join_id".
>
> Hmm.  You assume that the = operator is strict, which is probably true,
> but the hash join code isn't assuming that.

Yes, both queries return identical results.

> It might be worth checking for the case, though.  What's happening,
> since we go ahead and put the null rows into the hash table, is that
> they all end up in the same hash chain because they all get hash code 0.
> And then that very long chain gets searched for each null outer row.
> If we knew the join operator is strict we could discard null rows
> immediately on both sides.
>
>> Please note that if the join columns are not null, but still produce
>> no matches for the join, the results are fast without the need for an
>> extra clause in the join:
>
> Yeah, because the rows get reasonably well distributed into different
> hash buckets.  The optimizer will avoid a hash if it sees the data is
> not well-distributed, but IIRC it's not considering nulls when it
> makes that decision.

You're right. Filling one join column with '1' and the other with '2'
yields a plan without a hash join.
Something that may be worth noting: while my initial problem case (200,000
rows in more complicated tables) and the sample test case I put together
(10,000 rows in simple tables) both yield this result, changing my test
case to use 20,000 rows produces a plan that avoids the hash join as well:

maciej_test=# EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3435.54..3635.55 rows=1 width=16) (actual time=100.822..100.822 rows=0 loops=1)
   Merge Cond: ("outer".join_id = "inner".join_id)
   ->  Sort  (cost=1717.77..1767.77 rows=20000 width=8) (actual time=65.401..86.333 rows=20000 loops=1)
         Sort Key: a.join_id
         ->  Seq Scan on a  (cost=0.00..289.00 rows=20000 width=8) (actual time=0.005..21.119 rows=20000 loops=1)
   ->  Sort  (cost=1717.77..1767.77 rows=20000 width=8) (never executed)
         Sort Key: b.join_id
         ->  Seq Scan on b  (cost=0.00..289.00 rows=20000 width=8) (never executed)
 Total runtime: 101.891 ms
(9 rows)

Time: 103.051 ms

Thanks for your time,

Maciej Babinski