BUG #2930: Hash join abysmal with many null fields.

From: "Maciej Babinski"
Date:
The following bug has been logged online:

Bug reference:      2930
Logged by:          Maciej Babinski
Email address:      maciej+postgres@killer-robot.net
PostgreSQL version: 8.2
Operating system:   CentOS 4.4 (linux 2.6.9)
Description:        Hash join abysmal with many null fields.
Details:

Hash joins on columns with many null fields are very slow unless the
nulls are explicitly filtered out. Steps to reproduce:

CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);

INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);

ANALYZE a;
ANALYZE b;

EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id;
EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND
b.join_id IS NOT NULL;

Here are the results of the two explains:

test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id;
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=170.00..1590.01 rows=10000 width=16) (actual time=23.100..9451.778 rows=10000 loops=1)
   Hash Cond: (a.join_id = b.join_id)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.014..11.071 rows=10000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=8) (actual time=21.999..21.999 rows=10000 loops=1)
         ->  Seq Scan on b  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.011..10.679 rows=10000 loops=1)
 Total runtime: 9460.944 ms
(6 rows)

test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=145.00..340.01 rows=10000 width=16) (actual time=1.970..32.452 rows=10000 loops=1)
   Hash Cond: (a.join_id = b.join_id)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.033..10.326 rows=10000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=1 width=8) (actual time=1.924..1.924 rows=0 loops=1)
         ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=8) (actual time=1.922..1.922 rows=0 loops=1)
               Filter: (join_id IS NOT NULL)
 Total runtime: 41.495 ms
(7 rows)


Note that the two ON clauses are logically equivalent, yet the second query
is much faster than the first. The first query also slows down drastically
as the number of rows grows: 200,000 rows yielded a runtime of over ten
hours. Thanks!

Re: BUG #2930: Hash join abysmal with many null fields.

From: Tom Lane
Date:
"Maciej Babinski" <maciej+postgres@apathy.killer-robot.net> writes:
> Hash joins on columns with many null fields are very slow unless the
> nulls are explicitly filtered out.

I see no bug here.  AFAICT your "much faster" query gets that way by
having eliminated all the candidate join rows on the B side.

            regards, tom lane

Re: BUG #2930: Hash join abysmal with many null fields.

From: Tom Lane
Date:
Maciej Babinski <maciej@apathy.killer-robot.net> writes:
> Tom Lane wrote:
>> I see no bug here.  AFAICT your "much faster" query gets that way by
>> having eliminated all the candidate join rows on the B side.

> The additional clause eliminates no rows beyond what the existing
> clause would.  Any row eliminated by "b.join_id IS NOT NULL" could not
> possibly have satisfied "a.join_id = b.join_id".

Hmm.  You assume that the = operator is strict, which is probably true,
but the hash join code isn't assuming that.

It might be worth checking for the case, though.  What's happening,
since we go ahead and put the null rows into the hash table, is that
they all end up in the same hash chain because they all get hash code 0.
And then that very long chain gets searched for each null outer row.
If we knew the join operator is strict we could discard null rows
immediately on both sides.
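
The behavior described above can be sketched with a toy chained hash table in plain Python (an illustrative model only, not PostgreSQL's executor; the bucket count and hash function are assumptions for the sketch): every NULL key gets hash code 0, so all NULL inner rows pile into one chain, and each NULL outer row probe walks that entire chain.

```python
# Toy model of a chained hash join (illustrative only, not PostgreSQL's
# executor). NULL keys all receive hash code 0, so NULL inner rows land
# in one chain, and every NULL outer row must walk that whole chain.

NBUCKETS = 1024  # assumed bucket count for the sketch

def hash_key(key):
    # NULLs hash to 0, as described above; ordinary keys spread out.
    return 0 if key is None else hash(key) % NBUCKETS

def hash_join_probes(inner_keys, outer_keys):
    """Build a chained hash table on inner_keys, probe with outer_keys,
    and return how many chain entries were examined in total."""
    buckets = {}
    for k in inner_keys:
        buckets.setdefault(hash_key(k), []).append(k)
    examined = 0
    for k in outer_keys:
        # Every entry in the matching bucket's chain gets compared,
        # even though NULL = NULL can never evaluate to TRUE.
        examined += len(buckets.get(hash_key(k), []))
    return examined

n = 1_000
print(hash_join_probes([None] * n, [None] * n))          # 1000000: n*n chain entries
print(hash_join_probes(list(range(n)), list(range(n))))  # 1000: short chains
```

Scaled to the 10,000 NULLs per side in the report, the same model examines 100 million chain entries, which matches the shape of the slowdown shown above.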

> Please note that if the join columns are not null, but still produce
> no matches for the join, the results are fast without the need for an
> extra clause in the join:

Yeah, because the rows get reasonably well distributed into different
hash buckets.  The optimizer will avoid a hash if it sees the data is
not well-distributed, but IIRC it's not considering nulls when it
makes that decision.
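
The distribution point can be seen in the same toy terms (again an illustrative sketch with an assumed bucket count, not PostgreSQL's hashing): distinct keys spread across buckets, while NULLs all collapse into bucket 0.

```python
# Illustrative sketch (not PostgreSQL's hashing): distinct join keys
# spread across buckets, while NULL keys (hash code 0) all collapse
# into a single bucket.
from collections import Counter

NBUCKETS = 64  # assumed bucket count for the sketch

def bucket_of(key):
    return 0 if key is None else hash(key) % NBUCKETS

distinct_counts = Counter(bucket_of(k) for k in range(10_000))
null_counts = Counter(bucket_of(None) for _ in range(10_000))

print(max(distinct_counts.values()))  # 157: the longest chain stays short
print(null_counts[0])                 # 10000: every NULL shares bucket 0
```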

            regards, tom lane

Re: BUG #2930: Hash join abysmal with many null fields.

From: Maciej Babinski
Date:
Tom Lane wrote:
> "Maciej Babinski" <maciej+postgres@apathy.killer-robot.net> writes:
>> Hash joins on columns with many null fields are very slow unless the
>> nulls are explicitly filtered out.
>
> I see no bug here.  AFAICT your "much faster" query gets that way by
> having eliminated all the candidate join rows on the B side.
>
>             regards, tom lane


The additional clause eliminates no rows beyond what the existing clause
would. Any row eliminated by "b.join_id IS NOT NULL" could not possibly
have satisfied "a.join_id = b.join_id".

Please note that if the join columns are not null, but still produce no
matches for the join, the results are fast without the need for an extra
clause in the join:

DROP TABLE a;
DROP TABLE b;

CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);

INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);

ANALYZE a;
ANALYZE b;

EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id; /* 14 seconds */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id AND b.join_id IS NOT NULL; /* 5 ms */

UPDATE a SET join_id=1;
UPDATE b SET join_id=2;

EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id; /* 72 ms */
EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id AND b.join_id != 2; /* 48 ms */


It seems to me that such a wild disparity in performance due to the
addition of a clause that is implied by the existing clause should be
considered a bug, but if I need to submit a feature request for the
optimizer, then I'd be happy to. Thanks!

Maciej Babinski

Re: BUG #2930: Hash join abysmal with many null fields.

From: Maciej Babinski
Date:
Tom Lane wrote:
> Maciej Babinski <maciej@apathy.killer-robot.net> writes:
>> Tom Lane wrote:
>>> I see no bug here.  AFAICT your "much faster" query gets that way by
>>> having eliminated all the candidate join rows on the B side.
>
>> The additional clause eliminates no rows beyond what the existing
>> clause would.  Any row eliminated by "b.join_id IS NOT NULL" could not
>> possibly have satisfied "a.join_id = b.join_id".
>
> Hmm.  You assume that the = operator is strict, which is probably true,
> but the hash join code isn't assuming that.
>

Yes, both queries return identical results.

> It might be worth checking for the case, though.  What's happening,
> since we go ahead and put the null rows into the hash table, is that
> they all end up in the same hash chain because they all get hash code 0.
> And then that very long chain gets searched for each null outer row.
> If we knew the join operator is strict we could discard null rows
> immediately on both sides.
>
>> Please note that if the join columns are not null, but still produce
>> no matches for the join, the results are fast without the need for an
>> extra clause in the join:
>
> Yeah, because the rows get reasonably well distributed into different
> hash buckets.  The optimizer will avoid a hash if it sees the data is
> not well-distributed, but IIRC it's not considering nulls when it
> makes that decision.


You're right. Filling one join column with '1' and the other with '2'
yields a plan without a hash join. It may be worth noting that while both
my initial problem case (200,000 rows in more complicated tables) and the
sample test case I put together (10,000 rows in simple tables) yield this
result, changing my test case to use 20,000 rows produces a plan that
avoids hash joins as well:

maciej_test=# EXPLAIN ANALYZE SELECT * FROM a JOIN b ON a.join_id = b.join_id;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3435.54..3635.55 rows=1 width=16) (actual time=100.822..100.822 rows=0 loops=1)
   Merge Cond: ("outer".join_id = "inner".join_id)
   ->  Sort  (cost=1717.77..1767.77 rows=20000 width=8) (actual time=65.401..86.333 rows=20000 loops=1)
         Sort Key: a.join_id
         ->  Seq Scan on a  (cost=0.00..289.00 rows=20000 width=8) (actual time=0.005..21.119 rows=20000 loops=1)
   ->  Sort  (cost=1717.77..1767.77 rows=20000 width=8) (never executed)
         Sort Key: b.join_id
         ->  Seq Scan on b  (cost=0.00..289.00 rows=20000 width=8) (never executed)
 Total runtime: 101.891 ms
(9 rows)

Time: 103.051 ms

Thanks for your time,
Maciej Babinski