Thread: Horribly slow hash join

Horribly slow hash join

From
"Jim C. Nasby"
Date:
Note the time for the hash join step:



------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=357.62..26677.99 rows=93668 width=62) (actual time=741.159..443381.011 rows=49091 loops=1)
   Hash Cond: ("outer".work_today = "inner".work_units)
   ->  Hash Join  (cost=337.11..24784.11 rows=93668 width=54) (actual time=731.374..417188.519 rows=49091 loops=1)
         Hash Cond: ("outer".work_total = "inner".work_units)
         ->  Seq Scan on email_rank  (cost=0.00..22240.04 rows=254056 width=46) (actual time=582.145..1627.759 rows=49091 loops=1)
               Filter: (project_id = 8)
         ->  Hash  (cost=292.49..292.49 rows=17849 width=16) (actual time=148.944..148.944 rows=0 loops=1)
               ->  Seq Scan on rank_tie_overall o  (cost=0.00..292.49 rows=17849 width=16) (actual time=0.059..75.984 rows=17849 loops=1)
   ->  Hash  (cost=17.81..17.81 rows=1081 width=16) (actual time=8.996..8.996 rows=0 loops=1)
         ->  Seq Scan on rank_tie_today d  (cost=0.00..17.81 rows=1081 width=16) (actual time=0.080..4.635 rows=1081 loops=1)
 Total runtime: 619047.032 ms

By comparison:
stats=# set enable_hashjoin=false;
SET
stats=# explain analyze select * from email_rank, rank_tie_overall o, rank_tie_today d WHERE email_rank.work_today = d.work_units AND email_rank.work_total = o.work_units AND email_rank.project_id = :ProjectID;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=55391.69..56823.23 rows=93668 width=80) (actual time=2705.344..3349.318 rows=49091 loops=1)
   Merge Cond: ("outer".work_units = "inner".work_today)
   ->  Index Scan using work_units_today on rank_tie_today d  (cost=0.00..23.89 rows=1081 width=16) (actual time=0.150..4.874 rows=1081 loops=1)
   ->  Sort  (cost=55391.69..55625.86 rows=93668 width=64) (actual time=2705.153..2888.039 rows=49091 loops=1)
         Sort Key: email_rank.work_today
         ->  Merge Join  (cost=45047.64..47656.93 rows=93668 width=64) (actual time=1685.414..2494.342 rows=49091 loops=1)
               Merge Cond: ("outer".work_units = "inner".work_total)
               ->  Index Scan using work_units_overall on rank_tie_overall o  (cost=0.00..361.34 rows=17849 width=16) (actual time=0.122..79.383 rows=17849 loops=1)
               ->  Sort  (cost=45047.64..45682.78 rows=254056 width=48) (actual time=1685.228..1866.215 rows=49091 loops=1)
                     Sort Key: email_rank.work_total
                     ->  Seq Scan on email_rank  (cost=0.00..22240.04 rows=254056 width=48) (actual time=786.515..1289.101 rows=49091 loops=1)
                           Filter: (project_id = 8)
 Total runtime: 3548.087 ms

Even though the second case is only a select, it seems clear that
something's wrong...
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Horribly slow hash join

From
Tom Lane
Date:
"Jim C. Nasby" <jim@nasby.net> writes:
> Note the time for the hash join step:

Have you ANALYZEd these tables lately?

It looks to me like it's hashing on some column that has only a small
number of distinct values, so that the hash doesn't actually help to
avoid comparisons.  The planner should know better than to choose such
a plan, but if it's working with obsolete stats ...

            regards, tom lane

Re: Horribly slow hash join

From
"Jim C. Nasby"
Date:
Yes, stats are up to date, and the values should be fairly unique.

Combined with the hash aggregate problem I saw (see my other email to
the list), do you think there could be some issue with the performance
of the hash function on FreeBSD 5.2 on AMD64?

I'll post the table you requested someplace you can grab it.

On Fri, Apr 16, 2004 at 12:34:11PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > Note the time for the hash join step:
>
> Have you ANALYZEd these tables lately?
>
> It looks to me like it's hashing on some column that has only a small
> number of distinct values, so that the hash doesn't actually help to
> avoid comparisons.  The planner should know better than to choose such
> a plan, but if it's working with obsolete stats ...
>
>             regards, tom lane
>

--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Horribly slow hash join

From
Tom Lane
Date:
"Jim C. Nasby" <jim@nasby.net> writes:
> Combined with the hash aggregate problem I saw (see my other email to
> the list), do you think there could be some issue with the performance
> of the hash function on FreeBSD 5.2 on AMD64?

Yeah, I was wondering about that too.  Hard to imagine what though.
The hash function should be pretty platform-independent.

            regards, tom lane

Re: Horribly slow hash join

From
Tom Lane
Date:
[ resending because I fat-fingered the cc: to the list ]

I see the problem: all the entries in your work_units column have the
low 32 bits equal to zero.

regression=# select distinct work_units % (2^32)::bigint from Trank_work_overall;
 ?column?
----------
        0
(1 row)

The hash function for int8 only takes the low word into account, so all
of the entries end up on the same hash chain, resulting in worst-case
behavior.  This applies to both your hash join and hash aggregate cases.
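
To see the effect in isolation, here is a tiny standalone C sketch (an
illustration of a hash that ignores the high word, not the actual backend
code; the function name is made up):

#include <stdio.h>
#include <stdint.h>

/* Illustration only: like the old int8 hash, look at the low 32 bits
 * and discard the high word entirely. */
static uint32_t hash_low_word_only(int64_t val)
{
    return (uint32_t) val;
}

int main(void)
{
    int     nbuckets = 1024;
    int64_t i;

    /* work_units-style values: multiples of 2^32, so the low word is 0 */
    for (i = 1; i <= 5; i++)
    {
        int64_t v = i << 32;
        printf("value %lld -> bucket %u\n",
               (long long) v,
               (unsigned) (hash_low_word_only(v) % nbuckets));
    }
    return 0;
}

Every value lands in bucket 0, so the whole inner relation ends up on one
hash chain and each probe has to compare against all of it, which is the
worst-case behavior visible in the plan above.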

We could change the hash function, perhaps, but then we'd just have
different cases where there's a problem ... hashing will always fail on
*some* set of inputs.  (Also, I have been harboring some notions of
supporting cross-type hash joins for integer types, which will not work
unless small int8 values hash the same as int4 etc.)

I guess the real issue is why are you encoding work_units like that?

            regards, tom lane

Re: Horribly slow hash join

From
Marcos Martínez(R)
Date:
I didn't follow the conversation from the beginning, but I imagine that you
could improve performance by using the value (work_units % (2^32)) instead
of work_units.  You could even make an index on this value.  That way, the
hash function will work well.  This is not a good solution, but ...

For example:

create index ind1 on table1 ((work_units % (2^32)::bigint));

create index ind2 on table2 ((work_units % (2^32)::bigint));

select * from table1 join table2
  on (table1.work_units % (2^32)::bigint) = (table2.work_units % (2^32)::bigint);


----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Jim C. Nasby" <jim@nasby.net>
Cc: <pgsql-performance@postgreSQL.org>
Sent: Saturday, April 17, 2004 6:08 PM
Subject: Re: [PERFORM] Horribly slow hash join


> [ resending because I fat-fingered the cc: to the list ]
>
> I see the problem: all the entries in your work_units column have the
> low 32 bits equal to zero.
>
> regression=# select distinct work_units % (2^32)::bigint from Trank_work_overall;
>  ?column?
> ----------
>         0
> (1 row)
>
> The hash function for int8 only takes the low word into account, so all
> of the entries end up on the same hash chain, resulting in worst-case
> behavior.  This applies to both your hash join and hash aggregate cases.
>
> We could change the hash function, perhaps, but then we'd just have
> different cases where there's a problem ... hashing will always fail on
> *some* set of inputs.  (Also, I have been harboring some notions of
> supporting cross-type hash joins for integer types, which will not work
> unless small int8 values hash the same as int4 etc.)
>
> I guess the real issue is why are you encoding work_units like that?
>
> regards, tom lane



Re: Horribly slow hash join

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> We could change the hash function, perhaps, but then we'd just have
> different cases where there's a problem ... hashing will always fail on
> *some* set of inputs.

Sure, but completely ignoring part of the input seems like an unfortunate
choice of hash function.

> (Also, I have been harboring some notions of supporting cross-type hash
> joins for integer types, which will not work unless small int8 values hash
> the same as int4 etc.)

The obvious way to modify the hash function is to xor the high 32 bits with
the low 32 bits. That maintains the property you need and at least ensures
that all the bits are taken into account.

--
greg

Re: Horribly slow hash join

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> (Also, I have been harboring some notions of supporting cross-type hash
>> joins for integer types, which will not work unless small int8 values hash
>> the same as int4 etc.)

> The obvious way to modify the hash function is to xor the high 32 bits with
> the low 32 bits. That maintains the property you need

No it doesn't ...

            regards, tom lane

Re: Horribly slow hash join

From
Dennis Bjorklund
Date:
On Sat, 17 Apr 2004, Tom Lane wrote:

> *some* set of inputs.  (Also, I have been harboring some notions of
> supporting cross-type hash joins for integer types, which will not work
> unless small int8 values hash the same as int4 etc.)

The simple solution would be to always extend integers to 64 bits (or
whatever the biggest integer is) before calculating the hash.  It makes the
hash function a little slower for smaller types, but it's purely a CPU
operation with no memory access involved, so it's probably not noticeable.
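
A minimal sketch of the idea, assuming a 64-bit integer type is available
(the mixing step is a toy stand-in, not PostgreSQL's; the point is only the
promotion):

#include <stdio.h>
#include <stdint.h>

/* Promote every integer to 64 bits, then hash, so that 7 as int2, int4
 * or int8 all produce the same hash value. */
static uint32_t hash_wide(int64_t v)
{
    uint64_t u = (uint64_t) v;   /* sign extension has already happened */

    u ^= u >> 33;                /* toy mixer, illustration only */
    u *= UINT64_C(0xff51afd7ed558ccd);
    u ^= u >> 33;
    return (uint32_t) u;
}

int main(void)
{
    int16_t a = 7;
    int32_t b = 7;
    int64_t c = 7;

    /* all three widen to the same 64-bit value, so the hashes agree */
    printf("%u %u %u\n",
           (unsigned) hash_wide(a),
           (unsigned) hash_wide(b),
           (unsigned) hash_wide(c));
    return 0;
}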

--
/Dennis Björklund


Re: Horribly slow hash join

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> (Also, I have been harboring some notions of supporting cross-type hash
> >> joins for integer types, which will not work unless small int8 values hash
> >> the same as int4 etc.)
>
> > The obvious way to modify the hash function is to xor the high 32 bits with
> > the low 32 bits. That maintains the property you need
>
> No it doesn't ...

Eh? Oh, negative numbers? So low^high^sign.
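
In C the folding would look something like this (an illustration of the
idea, not a patch): xor the two halves, flipping the high half for negative
values so that a small negative int8, whose high word is all ones from sign
extension, folds to the same 32 bits as the equivalent int4.  The folded
word would then go through the ordinary 32-bit hash.

#include <stdio.h>
#include <stdint.h>

static uint32_t fold_int64(int64_t val)
{
    uint32_t lo = (uint32_t) val;
    uint32_t hi = (uint32_t) (val >> 32);

    /* xor with ~hi for negatives is the same as xoring in 32 copies of
     * the sign bit: it leaves a small negative value's low word alone */
    return (val >= 0) ? (lo ^ hi) : (lo ^ ~hi);
}

int main(void)
{
    /* -42 as int8 folds to the same 32-bit value as -42 as int4 */
    printf("%u %u\n",
           (unsigned) fold_int64(-42),
           (unsigned) (uint32_t) (int32_t) -42);

    /* and the all-low-bits-zero values from the original report no
     * longer collapse to a single value */
    printf("%u %u\n",
           (unsigned) fold_int64(INT64_C(1) << 32),
           (unsigned) fold_int64(INT64_C(2) << 32));
    return 0;
}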


I wonder if it makes sense to have it check the hash distribution after
generating the table, and if it's bad, throw it away and try again with a
different hash function.  The "different hash function" would probably just
be a change of seed value.  Probably way overkill though.

--
greg

Re: Horribly slow hash join

From
Tom Lane
Date:
Dennis Bjorklund <db@zigo.dhs.org> writes:
> On Sat, 17 Apr 2004, Tom Lane wrote:
>> *some* set of inputs.  (Also, I have been harboring some notions of
>> supporting cross-type hash joins for integer types, which will not work
>> unless small int8 values hash the same as int4 etc.)

> The simple solution would be to always extend integers to 64 bits (or
> whatever the biggest integer is) before calculating the hash.

That creates portability issues though.  We do not depend on there being
a 64-bit-int type for anything except int8 itself, and I don't want to
start doing so.

            regards, tom lane

Re: Horribly slow hash join

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Eh? Oh, negative numbers? So low^high^sign.

[ thinks about it... ]  Yeah, that would work.  We can't backpatch it
without breaking existing hash indexes on int8, but it'd be reasonable
to change for 7.5 (since at the rate things are going, we won't have
pg_upgrade for 7.5 anyway...)

> I wonder if it makes sense to have check the hash distribution after
> generating the table and if it's bad then throw it away and try again with a
> different hash function. The "different hash function" would probably just be
> a seed value changing. Probably way overkill though.

Yeah, it'd be a pain trying to get all the type-specific hash functions
doing that.  I'm also unconvinced that a simple change of seed value
would necessarily make the distribution better.  In the worst case, if
the real problem is that all the input values are identical, you can
reseed all day long and it won't fix it.

            regards, tom lane

Re: Horribly slow hash join

From
Dennis Bjorklund
Date:
On Sun, 18 Apr 2004, Tom Lane wrote:

> That creates portability issues though.  We do not depend on there being
> a 64-bit-int type for anything except int8 itself, and I don't want to
> start doing so.

What do you mean?  int8 is supported on all platforms, and if the hash
function converted all numbers to int8 before hashing, it would work.

I don't see any portability problems.

--
/Dennis Björklund


Re: Horribly slow hash join

From
Tom Lane
Date:
Dennis Bjorklund <db@zigo.dhs.org> writes:
> What do you mean?  int8 is supported on all platforms

No it isn't.

            regards, tom lane

Re: Horribly slow hash join

From
Dennis Bjorklund
Date:
On Sun, 18 Apr 2004, Tom Lane wrote:

> > What do you mean?  int8 is supported on all platforms
>
> No it isn't.

So on platforms where it isn't you would use int4 as the biggest int then.
I don't really see that as a problem. As long as you calculate the hash on
the biggest int on that platform it should work.

--
/Dennis Björklund


Re: Horribly slow hash join

From
Bruno Wolff III
Date:
On Sun, Apr 18, 2004 at 18:27:09 +0200,
  Dennis Bjorklund <db@zigo.dhs.org> wrote:
> On Sun, 18 Apr 2004, Tom Lane wrote:
>
> > > What do you mean?  int8 is supported on all platforms
> >
> > No it isn't.
>
> So on platforms where it isn't you would use int4 as the biggest int then.
> I don't really see that as a problem. As long as you calculate the hash on
> the biggest int on that platform it should work.

Another option would be to put the numbers into two int4s. For int4 or
smaller types one of these would be zero. int8s would be split between
the two. The hash function would then be defined on the two int4s.

Re: Horribly slow hash join

From
Dennis Bjorklund
Date:
On Sun, 18 Apr 2004, Bruno Wolff III wrote:

> Another option would be to put the numbers into two int4s. For int4 or
> smaller types one of these would be zero. int8s would be split between
> the two. The hash function would then be defined on the two int4s.

Sure, this is an internal calculation in the hash function.  The only
important thing is that the number 7 (for example) gives the same hash
value no matter if it is an int2 or an int8, and that the hash function
also works well for int8 numbers (which it does not today).

At least, those were the properties I understood we wanted.

We got sidetracked into talking about which datatypes exist on all
platforms; that's not an issue at all.

--
/Dennis Björklund


Re: Horribly slow hash join

From
Greg Stark
Date:
Dennis Bjorklund <db@zigo.dhs.org> writes:

> On Sun, 18 Apr 2004, Bruno Wolff III wrote:
>
> > Another option would be to put the numbers into two int4s. For int4 or
> > smaller types one of these would be zero. int8s would be split between
> > the two. The hash function would then be defined on the two int4s.
>
> Sure, this is an internal calculation in the hash function.  The only
> important thing is that the number 7 (for example) gives the same hash
> value no matter if it is an int2 or an int8, and that the hash function
> also works well for int8 numbers (which it does not today).

What's missing here is that the actual API for hash functions is that the
data type provides a function that hashes to a 32-bit integer.  The hash
code then uses that 32-bit integer to crunch down to the actual number of
buckets (using mod).

The choice of 32-bit integers is purely arbitrary.  As long as it's larger
than the number of buckets in any sane hash table it's fine, and 32 bits is
plenty.

I question the use of mod to crunch the hash value down, though.  In the
case of int4 the mapping to 32 bits is simply the identity, so the entire
hash function ends up being just "input mod #buckets".  It seems way too
easy to find real-world data sets where many numbers are all multiples of
some common number.  If that common divisor shares any factors with the
number of buckets, then the distribution will be very far from even, with
many empty buckets.
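
The point in miniature, with made-up numbers (1024 buckets, and keys that
are all multiples of 64, so key and bucket count share a factor of 64):

#include <stdio.h>

int main(void)
{
    int nbuckets = 1024;
    int counts[1024] = {0};
    int used = 0;
    int i;

    /* identity "hash" followed by mod, as for int4 today */
    for (i = 0; i < 100000; i++)
        counts[(i * 64) % nbuckets]++;

    for (i = 0; i < nbuckets; i++)
        if (counts[i] > 0)
            used++;

    /* prints "buckets used: 16 of 1024" */
    printf("buckets used: %d of %d\n", used, nbuckets);
    return 0;
}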

If the hash tables were made a power of two then it would be possible to mix
the bits of the 32 bit value and just mask off the unneeded bits. I've found
one page via google that mentions mixing bits in a hash function, but I would
look for a more serious treatment somewhere.

 http://burtleburtle.net/bob/hash/doobs.html

Incidentally, this text claims mod is extremely slow compared to bit
manipulations.  I don't know that that kind of cycle counting really is a
factor for postgres though.


Also, incidentally, this text is interesting:

 http://www.isthe.com/chongo/tech/comp/fnv/



--
greg

Re: Horribly slow hash join

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> If the hash tables were made a power of two then it would be possible to mix
> the bits of the 32 bit value and just mask off the unneeded bits. I've found
> one page via google that mentions mixing bits in a hash function, but I would
> look for a more serious treatment somewhere.
>  http://burtleburtle.net/bob/hash/doobs.html
> Incidentally, this text claims mod is extremely slow compared to bit
> manipulations.

Modding by a *non* power of 2 (esp. a prime) mixes the bits quite well,
and is likely faster than any multiple-instruction way to do the same.

The quoted article seems to be by someone who has spent a lot of time
counting assembly cycles and none at all reading the last thirty years
worth of CS literature.  Knuth's treatment of hashing has some actual
math to it...

            regards, tom lane

Re: Horribly slow hash join

From
Dave Cramer
Date:
Here's an interesting link that suggests that hyperthreading would be
much worse.


http://groups.google.com/groups?q=hyperthreading+dual+xeon+idle&start=10&hl=en&lr=&ie=UTF-8&c2coff=1&selm=aukkonen-FE5275.21093624062003%40shawnews.gv.shawcable.net&rnum=16

FWIW, I have anecdotal evidence that suggests this is the case: one of my
clients was seeing very high context-switch rates with HTT turned on, and
things were much better with it turned off.

Dave
On Mon, 2004-04-19 at 02:09, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > If the hash tables were made a power of two then it would be possible to mix
> > the bits of the 32 bit value and just mask off the unneeded bits. I've found
> > one page via google that mentions mixing bits in a hash function, but I would
> > look for a more serious treatment somewhere.
> >  http://burtleburtle.net/bob/hash/doobs.html
> > Incidentally, this text claims mod is extremely slow compared to bit
> > manipulations.
>
> Modding by a *non* power of 2 (esp. a prime) mixes the bits quite well,
> and is likely faster than any multiple-instruction way to do the same.
>
> The quoted article seems to be by someone who has spent a lot of time
> counting assembly cycles and none at all reading the last thirty years
> worth of CS literature.  Knuth's treatment of hashing has some actual
> math to it...
>
>             regards, tom lane
>
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: Horribly slow hash join

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > If the hash tables were made a power of two then it would be possible to mix
> > the bits of the 32 bit value and just mask off the unneeded bits. I've found
> > one page via google that mentions mixing bits in a hash function, but I would
> > look for a more serious treatment somewhere.


> Modding by a *non* power of 2 (esp. a prime) mixes the bits quite well,
> and is likely faster than any multiple-instruction way to do the same.

Well, a) any bucket count with a factor of two fails to mix in some bits,
and that covers a lot more numbers than just exact powers of two; b) the
postgres code makes no attempt to make the number of buckets a prime; and
c) even if the number of buckets were prime, it seems it would still be too
easy to find real-world data where all the values have that prime as a
factor.  As it is, they only need to share a common factor to lose.

> The quoted article seems to be by someone who has spent a lot of time
> counting assembly cycles and none at all reading the last thirty years
> worth of CS literature.

Yes, well I did note that.

--
greg

Re: Horribly slow hash join

From
Greg Stark
Date:
Dave Cramer <pg@fastcrypt.com> writes:

> Here's an interesting link that suggests that hyperthreading would be
> much worse.

Uh, this is the wrong thread.

--
greg

Re: Horribly slow hash join

From
"Jim C. Nasby"
Date:
Dammit, I somehow deleted a bunch of replies to this.

Did a TODO ever come out of this?
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Horribly slow hash join

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Modding by a *non* power of 2 (esp. a prime) mixes the bits quite well,
> and is likely faster than any multiple-instruction way to do the same.
>
> The quoted article seems to be by someone who has spent a lot of time
> counting assembly cycles and none at all reading the last thirty years
> worth of CS literature.  Knuth's treatment of hashing has some actual
> math to it...

[Incidentally, I just found that the quoted article was independently
discovered by Bruce Momjian, who found it convincing enough to convert
hash_any over to it two years ago.]

I just reviewed Knuth as well as C.L.R. and several papers from CACM and
SIGMOD.

It seems we have three options:

mod():

 Pro: empirical research shows it to be the best algorithm for avoiding
      collisions

 Con: requires the hash table to be a prime size, and far from a power of
      two.  This is inconvenient to arrange for dynamic tables as used in
      postgres.

multiplication method (floor(tablesize * remainder(x * golden ratio)))

 Pro: works with any table size

 Con: I'm not clear if it can be done without floating point arithmetic.
      It seems like it ought to be possible though (see the integer-only
      sketch after this list).

Universal hashing:

 Pro: won't create gotcha situations where the distribution of the data
      suddenly creates unexpected performance problems.

 Con: more complex.
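
On the floating-point question: for a power-of-two table size the
multiplicative method reduces to one integer multiply and a shift, since
Knuth's golden-ratio constant becomes 2654435769 (roughly 2^32/phi) and
the multiply's wraparound mod 2^32 plays the role of remainder().  A
sketch, illustration only:

#include <stdio.h>
#include <stdint.h>

/* bucket = (x * floor(2^32/phi)) >> (32 - k), for a table of 2^k buckets */
static uint32_t mult_bucket(uint32_t x, int k)
{
    return (x * UINT32_C(2654435769)) >> (32 - k);
}

int main(void)
{
    int i;
    int k = 10;                 /* 2^10 = 1024 buckets */

    /* even keys that are all multiples of 64 spread out reasonably */
    for (i = 0; i < 8; i++)
        printf("key %d -> bucket %u\n",
               i * 64, (unsigned) mult_bucket((uint32_t) (i * 64), k));
    return 0;
}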

It would be trivial to switch the implementation from mod() to the
multiplicative method, which is more suited to postgres's needs.  However,
universal hashing has some appeal.  I hate the idea that a hash join could
perform perfectly well one day and suddenly become pessimal when new data
is loaded.

In a sense universal hashing is less predictable. For a DSS system that could
be bad. A query that runs fine every day might fail one day in an
unpredictable way even though the data is unchanged.

But in another sense it would be more predictable, in that if you run the
query a thousand times the average performance would be very close to the
expected value regardless of what the data is.  Whereas more traditional
algorithms have some patterns of data that will consistently perform badly.

It's probably not worth it but postgres could maybe even be tricky and pretest
the parameters against the common values in the statistics table, generating
new ones if they fail to generate a nice distribution. That doesn't really
guarantee anything though, except that those common values would at least be
well distributed to start with.

--
greg