Re: Why hash join instead of nested loop? - Mailing list pgsql-performance

From Tom Lane
Subject Re: Why hash join instead of nested loop?
Date
Msg-id 17253.1123607579@sss.pgh.pa.us
Whole thread Raw
In response to Re: Why hash join instead of nested loop?  (Rhett Garber <rhettg@gmail.com>)
Responses Re: Why hash join instead of nested loop?
List pgsql-performance
Rhett Garber <rhettg@gmail.com> writes:
> Duplicated your setup in a separate DB.
> At least its reproducable for me.....

Hmm.  Well, we now have several data points but they seem to be on
wildly varying hardware.  To try to normalize the results a little,
I computed the total actual time for the hash plan divided by the sum
of the actual times for the two scan nodes.  Thus, for your example:

> Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159
> rows=1 loops=1)
>    Hash Cond: ("outer".id = "inner".obj2)
>    ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
> (actual time=0.007..0.015 rows=5 loops=1)
>    ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual
> time=0.055..0.055 rows=0 loops=1)
>          ->  Index Scan using connection_regid_obj1_index on
> connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032
> rows=1 loops=1)
>                Index Cond: ((connection_registry_id = 40105) AND (obj1
> = 73582)) Total runtime: 7.693 ms
> (7 rows)

this would be 7.159 / (0.015 + 0.032).  This is probably not an
enormously robust statistic but it at least focuses attention in the
right place.  Here's what I get (rounded off to 4 digits which is surely
as much precision as we have in the numbers):

 Tom 7.4.8+      1.619
 Ian 7.4.8      6.000
 Ian 7.4.2     13.95
 Steinar 7.4.7      8.833
 Rhett orig    108.3
 Rhett test    152.3
 Michael 7.4.1      2.015

My number seems to be a bit of an outlier to the low side, but yours are
way the heck to the high side.  And Michael's test seems to rule out the
idea that it's something broken in 7.4.1 in particular.

I'm now thinking you've got either a platform- or compiler-specific
problem.  Exactly what is the hardware (the CPU not the disks)?  How did
you build or come by the Postgres executables (compiler, configure
options, etc)?

            regards, tom lane

pgsql-performance by date:

Previous
From: Rhett Garber
Date:
Subject: Re: Why hash join instead of nested loop?
Next
From: Dan Harris
Date:
Subject: Table locking problems?