Re: Why hash join instead of nested loop?

From: Tom Lane
Subject: Re: Why hash join instead of nested loop?
Date: ,
Msg-id: 17253.1123607579@sss.pgh.pa.us
(view: Whole thread, Raw)
In response to: Re: Why hash join instead of nested loop?  (Rhett Garber)
Responses: Re: Why hash join instead of nested loop?  (Rhett Garber)
List: pgsql-performance

Tree view

Why hash join instead of nested loop?  (Rhett Garber, )
 Re: Why hash join instead of nested loop?  (Havasvölgyi Ottó, )
  Re: Why hash join instead of nested loop?  (Rhett Garber, )
   Re: Why hash join instead of nested loop?  (Tom Lane, )
    Re: Why hash join instead of nested loop?  (Rhett Garber, )
     Re: Why hash join instead of nested loop?  (Tom Lane, )
      Re: Why hash join instead of nested loop?  ("Steinar H. Gunderson", )
      Re: Why hash join instead of nested loop?  (Michael Fuhr, )
      Re: Why hash join instead of nested loop?  (Ian Westmacott, )
       Re: Why hash join instead of nested loop?  (Tom Lane, )
        Re: Why hash join instead of nested loop?  (Ian Westmacott, )
      Re: Why hash join instead of nested loop?  (Rhett Garber, )
       Re: Why hash join instead of nested loop?  (Tom Lane, )
        Re: Why hash join instead of nested loop?  (Rhett Garber, )
         Re: Why hash join instead of nested loop?  (Tom Lane, )
          Re: Why hash join instead of nested loop?  (Rhett Garber, )
           Re: Why hash join instead of nested loop?  (Tom Lane, )
            Re: Why hash join instead of nested loop?  (Rhett Garber, )

Rhett Garber <> writes:
> Duplicated your setup in a separate DB.
> At least its reproducable for me.....

Hmm.  Well, we now have several data points but they seem to be on
wildly varying hardware.  To try to normalize the results a little,
I computed the total actual time for the hash plan divided by the sum
of the actual times for the two scan nodes.  Thus, for your example:

> Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159
> rows=1 loops=1)
>    Hash Cond: ("outer".id = "inner".obj2)
>    ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
> (actual time=0.007..0.015 rows=5 loops=1)
>    ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual
> time=0.055..0.055 rows=0 loops=1)
>          ->  Index Scan using connection_regid_obj1_index on
> connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032
> rows=1 loops=1)
>                Index Cond: ((connection_registry_id = 40105) AND (obj1
> = 73582)) Total runtime: 7.693 ms
> (7 rows)

this would be 7.159 / (0.015 + 0.032).  This is probably not an
enormously robust statistic but it at least focuses attention in the
right place.  Here's what I get (rounded off to 4 digits which is surely
as much precision as we have in the numbers):

 Tom 7.4.8+      1.619
 Ian 7.4.8      6.000
 Ian 7.4.2     13.95
 Steinar 7.4.7      8.833
 Rhett orig    108.3
 Rhett test    152.3
 Michael 7.4.1      2.015

My number seems to be a bit of an outlier to the low side, but yours are
way the heck to the high side.  And Michael's test seems to rule out the
idea that it's something broken in 7.4.1 in particular.

I'm now thinking you've got either a platform- or compiler-specific
problem.  Exactly what is the hardware (the CPU not the disks)?  How did
you build or come by the Postgres executables (compiler, configure
options, etc)?

            regards, tom lane


pgsql-performance by date:

From: John A Meinel
Date:
Subject: Re: Table locking problems?
From: Tobias Brox
Date:
Subject: partial index regarded more expensive