Re: Why hash join instead of nested loop? - Mailing list pgsql-performance

From Rhett Garber
Subject Re: Why hash join instead of nested loop?
Date
Msg-id 41b0fe890508090951546156f7@mail.gmail.com
Whole thread Raw
In response to Re: Why hash join instead of nested loop?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Why hash join instead of nested loop?
List pgsql-performance
Duplicated your setup in a separate DB.

At least its reproducable for me.....

I tested this on a Xeon 2 Ghz, 1 Gig Ram. Its running on some shared
storage array that I'm not sure the details of.

My production example also shows up on our production machine that is
almost the same hardware but has dual zeon and 6 gigs of ram.

Rhett

Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159
rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
(actual time=0.007..0.015 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual
time=0.055..0.055 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on
connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032
rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1
= 73582)) Total runtime: 7.693 ms
(7 rows)


On 8/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rhett Garber <rhettg@gmail.com> writes:
> > This is postgres 7.4.1
> > All the rows involved are integers.
>
> Hmph.  There is something really strange going on here.  I tried to
> duplicate your problem in 7.4.*, thus:
>
> regression=# create table rtmessagestate(id int, f1 char(6));
> CREATE TABLE
> regression=# insert into rtmessagestate values(1,'z');
> INSERT 559399 1
> regression=# insert into rtmessagestate values(2,'z');
> INSERT 559400 1
> regression=# insert into rtmessagestate values(3,'z');
> INSERT 559401 1
> regression=# insert into rtmessagestate values(4,'z');
> INSERT 559402 1
> regression=# insert into rtmessagestate values(5,'z');
> INSERT 559403 1
> regression=# vacuum analyze rtmessagestate;
> VACUUM
> regression=# create table connection(connection_registry_id int, obj1 int, obj2 int);
> CREATE TABLE
> regression=# create index connection_regid_obj1_index on connection(connection_registry_id,obj1);
> CREATE INDEX
> regression=# insert into  connection values(40105,73582,3);
> INSERT 559407 1
> regression=# explain analyze select rtmessagestate.* from rtmessagestate,connection where (connection_registry_id =
40105)AND (obj1  = 73582) and id = obj2; 
>                                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.498..0.544 rows=1 loops=1)
>    Hash Cond: ("outer".id = "inner".obj2)
>    ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.030..0.072 rows=5 loops=1)
>    ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.305..0.305 rows=0 loops=1)
>          ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual
time=0.236..0.264rows=1 loops=1) 
>                Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
>  Total runtime: 1.119 ms
> (7 rows)
>
> This duplicates your example as to plan and row counts:
>
> > Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual
> > time=10.591..10.609 rows=1 loops=1)
> > Hash Cond: ("outer".id = "inner".obj2)
> > ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
> > (actual time=0.011..0.022 rows=5 loops=1)
> > ->  Hash  (cost=5.96..5.96 rows=1 width=4) (actual
> > time=0.109..0.109 rows=0 loops=1)
> > ->  Index Scan using connection_regid_obj1_index on
> > connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076
> > rows=1 loops=1)
> > Index Cond: ((connection_registry_id = 40105) AND (obj1
> > = 73582)) Total runtime: 11.536 ms
> > (7 rows)
>
> My machine is considerably slower than yours, to judge by the actual
> elapsed times in the scan nodes ... so why is it beating the pants
> off yours in the join step?
>
> Can you try the above script verbatim in a scratch database and see
> what you get?  (Note it's worth trying the explain two or three
> times to be sure the values have settled out.)
>
> I'm testing a fairly recent 7.4-branch build (7.4.8 plus), so that's one
> possible reason for the discrepancy between my results and yours, but I
> do not see anything in the 7.4 CVS logs that looks like it's related to
> hashjoin performance.
>
> I'd be interested to see results from other people using 7.4.* too.
>
>                         regards, tom lane
>

pgsql-performance by date:

Previous
From: Ian Westmacott
Date:
Subject: Re: Why hash join instead of nested loop?
Next
From: Tom Lane
Date:
Subject: Re: Why hash join instead of nested loop?