Re: Why hash join instead of nested loop? - Mailing list pgsql-performance
From | Rhett Garber |
---|---|
Subject | Re: Why hash join instead of nested loop? |
Date | |
Msg-id | 41b0fe890508090951546156f7@mail.gmail.com Whole thread Raw |
In response to | Re: Why hash join instead of nested loop? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Why hash join instead of nested loop?
|
List | pgsql-performance |
Duplicated your setup in a separate DB. At least its reproducable for me..... I tested this on a Xeon 2 Ghz, 1 Gig Ram. Its running on some shared storage array that I'm not sure the details of. My production example also shows up on our production machine that is almost the same hardware but has dual zeon and 6 gigs of ram. Rhett Hash Join (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159 rows=1 loops=1) Hash Cond: ("outer".id = "inner".obj2) -> Seq Scan on rtmessagestate (cost=0.00..1.05 rows=5 width=14) (actual time=0.007..0.015 rows=5 loops=1) -> Hash (cost=4.83..4.83 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1) -> Index Scan using connection_regid_obj1_index on connection (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032 rows=1 loops=1) Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582)) Total runtime: 7.693 ms (7 rows) On 8/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Rhett Garber <rhettg@gmail.com> writes: > > This is postgres 7.4.1 > > All the rows involved are integers. > > Hmph. There is something really strange going on here. I tried to > duplicate your problem in 7.4.*, thus: > > regression=# create table rtmessagestate(id int, f1 char(6)); > CREATE TABLE > regression=# insert into rtmessagestate values(1,'z'); > INSERT 559399 1 > regression=# insert into rtmessagestate values(2,'z'); > INSERT 559400 1 > regression=# insert into rtmessagestate values(3,'z'); > INSERT 559401 1 > regression=# insert into rtmessagestate values(4,'z'); > INSERT 559402 1 > regression=# insert into rtmessagestate values(5,'z'); > INSERT 559403 1 > regression=# vacuum analyze rtmessagestate; > VACUUM > regression=# create table connection(connection_registry_id int, obj1 int, obj2 int); > CREATE TABLE > regression=# create index connection_regid_obj1_index on connection(connection_registry_id,obj1); > CREATE INDEX > regression=# insert into connection values(40105,73582,3); > INSERT 559407 1 > regression=# explain analyze select rtmessagestate.* from rtmessagestate,connection where (connection_registry_id = 40105)AND (obj1 = 73582) and id = obj2; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=4.83..5.91 rows=1 width=14) (actual time=0.498..0.544 rows=1 loops=1) > Hash Cond: ("outer".id = "inner".obj2) > -> Seq Scan on rtmessagestate (cost=0.00..1.05 rows=5 width=14) (actual time=0.030..0.072 rows=5 loops=1) > -> Hash (cost=4.83..4.83 rows=1 width=4) (actual time=0.305..0.305 rows=0 loops=1) > -> Index Scan using connection_regid_obj1_index on connection (cost=0.00..4.83 rows=1 width=4) (actual time=0.236..0.264rows=1 loops=1) > Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582)) > Total runtime: 1.119 ms > (7 rows) > > This duplicates your example as to plan and row counts: > > > Hash Join (cost=5.96..7.04 rows=1 width=14) (actual > > time=10.591..10.609 rows=1 loops=1) > > Hash Cond: ("outer".id = "inner".obj2) > > -> Seq Scan on rtmessagestate (cost=0.00..1.05 rows=5 width=14) > > (actual time=0.011..0.022 rows=5 loops=1) > > -> Hash (cost=5.96..5.96 rows=1 width=4) (actual > > time=0.109..0.109 rows=0 loops=1) > > -> Index Scan using connection_regid_obj1_index on > > connection (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076 > > rows=1 loops=1) > > Index Cond: ((connection_registry_id = 40105) AND (obj1 > > = 73582)) Total runtime: 11.536 ms > > (7 rows) > > My machine is considerably slower than yours, to judge by the actual > elapsed times in the scan nodes ... so why is it beating the pants > off yours in the join step? > > Can you try the above script verbatim in a scratch database and see > what you get? (Note it's worth trying the explain two or three > times to be sure the values have settled out.) > > I'm testing a fairly recent 7.4-branch build (7.4.8 plus), so that's one > possible reason for the discrepancy between my results and yours, but I > do not see anything in the 7.4 CVS logs that looks like it's related to > hashjoin performance. > > I'd be interested to see results from other people using 7.4.* too. > > regards, tom lane >
pgsql-performance by date: