Re: Why hash join instead of nested loop? - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: Why hash join instead of nested loop? |
Date | |
Msg-id | 10930.1123549106@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Why hash join instead of nested loop? (Rhett Garber <rhettg@gmail.com>) |
Responses |
Re: Why hash join instead of nested loop?
Re: Why hash join instead of nested loop? Re: Why hash join instead of nested loop? Re: Why hash join instead of nested loop? |
List | pgsql-performance |
Rhett Garber <rhettg@gmail.com> writes: > This is postgres 7.4.1 > All the rows involved are integers. Hmph. There is something really strange going on here. I tried to duplicate your problem in 7.4.*, thus: regression=# create table rtmessagestate(id int, f1 char(6)); CREATE TABLE regression=# insert into rtmessagestate values(1,'z'); INSERT 559399 1 regression=# insert into rtmessagestate values(2,'z'); INSERT 559400 1 regression=# insert into rtmessagestate values(3,'z'); INSERT 559401 1 regression=# insert into rtmessagestate values(4,'z'); INSERT 559402 1 regression=# insert into rtmessagestate values(5,'z'); INSERT 559403 1 regression=# vacuum analyze rtmessagestate; VACUUM regression=# create table connection(connection_registry_id int, obj1 int, obj2 int); CREATE TABLE regression=# create index connection_regid_obj1_index on connection(connection_registry_id,obj1); CREATE INDEX regression=# insert into connection values(40105,73582,3); INSERT 559407 1 regression=# explain analyze select rtmessagestate.* from rtmessagestate,connection where (connection_registry_id = 40105)AND (obj1 = 73582) and id = obj2; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=4.83..5.91 rows=1 width=14) (actual time=0.498..0.544 rows=1 loops=1) Hash Cond: ("outer".id = "inner".obj2) -> Seq Scan on rtmessagestate (cost=0.00..1.05 rows=5 width=14) (actual time=0.030..0.072 rows=5 loops=1) -> Hash (cost=4.83..4.83 rows=1 width=4) (actual time=0.305..0.305 rows=0 loops=1) -> Index Scan using connection_regid_obj1_index on connection (cost=0.00..4.83 rows=1 width=4) (actual time=0.236..0.264rows=1 loops=1) Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582)) Total runtime: 1.119 ms (7 rows) This duplicates your example as to plan and row counts: > Hash Join (cost=5.96..7.04 rows=1 width=14) (actual > time=10.591..10.609 rows=1 loops=1) > Hash Cond: ("outer".id = "inner".obj2) > -> Seq Scan on rtmessagestate (cost=0.00..1.05 rows=5 width=14) > (actual time=0.011..0.022 rows=5 loops=1) > -> Hash (cost=5.96..5.96 rows=1 width=4) (actual > time=0.109..0.109 rows=0 loops=1) > -> Index Scan using connection_regid_obj1_index on > connection (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076 > rows=1 loops=1) > Index Cond: ((connection_registry_id = 40105) AND (obj1 > = 73582)) Total runtime: 11.536 ms > (7 rows) My machine is considerably slower than yours, to judge by the actual elapsed times in the scan nodes ... so why is it beating the pants off yours in the join step? Can you try the above script verbatim in a scratch database and see what you get? (Note it's worth trying the explain two or three times to be sure the values have settled out.) I'm testing a fairly recent 7.4-branch build (7.4.8 plus), so that's one possible reason for the discrepancy between my results and yours, but I do not see anything in the 7.4 CVS logs that looks like it's related to hashjoin performance. I'd be interested to see results from other people using 7.4.* too. regards, tom lane
pgsql-performance by date: