Thread: Why hash join instead of nested loop?
I've got similar queries that I think should be evaluated (as displayed through EXPLAIN) the same, but aren't. Hopefully this is the right place to send such a question and one of you can help explain this to me.

The tables:

Connection - 1.2 million entries, about 60 megs; 3 integer fields that link two tables together (i.e. an identifier and two foreign keys). Has an index on the identifier together with either of the foreign keys.
rtmessagestate - very small, 5 entries
rtmessage - pretty big, 80,000 entries

The queries:

select rtmessagestate.* from rtmessagestate, connection
 where connection_registry_id = 40105 and obj1 = 73582
   and obj2 = rtmessagestate.id;

returns 1 row in 13.7 ms

select rtmessage.id, subject from rtmessage, connection
 where connection_registry_id = 40003 and obj1 = 4666
   and obj2 = rtmessage.id;

returns 12 rows in 2 ms

Some more possibly important details:
entries in connection with connection_registry_id = 40105: 30,000
entries with this id and obj1 = 73582: 1
entries in connection with connection_registry_id = 40003: 6,000
entries with this id and obj1 = 4666: 20

But as I said before, there is a btree index on (connection_registry_id, obj1).

Explain: the first query breaks down as:

 Hash Join  (cost=5.96..7.04 rows=1 width=14)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
   ->  Hash  (cost=5.96..5.96 rows=1 width=4)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..5.96 rows=1 width=4)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
(6 rows)

While the second query is:

 Nested Loop  (cost=0.00..11.62 rows=2 width=38)
   ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..5.96 rows=1 width=4)
         Index Cond: ((connection_registry_id = 40003) AND (obj1 = 4666))
   ->  Index Scan using rtmessage_pkey on rtmessage  (cost=0.00..5.65 rows=1 width=38)
         Index Cond: ("outer".obj2 = rtmessage.id)
(5 rows)

Actually running these queries shows that the second one (nested loop) is much faster than the hash join, presumably because of hash startup costs. Any ideas how I can make them both use the nested loop? I assume that would be the fastest for both.

Oddly enough, running the first query (rtmessagestate) as two queries or with a subquery is way faster than doing the join.

And yes, I realize this schema may not be the most efficient for these examples, but it seems to be the most flexible. I'm working on some schema improvements also, but if I could understand why this is slow, that would probably help too.

Thanks for your help,

Rhett
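The subquery form Rhett mentions is never spelled out in the thread; a minimal sketch of one plausible rewrite, assuming obj2 carries the rtmessagestate id exactly as the join version implies, would be:

-- Hypothetical subquery rewrite of the first query; the exact form
-- Rhett used is not shown in the thread. Works because the
-- (connection_registry_id, obj1) pair matches a single row here.
select rtmessagestate.*
  from rtmessagestate
 where id = (select obj2
               from connection
              where connection_registry_id = 40105
                and obj1 = 73582);

This skips the join machinery entirely, which is consistent with his observation that the two-query/subquery version runs much faster.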
Rhett,

Please post the EXPLAIN ANALYZE for both queries. From that we can see the predicted and the actual costs of them.

Regards,
Otto

----- Original Message -----
From: "Rhett Garber" <rhettg@gmail.com>
To: <pgsql-performance@postgresql.org>
Sent: Friday, August 05, 2005 8:35 PM
Subject: [PERFORM] Why hash join instead of nested loop?
On 8/5/05, Havasvölgyi Ottó <h.otto@freemail.hu> wrote:
> Please post the explain analyze for both queries. From that we can see the
> predicted and the actual costs of them.

> select rtmessagestate.* from rtmessagestate, connection where
> connection_registry_id = 40105 and obj1 = 73582 and obj2 =
> rtmessagestate.id;

 Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual time=10.591..10.609 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.011..0.022 rows=5 loops=1)
   ->  Hash  (cost=5.96..5.96 rows=1 width=4) (actual time=0.109..0.109 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 11.536 ms
(7 rows)

> select rtmessage.id, subject from rtmessage, connection where
> connection_registry_id = 40003 and obj1 = 4666 and obj2 =
> rtmessage.id;

 Nested Loop  (cost=0.00..11.62 rows=2 width=38) (actual time=0.186..0.970 rows=12 loops=1)
   ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.109..0.308 rows=12 loops=1)
         Index Cond: ((connection_registry_id = 40003) AND (obj1 = 4666))
   ->  Index Scan using rtmessage_pkey on rtmessage  (cost=0.00..5.65 rows=1 width=38) (actual time=0.032..0.039 rows=1 loops=12)
         Index Cond: ("outer".obj2 = rtmessage.id)
 Total runtime: 1.183 ms
(6 rows)

Rhett
Rhett Garber <rhettg@gmail.com> writes:
>  Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual time=10.591..10.609 rows=1 loops=1)
>    Hash Cond: ("outer".id = "inner".obj2)
>    ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.011..0.022 rows=5 loops=1)
>    ->  Hash  (cost=5.96..5.96 rows=1 width=4) (actual time=0.109..0.109 rows=0 loops=1)
>          ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076 rows=1 loops=1)
>                Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
>  Total runtime: 11.536 ms
> (7 rows)

[ scratches head... ]  If the hash table build takes only 0.109 msec and loads only one row into the hash table, and the scan of rtmessagestate takes only 0.022 msec and produces only 5 rows, it is real hard to see how the join takes 10.609 msec overall.  Unless the id and obj2 columns are of a datatype with an incredibly slow equality function.  What is the datatype involved here, anyway?  And what PG version are we speaking of?

			regards, tom lane
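For anyone following along, the datatype question can be answered from a live session. This catalog query is an editorial sketch, not part of the original exchange (psql's \d connection would show the same thing):

-- List column names and types for the two tables involved;
-- format_type() and these pg_attribute columns all exist in 7.4.
select c.relname, a.attname,
       format_type(a.atttypid, a.atttypmod) as datatype
  from pg_attribute a
  join pg_class c on c.oid = a.attrelid
 where c.relname in ('connection', 'rtmessagestate')
   and a.attnum > 0
   and not a.attisdropped;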
This is postgres 7.4.1.
All the rows involved are integers.

Thanks,

Rhett

On 8/5/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What is the datatype involved here, anyway?  And what PG
> version are we speaking of?
Rhett Garber <rhettg@gmail.com> writes:
> This is postgres 7.4.1
> All the rows involved are integers.

Hmph.  There is something really strange going on here.  I tried to duplicate your problem in 7.4.*, thus:

regression=# create table rtmessagestate(id int, f1 char(6));
CREATE TABLE
regression=# insert into rtmessagestate values(1,'z');
INSERT 559399 1
regression=# insert into rtmessagestate values(2,'z');
INSERT 559400 1
regression=# insert into rtmessagestate values(3,'z');
INSERT 559401 1
regression=# insert into rtmessagestate values(4,'z');
INSERT 559402 1
regression=# insert into rtmessagestate values(5,'z');
INSERT 559403 1
regression=# vacuum analyze rtmessagestate;
VACUUM
regression=# create table connection(connection_registry_id int, obj1 int, obj2 int);
CREATE TABLE
regression=# create index connection_regid_obj1_index on connection(connection_registry_id,obj1);
CREATE INDEX
regression=# insert into connection values(40105,73582,3);
INSERT 559407 1
regression=# explain analyze select rtmessagestate.* from rtmessagestate,connection where (connection_registry_id = 40105) AND (obj1 = 73582) and id = obj2;

                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.498..0.544 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.030..0.072 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.305..0.305 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.236..0.264 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 1.119 ms
(7 rows)

This duplicates your example as to plan and row counts:

>  Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual time=10.591..10.609 rows=1 loops=1)
>    Hash Cond: ("outer".id = "inner".obj2)
>    ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.011..0.022 rows=5 loops=1)
>    ->  Hash  (cost=5.96..5.96 rows=1 width=4) (actual time=0.109..0.109 rows=0 loops=1)
>          ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076 rows=1 loops=1)
>                Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
>  Total runtime: 11.536 ms
> (7 rows)

My machine is considerably slower than yours, to judge by the actual elapsed times in the scan nodes ... so why is it beating the pants off yours in the join step?

Can you try the above script verbatim in a scratch database and see what you get?  (Note it's worth trying the explain two or three times to be sure the values have settled out.)

I'm testing a fairly recent 7.4-branch build (7.4.8 plus), so that's one possible reason for the discrepancy between my results and yours, but I do not see anything in the 7.4 CVS logs that looks like it's related to hashjoin performance.

I'd be interested to see results from other people using 7.4.* too.

			regards, tom lane
On Mon, Aug 08, 2005 at 08:58:26PM -0400, Tom Lane wrote:
> Hmph.  There is something really strange going on here.  I tried to
> duplicate your problem in 7.4.*, thus:

PostgreSQL 7.4.7 (Debian sarge):

<create table and stuff, exactly the same as you>

regression=# explain analyze select rtmessagestate.* from rtmessagestate,connection where (connection_registry_id = 40105) AND (obj1 = 73582) and id = obj2;

                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.155..0.159 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.003..0.006 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.026..0.026 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 0.215 ms
(7 rows)

This is an Opteron (in 32-bit mode), though.

/* Steinar */
--
Homepage: http://www.sesse.net/
On Mon, Aug 08, 2005 at 08:58:26PM -0400, Tom Lane wrote:
> I'd be interested to see results from other people using 7.4.* too.

I just built 7.4.1 on FreeBSD 4.11-STABLE and ran your test:

test=# explain analyze select rtmessagestate.* from rtmessagestate,connection where (connection_registry_id = 40105) AND (obj1 = 73582) and id = obj2;

                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.220..0.264 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.015..0.050 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.103..0.103 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.070..0.081 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 0.495 ms
(7 rows)

--
Michael Fuhr
On Mon, 2005-08-08 at 20:58, Tom Lane wrote:
> I'd be interested to see results from other people using 7.4.* too.

7.4.8:

 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.122..0.126 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.003..0.006 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.021..0.021 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.013..0.015 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 0.198 ms

7.4.2:

 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.577..0.600 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.006..0.023 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.032..0.032 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.016..0.020 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 0.697 ms

--Ian
Ian Westmacott <ianw@intellivid.com> writes:
> On Mon, 2005-08-08 at 20:58, Tom Lane wrote:
>> I'd be interested to see results from other people using 7.4.* too.

> 7.4.8:
> Total runtime: 0.198 ms

> 7.4.2:
> Total runtime: 0.697 ms

Just to be clear: those are two different machines of different speeds, right?  I don't believe we put any factor-of-three speedups into 7.4.* after release ;-)

			regards, tom lane
Yes, sorry, two totally different machines. The 7.4.8 run was on a dual P4 3.2GHz, and the 7.4.2 run was on a dual hyperthreaded Xeon 2.4GHz.

--Ian

On Tue, 2005-08-09 at 10:33, Tom Lane wrote:
> Just to be clear: those are two different machines of different speeds,
> right?
Duplicated your setup in a separate DB. At least it's reproducible for me...

I tested this on a Xeon 2 GHz, 1 gig RAM. It's running on some shared storage array whose details I'm not sure of. My production example also shows up on our production machine, which is almost the same hardware but has dual Xeons and 6 gigs of RAM.

 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159 rows=1 loops=1)
   Hash Cond: ("outer".id = "inner".obj2)
   ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.007..0.015 rows=5 loops=1)
   ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)
         ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032 rows=1 loops=1)
               Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 7.693 ms
(7 rows)

Rhett

On 8/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you try the above script verbatim in a scratch database and see
> what you get?
Rhett Garber <rhettg@gmail.com> writes:
> Duplicated your setup in a separate DB.
> At least it's reproducible for me...

Hmm.  Well, we now have several data points but they seem to be on wildly varying hardware.  To try to normalize the results a little, I computed the total actual time for the hash plan divided by the sum of the actual times for the two scan nodes.  Thus, for your example:

>  Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159 rows=1 loops=1)
>    Hash Cond: ("outer".id = "inner".obj2)
>    ->  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual time=0.007..0.015 rows=5 loops=1)
>    ->  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)
>          ->  Index Scan using connection_regid_obj1_index on connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032 rows=1 loops=1)
>                Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
>  Total runtime: 7.693 ms
> (7 rows)

this would be 7.159 / (0.015 + 0.032).  This is probably not an enormously robust statistic but it at least focuses attention in the right place.  Here's what I get (rounded off to 4 digits, which is surely as much precision as we have in the numbers):

	Tom	7.4.8+	  1.619
	Ian	7.4.8	  6.000
	Ian	7.4.2	 13.95
	Steinar	7.4.7	  8.833
	Rhett	orig	108.3
	Rhett	test	152.3
	Michael	7.4.1	  2.015

My number seems to be a bit of an outlier to the low side, but yours are way the heck to the high side.  And Michael's test seems to rule out the idea that it's something broken in 7.4.1 in particular.

I'm now thinking you've got either a platform- or compiler-specific problem.  Exactly what is the hardware (the CPU not the disks)?  How did you build or come by the Postgres executables (compiler, configure options, etc)?

			regards, tom lane
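As a quick sanity check on the arithmetic (computed here, not posted in the thread), the statistic for three of the runs above works out from the plan numbers quoted earlier:

-- join actual total time / sum of the two scan nodes' actual times
select round(0.544  / (0.072 + 0.264), 3);  -- Tom 7.4.8+:   1.619
select round(10.609 / (0.022 + 0.076), 1);  -- Rhett orig: 108.3
select round(7.159  / (0.015 + 0.032), 1);  -- Rhett test: 152.3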
> I'm now thinking you've got either a platform- or compiler-specific
> problem.  Exactly what is the hardware (the CPU not the disks)?  How did
> you build or come by the Postgres executables (compiler, configure
> options, etc)?

I've tried it on two of our machines, both HP Proliant DL580:

Production: Intel(R) Xeon(TM) MP CPU 2.80GHz (I think there are 2 physical CPUs with hyperthreading, shows up as 4), 6 gigs RAM
Development: Intel(R) XEON(TM) MP CPU 2.00GHz (I have a vague recollection of disabling hyperthreading on this chip because of some other kernel issue), 1 gig RAM

They are both running SuSE 8, 2.4.21-128-smp kernel.

Compile instructions (I didn't do it myself) indicate we built from source with nothing fancy:

tar xpvf postgresql-7.4.1.tar.bz2
cd postgresql-7.4.1
./configure --prefix=/usr/local/postgresql-7.4.1
make
make install
make install-all-headers

If I run 'file' on /usr/local/postgresql-7.4.1/bin/postgres:

postgres: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), not stripped

Thanks for all your help guys,

Rhett
Rhett Garber <rhettg@gmail.com> writes:
> They are both running SuSE 8, 2.4.21-128-smp kernel

> Compile instructions (I didn't do it myself) indicate we built from
> source with nothing fancy:

You could double-check the configure options by running pg_config.  But probably the more interesting question is whether any nondefault CFLAGS were used, and I don't think pg_config records that.  (Hmm, maybe it should.)

In any case, there's no smoking gun there.  I'm now wondering if maybe there's something unusual about your runtime parameters.  AFAIR you didn't show us your postgresql.conf settings --- could we see any nondefault entries there?

(I looked quickly at the 7.4 hashjoin code, and I see that it uses a hash table sized according to sort_mem even when the input is predicted to be very small ... so an enormous sort_mem setting would account for some plan startup overhead to initialize the table ...)

			regards, tom lane
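For reference, the live settings can also be read from a running backend rather than from the config file; a sketch, not from the original exchange (SHOW and the pg_settings view both exist in 7.4):

-- Check the two parameters most relevant here:
SHOW shared_buffers;
SHOW sort_mem;
-- or dump everything at once and eyeball it:
select name, setting from pg_settings;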
Well that could be an issue; is this abnormally large?

#shared_buffers = 1536         # min 16, at least max_connections*2, 8KB each
shared_buffers = 206440
#sort_mem = 131072             # min 64, size in KB
sort_mem = 524288              # min 64, size in KB
vacuum_mem = 131072            # min 1024, size in KB

I actually had a lot of trouble finding example values for these... no one wants to give real numbers in any postgres performance tuning articles I saw. What would be appropriate for machines with 1 or 6 gigs of RAM that want to maximize performance?

Rhett

On 8/9/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm now wondering if maybe there's something unusual about your runtime
> parameters.  AFAIR you didn't show us your postgresql.conf settings ---
> could we see any nondefault entries there?
Rhett Garber <rhettg@gmail.com> writes:
> Well that could be an issue; is this abnormally large?

> #shared_buffers = 1536        # min 16, at least max_connections*2, 8KB each
> shared_buffers = 206440
> #sort_mem = 131072            # min 64, size in KB
> sort_mem = 524288             # min 64, size in KB
> vacuum_mem = 131072           # min 1024, size in KB

The vacuum_mem number is OK I think, but both of the others seem unreasonably large.  Conventional wisdom about shared_buffers is that the sweet spot is maybe 10000 or so buffers, rarely more than 50000.  (Particularly in pre-8.0 releases, there are code paths that grovel through all the buffers linearly, so there is a significant cost to making it too large.)  Don't worry about it being too small to make effective use of RAM --- we rely on the kernel's disk cache to do that.

sort_mem is *per sort*, and so half a gig in a machine with only a couple of gig is far too much except when you know you have only one query running.  A couple dozen backends each trying to use half a gig will drive you into the ground in no time.  Conventional wisdom here is that the global setting should be conservatively small (perhaps 10Mb to 100Mb depending on how many concurrent backends you expect to have), and then you can explicitly increase it locally with SET for specific queries that need it.

In terms of the problem at hand, try the test case with a few different values of sort_mem (use SET to adjust it, you don't need to touch the config file) and see what happens.  I think the cost you're seeing is just startup overhead to zero a hash table of a few hundred meg ...

			regards, tom lane
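A sketch of that experiment, using the first query from the thread; the particular sort_mem values are illustrative, not prescribed by Tom:

set sort_mem = 1024;         -- 1 MB; the setting is in KB
explain analyze
select rtmessagestate.* from rtmessagestate, connection
 where connection_registry_id = 40105 and obj1 = 73582
   and obj2 = rtmessagestate.id;

set sort_mem = 524288;       -- back to the configured 512 MB, for comparison
explain analyze
select rtmessagestate.* from rtmessagestate, connection
 where connection_registry_id = 40105 and obj1 = 73582
   and obj2 = rtmessagestate.id;

Since SET only affects the current session, nothing needs to be restored in the config file afterward.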
Bingo, the smaller the sort_mem, the faster that query is.

Thanks a lot to everybody that helped. I'll tweak these values more when I get a chance, now that I have some guidelines that make sense.

Rhett

On 8/9/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In terms of the problem at hand, try the test case with a few different
> values of sort_mem (use SET to adjust it, you don't need to touch the
> config file) and see what happens.  I think the cost you're seeing is
> just startup overhead to zero a hash table of a few hundred meg ...