Thread: Any optimizations to the join code in 7.1?
Hello. I have a particular query which performs a 15-way join; I believe in normalization ;-). Under 7.0.3, using the defaults where GEQO is enabled after 11, the query (which returns 1 row) takes 10 seconds. With GEQO turned off, it takes 18 seconds. Naturally I intend to upgrade as soon as possible, but I looked through the change log and didn't see anything specific WRT large joins. I was wondering if any work had been done in that area for 7.1. I realize you can only squeeze so much blood from stone, but.... Thanks for any info, Mike Mascari mascarm@mascari.com
Mike Mascari <mascarm@mascari.com> writes: > I have a particular query which performs a 15-way join; You should read http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html regards, tom lane
On Wed, 25 Apr 2001, Tom Lane wrote: > Mike Mascari <mascarm@mascari.com> writes: > > I have a particular query which performs a 15-way join; > > You should read > http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html I was recently poring over this page myself, as I've been working w/some larger-than-usual queries. Two questions: 1) it appears (from my tests) that SELECT * FROM CREATE VIEW joined as SELECT p.id, p.pname, c.cname FROM p LEFT OUTER JOIN c using (id) gives the same answer as SELECT * FROM CREATE VIEW nested SELECT p.id, p.pname, (select c.cname from c where c.id = p.id) FROM p However, I often am writing VIEWs that will be used by developers in a front-end system. Usually, this view might have30 items in the select clause, but the developer using it is likely to only as for four or five items. In this case,I often prefer the subquery form because it appears that SELECT id, pname FROM joined is more complicated than SELECT id, pname FROM nested as the first has to perform the join, and the second doesn't. Is this actually correct? 2) The explicit-joins help suggests that manual structuring and experimentation might help -- has anyone written (or could anyone write) anthing about where to start in guessing what join order might be optimal? -- Joel Burton <jburton@scw.org> Director of Information Systems, Support Center of Washington
Joel Burton <jburton@scw.org> writes: > 1) it appears (from my tests) that SELECT * FROM > CREATE VIEW joined as > SELECT p.id, > p.pname, > c.cname > FROM p > LEFT OUTER JOIN c using (id) > gives the same answer as SELECT * FROM > CREATE VIEW nested > SELECT p.id, > p.pname, > (select c.cname from c where c.id = p.id) > FROM p Only if c.id is a unique column (ie, there are always 0 or 1 matches in c for any given p.id). Otherwise the subselect form will fail. > However, I often am writing VIEWs that will be used by developers > in a front-end system. Usually, this view might have 30 items in the > select clause, but the developer using it is likely to only as for > four or five items. In this case, I often prefer the > subquery form because it appears that > SELECT id, pname FROM joined > is more complicated than > SELECT id, pname FROM nested > as the first has to perform the join, and the second doesn't. > Is this actually correct? This approach is probably reasonable if the cname field of the view result is seldom wanted at all, and never used as a WHERE constraint. You'd get a very nonoptimal plan if someone did select * from nested where cname like 'foo%' since the planner has no way to use the LIKE constraint to limit the rows fetched from p. In the JOIN format, on the other hand, I think the constraint could be exploited. Also bear in mind that the subselect form is essentially forcing the join to be done via a nested loop. If you have an index on c.id then this may not be too bad, but without one the performance will be horrid. Even with an index, nested loop with inner indexscan is not the join method of choice if you are retrieving a lot of rows. > 2) The explicit-joins help suggests that manual structuring and > experimentation might help -- has anyone written (or could > anyone write) anthing about where to start in guessing what > join order might be optimal? The obvious starting point is the plan produced by the planner from an unconstrained query. Even if you don't feel like trying to improve it, you could cut the time to reproduce the plan quite a bit --- just CROSS JOIN a few of the relation pairs that are joined first in the unconstrained plan. regards, tom lane
On Wed, 25 Apr 2001, Tom Lane wrote: > > 2) The explicit-joins help suggests that manual structuring and > > experimentation might help -- has anyone written (or could > > anyone write) anthing about where to start in guessing what > > join order might be optimal? > > The obvious starting point is the plan produced by the planner from an > unconstrained query. Even if you don't feel like trying to improve it, > you could cut the time to reproduce the plan quite a bit --- just CROSS > JOIN a few of the relation pairs that are joined first in the > unconstrained plan. In other words, let it do the work, and steal the credit for ourselves. :-) Thanks, Tom. I appreciate your answers to my questions. In other DB systems I've used, some find that for this original query: SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo'; that this version SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo'; has slower performance than SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo'; because it can reduce b before any join. Is it safe to assume that this is a valid optimization in PostgreSQL? If this whole thing were a view, except w/o the WHERE clause, and we were querying the view w/the b.name WHERE clause, would we still see a performance boost from the right arrangement? (ie, does our criteria get pushed down early enough in the joining process?) TIA, -- Joel Burton <jburton@scw.org> Director of Information Systems, Support Center of Washington
Joel Burton <jburton@scw.org> writes: > In other DB systems I've used, some find that for this original query: > SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo'; > that this version > SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo'; > has slower performance than > SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo'; > because it can reduce b before any join. > Is it safe to assume that this is a valid optimization in PostgreSQL? In general, that'd be a waste of time --- our planner considers the same set of plans in either case. However, it could make a difference if the planner thinks that the two choices (a outer or b outer) have exactly the same cost. In that case the order you wrote them in will influence which plan actually gets picked; and if the planner's estimate is wrong --- ie, there really is a considerable difference in the costs --- then you could see a change in performance depending on which way you wrote it. That's a pretty unusual circumstance, maybe, but it just happens that I'm in the middle of looking at a planning bug wherein exactly this behavior occurs... > If this whole thing were a view, except w/o the WHERE clause, and we were > querying the view w/the b.name WHERE clause, would we still see a > performance boost from the right arrangement? (ie, does our criteria get > pushed down early enough in the joining process?) Shouldn't make a difference; AFAIK the WHERE clause will get pushed down as far as possible, independently of whether a view is involved or you wrote it out the hard way. regards, tom lane
Sorry for the delay in the response. It took be a day to get everything upgraded to 7.1. To restate the problem - in 7.0 with GEQO enabled, a 15-way join took 10 seconds. With GEQO disabled it took 18 seconds. 7.1 out of the box took only 2 seconds! I was amazed and shocked at this damned impressive improvement in planning speed....until I actually used the explicit JOIN syntax described in 11.2. Instanteous results! Instantaneous..... Thanks a bunch, (still in shock) Mike Mascari mascarm@mascari.com -----Original Message----- From: Tom Lane [SMTP:tgl@sss.pgh.pa.us] Sent: Wednesday, April 25, 2001 12:42 PM To: mascarm@mascari.com Cc: 'pgsql-hackers@postgresql.org' Subject: Re: [HACKERS] Any optimizations to the join code in 7.1? Mike Mascari <mascarm@mascari.com> writes: > I have a particular query which performs a 15-way join; You should read http://www.postgresql.org/devel-corner/docs/postgres/explicit-join s.html regards, tom lane
You can thank Tom Lane for most/all of our optimization improvements. > Sorry for the delay in the response. It took be a day to get > everything upgraded to 7.1. To restate the problem - in 7.0 with > GEQO enabled, a 15-way join took 10 seconds. With GEQO disabled it > took 18 seconds. 7.1 out of the box took only 2 seconds! I was amazed > and shocked at this damned impressive improvement in planning > speed....until I actually used the explicit JOIN syntax described in > 11.2. Instanteous results! Instantaneous..... > > Thanks a bunch, > (still in shock) > > Mike Mascari > mascarm@mascari.com > > -----Original Message----- > From: Tom Lane [SMTP:tgl@sss.pgh.pa.us] > Sent: Wednesday, April 25, 2001 12:42 PM > To: mascarm@mascari.com > Cc: 'pgsql-hackers@postgresql.org' > Subject: Re: [HACKERS] Any optimizations to the join code in 7.1? > > Mike Mascari <mascarm@mascari.com> writes: > > I have a particular query which performs a 15-way join; > > You should read > http://www.postgresql.org/devel-corner/docs/postgres/explicit-join > s.html > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> ... 7.1 out of the box took only 2 seconds! I was amazed > and shocked at this damned impressive improvement in planning > speed....until I actually used the explicit JOIN syntax described in > 11.2. Instanteous results! Instantaneous..... But it is possible, under many circumstances, for query optimization to be a benefit for a many-table query. The docs indicate that explicit join syntax bypasses that, even for inner joins, so you may find that this syntax is a net loss in performance depending on the query and your choice of table order. Presumably we will be interested in making these two forms of inner join equivalent in behavior in a future release. Tom, what are the impediments we might encounter in doing this? - Thomas
What would be nice, and I don't know how it would be done or what the syntax would be, would be a feature that allows PostgreSQL to skip not only the parsing stage, but the planning stage as well. Then, when the data has changed dramatically enough to warrant it, as you point out, a command can be issued to 'refresh' the query plan. My 15-way join has expanded to a 19-way join and is still instantaneous, albeit on a very small set of data. Before 7.1, the query would simply have taken far too long, and I would have had to denormalize the database for performance purposes. With the explicit join syntax, it allows me to design the database 'the right way'. I basically used EXPLAIN SELECT... to determine the explicit join order, so as the data changes, its something I'll have to do on occassion to ensure good performance, but at least its now possible. :-) Mike Mascari mascarm@mascari.com -----Original Message----- From: Thomas Lockhart [SMTP:lockhart@alumni.caltech.edu] Sent: Friday, April 27, 2001 9:49 PM To: mascarm@mascari.com; 'Tom Lane' Cc: 'pgsql-hackers@postgresql.org' Subject: [HACKERS] Re: Any optimizations to the join code in 7.1? > ... 7.1 out of the box took only 2 seconds! I was amazed > and shocked at this damned impressive improvement in planning > speed....until I actually used the explicit JOIN syntax described in > 11.2. Instanteous results! Instantaneous..... But it is possible, under many circumstances, for query optimization to be a benefit for a many-table query. The docs indicate that explicit join syntax bypasses that, even for inner joins, so you may find that this syntax is a net loss in performance depending on the query and your choice of table order. Presumably we will be interested in making these two forms of inner join equivalent in behavior in a future release. Tom, what are the impediments we might encounter in doing this? - Thomas ---------------------------(end of broadcast)--------------------- ------ TIP 4: Don't 'kill -9' the postmaster
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > But it is possible, under many circumstances, for query optimization to > be a benefit for a many-table query. The docs indicate that explicit > join syntax bypasses that, even for inner joins, so you may find that > this syntax is a net loss in performance depending on the query and your > choice of table order. > Presumably we will be interested in making these two forms of inner join > equivalent in behavior in a future release. Tom, what are the > impediments we might encounter in doing this? I don't think there are any real technical problems in the way; it's simply an implementation choice not to treat INNER JOIN the same as an implicit join list. I did it that way in 7.1 mainly as a flyer, to see how many people would think it's a feature vs. how many think it's a bug. The votes aren't all in yet, but here we have Mike apparently pretty pleased with it, while I recall at least one other person who was not happy with the 7.1 behavior. regards, tom lane