Thread: Slow join query optimisation?
I am trying to execute a moderately complex join to combine student
enrolment data with the various tables that define the foreign keys
used by the enrolment table. Here is a sample SELECT statement:

    SELECT users.name, subjects.canonical, periods.description,
           courses.description, campuses.description,
           studyModes.description
    FROM t25 agents, t27 offerings, t30 enrolments, t26 subjects,
         t29 periods, t28 courses, t23 campuses, t21 studyModes,
         t24 users
    WHERE agents.agent='/Subjects/1999/sem1/GCO/1812'
      AND offerings.masterPeriod=agents.period
      AND offerings.masterSubject=agents.subject
      AND enrolments.offeringPeriod=offerings.period
      AND enrolments.offeringSubject=offerings.subject
      AND subjects.subject=enrolments.offeringSubject
      AND periods.period=enrolments.offeringPeriod
      AND courses.course=enrolments.course
      AND campuses.campus=enrolments.campus
      AND studyModes.studyMode=enrolments.studyMode
      AND users.studentID=enrolments.studentID;

This executes unusually slowly, regardless of how little data is in
the tables. The time is all in the backend: it takes almost a second
(PII 333MHz, 128M RAM, repeated test so no disk I/O involved) even
with all the tables empty. However, each time I simplify the query
slightly (by removing all reference to one table) the time drops:

    11 tables: 6.97s
    10 tables: 2.47s
     9 tables: 0.99s   (this is the SELECT shown above)
     8 tables: 0.39s
     7 tables: 0.19s
     6 tables: 0.12s
     5 tables: 0.09s

It seems to make no difference which tables I remove; what matters is
the total number of tables being joined. If it matters, all the tables
have indexes on the combinations of attributes used in my WHERE clause.

My question for the list: is there something in the query optimiser
that takes time something like exponential in the number of tables
being joined? And if so, is there any way around it (other than using
several smaller selects and combining the data myself, or
denormalising my tables)?

Doug.

P.S. I am using version 6.5.1, but I couldn't see anything in the
HISTORY for 6.5.3 that sounded likely to affect this particular
issue...
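(One way to separate the optimiser's cost from everything else: EXPLAIN
plans a statement without executing it, so timing an EXPLAIN from the
client isolates planning from fetch time. A minimal sketch, using a
3-table subset of the schema above; growing it one table at a time
should reproduce the scaling in the list:

    -- Planning happens in full for EXPLAIN even though nothing is
    -- executed, so the elapsed time is essentially pure optimiser cost.
    EXPLAIN
    SELECT users.name, subjects.canonical
    FROM t30 enrolments, t26 subjects, t24 users
    WHERE subjects.subject = enrolments.offeringSubject
      AND users.studentID = enrolments.studentID;

Note also that the successive ratios in the timings grow, from about
1.3x between 5 and 6 tables up to about 2.8x between 10 and 11, which
points at combinatorial growth in the number of join orderings
considered rather than anything data-dependent.)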
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> My question for the list: Is there something in the query optimiser
> that will take a time that is something like exponential on the number
> of tables being joined?

Yes.  See the manual's discussion of genetic query optimization.  You
might try setting the GEQO threshold to something less than 10.

FWIW, the time you are looking at here is strictly planning time and
will not increase when you actually put data in the tables.  Unless
you foresee fairly small final tables, the planning time is not likely
to be a significant factor in this range of query sizes.

Still, if the purpose of the join is just to substitute descriptions
for IDs, you'd be well advised to consider doing it via table lookup
on the frontend side, assuming your frontend is coded in a language
that makes that reasonable to do.  The Postgres optimizer cannot help
but waste many cycles per query reverse-engineering your intent, even
assuming that it ultimately comes up with the best possible query plan
for the join...

			regards, tom lane
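(For concreteness, the GEQO threshold is a per-session setting. The
form below follows the 6.5-era SET GEQO syntax as best it can be
reconstructed here, so treat it as a sketch and check the manual for
your exact version:

    -- Hand joins of 8 or more tables to the genetic optimiser instead
    -- of the exhaustive join-order search; the default cutoff in this
    -- era was above 10, so a 9-table join normally gets the
    -- exhaustive search.
    SET GEQO TO 'ON=8';

    -- Return the setting to its default for the session.
    RESET GEQO;

The trade-off is that the genetic optimiser samples the space of join
orders instead of enumerating it, so it can pick a worse plan.)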
> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > My question for the list: Is there something in the query optimiser
> > that will take a time that is something like exponential on the number
> > of tables being joined?
>
> Yes.  See the manual's discussion of genetic query optimization.  You
> might try setting the GEQO threshold to something less than 10.

This seemed to make the query take *much* longer (about five times as
long as before!) even with empty tables?

> FWIW, the time you are looking at here is strictly planning time and
> will not increase when you actually put data in the tables.  Unless
> you foresee fairly small final tables, the planning time is not likely
> to be a significant factor in this range of query sizes.

That would be true if I were typically selecting a large proportion of
the table data. However, in my application a typical query only
retrieves 50 tuples, so the query optimisation dominates the total
time. I don't think the absolute table size matters much, since the
tables are indexed. Here is a summary of my 9-table query timings:

                                              GEQO=11   GEQO=3
                                              =======   ======
    Empty tables (hence 0 tuples selected)     0.94s     5.09s
    Full tables (26 tuples selected)           0.98s     5.15s

These times include executing the SELECT, which (via libpq) means they
also include the fetch times for all the selected tuples.

> Still, if the purpose of the join is just to substitute descriptions
> for IDs, you'd be well advised to consider doing it via table lookup
> on the frontend side, assuming your frontend is coded in a language
> that makes that reasonable to do.

The drawback for me here is that either I must load up all the lookup
tables (when I may only need a few of the values from each of them),
or else I have to look up just the values I need (which is possible
but messy).

Is there any way to turn off the optimisation? Or perhaps some way to
work out the optimal strategy once, and then provide this information
directly? After all, presumably the optimiser will work out the same
answer every time (for the same table structures and the same SELECT),
so it seems pointless doing this work every time the SELECT gets
executed...

Doug.
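(For what the split-query fallback might look like with the tables
from the original post: keep only the request-specific tables in the
join, and fetch each small lookup table once per session to cache on
the frontend. The column lists here are inferred from the original
SELECT and WHERE clauses, so treat this as a sketch:

    -- Core query: just the tables whose rows vary per request.  With
    -- only 4 tables, planning should cost well under 0.1s by the
    -- timings in the original post.
    SELECT enrolments.studentID, users.name, enrolments.course,
           enrolments.campus, enrolments.studyMode,
           enrolments.offeringSubject, enrolments.offeringPeriod
    FROM t25 agents, t27 offerings, t30 enrolments, t24 users
    WHERE agents.agent='/Subjects/1999/sem1/GCO/1812'
      AND offerings.masterPeriod=agents.period
      AND offerings.masterSubject=agents.subject
      AND enrolments.offeringPeriod=offerings.period
      AND enrolments.offeringSubject=offerings.subject
      AND users.studentID=enrolments.studentID;

    -- Small, stable lookup tables: fetch once, cache client-side, and
    -- substitute descriptions for IDs in the frontend.
    SELECT subject, canonical FROM t26;
    SELECT period, description FROM t29;
    SELECT course, description FROM t28;
    SELECT campus, description FROM t23;
    SELECT studyMode, description FROM t21;

The single-table lookups plan in negligible time; the cost is exactly
the client-side bookkeeping described as messy above.)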
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> Is there any way to turn off the optimisation? Or perhaps some way to
> work out the optimal strategy once, and then provide this information
> directly?

Not at the moment.  There's been some talk of caching query plans,
which would get the job done, but none of the current developers are
working on that now.  It'll probably happen someday...

			regards, tom lane
On Thu, 2 Dec 1999, Tom Lane wrote:

> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > Is there any way to turn off the optimisation? Or perhaps some way to
> > work out the optimal strategy once, and then provide this information
> > directly?
>
> Not at the moment.  There's been some talk of caching query plans,
> which would get the job done, but none of the current developers are
> working on that now.  It'll probably happen someday...

What do you need to make 'someday' happen tomorrow? :-)

Seriously, recent threads about Postgres performance indicate that
many people now seriously consider Postgres as a DB engine for real
business applications. And because business is shifting more and more
to the Internet, we also need to think about Web-friendly Postgres
features. One feature (LIMIT), implemented by Jan, has already brought
in many Web developers. Next was MVCC (and we're waiting for WAL). We
need DB connection pooling, which would be very useful in
high-performance Web applications, and a CORBA interface is very
important (I'm currently just reading books on it). Caching query
plans is very desirable; probably not so much caching as storing
(naming) them, so the developer could decide which queries' plans
should be stored.

	Regards,
		Oleg

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83