Thread: Extremely irregular query performance
Hi, I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) with 4Gb of RAM. I have recently noticed that the performance of some more complex queries is extremely variable and irregular. For example, I currently have a query that returns a small number of rows (5) by joining a dozen of tables. Below are the running times obtained by repeatedly lauching this query in psql (nothing else was running on the server at that time): Time: 424.848 ms Time: 1615.143 ms Time: 15036.475 ms Time: 83471.683 ms Time: 163.224 ms Time: 2454.939 ms Time: 188.093 ms Time: 158.071 ms Time: 192.431 ms Time: 195.076 ms Time: 635.739 ms Time: 164549.902 ms As you can see, the performance is most of the time pretty good (less than 1 second), but every fourth of fifth time I launch the query the server seems to go into orbit. For the longer running times, I can see from 'top' that the server process uses almost 100% of a CPU. This is rather worrisome, as I cannot be confident of the overall performance of my application with so much variance in query response times. I suspect a configuration problem related to the cache mechanism (shared_buffers? effective_cache_size?), but to be honest I do not know where to start to diagnose it. I also noticed that the query plan can vary when the same query is launched two times in a row (with no other changes to the DB in between). Is there a random aspect to the query optimizer that could explain some of the observed variance in performance ? Any help would be greatly appreciated. Thanks in advance, J-P
=?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes: > I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) > with 4Gb of RAM. I have recently noticed that the performance of > some more complex queries is extremely variable and irregular. > For example, I currently have a query that returns a small number > of rows (5) by joining a dozen of tables. A dozen tables? You're exceeding the geqo_threshold and getting a plan that has some randomness in it. You could either increase geqo_threshold if you can stand the extra planning time, or try increasing geqo_effort to get it to search a little harder and hopefully find a passable plan more often. See http://www.postgresql.org/docs/8.1/static/geqo.html http://www.postgresql.org/docs/8.1/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO I'm kinda surprised that you don't get better results with the default settings. We could tinker some more with the defaults, if you can provide evidence about better values ... regards, tom lane
On Wed, 2006-01-11 at 16:37, Jean-Philippe Côté wrote: > Hi, > > I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) > with 4Gb of RAM. I have recently noticed that the performance of > some more complex queries is extremely variable and irregular. > For example, I currently have a query that returns a small number > of rows (5) by joining a dozen of tables. Below are the running times > obtained by repeatedly lauching this query in psql (nothing else > was running on the server at that time): > > Time: 424.848 ms > Time: 1615.143 ms > Time: 15036.475 ms > Time: 83471.683 ms > Time: 163.224 ms > Time: 2454.939 ms > Time: 188.093 ms > Time: 158.071 ms > Time: 192.431 ms > Time: 195.076 ms > Time: 635.739 ms > Time: 164549.902 ms > > As you can see, the performance is most of the time pretty good (less > than 1 second), but every fourth of fifth time I launch the query > the server seems to go into orbit. For the longer running times, > I can see from 'top' that the server process uses almost 100% of > a CPU. As mentioned earlier, it could be you're exceeding the GEQO threshold. It could also be that you are doing just enough else at the time, and have your shared buffers or sort mem high enough that you're initiating a swap storm. Mind posting all the parts of your postgresql.conf file you've changed from the default?
Thanks a lot for this info, I was indeed exceeding the genetic optimizer's threshold. Now that it is turned off, I get a very stable response time of 435ms (more or less 5ms) for the same query. It is about three times slower than the best I got with the genetic optimizer on, but the overall average is much lower. I'll also try to play with the geqo parameters and see if things improve. Thanks again, J-P -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane Sent: January 11, 2006 6:03 PM To: Jean-Philippe Côté Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Extremely irregular query performance =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes: > I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's) > with 4Gb of RAM. I have recently noticed that the performance of > some more complex queries is extremely variable and irregular. > For example, I currently have a query that returns a small number > of rows (5) by joining a dozen of tables. A dozen tables? You're exceeding the geqo_threshold and getting a plan that has some randomness in it. You could either increase geqo_threshold if you can stand the extra planning time, or try increasing geqo_effort to get it to search a little harder and hopefully find a passable plan more often. See http://www.postgresql.org/docs/8.1/static/geqo.html http://www.postgresql.org/docs/8.1/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO I'm kinda surprised that you don't get better results with the default settings. We could tinker some more with the defaults, if you can provide evidence about better values ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
If this is a query that will be executed more than once, you can also avoid incurring the planning overhead multiple times by using PREPARE. -- Mark Lewis On Wed, 2006-01-11 at 18:50 -0500, Jean-Philippe Côté wrote: > Thanks a lot for this info, I was indeed exceeding the genetic > optimizer's threshold. Now that it is turned off, I get > a very stable response time of 435ms (more or less 5ms) for > the same query. It is about three times slower than the best > I got with the genetic optimizer on, but the overall average > is much lower. > > I'll also try to play with the geqo parameters and see if things > improve. > > Thanks again, > > J-P
=?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes: > Thanks a lot for this info, I was indeed exceeding the genetic > optimizer's threshold. Now that it is turned off, I get > a very stable response time of 435ms (more or less 5ms) for > the same query. It is about three times slower than the best > I got with the genetic optimizer on, but the overall average > is much lower. Hmm. It would be interesting to use EXPLAIN ANALYZE to confirm that the plan found this way is the same as the best plan found by GEQO, and the extra couple hundred msec is the price you pay for the exhaustive plan search. If GEQO is managing to find a plan better than the regular planner then we need to look into why ... regards, tom lane
On Wed, 2006-01-11 at 22:23 -0500, Tom Lane wrote: > =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes: > > Thanks a lot for this info, I was indeed exceeding the genetic > > optimizer's threshold. Now that it is turned off, I get > > a very stable response time of 435ms (more or less 5ms) for > > the same query. It is about three times slower than the best > > I got with the genetic optimizer on, but the overall average > > is much lower. > > Hmm. It would be interesting to use EXPLAIN ANALYZE to confirm that the > plan found this way is the same as the best plan found by GEQO, and > the extra couple hundred msec is the price you pay for the exhaustive > plan search. If GEQO is managing to find a plan better than the regular > planner then we need to look into why ... It seems worth noting in the EXPLAIN whether GEQO has been used to find the plan, possibly along with other factors influencing the plan such as enable_* settings. Best Regards, Simon Riggs
Can I actully know whether a given plan is excuted with GEQO on ? In other words, if I launch 'explain <query>', I'll get a given plan, but if I re-launch the <query> (withtout the 'explain' keyword), could I get a different plan given that GEQO induces some randomness ? >Is it the plan that is different in the fastest case with GEQO or is it >the time needed to plan that is causing the GEQO to beat the exhaustive >search?
Jean-Philippe Cote wrote: > > > Can I actully know whether a given plan is excuted with GEQO on ? > In other words, if I launch 'explain <query>', I'll get a given plan, but if I re-launch > the <query> (withtout the 'explain' keyword), could I get a different > plan given that GEQO induces some randomness ? > > >Is it the plan that is different in the fastest case with GEQO or is it > >the time needed to plan that is causing the GEQO to beat the exhaustive > >search? Yes, is it likely that when using GEQO you would get a different plan each time, so running it with and without EXPLAIN would produce different plans. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Thu, Jan 12, 2006 at 09:48:41AM +0000, Simon Riggs wrote: > On Wed, 2006-01-11 at 22:23 -0500, Tom Lane wrote: > > =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes: > > > Thanks a lot for this info, I was indeed exceeding the genetic > > > optimizer's threshold. Now that it is turned off, I get > > > a very stable response time of 435ms (more or less 5ms) for > > > the same query. It is about three times slower than the best > > > I got with the genetic optimizer on, but the overall average > > > is much lower. > > > > Hmm. It would be interesting to use EXPLAIN ANALYZE to confirm that the > > plan found this way is the same as the best plan found by GEQO, and > > the extra couple hundred msec is the price you pay for the exhaustive > > plan search. If GEQO is managing to find a plan better than the regular > > planner then we need to look into why ... > > It seems worth noting in the EXPLAIN whether GEQO has been used to find > the plan, possibly along with other factors influencing the plan such as > enable_* settings. > Is it the plan that is different in the fastest case with GEQO or is it the time needed to plan that is causing the GEQO to beat the exhaustive search? Ken
On Thu, Jan 12, 2006 at 03:23:14PM -0500, Jean-Philippe Cote wrote: > > > Can I actully know whether a given plan is excuted with GEQO on ? > In other words, if I launch 'explain <query>', I'll get a given plan, but if I re-launch > the <query> (withtout the 'explain' keyword), could I get a different > plan given that GEQO induces some randomness ? > > >Is it the plan that is different in the fastest case with GEQO or is it > >the time needed to plan that is causing the GEQO to beat the exhaustive > >search? > GEQO will be used if the number of joins is over the GEQO limit in the configuration file. The GEQO process is an iterative random process to find an query plan. The EXPLAIN results are the plan for that query, but not neccessarily for subsequent runs. GEQO's advantage is a much faster plan time than the exhaustive search method normally used. If the resulting plan time is less than the exhaustive search plan time, for short queries you can have the GECO run more quickly than the exhaustive search result. Of course, if you PREPARE the query the plan time drops out. Ken
Simon Riggs wrote: > On Wed, 2006-01-11 at 22:23 -0500, Tom Lane wrote: > > =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes: > > > Thanks a lot for this info, I was indeed exceeding the genetic > > > optimizer's threshold. Now that it is turned off, I get > > > a very stable response time of 435ms (more or less 5ms) for > > > the same query. It is about three times slower than the best > > > I got with the genetic optimizer on, but the overall average > > > is much lower. > > > > Hmm. It would be interesting to use EXPLAIN ANALYZE to confirm that the > > plan found this way is the same as the best plan found by GEQO, and > > the extra couple hundred msec is the price you pay for the exhaustive > > plan search. If GEQO is managing to find a plan better than the regular > > planner then we need to look into why ... > > It seems worth noting in the EXPLAIN whether GEQO has been used to find > the plan, possibly along with other factors influencing the plan such as > enable_* settings. I thought the best solution would be to replace "QUERY PLAN" with "GEQO QUERY PLAN" when GEQO was in use. However, looking at the code, I see no way to do that cleanly. Instead, I added documentation to EXPLAIN to highlight the fact the execution plan will change when GEQO is in use. (I also removed a documentation mention of the pre-7.3 EXPLAIN output behavior.) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/ref/explain.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v retrieving revision 1.35 diff -c -c -r1.35 explain.sgml *** doc/src/sgml/ref/explain.sgml 4 Jan 2005 00:39:53 -0000 1.35 --- doc/src/sgml/ref/explain.sgml 20 Jan 2006 16:18:53 -0000 *************** *** 151,161 **** </para> <para> ! Prior to <productname>PostgreSQL</productname> 7.3, the plan was ! emitted in the form of a <literal>NOTICE</literal> message. Now it ! appears as a query result (formatted like a table with a single ! text column). </para> </refsect1> <refsect1> --- 151,162 ---- </para> <para> ! Genetic query optimization (<acronym>GEQO</acronym>) randomly ! tests execution plans. Therefore, when the number of tables ! exceeds <varname>geqo</> and genetic query optimization is in use, ! the execution plan will change each time the statement is executed. </para> + </refsect1> <refsect1>
Bruce Momjian <pgman@candle.pha.pa.us> writes: > <para> > ! Genetic query optimization (<acronym>GEQO</acronym>) randomly > ! tests execution plans. Therefore, when the number of tables > ! exceeds <varname>geqo</> and genetic query optimization is in use, > ! the execution plan will change each time the statement is executed. > </para> geqo_threshold, please --- geqo is a boolean. Possibly better wording: Therefore, when the number of tables exceeds geqo_threshold causing genetic query optimization to be used, the execution plan is likely to change each time the statement is executed. regards, tom lane
Done, and paragraph added to 8.1.X. (7.3 mention retained for 8.1.X.) --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > <para> > > ! Genetic query optimization (<acronym>GEQO</acronym>) randomly > > ! tests execution plans. Therefore, when the number of tables > > ! exceeds <varname>geqo</> and genetic query optimization is in use, > > ! the execution plan will change each time the statement is executed. > > </para> > > geqo_threshold, please --- geqo is a boolean. > > Possibly better wording: Therefore, when the number of tables exceeds > geqo_threshold causing genetic query optimization to be used, the > execution plan is likely to change each time the statement is executed. > > regards, tom lane > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073