Thread: Extremely irregular query performance

Extremely irregular query performance

From
Jean-Philippe Côté
Date:
Hi,

I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's)
with 4Gb of RAM. I have recently noticed that the performance of
some more complex queries is extremely variable and irregular.
For example, I currently have a query that returns a small number
of rows (5) by joining a dozen of tables. Below are the running times
obtained by repeatedly lauching this query in psql (nothing else
was running on the server at that time):

Time: 424.848 ms
Time: 1615.143 ms
Time: 15036.475 ms
Time: 83471.683 ms
Time: 163.224 ms
Time: 2454.939 ms
Time: 188.093 ms
Time: 158.071 ms
Time: 192.431 ms
Time: 195.076 ms
Time: 635.739 ms
Time: 164549.902 ms

As you can see, the performance is most of the time pretty good (less
than 1 second), but every fourth of fifth time I launch the query
the server seems to go into orbit. For the longer running times,
I can see from 'top' that the server process uses almost 100% of
a CPU.

This is rather worrisome, as I cannot be confident of the overall performance
of my application with so much variance in query response times.

I suspect a configuration problem related to the cache mechanism
(shared_buffers? effective_cache_size?), but to be honest I do not know
where to start to diagnose it. I also noticed that the query plan
can vary when the same query is launched two times in a row (with
no other changes to the DB in between). Is there a random aspect to
the query optimizer that could explain some of the observed variance
in performance ?

Any help would be greatly appreciated.

Thanks in advance,

J-P




Re: Extremely irregular query performance

From
Tom Lane
Date:
=?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes:
> I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's)
> with 4Gb of RAM. I have recently noticed that the performance of
> some more complex queries is extremely variable and irregular.
> For example, I currently have a query that returns a small number
> of rows (5) by joining a dozen of tables.

A dozen tables?  You're exceeding the geqo_threshold and getting a plan
that has some randomness in it.  You could either increase
geqo_threshold if you can stand the extra planning time, or try
increasing geqo_effort to get it to search a little harder and hopefully
find a passable plan more often.  See

http://www.postgresql.org/docs/8.1/static/geqo.html
http://www.postgresql.org/docs/8.1/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO

I'm kinda surprised that you don't get better results with the default
settings.  We could tinker some more with the defaults, if you can
provide evidence about better values ...

            regards, tom lane

Re: Extremely irregular query performance

From
Scott Marlowe
Date:
On Wed, 2006-01-11 at 16:37, Jean-Philippe Côté wrote:
> Hi,
>
> I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's)
> with 4Gb of RAM. I have recently noticed that the performance of
> some more complex queries is extremely variable and irregular.
> For example, I currently have a query that returns a small number
> of rows (5) by joining a dozen of tables. Below are the running times
> obtained by repeatedly lauching this query in psql (nothing else
> was running on the server at that time):
>
> Time: 424.848 ms
> Time: 1615.143 ms
> Time: 15036.475 ms
> Time: 83471.683 ms
> Time: 163.224 ms
> Time: 2454.939 ms
> Time: 188.093 ms
> Time: 158.071 ms
> Time: 192.431 ms
> Time: 195.076 ms
> Time: 635.739 ms
> Time: 164549.902 ms
>
> As you can see, the performance is most of the time pretty good (less
> than 1 second), but every fourth of fifth time I launch the query
> the server seems to go into orbit. For the longer running times,
> I can see from 'top' that the server process uses almost 100% of
> a CPU.

As mentioned earlier, it could be you're exceeding the GEQO threshold.

It could also be that you are doing just enough else at the time, and
have your shared buffers or sort mem high enough that you're initiating
a swap storm.

Mind posting all the parts of your postgresql.conf file you've changed
from the default?

Re: Extremely irregular query performance

From
Jean-Philippe Côté
Date:
Thanks a lot for this info, I was indeed exceeding the genetic
optimizer's threshold.  Now that it is turned off, I get
a very stable response time of 435ms (more or less 5ms) for
the same query. It is about three times slower than the best
I got with the genetic optimizer on, but the overall average
is much lower.

I'll also try to play with the geqo parameters and see if things
improve.

Thanks again,

J-P


-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane
Sent: January 11, 2006 6:03 PM
To: Jean-Philippe Côté
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Extremely irregular query performance

=?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes:
> I'm running version 8.1 on a dedicated Sun v20 server (2 AMD x64's)
> with 4Gb of RAM. I have recently noticed that the performance of
> some more complex queries is extremely variable and irregular.
> For example, I currently have a query that returns a small number
> of rows (5) by joining a dozen of tables.

A dozen tables?  You're exceeding the geqo_threshold and getting a plan
that has some randomness in it.  You could either increase
geqo_threshold if you can stand the extra planning time, or try
increasing geqo_effort to get it to search a little harder and hopefully
find a passable plan more often.  See

http://www.postgresql.org/docs/8.1/static/geqo.html
http://www.postgresql.org/docs/8.1/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO

I'm kinda surprised that you don't get better results with the default
settings.  We could tinker some more with the defaults, if you can
provide evidence about better values ...

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


Re: Extremely irregular query performance

From
Mark Lewis
Date:
If this is a query that will be executed more than once, you can also
avoid incurring the planning overhead multiple times by using PREPARE.

-- Mark Lewis

On Wed, 2006-01-11 at 18:50 -0500, Jean-Philippe Côté wrote:
> Thanks a lot for this info, I was indeed exceeding the genetic
> optimizer's threshold.  Now that it is turned off, I get
> a very stable response time of 435ms (more or less 5ms) for
> the same query. It is about three times slower than the best
> I got with the genetic optimizer on, but the overall average
> is much lower.
>
> I'll also try to play with the geqo parameters and see if things
> improve.
>
> Thanks again,
>
> J-P


Re: Extremely irregular query performance

From
Tom Lane
Date:
=?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes:
> Thanks a lot for this info, I was indeed exceeding the genetic
> optimizer's threshold.  Now that it is turned off, I get
> a very stable response time of 435ms (more or less 5ms) for
> the same query. It is about three times slower than the best
> I got with the genetic optimizer on, but the overall average
> is much lower.

Hmm.  It would be interesting to use EXPLAIN ANALYZE to confirm that the
plan found this way is the same as the best plan found by GEQO, and
the extra couple hundred msec is the price you pay for the exhaustive
plan search.  If GEQO is managing to find a plan better than the regular
planner then we need to look into why ...

            regards, tom lane

Re: Extremely irregular query performance

From
Simon Riggs
Date:
On Wed, 2006-01-11 at 22:23 -0500, Tom Lane wrote:
> =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes:
> > Thanks a lot for this info, I was indeed exceeding the genetic
> > optimizer's threshold.  Now that it is turned off, I get
> > a very stable response time of 435ms (more or less 5ms) for
> > the same query. It is about three times slower than the best
> > I got with the genetic optimizer on, but the overall average
> > is much lower.
>
> Hmm.  It would be interesting to use EXPLAIN ANALYZE to confirm that the
> plan found this way is the same as the best plan found by GEQO, and
> the extra couple hundred msec is the price you pay for the exhaustive
> plan search.  If GEQO is managing to find a plan better than the regular
> planner then we need to look into why ...

It seems worth noting in the EXPLAIN whether GEQO has been used to find
the plan, possibly along with other factors influencing the plan such as
enable_* settings.

Best Regards, Simon Riggs


Re: Extremely irregular query performance

From
Jean-Philippe Cote
Date:
Can I actully know whether a given plan is excuted with GEQO on ?
In other words, if I launch 'explain <query>', I'll get a given plan, but if I re-launch
the <query> (withtout the 'explain' keyword), could I get a different
plan given that GEQO induces some randomness ?

>Is it the plan that is different in the fastest case with GEQO or is it
>the time needed to plan that is causing the GEQO to beat the exhaustive
>search?



Re: Extremely irregular query performance

From
Bruce Momjian
Date:
Jean-Philippe Cote wrote:
>
>
> Can I actully know whether a given plan is excuted with GEQO on ?
> In other words, if I launch 'explain <query>', I'll get a given plan, but if I re-launch
> the <query> (withtout the 'explain' keyword), could I get a different
> plan given that GEQO induces some randomness ?
>
> >Is it the plan that is different in the fastest case with GEQO or is it
> >the time needed to plan that is causing the GEQO to beat the exhaustive
> >search?

Yes, is it likely that when using GEQO you would get a different plan
each time, so running it with and without EXPLAIN would produce
different plans.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Extremely irregular query performance

From
Kenneth Marshall
Date:
On Thu, Jan 12, 2006 at 09:48:41AM +0000, Simon Riggs wrote:
> On Wed, 2006-01-11 at 22:23 -0500, Tom Lane wrote:
> > =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes:
> > > Thanks a lot for this info, I was indeed exceeding the genetic
> > > optimizer's threshold.  Now that it is turned off, I get
> > > a very stable response time of 435ms (more or less 5ms) for
> > > the same query. It is about three times slower than the best
> > > I got with the genetic optimizer on, but the overall average
> > > is much lower.
> >
> > Hmm.  It would be interesting to use EXPLAIN ANALYZE to confirm that the
> > plan found this way is the same as the best plan found by GEQO, and
> > the extra couple hundred msec is the price you pay for the exhaustive
> > plan search.  If GEQO is managing to find a plan better than the regular
> > planner then we need to look into why ...
>
> It seems worth noting in the EXPLAIN whether GEQO has been used to find
> the plan, possibly along with other factors influencing the plan such as
> enable_* settings.
>

Is it the plan that is different in the fastest case with GEQO or is it
the time needed to plan that is causing the GEQO to beat the exhaustive
search?

Ken


Re: Extremely irregular query performance

From
Kenneth Marshall
Date:
On Thu, Jan 12, 2006 at 03:23:14PM -0500, Jean-Philippe Cote wrote:
>
>
> Can I actully know whether a given plan is excuted with GEQO on ?
> In other words, if I launch 'explain <query>', I'll get a given plan, but if I re-launch
> the <query> (withtout the 'explain' keyword), could I get a different
> plan given that GEQO induces some randomness ?
>
> >Is it the plan that is different in the fastest case with GEQO or is it
> >the time needed to plan that is causing the GEQO to beat the exhaustive
> >search?
>
GEQO will be used if the number of joins is over the GEQO limit in
the configuration file. The GEQO process is an iterative random
process to find an query plan. The EXPLAIN results are the plan for that
query, but not neccessarily for subsequent runs. GEQO's advantage is a
much faster plan time than the exhaustive search method normally used.
If the resulting plan time is less than the exhaustive search plan time,
for short queries you can have the GECO run more quickly than the
exhaustive search result. Of course, if you PREPARE the query the plan
time drops out.

Ken

Re: Extremely irregular query performance

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Wed, 2006-01-11 at 22:23 -0500, Tom Lane wrote:
> > =?iso-8859-1?Q?Jean-Philippe_C=F4t=E9?= <jean-philippe.cote@crt.umontreal.ca> writes:
> > > Thanks a lot for this info, I was indeed exceeding the genetic
> > > optimizer's threshold.  Now that it is turned off, I get
> > > a very stable response time of 435ms (more or less 5ms) for
> > > the same query. It is about three times slower than the best
> > > I got with the genetic optimizer on, but the overall average
> > > is much lower.
> >
> > Hmm.  It would be interesting to use EXPLAIN ANALYZE to confirm that the
> > plan found this way is the same as the best plan found by GEQO, and
> > the extra couple hundred msec is the price you pay for the exhaustive
> > plan search.  If GEQO is managing to find a plan better than the regular
> > planner then we need to look into why ...
>
> It seems worth noting in the EXPLAIN whether GEQO has been used to find
> the plan, possibly along with other factors influencing the plan such as
> enable_* settings.

I thought the best solution would be to replace "QUERY PLAN" with "GEQO
QUERY PLAN" when GEQO was in use.  However, looking at the code, I see
no way to do that cleanly.

Instead, I added documentation to EXPLAIN to highlight the fact the
execution plan will change when GEQO is in use.

(I also removed a documentation mention of the pre-7.3 EXPLAIN output
behavior.)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/explain.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v
retrieving revision 1.35
diff -c -c -r1.35 explain.sgml
*** doc/src/sgml/ref/explain.sgml    4 Jan 2005 00:39:53 -0000    1.35
--- doc/src/sgml/ref/explain.sgml    20 Jan 2006 16:18:53 -0000
***************
*** 151,161 ****
    </para>

    <para>
!    Prior to <productname>PostgreSQL</productname> 7.3, the plan was
!    emitted in the form of a <literal>NOTICE</literal> message.  Now it
!    appears as a query result (formatted like a table with a single
!    text column).
    </para>
   </refsect1>

   <refsect1>
--- 151,162 ----
    </para>

    <para>
!    Genetic query optimization (<acronym>GEQO</acronym>) randomly
!    tests execution plans.  Therefore, when the number of tables
!    exceeds <varname>geqo</> and genetic query optimization is in use,
!    the execution plan will change each time the statement is executed.
    </para>
+
   </refsect1>

   <refsect1>

Re: Extremely irregular query performance

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>     <para>
> !    Genetic query optimization (<acronym>GEQO</acronym>) randomly
> !    tests execution plans.  Therefore, when the number of tables
> !    exceeds <varname>geqo</> and genetic query optimization is in use,
> !    the execution plan will change each time the statement is executed.
>     </para>

geqo_threshold, please --- geqo is a boolean.

Possibly better wording:  Therefore, when the number of tables exceeds
geqo_threshold causing genetic query optimization to be used, the
execution plan is likely to change each time the statement is executed.

            regards, tom lane

Re: Extremely irregular query performance

From
Bruce Momjian
Date:
Done, and paragraph added to 8.1.X.  (7.3 mention retained for 8.1.X.)

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >     <para>
> > !    Genetic query optimization (<acronym>GEQO</acronym>) randomly
> > !    tests execution plans.  Therefore, when the number of tables
> > !    exceeds <varname>geqo</> and genetic query optimization is in use,
> > !    the execution plan will change each time the statement is executed.
> >     </para>
>
> geqo_threshold, please --- geqo is a boolean.
>
> Possibly better wording:  Therefore, when the number of tables exceeds
> geqo_threshold causing genetic query optimization to be used, the
> execution plan is likely to change each time the statement is executed.
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073