Thread: What happens between end of explain analyze and end of query execution ?

What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Hi,

I have query that should be quick, and used to be quick, but is not
anymore... Explain analyze can be seen here http://explain.depesz.com/s/cpV
but it is fundamentaly quick : Total runtime: 0.545 ms.

But query execution takes 3.6 seconds. Only 12 rows are returned. Adding
a limit 1 has no influence.

Postgresql client and server are on the same server, on localhost.

I wonder where my 3.5 seconds are going....

Notice that there is no fancy configuration to tell Postgresql to
evaluate every query plan on earth, that is gecoxxxx settings are on
their default values.

Any idea on what can be happening that takes so long ?

Franck


Attachment

Re: What happens between end of explain analyze and end of query execution ?

From
Nikolas Everett
Date:
We had a problem where the actual query planning time blew up and took way more time then the query execution.  We reproduced the problem by forming a new connection and then just explaining the query.  If that takes more than a couple of milliseconds you likely have the problem.  The second plan was fast.

We then dtraced the backend process for a new connection and found that opening files had become super slow.  In our case this was caused by running the database on nfs.

Good luck,

Nik


On Thu, Apr 4, 2013 at 10:48 AM, Franck Routier <franck.routier@axege.com> wrote:
Hi,

I have query that should be quick, and used to be quick, but is not anymore... Explain analyze can be seen here http://explain.depesz.com/s/cpV
but it is fundamentaly quick : Total runtime: 0.545 ms.

But query execution takes 3.6 seconds. Only 12 rows are returned. Adding a limit 1 has no influence.

Postgresql client and server are on the same server, on localhost.

I wonder where my 3.5 seconds are going....

Notice that there is no fancy configuration to tell Postgresql to evaluate every query plan on earth, that is gecoxxxx settings are on their default values.

Any idea on what can be happening that takes so long ?

Franck


Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Right, explain alone takes 3.6 seconds, so the time really seems to go query planning...


Le 04/04/2013 17:01, Nikolas Everett a écrit :
We had a problem where the actual query planning time blew up and took way more time then the query execution.  We reproduced the problem by forming a new connection and then just explaining the query.  If that takes more than a couple of milliseconds you likely have the problem.  The second plan was fast.

We then dtraced the backend process for a new connection and found that opening files had become super slow.  In our case this was caused by running the database on nfs.

Good luck,

Nik


On Thu, Apr 4, 2013 at 10:48 AM, Franck Routier <franck.routier@axege.com> wrote:
Hi,

I have query that should be quick, and used to be quick, but is not anymore... Explain analyze can be seen here http://explain.depesz.com/s/cpV
but it is fundamentaly quick : Total runtime: 0.545 ms.

But query execution takes 3.6 seconds. Only 12 rows are returned. Adding a limit 1 has no influence.

Postgresql client and server are on the same server, on localhost.

I wonder where my 3.5 seconds are going....

Notice that there is no fancy configuration to tell Postgresql to evaluate every query plan on earth, that is gecoxxxx settings are on their default values.

Any idea on what can be happening that takes so long ?

Franck



Attachment
Franck Routier <franck.routier@axege.com> writes:
> Right, explain alone takes 3.6 seconds, so the time really seems to go
> query planning...

Well, you've not shown us the query, so it's all going to be
speculation.  But maybe you have some extremely expensive function that
the planner is evaluating to fold to a constant, or something like that?
The generated plan isn't terribly complicated, but we can't see what
was required to produce it.

            regards, tom lane


Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 04/04/2013 18:25, Tom Lane a écrit :
Franck Routier <franck.routier@axege.com> writes:
Right, explain alone takes 3.6 seconds, so the time really seems to go 
query planning...
Well, you've not shown us the query, so it's all going to be
speculation.  But maybe you have some extremely expensive function that
the planner is evaluating to fold to a constant, or something like that?
The generated plan isn't terribly complicated, but we can't see what
was required to produce it.
		regards, tom lane


The request is not using any function. It looks like this:

SELECT *
         FROM sanrss
         LEFT JOIN sanrum  ON sanrum.sanrum___rforefide = sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide = sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
            LEFT JOIN sanact  ON sanact.sanact___rforefide = sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide = sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside = sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide = sanrum.sanrumide AND sanact.sanact___sanrumide   IS NOT NULL AND sanact.sanact___rsanopide='CCAM'
            LEFT JOIN sandia  ON sandia.sandia___rforefide = sanrum.sanrum___rforefide AND sandia.sandia___rfovsnide = sanrum.sanrum___rfovsnide AND sandia.sandia___sanrsside = sanrum.sanrum___sanrsside AND sandia.sandia___sanrumide = sanrum.sanrumide AND sandia.sandiasig=1
            LEFT JOIN saneds  ON sanrss.sanrss___rforefide = saneds.saneds___rforefide AND sanrss.sanrss___rfovsnide = saneds.saneds___rfovsnide AND sanrss.sanrss___sanedside = saneds.sanedside
            LEFT JOIN rsaidp  ON saneds.saneds___rforefide = rsaidp.rsaidp___rforefide AND saneds.saneds___rsaidpide = rsaidp.rsaidpide
              WHERE sanrss.sanrss___rforefide =  'CHCL'  AND sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
         ORDER BY sanrum.sanrumord, sanrum.sanrumide


Schema looks like this :
rsaidp
 |
 v
sanrss --------
 |            |
 v            v
sanrum       sandia
 |
 v
sanact


Primary keys are most made of several varchars. Foreign keys do exist. Query is getting these data for one specific sanrss.
This used to take around 50ms to execute, and is now taking 3.5 seconds. And it looks like this is spent computing a query plan...

I also tried: PREPARE qry(id) as select ....
The prepare takes 3.5 seconds. Execute qry(value) takes a few milliseconds...

Regards,
Franck
Attachment

Re: What happens between end of explain analyze and end of query execution ?

From
nik9000@gmail.com
Date:


On Apr 4, 2013, at 2:49 PM, Franck Routier <franck.routier@axege.com> wrote:

Le 04/04/2013 18:25, Tom Lane a écrit :
Franck Routier <franck.routier@axege.com> writes:
Right, explain alone takes 3.6 seconds, so the time really seems to go 
query planning...
Well, you've not shown us the query, so it's all going to be
speculation.  But maybe you have some extremely expensive function that
the planner is evaluating to fold to a constant, or something like that?
The generated plan isn't terribly complicated, but we can't see what
was required to produce it.
		regards, tom lane


The request is not using any function. It looks like this:

SELECT *
         FROM sanrss
         LEFT JOIN sanrum  ON sanrum.sanrum___rforefide = sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide = sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
            LEFT JOIN sanact  ON sanact.sanact___rforefide = sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide = sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside = sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide = sanrum.sanrumide AND sanact.sanact___sanrumide   IS NOT NULL AND sanact.sanact___rsanopide='CCAM'
            LEFT JOIN sandia  ON sandia.sandia___rforefide = sanrum.sanrum___rforefide AND sandia.sandia___rfovsnide = sanrum.sanrum___rfovsnide AND sandia.sandia___sanrsside = sanrum.sanrum___sanrsside AND sandia.sandia___sanrumide = sanrum.sanrumide AND sandia.sandiasig=1
            LEFT JOIN saneds  ON sanrss.sanrss___rforefide = saneds.saneds___rforefide AND sanrss.sanrss___rfovsnide = saneds.saneds___rfovsnide AND sanrss.sanrss___sanedside = saneds.sanedside
            LEFT JOIN rsaidp  ON saneds.saneds___rforefide = rsaidp.rsaidp___rforefide AND saneds.saneds___rsaidpide = rsaidp.rsaidpide
              WHERE sanrss.sanrss___rforefide =  'CHCL'  AND sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
         ORDER BY sanrum.sanrumord, sanrum.sanrumide


Schema looks like this :
rsaidp
 |
 v
sanrss --------
 |            |
 v            v
sanrum       sandia
 |
 v
sanact


Primary keys are most made of several varchars. Foreign keys do exist. Query is getting these data for one specific sanrss.
This used to take around 50ms to execute, and is now taking 3.5 seconds. And it looks like this is spent computing a query plan...

I also tried: PREPARE qry(id) as select ....
The prepare takes 3.5 seconds. Execute qry(value) takes a few milliseconds...

Regards,
Franck

Is it only this query or all queries?
Franck Routier <franck.routier@axege.com> writes:
> The request is not using any function. It looks like this:
> [ unexciting query ]

Hmph.  Can't see any reason for that to take a remarkably long time to
plan.  Can you put together a self-contained test case demonstrating
excessive planning time?  What PG version is this, anyway?

            regards, tom lane


Re: What happens between end of explain analyze and end of query execution ?

From
Merlin Moncure
Date:
On Thu, Apr 4, 2013 at 9:48 AM, Franck Routier <franck.routier@axege.com> wrote:
> Hi,
>
> I have query that should be quick, and used to be quick, but is not
> anymore... Explain analyze can be seen here http://explain.depesz.com/s/cpV
> but it is fundamentaly quick : Total runtime: 0.545 ms.
>
> But query execution takes 3.6 seconds. Only 12 rows are returned. Adding a
> limit 1 has no influence.
>
> Postgresql client and server are on the same server, on localhost.
>
> I wonder where my 3.5 seconds are going....

Also, 3.6 seconds according to what exactly?  For example if your 12
rows contain megabytes of bytea data that would be a possible cause
(albeit unlikely) since explain doesn't include network transfer time.

merlin


Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 04/04/2013 21:08, Tom Lane a écrit :
> Franck Routier <franck.routier@axege.com> writes:
>> The request is not using any function. It looks like this:
>> [ unexciting query ]
> Hmph.  Can't see any reason for that to take a remarkably long time to
> plan.  Can you put together a self-contained test case demonstrating
> excessive planning time?  What PG version is this, anyway?
>
>             regards, tom lane
>
>
Well, I don't know how to reproduce, as it is really only happening on
this database.

What I notice is that removing joins has a huge impact on the time
explain takes to return:

The full query takes 2.6 seconds to return. Notice it has dropped from
3.6 seconds to 2.6 since yesterday after I did a vacuum analyze on the
tables that go into the request.

EXPLAIN SELECT *
          FROM sanrss
          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
             LEFT JOIN sanact  ON sanact.sanact___rforefide =
sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide =
sanrum.sanrumide AND sanact.sanact___sanrumide   IS NOT NULL AND
sanact.sanact___rsanopide='CCAM'
             LEFT JOIN sandia  ON sandia.sandia___rforefide =
sanrum.sanrum___rforefide AND sandia.sandia___rfovsnide =
sanrum.sanrum___rfovsnide AND sandia.sandia___sanrsside =
sanrum.sanrum___sanrsside AND sandia.sandia___sanrumide =
sanrum.sanrumide AND sandia.sandiasig=1
             LEFT JOIN saneds  ON sanrss.sanrss___rforefide =
saneds.saneds___rforefide AND sanrss.sanrss___rfovsnide =
saneds.saneds___rfovsnide AND sanrss.sanrss___sanedside = saneds.sanedside
             LEFT JOIN rsaidp  ON saneds.saneds___rforefide =
rsaidp.rsaidp___rforefide AND saneds.saneds___rsaidpide = rsaidp.rsaidpide
               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
          ORDER BY sanrum.sanrumord, sanrum.sanrumide

==> 2.6 seconds

If I remove the join on either table 'sandia' or table 'saneds', the
explain return in 1.2 seconds. If I remove both, explain returns in 48ms.

EXPLAIN SELECT *
          FROM sanrss
          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
             LEFT JOIN sanact  ON sanact.sanact___rforefide =
sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide =
sanrum.sanrumide AND sanact.sanact___sanrumide   IS NOT NULL AND
sanact.sanact___rsanopide='CCAM'
             LEFT JOIN saneds  ON sanrss.sanrss___rforefide =
saneds.saneds___rforefide AND sanrss.sanrss___rfovsnide =
saneds.saneds___rfovsnide AND sanrss.sanrss___sanedside = saneds.sanedside
             LEFT JOIN rsaidp  ON saneds.saneds___rforefide =
rsaidp.rsaidp___rforefide AND saneds.saneds___rsaidpide = rsaidp.rsaidpide
               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
          ORDER BY sanrum.sanrumord, sanrum.sanrumide

==> 1.2 seconds

EXPLAIN SELECT *
          FROM sanrss
          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
             LEFT JOIN sanact  ON sanact.sanact___rforefide =
sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide =
sanrum.sanrumide AND sanact.sanact___sanrumide   IS NOT NULL AND
sanact.sanact___rsanopide='CCAM'
             LEFT JOIN sandia  ON sandia.sandia___rforefide =
sanrum.sanrum___rforefide AND sandia.sandia___rfovsnide =
sanrum.sanrum___rfovsnide AND sandia.sandia___sanrsside =
sanrum.sanrum___sanrsside AND sandia.sandia___sanrumide =
sanrum.sanrumide AND sandia.sandiasig=1
               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
          ORDER BY sanrum.sanrumord, sanrum.sanrumide

==> 1.2 seconds

EXPLAIN SELECT *
          FROM sanrss
          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
             LEFT JOIN sanact  ON sanact.sanact___rforefide =
sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide =
sanrum.sanrumide AND sanact.sanact___sanrumide   IS NOT NULL AND
sanact.sanact___rsanopide='CCAM'
               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
          ORDER BY sanrum.sanrumord, sanrum.sanrumide

==> 48 ms

Maybe the statistics tables for sandia and saneds are in a bad shape ?
(don't know how to check this).

Regards,

Franck


Attachment
Franck Routier <franck.routier@axege.com> writes:
> Le 04/04/2013 21:08, Tom Lane a �crit :
>> Hmph.  Can't see any reason for that to take a remarkably long time to
>> plan.  Can you put together a self-contained test case demonstrating
>> excessive planning time?  What PG version is this, anyway?

> What I notice is that removing joins has a huge impact on the time
> explain takes to return:

Hm, kind of looks like it's just taking an unreasonable amount of time
to process each join clause.  What have you got the statistics targets
set to in this database?  What are the datatypes of the join columns?
And (again) what PG version is this exactly?

            regards, tom lane


Re: What happens between end of explain analyze and end of query execution ?

From
Merlin Moncure
Date:
On Fri, Apr 5, 2013 at 8:55 AM, Franck Routier <franck.routier@axege.com> wrote:
> Le 04/04/2013 21:08, Tom Lane a écrit :
>
>> Franck Routier <franck.routier@axege.com> writes:
>>>
>>> The request is not using any function. It looks like this:
>>> [ unexciting query ]
>>
>> Hmph.  Can't see any reason for that to take a remarkably long time to
>> plan.  Can you put together a self-contained test case demonstrating
>> excessive planning time?  What PG version is this, anyway?
>>
>>                         regards, tom lane
>>
>>
> Well, I don't know how to reproduce, as it is really only happening on this
> database.
>
> What I notice is that removing joins has a huge impact on the time explain
> takes to return:
>
> The full query takes 2.6 seconds to return. Notice it has dropped from 3.6
> seconds to 2.6 since yesterday after I did a vacuum analyze on the tables
> that go into the request.
>
> EXPLAIN SELECT *
>
>          FROM sanrss
>          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
> sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
> sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
>             LEFT JOIN sanact  ON sanact.sanact___rforefide =
> sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
> sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
> sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide = sanrum.sanrumide
> AND sanact.sanact___sanrumide   IS NOT NULL AND
> sanact.sanact___rsanopide='CCAM'
>             LEFT JOIN sandia  ON sandia.sandia___rforefide =
> sanrum.sanrum___rforefide AND sandia.sandia___rfovsnide =
> sanrum.sanrum___rfovsnide AND sandia.sandia___sanrsside =
> sanrum.sanrum___sanrsside AND sandia.sandia___sanrumide = sanrum.sanrumide
> AND sandia.sandiasig=1
>             LEFT JOIN saneds  ON sanrss.sanrss___rforefide =
> saneds.saneds___rforefide AND sanrss.sanrss___rfovsnide =
> saneds.saneds___rfovsnide AND sanrss.sanrss___sanedside = saneds.sanedside
>             LEFT JOIN rsaidp  ON saneds.saneds___rforefide =
> rsaidp.rsaidp___rforefide AND saneds.saneds___rsaidpide = rsaidp.rsaidpide
>               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
> sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
>          ORDER BY sanrum.sanrumord, sanrum.sanrumide
>
> ==> 2.6 seconds
>
> If I remove the join on either table 'sandia' or table 'saneds', the explain
> return in 1.2 seconds. If I remove both, explain returns in 48ms.
>
> EXPLAIN SELECT *
>
>          FROM sanrss
>          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
> sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
> sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
>             LEFT JOIN sanact  ON sanact.sanact___rforefide =
> sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
> sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
> sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide = sanrum.sanrumide
> AND sanact.sanact___sanrumide   IS NOT NULL AND
> sanact.sanact___rsanopide='CCAM'
>             LEFT JOIN saneds  ON sanrss.sanrss___rforefide =
> saneds.saneds___rforefide AND sanrss.sanrss___rfovsnide =
> saneds.saneds___rfovsnide AND sanrss.sanrss___sanedside = saneds.sanedside
>             LEFT JOIN rsaidp  ON saneds.saneds___rforefide =
> rsaidp.rsaidp___rforefide AND saneds.saneds___rsaidpide = rsaidp.rsaidpide
>               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
> sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
>          ORDER BY sanrum.sanrumord, sanrum.sanrumide
>
> ==> 1.2 seconds
>
> EXPLAIN SELECT *
>
>          FROM sanrss
>          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
> sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
> sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
>             LEFT JOIN sanact  ON sanact.sanact___rforefide =
> sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
> sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
> sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide = sanrum.sanrumide
> AND sanact.sanact___sanrumide   IS NOT NULL AND
> sanact.sanact___rsanopide='CCAM'
>             LEFT JOIN sandia  ON sandia.sandia___rforefide =
> sanrum.sanrum___rforefide AND sandia.sandia___rfovsnide =
> sanrum.sanrum___rfovsnide AND sandia.sandia___sanrsside =
> sanrum.sanrum___sanrsside AND sandia.sandia___sanrumide = sanrum.sanrumide
> AND sandia.sandiasig=1
>               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
> sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
>          ORDER BY sanrum.sanrumord, sanrum.sanrumide
>
> ==> 1.2 seconds
>
> EXPLAIN SELECT *
>
>          FROM sanrss
>          LEFT JOIN sanrum  ON sanrum.sanrum___rforefide =
> sanrss.sanrss___rforefide AND sanrum.sanrum___rfovsnide =
> sanrss.sanrss___rfovsnide AND sanrum.sanrum___sanrsside = sanrss.sanrsside
>             LEFT JOIN sanact  ON sanact.sanact___rforefide =
> sanrum.sanrum___rforefide AND sanact.sanact___rfovsnide =
> sanrum.sanrum___rfovsnide AND sanact.sanact___sanrsside =
> sanrum.sanrum___sanrsside AND sanact.sanact___sanrumide = sanrum.sanrumide
> AND sanact.sanact___sanrumide   IS NOT NULL AND
> sanact.sanact___rsanopide='CCAM'
>               WHERE sanrss.sanrss___rforefide =  'CHCL' AND
> sanrss.sanrss___rfovsnide =  '201012_600'  AND sanrss.sanrsside =  '1188100'
>          ORDER BY sanrum.sanrumord, sanrum.sanrumide
>
> ==> 48 ms
>
> Maybe the statistics tables for sandia and saneds are in a bad shape ?
> (don't know how to check this).


Ok, "explain" (without analyze) is measuring plan time only (not
execution time).  Can you confirm that's the time we are measuring
(and again, according to what)? Performance issues here are a
different ball game.  Please supply precise version#, there were a
couple of plantime bugs fixed recently.

merlin


Re: What happens between end of explain analyze and end of query execution ?

From
Nikolas Everett
Date:
On Fri, Apr 5, 2013 at 9:55 AM, Franck Routier <franck.routier@axege.com> wrote:
Le 04/04/2013 21:08, Tom Lane a écrit :
Maybe the statistics tables for sandia and saneds are in a bad shape ? (don't know how to check this).

Regards,

Franck



Could this be caused by system table bloat?

Also, can you check how long it takes to plan:
1.  A query without a table at all (SELECT NOW())
2.  A query with an unrelated table

Again, what version of PostgreSQL is this?

Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 05/04/2013 16:17, Tom Lane a écrit :
> Franck Routier <franck.routier@axege.com> writes:
>> Le 04/04/2013 21:08, Tom Lane a écrit :
>>> Hmph.  Can't see any reason for that to take a remarkably long time to
>>> plan.  Can you put together a self-contained test case demonstrating
>>> excessive planning time?  What PG version is this, anyway?
>> What I notice is that removing joins has a huge impact on the time
>> explain takes to return:
> Hm, kind of looks like it's just taking an unreasonable amount of time
> to process each join clause.  What have you got the statistics targets
> set to in this database?  What are the datatypes of the join columns?
> And (again) what PG version is this exactly?
>
>             regards, tom lane
>
>
Postgresql version is 8.4.8.
Value for default_statistics_target is 5000 (maybe this is the problem ?)
Join columns datatypes are all varchar(32).

Also notice that not all joins have equals imppact on the time taken:
removing join on some tables has no impact (sanact, rsaidp), while
removing joins on others (saneds, sandia) has an important effect...

Regards,
Franck


Attachment

Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 05/04/2013 16:17, Merlin Moncure a écrit :
>
> Ok, "explain" (without analyze) is measuring plan time only (not
> execution time).  Can you confirm that's the time we are measuring
> (and again, according to what)? Performance issues here are a
> different ball game.  Please supply precise version#, there were a
> couple of plantime bugs fixed recently.
>
> merlin
Yes, I confirm time is taken by analyze alone. Executing the query is quick.
PG version is 8.4.8.

Franck


Attachment

Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 05/04/2013 16:18, Nikolas Everett a écrit :
On Fri, Apr 5, 2013 at 9:55 AM, Franck Routier <franck.routier@axege.com> wrote:
Le 04/04/2013 21:08, Tom Lane a écrit :
Maybe the statistics tables for sandia and saneds are in a bad shape ? (don't know how to check this).

Regards,

Franck



Could this be caused by system table bloat?

Also, can you check how long it takes to plan:
1.  A query without a table at all (SELECT NOW())
2.  A query with an unrelated table

Again, what version of PostgreSQL is this?

PG 8.4.8
select now() is quick (15ms from pgadmin)
But I can reproduce the problem on other tables : explain on a query with 4 join takes 4.5 seconds on an other set of tables...

System bloat... maybe. Not sure how to check. But as Tom asked, default_statistics_target is 5000. Maybe the problem is here ? What should I look after in pg_statistic to tell is there is a prolem here ?

Regards,
Franck
Attachment

Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 05/04/2013 16:17, Tom Lane a écrit :
> What have you got the statistics targets set to in this database?

Ok, the problem definitely comes from the default_statistics_target
which is obviously too high on the database.

I have experimented with explain on queries with another set of 4 joined
tables.
In my first attempt, explain took more than 4 seconds (!)
Then I have set default_statistics_target to 100 and analyzed the 4
tables. Explain took a few ms.
Restored default_statistics_target to 5000, analyzed again. Explain took
1.8 seconds.

So, I seem to have two related problems: statistics are somewhat bloated
(as re-analyzing with same target takes the explain time from 4 sec down
to 1.8 sec).
And the target is far too high (as default target value take analyse
down to a few ms).

Now... can someone help me understand what happens ? Where can I look
(in pg_stats ?) to see the source of the problem ? maybe a column with a
huge list of different values the palnner has to parse ?


Regards,
Franck


Attachment

Re: What happens between end of explain analyze and end of query execution ?

From
Kevin Grittner
Date:
Franck Routier <franck.routier@axege.com> wrote:

> Ok, the problem definitely comes from the
> default_statistics_target which is obviously too high on the
> database.

> Now... can someone help me understand what happens ? Where can I
> look (in pg_stats ?) to see the source of the problem ? maybe a
> column with a huge list of different values the palnner has to
> parse ?

This is a fundamental issue in query planning -- how much work do
you want to do to try to come up with the best plan?  Too little,
and the plan can be unacceptably slow; too much and you spend more
extra time on planning than the improvement in the plan (if any)
saves you.  Reading and processing statistics gets more expensive
as you boost the volume.

What I would suggest is taking the default_statistics_target for
the cluster back down to the default, and selectviely boosting the
statistics target for individual columns as you find plans which
benefit.  Don't set it right at the edge of the tipping point, but
don't automatically jump to 5000 every time either.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Franck Routier <franck.routier@axege.com> writes:
> Le 05/04/2013 16:17, Tom Lane a �crit :
>> What have you got the statistics targets set to in this database?

> Ok, the problem definitely comes from the default_statistics_target
> which is obviously too high on the database.

Yeah, eqjoinsel() is O(N^2) in the lengths of the MCV lists, in the
worst case where there's little overlap in the list memberships.
The actual cost would depend a lot on the specific column datatypes.

Not sure about your report that re-analyzing with the same stats target
made a significant difference.  It might have been a matter of chance
variation in the sampled MCV list leading to more or fewer matches.

            regards, tom lane


Re: What happens between end of explain analyze and end of query execution ?

From
Franck Routier
Date:
Le 06/04/2013 18:27, Tom Lane a écrit :
>> Ok, the problem definitely comes from the default_statistics_target
>> which is obviously too high on the database.
> Yeah, eqjoinsel() is O(N^2) in the lengths of the MCV lists, in the
> worst case where there's little overlap in the list memberships.
> The actual cost would depend a lot on the specific column datatypes.
>
> Not sure about your report that re-analyzing with the same stats target
> made a significant difference.  It might have been a matter of chance
> variation in the sampled MCV list leading to more or fewer matches.
>
>             regards, tom lane
>
>
Thank you all for your help, I appreciate it really much.

Just a last note: maybe the documentation could draw the attention on
this side effect of high statistics target.
Right now it says:
  "Larger values increase the time needed to do ANALYZE, but might
improve the quality of the planner's estimates"
(http://www.postgresql.org/docs/9.2/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET)
and: "Raising the limit might allow more accurate planner estimates to
be made, particularly for columns with irregular data distributions, at
the price of consuming more space in pg_statistic and slightly more time
to compute the estimates."
(http://www.postgresql.org/docs/9.2/static/planner-stats.html).

It could be noted that a too high target can also have a noticeable cost
on query planning.

Best regards,
Franck


Attachment
On Sat, Apr 6, 2013 at 9:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Franck Routier <franck.routier@axege.com> writes:
> Le 05/04/2013 16:17, Tom Lane a écrit :
>> What have you got the statistics targets set to in this database?

> Ok, the problem definitely comes from the default_statistics_target
> which is obviously too high on the database.

Yeah, eqjoinsel() is O(N^2) in the lengths of the MCV lists, in the
worst case where there's little overlap in the list memberships.
The actual cost would depend a lot on the specific column datatypes.


I guess this pre-emptively answers a question I was intending to ask on performance:  Whether anyone increased default_statistics_target and came to regret it.  I had seen several problems fixed by increasing default_statistics_target, but this is the first one I recall caused by increasing it.

Do you think fixing the O(N^2) behavior would be a good to-do item?

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Sat, Apr 6, 2013 at 9:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, eqjoinsel() is O(N^2) in the lengths of the MCV lists, in the
>> worst case where there's little overlap in the list memberships.

> I guess this pre-emptively answers a question I was intending to ask on
> performance:  Whether anyone increased default_statistics_target and came
> to regret it.  I had seen several problems fixed by increasing
> default_statistics_target, but this is the first one I recall caused by
> increasing it.

I recall having heard some similar complaints before, but not often.

> Do you think fixing the O(N^2) behavior would be a good to-do item?

If you can think of a way to do it that doesn't create new assumptions
that eqjoinsel ought not make (say, that the datatype is sortable).

I guess one possibility is to have a different join selectivity function
for those types that *are* sortable, which would fix the issue for most
commonly used types.

            regards, tom lane