Thread: BUG #6275: Horrible performance regression

BUG #6275: Horrible performance regression

From
"Finlay Thompson"
Date:
The following bug has been logged online:

Bug reference:      6275
Logged by:          Finlay Thompson
Email address:      finlay@dragonfly.co.nz
PostgreSQL version: 8.4.9
Operating system:   Ubuntu 11.04
Description:        Horrible performance regression
Details:

After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~
3 hours to not finishing after 24 hours.


Context: We have a continuous build script, that loads, grooms, matches data
every few hours. The script has been run over 100 times in the last few
weeks, and has steadily been taking around 3 hours. It is really a group of
scripts, something like 15000 lines of sql, and some python scripts to run
it together.

After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
the script suddenly stopped working, and consuming all the ram (16GB) on the
computer (i7).

I spent about a day trying to diagnose the problem, with some help from
#postgresql, and ran through the script step by step, but was unable to find
any specific issue.

Today I reinstalled the 8.4.8 version, and it built the database in 3:02
hours, as before.

From this, I reckon something is broken in the 8.4.9 release.

I would love to isolate the problem, and am happy to try anything people
suggest.

Sorry I can't be more specific about the problem, but I thought I should
report the issue.

btw: The database uses plpython and postgis.

Regards,
Finlay Thompson

Re: BUG #6275: Horrible performance regression

From
Tom Lane
Date:
"Finlay Thompson" <finlay@dragonfly.co.nz> writes:
> After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~
> 3 hours to not finishing after 24 hours.

> Context: We have a continuous build script, that loads, grooms, matches data
> every few hours. The script has been run over 100 times in the last few
> weeks, and has steadily been taking around 3 hours. It is really a group of
> scripts, something like 15000 lines of sql, and some python scripts to run
> it together.

> After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
> the script suddenly stopped working, and consuming all the ram (16GB) on the
> computer (i7).

Well, the most likely theory seems to be that one of the planner changes
in 8.4.9 is resulting in a much less optimal plan choice for one or more
queries, but it's impossible to diagnose it on the basis of this much
information.

> btw: The database uses plpython and postgis.

Hmm, did you change postgis versions at the same time?  If so, which
upgrade caused the problem, postgres or postgis?

            regards, tom lane

Re: BUG #6275: Horrible performance regression

From
Finlay Thompson
Date:
Hi Tom

Sorry for the delay.

The version of postgis didn't change, and when I rolled back to
postgresql 8.4.9 the expected performance returned. I am very confident
the regression was caused by the 8.4.9 upgrade.

Any suggestions on how to diagnose a long run script like this ? I can
easily turn on more logging, in both cases and compare ? Can I log the
query plans ?



Finlay






On Fri, 2011-10-28 at 00:53 -0400, Tom Lane wrote:
> "Finlay Thompson" <finlay@dragonfly.co.nz> writes:
> > After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~
> > 3 hours to not finishing after 24 hours.
>
> > Context: We have a continuous build script, that loads, grooms, matches data
> > every few hours. The script has been run over 100 times in the last few
> > weeks, and has steadily been taking around 3 hours. It is really a group of
> > scripts, something like 15000 lines of sql, and some python scripts to run
> > it together.
>
> > After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
> > the script suddenly stopped working, and consuming all the ram (16GB) on the
> > computer (i7).
>
> Well, the most likely theory seems to be that one of the planner changes
> in 8.4.9 is resulting in a much less optimal plan choice for one or more
> queries, but it's impossible to diagnose it on the basis of this much
> information.
>
> > btw: The database uses plpython and postgis.
>
> Hmm, did you change postgis versions at the same time?  If so, which
> upgrade caused the problem, postgres or postgis?
>
>             regards, tom lane

--
Finlay Thompson
www.dragonfly.co.nz
Dragonfly, PO Box 27535, Wellington 6141, New Zealand
Level 5, 158 Victoria Street, Te Aro, Wellington
M: +64 21 426 878
T: +64 4 385 9285

Re: BUG #6275: Horrible performance regression

From
Robert Haas
Date:
On Sun, Oct 30, 2011 at 10:30 PM, Finlay Thompson
<finlay@dragonfly.co.nz> wrote:
> Hi Tom
>
> Sorry for the delay.
>
> The version of postgis didn't change, and when I rolled back to
> postgresql 8.4.9 the expected performance returned. I am very confident
> the regression was caused by the 8.4.9 upgrade.
>
> Any suggestions on how to diagnose a long run script like this ? I can
> easily turn on more logging, in both cases and compare ? Can I log the
> query plans ?

The easiest thing to do is probably EXPLAIN ANALYZE each query and
post the output to this thread.

If you can't do that for some reason, there's always auto_explain, but
that has some overhead and is a bit more work to set up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6275: Horrible performance regression

From
Guillaume Smet
Date:
On Mon, Oct 31, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If you can't do that for some reason, there's always auto_explain, but
> that has some overhead and is a bit more work to set up.

If it's a massive load of data with a lot of queries, I think his best
bet is to log the queries with log_min_duration_statement = <something
sensible> (I'd try to start with 30ms) then analyze the log results
with pgFouine or any other log analyzer for both versions.

--
Guillaume

Re: BUG #6275: Horrible performance regression

From
Mark Kirkwood
Date:
On 28/10/11 15:42, Finlay Thompson wrote:
> After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
> the script suddenly stopped working, and consuming all the ram (16GB) on the
> computer (i7).
>
>

If there is one query exhausting all ram, then it could be tricky to
catch it in the logs (log duration and friends log the query time on
completion... which might be never). If you have vm.overcommit_memory
set to 2 then you may get the offending statement in the log when the
postgres backend errors out with ENOMEM.

If you know the approx time at which memory starts filling up then
getting (most of) the query text from pg_stat_activity would enable you
to track down the offending sql in your load script.

regards

Mark

Re: BUG #6275: Horrible performance regression

From
Finlay Thompson
Date:
Hi Mark


Thanks for your suggestions (and Robert and Guillaume).  I will try the
overcommit_memory=2 idea out in a couple of weeks. At the moment I am
running everything on 8.4.8, which is working great, and have pressing
deadlines looming (next Wednesday).

I'll let you know when I do. I want to find a test case that exhibits a
real postgresql bug, or informs me as to what I have done wrong.


Regards,
Finaly



On Tue, 2011-11-01 at 17:26 +1300, Mark Kirkwood wrote:
> On 28/10/11 15:42, Finlay Thompson wrote:
> > After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9,
> > the script suddenly stopped working, and consuming all the ram (16GB) on the
> > computer (i7).
> >
> >
>
> If there is one query exhausting all ram, then it could be tricky to
> catch it in the logs (log duration and friends log the query time on
> completion... which might be never). If you have vm.overcommit_memory
> set to 2 then you may get the offending statement in the log when the
> postgres backend errors out with ENOMEM.
>
> If you know the approx time at which memory starts filling up then
> getting (most of) the query text from pg_stat_activity would enable you
> to track down the offending sql in your load script.
>
> regards
>
> Mark

--
Finlay Thompson
www.dragonfly.co.nz
Dragonfly, PO Box 27535, Wellington 6141, New Zealand
Level 5, 158 Victoria Street, Te Aro, Wellington
M: +64 21 426 878
T: +64 4 385 9285