Strange performance degradation

From: Lorenzo Allegrucci

Hi all,

I'm experiencing a strange behavior with my postgresql 8.3:
performance is degrading after 3/4 days of running time, but if I
just restart it, performance returns to its normal value..
In normal conditions the postgres process uses about 3% of cpu time
but when it is in "degraded" conditions it can use up to 25% of cpu time.
The load on my server is composed of many INSERTs on one table, and
many UPDATEs and SELECTs on another table, no DELETEs.
I tried to run vacuum via the pg_maintenance script (Debian Lenny)
but it doesn't help. (I have autovacuum off).

So, my main question is.. how can just a plain simple restart of postgres
restore the original performance (3% cpu time)?
I can post my postgresql.conf if needed.
Thank you for your help,

--
Lorenzo

Re: Strange performance degradation

From: "A. Kretschmer"

In response to Lorenzo Allegrucci:
>
> Hi all,
>
> I'm experiencing a strange behavior with my postgresql 8.3:
> performance is degrading after 3/4 days of running time, but if I
> just restart it, performance returns to its normal value..
> In normal conditions the postgres process uses about 3% of cpu time
> but when it is in "degraded" conditions it can use up to 25% of cpu time.
> The load on my server is composed of many INSERTs on one table, and
> many UPDATEs and SELECTs on another table, no DELETEs.
> I tried to run vacuum via the pg_maintenance script (Debian Lenny)
> but it doesn't help. (I have autovacuum off).

Bad idea. Really.

>
> So, my main question is.. how can just a plain simple restart of postgres
> restore the original performance (3% cpu time)?

You should enable autovacuum.

And you should run vacuum verbose manually and see the output.
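
Something like this (just a sketch, run from psql) will show whether
autovacuum and the stats collector are actually on; on 8.3, autovacuum
needs track_counts = on to work at all:

  -- check the relevant settings
  SELECT name, setting
  FROM pg_settings
  WHERE name IN ('autovacuum', 'track_counts');

  -- if they are off, set  autovacuum = on  and  track_counts = on  in
  -- postgresql.conf and reload the server, then compare a manual run:
  VACUUM VERBOSE ANALYZE;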


Andreas
--
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: -> header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: Strange performance degradation

From: Lorenzo Allegrucci

A. Kretschmer wrote:
> In response to Lorenzo Allegrucci:
>> Hi all,
>>
>> I'm experiencing a strange behavior with my postgresql 8.3:
>> performance is degrading after 3/4 days of running time, but if I
>> just restart it, performance returns to its normal value..
>> In normal conditions the postgres process uses about 3% of cpu time
>> but when it is in "degraded" conditions it can use up to 25% of cpu time.
>> The load on my server is composed of many INSERTs on one table, and
>> many UPDATEs and SELECTs on another table, no DELETEs.
>> I tried to run vacuum via the pg_maintenance script (Debian Lenny)
>> but it doesn't help. (I have autovacuum off).
>
> Bad idea. Really.

Why is running vacuum by hand a bad idea?
vacuum doesn't solve it anyway; it seems only a plain restart stops the
performance degradation.

>> So, my main question is.. how can just a plain simple restart of postgres
>> restore the original performance (3% cpu time)?
>
> You should enable autovacuum.
>
> And you should run vacuum verbose manually and see the output.

Below is the output of vacuum analyze verbose
(NOTE: I've already run vacuum this morning, this is a second run)

DETAIL:  A total of 58224 page slots are in use (including overhead).
58224 page slots are required to track all free space.
Current limits are:  2000000 page slots, 1000 relations, using 11784 kB.

Re: Strange performance degradation

From: Guillaume Cottenceau

Lorenzo Allegrucci <lorenzo.allegrucci 'at' forinicom.it> writes:

> A. Kretschmer wrote:
>> In response to Lorenzo Allegrucci:
>>> Hi all,
>>>
>>> I'm experiencing a strange behavior with my postgresql 8.3:
>>> performance is degrading after 3/4 days of running time, but if I
>>> just restart it, performance returns to its normal value..
>>> In normal conditions the postgres process uses about 3% of cpu time
>>> but when it is in "degraded" conditions it can use up to 25% of cpu time.
>>> The load on my server is composed of many INSERTs on one table, and
>>> many UPDATEs and SELECTs on another table, no DELETEs.
>>> I tried to run vacuum via the pg_maintenance script (Debian Lenny)
>>> but it doesn't help. (I have autovacuum off).
>>
>> Bad idea. Really.
>
> Why is running vacuum by hand a bad idea?

Rather, it's turning autovacuum off that is the bad idea.

> vacuum doesn't solve it anyway; it seems only a plain restart stops the
> performance degradation.

Notice: normally, restarting doesn't help for vacuum-related
problems.

Your degradation might come from a big query being intensive on
PG's and the OS's caches, pushing data useful to other queries out
of cache (but it should get back to normal if the big query is not
run again). And btw, 25% is far from 100%, so response time should
be the same if there are no other factors; you should rather have a
look at I/O (top, vmstat, iostat) during the problematic period.
How do you measure your degradation, btw?

>>> So, my main question is.. how can just a plain simple restart of postgres
>>> restore the original performance (3% cpu time)?
>>
>> You should enable autovacuum.
>>
>> And you should run vacuum verbose manually and see the output.
>
> below is the output of vacuum analyze verbose
> (NOTE: I've already run vacuum this morning, this is a second run)
>
> DETAIL:  A total of 58224 page slots are in use (including overhead).
> 58224 page slots are required to track all free space.
> Current limits are:  2000000 page slots, 1000 relations, using 11784 kB.

Which means your FSM settings look fine; but it doesn't mean your
database is not bloated (and with many UPDATEs and no proper
vacuuming, it probably is). One way to know is to restore a
recent backup, issue VACUUM VERBOSE on a table known to be large
and regularly UPDATEd/DELETEd on both databases (in production
and on the restore), and compare the reported number of pages
needed. The difference is the potential benefit of running VACUUM
FULL (or CLUSTER) in production (once your DB is bloated, a
normal VACUUM doesn't remove the bloat).

  db_production=# VACUUM VERBOSE table;
      [...]
  INFO:  "table": found 408 removable, 64994 nonremovable row versions in 4395 pages

  db_restored=# VACUUM VERBOSE table;
      [...]
  INFO:  "table": found 0 removable, 64977 nonremovable row versions in 628 pages

In that 628/4395 example, we have 85% bloat in production.
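
If production turns out to be bloated like that, a one-off rewrite of the
table is needed to reclaim the space; a rough sketch (table and index
names are made up here, and both variants take an exclusive lock, so plan
a maintenance window):

  -- CLUSTER rewrites the table in index order and leaves it compact
  CLUSTER big_table USING big_table_pkey;
  ANALYZE big_table;

  -- alternative if there is no convenient index; on 8.3, VACUUM FULL
  -- tends to bloat the indexes, hence the REINDEX afterwards
  VACUUM FULL big_table;
  REINDEX TABLE big_table;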

--
Guillaume Cottenceau

Re: [GENERAL] Strange performance degradation

From: Tom Lane

Lorenzo Allegrucci <lorenzo.allegrucci@forinicom.it> writes:
> So, my main question is.. how can just a plain simple restart of postgres
> restore the original performance (3% cpu time)?

Are you killing off any long-running transactions when you restart?
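
One quick way to check, sketched against 8.3's pg_stat_activity (later
releases rename these columns):

  -- oldest open transactions first; anything hours or days old is suspect
  SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
  WHERE xact_start IS NOT NULL
  ORDER BY xact_start;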

            regards, tom lane

Re: Strange performance degradation

From: Matthew Wakeling

On Fri, 20 Nov 2009, Lorenzo Allegrucci wrote:
> performance is degrading...

> In normal conditions the postgres process uses about 3% of cpu time
> but when it is in "degraded" conditions it can use up to 25% of cpu time.

You don't really give enough information to determine what is going on
here. This could be one of two situations:

1. You have a constant incoming stream of short-lived requests at a
constant rate, and Postgres is taking eight times as much CPU to service
it as normal. You're looking at CPU usage in aggregate over long periods
of time. In this case, we should look at long running transactions and
other slowdown possibilities.

2. You are running a complex query, and you look at top and see that
Postgres uses eight times as much CPU as when it has been freshly started.
In this case, the "performance degradation" could actually be that more of
the data is in cache, and Postgres is able to process it eight times
*faster*. Restarting Postgres kills the cache and puts you back at square
one.

Which of these is it?

Matthew

--
 Reality is that which, when you stop believing in it, doesn't go away.
                                              -- Philip K. Dick

Re: [GENERAL] Strange performance degradation

From: Lorenzo Allegrucci

Brian Modra wrote:
> I had a similar problem: I did a large delete, and then a select which
> "covered" the previously deleted rows.
> It took ages, because the index still had those deleted rows.
> Possibly the same happens with update.
>
> Try this:
> vacuum analyse
> reindex database ....
> (your database name instead of ...)
>
> or, rather do this table by table:
> vacuum analyse ....
> reindex table ...
>
>
> Autovacuum is a generally good thing.
>
>> So, my main question is.. how can just a plain simple restart of postgres
>> restore the original performance (3% cpu time)?
>
> there were probably some long transactions running. Stopping postgres
> effectively kills them off.

I'll try that, thanks for your help Brian.

Re: [GENERAL] Strange performance degradation

From: Lorenzo Allegrucci

Tom Lane wrote:
> Lorenzo Allegrucci <lorenzo.allegrucci@forinicom.it> writes:
>> So, my main question is.. how can just a plain simple restart of postgres
>> restore the original performance (3% cpu time)?
>
> Are you killing off any long-running transactions when you restart?

After three days of patient waiting it looks like the common
'<IDLE> in transaction' problem..

[sorry for >80 cols]

19329 ?        S     15:54 /usr/lib/postgresql/8.3/bin/postgres -D /var/lib/postgresql/8.3/main -c
config_file=/etc/postgresql/8.3/main/postgresql.conf
19331 ?        Ss     3:40  \_ postgres: writer process
19332 ?        Ss     0:42  \_ postgres: wal writer process
19333 ?        Ss    15:01  \_ postgres: stats collector process
19586 ?        Ss   114:00  \_ postgres: forinicom weadmin [local] idle
20058 ?        Ss     0:00  \_ postgres: forinicom weadmin [local] idle
13136 ?        Ss     0:00  \_ postgres: forinicom weadmin 192.168.4.253(43721) idle in transaction

My app is a Django webapp; maybe there's some bug in the Django+psycopg2 stack?

Anyway, how can I get rid of those "idle in transaction" processes?
Can I just kill -15 them, or is there a less drastic way to do it?
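
For what it's worth, the same sessions show up in pg_stat_activity, along
with when their transaction started (this is 8.3, where the idle state is
reported in current_query):

  SELECT procpid, usename, client_addr, xact_start
  FROM pg_stat_activity
  WHERE current_query = '<IDLE> in transaction'
  ORDER BY xact_start;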

Re: [GENERAL] Strange performance degradation

From: Tom Lane

Bill Moran <wmoran@potentialtech.com> writes:
> In response to Lorenzo Allegrucci <lorenzo.allegrucci@forinicom.it>:
>> Tom Lane wrote:
>>> Are you killing off any long-running transactions when you restart?

>> Anyway, how can I get rid of those "idle in transaction" processes?
>> Can I just kill -15 them, or is there a less drastic way to do it?

> Connections idle in transaction do not cause performance problems simply
> by being there, at least not when there are so few.

The idle transaction doesn't eat resources in itself.  What it does do
is prevent VACUUM from reclaiming dead rows that are recent enough that
they could still be seen by the idle transaction.  The described
behavior sounds to me like other transactions are wasting lots of cycles
scanning through dead-but-not-yet-reclaimed rows.  There are some other
things that also get slower as the window between oldest and newest
active XID gets wider.

(8.4 alleviates this problem in many cases, but the OP said he was
running 8.3.)
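
A rough way to watch the effect on 8.3 (assuming track_counts is on) is
the per-table dead-row counters, together with a VACUUM VERBOSE on the
hot table, which will report how many dead row versions it cannot remove
yet while the idle transaction stays open:

  -- tables with the most dead rows, per the stats collector
  SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;

  -- the table name here is made up; look for the DETAIL line
  -- "N dead row versions cannot be removed yet" in the output
  VACUUM VERBOSE my_hot_table;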

> If you -TERM them, any uncommitted data will be rolled back, which may
> not be what you want.  Don't -KILL them, that will upset the postmaster.

-TERM isn't an amazingly safe thing either in 8.3.  Don't you have a way
to kill the client-side sessions?

> My answer to your overarching question is that you need to dig deeper to
> find the real cause of your problem, you're just starting to isolate it.

Agreed, what you really want to do is find and fix the transaction leak
on the client side.

            regards, tom lane

Re: [GENERAL] Strange performance degradation

From: Matthew Wakeling

On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote:
> Anyway, how can I get rid of those "idle in transaction" processes?
> Can I just kill -15 them, or is there a less drastic way to do it?

Are you crazy? Sure, if you want to destroy all of the changes made to the
database in that transaction and thoroughly confuse the client
application, you can send a TERM signal to a backend, but the consequences
to your data are on your own head.

Fix the application, don't tell Postgres to stop being a decent database.

Matthew

--
 I would like to think that in this day and age people would know better than
 to open executables in an e-mail. I'd also like to be able to flap my arms
 and fly to the moon.                                    -- Tim Mullen

Re: [GENERAL] Strange performance degradation

From: Lorenzo Allegrucci

Matthew Wakeling wrote:
> On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote:
>> Anyway, how can I get rid of those "idle in transaction" processes?
>> Can I just kill -15 them, or is there a less drastic way to do it?
>
> Are you crazy? Sure, if you want to destroy all of the changes made to
> the database in that transaction and thoroughly confuse the client
> application, you can send a TERM signal to a backend, but the
> consequences to your data are on your own head.

I'm not crazy, it was just a question..
Anyway, problem solved in the Django application.


Re: Strange performance degradation

From: Matthew Wakeling

On Tue, 24 Nov 2009, Denis Lussier wrote:
> IMHO the client application is already confused and it's in Prod.
> Shouldn't he perhaps terminate/abort the IDLE connections in Prod and
> work on correcting the problem so it doesn't occur in Dev/Test??

The problem is, the connection isn't just IDLE - it is idle IN
TRANSACTION. This means that there is quite possibly some data that has
been modified in that transaction. If you kill the backend, then that will
automatically roll back the transaction, and all of those changes would be
lost.

I agree that correcting the problem in dev/test is the priority, but I
would be very cautious about killing transactions in production. You don't
know what data is uncommitted. The safest thing to do may be to bounce the
application, rather than Postgres.

Matthew

--
 All of this sounds mildly turgid and messy and confusing... but what the
 heck. That's what programming's all about, really
                                        -- Computer Science Lecturer

Re: [GENERAL] Strange performance degradation

From: "Ing. Marcos Ortiz Valmaseda"

Lorenzo Allegrucci wrote:
> Matthew Wakeling wrote:
>> On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote:
>>> Anyway, how can I get rid of those "idle in transaction" processes?
>>> Can I just kill -15 them, or is there a less drastic way to do it?
>>
>> Are you crazy? Sure, if you want to destroy all of the changes made
>> to the database in that transaction and thoroughly confuse the client
>> application, you can send a TERM signal to a backend, but the
>> consequences to your data are on your own head.
>
> I'm not crazy, it was just a question..
> Anyway, problem solved in the Django application.
>
>
Matthew replied to you that way because this is not a good way to do
it, not because you are crazy.

There are better ways to do this.

Regards

Re: Strange performance degradation

From: Matthew Wakeling

On Tue, 24 Nov 2009, Denis Lussier wrote:
> Bouncing the app will roll back the transactions.

Depends on the application. Some certainly use a shutdown hook to flush
data out to a database cleanly.

Obviously if you kill -9 it, then all bets are off.

Matthew

--
 Software suppliers are trying to make their software packages more
 'user-friendly'.... Their best approach, so far, has been to take all
 the old brochures, and stamp the words, 'user-friendly' on the cover.
 -- Bill Gates