Re: Strange performance degradation - Mailing list pgsql-performance

From Guillaume Cottenceau
Subject Re: Strange performance degradation
Date
Msg-id 878we1igy4.fsf@meuh.mnc.lan
Whole thread Raw
In response to Re: Strange performance degradation  (Lorenzo Allegrucci <lorenzo.allegrucci@forinicom.it>)
List pgsql-performance
Lorenzo Allegrucci <lorenzo.allegrucci 'at' forinicom.it> writes:

> A. Kretschmer wrote:
>> In response to Lorenzo Allegrucci :
>>> Hi all,
>>>
>>> I'm experiencing a strange behavior with my postgresql 8.3:
>>> performance is degrading after 3/4 days of running time but if I
>>> just restart it performance returns back to it's normal value..
>>> In normal conditions the postgres process uses about 3% of cpu time
>>> but when is in "degraded" conditions it can use up to 25% of cpu time.
>>> The load of my server is composed of many INSERTs on a table, and
>>> many UPDATEs and SELECT on another table, no DELETEs.
>>> I tried to run vacuum by the pg_maintenance script (Debian Lenny)
>>> but it doesn't help. (I have autovacuum off).
>>
>> Bad idea. Really.
>
> Why running vacuum by hand is a bad idea?

It's rather turning autovacuum off which is a bad idea.

> vacuum doesn't solve anyway, it seems only a plain restart stops the
> performance degradation.

Notice: normally, restarting doesn't help for vacuum-related
problems.

Your degradation might come from a big request being intensive on
PG's and OS's caches, resulting in data useful to other requests
getting farther (but it should get back to normal if the big
request is not performed again). And btw, 25% is far from 100% so
response time should be the same if there are no other factors;
you should rather have a look at IOs (top, vmstat, iostat) during
problematic time. How do you measure your degradation, btw?

>>> So, my main question is.. how can just a plain simple restart of postgres
>>> restore the original performance (3% cpu time)?
>>
>> You should enable autovacuum.
>>
>> And you should run vacuum verbose manually and see the output.
>
> below is the output of vacuum analyze verbose
> (NOTE: I've already run vacuum this morning, this is a second run)
>
> DETAIL:  A total of 58224 page slots are in use (including overhead).
> 58224 page slots are required to track all free space.
> Current limits are:  2000000 page slots, 1000 relations, using 11784 kB.

Which means your FSM settings look fine; but doesn't mean your
database is not bloated (and with many UPDATEs and no correct
vacuuming, it should be bloated). One way to know is to restore a
recent backup, issue VACUUM VERBOSE on a table known to be large
and regularly UPDATE's/DELETE'd on both databases (in production,
and on the restore) and compare the reported number of pages
needed. The difference is the potential benefit of running VACUUM
FULL (or CLUSTER) in production (once your DB is bloated, a
normal VACUUM doesn't remove the bloat).

  db_production=# VACUUM VERBOSE table;
      [...]
  INFO:  "table": found 408 removable, 64994 nonremovable row versions in 4395 pages

  db_restored=# VACUUM VERBOSE table;
      [...]
  INFO:  "table": found 0 removable, 64977 nonremovable row versions in 628 pages

In that 628/4395 example, we have 85% bloat in production.

--
Guillaume Cottenceau

pgsql-performance by date:

Previous
From: Axel Rau
Date:
Subject: Re: SSD + RAID
Next
From: Matthew Wakeling
Date:
Subject: Re: SSD + RAID