Thanks for your reply.
On Tue, May 22, 2012 at 7:19 PM, Andy Colson <andy@squeakycode.net> wrote:
> On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman<netllama@gmail.com>
> wrote:
>>>
>>> Greetings,
>>>
>>> When I got in this morning, I found
>>> an autovacuum process that had been running since just before the load
>>> spiked,
>
>
> Autovacuum might need to set the freeze bit very first time it runs. I
> recall hearing advice about running a 'vacuum freeze' after you insert a
> huge amount of data. And I recall pg_upgrade doesn't write stats, so did
> you analyze your database?
yes, I ran a 'vacuum analyze' for all databases & tables immediately
following completion of pg_upgrade.
>
> Or, maybe its not vacuum... maybe some of your sql statements are planning
> differently and running really bad. Can you check some? Can you log slow
> queries?
>
> Have you checked the status of your raid? Maybe you lost a drive and its in
> recovery and you have very slow IO?
I checked that initially, but the array is fine.
After banging my head on the wall for a long time, I happened to
notice that khugepaged was consuming 100% CPU every time autovacuum
was running. I did:
echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag
and immediately the entire problem went away. Load dropped within
minutes from 35.00 to 1.00, and has remained under 4.00 for the past
18 hours. Prior to disabling defrag, I never saw the load below 10.00
for more than a few seconds at a time.
So this looks like a nasty Fedora16 kernel bug to me, or maybe
postgresql & Fedora16's default kernel settings are just not
compatible?
Is anyone else using Fedora16 & PostgreSQL-9.1 ?