Thread: significant performance hit whenever autovacuum runs after upgrading from 9.0 -> 9.1

Greetings,
I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
replication to 3 hot standby servers).  All of them are running
Fedora-16-x86_64.  Last Friday I upgraded the entire cluster from
Fedora-15 with postgresql-9.0.6 to Fedora-16 with postgresql-9.1.3.  I
made no changes to postgresql.conf following the upgrade.  I used
pg_upgrade on the master to upgrade it, followed by blowing away
$PGDATA on all the standbys and rsyncing them fresh from the master.
All of the servers have 128GB RAM, and at least 16 CPU cores.

Everything appeared to be working fine until last night when the load
on the master suddenly took off, and hovered at around 30.00 ever
since.  Prior to the load spike, the load was hovering around 2.00
(which is actually lower than it was averaging prior to the upgrade
when it was often around 4.00).  When I got in this morning, I found
an autovacuum process that had been running since just before the load
spiked, and the pg_dump cronjob that started shortly after the load
spike (and normally completes in about 20 minutes for all the
databases) was still running, and hadn't finished the first of the 6
databases.  I ended up killing the pg_dump process altogether in the
hope that it might unblock whatever was causing the high load.
Unfortunately that didn't help, and the load continued to run high.

I proceeded to check dmesg, /var/log/messages and the postgresql
server log (all on the master), but I didn't spot anything out of the
ordinary, definitely nothing that pointed to a potential explanation
for all of the high load.

I inspected what the autovacuum process was doing, and determined that
it was chewing away on the largest table (nearly 98 million rows) in
the largest database.  It appeared to be making very slow progress; when
I attached strace to the process, the seek offsets were jumping around
in a random fashion.
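
For reference, this is roughly how I found the worker and attached to
it (the database name and the PID variable below are just placeholders):

# 9.1 exposes the backend pid as procpid and the query text as current_query
psql -d somedb -c "SELECT procpid, current_query FROM pg_stat_activity WHERE current_query LIKE 'autovacuum:%';"
# then watch which files and offsets it is touching
strace -p $AUTOVAC_PID -e trace=open,read,write,lseek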

Here are the current autovacuum settings:
autovacuum                      | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold    | 50
autovacuum_freeze_max_age       | 200000000
autovacuum_max_workers          | 4
autovacuum_naptime              | 1min
autovacuum_vacuum_cost_delay    | 20ms
autovacuum_vacuum_cost_limit    | -1
autovacuum_vacuum_scale_factor  | 0.2
autovacuum_vacuum_threshold     | 50
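
FWIW, the list above should match the output of something along these lines:

# pull the live autovacuum settings
psql -c "SELECT name, current_setting(name) FROM pg_settings WHERE name LIKE 'autovacuum%' ORDER BY name;"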

Did something significant change in 9.1 that would impact autovacuum
behavior?  I'm at a complete loss on how to debug this, since I'm
using the exact same settings now as prior to the upgrade.

thanks

No one has any ideas or suggestions, or even questions?  If someone
needs more information, I'd be happy to provide it.

This problem is absolutely killing me.

On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman <netllama@gmail.com> wrote:
> [...]

I'd say to do some monitoring of your machine when this is happening.
vmstat, iostat, iotop, htop, and so on.  Are you running out of
memory, a context switch / interrupt storm?  IO bound?  And so on.
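
For example, something along these lines (assuming the sysstat and
iotop packages are installed):

vmstat 1       # memory, swap, run queue, context switches
iostat -x 1    # per-device utilization and latency
iotop -o       # which processes are actually doing IO
sar -w 1       # context switches per second over time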

On Tue, May 22, 2012 at 12:20 PM, Lonni J Friedman <netllama@gmail.com> wrote:
> No one has any ideas or suggestions, or even questions?  If someone
> needs more information, I'd be happy to provide it.
> [...]



--
To understand recursion, one must first understand recursion.

On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman <netllama@gmail.com> wrote:
>> Greetings,
>> When I got in this morning, I found
>> an autovacuum process that had been running since just before the load
>> spiked,

Autovacuum might need to set the freeze bit very first time it runs.  I recall hearing advice about running a 'vacuum
freeze' after you insert a huge amount of data.  And I recall pg_upgrade doesn't write stats, so did you analyze your
database?

Or, maybe its not vacuum... maybe some of your sql statements are planning differently and running really bad.  Can you
check some?  Can you log slow queries?

Have you checked the status of your raid?  Maybe you lost a drive and its in recovery and you have very slow IO?
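
Roughly the sort of thing I mean (the database name here is just an example):

# freeze and analyze in one pass
psql -d yourdb -c "VACUUM (FREEZE, ANALYZE);"
# slow query logging: add  log_min_duration_statement = 500  (ms) to postgresql.conf, then
psql -c "SELECT pg_reload_conf();"
# software raid health
cat /proc/mdstat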

-Andy

Thanks for your reply.

On Tue, May 22, 2012 at 7:19 PM, Andy Colson <andy@squeakycode.net> wrote:
>  On Mon, May 21, 2012 at 2:05 PM, Lonni J Friedman<netllama@gmail.com>
>  wrote:
>>>
>>> Greetings,
>>>
>>> When I got in this morning, I found
>>> an autovacuum process that had been running since just before the load
>>> spiked,
>
>
> Autovacuum might need to set the freeze bit very first time it runs.  I
> recall hearing advice about running a 'vacuum freeze' after you insert a
> huge amount of data.  And I recall pg_upgrade doesn't write stats, so did
> you analyze your database?

yes, I ran a 'vacuum analyze' for all databases & tables immediately
following completion of pg_upgrade.
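
To be specific, that was the equivalent of running, as the postgres user:

# vacuum and analyze every table in every database
vacuumdb --all --analyze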

>
> Or, maybe its not vacuum... maybe some of your sql statements are planning
> differently and running really bad.  Can you check some?  Can you log slow
> queries?
>
> Have you checked the status of your raid?  Maybe you lost a drive and its in
> recovery and you have very slow IO?

I checked that initially, but the array is fine.

After banging my head on the wall for a long time, I happened to
notice that khugepaged was consuming 100% CPU every time autovacuum
was running.  I did:
echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag

and immediately the entire problem went away.  Load dropped within
minutes from 35.00 to 1.00, and has remained under 4.00 for the past
18 hours.  Prior to disabling defrag, I never saw the load below 10.00
for more than a few seconds at a time.
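
For anyone who hits the same thing, I believe this is the full
check-and-workaround sequence (the echo only lasts until reboot; I'd
drop it into /etc/rc.local or an init script to make it stick):

# see what THP is currently set to do
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# only defragment for regions that explicitly request huge pages via madvise()
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag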

So this looks like a nasty Fedora16 kernel bug to me, or maybe
postgresql & Fedora16's default kernel settings are just not
compatible?

Is anyone else using Fedora16 & PostgreSQL-9.1 ?

Lonni J Friedman <netllama@gmail.com> writes:
> After banging my head on the wall for  a long time, I happened to
> notice that khugepaged was consuming 100% CPU every time autovacuum
> was running.  I did:
> echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag
> and immediately the entire problem went away.

Fascinating.

> So this looks like a nasty Fedora16 kernel bug to me, or maybe
> postgresql & Fedora16's default kernel settings are just not
> compatible?

I agree, kernel bug.  What kernel version are you using exactly?

> Is anyone else using Fedora16 & PostgreSQL-9.1 ?

I use an F16 box daily, but can't claim to have done major performance
testing with it.  Can you put together a summary of your nondefault
Postgres settings?  I wonder whether it only kicks in for a certain
size of shared memory for instance.

            regards, tom lane

On Wed, May 23, 2012 at 9:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> After banging my head on the wall for  a long time, I happened to
>> notice that khugepaged was consuming 100% CPU every time autovacuum
>> was running.  I did:
>> echo "madvise" > /sys/kernel/mm/transparent_hugepage/defrag
>> and immediately the entire problem went away.
>
> Fascinating.

In hindsight, sure.  Before that, it was 2 days of horror.

>
>> So this looks like a nasty Fedora16 kernel bug to me, or maybe
>> postgresql & Fedora16's default kernel settings are just not
>> compatible?
>
> I agree, kernel bug.  What kernel version are you using exactly?

I'm using the stock 3.3.5-2.fc16.x86_64 kernel that is in Fedora updates.

>
>> Is anyone else using Fedora16 & PostgreSQL-9.1 ?
>
> I use an F16 box daily, but can't claim to have done major performance
> testing with it.  Can you put together a summary of your nondefault
> Postgres settings?  I wonder whether it only kicks in for a certain
> size of shared memory for instance.

Oh yea, I'm quite certain that this is somehow related to my setup,
and not a generic problem with all F16/pgsql systems.  For starters,
this problem isn't happening on any of the 3 standby systems, which
are all otherwise identical to the master in every respect.  Also when
we had done some testing (prior to the upgrades), we never ran into
any of these problems.  However our test environment was on smaller
scale hardware, with a much smaller number of clients (and overall
load).

Here are the non default settings in postgresql.conf :
wal_level = hot_standby
archive_mode = on
archive_timeout = 61
max_wal_senders = 10
wal_keep_segments = 5000
hot_standby = on
log_autovacuum_min_duration = 2500
autovacuum_max_workers = 4
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 88GB
work_mem = 576MB
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 8GB
max_connections = 350
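
FWIW, that list is more or less what the following returns on the master:

# everything set in postgresql.conf rather than left at its default
psql -c "SELECT name, current_setting(name) FROM pg_settings WHERE source = 'configuration file' ORDER BY name;"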

Let me know if you have any other questions.  I'd be happy to provide
as much information as possible if it can aid in fixing this bug.

On 24/05/12 05:09, Lonni J Friedman wrote:
> [...]
> Let me know if you have any other questions.  I'd be happy to provide
> as much information as possible if it can aid in fixing this bug.

I think they will need details of things like: RAM, number/type of
processors, number & type of disks, disk controllers & any other
hardware specs that might be relevant, etc. - at the very least: total
RAM & number of spindles.

Also anything else running on the box.

Plus transaction load pattern - over time and read/write ratios.

type/nature of queries

size of heavily accessed tables and their indexes

Apologies, if you have already supplied the above!

Take the above with a grain of salt; I have read about performance
issues in databases, but never had to chase such issues myself, apart
from one occasion where I reduced the time of one particular query from
30+ minutes down to about 10 seconds in a PROGRESS 4GL/Oracle db (not
relevant here)!


Cheers,
Gavin

(who has the luxury of designing a very sophisticated db with very low transaction rates,
so doesn't need to worry about performance issues)

On Wed, May 23, 2012 at 12:36 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> [...]
> I think they will need details of things like: RAM, number/type processors,
> number & type
> of disks, disk controllers & any other hardware specs that might be relevant
> etc.- at very
> least: total RAM & number of spindles

16 core Xeon X5550 2.67GHz
128GB RAM
$PGDATA sits on a RAID5 array comprised of 3 SATA disks.  It's Linux
md software RAID.

>
> Also anything else running on the box.

nothing else.  its dedicated exclusively to postgresql.

>
> Plus transaction load pattern - over time and read/write ratios.

I'm not sure how I'd obtain this data.  however, the patterns didn't
change since the upgrade.  If someone can point me in the right
direction, I can at least obtain this data as its generated currently.
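
My best guess is that the stats views are what you're after, e.g.
something like this (the database name is a placeholder):

# commits/rollbacks and buffer hit counts per database
psql -c "SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit FROM pg_stat_database;"
# per-table read/write activity, heaviest writers first
psql -d somedb -c "SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del FROM pg_stat_user_tables ORDER BY n_tup_ins + n_tup_upd + n_tup_del DESC LIMIT 20;"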

>
> type/nature of queries

I need some clarification on specifically what you're asking for here.

>
> size of heavily accessed tables and their indexes

there are several rather large tables (90 million+ rows), but most
others are under 1M rows.  However, most tables are accessed & written
to with equal frequency.
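
If specific numbers would help, I can pull them with something like
this (again, the database name is a placeholder):

# ten largest tables: total size (including indexes and toast) plus heap-only size
psql -d somedb -c "SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total, pg_size_pretty(pg_relation_size(oid)) AS heap FROM pg_class WHERE relkind = 'r' ORDER BY pg_total_relation_size(oid) DESC LIMIT 10;"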

Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
>> 16 core Xeon X5550 2.67GHz
>> 128GB RAM
>> $PGDATA sits on a RAID5 array comprised of 3 SATA disks.  Its Linux's
>> md software RAID.

> How does this compare to your other machines running the same, or
> similar, databases?
> However, you do say that the other machines are identical - but are the
> other machines different in any aspect that might prove significant?

I think Lonnie said that the other machines are just running standby
clusters, which would mean they aren't running autovacuum as such,
merely applying any WAL it produces.  So that could be plenty enough
to explain a difference in kernel-visible behavior.

            regards, tom lane

On Wed, May 23, 2012 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
>>> 16 core Xeon X5550 2.67GHz
>>> 128GB RAM
>>> $PGDATA sits on a RAID5 array comprised of 3 SATA disks.  Its Linux's
>>> md software RAID.
>
>> How does this compare to your other machines running the same, or
>> similar, databases?
>> However, you do say that the other machines are identical - but are the
>> other machines different in any aspect that might prove significant?
>
> I think Lonnie said that the other machines are just running standby
> clusters, which would mean they aren't running autovacuum as such,
> merely applying any WAL it produces.  So that could be plenty enough
> to explain a difference in kernel-visible behavior.

Yes, that is correct.

On 24/05/12 08:18, Lonni J Friedman wrote:
> [...]
> 16 core Xeon X5550 2.67GHz
> 128GB RAM
> $PGDATA sits on a RAID5 array comprised of 3 SATA disks.  It's Linux
> md software RAID.

How does this compare to your other machines running the same, or
similar, databases?  However, you do say that the other machines are
identical - but are the other machines different in any aspect that
might prove significant?


> [...]
> type/nature of queries
> I need some clarification on specifically what you're asking for here.

The complexity, structure, and features of the queries. Do you have lots of sub queries,
and ORDER BY's? Also the number of tables accessed in a query. This is heading into the
territory where others will be better placed to advise you as to what might be relevant!


> size of heavily accessed tables and their indexes
> there are several rather large tables (90 million+ rows), but most
> others are under 1M rows.  However, most tables are accessed & written
> to with equal frequency.

The number, type, size, and usage of indexes might be relevant.


I hope the above comments help, probably others will have more specific requests.


Cheers,
Gavin
On Wed, May 23, 2012 at 2:45 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> [...]
> The complexity, structure, and features of the queries. Do you have lots of
> sub queries,
> and ORDER BY's? Also the number of tables accessed in a query. This is
> heading into the
> territory where others will be better placed to advise you as to what might
> be relevant!

No, not lots of subqueries or ORDERing, and most queries only touch a
single table.  However, I'm honestly not sure that I'm following where
you're going with this.  The problem isn't triggered by explicit
queries.  I can disable all external access, and simply wait for
autovacuum to kick off, and the box starts to die.

Lonni J Friedman <netllama@gmail.com> writes:
> No, not lots of subqueries or ORDERing, and most queries only touch a
> single table.  However, I'm honestly not sure that I'm following where
> you're going with this.   The problem isn't triggered by explicit
> queries.  I can disable all external access, and simply wait for
> autovacuum to kick off, and the box starts to die.

Can you correlate the performance hit with any specific part of
autovacuum?  In particular, I'm wondering if it matters whether vacuum
is cleaning tables or indexes --- it alternates between the two, and the
access patterns are a bit different.  You could probably watch what the
autovac process is doing with strace to see what it's accessing.

            regards, tom lane

On Thu, May 24, 2012 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> No, not lots of subqueries or ORDERing, and most queries only touch a
>> single table.  However, I'm honestly not sure that I'm following where
>> you're going with this.   The problem isn't triggered by explicit
>> queries.  I can disable all external access, and simply wait for
>> autovacuum to kick off, and the box starts to die.
>
> Can you correlate the performance hit with any specific part of
> autovacuum?  In particular, I'm wondering if it matters whether vacuum
> is cleaning tables or indexes --- it alternates between the two, and the
> access patterns are a bit different.  You could probably watch what the
> autovac process is doing with strace to see what it's accessing.

Is there something specific I should be looking for in the strace
output, or is this just a matter of correlating PID and FD to
pg_class.relfilenode ?
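
I'm assuming the mapping itself would look something like this (the
OIDs below are made up): if strace showed the worker reading
base/16384/16397.3, then 16384 is the database's OID, 16397 the
relfilenode, and .3 the 1GB segment number:

# which database is 16384?
psql -c "SELECT datname FROM pg_database WHERE oid = 16384;"
# which relation does relfilenode 16397 belong to in that database?
psql -d that_db -c "SELECT relname, relkind FROM pg_class WHERE relfilenode = 16397;"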

Lonni J Friedman <netllama@gmail.com> writes:
> On Thu, May 24, 2012 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can you correlate the performance hit with any specific part of
>> autovacuum? In particular, I'm wondering if it matters whether vacuum
>> is cleaning tables or indexes --- it alternates between the two, and the
>> access patterns are a bit different. You could probably watch what the
>> autovac process is doing with strace to see what it's accessing.

> Is there something specific I should be looking for in the strace
> output, or is this just a matter of correlating PID and FD to
> pg_class.relfilenode ?

Nah, just match up the files it touches with pg_class.relfilenode.

            regards, tom lane

From: fche@redhat.com (Frank Ch. Eigler)

Tom Lane <tgl@sss.pgh.pa.us> writes:

> [...]  Can you correlate the performance hit with any specific part
> of autovacuum?  In particular, I'm wondering if it matters whether
> vacuum is cleaning tables or indexes [...]

In case it helps, this systemtap run will report on the top few
time-sampled call graphs of post* processes:

# debuginfo-install -y postgresql
# stap -V    [...Fedora16...]
Systemtap translator/driver (version 1.7/0.153 non-git sources)
# cat post-prof.stp
probe timer.profile {
  if (substr(execname(),0,4) != "post") next
  bt[sprint_ustack(ubacktrace())] <<< 1
}
probe end {
  foreach ([s] in bt- limit 50) { println("\n",@count(bt[s]),":",s) }
}
global bt[20000]

# stap post-prof.stp -v -d /usr/bin/postgres -d /usr/bin/postmaster --ldd --all-modules  --suppress-handler-errors
[wait awhile during workload]
^C

1390:index_getnext+0x1f9 [postgres]
IndexNext+0x56 [postgres]
ExecScan+0x14e [postgres]
ExecProcNode+0x228 [postgres]
ExecLimit+0xb8 [postgres]
ExecProcNode+0xd8 [postgres]
standard_ExecutorRun+0x14a [postgres]
PortalRunSelect+0x287 [postgres]
PortalRun+0x248 [postgres]
PostgresMain+0x754 [postgres]
ServerLoop+0x799 [postgres]
PostmasterMain+0x647 [postgres]
main+0x71e [postgres]
__libc_start_main+0xed [libc-2.14.90.so]
_start+0x29 [postgres]

935:__lseek_nocancel+0x7 [libc-2.14.90.so]
FileSeek+0x127 [postgres]
_mdnblocks+0x20 [postgres]
mdnblocks+0x6b [postgres]
get_relation_info+0x5e2 [postgres]
build_simple_rel+0x169 [postgres]
add_base_rels_to_query+0x83 [postgres]
query_planner+0x159 [postgres]
grouping_planner+0xc0b [postgres]
subquery_planner+0x5b4 [postgres]
standard_planner+0xe4 [postgres]
pg_plan_query+0x1b [postgres]
pg_plan_queries+0x54 [postgres]
PostgresMain+0x848 [postgres]
ServerLoop+0x799 [postgres]
PostmasterMain+0x647 [postgres]


721:__send+0x22 [libc-2.14.90.so]
internal_flush+0x3b [postgres]
pq_flush+0x22 [postgres]
ReadyForQuery+0x29 [postgres]
PostgresMain+0x49e [postgres]
ServerLoop+0x799 [postgres]
PostmasterMain+0x647 [postgres]
main+0x71e [postgres]
__libc_start_main+0xed [libc-2.14.90.so]
_start+0x29 [postgres]

431:recv+0x22 [libc-2.14.90.so]
secure_read+0x1c6 [postgres]
pq_recvbuf+0x5f [postgres]
pq_getbyte+0x15 [postgres]
PostgresMain+0x4bf [postgres]
ServerLoop+0x799 [postgres]
PostmasterMain+0x647 [postgres]
main+0x71e [postgres]
__libc_start_main+0xed [libc-2.14.90.so]
_start+0x29 [postgres]

380:__lseek_nocancel+0x7 [libc-2.14.90.so]
FileSeek+0x127 [postgres]
_mdnblocks+0x20 [postgres]
mdnblocks+0x6b [postgres]
estimate_rel_size+0x7c [postgres]
get_relation_info+0x121 [postgres]
build_simple_rel+0x169 [postgres]
add_base_rels_to_query+0x83 [postgres]
query_planner+0x159 [postgres]
grouping_planner+0xc0b [postgres]
subquery_planner+0x5b4 [postgres]
standard_planner+0xe4 [postgres]
pg_plan_query+0x1b [postgres]
pg_plan_queries+0x54 [postgres]
PostgresMain+0x848 [postgres]
ServerLoop+0x799 [postgres]
[...]

On Thu, May 24, 2012 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> On Thu, May 24, 2012 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Can you correlate the performance hit with any specific part of
>>> autovacuum? In particular, I'm wondering if it matters whether vacuum
>>> is cleaning tables or indexes --- it alternates between the two, and the
>>> access patterns are a bit different. You could probably watch what the
>>> autovac process is doing with strace to see what it's accessing.
>
>> Is there something specific I should be looking for in the strace
>> output, or is this just a matter of correlating PID and FD to
>> pg_class.relfilenode ?
>
> Nah, just match up the files it touches with pg_class.relfilenode.

Seems to be slower across the board regardless of what is being
vacuumed (tables or indices).

For example, I have a relatively small table (currently 395 rows, which
rarely has any inserts or deletes).  Vacuuming just the table (not its
indices) takes on average 4s with the khugepaged defragmenter on, and
less than 1s with it off.  Vacuuming just the indices takes on average
2s with the defragmenter on, and less than 1s with it off.

With the much larger tables (thousands to millions of rows) I see a
similar pattern, although the vacuums take correspondingly longer to
complete both with and without the khugepaged defragmenter enabled.
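
For reference, a simple way to reproduce this sort of timing (the table
name below is made up) is to toggle the defrag setting between runs:

# time a manual vacuum of a single table
time psql -d somedb -c "VACUUM VERBOSE public.some_small_table;"
# confirm which defrag mode was active for the run
cat /sys/kernel/mm/transparent_hugepage/defrag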