Re: Waiting on ExclusiveLock on extension - Mailing list pgsql-general

From Andomar
Subject Re: Waiting on ExclusiveLock on extension
Date
Msg-id 55337440.7070306@aule.net
Whole thread Raw
In response to Re: Waiting on ExclusiveLock on extension  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Waiting on ExclusiveLock on extension
List pgsql-general
To put the top question first:

How can table extension locks explain a a massive spike in CPU usage?

I can imagine 400 connections waiting on disk I/O, but then, wouldn't
they all be sleeping?

 > Ok, that's a MAJOR hint, because relation 1249 is a system catalog;
 > namely pg_attribute. So I think what's happening here is that your
 > catalog has become horrifically bloated. I'm 99% certain that VACUUM ALL
 > will not vacuum the catalog tables.
 >
 > Do you by chance have autovacuum turned off?
 >
 > A manual VACUUM VERBOSE pg_attribute might provide some immediate relief.
 >
Autovacuum is turned on.  In addition, we do a manual VACUUM ALL at
night.  VACUUM VERBOSE pg_attribute ran in 0 seconds and processed a few
hundred rows.

 > Are you using a connection pool? Establishing 50 new database
 > connections per second won't do anything to help performance...
 >
As I understand it, a pool reduces network and CPU load.  We have never
seen any issues with those.  So the extra monitoring and maintenance
cost of a pool seems hard to justify.

 > I think what that means is that there was suddenly a big spike in memory
 > demand at the OS level, so now the OS is frantically dumping cached
 > pages. That in itself won't explain this, but it may be a clue.
 >
We monitor memory usage with Cacti.  It's a dedicated server and nearly
all memory is used as cache.  If a script runs and demands memory, that
becomes visible as cache is cleared out.  There is no change in the
amount of memory used as cache around the outage.

 > In order to extend a relation we need to ask the filesystem to actually
 > extend the file (which presumably means at least writing some metadata
 > to disk), and then I think we create a WAL record. Creating the WAL
 > record won't by itself write to disk... *unless* the wal_buffers are all
 > already full.
 >
I have a question here, we have "synchronous_commit = off".  So when
Postgres extends a page, would it do that just in memory, or does part
of the extend operation require synchronous I/O?

 > So if you also see an I/O spike when this happens you could well
 > just be starved from the I/O system (though obviously it'd be
 > better if we handled that situation more elegantly than this).

The SAR data shows no increase in pgpgin/s and pgpgout/s, which if I
understand it correctly, means that there is no I/O spike.  There is
however an enormous increase in CPU usage.

 > I do suspect your pgfree/s is very high though; putting 200k pages/s on
 > the free list seems like something's broken.
 >
The system has constant and considerable load of small writes.  The
pg_activity tool shows 300 IOPs sustained (it claims max IPs above
11000.)  Postgres 9.3 had a comparable pgfree/s.

Would you know a good resource to get more knowledgeable about pgfree,
pgpin, pgsteal?

Kind regards,
Andomar


pgsql-general by date:

Previous
From: Jim Nasby
Date:
Subject: Re: [SQL] function to send email with query results
Next
From: rob stone
Date:
Subject: Re: Running pg_upgrade under Debian