Thread: Waiting on ExclusiveLock on extension
Hi,

After upgrading our database from 9.3.5 to 9.4.1 last night, the server suffers from high CPU spikes. During these spikes, there are a lot of these messages in the logs:

process X still waiting for ExclusiveLock on extension of relation Y of database Z after 1036.234 ms

And:

process X acquired ExclusiveLock on extension of relation Y of database Z after 2788.050 ms

What looks suspicious is that there are sometimes several "acquired" messages for the exact same relation number in the exact same millisecond.

a) I'm assuming "extension" means growing the physical diskspace used by a table -- is that correct?

b) How can you find the name of the relation being extended? based on the relation number.

c) Why would Postgres grow a table twice in the same millisecond? Could it be an index with a high fill factor?

Any suggestions on how to approach this issue are welcome.

Thanks for your time,
Andomar

P.S. The upgrade was done with pg_dump. So the database was converted to SQL and then imported into 9.4.
On Thu, Apr 16, 2015 at 1:24 PM, Andomar <andomar@aule.net> wrote:
> After upgrading our database from 9.3.5 to 9.4.1 last night, the server
> suffers from high CPU spikes. During these spikes, there are a lot of these
> messages in the logs:
>
> process X still waiting for ExclusiveLock on extension of relation Y of
> database Z after 1036.234 ms
>
This issue has been complained about several times, and here is the most recent one:
http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name

PG 9.4.1 shall have much alleviated it by relaxing buffer pool related locks. PG 9.4.1 shall be actually better in relation extension handling - a possible explanation is that your 9.3.5 database has been used for a while thus there are holes in pages, so not many extensions are required.

> a) I'm assuming "extension" means growing the physical diskspace used by a
> table -- is that correct?

Yes.

> b) How can you find the name of the relation being extended? based on the
> relation number.

select <number>::regclass;

> c) Why would Postgres grow a table twice in the same millisecond? Could it
> be an index with a high fill factor?

PG extends by one page each time new space is needed.

> Any suggestions on how to approach this issue are welcome.
>
There is some diagnosis in the above link; see if it rings any bell. From the PG kernel side, I think fundamentally we may want to extend many pages each time instead of one.

Regards,
Qingqing
Thanks for your reply.

> This issue has been complained about several times, and here is the most recent one:
> http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name

That post is about a server with huge shared_buffers, but ours is just 8GB. Total memory is 48GB on a dedicated server. Checkpoints write around 2% of the buffers.

> PG 9.4.1 shall have much alleviated it by relaxing buffer pool related
> locks. PG 9.4.1 shall be actually better in relation extension
> handling - a possible explanation is that your 9.3.5 database has been
> used for a while thus there are holes in pages, so not many extensions
> are required.

The 9.3.5 version went live as an upgrade from 9.1.x in the same way. So it started from an SQL dump. The load has not changed much since the 9.3.5 upgrade.

With holes in pages, I suppose you mean the fill factor? Is there a way to see the current fillfactor of a table and its indices?

Kind regards,
Andomar
On Thu, Apr 16, 2015 at 2:39 PM, Andomar <andomar@aule.net> wrote:
> That post is about a server with huge shared_buffers, but ours is just 8GB.
> Total memory is 48GB on a dedicated server. Checkpoints write around 2%
> of the buffers.

Are you able to take some 'perf top' during a high CPU spike and see what's burning CPU there? Though the issue is related to blocking, high CPU spikes may hint at some spinning-to-acquire behavior.

> With holes in pages, I suppose you mean the fill factor? Is there a way to
> see the current fillfactor of a table and its indices?
>
Yes, holes meaning free space within a page. It can come from a <100% fillfactor or from vacuum-collected dead tuples. You can see the fillfactor value in pg_class.reloptions. If you see nothing there, that's 100% for heap and 90% for btree by default.

If your previous relation size is smaller than after the upgrade, that's a signal that you do have holes in the relation, thus extension can sometimes be avoided for new tuples.

Regards,
Qingqing
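P.S. A quick way to check (just a sketch; 'mytable' is a placeholder for one of your table names):

    -- Non-default storage options (including fillfactor) for a table and its indexes.
    -- reloptions is NULL when everything is at the default.
    SELECT c.relname, c.relkind, c.reloptions
    FROM pg_class c
    WHERE c.oid = 'mytable'::regclass
       OR c.oid IN (SELECT indexrelid FROM pg_index
                    WHERE indrelid = 'mytable'::regclass);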
> Are you able to take some 'perf top' during a high CPU spike and see
> what's burning CPU there? Though the issue is related to blocking, high
> CPU spikes may hint at some spinning-to-acquire behavior.

Will do, although hopefully the spikes were only growing pains after the upgrade.

> If your previous relation size is smaller than after the upgrade, that's a
> signal that you do have holes in the relation, thus extension can
> sometimes be avoided for new tuples.

The relation between high CPU and page splits is not immediately obvious to me. We run with synchronous_commit off, but there does seem to be a peak in I/O requests around the CPU spikes. Is a page split by nature a synchronous I/O activity? And do the other connections wait in some kind of CPU-intensive form (like a spinlock)?

Kind regards,
Andomar
On 2015-04-16 14:23:25 -0700, Qingqing Zhou wrote:
> On Thu, Apr 16, 2015 at 1:24 PM, Andomar <andomar@aule.net> wrote:
> > After upgrading our database from 9.3.5 to 9.4.1 last night, the server
> > suffers from high CPU spikes. During these spikes, there are a lot of these
> > messages in the logs:
> >
> > process X still waiting for ExclusiveLock on extension of relation Y of
> > database Z after 1036.234 ms
> >
> This issue has been complained about several times, and here is the most recent one:
> http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name
>
> PG 9.4.1 shall have much alleviated it by relaxing buffer pool related
> locks.

Hm. I'm not aware of related changes in 9.4? 9.5 should be a bit better, but I don't think 9.4 will make much of a difference.

> > b) How can you find the name of the relation being extended? based on the
> > relation number.
> select <number>::regclass;

That's not correct. The relfilenode is only equivalent to the relation's oid when you initially create it. But once rewritten it'll change. Rather use

SELECT oid::regclass FROM pg_class WHERE pg_relation_filenode(oid) = 2937136;

> > Any suggestions on how to approach this issue are welcome.
> >
> There is some diagnosis in the above link; see if it rings any bell. From
> the PG kernel side, I think fundamentally we may want to extend many pages
> each time instead of one.

I don't really agree that that's the most important bit. See
http://archives.postgresql.org/message-id/20150329185619.GA29062%40alap3.anarazel.de

Greetings,
Andres Freund
On Fri, Apr 17, 2015 at 1:14 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-04-16 14:23:25 -0700, Qingqing Zhou wrote:
> > On Thu, Apr 16, 2015 at 1:24 PM, Andomar <andomar@aule.net> wrote:
> > > b) How can you find the name of the relation being extended? based on the
> > > relation number.
> > select <number>::regclass;
>
> That's not correct. The relfilenode is only equivalent to the relation's
> oid when you initially create it. But once rewritten it'll change.
> Rather use
> SELECT oid::regclass FROM pg_class WHERE pg_relation_filenode(oid) = 2937136;

But the thing being reported in the lock wait log message is the relation OID itself, not the relfilenode. You would use your query if you get the number by watching the file names in the file system as they grow, but not if you get it from the log message.
Cheers,
Jeff
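P.S. To spell it out: for a number taken from the log message a plain cast is enough, while the pg_relation_filenode() form is what you would use for a number read off a data file name on disk:

    -- Relation OID as reported in the lock wait message:
    SELECT 1249::regclass;

    -- Number taken from a file name in the data directory (a relfilenode):
    SELECT oid::regclass FROM pg_class WHERE pg_relation_filenode(oid) = 2937136;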
On Fri, Apr 17, 2015 at 1:14 AM, Andres Freund <andres@anarazel.de> wrote:
> Hm. I'm not aware of related changes in 9.4? 9.5 should be a bit better,
> but I don't think 9.4 will make much of a difference.
>
You are right. I misread the check-in log.

> I don't really agree that that's the most important bit. See
> http://archives.postgresql.org/message-id/20150329185619.GA29062%40alap3.anarazel.de
>
Thanks for the link. I am actually thinking about changing this in a more fundamental way - will follow up on that thread.

Regards,
Qingqing
On 4/16/15 4:39 PM, Andomar wrote:
> Thanks for your reply.
>> This issue has been complained about several times, and here is the most
>> recent one:
>> http://www.postgresql.org/message-id/0DDFB621-7282-4A2B-8879-A47F7CECBCE4@simply.name
>>
> That post is about a server with huge shared_buffers, but ours is just
> 8GB. Total memory is 48GB on a dedicated server. Checkpoints write
> around 2% of the buffers.

Yeah, I suspect the OP in that thread was seeing something different than you are, but that's just a guess.

>> PG 9.4.1 shall have much alleviated it by relaxing buffer pool related
>> locks. PG 9.4.1 shall be actually better in relation extension
>> handling - a possible explanation is that your 9.3.5 database has been
>> used for a while thus there are holes in pages, so not many extensions
>> are required.
> The 9.3.5 version went live as an upgrade from 9.1.x in the same way. So
> it started from an SQL dump. The load has not changed much since the
> 9.3.5 upgrade.

Yes, but did you have the same workload when you upgraded to 9.3 as you do today?

> With holes in pages, I suppose you mean the fill factor?

The fill factor plays a role in whether a page has free space, yes, *especially* during an initial load (like restoring from pg_dump). Keep in mind that as the system runs you're going to be creating free space in pages as data is updated or deleted. So the amount of free space per page on the 9.3 database you moved from wouldn't be anything like what it was when you moved to 9.3.

> Is there a way
> to see the current fillfactor of a table and its indices?

Possibly the bloat query (https://wiki.postgresql.org/wiki/Show_database_bloat) would. The pageinspect utility certainly would tell you.

Possibly more useful would be to see what the pg_freespacemap contrib module shows in the free space map, since that's what the system will actually use to see where it can find a page to insert data on.

BTW, something else to be aware of: because you essentially re-loaded all your data in a single transaction, that means that a: you'll be doing a lot of hint bit updates until all data has been read a second time, and b: a lot of this data will come up for freezing at around the same time, creating a big chunk of work for autovacuum. That's caused problems for me in the past, though that was on a database that had a pretty high workload.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
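P.S. If you want to look at the free space map directly, something along these lines should work (just a sketch; it needs the pg_freespacemap extension installed, and 'mytable' is a placeholder):

    CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

    -- How much free space the FSM thinks each page of the table has
    SELECT avail, count(*) AS pages
    FROM pg_freespace('mytable')
    GROUP BY avail
    ORDER BY avail;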
> Yes, but did you have the same workload when you upgraded to 9.3 as
> you do today?

The workload is very similar. We upgraded from 9.1 to 9.3 only two months ago, and our usage statistics have not changed much. There were no "remaining connection slots are reserved for non-replication superuser connections" messages in the weeks after the 9.3 upgrade.

> BTW, something else to be aware of: because you essentially re-loaded
> all your data in a single transaction, that means that a: you'll be
> doing a lot of hint bit updates until all data has been read a second
> time, and b: a lot of this data will come up for freezing at around
> the same time, creating a big chunk of work for autovacuum. That's
> caused problems for me in the past, though that was on a database that
> had a pretty high workload.

We run "VACUUM ALL" every night, and although that gave a bit more CPU and I/O, it was nothing major.

When the problem occurs, the symptoms are:
- spike to a very high load average (= CPU usage)
- messages about long waits on ExclusiveLock on extension appear
- remaining connection slots are reserved (with corresponding end user impact)

An example of a message with 1 second wait duration and a lot of waiting queries:

process 667 still waiting for ExclusiveLock on extension of relation 1249 of database 16406 after 1005.226 ms","Process holding the lock: 36279. Wait queue: 36725, 36405, 36511, 36721, 36280, 36048, 36566, 36636, 36466, 36734, 36723, 36621, 36423, 36931, 36720, 36429, 35500, 36735, 37015, 36717, 36938, 36870, 36732, 36587, 36869, 36285, 36573, 37101, 36937, 36414, 36834, 37105, 36867, 36724, 36991, 37102, 36882, 36802, 37163, 39964, 39723, 40044, 39821, 40150, 40218, 40203, 40054, 40060, 40173, 40091, 40174, 40058, 40658, 40370, 40177, 40920, 41085, 41103, 41117, 41154, 41161, 41066, 41053, 41380, 40661, 40632, 40698, 41242, 40681, 41174, 41328, 41075, 41245, 41326, 41523, 41153, 41170, 40543, 41314, 41526, 41490, 41157, 41353, 41472, 41730, 41546, 45087, 41535, 41474, 41362, 41450, 41948, 41929, 41459, 41508, 42117, 42127, 41950, 41922, 42414, 41939, 42565, 42643, 42242, 41796, 42324, 42358, 42411, 42487, 41758, 42120, 42570, 41820, 41925, 43356, 43381, 43360, 43351, 43364, 42336, 42871, 43007, 42455, 43363, 42287, 43336, 42652, 42803, 43567, 43706, 43795, 43630, 43716, 43810, 43596, 43061, 43954, 44014, 43377, 43366, 43825, 43454, 43840, 43582, 43839, 44009, 43842, 43693, 44320, 43824, 43456, 43852, 43863, 44708, 43848, 44255, 44587, 44936, 44915, 44759, 44700, 44948, 45051, 44808, 45189, 45137, 45037, 45303, 45294, 45710, 45711, 45755, 45660, 45120, 45576, 46221, 46125, 46703, 46512, 46399, 46684, 46762, 46373, 46929, 46443, 46817, 46858, 47017, 46886, 46805, 46890, 47593, 47548, 47272, 47454, 47906, 47462, 47801, 47939, 47994, 48098, 48363, 47815, 48393, 48250, 48213, 48470, 48408, 48509, 48499, 48115, 48448, 48877, 461, 560, 637, 48902, 641, 49019, 667, 877, 1306, 1070, 1265.",,,,"SQL statement ""CREATE TEMPORARY TABLE

An example of a message with 140 second wait duration:

process 27173 acquired ExclusiveLock on extension of relation 16787 of database 16406 after 138522.699 ms",,,,,"SQL statement ""INSERT INTO

There are about 50 queries a second, so a 10 second lock will exhaust the number of available connections.
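Next time it happens I will also try to capture the extension lock waiters directly from pg_locks, with something like this (untested sketch):

    -- Relation extension locks currently held or waited on
    SELECT relation::regclass, pid, granted
    FROM pg_locks
    WHERE locktype = 'extend';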
We keep sar logs, and they show an unusual amount of pgscand/s and pgsteal/s around the time of the problem:

07:25:01 PM  pgpgin/s  pgpgout/s    fault/s  majflt/s   pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
08:25:01 PM     29.93    1069.13  393727.33      0.00  170655.03       0.00      90.20      90.20  100.00
08:26:01 PM     20.14    1325.55  388751.51      0.00  157188.40       0.00     315.81     315.81  100.00
08:27:02 PM     59.33    1215.97  612453.98      0.00  271483.71       0.00     180.40     180.40  100.00
08:28:01 PM     43.28    1881.71  473067.50      0.00  236778.10       0.00     114.87     114.87  100.00
08:29:01 PM     23.49     665.73  406865.61      0.00  213178.17       0.00     293.11     293.11  100.00
08:30:01 PM     31.39    1317.51  448565.03      0.02  193320.76       0.00     225.88     225.83   99.98
-- Problem starts here
08:31:02 PM     21.16    1499.32  468467.44      0.10  211767.21       0.00    4311.87    4311.84  100.00
08:32:01 PM     10.19     648.87  261067.16      0.05  167231.84       0.00    1071.01    1070.98  100.00
08:33:01 PM     63.59     950.94  422101.19      0.22  242284.12       0.00    1243.37    1243.34  100.00
-- Problem ends here
08:34:01 PM     24.97    1321.61  412294.87      0.00  273734.03       0.00       0.00       0.00    0.00
08:35:01 PM     11.60    1094.22  353741.41      0.00  238541.73       0.00       0.00       0.00    0.00
08:36:01 PM     39.22     976.60  368450.80      0.10  240632.57       0.00       0.00       0.00    0.00
08:37:01 PM     19.83     967.31  320415.39      0.00  217557.42       0.00       0.00       0.00    0.00
08:38:01 PM     15.68    1884.09  301785.58      0.00  200274.51       0.00       0.00       0.00    0.00
08:39:01 PM     62.61     858.31  487099.01      0.00  330130.65       0.00       0.00       0.00    0.00

pgscand/s = Number of pages scanned directly per second.
pgsteal/s = Number of pages the system has reclaimed from cache (pagecache and swapcache) per second to satisfy its memory demands.

Could the pgscand and pgsteal numbers provide a hint? They're sometimes zero for more than half an hour, so they don't seem related to checkpoints.

Kind regards,
Andomar
On 4/17/15 4:22 PM, Andomar wrote:
>> Yes, but did you have the same workload when you upgraded to 9.3 as
>> you do today?
>
> The workload is very similar. We upgraded from 9.1 to 9.3 only two
> months ago, and our usage statistics have not changed much. There were
> no "remaining connection slots are reserved for non-replication
> superuser connections" messages in the weeks after the 9.3 upgrade.
>
>> BTW, something else to be aware of: because you essentially re-loaded
>> all your data in a single transaction, that means that a: you'll be
>> doing a lot of hint bit updates until all data has been read a second
>> time, and b: a lot of this data will come up for freezing at around
>> the same time, creating a big chunk of work for autovacuum. That's
>> caused problems for me in the past, though that was on a database that
>> had a pretty high workload.
>
> We run "VACUUM ALL" every night, and although that gave a bit more CPU
> and I/O, it was nothing major.
>
> When the problem occurs, the symptoms are:
> - spike to a very high load average (= CPU usage)
> - messages about long waits on ExclusiveLock on extension appear
> - remaining connection slots are reserved (with corresponding end user
> impact)
>
> An example of a message with 1 second wait duration and a lot of waiting
> queries:
>
> process 667 still waiting for ExclusiveLock on extension of relation
> 1249 of database 16406 after 1005.226 ms","Process holding the lock:
> 36279. Wait queue: 36725, 36405, 36511, 36721, 36280, 36048, 36566, ...
> 49019, 667, 877, 1306, 1070, 1265.",,,,"SQL statement ""CREATE TEMPORARY
> TABLE

Ok, that's a MAJOR hint, because relation 1249 is a system catalog; namely pg_attribute. So I think what's happening here is that your catalog has become horrifically bloated. I'm 99% certain that VACUUM ALL will not vacuum the catalog tables.

Do you by chance have autovacuum turned off?

A manual VACUUM VERBOSE pg_attribute might provide some immediate relief.

> An example of a message with 140 second wait duration:
>
> process 27173 acquired ExclusiveLock on extension of relation 16787 of
> database 16406 after 138522.699 ms",,,,,"SQL statement ""INSERT INTO

This is a different case though, because 16787 is not a catalog table. (SELECT 16787::regclass; will tell you what table that is).

> There are about 50 queries a second, so a 10 second lock will exhaust
> the number of available connections.

Are you using a connection pool? Establishing 50 new database connections per second won't do anything to help performance... Running out of connections in this scenario isn't surprising.

> We keep sar logs, and they show an unusual amount of pgscand/s and
> pgsteal/s around the time of the problem:
>
> 07:25:01 PM  pgpgin/s  pgpgout/s    fault/s  majflt/s   pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
> 08:25:01 PM     29.93    1069.13  393727.33      0.00  170655.03       0.00      90.20      90.20  100.00
> 08:26:01 PM     20.14    1325.55  388751.51      0.00  157188.40       0.00     315.81     315.81  100.00
...
> pgscand/s = Number of pages scanned directly per second.
> pgsteal/s = Number of pages the system has reclaimed from cache
> (pagecache and swapcache) per second to satisfy its memory demands.
>
> Could the pgscand and pgsteal numbers provide a hint? They're sometimes
> zero for more than half an hour, so they don't seem related to checkpoints.

I think what that means is that there was suddenly a big spike in memory demand at the OS level, so now the OS is frantically dumping cached pages. That in itself won't explain this, but it may be a clue.
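Going back to the pg_attribute point: a rough way to check whether it really is bloated is to compare its on-disk size with the row estimate (nothing fancy, just catalog numbers):

    -- A very large size with only a few thousand rows suggests bloat.
    SELECT pg_size_pretty(pg_relation_size('pg_attribute')) AS on_disk,
           reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE oid = 'pg_attribute'::regclass;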
In order to extend a relation we need to ask the filesystem to actually extend the file (which presumably means at least writing some metadata to disk), and then I think we create a WAL record. Creating the WAL record won't by itself write to disk... *unless* the wal_buffers are all already full. So if you also see an I/O spike when this happens you could well just be starved from the I/O system (though obviously it'd be better if we handled that situation more elegantly than this).

Another possibility given the big uptick in page scanning in the OS is that you're saturating the memory bus... but you'd need WAY more than 5k pages/s to do that. I do suspect your pgfree/s is very high though; putting 200k pages/s on the free list seems like something's broken.

BTW, http://www.postgresql.org/message-id/466D72D2-68EC-4FF4-93B8-43B687E7B705@simply.name contains a list of URLs about relation extension problems.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
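P.S. Another thing that might be worth watching during a spike (just a suggestion) is pg_stat_bgwriter: if buffers_backend and especially buffers_backend_fsync climb quickly, ordinary backends are having to write out (and sometimes fsync) dirty buffers themselves, which points at buffer/IO pressure:

    SELECT buffers_checkpoint, buffers_clean, maxwritten_clean,
           buffers_backend, buffers_backend_fsync, buffers_alloc, stats_reset
    FROM pg_stat_bgwriter;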
To put the top question first:

How can table extension locks explain a massive spike in CPU usage?

I can imagine 400 connections waiting on disk I/O, but then, wouldn't they all be sleeping?

> Ok, that's a MAJOR hint, because relation 1249 is a system catalog;
> namely pg_attribute. So I think what's happening here is that your
> catalog has become horrifically bloated. I'm 99% certain that VACUUM ALL
> will not vacuum the catalog tables.
>
> Do you by chance have autovacuum turned off?
>
> A manual VACUUM VERBOSE pg_attribute might provide some immediate relief.
>
Autovacuum is turned on. In addition, we do a manual VACUUM ALL at night. VACUUM VERBOSE pg_attribute ran in 0 seconds and processed a few hundred rows.

> Are you using a connection pool? Establishing 50 new database
> connections per second won't do anything to help performance...
>
As I understand it, a pool reduces network and CPU load. We have never seen any issues with those. So the extra monitoring and maintenance cost of a pool seems hard to justify.

> I think what that means is that there was suddenly a big spike in memory
> demand at the OS level, so now the OS is frantically dumping cached
> pages. That in itself won't explain this, but it may be a clue.
>
We monitor memory usage with Cacti. It's a dedicated server and nearly all memory is used as cache. If a script runs and demands memory, that becomes visible as cache is cleared out. There is no change in the amount of memory used as cache around the outage.

> In order to extend a relation we need to ask the filesystem to actually
> extend the file (which presumably means at least writing some metadata
> to disk), and then I think we create a WAL record. Creating the WAL
> record won't by itself write to disk... *unless* the wal_buffers are all
> already full.
>
I have a question here: we have "synchronous_commit = off". So when Postgres extends a page, would it do that just in memory, or does part of the extend operation require synchronous I/O?

> So if you also see an I/O spike when this happens you could well
> just be starved from the I/O system (though obviously it'd be
> better if we handled that situation more elegantly than this).

The SAR data shows no increase in pgpgin/s and pgpgout/s, which if I understand it correctly, means that there is no I/O spike. There is however an enormous increase in CPU usage.

> I do suspect your pgfree/s is very high though; putting 200k pages/s on
> the free list seems like something's broken.
>
The system has constant and considerable load of small writes. The pg_activity tool shows 300 IOPs sustained (it claims max IOPS above 11000.) Postgres 9.3 had a comparable pgfree/s.

Would you know a good resource to get more knowledgeable about pgfree, pgpgin, pgsteal?

Kind regards,
Andomar
On 4/19/15 4:24 AM, Andomar wrote:
> To put the top question first:
>
> How can table extension locks explain a massive spike in CPU usage?
>
> I can imagine 400 connections waiting on disk I/O, but then, wouldn't
> they all be sleeping?

Not necessarily. Spinlocks don't put the process to sleep, but they're also supposed to be very short lived.

> > Ok, that's a MAJOR hint, because relation 1249 is a system catalog;
> > namely pg_attribute. So I think what's happening here is that your
> > catalog has become horrifically bloated. I'm 99% certain that VACUUM ALL
> > will not vacuum the catalog tables.
> >
> > Do you by chance have autovacuum turned off?
> >
> > A manual VACUUM VERBOSE pg_attribute might provide some immediate relief.
> >
> Autovacuum is turned on. In addition, we do a manual VACUUM ALL at
> night. VACUUM VERBOSE pg_attribute ran in 0 seconds and processed a few
> hundred rows.
>
> > Are you using a connection pool? Establishing 50 new database
> > connections per second won't do anything to help performance...
> >
> As I understand it, a pool reduces network and CPU load. We have never
> seen any issues with those. So the extra monitoring and maintenance
> cost of a pool seems hard to justify.

Well, it sounds like you are CPU bound here... :P I don't know if this is related or not, but it wouldn't hurt. If you install pgbouncer on the database server itself (which it's designed for) it shouldn't add much maintenance cost.

> > I think what that means is that there was suddenly a big spike in memory
> > demand at the OS level, so now the OS is frantically dumping cached
> > pages. That in itself won't explain this, but it may be a clue.
> >
> We monitor memory usage with Cacti. It's a dedicated server and nearly
> all memory is used as cache. If a script runs and demands memory, that
> becomes visible as cache is cleared out. There is no change in the
> amount of memory used as cache around the outage.
>
> > In order to extend a relation we need to ask the filesystem to actually
> > extend the file (which presumably means at least writing some metadata
> > to disk), and then I think we create a WAL record. Creating the WAL
> > record won't by itself write to disk... *unless* the wal_buffers are all
> > already full.
> >
> I have a question here: we have "synchronous_commit = off". So when
> Postgres extends a page, would it do that just in memory, or does part
> of the extend operation require synchronous I/O?

Turning that off doesn't mean there will never be an fsync, it just means that we don't wait for one before returning from COMMIT. I don't think relation extension itself requires an fsync, but see below.

> > So if you also see an I/O spike when this happens you could well
> > just be starved from the I/O system (though obviously it'd be
> > better if we handled that situation more elegantly than this).
>
> The SAR data shows no increase in pgpgin/s and pgpgout/s, which if I
> understand it correctly, means that there is no I/O spike. There is
> however an enormous increase in CPU usage.

I'm not familiar enough with SAR to know if that's correct or not; iostat would be a good way to confirm it.

> > I do suspect your pgfree/s is very high though; putting 200k pages/s on
> > the free list seems like something's broken.
> >
> The system has constant and considerable load of small writes. The
> pg_activity tool shows 300 IOPs sustained (it claims max IOPS above
> 11000.) Postgres 9.3 had a comparable pgfree/s.

That leads me to a new theory... you may be running into problems finding free buffers in the buffer pool. We need to have a buffer before we can extend a relation, and if you have a lot of pressure on shared buffers it can take quite a bit of CPU to find one. To make matters worse, that search for a buffer takes place while holding the extension lock.

Would you be able to get a stack trace of a backend that's holding an extension lock? Or maybe perf would provide some insight.

> Would you know a good resource to get more knowledgeable about pgfree,
> pgpgin, pgsteal?

Unfortunately I'm not strong on the system tools.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
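P.S. If the free-buffer theory is right, the buffer usage counts should show it. A rough way to look (just a sketch; it needs the pg_buffercache extension):

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- If almost every buffer sits at the maximum usage count (5), the clock
    -- sweep has to do a lot of work to find a victim buffer.
    SELECT usagecount, count(*) AS buffers
    FROM pg_buffercache
    GROUP BY usagecount
    ORDER BY usagecount;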
On 4/19/15 3:08 PM, Jim Nasby wrote:
>> > I do suspect your pgfree/s is very high though; putting 200k pages/s on
>> > the free list seems like something's broken.
>> >
>> The system has constant and considerable load of small writes. The
>> pg_activity tool shows 300 IOPs sustained (it claims max IOPS above
>> 11000.) Postgres 9.3 had a comparable pgfree/s.
>
> That leads me to a new theory... you may be running into problems
> finding free buffers in the buffer pool. We need to have a buffer before
> we can extend a relation, and if you have a lot of pressure on shared
> buffers it can take quite a bit of CPU to find one. To make matters
> worse, that search for a buffer takes place while holding the extension
> lock.
>
> Would you be able to get a stack trace of a backend that's holding an
> extension lock? Or maybe perf would provide some insight.

BTW, http://postgresql.org/message-id/flat/20150329185619.GA29062@alap3.anarazel.de has some useful info about this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
> Would you be able to get a stack trace of a backend that's holding an
> extension lock? Or maybe perf would provide some insight.
>
The outage occurred again but briefer. There were no ExclusiveLock messages, presumably because the timeout for logging locks was not exceeded. But all available connection slots were used and many incoming requests were denied.

Below you'll find the "perf report" and the "zoomed in" (I suppose that's the callstack) of the top entry. The top entry's call stack has these 4 postmaster functions near the top:

PinBuffer
LockRelease
hash_any
HeapTupleSatisfiesMVCC

We'll be rolling back from 9.4.1 to 9.3.6 tonight, hoping that will resolve the issue.

===== BELOW A 10 SECOND perf top CAPTURE DURING THE OUTAGE

# ========
# captured on: Mon Apr 20 20:34:43 2015
# hostname : db1a
# os release : 2.6.32-504.1.3.el6.x86_64
# perf version : 2.6.32-504.1.3.el6.x86_64.debug
# arch : x86_64
# nrcpus online : 24
# nrcpus avail : 24
# cpudesc : Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz
# cpuid : GenuineIntel,6,45,7
# total memory : 49373964 kB
# cmdline : /usr/bin/perf record -a -o 2015-04-20_20:34:28 sleep 10
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 0, attr_mmap = 1, attr_mmap_data = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: uncore_r3qpi_1 = 23, uncore_r3qpi_0 = 22, uncore_r2pcie = 21, uncore_qpi_1 = 20, uncore_qpi_0 = 19, uncore_imc_3 = 18, uncore_imc_2 = 17, uncore_imc_1 = 16, uncore_imc_0 = 15, uncore_ha = 14, uncore_pcu = 13, uncore_cbox_5 = 12, uncore_c
# ========
#
# Samples: 960K of event 'cycles'
# Event count (approx.): 757057831613
#
# Overhead  Command     Shared Object      Symbol
# ........  ..........  .................  .................................
    59.73%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
     1.31%  postmaster  [kernel.kallsyms]  [k] _spin_lock
     0.94%  postmaster  [kernel.kallsyms]  [k] __reset_isolation_suitable
     0.78%  postmaster  [kernel.kallsyms]  [k] compact_zone
     0.67%  postmaster  [kernel.kallsyms]  [k] get_pageblock_flags_group
     0.64%  postmaster  [kernel.kallsyms]  [k] copy_page_c
     0.48%  :13410      [kernel.kallsyms]  [k] compaction_alloc
     0.45%  :13465      [kernel.kallsyms]  [k] compaction_alloc
     0.45%  postmaster  [kernel.kallsyms]  [k] clear_page_c
     0.44%  postmaster  postgres           [.] hash_search_with_hash_value
     0.41%  :13324      [kernel.kallsyms]  [k] compaction_alloc
     0.40%  :13561      [kernel.kallsyms]  [k] compaction_alloc
     0.38%  :13374      [kernel.kallsyms]  [k] compaction_alloc
     0.37%  :13272      [kernel.kallsyms]  [k] compaction_alloc
     0.37%  postmaster  [kernel.kallsyms]  [k] unmap_vmas
     0.36%  postmaster  [kernel.kallsyms]  [k] page_fault
     0.36%  :13380      [kernel.kallsyms]  [k] compaction_alloc
     0.35%  :13482      [kernel.kallsyms]  [k] compaction_alloc
     0.34%  :13555      [kernel.kallsyms]  [k] compaction_alloc
     0.34%  postmaster  [kernel.kallsyms]  [k] set_pageblock_flags_group
     0.34%  postmaster  [kernel.kallsyms]  [k] page_check_address
     0.33%  :13528      [kernel.kallsyms]  [k] compaction_alloc
     0.33%  :13464      [kernel.kallsyms]  [k] compaction_alloc
     0.31%  :13547      [kernel.kallsyms]  [k] compaction_alloc
     0.30%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
     0.29%  :13395      [kernel.kallsyms]  [k] compaction_alloc
     0.29%  :13546      [kernel.kallsyms]  [k] compaction_alloc
     0.28%  postmaster  [kernel.kallsyms]  [k] remove_migration_pte
     0.28%  :13355      [kernel.kallsyms]  [k] compaction_alloc
     0.28%  postmaster  [kernel.kallsyms]  [k] list_del
     0.28%  :13432      [kernel.kallsyms]  [k] compaction_alloc
     0.27%  :13258      [kernel.kallsyms]  [k] compaction_alloc
     0.27%  :13328      [kernel.kallsyms]  [k] compaction_alloc
     0.26%  postmaster  [kernel.kallsyms]  [k] __wake_up_bit
     0.26%  :13361      [kernel.kallsyms]  [k] compaction_alloc
     0.25%  :13334      [kernel.kallsyms]  [k] compaction_alloc
     0.25%  :13366      [kernel.kallsyms]  [k] compaction_alloc
     0.25%  :13549      [kernel.kallsyms]  [k] compaction_alloc
     0.25%  :13530      [kernel.kallsyms]  [k] compaction_alloc
     0.24%  :13391      [kernel.kallsyms]  [k] compaction_alloc
     0.24%  :13387      [kernel.kallsyms]  [k] compaction_alloc
     0.24%  :13364      [kernel.kallsyms]  [k] compaction_alloc
     0.24%  :13529      [kernel.kallsyms]  [k] compaction_alloc
     0.24%  postmaster  [kernel.kallsyms]  [k] find_get_page
     0.23%  postmaster  postgres           [.] _bt_compare
     0.22%  :13522      [kernel.kallsyms]  [k] compaction_alloc
     0.22%  :13579      [kernel.kallsyms]  [k] compaction_alloc
     0.22%  :13412      [kernel.kallsyms]  [k] compaction_alloc
     0.21%  postmaster  postgres           [.] AllocSetAlloc
     0.21%  :13407      [kernel.kallsyms]  [k] compaction_alloc
     0.20%  :13569      [kernel.kallsyms]  [k] compaction_alloc
     0.20%  :13322      [kernel.kallsyms]  [k] compaction_alloc
     0.20%  :13554      [kernel.kallsyms]  [k] compaction_alloc
     0.20%  :13327      [kernel.kallsyms]  [k] compaction_alloc

===== BELOW IS THE ZOOM OF THE TOP ENTRY

Samples: 604K of event 'cycles', Event count (approx.): 476360854137, Thread: postmaster(11565)
    94.92%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
     1.49%  postmaster  [kernel.kallsyms]  [k] __reset_isolation_suitable
     1.01%  postmaster  [kernel.kallsyms]  [k] copy_page_c
     0.54%  postmaster  [kernel.kallsyms]  [k] set_pageblock_flags_group
     0.41%  postmaster  [kernel.kallsyms]  [k] __wake_up_bit
     0.37%  postmaster  [kernel.kallsyms]  [k] find_get_page
     0.16%  postmaster  [kernel.kallsyms]  [k] find_vma
     0.16%  postmaster  libc-2.12.so       [.] _int_malloc
     0.16%  postmaster  postgres           [.] PinBuffer
     0.14%  postmaster  [kernel.kallsyms]  [k] __alloc_pages_nodemask
     0.11%  postmaster  [kernel.kallsyms]  [k] filemap_fault
     0.08%  postmaster  postgres           [.] LockRelease
     0.08%  postmaster  postgres           [.] hash_any
     0.06%  postmaster  libc-2.12.so       [.] _int_free
     0.06%  postmaster  [kernel.kallsyms]  [k] page_remove_rmap
     0.03%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
     0.03%  postmaster  [kernel.kallsyms]  [k] free_pages_and_swap_cache
     0.03%  postmaster  postgres           [.] s_lock
     0.02%  postmaster  [kernel.kallsyms]  [k] unlink_anon_vmas
     0.02%  postmaster  [kernel.kallsyms]  [k] mem_cgroup_lru_add_list
     0.02%  postmaster  postgres           [.] ReindexIsProcessingIndex
     0.01%  postmaster  [kernel.kallsyms]  [k] set_page_dirty
     0.01%  postmaster  postgres           [.] eqjoinsel
     0.01%  postmaster  [kernel.kallsyms]  [k] perf_event_aux_ctx
     0.01%  postmaster  postgres           [.] hash_seq_search
     0.01%  postmaster  postgres           [.] get_mergejoin_opfamilies
     0.01%  postmaster  [kernel.kallsyms]  [k] acl_permission_check
     0.01%  postmaster  libc-2.12.so       [.] __sigsetjmp
     0.01%  postmaster  postgres           [.] GetUserId
     0.00%  postmaster  postgres           [.] pull_up_subqueries_recurse
     0.00%  postmaster  [kernel.kallsyms]  [k] do_brk
     0.00%  postmaster  postgres           [.] SearchSysCacheList
     0.00%  postmaster  postgres           [.] join_search_one_level
Another update (let me know if I'm mailing too much).

> 59.73%  postmaster  [kernel.kallsyms]  [k] compaction_alloc

The compaction_alloc function seemed to point to issues with transparent huge pages (THP). I tried to turn this off with:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

This change yielded immediate results. The "migration/xx" processes disappeared from "top", the function "compaction_alloc" disappeared from "perf top", and system load dropped from 5 to 0.3. We've called off the 9.4 -> 9.3 downgrade for now.

For anyone else finding this thread, we are running CentOS 6.5 with kernel 2.6.32 (which was released in Dec 2009).

Cheers,
Andomar