Re: Sudden connection and load average spikes with postgresql 9.3 - Mailing list pgsql-performance

From eudald_v
Subject Re: Sudden connection and load average spikes with postgresql 9.3
Date
Msg-id 1435851680008-5856298.post@n5.nabble.com
Whole thread Raw
In response to Re: Sudden connection and load average spikes with postgresql 9.3  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Dear Josh,
I'm sorry I didn't write before, but we have been very busy with this issue
and, you know, when something goes wrong, the apocalypse comes with it.

I've been working on everything you suggested.

I used your tables and script and I can give you a sample of it on
locked_query_start
 2015-07-02 14:49:45.972129+02 |       15314 |          |       4001 |
| "TABLE_Z"     | tuple        | ExclusiveLock | 24018:24      | relation
| ShareUpdateExclusiveLock |              |             | YYY.YYY.YYY.YYY/32
| 2015-07-02 14:49:26.635599+02 | 2015-07-02 14:49:26.635599+02 | 2015-07-02
14:49:26.635601+02 | INSERT INTO "TABLE_X" ("field1", "field2", "field3",
"field4", "field5", "field6", "field7") VALUES (22359509, 92, 5, 88713,
'XXX.XXX.XXX.XXX', 199, 10)  |            |                  | 2015-07-02
14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active       |
2015-07-02 14:11:45.36871+02  | autovacuum: VACUUM ANALYZE public.TABLE_Z

 2015-07-02 14:49:45.972129+02 |       15857 |          |       4001 |
| "TABLE_Z"     | tuple        | ExclusiveLock | 24018:24      | relation
| ShareUpdateExclusiveLock |              |             | YYY.YYY.YYY.YYY/32
| 2015-07-02 14:49:22.79166+02  | 2015-07-02 14:49:22.79166+02  | 2015-07-02
14:49:22.791665+02 | INSERT INTO "TABLE_X" ("field1", "field2", "field3",
"field4", "field5", "field6", "field7") VALUES (14515978, 92, 5, 88713,
'XXX.XXX.XXX.XXX', 199, 10) |            |                  | 2015-07-02
14:11:45.368709+02 | 2015-07-02 14:11:45.368709+02 | active       |
2015-07-02 14:11:45.36871+02  | autovacuum: VACUUM ANALYZE public.TABLE_Z

2015-07-02 14:49:45.972129+02 |       15314 |          |      14712 |
| "TABLE_Z"     | tuple        | ExclusiveLock | 24018:24      | relation
| AccessShareLock          |              |             |
1YYY.YYY.YYY.YYY/32 | 2015-07-02 14:49:26.635599+02 | 2015-07-02
14:49:26.635599+02 | 2015-07-02 14:49:26.635601+02 | INSERT INTO "TABLE_X"
("field1", "field2", "field3", "field4", "field5", "field6", "field") VALUES
(22359509, 92, 5, 88713, 'XXX.XXX.XXX.XXX', 199, 10)  |            |
185.10.253.72/32 | 2015-07-02 14:48:48.841375+02 | 2015-07-02
14:48:48.841375+02 | active       | 2015-07-02 14:48:48.841384+02 | INSERT
INTO "TABLE_Y" ("email_id", "sendout_id", "feed_id", "isp_id")  VALUES
(46015879, 75471, 419, 0)

All that was recorded during a spike. From this log I have to point
something:
Tables TABLE_X and TABLE_Y have both a TRIGGER that does an INSERT to
TABLE_Z
As you can see, TABLE_Z was being VACUUM ANALYZED. I wonder if TRIGGERS and
VACUUM work well together, just to check another perspective.

We also have carefully looked at our scripts and we have performed some code
optimitzations (like close db connections earlier), but the spikes continue
to happen.

FS is ext4 and I don't know how can I check the transaction log
configuration

This is how IO lookslike before and after any problem happens:
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             357,00      7468,00      8840,00       7468       8840
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5,02    0,00    2,44    0,06    0,00   92,47
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             471,00      7032,00     13760,00       7032      13760
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5,14    0,00    2,92    0,03    0,00   91,92
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             376,00      7192,00      8048,00       7192       8048
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,77    0,00    2,57    0,03    0,00   92,63
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             304,00      7280,00      8252,00       7280       8252

And this is how it looks like when the spike happens:
http://pastebin.com/2hAYuDZ5

Hope it can help into determining what's happening.

Thanks for all your efforts and collaboration!
Eudald



--
View this message in context:
http://postgresql.nabble.com/Sudden-connection-and-load-average-spikes-with-postgresql-9-3-tp5855895p5856298.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: New server: SSD/RAID recommendations?
Next
From: Craig James
Date:
Subject: Re: New server: SSD/RAID recommendations?