Hi Jeff and all,
On 23/05/15 22:13, Jeff Janes wrote:
> Are you sure it is the read IO that causes the problem?
Yes. Trouble is here that we are talking about a 361 GB table
List of relationsSchema | Name | Type | Owner | Size
|
Description
--------+-----------------------------+----------+----------+------------+-------------public | *redacted*_y2015m04
| table | postgres | 361 GB |
and while we have
shared_buffers = 325GBhuge_pages = on
this is not the only table of this size (total db size ist 1.8tb) and more
current data got written to *redacted*_y2015m05 (the manually-partitioned table
for may), so most of the m04 data would have got evicted from the cache when
this issue surfaced initially.
There is one application pushing data (bulk inserts) and we have transaction
rates for this app in a log. The moment the vacuum started, these rates dropped.
Unfortunately I cannot present helpful log excerpts here as the autovacuum never
finished so far (because the admin killed the db), so we have zero logging about
past autovac events.
At the moment, the application is shut down and the machine is only running the
vacs:
query_start | 2015-05-22 19:33:52.44334+02
waiting | f
query | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent
wraparound)
query_start | 2015-05-22 19:34:02.46004+02
waiting | f
query | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to
prevent wraparound)
so we know that any io must be caused by the vacs:
shell# uptime13:33:33 up 1 day, 18:01, 2 users, load average: 5.75, 12.71, 8.43
shell# zpool iostat capacity operations bandwidth
pool alloc free read write read write
--------------- ----- ----- ----- ----- ----- -----
tank1 358G 6.90T 872 55 15.1M 3.08M
Again, we know IO capacity is insufficient, the pool is on 2 magnetic disks only
atm, so an avg read rate of 872 IOPS averaged over 42 hours is not even bad.
> I don't know happened to that, but there is another patch waiting for review and
> testing:
>
> https://commitfest.postgresql.org/5/221/
This is really interesting, thank you very much for the pointer.
Cheers, Nils