Server vacuuming the same table again and again - Mailing list pgsql-performance

From Дмитрий Шалашов
Subject Server vacuuming the same table again and again
Date
Msg-id CAKPeCUE60iAdwDCJv82DTNQYpJgrM_O2ehTPBAdbx6kqFWrunA@mail.gmail.com
Responses Re: Server vacuuming the same table again and again
Re: Server vacuuming the same table again and again
List pgsql-performance
Hi!

Half a day ago one of our production PG servers (arguably the busiest one) became very slow. I went to investigate and found that it was simultaneously running an autovacuum (VACUUM ANALYZE) of 'recommendations', the largest table on that server, and a checkpoint. Together they produced 100% disk load, which resulted in a queue of queries that only made things worse, of course.
For a while I tried applying different ionice settings to the WAL writer and checkpointer processes (-c 2 -n [5-7]), with no visible effect. Then I cancelled the autovacuum, and that seemed to help.
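A minimal sketch of locating those two helper processes and building the matching ionice calls, assuming Linux /proc and the 9.2-era process titles "postgres: wal writer process" and "postgres: checkpointer process" (verify them with ps first). The helper names find_postgres_helpers, ionice_command and renice_helpers are hypothetical; only the ionice flags -c, -n and -p are the real CLI.

```python
import subprocess
from pathlib import Path

# Assumed 9.2-era process titles for the two helpers (check with `ps` first).
TARGET_TITLES = ("wal writer process", "checkpointer process")


def find_postgres_helpers(proc_root: str = "/proc") -> dict:
    """Map pid -> process title for PostgreSQL WAL writer / checkpointer processes."""
    found = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or cmdline is unreadable
        # PostgreSQL overwrites argv with its status line; NULs or spaces may pad it.
        title = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if title.startswith("postgres:") and any(t in title for t in TARGET_TITLES):
            found[int(entry.name)] = title
    return found


def ionice_command(pid: int, level: int) -> list:
    """Build `ionice -c 2 -n LEVEL -p PID` (best-effort class, priority 0..7)."""
    if not 0 <= level <= 7:
        raise ValueError("best-effort priority must be between 0 and 7")
    return ["ionice", "-c", "2", "-n", str(level), "-p", str(pid)]


def renice_helpers(level: int = 7, apply: bool = False) -> None:
    """Print (and optionally run) ionice for each helper; running needs enough privileges."""
    for pid, title in find_postgres_helpers().items():
        cmd = ionice_command(pid, level)
        print(f"{pid} {title!r}: {' '.join(cmd)}")
        if apply:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    renice_helpers(level=7, apply=False)
```

With apply=False it only prints the commands. Per the ionice(1) man page, I/O priorities are honoured only by I/O schedulers that support them (CFQ on kernels of that era), which may be worth checking given the lack of visible effect.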

When things settled down and the day was coming to an end, I started a VACUUM ANALYZE of this table by hand and kept watching.
The vacuum finished in about two and a half hours. But soon I noticed that the server had started another autovacuum of the same table...
The problems returned and went away once it finished (though I'm not 100% sure it was the cause).

In the morning the autovacuum was back. Then it finished and I went to work. And now I'm here and there is an autovacuum running again %)
And the load is back too. I have to say, though, that sometimes there is an autovacuum running and no load. I'm not really sure autovacuum is the culprit, but there is a correlation, and it behaves strangely anyway.
As far as I know, nothing changed in the application code.
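To reason about why autovacuum keeps picking the same table, a minimal sketch of the documented trigger rules may help: vacuum when dead tuples exceed threshold + scale_factor * reltuples, analyze on the same pattern for changed rows, and always vacuum when age(relfrozenxid) exceeds autovacuum_freeze_max_age. The constants are the stock 9.2 defaults (this server may override them); TableStats and autovacuum_reasons are hypothetical names, and changes_since_analyze is an assumed input because 9.2 does not expose that counter as a column.

```python
from dataclasses import dataclass

# PostgreSQL 9.2 defaults; postgresql.conf or per-table storage parameters may override them.
AUTOVACUUM_VACUUM_THRESHOLD = 50
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2
AUTOVACUUM_ANALYZE_THRESHOLD = 50
AUTOVACUUM_ANALYZE_SCALE_FACTOR = 0.1
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


@dataclass
class TableStats:
    reltuples: float            # pg_class.reltuples (estimated live rows)
    n_dead_tup: int             # pg_stat_user_tables.n_dead_tup
    # Rows inserted/updated/deleted since the last ANALYZE (hypothetical input;
    # 9.2 does not expose this as a pg_stat_user_tables column).
    changes_since_analyze: int
    relfrozenxid_age: int       # age(pg_class.relfrozenxid)


def autovacuum_reasons(s: TableStats) -> list:
    """Return which documented autovacuum triggers this table currently meets."""
    reasons = []
    # Anti-wraparound vacuum is forced regardless of dead-tuple counts.
    if s.relfrozenxid_age > AUTOVACUUM_FREEZE_MAX_AGE:
        reasons.append("wraparound")
    vacuum_threshold = (AUTOVACUUM_VACUUM_THRESHOLD
                        + AUTOVACUUM_VACUUM_SCALE_FACTOR * s.reltuples)
    if s.n_dead_tup > vacuum_threshold:
        reasons.append("vacuum")
    analyze_threshold = (AUTOVACUUM_ANALYZE_THRESHOLD
                         + AUTOVACUUM_ANALYZE_SCALE_FACTOR * s.reltuples)
    if s.changes_since_analyze > analyze_threshold:
        reasons.append("analyze")
    return reasons


if __name__ == "__main__":
    # Hypothetical numbers for a 100M-row table, not taken from the original post.
    print(autovacuum_reasons(TableStats(100_000_000, 25_000_000, 0, 1_000)))
```

Feeding in real values from pg_class and pg_stat_user_tables at different times could show whether the repeated runs are driven by dead tuples or by transaction ID age.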

Any recommendations where to dig further?

PG version: 9.2.8

Server hardware: 2 x E5-2690, 96GB RAM, 8 x 146GB 15k SAS, HP P420i 2G RAID controller, RAID 1 for the system and RAID 50 for the DB.

Performance settings changed:
shared_buffers = 24GB
temp_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 1GB
effective_cache_size = 48GB
effective_io_concurrency = 6 (I just realised I have to set it to 4, right?)
synchronous_commit = off
checkpoint_segments = 64
checkpoint_timeout = 10min
checkpoint_completion_target = 0.8
checkpoint_warning = 3600s
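A rough sketch of the arithmetic behind two of these settings, under stated assumptions. For effective_io_concurrency, the 9.2 documentation suggests starting from the number of data drives in the array, not counting RAID 5 parity; the layout used here (6 of the 8 disks in RAID 50 as two RAID 5 spans of 3) is an assumption, since the post does not give the span layout. For WAL, pre-9.5 documentation bounds pg_xlog at roughly (2 + checkpoint_completion_target) * checkpoint_segments + 1 files of 16 MB. The function names are hypothetical.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size in 9.2


def raid50_data_drives(drives: int, spans: int) -> int:
    """Data-bearing drives in RAID 50: each RAID 5 span loses one drive's worth to parity."""
    if spans < 2 or drives % spans != 0 or drives // spans < 3:
        raise ValueError("RAID 50 needs at least 2 equal spans of 3+ drives")
    return drives - spans


def max_wal_segments(checkpoint_segments: int, completion_target: float) -> int:
    """Pre-9.5 ceiling on pg_xlog files: (2 + completion_target) * checkpoint_segments + 1."""
    return math.floor((2 + completion_target) * checkpoint_segments) + 1


if __name__ == "__main__":
    # Assumed layout: 8 disks = 2 (RAID 1, system) + 6 (RAID 50 as 2 spans x 3 disks).
    print("effective_io_concurrency starting point:", raid50_data_drives(6, 2))
    segs = max_wal_segments(64, 0.8)
    print(f"checkpoint every {64 * WAL_SEGMENT_MB} MB of WAL or 10min; "
          f"up to ~{segs} segments ({segs * WAL_SEGMENT_MB} MB) in pg_xlog")
```

Under that layout the heuristic gives 4. Separately, because checkpoint_warning (3600s) is longer than checkpoint_timeout (10min), every checkpoint forced by filling checkpoint_segments should be logged, so the server log can show whether the checkpoints that coincided with the autovacuum were driven by WAL volume.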

In addition, I set vm.dirty_background_bytes to 134217728 (128 MiB) and vm.dirty_bytes to 1073741824 (1 GiB).
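A small sketch for confirming what the kernel is actually using, assuming Linux and the standard /proc/sys/vm sysctl files. Per the kernel's vm sysctl documentation, each *_bytes knob and its *_ratio counterpart are mutually exclusive, and the one not in use reads back as 0, so zero ratios are expected with the values above. read_dirty_settings and describe are hypothetical helper names.

```python
from pathlib import Path

# Linux writeback knobs; a *_bytes value and its *_ratio counterpart are mutually exclusive.
VM_KNOBS = ("dirty_background_bytes", "dirty_bytes",
            "dirty_background_ratio", "dirty_ratio")


def read_dirty_settings(root: str = "/proc/sys/vm") -> dict:
    """Read whichever dirty-page knobs exist under root (Linux sysctl tree)."""
    values = {}
    for name in VM_KNOBS:
        path = Path(root) / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


def describe(values: dict) -> list:
    """Render byte knobs in MiB and ratio knobs as a percentage."""
    lines = []
    for name, value in values.items():
        if name.endswith("_bytes"):
            lines.append(f"{name} = {value} bytes ({value / 2**20:g} MiB)")
        else:
            lines.append(f"{name} = {value}%")
    return lines


if __name__ == "__main__":
    for line in describe(read_dirty_settings()):
        print(line)
```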

Also, I now believe that using the RAID 1 for the system might be a mistake. Maybe I should use it for WAL instead?

Best regards,
Dmitriy Shalashov
