Thread: PostgreSQL 9.2.3 performance problem caused Exclusive locks
Hi, I upgraded our master database server from 9.2.2 to 9.2.3 on Monday. We have been experiencing performance problems since then. Yesterday, our application hit the connection limit 5 times. It causes approximately 15 seconds of downtime. The database server hit 50 load average, then everything came back to normal. We have a very good database server dedicated to PostgreSQL. It has 64 cores and 200 GiB of RAM which is 2 times bigger than our database. We run PostgreSQL on RHEL relase 6.2. The database executes 2k transactions per second in busy hours. The server is running 1 - 2 load average normally. PostgreSQL writes several following logs during the problem which I never saw before 9.2.3: LOG: process 4793 acquired ExclusiveLock on extension of relation 305605 of database 16396 after 2348.675 ms The relation 305605 was the biggest table of the database. Our application stores web service logs as XML's on that table. It is only used to insert new rows. One row is approximately 2 MB and 50 rows inserted per second at most busy times. We saw autovacuum processes during the problem. We disabled autovacuum for that table but is does not help. I tried to archive the table. Create a new empty one, but it does not help, too. We also have an unlogged table to used by our application for locking. It is autovacuumed every 5 minutes as new rows are inserted and deleted continuously. Most of our configuration parameters remain default except the following: max_connections = 200 shared_buffers = 64GB max_prepared_transactions = 0 work_mem = 64MB maintenance_work_mem = 512MB shared_preload_libraries = '$libdir/pg_stat_statements' wal_level = hot_standby checkpoint_segments = 40 effective_cache_size = 128GB track_activity_query_size = 8192 autovacuum = on autovacuum_max_workers = 10 I will try to reduce autovacuum_max_workers and increase max_connections to avoid downtime. Do you have any other suggestions? Do you know what might have caused this problem? Do you think downgrading to 9.2.2 is a good idea?
On Fri, 08 Mar 2013 20:39:45 +0200, AI Rumman <rummandba@gmail.com> wrote: > Knowing your problem, I read the docs and found that : > * > * > > *Fix performance problems with autovacuum truncation in busy workloads > (Jan > Wieck)* > > *Truncation of empty pages at the end of a table requires exclusive lock, > but autovacuum was coded to fail (and release the table lock) when there > are conflicting lock requests. Under load, it is easily possible that > truncation would never occur, resulting in table bloat. Fix by > performing a > partial truncation, releasing the lock, then attempting to re-acquire the > lock and > continue<http://www.postgresql.org/docs/9.2/static/release-9-2-3.html#>. > This fix also greatly reduces the average time before autovacuum releases > the lock after a conflicting request arrives.* > > This could be a reason of your locking. Yes, I saw this. It is commit b19e4250b45e91c9cbdd18d35ea6391ab5961c8d by Jan Wieck. He also seems worried in the commit message about this patch. Do you think this is the exact reason of the problem? I have downgraded to 9.2.2, decreased the autovacuum_max_workers to 2 from 10 and increase max_connections to 500 from 200 in the mean time. There are not any ExclusiveLock's since then.
2013-03-08 13:27:16 +0200 Emre Hasegeli <emre.hasegeli@tart.com.tr>: > PostgreSQL writes several following logs during the problem which I never > saw before 9.2.3: > > LOG: process 4793 acquired ExclusiveLock on extension of relation > 305605 of database 16396 after 2348.675 ms I tried * to downgrade to 9.2.2 * to disable autovacuum * to disable synchronous commit * to write less on the big tables * to increase checkpoint segments * to increase max connections * to move pg_xlog to sepe None of them helps to avoid downtimes. I could not find anything related to it? Do you have any idea? Have you ever experience something like this?
Emre, > > LOG: process 4793 acquired ExclusiveLock on extension of relation > > 305605 of database 16396 after 2348.675 ms The reason you're seeing that message is that you have log_lock_waits turned on. That message says that some process waited for 2.3 seconds to get a lock for expanding the size of relation 16396/305605,which is most likely an index. This is most likely due to changes in your application, or an increase in concurrentwrite activity. --Josh
On Friday, March 8, 2013, Emre Hasegeli wrote:
PostgreSQL writes several following logs during the problem which I never
saw before 9.2.3:
LOG: process 4793 acquired ExclusiveLock on extension of relation 305605 of database 16396 after 2348.675 ms
The key here is not that it is an ExclusiveLock, but rather than it is the relation extension lock. I don't think the extension lock is ever held across user-code, or transaction boundaries, or anything like that. It is held over some small IOs. So if it blocked on that for over 2 seconds, you almost surely have some serious IO congestion.
And this particular message is probably more a symptom of that congestion than anything else.
You said you rolled back to 9.2.2 and the stalling is still there. Are you still seeing the log message, or are you now seeing silently stalls? Did you roll back all other changes that were made at the same time as the upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)?
Cheers,
Jeff
On Thu, 14 Mar 2013 06:53:55 +0200, Jeff Janes <jeff.janes@gmail.com> wrote: > On Friday, March 8, 2013, Emre Hasegeli wrote: > > PostgreSQL writes several following logs during the problem which I never >> saw before 9.2.3: >> >> LOG: process 4793 acquired ExclusiveLock on extension of relation >> 305605 >> of database 16396 after 2348.675 ms >> > > The key here is not that it is an ExclusiveLock, but rather than it is > the > relation extension lock. I don't think the extension lock is ever held > across user-code, or transaction boundaries, or anything like that. It > is > held over some small IOs. So if it blocked on that for over 2 seconds, > you > almost surely have some serious IO congestion. > > And this particular message is probably more a symptom of that congestion > than anything else. > > You said you rolled back to 9.2.2 and the stalling is still there. Are > you > still seeing the log message, or are you now seeing silently stalls? Did > you roll back all other changes that were made at the same time as the > upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)? I did not try with different kernel or file system. It was not because of 9.2.3, same problem occurred in both 9.2.2 and 9.2.3. Increasing max connections make it worse. It lasts almost 15 minutes in the last time. There were not much disk utilization while it is happening, "top" was pointing out most of the CPU usage on the %sy column, there were no IO wait. I saw "allocstalls" increasing on "atop". There were a lot of slow insert statements in the logs except ExclusiveLock waits. We were using 64 GiB of shared buffers. RhodiumToad suggested to reduce it on the IRC channel. It did not happen since then. It was a real problem for us. I could not find anything related to it. I cannot let it happen again on the production environment but I would be happy to share more experience, if it would help you fix it.
I am having the same exact problems. I reduced shared buffers as that seems to have done the trick for now in this thread. If things improve I'll post back and confirm. -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-9-2-3-performance-problem-caused-Exclusive-locks-tp5747909p5756113.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.