Thread: lazy vacuum sleeps with exclusive lock on table
Hi,

I noticed that lazy vacuum acquires an exclusive lock at the end, to be
able to truncate the table.  This is not a surprise.  If it cannot
acquire the lock, it simply skips truncating the table and goes on with
life.

However, what's problematic is that if a non-zero cost delay has been
set, it will happily take naps while determining what to truncate :-(
This seems a bad idea.  It may also explain why some people are seeing
autovacuum blocking other processes, even when autovacuum itself shows
no non-granted locks.

Comments?  I think we should remove the sleep in the truncate phase.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, 2007-06-28 at 17:16 -0400, Alvaro Herrera wrote:

> I noticed that lazy vacuum acquires an exclusive lock at the end, to be
> able to truncate the table.  This is not a surprise.  If it cannot
> acquire the lock, it simply skips truncating the table and goes on with
> life.
>
> However, what's problematic is that if a non-zero cost delay has been
> set, it will happily take naps while determining what to truncate :-(
> This seems a bad idea.  It may also explain why some people are seeing
> autovacuum blocking other processes, even when autovacuum itself shows
> no non-granted locks.
>
> Comments?  I think we should remove the sleep in the truncate phase.

Do we have any timings for that lock-out?  Even with a largish sleep
delay, I can't imagine it's locked out for that long.

Seems like VACUUM shouldn't try just once to get the lock.  It could be
very frustrating to wait hours for a VACUUM to finish, only to find a
small query prevents file truncation.  That's just too random.  It
should retry as many times as there are blocks for it to truncate,
i.e. it tries harder to truncate the more it needs to do so.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
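[Editor's sketch: the retry heuristic Simon proposes could be shaped roughly as below. The lock callback is a hypothetical stand-in for PostgreSQL's ConditionalLockRelation(); none of this is actual backend code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock primitive: returns true once the exclusive lock
 * is obtained (a stand-in for ConditionalLockRelation). */
typedef bool (*try_lock_fn)(void *arg);

/* Retry the conditional lock roughly once per truncatable block, up to
 * a hard cap, so a vacuum with more space to reclaim tries harder
 * before giving up on truncation. */
static bool
acquire_with_retries(try_lock_fn try_lock, void *arg,
                     long truncatable_blocks, long max_attempts)
{
    long attempts = truncatable_blocks;

    if (attempts > max_attempts)
        attempts = max_attempts;
    if (attempts < 1)
        attempts = 1;

    for (long i = 0; i < attempts; i++)
    {
        if (try_lock(arg))
            return true;
        /* real code would sleep briefly between attempts */
    }
    return false;
}

/* Demonstration stub: the lock becomes available on the third try. */
static int lock_calls = 0;

static bool
succeed_on_third(void *arg)
{
    (void) arg;
    return ++lock_calls >= 3;
}
```

The caveat Alvaro raises downthread applies: the block count that drives the retry budget is not known until the backward scan has run under the lock, so it would have to come from an estimate made earlier.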
Simon Riggs wrote:
> On Thu, 2007-06-28 at 17:16 -0400, Alvaro Herrera wrote:
>
> > I noticed that lazy vacuum acquires an exclusive lock at the end, to be
> > able to truncate the table.  This is not a surprise.  If it cannot
> > acquire the lock, it simply skips truncating the table and goes on with
> > life.
> >
> > However, what's problematic is that if a non-zero cost delay has been
> > set, it will happily take naps while determining what to truncate :-(
> > This seems a bad idea.  It may also explain why some people are seeing
> > autovacuum blocking other processes, even when autovacuum itself shows
> > no non-granted locks.
> >
> > Comments?  I think we should remove the sleep in the truncate phase.
>
> Do we have any timings for that lock-out?  Even with a largish sleep
> delay, I can't imagine it's locked out for that long.

I created a table like this:

    create table foo (a int);
    begin;
    insert into foo select * from generate_series(1, 1000000);
    rollback;

It took 14 seconds to truncate it with a 50ms vacuum cost delay.

What I'm requesting here is that the sleep in count_nondeletable_pages()
be removed and that change backpatched to 8.2 and 8.1.

> Seems like VACUUM shouldn't try just once to get the lock.  It could be
> very frustrating to wait hours for a VACUUM to finish, only to find a
> small query prevents file truncation.  That's just too random.  It
> should retry as many times as there are blocks for it to truncate,
> i.e. it tries harder to truncate the more it needs to do so.

We don't know how many pages we can truncate until after we have
acquired the exclusive lock and examined the pages in question, scanning
backwards from the end of the table.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
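[Editor's note: a back-of-envelope model of where those 14 seconds go. A million 4-byte ints occupy on the order of 4,400 8kB heap pages; during the scan, each page examined charges a cost, and each time the accumulated cost crosses vacuum_cost_limit (200 by default) vacuum naps for the configured delay. The sketch below is an approximation, not the exact bookkeeping vacuum performs.]

```c
#include <assert.h>

/* Rough model of total nap time during the truncation scan: every
 * 'cost_limit' worth of accumulated page cost triggers one nap of
 * 'delay_ms' milliseconds.  Default vacuum_cost_limit is 200; page
 * costs range from 1 (vacuum_cost_page_hit) to 20
 * (vacuum_cost_page_dirty). */
static long
nap_milliseconds(long pages, int cost_per_page, int cost_limit,
                 int delay_ms)
{
    long naps = (pages * (long) cost_per_page) / cost_limit;

    return naps * delay_ms;
}
```

With ~4,400 pages charged at the dirty-page cost, the model gives 4400*20/200 = 440 naps, i.e. 22 seconds of sleeping at 50ms each — the same order of magnitude as the 14 seconds Alvaro observed, and all of it spent holding the exclusive lock.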
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> What I'm requesting here is that the sleep in count_nondeletable_pages()
> be removed and that change backpatched to 8.2 and 8.1.

Agreed.  We'd better shorten the exclusive locking phase as much as
possible.

> We don't know how many pages we can truncate until after we have
> acquired the exclusive lock and examined the pages in question, scanning
> backwards from the end of the table.

But many OSes do no read-ahead for backward scans.  I have a test
result showing that truncating a large part of a table takes a very
long time.  Would it be better to change it to forward scanning?

For example, start at (tail of the file - 16MB) and scan 16MB of
segment forward to decide the position for truncation.  If we can
truncate all of that segment, recheck from (tail of the file - 32MB)
and repeat.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
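[Editor's sketch of the segment-wise forward scan described above. The `deletable[]` array is a hypothetical stand-in for the per-block emptiness check the real code performs; with 8kB blocks, a 16MB segment is seg_blocks = 2048.]

```c
#include <assert.h>
#include <stdbool.h>

/* Examine a window at the tail of the relation in forward
 * (read-ahead friendly) order; if the whole window is deletable,
 * widen by another segment and rescan only the newly added range.
 * deletable[i] means "block i was found empty".  Returns the new
 * relation length in blocks. */
static long
find_truncation_point(const bool *deletable, long nblocks,
                      long seg_blocks)
{
    long window_end = nblocks;

    while (window_end > 0)
    {
        long window_start =
            window_end > seg_blocks ? window_end - seg_blocks : 0;
        long last_keep = -1;

        /* forward scan of just this window */
        for (long b = window_start; b < window_end; b++)
            if (!deletable[b])
                last_keep = b;

        if (last_keep >= 0)
            return last_keep + 1;  /* truncate everything above this */

        /* whole window deletable: widen by one segment and retry */
        window_end = window_start;
    }
    return 0;                      /* entire relation is empty */
}
```

Each window is read in ascending block order, so the kernel's read-ahead can help; the worst case (a fully empty relation) still touches every block, but so does the existing backward scan.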
Alvaro Herrera <alvherre@commandprompt.com> writes:
> What I'm requesting here is that the sleep in count_nondeletable_pages()
> be removed and that change backpatched to 8.2 and 8.1.

Are you sure that that is, and always will be, the only sleep in that
part of the code path?

Seems like it might be better to adjust the cost_delay parameters after
we acquire exclusive lock.  I'm not sure dialing them all the way back
to zero is a good idea, but certainly we could make them more
aggressive.

>> Seems like VACUUM shouldn't try just once to get the lock.

> We don't know how many pages we can truncate until after we have
> acquired the exclusive lock and examined the pages in question, scanning
> backwards from the end of the table.

We could estimate this during the forward scan.  The backward scan is
still necessary to see if anyone has inserted tuples after we looked at
a page, but certainly a page that VACUUM failed to empty will still be
nonempty, so we can determine an upper bound on how much might be
truncatable.

However, that's an orthogonal concern and should probably be discussed
separately.

			regards, tom lane
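[Editor's sketch: Tom's upper bound needs nothing more than tracking, during the forward pass, the highest block that still held tuples. `emptied[]` is a hypothetical per-block result of the vacuum scan, not a real data structure in the backend.]

```c
#include <assert.h>
#include <stdbool.h>

/* Upper bound on the truncatable tail, computable during the forward
 * vacuum scan: a page VACUUM failed to empty will certainly still be
 * nonempty at truncation time, so nothing at or below the last such
 * page can be truncated.  emptied[i] is true if block i is now empty.
 * Returns the number of tail blocks that *might* be truncatable. */
static long
estimate_truncatable_blocks(const bool *emptied, long nblocks)
{
    long last_nonempty = -1;

    for (long b = 0; b < nblocks; b++)
        if (!emptied[b])
            last_nonempty = b;

    return nblocks - (last_nonempty + 1);
}
```

It is only an upper bound because concurrent inserts can refill pages before the exclusive lock is taken — which is exactly why the locked backward recheck stays necessary — but it lets VACUUM decide up front whether attempting truncation is worthwhile at all.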
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > What I'm requesting here is that the sleep in count_nondeletable_pages()
> > be removed and that change backpatched to 8.2 and 8.1.
>
> Are you sure that that is, and always will be, the only sleep in that
> part of the code path?

It is currently, as far as I can see, the only sleep.  I think we could
backpatch the removal of that call, and consider changing the cost_delay
parameters when we acquire the exclusive lock in HEAD.

I haven't tried with crazy features like GiST indexes, though.  Maybe
there are more sleep calls in the vacuum code for those.  But from what
I can gather, all the index cleanup is done before trying to truncate
the relation, so we should be safe.

Also, we don't release the exclusive lock; we hold on to it till commit.
Maybe it would be a good idea to release it as soon as we're done with
it.

> Seems like it might be better to adjust the cost_delay parameters after
> we acquire exclusive lock.  I'm not sure dialing them all the way back
> to zero is a good idea, but certainly we could make them more
> aggressive.

Hmm.  The less time we keep the exclusive lock, the better.  I think an
extra bit of I/O for a short moment is warranted in this case -- better
than holding a lock that lots of processes could be waiting on.

> >> Seems like VACUUM shouldn't try just once to get the lock.
>
> > We don't know how many pages we can truncate until after we have
> > acquired the exclusive lock and examined the pages in question, scanning
> > backwards from the end of the table.
>
> We could estimate this during the forward scan.  The backward scan is
> still necessary to see if anyone has inserted tuples after we looked at
> a page, but certainly a page that VACUUM failed to empty will still be
> nonempty, so we can determine an upper bound on how much might be
> truncatable.
>
> However, that's an orthogonal concern and should probably be discussed
> separately.

Right (and furthermore we shouldn't backpatch it).

-- 
Alvaro Herrera               http://www.amazon.com/gp/registry/DXLWNGRJD34J
"This is a foot just waiting to be shot"                (Andrew Dunstan)
On Fri, 2007-06-29 at 09:29 +0900, ITAGAKI Takahiro wrote:
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> > What I'm requesting here is that the sleep in count_nondeletable_pages()
> > be removed and that change backpatched to 8.2 and 8.1.
>
> Agreed.  We'd better shorten the exclusive locking phase as much as
> possible.

That is just one possibility, but I'd like to consider other
possibilities before we go for that, especially backpatched.

ISTM holding the lock across many I/Os is the thing that is causing long
lock times.  Removing the sleep may not substantially reduce the time on
a busy system.  Alvaro's example also shows that the number of blocks
removed could be a substantial number -- reminding us that the time the
lock is held would still be O(N), whereas we would like it to be O(1).
This is important since we don't even attempt truncation until the
number of blocks is large enough to be worth bothering with.

Would it be better to keep the sleep in there, but release and
re-acquire the lock either side of the sleep?  That would allow other
transactions to progress without long lock waits.

Currently, releasing the lock is a problem because the new FSM entries
are added after truncation, so any updates and inserts would probably
try to extend the relation, thus preventing further truncation.  If we
did things differently, we would have no reason to fail when we attempt
to re-acquire the lock:

- analyze where the truncation point would be on the vacuum pass
- add FSM entries for all blocks below the truncation point.  If that
  is below a minimum of 5% of the entries/16 blocks then we can move
  the truncation point higher so that the FSM entry is large enough to
  allow us time to truncate.
- truncate the file, one bite at a time as we sleep (or max 16 blocks
  at a time if no sleep requested), possibly scanning forwards not back

I would still like to see VACUUM spin a few times trying to acquire the
lock before it gives up attempting to truncate.  Re-running the whole
VACUUM just to get another split-second chance to truncate is not very
useful behaviour either.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
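[Editor's sketch of the "one bite at a time" step: the lock calls in the comments are placeholders for PostgreSQL's LockRelation/UnlockRelation, and the whole function is a simulation of the proposal, not working backend code.]

```c
#include <assert.h>

/* Shrink the relation toward 'target' blocks in bites of at most
 * 'bite_blocks', dropping the (hypothetical) exclusive lock between
 * bites so other backends can get in while vacuum naps.  Returns the
 * number of lock acquire/release cycles, i.e. how often waiting
 * transactions get a chance to run. */
static long
truncate_in_bites(long *nblocks, long target, long bite_blocks)
{
    long lock_cycles = 0;

    while (*nblocks > target)
    {
        /* LockRelation(rel, AccessExclusiveLock);  -- placeholder */
        lock_cycles++;

        long bite = *nblocks - target;
        if (bite > bite_blocks)
            bite = bite_blocks;
        *nblocks -= bite;       /* RelationTruncate(rel, *nblocks) */

        /* UnlockRelation(...); the sleep would happen here, unlocked */
    }
    return lock_cycles;
}
```

Each individual lock hold becomes O(bite_blocks) instead of O(N). The prerequisite Simon identifies still applies: unless FSM entries exist below the truncation point first, concurrent inserts between bites would extend the relation and defeat further truncation.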
This has been saved for the 8.4 release:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Fri, 2007-06-29 at 09:29 +0900, ITAGAKI Takahiro wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >
> > > What I'm requesting here is that the sleep in count_nondeletable_pages()
> > > be removed and that change backpatched to 8.2 and 8.1.
> >
> > Agreed.  We'd better shorten the exclusive locking phase as much as
> > possible.
>
> That is just one possibility, but I'd like to consider other
> possibilities before we go for that, especially backpatched.
>
> ISTM holding the lock across many I/Os is the thing that is causing long
> lock times.  Removing the sleep may not substantially reduce the time on
> a busy system.  Alvaro's example also shows that the number of blocks
> removed could be a substantial number -- reminding us that the time the
> lock is held would still be O(N), whereas we would like it to be O(1).
> This is important since we don't even attempt truncation until the
> number of blocks is large enough to be worth bothering with.
>
> Would it be better to keep the sleep in there, but release and
> re-acquire the lock either side of the sleep?  That would allow other
> transactions to progress without long lock waits.
>
> Currently, releasing the lock is a problem because the new FSM entries
> are added after truncation, so any updates and inserts would probably
> try to extend the relation, thus preventing further truncation.  If we
> did things differently, we would have no reason to fail when we attempt
> to re-acquire the lock:
>
> - analyze where the truncation point would be on the vacuum pass
> - add FSM entries for all blocks below the truncation point.  If that
>   is below a minimum of 5% of the entries/16 blocks then we can move
>   the truncation point higher so that the FSM entry is large enough to
>   allow us time to truncate.
> - truncate the file, one bite at a time as we sleep (or max 16 blocks
>   at a time if no sleep requested), possibly scanning forwards not back
>
> I would still like to see VACUUM spin a few times trying to acquire the
> lock before it gives up attempting to truncate.  Re-running the whole
> VACUUM just to get another split-second chance to truncate is not very
> useful behaviour either.

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> > > What I'm requesting here is that the sleep in count_nondeletable_pages()
> > > be removed and that change backpatched to 8.2 and 8.1.
> >
> > Are you sure that that is, and always will be, the only sleep in that
> > part of the code path?
>
> It is currently, as far as I can see, the only sleep.  I think we could
> backpatch the removal of that call, and consider changing the cost_delay
> parameters when we acquire the exclusive lock in HEAD.

I noticed that autovacuum can reset VacuumCostDelay to a non-zero value
when the cost balancing code runs.  Of course, we can reset the target
value instead, so that the balancing code does not cause a problem.

I propose applying this patch from 8.1 onwards.  HEAD would get an
additional treatment to avoid the balancing problem.

Note that I am releasing the exclusive lock on the table after the
truncate is done.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
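[Editor's sketch of the interaction Alvaro describes: zeroing the delay for the truncate phase, with a guard so autovacuum's cost balancing does not put it back. Every name here is a hypothetical stand-in; the actual patch may be structured quite differently.]

```c
#include <assert.h>
#include <stdbool.h>

static int  VacuumCostDelay  = 20;     /* stand-in for the real GUC */
static bool in_truncate_phase = false; /* hypothetical guard flag */

/* Autovacuum's balancing code would consult the guard before
 * overwriting the delay (hypothetical). */
static void
autovac_balance_set_delay(int new_delay)
{
    if (!in_truncate_phase)
        VacuumCostDelay = new_delay;
}

/* Save, zero, and restore the delay around the locked truncate work. */
static void
truncate_phase(void)
{
    int saved = VacuumCostDelay;

    in_truncate_phase = true;
    VacuumCostDelay = 0;   /* no naps while holding the exclusive lock */

    /* ... acquire lock, count_nondeletable_pages(),
     *     RelationTruncate(), release lock ... */

    in_truncate_phase = false;
    VacuumCostDelay = saved;
}
```

Without the guard, a concurrently running balance pass could overwrite the zeroed delay and reintroduce naps under the lock — the problem Alvaro notes with simply resetting VacuumCostDelay.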
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I propose applying this patch from 8.1 onwards.  HEAD would get an
> additional treatment to avoid the balancing problem.

If you're going to insert an early unlock, it should be as early as
possible, ie right after the RelationTruncate() call.  The bookkeeping
in between is probably trivial, but why hold the lock for it?

Also, rather than just removing the vacuum_delay_point call, you need a
comment explicitly pointing out why that loop hasn't got one.  Otherwise
somebody will think it an oversight and put it back in someday.

			regards, tom lane