Re: autovacuum truncate exclusive lock round two - Mailing list pgsql-hackers
From | Jan Wieck |
---|---|
Subject | Re: autovacuum truncate exclusive lock round two |
Date | |
Msg-id | 50B77432.10007@Yahoo.com Whole thread Raw |
In response to | Re: autovacuum truncate exclusive lock round two ("Kevin Grittner" <kgrittn@mail.com>) |
Responses |
Re: autovacuum truncate exclusive lock round two
|
List | pgsql-hackers |
On 11/28/2012 3:33 PM, Kevin Grittner wrote: > Kevin Grittner wrote: > >> I still need to review the timing calls, since I'm not familiar >> with them so it wasn't immediately obvious to me whether they >> were being used correctly. I have no reason to believe that they >> aren't, but feel I should check. > > It seems odd to me that assignment of one instr_time to another is > done with INSTR_TIME_SET_ZERO() of the target followed by > INSTR_TIME_ADD() with the target and the source. It seems to me > that simple assignment would be more readable, and I can't see any > down side. > > Why shouldn't: > > INSTR_TIME_SET_ZERO(elapsed); > INSTR_TIME_ADD(elapsed, currenttime); > INSTR_TIME_SUBTRACT(elapsed, starttime); > > instead be?: > > elapsed = currenttime; > INSTR_TIME_SUBTRACT(elapsed, starttime); > > And why shouldn't: > > INSTR_TIME_SET_ZERO(starttime); > INSTR_TIME_ADD(starttime, currenttime); > > instead be?: > > starttime = currenttime; > > Resetting starttime this way seems especially odd. instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is starttime = currenttime; portable if those are structs? > >> Also, I want to do another pass looking just for off-by-one >> errors on blkno. Again, I have no reason to believe that there is >> a problem; it just seems like it would be a particularly easy >> type of mistake to make and miss when a key structure has this >> field: >> >> BlockNumber nonempty_pages; >> /* actually, last nonempty page + 1 */ > > No problems found with that. > >> And I want to test more. > > The patch worked as advertised in all my tests, but I became > uncomforatable with the games being played with the last autovacuum > timestamp and the count of dead tuples. Sure, that causes > autovacuum to kick back in until it has dealt with the truncation, > but it makes it hard for a human looking at the stat views to see > what's going on, and I'm not sure it wouldn't lead to bad plans due > to stale statistics. > > Personally, I would rather see us add a boolean to indicate that > autovacuum was needed (regardless of the math on the other columns) > and use that to control the retries -- leaving the other columns > free to reflect reality. You mean to extend the stats by yet another column? The thing is that this whole case happens rarely. We see this every other month or so and only on a rolling window table after it got severely bloated due to some abnormal use (purging disabled for some operational reason). The whole situation resolves itself after a few minutes to maybe an hour or two. Personally I don't think living with a few wrong stats on one table for that time is so bad that it warrants changing that much more code. Lower casing TRUE/FALSE will be done. If the LW_SHARED is enough in LockHasWaiters(), that's fine too. I think we have a consensus that the check interval should be derived from deadlock_timeout/10 clamped to 10ms. I'm going to remove that GUC. About the other two, I'm just not sure. Maybe using half the value as for the lock waiter interval as the lock retry interval, again clamped to 10ms, and simply leaving one GUC that controls how long autovacuum should retry this. I'm not too worried about the retry interval. However, the problem with that overall retry duration is that this value very much depends on the usage pattern of the database. If long running transactions (like >5 seconds) are the norm, then a hard coded value of 2 seconds before autovacuum gives up isn't going to help much. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
pgsql-hackers by date: