Thread: Fixing Simms' vacuum problems
Michael Simms was kind enough to give me login privileges on his system to poke at his problems with vacuum running concurrently with table create/drop operations. I am not sure why his setup seems to display the problem more easily than mine does, but it's certainly true that crashes occur very easily there, whereas it often takes many tries for me.

Anyway, I am now convinced that his symptoms are indeed explained by the locking and cache-invalidation problems we have been discussing. I saw a number of different failures, but they all seemed to trace back to one of two common themes:

(1) The non-vacuuming backend crashes because of accessing a system-relation tuple that isn't in the same place anymore: the tuple is found in the local syscache, but the item location recorded there is stale because vacuum has moved the tuple, and the non-vacuum process hasn't noticed the SI update message for it yet.

(2) The vacuuming backend can fail because of trying to vacuum a relation that's already been deleted. This can be blamed on the known bug that DROP TABLE releases its exclusive lock on the target table before end of transaction.

I expect there are also failures due to the lack-of-lock problems that Hiroshi recently identified, but I didn't happen to see any of those in the limited number of cases that I watched with the debugger.

So, it looks like a solution involves two components: first, being more careful to lock system relations appropriately, and second, being sure that SI messages are seen soon enough. I think the read-SI-messages-at-lock-time code that's already in place for 6.6 will be sufficient for the second point, if we are religious about acquiring appropriate locks. (BTW, I think that in most cases an appropriate lock on a system table will be less strong than AccessExclusiveLock --- Vadim, do you agree?)

Once we have the changes, the next question is do we want to risk back-patching them into 6.5.2? I can see several ways that we could proceed:

1. Back-patch into REL6_5, and postpone 6.5.2 release for a while for beta-testing.
2. Put out 6.5.2 now (since it already has several other useful fixes), then back-patch, and release 6.5.3 after a beta-testing interval.
3. Leave these changes out of 6.5.*, and try to get 6.6 out the door soon instead.

I am not eager to hurry 6.6 along --- I have a lot of half-done work in the planner/optimizer that I'd like to finish for 6.6. Perhaps choice #2 is the way to go. Comments?

regards, tom lane
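Failure (1) and the read-SI-messages-at-lock-time remedy can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not PostgreSQL's actual code: `SharedState`, `Backend`, `vacuum_move`, `process_si_messages` and the cache key `"pg_class:foo"` are all hypothetical names invented for the illustration, and tuple locations are modelled as plain `(block, offset)` pairs.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Toy stand-in for shared memory: tuple locations plus an invalidation queue."""
    locations: dict = field(default_factory=dict)  # key -> current tuple location
    si_queue: list = field(default_factory=list)   # keys whose cached copies are stale

    def vacuum_move(self, key, new_location):
        # Vacuum relocates the tuple and queues an invalidation message for it.
        self.locations[key] = new_location
        self.si_queue.append(key)


class Backend:
    """Toy backend with a private cache of tuple locations (hypothetical)."""

    def __init__(self, shared):
        self.shared = shared
        self.cache = {}    # key -> location as seen when the entry was loaded
        self.si_read = 0   # number of queue entries this backend has consumed

    def process_si_messages(self):
        # Drop every cached entry named by a message not yet seen.
        for key in self.shared.si_queue[self.si_read:]:
            self.cache.pop(key, None)
        self.si_read = len(self.shared.si_queue)

    def lock(self):
        # Assumed hook: catch up on invalidations whenever a lock is acquired.
        self.process_si_messages()

    def lookup(self, key):
        # Serve from the local cache; load from shared state only on a miss.
        if key not in self.cache:
            self.cache[key] = self.shared.locations[key]
        return self.cache[key]


def demo():
    shared = SharedState(locations={"pg_class:foo": (0, 1)})
    backend = Backend(shared)
    backend.lookup("pg_class:foo")               # entry is now cached
    shared.vacuum_move("pg_class:foo", (0, 7))   # vacuum moves the tuple
    stale = backend.lookup("pg_class:foo")       # no lock taken: old location
    backend.lock()                               # lock acquisition reads SI queue
    fresh = backend.lookup("pg_class:foo")       # cache miss, reloaded
    return stale, fresh
```

In this model the lookup made without first taking a lock returns the location recorded before vacuum ran, which is the kind of stale pointer described in (1). The lookup made after `lock()` sees the new location, which is why the approach only works if every access path really does acquire a lock first.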
> Once we have the changes, the next question is do we want to risk
> back-patching them into 6.5.2? I can see several ways that we could
> proceed:
> 1. Back-patch into REL6_5, and postpone 6.5.2 release for a while
> for beta-testing.
> 2. Put out 6.5.2 now (since it already has several other useful fixes),
> then back-patch, and release 6.5.3 after a beta-testing interval.
> 3. Leave these changes out of 6.5.*, and try to get 6.6 out the door
> soon instead.
>
> I am not eager to hurry 6.6 along --- I have a lot of half-done work
> in the planner/optimizer that I'd like to finish for 6.6. Perhaps
> choice #2 is the way to go. Comments?
>
> regards, tom lane

I would also suggest number 2 would be best for all. It means the fix for my (and potentially other people's) problems arrives before 6.6, but there is no delay to the 6.5.2 bugfixes being released.

I am curious, is there a reason that there is not a regular release of the development tree also? I am aware we can get it through CVS to hammer on it, but releases would be easier in many ways, certainly easier to develop patches against. Just a thought, as it seems that the Linux kernel benefits greatly from this approach.

As a final word, I would like to thank Tom for looking into the problem. I have been really impressed with the responses of the PostgreSQL developers; they seem to be a lot more approachable and willing to fix problems than in most other open source systems I have seen.

Hopefully when I get a bit more time and get more familiar with the PostgreSQL code, I'll be able to actually provide some solutions instead of just breaking it and telling you lot {:-)

Thanks!

~Michael
On Sat, 11 Sep 1999, Tom Lane wrote:

> Once we have the changes, the next question is do we want to risk
> back-patching them into 6.5.2? I can see several ways that we could
> proceed:
> 1. Back-patch into REL6_5, and postpone 6.5.2 release for a while
> for beta-testing.
> 2. Put out 6.5.2 now (since it already has several other useful fixes),
> then back-patch, and release 6.5.3 after a beta-testing interval.
> 3. Leave these changes out of 6.5.*, and try to get 6.6 out the door
> soon instead.
>
> I am not eager to hurry 6.6 along --- I have a lot of half-done work
> in the planner/optimizer that I'd like to finish for 6.6. Perhaps
> choice #2 is the way to go. Comments?

Option 2 makes *me* feel the most comfortable...we were holding off on 6.5.2 due to some things ppl were working on...are those complete? I can roll out a 6.5.2 tonight if everyone feel comfortable with it, or wait for a few days (Wednesday?) to make sure all is iron'd out?

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org
On Sat, 11 Sep 1999, Michael Simms wrote:

> I am curious, is there a reason that there is not a regular release of the
> development tree also? I am aware we can get it through CVS to hammer
> on it, but releases would be easier in many ways, certainly easier to develop
> patches against.

ftp://ftp.postgresql.org/pub/postgresql-snapshot.tar.gz

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org
> Once we have the changes, the next question is do we want to risk
> back-patching them into 6.5.2? I can see several ways that we could
> proceed:
> 1. Back-patch into REL6_5, and postpone 6.5.2 release for a while
> for beta-testing.
> 2. Put out 6.5.2 now (since it already has several other useful fixes),
> then back-patch, and release 6.5.3 after a beta-testing interval.
> 3. Leave these changes out of 6.5.*, and try to get 6.6 out the door
> soon instead.
>
> I am not eager to hurry 6.6 along --- I have a lot of half-done work
> in the planner/optimizer that I'd like to finish for 6.6. Perhaps
> choice #2 is the way to go. Comments?

Seems #2 is a good choice for me too.

---
Tatsuo Ishii
The Hermit Hacker <scrappy@hub.org> writes:

> Option 2 makes *me* feel the most comfortable...we were holding off on
> 6.5.2 due to some things ppl were working on...are those complete? I can
> roll out a 6.5.2 tonight if everyone feel comfortable with it, or wait for
> a few days (Wednesday?) to make sure all is iron'd out?

I don't have any more code changes that I want to try to squeeze into 6.5.2, but I thought Bruce still needed to update the change log etc etc. Dunno about the rest of the crew; anyone have more to do?

regards, tom lane
> The Hermit Hacker <scrappy@hub.org> writes:
> > Option 2 makes *me* feel the most comfortable...we were holding off on
> > 6.5.2 due to some things ppl were working on...are those complete? I can
> > roll out a 6.5.2 tonight if everyone feel comfortable with it, or wait for
> > a few days (Wednesday?) to make sure all is iron'd out?
>
> I don't have any more code changes that I want to try to squeeze into
> 6.5.2, but I thought Bruce still needed to update the change log etc
> etc. Dunno about the rest of the crew; anyone have more to do?

Yes, I have to do that.

--
Bruce Momjian                      | http://www.op.net/~candle
maillist@candle.pha.pa.us          | (610) 853-3000
+ If your life is a hard drive,    | 830 Blythe Avenue
+ Christ can be your backup.       | Drexel Hill, Pennsylvania 19026
> The Hermit Hacker <scrappy@hub.org> writes:
> > Option 2 makes *me* feel the most comfortable...we were holding off on
> > 6.5.2 due to some things ppl were working on...are those complete? I can
> > roll out a 6.5.2 tonight if everyone feel comfortable with it, or wait for
> > a few days (Wednesday?) to make sure all is iron'd out?
>
> I don't have any more code changes that I want to try to squeeze into
> 6.5.2, but I thought Bruce still needed to update the change log etc
> etc. Dunno about the rest of the crew; anyone have more to do?
>

I have updated everything needed for 6.5.2. Thomas, can you update the HISTORY file for 6.5.2. Thanks.

This is good timing. I just finished a 4-month project yesterday.

--
Bruce Momjian                      | http://www.op.net/~candle
maillist@candle.pha.pa.us          | (610) 853-3000
+ If your life is a hard drive,    | 830 Blythe Avenue
+ Christ can be your backup.       | Drexel Hill, Pennsylvania 19026
On Sun, 12 Sep 1999, Bruce Momjian wrote:

> > The Hermit Hacker <scrappy@hub.org> writes:
> > > Option 2 makes *me* feel the most comfortable...we were holding off on
> > > 6.5.2 due to some things ppl were working on...are those complete? I can
> > > roll out a 6.5.2 tonight if everyone feel comfortable with it, or wait for
> > > a few days (Wednesday?) to make sure all is iron'd out?
> >
> > I don't have any more code changes that I want to try to squeeze into
> > 6.5.2, but I thought Bruce still needed to update the change log etc
> > etc. Dunno about the rest of the crew; anyone have more to do?
> >
>
> I have updated everything needed for 6.5.2. Thomas, can you update the
> HISTORY file for 6.5.2. Thanks.

Okay, will wrap 6.5.2 on Tuesday evening then...

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org
> I don't have any more code changes that I want to try to squeeze into
> 6.5.2, but I thought Bruce still needed to update the change log etc
> etc. Dunno about the rest of the crew; anyone have more to do?

I should put in my recent fix for Tatsuo regarding unspecified string types in case statements. Should get to it this evening (Monday morning, GMT)...

- Thomas

--
Thomas Lockhart   lockhart@alumni.caltech.edu
South Pasadena, California
Tom Lane wrote:
>
> So, it looks like a solution involves two components: first, being more
> careful to lock system relations appropriately, and second, being sure
> that SI messages are seen soon enough. I think the read-SI-messages-
> at-lock-time code that's already in place for 6.6 will be sufficient for
> the second point, if we are religious about acquiring appropriate locks.
> (BTW, I think that in most cases an appropriate lock on a system table
> will be less strong than AccessExclusiveLock --- Vadim, do you agree?)

ExclusiveLock should be ok.

Vadim
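The practical difference between the two lock modes named here can be shown with a small conflict table in Python. This is a sketch, not the lock manager's data structure: the `CONFLICTS` dictionary and the `conflicts` helper are hypothetical, only three of the table-level modes are included, and the entries follow the lock mode descriptions in the LOCK command documentation as I understand them, so they should be checked against the docs for the release in question.

```python
# For each held mode, the set of requested modes that must wait.
# Restricted to three modes for illustration; the real table has more.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "ExclusiveLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "AccessExclusiveLock": {
        "AccessShareLock",
        "ExclusiveLock",
        "AccessExclusiveLock",
    },
}


def conflicts(held, requested):
    """Return True if a request for `requested` must wait while `held` is held."""
    return requested in CONFLICTS[held]
```

Under these assumptions, `conflicts("ExclusiveLock", "AccessShareLock")` is false while `conflicts("AccessExclusiveLock", "AccessShareLock")` is true: a holder of ExclusiveLock still lets plain readers of the system table proceed, whereas AccessExclusiveLock blocks them as well.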