Thread: Problem with pg_upgrade?
I have gotten two reports via IRC that, months after using 9.0 pg_upgrade, some of the clog files have been removed while there is still data in the table needing those clog files.  These reports came to me through Rhodiumtoad, who analyzed the systems.

Looking at pg_upgrade, I am concerned that somehow autovacuum is running in freeze mode before I have restored the frozen xids for the table or database.  Here is the code I am using:

	snprintf(cmd, sizeof(cmd),
	         SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
	         "-o \"-p %d -c autovacuum=off "
	         "-c autovacuum_freeze_max_age=2000000000\" "
	         "start >> \"%s\" 2>&1" SYSTEMQUOTE,
	         bindir,

Does anyone have any other suggestions on how to make sure autovacuum does not run in freeze mode?  I know 'autovacuum=off' turns off normal autovacuum.  Would increasing autovacuum_naptime help?  It looks like the autovacuum code sleeps before processing anything, but I am not certain.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
> Does anyone have any other suggestions on how to make sure autovacuum
> does not run in freeze mode?

Can you run in single user mode?

Regards,
	Jeff Davis
Excerpts from Jeff Davis's message of mar mar 29 21:27:34 -0300 2011:
> On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
> > Does anyone have any other suggestions on how to make sure autovacuum
> > does not run in freeze mode?
>
> Can you run in single user mode?

I asked the same thing.  Apparently the problem is that it would make error handling a lot more difficult.  I think it would be better to have some sort of option to disable autovacuum completely which would be used only during pg_upgrade.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, 2011-03-29 at 21:43 -0300, Alvaro Herrera wrote:
> I think it would be better to have some sort of option to disable
> autovacuum completely which would be used only during pg_upgrade.

Sounds reasonable to me.

Regards,
	Jeff Davis
Alvaro Herrera wrote:
> Excerpts from Jeff Davis's message of mar mar 29 21:27:34 -0300 2011:
> > On Tue, 2011-03-29 at 15:52 -0400, Bruce Momjian wrote:
> > > Does anyone have any other suggestions on how to make sure autovacuum
> > > does not run in freeze mode?
> >
> > Can you run in single user mode?
>
> I asked the same thing.  Apparently the problem is that it would make
> error handling a lot more difficult.  I think it would be better to have
> some sort of option to disable autovacuum completely which would be used
> only during pg_upgrade.

Yes; also consider that pg_dumpall assumes psql with its use of \connect, I would have to start/stop the single-user backend for every database change, and I use psql with ON_ERROR_STOP=on.

I think we have three options:

	o  find out if the use of autovacuum_freeze_max_age is safe, or
	   make it safe
	o  document that autovacuum_naptime always happens before
	   autovacuum does anything, and set it high
	o  modify autovacuum to be an enum, with values on/off/disabled

I think the last one makes the most sense, and is safer if we need to backpatch this.  Creating a new variable for this would be confusing because it could conflict with the 'autovacuum' setting.

Also, I am unclear if this is really our bug.  At least one of the systems was on Ubuntu/Debian, and they might both have been, and I know Debian changes our source code.  Where can I find a copy of the diffs they have made?

I am hearing only second-hand reports of this problem through Rhodiumtoad on IRC.  I don't have IRC access this week, so if someone can get details from him, that would help.  I think the fix he found was to pull the clog files off of an old file system backup.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
> Also, I am unclear if this is really our bug.  At least one of the
> systems was on Ubuntu/Debian, and they might both have been, and I know
> Debian changes our source code.  Where can I find a copy of the diffs
> they have made?

http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/
Peter Eisentraut wrote:
> On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
> > Also, I am unclear if this is really our bug.  At least one of the
> > systems was on Ubuntu/Debian, and they might both have been, and I know
> > Debian changes our source code.  Where can I find a copy of the diffs
> > they have made?
>
> http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/

These all seem reasonable, but I am confused by one thing.  Someone reported last week that the equals sign is not optional in postgresql.conf on Debian, but I don't see that patch in here.  Also, I thought they modified pg_hba.conf in some odd way.  Are these changes no longer made in 9.0?

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> I think we have three options:
>
>	o  find out if the use of autovacuum_freeze_max_age is safe, or
>	   make it safe
>	o  document that autovacuum_naptime always happens before
>	   autovacuum does anything, and set it high
>	o  modify autovacuum to be an enum, with values on/off/disabled
>
> I think the last one makes the most sense, and is safer if we need to
> backpatch this.  Creating a new variable for this would be confusing
> because it could conflict with the 'autovacuum' setting.

I have to admit the prospect of abuse is slightly frightening to me here.  I guess we can't be held responsible for users who do dumb things, but it might not be too clear to someone what the difference is between autovacuum=off and autovacuum=disabled.

I don't really understand why this is an issue in the first place, though.  Surely we must be setting the XID counter on the new cluster to match the one on the old cluster, and migrating the relfrozenxid and datfrozenxid settings, so why does it matter if someone runs vacuum freeze?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 2011-03-30 at 16:46 -0400, Robert Haas wrote:
> I don't really understand why this is an issue in the first place,
> though.  Surely we must be setting the XID counter on the new cluster
> to match the one on the old cluster, and migrating the relfrozenxid
> and datfrozenxid settings, so why does it matter if someone runs
> vacuum freeze?

Because autovacuum may run before those things are properly set, as Bruce said in the original email: "I am concerned that somehow autovacuum is running in freeze mode before I have restored the frozen xids for the table or database."

I think some kind of hidden GUC might be the best option.  I tend to agree that a third option to the "autovacuum" setting would be confusing.

Regards,
	Jeff Davis
On ons, 2011-03-30 at 15:39 -0400, Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > On ons, 2011-03-30 at 10:57 -0400, Bruce Momjian wrote:
> > > Also, I am unclear if this is really our bug.  At least one of the
> > > systems was on Ubuntu/Debian, and they might both have been, and I know
> > > Debian changes our source code.  Where can I find a copy of the diffs
> > > they have made?
> >
> > http://bazaar.launchpad.net/~pitti/postgresql/debian-9.0/files/head:/debian/patches/
>
> These all seem reasonable, but I am confused by one thing.  Someone reported
> last week that the equals sign is not optional in postgresql.conf on
> Debian, but I don't see that patch in here.

That's probably because some other tool processes the configuration file to find the data directory or something.

> Also, I thought they modified
> pg_hba.conf in some odd way.  Are these changes no longer made in 9.0?

I think that was about 10 years ago.
Robert Haas wrote:
> On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > I think we have three options:
> >
> >	o  find out if the use of autovacuum_freeze_max_age is safe, or
> >	   make it safe
> >	o  document that autovacuum_naptime always happens before
> >	   autovacuum does anything, and set it high
> >	o  modify autovacuum to be an enum, with values on/off/disabled
> >
> > I think the last one makes the most sense, and is safer if we need to
> > backpatch this.  Creating a new variable for this would be confusing
> > because it could conflict with the 'autovacuum' setting.
>
> I have to admit the prospect of abuse is slightly frightening to me
> here.  I guess we can't be held responsible for users who do dumb
> things, but it might not be too clear to someone what the difference
> is between autovacuum=off and autovacuum=disabled.  I don't really
> understand why this is an issue in the first place, though.  Surely we
> must be setting the XID counter on the new cluster to match the one on
> the old cluster, and migrating the relfrozenxid and datfrozenxid
> settings, so why does it matter if someone runs vacuum freeze?

First, I am not sure it is a problem, but based on the IRC reports I felt I should ask here for confirmation.  Here is a sample pg_dump output:

	CREATE TABLE sample (
	    x integer
	);

	-- For binary upgrade, set relfrozenxid.
	UPDATE pg_catalog.pg_class
	SET relfrozenxid = '703'
	WHERE oid = 'sample'::pg_catalog.regclass;

So, we set the cluster xid while we do this schema-only restore.  I believe it might be possible for autovacuum to run while the schema is restored, see an empty table, and set the relfrozenxid to be the current xid, when in fact we are about to put a heap file in place of the current empty file.  I thought autovacuum_freeze_max_age=2000000000 would prevent this, but now I am not sure.  I assumed that since the gap between the restored relfrozenxid and the current counter would certainly be < 2000000000, autovacuum would not touch it.  It is possible these users had drastically modified autovacuum_freeze_max_age to cause 3-billion gaps --- again, I have no direct contact with the reporters, but I figured being paranoid is a good thing where pg_upgrade is involved.

I wonder if the fact that these people never reported the bug means they were doing something odd with their servers.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> First, I am not sure it is a problem, but based on the IRC reports I
> felt I should ask here for confirmation.  Here is a sample pg_dump
> output:
>
>	CREATE TABLE sample (
>	    x integer
>	);
>
>	-- For binary upgrade, set relfrozenxid.
>	UPDATE pg_catalog.pg_class
>	SET relfrozenxid = '703'
>	WHERE oid = 'sample'::pg_catalog.regclass;
>
> So, we set the cluster xid while we do this schema-only restore.  I
> believe it might be possible for autovacuum to run while the schema is
> restored, see an empty table, and set the relfrozenxid to be the current
> xid, when in fact we are about to put a heap file in place of the
> current empty file.  I thought autovacuum_freeze_max_age=2000000000
> would prevent this, but now I am not sure.  I assumed that since the gap
> between the restored relfrozenxid and the current counter would
> certainly be < 2000000000, autovacuum would not touch it.  It is
> possible these users had drastically modified autovacuum_freeze_max_age
> to cause 3-billion gaps --- again, I have no direct contact with the
> reporters, but I figured being paranoid is a good thing where pg_upgrade
> is involved.
>
> I wonder if the fact that these people never reported the bug means
> they were doing something odd with their servers.

I just updated the C comment about what we are doing:

	 * Using autovacuum=off disables cleanup vacuum and analyze, but
	 * freeze vacuums can still happen, so we set
	 * autovacuum_freeze_max_age very high.  We assume all datfrozenxid
	 * and relfrozen values are less than a gap of 2000000000 from the
	 * current xid counter, so autovacuum will not touch them.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> > I wonder if the fact that these people never reported the bug means
> > they were doing something odd with their servers.
>
> I just updated the C comment about what we are doing:
>
>	 * Using autovacuum=off disables cleanup vacuum and analyze, but
>	 * freeze vacuums can still happen, so we set
>	 * autovacuum_freeze_max_age very high.  We assume all datfrozenxid
>	 * and relfrozen values are less than a gap of 2000000000 from the
>	 * current xid counter, so autovacuum will not touch them.

FYI, 2000000000 is the maximum value for autovacuum_freeze_max_age, so a user can't set it higher.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On Wed, Mar 30, 2011 at 5:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
> First, I am not sure it is a problem, but based on the IRC reports I
> felt I should ask here for confirmation.  Here is a sample pg_dump
> output:
>
>	CREATE TABLE sample (
>	    x integer
>	);
>
>	-- For binary upgrade, set relfrozenxid.
>	UPDATE pg_catalog.pg_class
>	SET relfrozenxid = '703'
>	WHERE oid = 'sample'::pg_catalog.regclass;
>
> So, we set the cluster xid while we do this schema-only restore.  I
> believe it might be possible for autovacuum to run while the schema is
> restored, see an empty table, and set the relfrozenxid to be the current
> xid, when in fact we are about to put a heap file in place of the
> current empty file.  I thought autovacuum_freeze_max_age=2000000000
> would prevent this, but now I am not sure.  I assumed that since the gap
> between the restored relfrozenxid and the current counter would
> certainly be < 2000000000, autovacuum would not touch it.  It is
> possible these users had drastically modified autovacuum_freeze_max_age
> to cause 3-billion gaps --- again, I have no direct contact with the
> reporters, but I figured being paranoid is a good thing where pg_upgrade
> is involved.

It does seem possible that that could happen, but I'm not sure exactly what would be causing autovacuum to fire in the first place.  It wouldn't have to be triggered by the anti-wraparound machinery - if the table appeared to be in need of vacuuming, then we'd vacuum it, discover that it was empty, and update relfrozenxid.  Hmm... could it fire just because the table has no stats?  But if that were the case you'd think we'd be seeing this more often.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> > So, we set the cluster xid while we do this schema-only restore.  I
> > believe it might be possible for autovacuum to run while the schema is
> > restored, see an empty table, and set the relfrozenxid to be the current
> > xid, when in fact we are about to put a heap file in place of the
> > current empty file.  I thought autovacuum_freeze_max_age=2000000000
> > would prevent this, but now I am not sure.  I assumed that since the gap
> > between the restored relfrozenxid and the current counter would
> > certainly be < 2000000000, autovacuum would not touch it.  It is
> > possible these users had drastically modified autovacuum_freeze_max_age
> > to cause 3-billion gaps --- again, I have no direct contact with the
> > reporters, but I figured being paranoid is a good thing where pg_upgrade
> > is involved.
>
> It does seem possible that that could happen, but I'm not sure exactly
> what would be causing autovacuum to fire in the first place.  It
> wouldn't have to be triggered by the anti-wraparound machinery - if
> the table appeared to be in need of vacuuming, then we'd vacuum it,
> discover that it was empty, and update relfrozenxid.  Hmm... could it
> fire just because the table has no stats?  But if that were the case
> you'd think we'd be seeing this more often.

Well, autovacuum=off, so it should only run in freeze mode, and I can't see how that could happen.  I am thinking I have to study autovacuum.c.

I wonder if datfrozenxid could be incremented because the database is originally empty.  It would just need to scan pg_class, not actually vacuum anything.  I wonder if we do that.  The bottom line is I am hanging too much on autovacuum_freeze_max_age causing autovacuum to do nothing.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> > It does seem possible that that could happen, but I'm not sure exactly
> > what would be causing autovacuum to fire in the first place.  It
> > wouldn't have to be triggered by the anti-wraparound machinery - if
> > the table appeared to be in need of vacuuming, then we'd vacuum it,
> > discover that it was empty, and update relfrozenxid.  Hmm... could it
> > fire just because the table has no stats?  But if that were the case
> > you'd think we'd be seeing this more often.
>
> Well, autovacuum=off, so it should only run in freeze mode, and I can't
> see how that could happen.  I am thinking I have to study autovacuum.c.
>
> I wonder if datfrozenxid could be incremented because the database is
> originally empty.  It would just need to scan pg_class, not actually
> vacuum anything.  I wonder if we do that.  The bottom line is I am
> hanging too much on autovacuum_freeze_max_age causing autovacuum to do
> nothing.

What if we allow autovacuum_max_workers to be set to zero?  The current minimum is one.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > > It does seem possible that that could happen, but I'm not sure exactly
> > > what would be causing autovacuum to fire in the first place.  It
> > > wouldn't have to be triggered by the anti-wraparound machinery - if
> > > the table appeared to be in need of vacuuming, then we'd vacuum it,
> > > discover that it was empty, and update relfrozenxid.  Hmm... could it
> > > fire just because the table has no stats?  But if that were the case
> > > you'd think we'd be seeing this more often.
> >
> > Well, autovacuum=off, so it should only run in freeze mode, and I can't
> > see how that could happen.  I am thinking I have to study autovacuum.c.
> >
> > I wonder if datfrozenxid could be incremented because the database is
> > originally empty.  It would just need to scan pg_class, not actually
> > vacuum anything.  I wonder if we do that.  The bottom line is I am
> > hanging too much on autovacuum_freeze_max_age causing autovacuum to do
> > nothing.
>
> What if we allow autovacuum_max_workers to be set to zero?  The current
> minimum is one.

I can think of one case where autovacuum_freeze_max_age would be insufficient.  If you set autovacuum_freeze_max_age in the old cluster to 2B, and you had a database that was near that limit, the tables created by pg_upgrade's --schema-only restore might create enough new transactions to cause autovacuum to run in freeze mode.  While I think it is unlikely that is the cause of the problem report, it is enough for me to discount using autovacuum_freeze_max_age to disable autovacuum freeze.

I will work on code to allow autovacuum_max_workers to be set to zero in HEAD and 9.0, and have pg_upgrade use that.  I think the maintenance overhead of an invisible variable is too much.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On 31.03.2011 17:55, Bruce Momjian wrote:
> I will work on code to allow autovacuum_max_workers to be set to zero in
> HEAD and 9.0, and have pg_upgrade use that.

We've intentionally not allowed the user to disable anti-wraparound autovacuum before.  Do we really want to allow it now for the sake of pg_upgrade?

> I think the maintenance
> overhead of an invisible variable is too much.

A simple GUC or command-line switch isn't much code.

Is the problem just that the clog files get removed too early, or is there something else?  If it's just the clog files, we could simply copy them (again) after updating datfrozenxids.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas wrote:
> On 31.03.2011 17:55, Bruce Momjian wrote:
> > I will work on code to allow autovacuum_max_workers to be set to zero in
> > HEAD and 9.0, and have pg_upgrade use that.
>
> We've intentionally not allowed the user to disable anti-wraparound
> autovacuum before.  Do we really want to allow it now for the sake of
> pg_upgrade?

Not sure.

> > I think the maintenance
> > overhead of an invisible variable is too much.
>
> A simple GUC or command-line switch isn't much code.

Well, is this going to show in SHOW ALL or pg_settings?  Do we have the ability to easily disable display of this?

> Is the problem just that the clog files get removed too early, or is
> there something else?  If it's just the clog files, we could simply copy
> them (again) after updating datfrozenxids.

The problem is that pg_upgrade, through pg_dumpall, is setting pg_database/pg_class frozen xid values, and I can't have autovacuum modifying the system while this is happening.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> I think the maintenance
>> overhead of an invisible variable is too much.
>
> A simple GUC or command-line switch isn't much code.

I like the idea of a command-line switch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> >> I think the maintenance
> >> overhead of an invisible variable is too much.
> >
> > A simple GUC or command-line switch isn't much code.
>
> I like the idea of a command-line switch.

If you want to do that, you should generalize it as --binary-upgrade in case we have other needs for it.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> >> I think the maintenance
>> >> overhead of an invisible variable is too much.
>> >
>> > A simple GUC or command-line switch isn't much code.
>>
>> I like the idea of a command-line switch.
>
> If you want to do that, you should generalize it as --binary-upgrade in
> case we have other needs for it.

Yeah.  Or we could do a binary_upgrade GUC which has the effect of forcibly suppressing autovacuum, and maybe other things later.  I think that's a lot less hazardous than fiddling with the autovacuum GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 30, 2011 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> I am hearing only second-hand reports of this problem through
> Rhodiumtoad on IRC.  I don't have IRC access this week

Regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> >> <heikki.linnakangas@enterprisedb.com> wrote:
> >> >> I think the maintenance
> >> >> overhead of an invisible variable is too much.
> >> >
> >> > A simple GUC or command-line switch isn't much code.
> >>
> >> I like the idea of a command-line switch.
> >
> > If you want to do that, you should generalize it as --binary-upgrade in
> > case we have other needs for it.
>
> Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> forcibly suppressing autovacuum, and maybe other things later.  I
> think that's a lot less hazardous than fiddling with the autovacuum
> GUC.

I like the idea of a command-line flag because it forces everything to be affected, and cannot be turned on and off in sessions --- if you are doing a binary upgrade, locked-down is good.  :-)

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> Robert Haas wrote:
> > On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Robert Haas wrote:
> > >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> > >> <heikki.linnakangas@enterprisedb.com> wrote:
> > >> >> I think the maintenance
> > >> >> overhead of an invisible variable is too much.
> > >> >
> > >> > A simple GUC or command-line switch isn't much code.
> > >>
> > >> I like the idea of a command-line switch.
> > >
> > > If you want to do that, you should generalize it as --binary-upgrade in
> > > case we have other needs for it.
> >
> > Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> > forcibly suppressing autovacuum, and maybe other things later.  I
> > think that's a lot less hazardous than fiddling with the autovacuum
> > GUC.
>
> I like the idea of a command-line flag because it forces everything to
> be affected, and cannot be turned on and off in sessions --- if you are
> doing a binary upgrade, locked-down is good.  :-)

Someone emailed me mentioning we might want to use this flag to make the server accept only local connections, or lock it down in some other way, in the future.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
OK, thanks to RhodiumToad on IRC, I was able to determine the cause of the two reported pg_upgrade problems he saw.  It seems toast tables have xids, and pg_dump is not preserving the toast relfrozenxids as it should.  Heap tables have preserved relfrozenxids, but if you update a heap row without changing the toast value, and the old heap row is later removed, the toast table can have an older relfrozenxid than the heap table.

The fix for this is to have pg_dump preserve toast relfrozenxids, which can be easily added and backpatched.  We might want to push a 9.0.4 for this.  Second, we need to find a way for people to detect and fix existing systems that have this problem, perhaps by looking at when the pg_class relfrozenxid passes the toast relfrozenxid.  Third, we need to figure out how to get this information to users.  Perhaps the communication comes through the 9.0.4 release announcement.  Yes, this is not good!  :-(

I will still add a special flag to postgres to turn off autovacuum, but as we suspected, this is only a marginal improvement and not the cause of the 9.0.X failures.

The good news is that only two people have seen this problem, and it only happens when the hint bits have not been set on the toast rows and the oldest heap rows have been updated.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> the two reported pg_upgrade problems he saw.  It seems toast
> tables have xids, and pg_dump is not preserving the toast relfrozenxids
> as it should.  Heap tables have preserved relfrozenxids, but if you
> update a heap row without changing the toast value, and the old heap row
> is later removed, the toast table can have an older relfrozenxid than
> the heap table.
>
> The fix for this is to have pg_dump preserve toast relfrozenxids, which
> can be easily added and backpatched.  We might want to push a 9.0.4 for
> this.  Second, we need to find a way for people to detect and fix
> existing systems that have this problem, perhaps by looking at when the
> pg_class relfrozenxid passes the toast relfrozenxid.  Third, we
> need to figure out how to get this information to users.  Perhaps the
> communication comes through the 9.0.4 release announcement.

I am not sure how to interpret the lack of replies to this email.  Either it is confidence, shock, or "we told you so".  ;-)

Anyway, the attached patch fixes the problem.  The fix is for pg_dump's binary upgrade mode.  This would need to be backpatched back to 8.4 because pg_migrator needs this too.

I have added a personal regression test to show which pg_class.relfrozenxid values are not preserved, and with this patch the only ones not preserved are toast tables used by system tables, which are not copied from the old cluster (FirstNormalObjectId = 16384).  I am attaching that old/new pg_class.relfrozenxid diff as well.

Any idea how to correct existing systems?  Would VACUUM FREEZE of just the toast tables work?  I perhaps could create a short DO block that would vacuum freeze just toast tables; it would have to be run in every database.

-- 
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
new file mode 100644
index 3f6e77b..1ccdb4d
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** getTables(int *numTables)
*** 3812,3817 ****
--- 3812,3819 ----
  	int			i_relhasrules;
  	int			i_relhasoids;
  	int			i_relfrozenxid;
+ 	int			i_toastoid;
+ 	int			i_toastfrozenxid;
  	int			i_relpersistence;
  	int			i_owning_tab;
  	int			i_owning_col;
*************** getTables(int *numTables)
*** 3855,3861 ****
  			"(%s c.relowner) AS rolname, "
  			"c.relchecks, c.relhastriggers, "
  			"c.relhasindex, c.relhasrules, c.relhasoids, "
! 			"c.relfrozenxid, c.relpersistence, "
  			"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,"
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
--- 3857,3865 ----
  			"(%s c.relowner) AS rolname, "
  			"c.relchecks, c.relhastriggers, "
  			"c.relhasindex, c.relhasrules, c.relhasoids, "
! 			"c.relfrozenxid, tc.oid AS toid, "
! 			"tc.relfrozenxid AS tfrozenxid, "
! 			"c.relpersistence, "
  			"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,"
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3889,3895 ****
  			"(%s c.relowner) AS rolname, "
  			"c.relchecks, c.relhastriggers, "
  			"c.relhasindex, c.relhasrules, c.relhasoids, "
! 			"c.relfrozenxid, 'p' AS relpersistence, "
  			"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,"
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
--- 3893,3901 ----
  			"(%s c.relowner) AS rolname, "
  			"c.relchecks, c.relhastriggers, "
  			"c.relhasindex, c.relhasrules, c.relhasoids, "
! 			"c.relfrozenxid, tc.oid AS toid, "
! 			"tc.relfrozenxid AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,"
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3922,3928 ****
  			"(%s c.relowner) AS rolname, "
  			"c.relchecks, c.relhastriggers, "
  			"c.relhasindex, c.relhasrules, c.relhasoids, "
! 			"c.relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
--- 3928,3936 ----
  			"(%s c.relowner) AS rolname, "
  			"c.relchecks, c.relhastriggers, "
  			"c.relhasindex, c.relhasrules, c.relhasoids, "
! 			"c.relfrozenxid, tc.oid AS toid, "
! 			"tc.relfrozenxid AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3955,3961 ****
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
--- 3963,3972 ----
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"relfrozenxid, "
! 			"0 AS toid, "
! 			"0 AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 3987,3993 ****
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"0 AS relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
--- 3998,4007 ----
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"0 AS relfrozenxid, "
! 			"0 AS toid, "
! 			"0 AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 4019,4025 ****
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"0 AS relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
--- 4033,4042 ----
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"0 AS relfrozenxid, "
! 			"0 AS toid, "
! 			"0 AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"d.refobjid AS owning_tab, "
  			"d.refobjsubid AS owning_col, "
*************** getTables(int *numTables)
*** 4047,4053 ****
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"0 AS relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"NULL::oid AS owning_tab, "
  			"NULL::int4 AS owning_col, "
--- 4064,4073 ----
  			"(%s relowner) AS rolname, "
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, relhasoids, "
! 			"0 AS relfrozenxid, "
! 			"0 AS toid, "
! 			"0 AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"NULL::oid AS owning_tab, "
  			"NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 4070,4076 ****
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, "
  			"'t'::bool AS relhasoids, "
! 			"0 AS relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"NULL::oid AS owning_tab, "
  			"NULL::int4 AS owning_col, "
--- 4090,4099 ----
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, "
  			"'t'::bool AS relhasoids, "
! 			"0 AS relfrozenxid, "
! 			"0 AS toid, "
! 			"0 AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"NULL::oid AS owning_tab, "
  			"NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 4103,4109 ****
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, "
  			"'t'::bool AS relhasoids, "
! 			"0 as relfrozenxid, 'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"NULL::oid AS owning_tab, "
  			"NULL::int4 AS owning_col, "
--- 4126,4135 ----
  			"relchecks, (reltriggers <> 0) AS relhastriggers, "
  			"relhasindex, relhasrules, "
  			"'t'::bool AS relhasoids, "
! 			"0 as relfrozenxid, "
! 			"0 AS toid, "
! 			"0 AS tfrozenxid, "
! 			"'p' AS relpersistence, "
  			"NULL AS reloftype, "
  			"NULL::oid AS owning_tab, "
  			"NULL::int4 AS owning_col, "
*************** getTables(int *numTables)
*** 4149,4154 ****
--- 4175,4182 ----
  	i_relhasrules = PQfnumber(res, "relhasrules");
  	i_relhasoids = PQfnumber(res, "relhasoids");
  	i_relfrozenxid = PQfnumber(res, "relfrozenxid");
+ 	i_toastoid = PQfnumber(res, "toid");
+ 	i_toastfrozenxid = PQfnumber(res, "tfrozenxid");
  	i_relpersistence = PQfnumber(res, "relpersistence");
  	i_owning_tab = PQfnumber(res, "owning_tab");
  	i_owning_col = PQfnumber(res, "owning_col");
*************** getTables(int *numTables)
*** 4190,4195 ****
--- 4218,4225 ----
  	tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0);
  	tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0);
  	tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid));
+ 	tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid));
+ 	tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid));
  	if (PQgetisnull(res, i, i_reloftype))
  		tblinfo[i].reloftype = NULL;
  	else
*************** dumpTableSchema(Archive *fout, TableInfo
*** 12221,12233 ****
  		}
  	}
! 
appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid\n"); appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" "SET relfrozenxid = '%u'\n" "WHERE oid = ", tbinfo->frozenxid); appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout); appendPQExpBuffer(q, "::pg_catalog.regclass;\n"); } /* Loop dumping statistics and storage statements */ --- 12251,12273 ---- } } ! appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n"); appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" "SET relfrozenxid = '%u'\n" "WHERE oid = ", tbinfo->frozenxid); appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout); appendPQExpBuffer(q, "::pg_catalog.regclass;\n"); + + if (tbinfo->toast_oid) + { + /* We preserve the toast oids, so we can use it during restore */ + appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n"); + appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" + "SET relfrozenxid = '%u'\n" + "WHERE oid = '%u';\n", + tbinfo->toast_frozenxid, tbinfo->toast_oid); + } } /* Loop dumping statistics and storage statements */ diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h new file mode 100644 index 113ecb1..6559e23 *** a/src/bin/pg_dump/pg_dump.h --- b/src/bin/pg_dump/pg_dump.h *************** typedef struct _tableInfo *** 248,253 **** --- 248,255 ---- bool hastriggers; /* does it have any triggers? */ bool hasoids; /* does it have OIDs? 
*/ uint32 frozenxid; /* for restore frozen xid */ + Oid toast_oid; /* for restore toast frozen xid */ + uint32 toast_frozenxid;/* for restore toast frozen xid */ int ncheck; /* # of CHECK expressions */ char *reloftype; /* underlying type for typed table */ /* these two are set only if table is a sequence owned by a column: */ 3c3 < postgres|654 --- > postgres|1457 6,7c6,7 < template0|654 < template1|654 --- > template0|1457 > template1|1457 11,19c11,19 < pg_toast|pg_toast_11454|654 < pg_toast|pg_toast_11459|654 < pg_toast|pg_toast_11464|654 < pg_toast|pg_toast_11469|654 < pg_toast|pg_toast_11474|654 < pg_toast|pg_toast_11479|654 < pg_toast|pg_toast_11484|654 < pg_toast|pg_toast_1255|654 < pg_toast|pg_toast_1262|654 --- > pg_toast|pg_toast_11511|1457 > pg_toast|pg_toast_11516|1457 > pg_toast|pg_toast_11521|1457 > pg_toast|pg_toast_11526|1457 > pg_toast|pg_toast_11531|1457 > pg_toast|pg_toast_11536|1457 > pg_toast|pg_toast_11541|1457 > pg_toast|pg_toast_1255|1457 > pg_toast|pg_toast_1262|1457 96,103c96,104 < pg_toast|pg_toast_2396|654 < pg_toast|pg_toast_2604|654 < pg_toast|pg_toast_2606|654 < pg_toast|pg_toast_2609|654 < pg_toast|pg_toast_2618|654 < pg_toast|pg_toast_2619|654 < pg_toast|pg_toast_2620|654 < pg_toast|pg_toast_2964|654 --- > pg_toast|pg_toast_2396|1457 > pg_toast|pg_toast_2604|1457 > pg_toast|pg_toast_2606|1457 > pg_toast|pg_toast_2609|1457 > pg_toast|pg_toast_2618|1457 > pg_toast|pg_toast_2619|1457 > pg_toast|pg_toast_2620|1457 > pg_toast|pg_toast_2964|1457 > pg_toast|pg_toast_3596|1457 279c280 < (268 rows) --- > (269 rows)
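Bruce asks above for a way to detect affected systems. A minimal sketch of such a check, run in each database, could compare every toast table's relfrozenxid against its owning heap table's; the query and the age()-based comparison are my own suggestion, not part of the patch:

```sql
-- List toast tables whose relfrozenxid is older than the owning heap
-- table's relfrozenxid, i.e. candidates for the pg_upgrade bug.
-- age(xid) measures distance back from the current transaction counter,
-- so a larger age means an older xid (this sidesteps raw xid wraparound).
SELECT c.oid::regclass AS heap_table,
       t.oid::regclass AS toast_table,
       c.relfrozenxid  AS heap_frozenxid,
       t.relfrozenxid  AS toast_frozenxid
FROM   pg_class c
JOIN   pg_class t ON t.oid = c.reltoastrelid
WHERE  c.reltoastrelid <> 0
  AND  age(t.relfrozenxid) > age(c.relfrozenxid);
```

Any rows returned would warrant the fix-up procedures discussed downthread.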
On 4/7/11 9:16 AM, Bruce Momjian wrote:
> OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> the two reported pg_upgrade problems he saw via IRC.

BTW, just for the release notes, RhodiumToad == Andrew Gierth.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, 2011-04-07 at 12:16 -0400, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > as it should.  Heap tables have preserved relfrozenxids, but if you
> > update a heap row but don't change the toast value, and the old heap row
> > is later removed, the toast table can have an older relfrozenxid than
> > the heap table.
> >
> > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > this.  Second, we need to find a way for people to detect and fix
> > existing systems that have this problem, perhaps warning when the
> > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > need to figure out how to get this information to users.  Perhaps the
> > communication comes through the 9.0.4 release announcement.
>
> I am not sure how to interpret the lack of replies to this email.
> Either it is confidence, shock, or we told you so.  ;-)
>
> Anyway, the attached patch fixes the problem.  The fix is for pg_dump's
> binary upgrade mode.  This would need to be backpatched back to 8.4
> because pg_migrator needs this too.
>
> I have added a personal regression test to show which
> pg_class.relfrozenxid values are not preserved, and with this patch the
> only ones not preserved are toast tables used by system tables, which
> are not copied from the old cluster (FirstNormalObjectId = 16384).  I am
> attaching that old/new pg_class.relfrozenxid diff as well.
>
> Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> the toast tables work?

VACUUM FREEZE will never set the relfrozenxid backward. If it was never
preserved to begin with, I assume that the existing value could be
arbitrarily before or after, so it might not be updated.
I think that after you VACUUM FREEZE the toast table, then the real
oldest frozen xid (as opposed to the bad value in relfrozenxid for the
toast table) would have to be the same or newer than that of the heap.
Right? That means you could safely copy the heap's relfrozenxid to the
relfrozenxid of its toast table.

> I perhaps could create a short DO block that
> would vacuum freeze just toast tables; it would have to be run in every
> database.

Well, that won't work, because VACUUM can't be executed in a transaction
block or function.

Regards,
	Jeff Davis
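The fix-up Jeff describes here — stamping each toast table with its heap table's relfrozenxid after the freeze — could be sketched in SQL like this. The direct pg_class UPDATE is my assumption of how it would be applied (pg_upgrade's own restore script does the same kind of catalog update), and it requires superuser privileges:

```sql
-- Hypothetical fix-up, run per database AFTER VACUUM FREEZE of the
-- toast tables: copy each heap table's relfrozenxid onto its toast
-- table, which is known to be the same or older than the toast
-- table's true oldest xid at that point.
UPDATE pg_catalog.pg_class t
SET    relfrozenxid = c.relfrozenxid
FROM   pg_catalog.pg_class c
WHERE  t.oid = c.reltoastrelid
  AND  c.reltoastrelid <> 0;
```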
On Thu, Apr 07, 2011 at 12:16:55PM -0400, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > as it should.  Heap tables have preserved relfrozenxids, but if you
> > update a heap row but don't change the toast value, and the old heap row
> > is later removed, the toast table can have an older relfrozenxid than
> > the heap table.
> >
> > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > this.  Second, we need to find a way for people to detect and fix
> > existing systems that have this problem, perhaps warning when the
> > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > need to figure out how to get this information to users.  Perhaps the
> > communication comes through the 9.0.4 release announcement.
>
> I am not sure how to interpret the lack of replies to this email.
> Either it is confidence, shock, or we told you so.  ;-)

Your explanation and patch make sense.  Seems all too clear in retrospect.

> Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> the toast tables work?  I perhaps could create a short DO block that
> would vacuum freeze just toast tables; it would have to be run in every
> database.

I see three cases:

1) The pg_class.relfrozenxid that the TOAST table should have received
("true relfrozenxid") is still covered by available clog files.  Fixable
with some combination of pg_class.relfrozenxid twiddling and "SET
vacuum_freeze_table_age = 0; VACUUM toasttbl".

2) The true relfrozenxid is no longer covered by available clog files.
The fix for case 1 will get "file "foo" doesn't exist, reading as
zeroes" log messages, and we will treat all transactions as uncommitted.
Not generally fixable after that has happened. We could probably provide a recipe for checking whether it could have happened given access to a backup from just before the upgrade. 3) Enough transaction xids have elapsed such that the true relfrozenxid is again covered by clog files, but those records are unrelated to the original transactions. Actually, I don't think this can happen, even with the maximum autovacuum_freeze_max_age. I haven't tested those, so I'm sure there's some error in that assessment. nm
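Noah's case-1 recipe, spelled out as a hypothetical per-table session (the toast table name is made up, and vacuum_freeze_min_age = 0 is my addition to force every xid encountered to be frozen):

```sql
-- Force a full-table, fully-freezing vacuum of one toast table while
-- its true relfrozenxid is still covered by clog (case 1 above).
SET vacuum_freeze_table_age = 0;  -- scan the entire table, not just recently-modified pages
SET vacuum_freeze_min_age = 0;    -- freeze every xid the scan encounters
VACUUM pg_toast.pg_toast_16385;   -- hypothetical toast table name
```

Note VACUUM cannot run inside a transaction block, so this has to be issued as top-level statements, once per toast table, in every database.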
Jeff Davis wrote: > > I have added a personal regression test to show which > > pg_class.relfrozenxid values are not preserved, and with this patch the > > only ones not preserved are toast tables used by system tables, which > > are not copied from the old cluster (FirstNormalObjectId = 16384). I am > > attaching that old/new pg_class.relfrozenxid diff as well. > > > > Any idea how to correct existing systems? Would VACUUM FREEZE of just > > the toast tables work? > > VACUUM FREEZE will never set the relfrozenxid backward. If it was never > preserved to begin with, I assume that the existing value could be > arbitrarily before or after, so it might not be updated. > > I think that after you VACUUM FREEZE the toast table, then the real > oldest frozen xid (as opposed to the bad value in relfrozenxid for the > toast table) would have to be the same or newer than that of the heap. > Right? That means you could safely copy the heap's relfrozenxid to the > relfrozenxid of its toast table. OK, so the only other idea I have is to write some pretty complicated query function that does a sequential scan of each toast table and pulls the earliest xmin/xmax from the tables and use that to set the relfrozenxid (pretty complicated because it has to deal with the freeze horizon and wraparound). > > I perhaps could create a short DO block that > > would vacuum freeze just toast tables; it would have to be run in every > > database. > > Well, that won't work, because VACUUM can't be executed in a transaction > block or function. Good point. The only bright part of this is that missing clog will throw an error so we are not returning incorrect data, and hopefully people will report problems to us when it happens. Ideally I would like to get this patch and correction code out into the field in case more people run into this problem. I know some will, I just don't know how many. 
-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, Apr 7, 2011 at 3:46 PM, Bruce Momjian <bruce@momjian.us> wrote: > Jeff Davis wrote: >> > I have added a personal regression test to show which >> > pg_class.relfrozenxid values are not preserved, and with this patch the >> > only ones not preserved are toast tables used by system tables, which >> > are not copied from the old cluster (FirstNormalObjectId = 16384). I am >> > attaching that old/new pg_class.relfrozenxid diff as well. >> > >> > Any idea how to correct existing systems? Would VACUUM FREEZE of just >> > the toast tables work? >> >> VACUUM FREEZE will never set the relfrozenxid backward. If it was never >> preserved to begin with, I assume that the existing value could be >> arbitrarily before or after, so it might not be updated. >> >> I think that after you VACUUM FREEZE the toast table, then the real >> oldest frozen xid (as opposed to the bad value in relfrozenxid for the >> toast table) would have to be the same or newer than that of the heap. >> Right? That means you could safely copy the heap's relfrozenxid to the >> relfrozenxid of its toast table. > > OK, so the only other idea I have is to write some pretty complicated > query function that does a sequential scan of each toast table and pulls > the earliest xmin/xmax from the tables and use that to set the > relfrozenxid (pretty complicated because it has to deal with the freeze > horizon and wraparound). > >> > I perhaps could create a short DO block that >> > would vacuum freeze just toast tables; it would have to be run in every >> > database. >> >> Well, that won't work, because VACUUM can't be executed in a transaction >> block or function. > > Good point. > > The only bright part of this is that missing clog will throw an error so > we are not returning incorrect data, and hopefully people will report > problems to us when it happens. > > Ideally I would like to get this patch and correction code out into the > field in case more people run into this problem. 
> I know some will, I
> just don't know how many.

ISTM we need to force a minor release once we are sure this has been
corrected.  We should also probably put out an announcement warning
people that have already used pg_upgrade of possible data corruption.
I'm not sure exactly what the language around that should be, but this
does seem pretty bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:

> ISTM we need to force a minor release once we are sure this has
> been corrected.  We should also probably put out an announcement
> warning people that have already used pg_upgrade of possible data
> corruption.  I'm not sure exactly what the language around that
> should be, but this does seem pretty bad.

We just used this to upgrade all of our databases to 9.0.  Most of
those (particularly the databases where data originates) have VACUUM
FREEZE ANALYZE run nightly, and we ran this against all databases
right after each pg_upgrade.  Will that have offered us some
protection from this bug?

-Kevin
On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> OK, so the only other idea I have is to write some pretty complicated
> query function that does a sequential scan of each toast table and pulls
> the earliest xmin/xmax from the tables and use that to set the
> relfrozenxid (pretty complicated because it has to deal with the freeze
> horizon and wraparound).

That sounds like the correct way to fix the situation, although it's a
little more work to install another function just for this one-time
purpose.  TransactionIdPrecedes() should already account for wraparound,
so I don't think that it will be too complicated (make sure to read
every tuple though, not just the ones currently visible).

Stepping back a second to make sure I understand the problem: the only
problem is that relfrozenxid on the toast table after an upgrade is
wrong.  Correct?

Regards,
	Jeff Davis
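A rough SQL-level sketch of the scan Bruce describes, with the limitations Jeff raises made explicit: it sees only visible rows, and the min() comparison ignores xid wraparound, which a server-side function using TransactionIdPrecedes() would have to handle. The toast table name is hypothetical:

```sql
-- Oldest xmin still present among the *visible* rows of one toast
-- table.  A real fix would need a compiled function to also examine
-- dead rows and to compare xids modulo wraparound; this only gives a
-- quick upper bound on how old the surviving xids are.
SELECT min(xmin::text::bigint) AS oldest_visible_xmin
FROM   pg_toast.pg_toast_16385;  -- hypothetical toast table name
```

(The xid-to-bigint conversion goes through text because there is no direct xid-to-integer cast.)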
Jeff Davis wrote:
> On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> > OK, so the only other idea I have is to write some pretty complicated
> > query function that does a sequential scan of each toast table and pulls
> > the earliest xmin/xmax from the tables and use that to set the
> > relfrozenxid (pretty complicated because it has to deal with the freeze
> > horizon and wraparound).
>
> That sounds like the correct way to fix the situation, although it's a
> little more work to install another function just for this one-time
> purpose.  TransactionIdPrecedes() should already account for wraparound,
> so I don't think that it will be too complicated (make sure to read
> every tuple though, not just the ones currently visible).

I want to avoid anything that requires a compile, because compiled code
is hard for many sites to install, so TransactionIdPrecedes() is out.
We will need to do this in PL/pgSQL probably.

> Stepping back a second to make sure I understand the problem: the only
> problem is that relfrozenxid on the toast table after an upgrade is
> wrong.  Correct?

Yes, it was not restored from the old cluster.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Jeff Davis wrote:
> On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> > OK, so the only other idea I have is to write some pretty complicated
> > query function that does a sequential scan of each toast table and pulls
> > the earliest xmin/xmax from the tables and use that to set the
> > relfrozenxid (pretty complicated because it has to deal with the freeze
> > horizon and wraparound).
>
> That sounds like the correct way to fix the situation, although it's a
> little more work to install another function just for this one-time
> purpose.  TransactionIdPrecedes() should already account for wraparound,
> so I don't think that it will be too complicated (make sure to read
> every tuple though, not just the ones currently visible).
>
> Stepping back a second to make sure I understand the problem: the only
> problem is that relfrozenxid on the toast table after an upgrade is
> wrong.  Correct?

One minimal solution might be to set the toast relfrozenxid to match the
heap relfrozenxid?  Ideas?  It is not 100% accurate but it might help.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Robert Haas wrote:
> >> Well, that won't work, because VACUUM can't be executed in a transaction
> >> block or function.
> >
> > Good point.
> >
> > The only bright part of this is that missing clog will throw an error so
> > we are not returning incorrect data, and hopefully people will report
> > problems to us when it happens.
> >
> > Ideally I would like to get this patch and correction code out into the
> > field in case more people run into this problem.  I know some will, I
> > just don't know how many.
>
> ISTM we need to force a minor release once we are sure this has been
> corrected.  We should also probably put out an announcement warning
> people that have already used pg_upgrade of possible data corruption.
> I'm not sure exactly what the language around that should be, but this
> does seem pretty bad.

Yep, "pretty bad" it is.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote:
> Robert Haas wrote:
> > >> Well, that won't work, because VACUUM can't be executed in a transaction
> > >> block or function.
> > >
> > > Good point.
> > >
> > > The only bright part of this is that missing clog will throw an error so
> > > we are not returning incorrect data, and hopefully people will report
> > > problems to us when it happens.
> > >
> > > Ideally I would like to get this patch and correction code out into the
> > > field in case more people run into this problem.  I know some will, I
> > > just don't know how many.
> >
> > ISTM we need to force a minor release once we are sure this has been
> > corrected.  We should also probably put out an announcement warning
> > people that have already used pg_upgrade of possible data corruption.
> > I'm not sure exactly what the language around that should be, but this
> > does seem pretty bad.
>
> Yep, "pretty bad" it is.

The bug exists because I did not realize that the toast relfrozenxid is
tracked independently of the heap's, until the IRC reports were diagnosed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> I want to avoid anything that requires a compile because they are hard
> for many sites to install so TransactionIdPrecedes() is out.  We will
> need to do this in PL/pgSQL probably.

PL/pgSQL can't see dead rows, so that would not be correct.  It's
guaranteed to be the same value you see from the heap or newer, because
if it's not visible in the heap, it's not going to be visible in the
toast table.

Regards,
	Jeff Davis
Jeff Davis wrote:
> On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> > I want to avoid anything that requires a compile because they are hard
> > for many sites to install so TransactionIdPrecedes() is out.  We will
> > need to do this in PL/pgSQL probably.
>
> PL/pgSQL can't see dead rows, so that would not be correct.  It's
> guaranteed to be the same value you see from the heap or newer, because
> if it's not visible in the heap, it's not going to be visible in the
> toast table.

Well, frankly all we need to do is set those hint bits before the clog
gets removed, so maybe just a SELECT * would do the trick!  That and
maybe set the relfrozenxid to match the heap.

It is there now or more people would be reporting problems.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
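A DO-block version of the SELECT * idea Bruce floats here might look like the following sketch (DO requires 9.0, which the new cluster is; whether a plain read reliably sets all the needed hint bits is exactly what is debated downthread):

```sql
-- Read every row of every toast table so that visibility checks
-- consult the clog and set hint bits while the clog files still exist.
-- Must be run in every database, as superuser.
DO $$
DECLARE
    t regclass;
BEGIN
    FOR t IN SELECT oid::regclass FROM pg_class WHERE relkind = 't'
    LOOP
        -- count(*) forces a full scan; the result is discarded
        EXECUTE 'SELECT count(*) FROM ' || t;
    END LOOP;
END
$$;
```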
Bruce Momjian wrote:
> Jeff Davis wrote:
> > On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> > > I want to avoid anything that requires a compile because they are hard
> > > for many sites to install so TransactionIdPrecedes() is out.  We will
> > > need to do this in PL/pgSQL probably.
> >
> > PL/pgSQL can't see dead rows, so that would not be correct.  It's
> > guaranteed to be the same value you see from the heap or newer, because
> > if it's not visible in the heap, it's not going to be visible in the
> > toast table.
>
> Well, frankly all we need to do is set those hint bits before the clog
> gets removed, so maybe just a SELECT * would do the trick!  That and
> maybe set the relfrozenxid to match the heap.
>
> It is there now or more people would be reporting problems.

Clarification, "the clog" is there now or more people would be reporting
problems.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> wrote:

> all we need to do is set those hint bits before the clog gets
> removed, so maybe just a SELECT * would do the trick!

Does that mean that those experiencing the problem are failing to do
the vacuumdb run which is recommended in the pg_upgrade instructions?

-Kevin
Kevin Grittner wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>
> > all we need to do is set those hint bits before the clog gets
> > removed, so maybe just a SELECT * would do the trick!
>
> Does that mean that those experiencing the problem are failing to do
> the vacuumdb run which is recommended in the pg_upgrade instructions?

You know, I looked at that, but I don't think that is going to save me.
:-(  It says:

	Upgrade complete
	----------------
	Optimizer statistics are not transferred by pg_upgrade
	so consider running:
	    vacuumdb --all --analyze-only
	on the newly-upgraded cluster.
	Running this script will delete the old cluster's data files:
	    /usr/var/local/pgdev/pgfoundry/pg_migrator/pg_migrator/delete_old_cluster.sh

We recommend 'vacuumdb --all --analyze-only' which I assume only samples
random pages and does not set all the hint bits.  In fact, you can't
even analyze TOAST tables:

	test=> ANALYZE pg_toast.pg_toast_3596;
	WARNING:  skipping "pg_toast_3596" --- cannot analyze non-tables or
	special system tables
	ANALYZE

but you can SELECT from them:

	test=> SELECT * FROM pg_toast.pg_toast_3596;
	 chunk_id | chunk_seq | chunk_data
	----------+-----------+------------
	(0 rows)

Also, if we force VACUUM FREEZE on the toast tables we would have no
need to advance their relfrozenxids because all the xids would be fixed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote:
> > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > the toast tables work?
>
> VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> preserved to begin with, I assume that the existing value could be
> arbitrarily before or after, so it might not be updated.

Now that I understand the problem a little better, I think VACUUM FREEZE
might work, after all.

Originally, I thought that the toast table's relfrozenxid could be some
arbitrarily wrong value. But actually, the CREATE TABLE is issued after
the xid of the new cluster has already been advanced to the xid of the
old cluster, so it should be a "somewhat reasonable" value.

That means that VACUUM FREEZE of the toast table, if there are no
concurrent transactions, will freeze all of the tuples; and the
newFrozenXid should always be seen as newer than the existing (and
wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
everything should be fine. Right?

Regards,
	Jeff Davis
Jeff Davis wrote:
> On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote:
> > > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > > the toast tables work?
> >
> > VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> > preserved to begin with, I assume that the existing value could be
> > arbitrarily before or after, so it might not be updated.
>
> Now that I understand the problem a little better, I think VACUUM FREEZE
> might work, after all.

Good.  I don't want to be inventing something complex if I can avoid it.
Simple is good, especially if admins panic.  I would rather simple and
longer than short but complex  :-)

> Originally, I thought that the toast table's relfrozenxid could be some
> arbitrarily wrong value. But actually, the CREATE TABLE is issued after
> the xid of the new cluster has already been advanced to the xid of the
> old cluster, so it should be a "somewhat reasonable" value.

Yes, it will be reasonable.

> That means that VACUUM FREEZE of the toast table, if there are no
> concurrent transactions, will freeze all of the tuples; and the
> newFrozenXid should always be seen as newer than the existing (and
> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
> everything should be fine. Right?

Right.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, Apr 7, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Jeff Davis wrote:
>> On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote:
>> > > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
>> > > the toast tables work?
>> >
>> > VACUUM FREEZE will never set the relfrozenxid backward. If it was never
>> > preserved to begin with, I assume that the existing value could be
>> > arbitrarily before or after, so it might not be updated.
>>
>> Now that I understand the problem a little better, I think VACUUM FREEZE
>> might work, after all.
>
> Good.  I don't want to be inventing something complex if I can avoid it.
> Simple is good, especially if admins panic.  I would rather simple and
> longer than short but complex  :-)
>
>> Originally, I thought that the toast table's relfrozenxid could be some
>> arbitrarily wrong value. But actually, the CREATE TABLE is issued after
>> the xid of the new cluster has already been advanced to the xid of the
>> old cluster, so it should be a "somewhat reasonable" value.
>
> Yes, it will be reasonable.
>
>> That means that VACUUM FREEZE of the toast table, if there are no
>> concurrent transactions, will freeze all of the tuples; and the
>> newFrozenXid should always be seen as newer than the existing (and
>> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
>> everything should be fine. Right?
>
> Right.

This depends on how soon after the upgrade VACUUM FREEZE is run,
doesn't it?  If the XID counter has advanced too far...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Thu, Apr 7, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Jeff Davis wrote:
> >> On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote:
> >> > > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> >> > > the toast tables work?
> >> >
> >> > VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> >> > preserved to begin with, I assume that the existing value could be
> >> > arbitrarily before or after, so it might not be updated.
> >>
> >> Now that I understand the problem a little better, I think VACUUM FREEZE
> >> might work, after all.
> >
> > Good.  I don't want to be inventing something complex if I can avoid it.
> > Simple is good, especially if admins panic.  I would rather simple and
> > longer than short but complex  :-)
> >
> >> Originally, I thought that the toast table's relfrozenxid could be some
> >> arbitrarily wrong value. But actually, the CREATE TABLE is issued after
> >> the xid of the new cluster has already been advanced to the xid of the
> >> old cluster, so it should be a "somewhat reasonable" value.
> >
> > Yes, it will be reasonable.
> >
> >> That means that VACUUM FREEZE of the toast table, if there are no
> >> concurrent transactions, will freeze all of the tuples; and the
> >> newFrozenXid should always be seen as newer than the existing (and
> >> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
> >> everything should be fine. Right?
> >
> > Right.
>
> This depends on how soon after the upgrade VACUUM FREEZE is run,
> doesn't it?  If the XID counter has advanced too far...

Well, I assume VACUUM FREEZE is going to sequentially scan the table and
replace every xid.  If the clog is gone, well, we have problems.  I
think the IRC reporter pulled the clog files from a backup.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Bruce Momjian wrote: > > > Yes, it will be reasonable. > > > > > >> That means that VACUUM FREEZE of the toast table, if there are no > > >> concurrent transactions, will freeze all of the tuples; and the > > >> newFrozenXid should always be seen as newer than the existing (and > > >> wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and > > >> everything should be fine. Right? > > > > > > Right. > > > > This depends on how soon after the upgrade VACUUM FREEZE is run, > > doesn't it? If the XID counter has advanced too far... > > Well, I assume VACUUM FREEZE is going to sequentially scan the table and > replace every xid. If the clog is gone, well, we have problems. I > think the IRC reporter pulled the clog files from a backup. So I think we have four possible approaches to correct databases:

	1) SELECT * to set the hint bits
	2) VACUUM to set the hint bits
	3) VACUUM FREEZE to remove the old xids
	4) some complicated function

I don't like #4, and I think I can script #2 and #3 in psql by using COPY to create a VACUUM script and then run it with \i. #1 is easy in a DO block with PL/pgSQL. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, 2011-04-07 at 20:14 -0400, Bruce Momjian wrote: > So I think we have four possible approaches to correct databases: > > 1) SELECT * to set the hint bits > 2) VACUUM to set the hint bits > 3) VACUUM FREEZE to remove the old xids > 4) some complicated function > > I don't like #4, and I think I can script #2 and #3 in psql by using COPY > to create a VACUUM script and then run it with \i. #1 is easy in a DO > block with PL/pgSQL. The only one that sounds very reasonable to me is #3. If there are any xids older than the relfrozenxid, we need to get rid of them. If there is some reason that doesn't work, I suppose we can consider the alternatives. But I don't like the hint-bit-setting approach much. What if the xmax is really a transaction that got an exclusive lock on the tuple, rather than actually deleting it? Are you sure that a SELECT (or even a normal VACUUM) would get rid of that xid, or might something still try to look it up in the clog later? Not only that, but hint-bit-setting is not WAL-logged, so you'd really have to do a checkpoint afterward. Regards, Jeff Davis
Jeff Davis wrote: > On Thu, 2011-04-07 at 20:14 -0400, Bruce Momjian wrote: > > So I think we have four possible approaches to correct databases: > > > > 1) SELECT * to set the hint bits > > 2) VACUUM to set the hint bits > > 3) VACUUM FREEZE to remove the old xids > > 4) some complicated function > > > > I don't like #4, and I think I can script #2 and #3 in psql by using COPY > > to create a VACUUM script and then run it with \i. #1 is easy in a DO > > block with PL/pgSQL. > > The only one that sounds very reasonable to me is #3. If there are any > xids older than the relfrozenxid, we need to get rid of them. If there > is some reason that doesn't work, I suppose we can consider the > alternatives. But I don't like the hint-bit-setting approach much. > > What if the xmax is really a transaction that got an exclusive lock on > the tuple, rather than actually deleting it? Are you sure that a SELECT > (or even a normal VACUUM) would get rid of that xid, or might something > still try to look it up in the clog later? > > Not only that, but hint-bit-setting is not WAL-logged, so you'd really > have to do a checkpoint afterward. Glad you said that! Here is a script which does what we want:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that were upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster,
	-- except 'template0', e.g.
	--	psql -f pg_upgrade_fix dbname
	-- It will not lock any tables but will generate I/O.
	--
	SET vacuum_freeze_min_age = 0;
	SET vacuum_freeze_table_age = 0;
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || quote_ident(relname) || ';'
		FROM pg_class c, pg_namespace n
		WHERE	c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't';
	\copy pg_upgrade_fix TO 'pg_upgrade_fix.sql';
	\i pg_upgrade_fix.sql
	DROP TABLE pg_upgrade_fix;

Looks pretty simple to copy/paste and use.
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
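[A driver loop for running the fix script cluster-wide might look like the following sketch. It is hypothetical: the database list is stubbed in place of querying pg_database, and the psql invocation is echoed rather than executed, so the loop can be checked without a server.]

```shell
# Hypothetical driver for the fix script (assumes it was saved as
# pg_upgrade_fix.sql).  In practice the list would come from:
#   psql -U postgres -At -c "SELECT datname FROM pg_database WHERE datallowconn"
databases="postgres mydb template0"

for db in $databases; do
    # template0 does not accept connections, so skip it
    [ "$db" = template0 ] && continue
    # drop the leading echo to actually run the fix in each database
    echo psql -U postgres -a -f pg_upgrade_fix.sql "$db"
done
```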
Noah Misch wrote: > On Thu, Apr 07, 2011 at 12:16:55PM -0400, Bruce Momjian wrote: > > Bruce Momjian wrote: > > > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of > > > the two reported pg_upgrade problems he saw via IRC. It seems toast > > > tables have xids and pg_dump is not preserving the toast relfrozenxids > > > as it should. Heap tables have preserved relfrozenxids, but if you > > > update a heap row but don't change the toast value, and the old heap row > > > is later removed, the toast table can have an older relfrozenxid than > > > the heap table. > > > > > > The fix for this is to have pg_dump preserve toast relfrozenxids, which > > > can be easily added and backpatched. We might want to push a 9.0.4 for > > > this. Second, we need to find a way for people to detect and fix > > > existing systems that have this problem, perhaps looking for when the > > > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we > > > need to figure out how to get this information to users. Perhaps the > > > communication comes through the 9.0.4 release announcement. > > > > I am not sure how to interpret the lack of replies to this email. > > Either it is confidence, shock, or we told you so. ;-) > > Your explanation and patch make sense. Seems all too clear in retrospect. Yeah, like "duh" for me. > > Any idea how to correct existing systems? Would VACUUM FREEZE of just > > the toast tables work? I perhaps could create a short DO block that > > would vacuum freeze just toast tables; it would have to be run in every > > database. > > I see three cases: > > 1) The pg_class.relfrozenxid that the TOAST table should have received > ("true relfrozenxid") is still covered by available clog files. Fixable > with some combination of pg_class.relfrozenxid twiddling and "SET > vacuum_freeze_table_age = 0; VACUUM toasttbl". Right, VACUUM FREEZE. I now see I don't need to set vacuum_freeze_table_age if I use the FREEZE keyword, e.g.
gram.y has:

	if (n->options & VACOPT_FREEZE)
		n->freeze_min_age = n->freeze_table_age = 0;

> 2) The true relfrozenxid is no longer covered by available clog files. > The fix for case 1 will get "file "foo" doesn't exist, reading as > zeroes" log messages, and we will treat all transactions as uncommitted. Uh, are you sure? I think it would return an error message about a missing clog file for the query; here is a report of a case not related to pg_upgrade: http://archives.postgresql.org/pgsql-admin/2010-09/msg00109.php > Not generally fixable after that has happened. We could probably > provide a recipe for checking whether it could have happened given > access to a backup from just before the upgrade. The IRC folks pulled the clog files off of backups. > 3) Enough transaction xids have elapsed such that the true relfrozenxid > is again covered by clog files, but those records are unrelated to the > original transactions. Actually, I don't think this can happen, even > with the maximum autovacuum_freeze_max_age. Yes, I don't think that can happen either. One concern I have is that existing heap tables are protecting clog files, but once those are frozen, the system might remove clog files not realizing it has to freeze the toast tables too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
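[Bruce's original report suggested detecting affected systems by watching the heap relfrozenxid relative to the toast one. A sketch of such a catalog query follows, using only standard pg_class columns. Note the caveats: a newer toast relfrozenxid also arises harmlessly from ordinary vacuuming, and the catalogs cannot show whether tuples older than relfrozenxid actually exist, so this only narrows the candidates.]

```sql
-- List heap/toast pairs where the toast table's relfrozenxid is newer
-- than its heap's (a larger age() means an older xid).  After an upgrade
-- by an affected pg_upgrade this is the suspicious pattern: the heap's
-- value was preserved while the toast value was freshly assigned.
SELECT c.oid::regclass      AS heap,
       t.oid::regclass      AS toast,
       age(c.relfrozenxid)  AS heap_age,
       age(t.relfrozenxid)  AS toast_age
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.reltoastrelid <> 0
  AND age(t.relfrozenxid) < age(c.relfrozenxid);
```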
On Thu, Apr 07, 2011 at 10:21:06PM -0400, Bruce Momjian wrote: > Noah Misch wrote: > > 1) The pg_class.relfrozenxid that the TOAST table should have received > > ("true relfrozenxid") is still covered by available clog files. Fixable > > with some combination of pg_class.relfrozenxid twiddling and "SET > > vacuum_freeze_table_age = 0; VACUUM toasttbl". > > Right, VACUUM FREEZE. I now see I don't need to set > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has: > > if (n->options & VACOPT_FREEZE) > n->freeze_min_age = n->freeze_table_age = 0; True; it just performs more work than strictly necessary. We don't actually need earlier-than-usual freezing. We need only ensure that the relfrozenxid will guide future VACUUMs to do that freezing early enough. However, I'm not sure how to do that without directly updating relfrozenxid, so it's probably just as well to cause some extra work and stick to the standard interface. > > 2) The true relfrozenxid is no longer covered by available clog files. > > The fix for case 1 will get "file "foo" doesn't exist, reading as > > zeroes" log messages, and we will treat all transactions as uncommitted. > > Uh, are you sure? I think it would return an error message about a > missing clog file for the query; here is a report of a case not related > to pg_upgrade: > > http://archives.postgresql.org/pgsql-admin/2010-09/msg00109.php My statement was indeed incorrect. (Was looking at the "reading as zeroes" message in slru.c, but it only applies during recovery.) > > Not generally fixable after that has happened. We could probably > > provide a recipe for checking whether it could have happened given > > access to a backup from just before the upgrade. > > The IRC folks pulled the clog files off of backups. Since we do get the error after all, that should always be enough. 
> One concern I have is that existing heap tables are protecting clog > files, but once those are frozen, the system might remove clog files not > realizing it has to freeze the toast tables too. Yes. On a similar note, would it be worth having your prototype fixup script sort the VACUUM FREEZE calls in descending relfrozenxid order? Thanks, nm
Noah Misch wrote: > On Thu, Apr 07, 2011 at 10:21:06PM -0400, Bruce Momjian wrote: > > Noah Misch wrote: > > > 1) The pg_class.relfrozenxid that the TOAST table should have received > > > ("true relfrozenxid") is still covered by available clog files. Fixable > > > with some combination of pg_class.relfrozenxid twiddling and "SET > > > vacuum_freeze_table_age = 0; VACUUM toasttbl". > > > > Right, VACUUM FREEZE. I now see I don't need to set > > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has: > > > > if (n->options & VACOPT_FREEZE) > > n->freeze_min_age = n->freeze_table_age = 0; > > True; it just performs more work than strictly necessary. We don't actually > need earlier-than-usual freezing. We need only ensure that the relfrozenxid > will guide future VACUUMs to do that freezing early enough. However, I'm not > sure how to do that without directly updating relfrozenxid, so it's probably > just as well to cause some extra work and stick to the standard interface. Looking at the code, it seems VACUUM FREEZE will freeze any row it can, and because we restarted the cluster after the upgrade, all the rows we care about are visible or dead to all transactions. pg_upgrade certainly relies on that fact already. > > > 2) The true relfrozenxid is no longer covered by available clog files. > > > The fix for case 1 will get "file "foo" doesn't exist, reading as > > > zeroes" log messages, and we will treat all transactions as uncommitted. > > > > Uh, are you sure? I think it would return an error message about a > > missing clog file for the query; here is a report of a case not related > > to pg_upgrade: > > > > http://archives.postgresql.org/pgsql-admin/2010-09/msg00109.php > > My statement was indeed incorrect. (Was looking at the "reading as zeroes" > message in slru.c, but it only applies during recovery.) No problem. Thanks for the review. > > > Not generally fixable after that has happened. 
We could probably > > > provide a recipe for checking whether it could have happened given > > > access to a backup from just before the upgrade. > > > > The IRC folks pulled the clog files off of backups. > > Since we do get the error after all, that should always be enough. That was my thought too. > > One concern I have is that existing heap tables are protecting clog > > files, but once those are frozen, the system might remove clog files not > > realizing it has to freeze the toast tables too. > > Yes. On a similar note, would it be worth having your prototype fixup script > sort the VACUUM FREEZE calls in descending relfrozenxid order? Good question. I don't think the relfrozenxid ordering would make sense because I think it is unlikely problems will happen while they are running the script. The other problem is that because of wraparound it is unclear which xid is earliest. What I did add was to order by the oid, so at least the toast numbers are in order and people can more easily track its progress. New version; I made some other small adjustments:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that were upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster
	-- except 'template0', e.g.:
	--	psql -U postgres -a -f pg_upgrade_fix.sql dbname
	-- This must be run from a writable directory.
	-- It will not lock any tables but will generate I/O.
	--
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
		FROM pg_class c, pg_namespace n
		WHERE	c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't'
		ORDER BY c.oid;
	\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
	\i pg_upgrade_tmp.sql

-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, 2011-04-07 at 22:21 -0400, Bruce Momjian wrote: > One concern I have is that existing heap tables are protecting clog > files, but once those are frozen, the system might remove clog files not > realizing it has to freeze the toast tables too. I don't understand. Can you elaborate? Regards, Jeff Davis
On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote: > > Right, VACUUM FREEZE. I now see I don't need to set > > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has: > > > > if (n->options & VACOPT_FREEZE) > > n->freeze_min_age = n->freeze_table_age = 0; > > True; it just performs more work than strictly necessary. We don't actually > need earlier-than-usual freezing. We need only ensure that the relfrozenxid > will guide future VACUUMs to do that freezing early enough. However, I'm not > sure how to do that without directly updating relfrozenxid, so it's probably > just as well to cause some extra work and stick to the standard interface. If there are tuples in a toast table containing xids that are older than the toast table's relfrozenxid, then there are only two options: 1. Make relfrozenxid go backward to the right value. There is currently no mechanism to do this without compiling C code into the server, because (a) VACUUM FREEZE will never move the relfrozenxid backward; and (b) there is no way to find the oldest xid in a table with a normal snapshot. 2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE). I don't know what you mean about VACUUM FREEZE doing extra work. I suppose you could set the vacuum_freeze_min_age to be exactly the right value such that it freezes everything before the existing (and wrong) relfrozenxid, but in practice I think it would be the same amount of work. Regards, Jeff Davis
Bruce Momjian wrote: > Bruce Momjian wrote: > > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of > > the two reported pg_upgrade problems he saw via IRC. It seems toast > > tables have xids and pg_dump is not preserving the toast relfrozenxids > > as it should. Heap tables have preserved relfrozenxids, but if you > > update a heap row but don't change the toast value, and the old heap row > > is later removed, the toast table can have an older relfrozenxid than > > the heap table. > > > > The fix for this is to have pg_dump preserve toast relfrozenxids, which > > can be easily added and backpatched. We might want to push a 9.0.4 for > > this. Second, we need to find a way for people to detect and fix > > existing systems that have this problem, perhaps looking for when the > > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we > > need to figure out how to get this information to users. Perhaps the > > communication comes through the 9.0.4 release announcement. > > I am not sure how to interpret the lack of replies to this email. > Either it is confidence, shock, or we told you so. ;-) > > Anyway, the attached patch fixes the problem. The fix is for pg_dump's > binary upgrade mode. This would need to be backpatched back to 8.4 > because pg_migrator needs this too. OK, I have applied the attached three patches to 8.4, 9.0, and 9.1. They are all slightly different because of code drift, and I created a unified diff which I find is clearer for single-line changes. I was very careful about the patching of queries because many of these queries are only activated when dumping an older database, and are therefore hard to test for SQL query errors. I included all the version patches in case someone sees something I missed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+ diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 3842895..c5057f7 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3185,6 +3185,8 @@ getTables(int *numTables) int i_relhasrules; int i_relhasoids; int i_relfrozenxid; + int i_toastoid; + int i_toastfrozenxid; int i_owning_tab; int i_owning_col; int i_reltablespace; @@ -3226,7 +3228,8 @@ getTables(int *numTables) "(%s c.relowner) AS rolname, " "c.relchecks, c.relhastriggers, " "c.relhasindex, c.relhasrules, c.relhasoids, " - "c.relfrozenxid, " + "c.relfrozenxid, tc.oid AS toid, " + "tc.relfrozenxid AS tfrozenxid, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, " @@ -3259,6 +3262,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, " @@ -3290,6 +3295,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, " @@ -3321,6 +3328,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " "NULL AS reltablespace, " @@ -3348,6 +3357,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL::oid AS owning_tab, " "NULL::int4 AS 
owning_col, " "NULL AS reltablespace, " @@ -3370,6 +3381,8 @@ getTables(int *numTables) "relhasindex, relhasrules, " "'t'::bool AS relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " "NULL AS reltablespace, " @@ -3446,6 +3459,8 @@ getTables(int *numTables) i_relhasrules = PQfnumber(res, "relhasrules"); i_relhasoids = PQfnumber(res, "relhasoids"); i_relfrozenxid = PQfnumber(res, "relfrozenxid"); + i_toastoid = PQfnumber(res, "toid"); + i_toastfrozenxid = PQfnumber(res, "tfrozenxid"); i_owning_tab = PQfnumber(res, "owning_tab"); i_owning_col = PQfnumber(res, "owning_col"); i_reltablespace = PQfnumber(res, "reltablespace"); @@ -3484,6 +3499,8 @@ getTables(int *numTables) tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0); tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0); tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid)); + tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid)); + tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid)); tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks)); if (PQgetisnull(res, i, i_owning_tab)) { @@ -10044,13 +10061,23 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) } } - appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid.\n"); + appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n"); appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" "SET relfrozenxid = '%u'\n" "WHERE oid = ", tbinfo->frozenxid); appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout); appendPQExpBuffer(q, "::pg_catalog.regclass;\n"); + + if (tbinfo->toast_oid) + { + /* We preserve the toast oids, so we can use it during restore */ + appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n"); + appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" + "SET relfrozenxid = '%u'\n" + "WHERE oid = '%u';\n", + tbinfo->toast_frozenxid, 
tbinfo->toast_oid); + } } /* Loop dumping statistics and storage statements */ diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h index 3339bf7..390b20c 100644 --- a/src/bin/pg_dump/pg_dump.h +++ b/src/bin/pg_dump/pg_dump.h @@ -227,6 +227,8 @@ typedef struct _tableInfo bool hastriggers; /* does it have any triggers? */ bool hasoids; /* does it have OIDs? */ uint32 frozenxid; /* for restore frozen xid */ + Oid toast_oid; /* for restore toast frozen xid */ + uint32 toast_frozenxid;/* for restore toast frozen xid */ int ncheck; /* # of CHECK expressions */ /* these two are set only if table is a sequence owned by a column: */ Oid owning_tab; /* OID of table owning sequence */ diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index f93affd..8721e65 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3409,6 +3409,8 @@ getTables(int *numTables) int i_relhasrules; int i_relhasoids; int i_relfrozenxid; + int i_toastoid; + int i_toastfrozenxid; int i_owning_tab; int i_owning_col; int i_reltablespace; @@ -3451,7 +3453,8 @@ getTables(int *numTables) "(%s c.relowner) AS rolname, " "c.relchecks, c.relhastriggers, " "c.relhasindex, c.relhasrules, c.relhasoids, " - "c.relfrozenxid, " + "c.relfrozenxid, tc.oid AS toid, " + "tc.relfrozenxid AS tfrozenxid, " "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3484,7 +3487,8 @@ getTables(int *numTables) "(%s c.relowner) AS rolname, " "c.relchecks, c.relhastriggers, " "c.relhasindex, c.relhasrules, c.relhasoids, " - "c.relfrozenxid, " + "c.relfrozenxid, tc.oid AS toid, " + "tc.relfrozenxid AS tfrozenxid, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3518,6 +3522,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "relfrozenxid, " + "0 AS toid, " + "0 AS 
tfrozenxid, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3550,6 +3556,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3582,6 +3590,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3610,6 +3620,8 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL AS reloftype, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " @@ -3633,6 +3645,8 @@ getTables(int *numTables) "relhasindex, relhasrules, " "'t'::bool AS relhasoids, " "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL AS reloftype, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " @@ -3666,6 +3680,8 @@ getTables(int *numTables) "relhasindex, relhasrules, " "'t'::bool AS relhasoids, " "0 as relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " "NULL AS reloftype, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " @@ -3711,6 +3727,8 @@ getTables(int *numTables) i_relhasrules = PQfnumber(res, "relhasrules"); i_relhasoids = PQfnumber(res, "relhasoids"); i_relfrozenxid = PQfnumber(res, "relfrozenxid"); + i_toastoid = PQfnumber(res, "toid"); + i_toastfrozenxid = PQfnumber(res, "tfrozenxid"); i_owning_tab = PQfnumber(res, "owning_tab"); i_owning_col = PQfnumber(res, "owning_col"); i_reltablespace = PQfnumber(res, "reltablespace"); @@ -3750,6 +3768,8 @@ getTables(int *numTables) tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0); 
tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0); tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid)); + tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid)); + tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid)); if (PQgetisnull(res, i, i_reloftype)) tblinfo[i].reloftype = NULL; else @@ -10852,13 +10872,23 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) } } - appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid.\n"); + appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n"); appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" "SET relfrozenxid = '%u'\n" "WHERE oid = ", tbinfo->frozenxid); appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout); appendPQExpBuffer(q, "::pg_catalog.regclass;\n"); + + if (tbinfo->toast_oid) + { + /* We preserve the toast oids, so we can use it during restore */ + appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n"); + appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" + "SET relfrozenxid = '%u'\n" + "WHERE oid = '%u';\n", + tbinfo->toast_frozenxid, tbinfo->toast_oid); + } } /* Loop dumping statistics and storage statements */ diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h index c309f69..1dc7157 100644 --- a/src/bin/pg_dump/pg_dump.h +++ b/src/bin/pg_dump/pg_dump.h @@ -228,6 +228,8 @@ typedef struct _tableInfo bool hastriggers; /* does it have any triggers? */ bool hasoids; /* does it have OIDs? 
*/ uint32 frozenxid; /* for restore frozen xid */ + Oid toast_oid; /* for restore toast frozen xid */ + uint32 toast_frozenxid;/* for restore toast frozen xid */ int ncheck; /* # of CHECK expressions */ char *reloftype; /* underlying type for typed table */ /* these two are set only if table is a sequence owned by a column: */ diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 3f6e77b..1ccdb4d 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3812,6 +3812,8 @@ getTables(int *numTables) int i_relhasrules; int i_relhasoids; int i_relfrozenxid; + int i_toastoid; + int i_toastfrozenxid; int i_relpersistence; int i_owning_tab; int i_owning_col; @@ -3855,7 +3857,9 @@ getTables(int *numTables) "(%s c.relowner) AS rolname, " "c.relchecks, c.relhastriggers, " "c.relhasindex, c.relhasrules, c.relhasoids, " - "c.relfrozenxid, c.relpersistence, " + "c.relfrozenxid, tc.oid AS toid, " + "tc.relfrozenxid AS tfrozenxid, " + "c.relpersistence, " "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3889,7 +3893,9 @@ getTables(int *numTables) "(%s c.relowner) AS rolname, " "c.relchecks, c.relhastriggers, " "c.relhasindex, c.relhasrules, c.relhasoids, " - "c.relfrozenxid, 'p' AS relpersistence, " + "c.relfrozenxid, tc.oid AS toid, " + "tc.relfrozenxid AS tfrozenxid, " + "'p' AS relpersistence, " "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3922,7 +3928,9 @@ getTables(int *numTables) "(%s c.relowner) AS rolname, " "c.relchecks, c.relhastriggers, " "c.relhasindex, c.relhasrules, c.relhasoids, " - "c.relfrozenxid, 'p' AS relpersistence, " + "c.relfrozenxid, tc.oid AS toid, " + "tc.relfrozenxid AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ 
-3955,7 +3963,10 @@ getTables(int *numTables) "(%s relowner) AS rolname, " "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " - "relfrozenxid, 'p' AS relpersistence, " + "relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -3987,7 +3998,10 @@ getTables(int *numTables) "(%s relowner) AS rolname, " "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " - "0 AS relfrozenxid, 'p' AS relpersistence, " + "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -4019,7 +4033,10 @@ getTables(int *numTables) "(%s relowner) AS rolname, " "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " - "0 AS relfrozenxid, 'p' AS relpersistence, " + "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "d.refobjid AS owning_tab, " "d.refobjsubid AS owning_col, " @@ -4047,7 +4064,10 @@ getTables(int *numTables) "(%s relowner) AS rolname, " "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, relhasoids, " - "0 AS relfrozenxid, 'p' AS relpersistence, " + "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " @@ -4070,7 +4090,10 @@ getTables(int *numTables) "relchecks, (reltriggers <> 0) AS relhastriggers, " "relhasindex, relhasrules, " "'t'::bool AS relhasoids, " - "0 AS relfrozenxid, 'p' AS relpersistence, " + "0 AS relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " @@ -4103,7 +4126,10 @@ getTables(int *numTables) "relchecks, (reltriggers <> 
0) AS relhastriggers, " "relhasindex, relhasrules, " "'t'::bool AS relhasoids, " - "0 as relfrozenxid, 'p' AS relpersistence, " + "0 as relfrozenxid, " + "0 AS toid, " + "0 AS tfrozenxid, " + "'p' AS relpersistence, " "NULL AS reloftype, " "NULL::oid AS owning_tab, " "NULL::int4 AS owning_col, " @@ -4149,6 +4175,8 @@ getTables(int *numTables) i_relhasrules = PQfnumber(res, "relhasrules"); i_relhasoids = PQfnumber(res, "relhasoids"); i_relfrozenxid = PQfnumber(res, "relfrozenxid"); + i_toastoid = PQfnumber(res, "toid"); + i_toastfrozenxid = PQfnumber(res, "tfrozenxid"); i_relpersistence = PQfnumber(res, "relpersistence"); i_owning_tab = PQfnumber(res, "owning_tab"); i_owning_col = PQfnumber(res, "owning_col"); @@ -4190,6 +4218,8 @@ getTables(int *numTables) tblinfo[i].hastriggers = (strcmp(PQgetvalue(res, i, i_relhastriggers), "t") == 0); tblinfo[i].hasoids = (strcmp(PQgetvalue(res, i, i_relhasoids), "t") == 0); tblinfo[i].frozenxid = atooid(PQgetvalue(res, i, i_relfrozenxid)); + tblinfo[i].toast_oid = atooid(PQgetvalue(res, i, i_toastoid)); + tblinfo[i].toast_frozenxid = atooid(PQgetvalue(res, i, i_toastfrozenxid)); if (PQgetisnull(res, i, i_reloftype)) tblinfo[i].reloftype = NULL; else @@ -12221,13 +12251,23 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) } } - appendPQExpBuffer(q, "\n-- For binary upgrade, set relfrozenxid\n"); + appendPQExpBuffer(q, "\n-- For binary upgrade, set heap's relfrozenxid\n"); appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" "SET relfrozenxid = '%u'\n" "WHERE oid = ", tbinfo->frozenxid); appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout); appendPQExpBuffer(q, "::pg_catalog.regclass;\n"); + + if (tbinfo->toast_oid) + { + /* We preserve the toast oids, so we can use it during restore */ + appendPQExpBuffer(q, "\n-- For binary upgrade, set toast's relfrozenxid\n"); + appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n" + "SET relfrozenxid = '%u'\n" + "WHERE oid = '%u';\n", + tbinfo->toast_frozenxid, tbinfo->toast_oid); + } } 
 /* Loop dumping statistics and storage statements */
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 113ecb1..6559e23 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -248,6 +248,8 @@ typedef struct _tableInfo
 	bool		hastriggers;	/* does it have any triggers? */
 	bool		hasoids;		/* does it have OIDs? */
 	uint32		frozenxid;		/* for restore frozen xid */
+	Oid			toast_oid;		/* for restore toast frozen xid */
+	uint32		toast_frozenxid;	/* for restore toast frozen xid */
 	int			ncheck;			/* # of CHECK expressions */
 	char	   *reloftype;		/* underlying type for typed table */
 	/* these two are set only if table is a sequence owned by a column: */
Jeff Davis wrote:
> On Thu, 2011-04-07 at 22:21 -0400, Bruce Momjian wrote:
> > One concern I have is that existing heap tables are protecting clog
> > files, but once those are frozen, the system might remove clog files not
> > realizing it has to freeze the heap tables too.
>
> I don't understand. Can you elaborate?

Well, when you initially run pg_upgrade, your heap relfrozenxid is preserved, and we only remove clog files when _all_ relations in all databases no longer need them, so for a time the heap tables will keep the clogs around.

Over time, the heap files will be vacuum frozen, and their relfrozenxid advanced. Once that happens to all heaps, the system thinks it can remove clog files, not realizing the toast tables also need vacuuming. This is the "it might become more of a problem in the future" concern I have.

The script I posted does fix this, and the code changes prevent it from happening entirely.

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
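Bruce's failure sequence can be sketched with a toy model. All xid numbers below are invented for illustration, and real truncation works from per-database datfrozenxid values rather than a flat list of relations; this is just the bookkeeping arithmetic, not PostgreSQL code:

```python
# Toy model of clog truncation after a pg_upgrade that failed to
# preserve a toast table's relfrozenxid.  Illustration only.

XIDS_PER_CLOG_SEGMENT = 1_048_576  # 32 pages * 8192 bytes * 4 xacts/byte

def truncation_segment(relfrozenxids):
    """Clog segments strictly below the segment holding the oldest
    recorded relfrozenxid are eligible for removal."""
    return min(relfrozenxids) // XIDS_PER_CLOG_SEGMENT

# Right after upgrade: the heap kept its old relfrozenxid, but the
# toast table was wrongly stamped with the upgrade-time xid.
heap_frozenxid = 100_000_000       # correctly preserved
toast_recorded = 110_000_000       # wrongly set at upgrade time
toast_actual_oldest = 100_000_000  # xids really present in toast tuples

# While the heap still protects the old segments, nothing is lost:
cutoff = truncation_segment([heap_frozenxid, toast_recorded]) * XIDS_PER_CLOG_SEGMENT
assert cutoff <= toast_actual_oldest

# Later, routine freezing advances the heap's relfrozenxid:
heap_frozenxid = 120_000_000
cutoff = truncation_segment([heap_frozenxid, toast_recorded]) * XIDS_PER_CLOG_SEGMENT
# Now clog still needed by toast tuples is eligible for removal:
assert cutoff > toast_actual_oldest
```

The point of the sketch: the toast table's too-new recorded value stops protecting anything as soon as every heap's relfrozenxid passes it, even though the toast data itself never changed.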
Jeff Davis wrote:
> On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote:
> > > Right, VACUUM FREEZE. I now see I don't need to set
> > > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> > >
> > > 	if (n->options & VACOPT_FREEZE)
> > > 		n->freeze_min_age = n->freeze_table_age = 0;
> >
> > True; it just performs more work than strictly necessary. We don't actually
> > need earlier-than-usual freezing. We need only ensure that the relfrozenxid
> > will guide future VACUUMs to do that freezing early enough. However, I'm not
> > sure how to do that without directly updating relfrozenxid, so it's probably
> > just as well to cause some extra work and stick to the standard interface.
>
> If there are tuples in a toast table containing xids that are older than
> the toast table's relfrozenxid, then there are only two options:
>
> 1. Make relfrozenxid go backward to the right value. There is currently
> no mechanism to do this without compiling C code into the server,
> because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
> (b) there is no way to find the oldest xid in a table with a normal
> snapshot.

Right, this is all too complicated.

> 2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE).
>
> I don't know what you mean about VACUUM FREEZE doing extra work. I
> suppose you could set the vacuum_freeze_min_age to be exactly the right
> value such that it freezes everything before the existing (and wrong)
> relfrozenxid, but in practice I think it would be the same amount of
> work.

We don't know how far back to go with freezing, so we just have to freeze it all.

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
On Fri, Apr 08, 2011 at 10:05:01AM -0700, Jeff Davis wrote: > On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote: > > > Right, VACUUM FREEZE. I now see I don't need to set > > > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has: > > > > > > if (n->options & VACOPT_FREEZE) > > > n->freeze_min_age = n->freeze_table_age = 0; > > > > True; it just performs more work than strictly necessary. We don't actually > > need earlier-than-usual freezing. We need only ensure that the relfrozenxid > > will guide future VACUUMs to do that freezing early enough. However, I'm not > > sure how to do that without directly updating relfrozenxid, so it's probably > > just as well to cause some extra work and stick to the standard interface. > > If there are tuples in a toast table containing xids that are older than > the toast table's relfrozenxid, then there are only two options: > > 1. Make relfrozenxid go backward to the right value. There is currently > no mechanism to do this without compiling C code into the server, > because (a) VACUUM FREEZE will never move the relfrozenxid backward; and > (b) there is no way to find the oldest xid in a table with a normal > snapshot. Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 1100000000 (the highest possible vacuum_freeze_min_age, plus some slop), then run "SET vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this? There's no need to set relfrozenxid back to a particular "right" value. Not suggesting we recommend this, but I can't think offhand why it wouldn't suffice. > 2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE). > > I don't know what you mean about VACUUM FREEZE doing extra work. I > suppose you could set the vacuum_freeze_min_age to be exactly the right > value such that it freezes everything before the existing (and wrong) > relfrozenxid, but in practice I think it would be the same amount of > work. 
Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M. If you're using the default vacuum_freeze_min_age = 50M, "SET vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M transaction ids. "VACUUM FREEZE tbl" (a.k.a "SET vacuum_freeze_table_age = 0; SET vacuum_freeze_min_age = 0; VACUUM tbl") will freeze tuples covering 55M transaction ids. nm
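Spelling out that arithmetic (the table's true oldest xid of ~450M is implied by the 5M/55M figures rather than stated; numbers are illustrative only):

```python
# Noah's 500M/505M example as plain arithmetic.  Illustration only.
next_xid = 505_000_000
vacuum_freeze_min_age = 50_000_000

# "SET vacuum_freeze_table_age = 0; VACUUM" freezes xids older than:
cutoff_plain = next_xid - vacuum_freeze_min_age
# "VACUUM FREEZE" (freeze_min_age = 0) freezes xids older than:
cutoff_freeze = next_xid

assert cutoff_plain == 455_000_000
assert cutoff_freeze == 505_000_000

# For a table whose oldest tuple xids date back to ~450M, the two
# cutoffs cover 5M vs. 55M transaction ids of tuples:
oldest = 450_000_000
assert cutoff_plain - oldest == 5_000_000
assert cutoff_freeze - oldest == 55_000_000
```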
Bruce Momjian wrote: > New version; I made some other small adjustments: > > -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8 > -- servers that was upgraded by pg_upgrade and pg_migrator. > -- Run the script using psql for every database in the cluster > -- except 'template0', e.g.: > -- psql -U postgres -a -f pg_upgrade_fix.sql dbname > -- This must be run from a writable directory. > -- It will not lock any tables but will generate I/O. > -- > CREATE TEMPORARY TABLE pg_upgrade_fix AS > SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';' > FROM pg_class c, pg_namespace n > WHERE c.relnamespace = n.oid AND > n.nspname = 'pg_toast' AND > c.relkind = 't' > ORDER by c.oid; > \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql'; > \i pg_upgrade_tmp.sql OK, now that I have committed the fixes to git, I think it is time to consider how we are going to handle this fix for people who have already used pg_upgrade, or are using it in currently released versions. I am thinking an announce list email with this query would be enough, and state that we are planning a minor release with this fix in the next few weeks. I can provide details on the cause and behavior of the bug. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote:
> OK, now that I have committed the fixes to git, I think it is time to
> consider how we are going to handle this fix for people who have already
> used pg_upgrade, or are using it in currently released versions.
>
> I am thinking an announce list email with this query would be enough,
> and state that we are planning a minor release with this fix in the
> next few weeks. I can provide details on the cause and behavior of the
> bug.

OK, here is a draft email announcement:

---------------------------------------------------------------------------

A bug has been discovered in all released Postgres versions that
performed major version upgrades using pg_upgrade and (formerly)
pg_migrator. The bug can cause queries to return the following error:

	ERROR:  could not access status of transaction ######
	DETAIL:  could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.

To prevent such failures, users are encouraged to run the following psql
script in all upgraded databases as soon as possible; a fix will be
included in upcoming Postgres releases 8.4.8 and 9.0.4:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that was upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster
	-- except 'template0', e.g.:
	--	psql -U postgres -a -f pg_upgrade_fix.sql dbname
	-- This must be run from a writable directory.
	-- It will not lock any tables but will generate I/O.
	--
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
		FROM pg_class c, pg_namespace n
		WHERE c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't'
		ORDER by c.oid;
	\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
	\i pg_upgrade_tmp.sql

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> OK, here is a draft email announcement:

Couple suggestions (also on IRC):

> ---------------------------------------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator. Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:

	ERROR:  could not access status of transaction ######
	DETAIL:  could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.

To prevent such failures, users need to run the following psql script, as
the superuser, in all upgraded databases as soon as possible:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that was upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster
	-- except 'template0', e.g.:
	--	psql -U postgres -a -f pg_upgrade_fix.sql dbname
	-- This must be run from a writable directory.
	-- It will not lock any tables but will generate I/O.
	--
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
		FROM pg_class c, pg_namespace n
		WHERE c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't'
		ORDER by c.oid;
	\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
	\i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
The fixed version of pg_upgrade will remove the need for the above
script by correctly updating the TOAST tables in the migrated database.

Thanks,

	Stephen
Stephen Frost wrote:
-- Start of PGP signed section.
> Bruce,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > OK, here is a draft email announcement:
>
> Couple suggestions (also on IRC):

Yes, I like your version better; I did adjust the wording of the last
sentence to mention it is really the release, not the new pg_upgrade,
which fixes the problem because the fixes are in pg_dump, and hence a
new pg_upgrade binary will not work; you need a new install.

---------------------------------------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator. Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:

	ERROR:  could not access status of transaction ######
	DETAIL:  could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.

To prevent such failures users need to run the following psql script, as
the superuser, in all upgraded databases as soon as possible:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that was upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster
	-- except 'template0', e.g.:
	--	psql -U postgres -a -f pg_upgrade_fix.sql dbname
	-- This must be run from a writable directory.
	-- It will not lock any tables but will generate I/O.
	--
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
		FROM pg_class c, pg_namespace n
		WHERE c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't'
		ORDER by c.oid;
	\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
	\i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will remove the need for the above script by correctly
updating all TOAST tables in the migrated databases.
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
* Bruce Momjian (bruce@momjian.us) wrote: > Yes, I like your version better; I did adjust the wording of the last > sentence to mention it is really the release, not the new pg_upgrade, > which fixes the problem because the fixes are in pg_dump, and hence a > new pg_upgrade binary will not work; you need a new install. Err, right, good point. You might even want to call that out specifically, so no one is confused. Also this: > -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8 > -- servers that was upgraded by pg_upgrade and pg_migrator. 'that was' should be 'that were'. > A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4. > These releases will remove the need for the above script by correctly > updating all TOAST tables in the migrated databases. My suggestion: A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4. These releases will include an updated pg_dump which will remove the need for the above script by correctly dumping all TOAST tables in the migrated databases. Thanks, Stephen
On Fri, 2011-04-08 at 13:35 -0400, Noah Misch wrote: > > 1. Make relfrozenxid go backward to the right value. There is currently > > no mechanism to do this without compiling C code into the server, > > because (a) VACUUM FREEZE will never move the relfrozenxid backward; and > > (b) there is no way to find the oldest xid in a table with a normal > > snapshot. > > Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 1100000000 > (the highest possible vacuum_freeze_min_age, plus some slop), then run "SET > vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this? > There's no need to set relfrozenxid back to a particular "right" value. That's a good point that we don't need relfrozenxid to really be the right value; we just need it to be less than or equal to the right value. I don't think you need to mess around with vacuum_freeze_table_age though -- that looks like it's taken care of in the logic for deciding when to do a full table vacuum. This has the additional merit that transaction IDs are not needlessly removed; therefore leaving some forensic information if there are further problems. > > Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M. If > you're using the default vacuum_freeze_min_age = 50M, "SET > vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M > transaction ids. If the pg_upgrade time was at txid 500M, then the relfrozenxid of the toast table will be about 500M. That means you need to get rid of all xids less than about 500M (unless you already fixed relfrozenxid, perhaps using the process you mention above). So if you only freeze tuples less than about 455M (505M - 50M), then that is wrong. The only difference really is that you don't really need to freeze those last 5M transactions since the upgrade happened. Regards,Jeff Davis
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (bruce@momjian.us) wrote:
> > Yes, I like your version better; I did adjust the wording of the last
> > sentence to mention it is really the release, not the new pg_upgrade,
> > which fixes the problem because the fixes are in pg_dump, and hence a
> > new pg_upgrade binary will not work; you need a new install.
>
> Err, right, good point. You might even want to call that out
> specifically, so no one is confused. Also this:
>
> > -- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
> > -- servers that was upgraded by pg_upgrade and pg_migrator.
>
> 'that was' should be 'that were'.
>
> > A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> > These releases will remove the need for the above script by correctly
> > updating all TOAST tables in the migrated databases.
>
> My suggestion:
>
> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> These releases will include an updated pg_dump which will remove the
> need for the above script by correctly dumping all TOAST tables in the
> migrated databases.

I am worried if I mention pg_dump that people will think pg_dump is
broken, when in fact it is only the --binary-upgrade mode of pg_dump
that is broken.

I adjusted the wording of the last paragraph slightly to be clearer, but
hopefully not confusing.

We don't actually check the pg_dump version and I am hesitant to add
such a check.

I was thinking of sending this out on Monday, but now think people might
like to have the weekend to fix this, so I am thinking of sending it to
announce tonight, in 8 hours. OK?

---------------------------------------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator. Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.
This bug can cause queries to return the following error:

	ERROR:  could not access status of transaction ######
	DETAIL:  could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.

To prevent such failures users need to run the following psql script, as
the superuser, in all upgraded databases as soon as possible:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that was upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster
	-- except 'template0', e.g.:
	--	psql -U postgres -a -f pg_upgrade_fix.sql dbname
	-- This must be run from a writable directory.
	-- It will not lock any tables but will generate I/O.
	--
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
		FROM pg_class c, pg_namespace n
		WHERE c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't'
		ORDER by c.oid;
	\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
	\i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will remove the need for the above script by correctly
dumping all TOAST tables in the migrated databases.

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
Bruce Momjian wrote:
> I am worried if I mention pg_dump that people will think pg_dump is
> broken, when in fact it is only the --binary-upgrade mode of pg_dump
> that is broken.
>
> I adjusted the wording of the last paragraph slightly to be clearer, but
> hopefully not confusing.
>
> We don't actually check the pg_dump version and I am hesitant to add
> such a check.
>
> I was thinking of sending this out on Monday, but now think people might
> like to have the weekend to fix this, so I am thinking of sending it to
> announce tonight, in 8 hours. OK?

Updated version with IRC user suggestions:

---------------------------------------------------------------------------

Critical Fix for pg_upgrade/pg_migrator Users
---------------------------------------------

A bug has been discovered in all released versions of pg_upgrade and
(formerly) pg_migrator. Anyone who has used pg_upgrade or pg_migrator
should take the following corrective actions as soon as possible.

This bug can cause queries to return the following error:

	ERROR:  could not access status of transaction ######
	DETAIL:  could not open file "pg_clog/####": No such file or directory

This error prevents access to very wide values stored in the database.

To prevent such failures users need to run the following psql script, as
the superuser, in all upgraded databases as soon as possible:

	-- This script fixes data in pre-PG 9.0.4 and pre-8.4.8
	-- servers that were upgraded by pg_upgrade and pg_migrator.
	-- Run the script using psql for every database in the cluster
	-- except 'template0', e.g.:
	--	psql -U postgres -a -f pg_upgrade_fix.sql dbname
	-- This must be run from a writable directory.
	-- It will not lock any tables but will generate I/O.
	--
	CREATE TEMPORARY TABLE pg_upgrade_fix AS
		SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';'
		FROM pg_class c, pg_namespace n
		WHERE c.relnamespace = n.oid AND
			n.nspname = 'pg_toast' AND
			c.relkind = 't'
		ORDER by c.oid;
	\copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql';
	\i pg_upgrade_tmp.sql

A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
These releases will remove the need for the above script by correctly
restoring all TOAST tables in the migrated databases.

2011-04-08

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
On Fri, Apr 08, 2011 at 12:16:50PM -0700, Jeff Davis wrote: > On Fri, 2011-04-08 at 13:35 -0400, Noah Misch wrote: > > > 1. Make relfrozenxid go backward to the right value. There is currently > > > no mechanism to do this without compiling C code into the server, > > > because (a) VACUUM FREEZE will never move the relfrozenxid backward; and > > > (b) there is no way to find the oldest xid in a table with a normal > > > snapshot. > > > > Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 1100000000 > > (the highest possible vacuum_freeze_min_age, plus some slop), then run "SET > > vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this? > > There's no need to set relfrozenxid back to a particular "right" value. > > That's a good point that we don't need relfrozenxid to really be the > right value; we just need it to be less than or equal to the right > value. I don't think you need to mess around with > vacuum_freeze_table_age though -- that looks like it's taken care of in > the logic for deciding when to do a full table vacuum. Actually, I think the only reason to VACUUM at all after hacking relfrozenxid is to visit every tuple and see whether you need to restore any clog segments from backup. Suppose your postgresql.conf overrides vacuum_freeze_table_age to the maximum of 2B. If you hacked relfrozenxid and just VACUUMed without modifying vacuum_freeze_table_age, you wouldn't get a full table scan. In another ~1B transactions, you'll get that full-table VACUUM, and it might then discover missing clog segments. Though you wouldn't risk any new clog loss in the mean time, by doing the VACUUM with vacuum_freeze_table_age=0 on each affected table, you can go away confident that any clog restoration is behind you. > This has the additional merit that transaction IDs are not needlessly > removed; therefore leaving some forensic information if there are > further problems. 
> > > > > Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M. If > > you're using the default vacuum_freeze_min_age = 50M, "SET > > vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M > > transaction ids. > > If the pg_upgrade time was at txid 500M, then the relfrozenxid of the > toast table will be about 500M. That means you need to get rid of all > xids less than about 500M (unless you already fixed relfrozenxid, > perhaps using the process you mention above). > > So if you only freeze tuples less than about 455M (505M - 50M), then > that is wrong. Agreed. If you don't fix relfrozenxid, you can't win much in that example. > > The only difference really is that you don't really need to freeze those > last 5M transactions since the upgrade happened. But change the numbers somewhat. Say you ran pg_upgrade at xid 110M. Your TOAST table had relfrozenxid = 100M before pg_upgrade and 110M+epsilon after. The next xid now sits at 170M. Without any manual relfrozenxid changes, any full-table VACUUM will bump the relfrozenxid to 120M. A VACUUM FREEZE would freeze tuples covering 70M transactions, while a VACUUM with vacuum_freeze_table_age = 0 would freeze tuples across only 20M transactions. An unadorned VACUUM wouldn't even perform a full-table scan. All that being said, recommending VACUUM FREEZE seems sensibly conservative. Thanks, nm
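Noah's 110M/170M example can be written out as plain arithmetic (illustration only, made-up xid values from the email above):

```python
# The 110M/170M example: how far each flavor of VACUUM freezes,
# measured from the table's true oldest xid.  Illustration only.
true_oldest    = 100_000_000  # relfrozenxid before pg_upgrade
recorded       = 110_000_000  # relfrozenxid wrongly stamped at upgrade
next_xid       = 170_000_000
freeze_min_age = 50_000_000   # default vacuum_freeze_min_age

# A full-table VACUUM (vacuum_freeze_table_age = 0) freezes xids below:
freeze_cutoff = next_xid - freeze_min_age
assert freeze_cutoff == 120_000_000

# Transactions covered by freezing, from the true oldest xid:
assert freeze_cutoff - true_oldest == 20_000_000  # VACUUM, table_age = 0
assert next_xid - true_oldest == 70_000_000       # VACUUM FREEZE
```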
> -- It will not lock any tables but will generate I/O. add: IMPORTANT: Depending on the size and configuration of your database, this script may generate a lot of I/O and degrade database performance. Users should execute this script during a low traffic period and watch the database load. > -- > CREATE TEMPORARY TABLE pg_upgrade_fix AS > SELECT 'VACUUM FREEZE pg_toast.' || c.relname || ';' > FROM pg_class c, pg_namespace n > WHERE c.relnamespace = n.oid AND > n.nspname = 'pg_toast' AND > c.relkind = 't' > ORDER by c.oid; > \copy pg_upgrade_fix TO 'pg_upgrade_tmp.sql'; > \i pg_upgrade_tmp.sql > > A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4. > These releases will remove the need for the above script by correctly > restoring all TOAST tables in the migrated databases. add: However, users of databases which have been already migrated still need to run the script, even if they upgrade to 9.0.4. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> -- It will not lock any tables but will generate I/O. > add: > IMPORTANT: Depending on the size and configuration of your database, > this script may generate a lot of I/O and degrade database performance. > Users should execute this script during a low traffic period and watch > the database load. It might be worth suggesting that people can adjust their vacuum delay parameters to control the I/O load. regards, tom lane
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >> -- It will not lock any tables but will generate I/O. > > > add: > > IMPORTANT: Depending on the size and configuration of your database, > > this script may generate a lot of I/O and degrade database performance. > > Users should execute this script during a low traffic period and watch > > the database load. > > It might be worth suggesting that people can adjust their vacuum delay > parameters to control the I/O load. I talked to Tom about this and I am worried people will adjust it so it takes days to complete. Is that a valid concern? Does anyone have a suggested value? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote: > Tom Lane wrote: > > Josh Berkus <josh@agliodbs.com> writes: > > >> -- It will not lock any tables but will generate I/O. > > > > > add: > > > IMPORTANT: Depending on the size and configuration of your database, > > > this script may generate a lot of I/O and degrade database performance. > > > Users should execute this script during a low traffic period and watch > > > the database load. > > > > It might be worth suggesting that people can adjust their vacuum delay > > parameters to control the I/O load. > > I talked to Tom about this and I am worried people will adjust it so it > takes days to complete. Is that a valid concern? Does anyone have a > suggested value? Josh Berkus helped me get lots more details on a wiki page for this: http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix I will reference the wiki in the email announcement. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Fri, 2011-04-08 at 15:03 -0400, Bruce Momjian wrote: > A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4. > These releases will remove the need for the above script by correctly > updating all TOAST tables in the migrated databases. You might want to clarify that the fix may be required if you ever used pg_upgrade before. Using the new version of pg_upgrade/dump when you still have a bad relfrozenxid doesn't help. Regards,Jeff Davis
On Fri, Apr 8, 2011 at 4:00 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2011-04-08 at 15:03 -0400, Bruce Momjian wrote:
>> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
>> These releases will remove the need for the above script by correctly
>> updating all TOAST tables in the migrated databases.
>
> You might want to clarify that the fix may be required if you ever used
> pg_upgrade before. Using the new version of pg_upgrade/dump when you
> still have a bad relfrozenxid doesn't help.
>
> Regards,
> 	Jeff Davis

I've been noticing in my logs for the past few days the message you
note in the wiki. It seems to occur during a vacuum around 7:30am
every day. I will be running the suggested script shortly, but can
anyone tell me how bad of shape my db is in? This is our production
db with two hot standbys running off it.

grep -i 'could not access status of transaction' postgresql-2011-04*.log
postgresql-2011-04-06.log:2011-04-06 07:28:27 PDT [15882]: [1-1] (user=postgres) (rhost=[local]) ERROR: could not access status of transaction 1273385235
postgresql-2011-04-07.log:2011-04-07 07:27:14 PDT [29790]: [1-1] (user=postgres) (rhost=[local]) ERROR: could not access status of transaction 1273385235
postgresql-2011-04-08.log:2011-04-08 07:26:35 PDT [2402]: [1-1] (user=postgres) (rhost=[local]) ERROR: could not access status of transaction 1273385235
On Fri, Apr 8, 2011 at 4:51 PM, bricklen <bricklen@gmail.com> wrote:
> I've been noticing in my logs for the past few days the message you
> note in the wiki. It seems to occur during a vacuum around 7:30am
> every day. I will be running the suggested script shortly, but can
> anyone tell me how bad of shape my db is in? This is our production
> db with two hot standbys running off it.
>
> grep -i 'could not access status of transaction' postgresql-2011-04*.log
> postgresql-2011-04-06.log:2011-04-06 07:28:27 PDT [15882]: [1-1]
> (user=postgres) (rhost=[local]) ERROR: could not access status of
> transaction 1273385235
> postgresql-2011-04-07.log:2011-04-07 07:27:14 PDT [29790]: [1-1]
> (user=postgres) (rhost=[local]) ERROR: could not access status of
> transaction 1273385235
> postgresql-2011-04-08.log:2011-04-08 07:26:35 PDT [2402]: [1-1]
> (user=postgres) (rhost=[local]) ERROR: could not access status of
> transaction 1273385235

version 9.0.3, if that helps
bricklen, * bricklen (bricklen@gmail.com) wrote: > I've been noticing in my logs for the past few days the message you > note in the wiki. It seems to occur during a vacuum around 7:30am > every day. I will be running the suggested script shortly, but can > anyone tell me in how bad of shape my db is in? This is our production > db with two hot standby's running off it. Unfortunately, I don't think the script that Bruce posted will help if the clog files have been removed (which appears to be the case here). Do you have a backup which includes older files which existed under the 'pg_clog' directory under your database's root? Hopefully you do and can restore those and restart the database. If you restore and then restart then Bruce's script could be run and hopefully would clear out these errors. Bruce, please correct me if I got any part of this wrong. Thanks, Stephen
Hi Stephen,

On Fri, Apr 8, 2011 at 6:57 PM, Stephen Frost <sfrost@snowman.net> wrote:
> bricklen,
>
> * bricklen (bricklen@gmail.com) wrote:
>> I've been noticing in my logs for the past few days the message you
>> note in the wiki. It seems to occur during a vacuum around 7:30am
>> every day. I will be running the suggested script shortly, but can
>> anyone tell me in how bad of shape my db is in? This is our production
>> db with two hot standby's running off it.
>
> Unfortunately, I don't think the script that Bruce posted will help if
> the clog files have been removed (which appears to be the case here).
> Do you have a backup which includes older files which existed under the
> 'pg_clog' directory under your database's root? Hopefully you do and
> can restore those and restart the database. If you restore and then
> restart then Bruce's script could be run and hopefully would clear out
> these errors.
>
> Bruce, please correct me if I got any part of this wrong.
>
> Thanks,
>
> 	Stephen

I looked deeper into our backup archives, and it appears that I do
have the clog file referenced in the error message "DETAIL: Could not
open file "pg_clog/04BE": No such file or directory." It exists in an
untouched backup directory that I originally made when I set up the
backup and ran pg_upgrade. I'm not sure if it is from version 8.4 or
9.0.2 though. Is it safe to just copy it into my production pg_clog
dir and restart?
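As a sanity check, the failing transaction id from the earlier log lines (1273385235) does fall inside segment 04BE, so that looks like the right file. A sketch of the clog filename arithmetic, using the 9.0-era layout constants (2 status bits per transaction, so 4 transactions per byte; 8 kB pages; 32 pages per segment); illustration only:

```python
# Map a transaction id to its pg_clog segment filename (9.0-era layout).
XACTS_PER_BYTE = 4      # 2 status bits per transaction
BLCKSZ = 8192           # bytes per page
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = XACTS_PER_BYTE * BLCKSZ * PAGES_PER_SEGMENT  # 1,048,576

def clog_segment(xid):
    """Return the clog segment filename covering this xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)

# The failing transaction from the logs lands in the missing segment:
assert clog_segment(1273385235) == "04BE"
```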
bricklen, * bricklen (bricklen@gmail.com) wrote: > I looked deeper into our backup archives, and it appears that I do > have the clog file reference in the error message "DETAIL: Could not > open file "pg_clog/04BE": No such file or directory." Great! And there's no file in pg_clog which matches that name (or any which are smaller in value), right? > It exists in an untouched backup directory that I originally made when > I set up the backup and ran pg_upgrade. I'm not sure if it is from > version 8.4 or 9.0.2 though. Is it safe to just copy it into my > production pg_clog dir and restart? It should be, provided you're not overwriting any files or putting a clog file in place which is greater than the other clog files in that directory. Thanks, Stephen
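Stephen's caveat (restore only the segments that are missing, never overwrite one already in place, and do it with the server stopped) can be sketched as a small shell helper. This is an illustration only; the function name and directory arguments are made up, not anything from the thread:

```shell
# Sketch only: copy clog segment files from a backup directory into a
# live pg_clog directory, skipping any segment the live directory
# already has, so nothing newer is ever overwritten.
# Run this only while the server is stopped.
restore_missing_clog() {
    backup_dir=$1
    prod_dir=$2
    for seg in "$backup_dir"/*; do
        [ -e "$seg" ] || continue          # empty backup dir: glob didn't match
        name=$(basename "$seg")
        if [ -e "$prod_dir/$name" ]; then
            echo "skipping $name (already present)"
        else
            cp "$seg" "$prod_dir/$name"
            echo "restored $name"
        fi
    done
}
```

Usage would look like `restore_missing_clog /backups/old_pg_clog /var/lib/pgsql/9.0/data/pg_clog` (both paths hypothetical), followed by a restart and a re-run of the wiki script.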
On Fri, Apr 8, 2011 at 7:11 PM, Stephen Frost <sfrost@snowman.net> wrote: > bricklen, > > * bricklen (bricklen@gmail.com) wrote: >> I looked deeper into our backup archives, and it appears that I do >> have the clog file reference in the error message "DETAIL: Could not >> open file "pg_clog/04BE": No such file or directory." > > Great! And there's no file in pg_clog which matches that name (or > exist which are smaller in value), right? > >> It exists in an untouched backup directory that I originally made when >> I set up the backup and ran pg_upgrade. I'm not sure if it is from >> version 8.4 or 9.0.2 though. Is it safe to just copy it into my >> production pg_clog dir and restart? > > It should be, provided you're not overwriting any files or putting a > clog file in place which is greater than the other clog files in that > directory.

It appears that there are no files lower.

Missing clog: 04BE

production pg_clog dir:
ls -lhrt 9.0/data/pg_clog
total 38M
-rw------- 1 postgres postgres 256K Jan 25 21:04 04BF
-rw------- 1 postgres postgres 256K Jan 26 12:35 04C0
-rw------- 1 postgres postgres 256K Jan 26 20:58 04C1
-rw------- 1 postgres postgres 256K Jan 27 13:02 04C2
-rw------- 1 postgres postgres 256K Jan 28 01:00 04C3
...

old backup pg_clog dir (possibly v8.4)
...
-rw------- 1 postgres postgres 256K Jan 23 21:11 04BB
-rw------- 1 postgres postgres 256K Jan 24 08:56 04BC
-rw------- 1 postgres postgres 256K Jan 25 06:32 04BD
-rw------- 1 postgres postgres 256K Jan 25 10:58 04BE
-rw------- 1 postgres postgres 256K Jan 25 20:44 04BF
-rw------- 1 postgres postgres 8.0K Jan 25 20:54 04C0

So, if I have this right, my steps to take are:
- copy the backup 04BE to production pg_clog dir
- restart the database
- run Bruce's script

Does that sound right? Has anyone else experienced this?
On Fri, Apr 8, 2011 at 7:20 PM, bricklen <bricklen@gmail.com> wrote: > On Fri, Apr 8, 2011 at 7:11 PM, Stephen Frost <sfrost@snowman.net> wrote: >> bricklen, >> >> * bricklen (bricklen@gmail.com) wrote: >>> I looked deeper into our backup archives, and it appears that I do >>> have the clog file reference in the error message "DETAIL: Could not >>> open file "pg_clog/04BE": No such file or directory." >> >> Great! And there's no file in pg_clog which matches that name (or >> exist which are smaller in value), right? >> >>> It exists in an untouched backup directory that I originally made when >>> I set up the backup and ran pg_upgrade. I'm not sure if it is from >>> version 8.4 or 9.0.2 though. Is it safe to just copy it into my >>> production pg_clog dir and restart? >> >> It should be, provided you're not overwriting any files or putting a >> clog file in place which is greater than the other clog files in that >> directory. > > It appears that there are no files lower. > > Missing clog: 04BE > > production pg_clog dir: > ls -lhrt 9.0/data/pg_clog > total 38M > -rw------- 1 postgres postgres 256K Jan 25 21:04 04BF > -rw------- 1 postgres postgres 256K Jan 26 12:35 04C0 > -rw------- 1 postgres postgres 256K Jan 26 20:58 04C1 > -rw------- 1 postgres postgres 256K Jan 27 13:02 04C2 > -rw------- 1 postgres postgres 256K Jan 28 01:00 04C3 > ... > > old backup pg_clog dir (possibly v8.4) > ... > -rw------- 1 postgres postgres 256K Jan 23 21:11 04BB > -rw------- 1 postgres postgres 256K Jan 24 08:56 04BC > -rw------- 1 postgres postgres 256K Jan 25 06:32 04BD > -rw------- 1 postgres postgres 256K Jan 25 10:58 04BE > -rw------- 1 postgres postgres 256K Jan 25 20:44 04BF > -rw------- 1 postgres postgres 8.0K Jan 25 20:54 04C0 > > > So, if I have this right, my steps to take are: > - copy the backup 04BE to production pg_clog dir > - restart the database > - run Bruce's script > > Does that sound right? Has anyone else experienced this? 
> I'm leery of testing this on my production db, as our last pg_dump was from early
> this morning, so I apologize for being so cautious.
>
> Thanks,
>
> Bricklen

What I've tested and current status:

When I saw the announcement a few hours ago, I started setting up a 9.0.3 hot standby. I brought it live a few minutes ago.
- I copied over the 04BE clog from the original backup,
- restarted the standby cluster
- ran the script against the main database

and turned up a bunch of other transactions that were missing:

psql:pg_upgrade_tmp.sql:539: ERROR: could not access status of transaction 1248683931
DETAIL: Could not open file "pg_clog/04A6": No such file or directory.
psql:pg_upgrade_tmp.sql:540: ERROR: could not access status of transaction 1249010987
DETAIL: Could not open file "pg_clog/04A7": No such file or directory.
psql:pg_upgrade_tmp.sql:541: ERROR: could not access status of transaction 1250325059
DETAIL: Could not open file "pg_clog/04A8": No such file or directory.
psql:pg_upgrade_tmp.sql:542: ERROR: could not access status of transaction 1252759918
DETAIL: Could not open file "pg_clog/04AA": No such file or directory.
psql:pg_upgrade_tmp.sql:543: ERROR: could not access status of transaction 1254527871
DETAIL: Could not open file "pg_clog/04AC": No such file or directory.
psql:pg_upgrade_tmp.sql:544: ERROR: could not access status of transaction 1256193334
DETAIL: Could not open file "pg_clog/04AD": No such file or directory.
psql:pg_upgrade_tmp.sql:556: ERROR: could not access status of transaction 1268739471
DETAIL: Could not open file "pg_clog/04B9": No such file or directory.

I checked, and found that each one of those files exists in the original backup location.
- scp'd those files to the hot standby clog directory,
- pg_ctl stop -m fast
- pg_ctl start
- ran the script

Hit a bunch of missing clog file errors like above; repeated the scp + bounce + script process 4 or 5 more times until no more missing clog file messages surfaced.
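As a cross-check on errors like these, the clog segment that holds a given transaction's status can be computed by hand: with the default 8 kB block size, a clog page covers 32768 xids (2 status bits each), and an SLRU segment is 32 pages, so each segment file covers 1048576 xids and is named after xid / 1048576 in four uppercase hex digits. A quick sketch (the helper name is made up):

```shell
# Map an xid to the clog segment file that holds its status.
# Assumes the default 8 kB block size: 32768 xacts per page
# * 32 pages per segment = 1048576 xacts per segment file.
clog_segment() {
    printf '%04X\n' $(( $1 / 1048576 ))
}

clog_segment 1248683931   # -> 04A6, matching the first DETAIL line above
clog_segment 1268739471   # -> 04B9, matching the last one
```

This is just arithmetic for sanity-checking which backup files you need; it is not how the server locates them internally.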
Now, is this safe to run against my production database?

Those steps again, to run against prod:
- cp the clog files from the original backup dir to my production pg_clog dir
- bounce the database
- run the script against all databases in the cluster

Anyone have any suggestions or changes before I commit myself to this course of action?

Thanks,

Bricklen
bricklen, * bricklen (bricklen@gmail.com) wrote: > Now, is this safe to run against my production database? Yes, with a few caveats. One recommendation is to also increase autovacuum_freeze_max_age to 500000000 (500m), which will hopefully prevent autovacuum from 'butting in' and causing issues during the process. Also, a database-wide 'VACUUM FREEZE;' should be lower-risk, if you can afford it (it will cause a lot of i/o on the system). The per-table 'VACUUM FREEZE <table>;' that the script does can end up removing clog files prematurely. > Anyone have any suggestions or changes before I commit myself to this > course of action? If you run into problems, and perhaps even before starting, you may want to pop in to #postgresql on irc.freenode.net; there are people there who are very familiar with PG and can help you with this process. Thanks, Stephen
Stephen Frost wrote: -- Start of PGP signed section. > bricklen, > > * bricklen (bricklen@gmail.com) wrote: > > Now, is this safe to run against my production database? > > Yes, with a few caveats. One recommendation is to also increase > autovacuum_freeze_max_age to 500000000 (500m), which will hopefully > prevent autovacuum from 'butting in' and causing issues during the > process. Also, a database-wide 'VACUUM FREEZE;' should be lower-risk, > if you can afford it (it will cause a lot of i/o on the system). The > per-table 'VACUUM FREEZE <table>;' that the script does can end up > removing clog files prematurely. > > > Anyone have any suggestions or changes before I commit myself to this > > course of action? > > If you run into problems, and perhaps even before starting, you may want > to pop in to #postgresql on irc.freenode.net, there are people there who > can help you with this process who are very familiar with PG. Stephen is 100% correct and we have updated the wiki to explain recovery details: http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Fri, Apr 8, 2011 at 8:07 PM, Bruce Momjian <bruce@momjian.us> wrote: > Stephen Frost wrote: > -- Start of PGP signed section. >> bricklen, >> >> * bricklen (bricklen@gmail.com) wrote: >> > Now, is this safe to run against my production database? >> >> Yes, with a few caveats. One recommendation is to also increase >> autovacuum_freeze_max_age to 500000000 (500m), which will hopefully >> prevent autovacuum from 'butting in' and causing issues during the >> process. Also, a database-wide 'VACUUM FREEZE;' should be lower-risk, >> if you can afford it (it will cause a lot of i/o on the system). The >> per-table 'VACUUM FREEZE <table>;' that the script does can end up >> removing clog files prematurely. >> >> > Anyone have any suggestions or changes before I commit myself to this >> > course of action? >> >> If you run into problems, and perhaps even before starting, you may want >> to pop in to #postgresql on irc.freenode.net, there are people there who >> can help you with this process who are very familiar with PG. > > Stephen is 100% correct and we have updated the wiki to explain recovery > details: > > http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix > Thanks guys, I really appreciate your help. For the vacuum freeze, you say database-wide, should I run vacuumdb -a -v -F ? Will freezing the other tables in the cluster help (not sure how that works with template0/1 databases?)
bricklen, * bricklen (bricklen@gmail.com) wrote: > Thanks guys, I really appreciate your help. For the vacuum freeze, you > say database-wide, should I run vacuumdb -a -v -F ? Will freezing the > other tables in the cluster help (not sure how that works with > template0/1 databases?) Yes, using the command-line 'vacuumdb -a -v -F' would work. It won't try to vacuum template0, and doing template1 is correct. Thanks, Stephen
bricklen wrote: > On Fri, Apr 8, 2011 at 8:07 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Stephen Frost wrote: > > -- Start of PGP signed section. > >> bricklen, > >> > >> * bricklen (bricklen@gmail.com) wrote: > >> > Now, is this safe to run against my production database? > >> > >> Yes, with a few caveats. One recommendation is to also increase > >> autovacuum_freeze_max_age to 500000000 (500m), which will hopefully > >> prevent autovacuum from 'butting in' and causing issues during the > >> process. Also, a database-wide 'VACUUM FREEZE;' should be lower-risk, > >> if you can afford it (it will cause a lot of i/o on the system). The > >> per-table 'VACUUM FREEZE <table>;' that the script does can end up > >> removing clog files prematurely. > >> > >> > Anyone have any suggestions or changes before I commit myself to this > >> > course of action? > >> > >> If you run into problems, and perhaps even before starting, you may want > >> to pop in to #postgresql on irc.freenode.net, there are people there who > >> can help you with this process who are very familiar with PG. > > > > Stephen is 100% correct and we have updated the wiki to explain recovery > > details: > > > > http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix > > > > Thanks guys, I really appreciate your help. For the vacuum freeze, you > say database-wide, should I run vacuumdb -a -v -F ? Will freezing the > other tables in the cluster help (not sure how that works with > template0/1 databases?) Exactly. Internally pg_upgrade uses: vacuumdb --all --freeze in the empty new cluster to remove any links to pg_clog files (because we copy the old pg_clog files into the new cluster directory). (This is proof that the old and new clog files are the same format.) If you run vacuumdb as above in the new cluster, it will again remove any requirement on pg_clog, which is our goal.
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Why is it important to have the original pg_clog files around? Since the transactions in question are below the freeze horizon, surely the tuples that involve those transactions have all been visited by vacuum and thus removed if they were leftover from aborted transactions or deleted, no? So you could just fill those files with the 0x55 pattern (signalling "all transactions are committed") and the net result should be the same. No? Forgive me if I'm missing something. I haven't been following this thread and I'm more than a little tired (but wanted to shoot this today because I'm not gonna be able to, until Monday). -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
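For context on the 0x55 suggestion: clog packs four transaction statuses per byte, two bits each, and status code 1 means "committed". 0x55 is 01010101 in binary, i.e. 01 in all four slots, so in a file of 0x55 bytes every xid decodes as committed no matter which slot it lands in. The decoding arithmetic can be checked directly; this is a sketch of the bit layout, not the server's actual code path, and the helper name is invented:

```shell
# clog stores 4 transaction statuses per byte, 2 bits each; status 1
# means "committed". Decode the status a given xid would read from a
# byte of 0x55 (01010101): every 2-bit slot holds 01.
clog_status_in_0x55() {
    echo $(( (0x55 >> (2 * ($1 % 4))) & 3 ))
}

clog_status_in_0x55 1248683931   # -> 1 (committed)
clog_status_in_0x55 1248683928   # -> 1 (committed), different slot
```

This illustrates why the pattern reads as "all committed"; whether that is actually safe is exactly what the rest of the thread debates.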
On Fri, Apr 8, 2011 at 9:28 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > Why is it important to have the original pg_clog files around? Since > the transactions in question are below the freeze horizon, surely the > tuples that involve those transaction have all been visited by vacuum > and thus removed if they were leftover from aborted transactions or > deleted, no? So you could just fill those files with the 0x55 pattern > (signalling "all transactions are committed") and the net result should > be the same. No? > > Forgive me if I'm missing something. I haven't been following this > thread and I'm more than a little tired (but wanted to shoot this today > because I'm gonna be able to, until Monday). > > -- > Álvaro Herrera <alvherre@commandprompt.com> > The PostgreSQL Company - Command Prompt, Inc. > PostgreSQL Replication, Consulting, Custom Development, 24x7 support Update on the status of the steps we took, which were: - test on a hot standby by bringing it live, running the script, determining the missing clog files, copying them into the live (hot standby) pg_clog dir Now, on the master, copied the same old clog files into the production *master*, ran vacuumdb -a -v -F. The step I should have taken on the master before the vacuumdb -F would have been to run the http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix script to see if I was missing any clog files on the master. That vacuum freeze step pointed out a missing clog file; I copied that into the master pg_clog dir, ran the aforementioned script. It didn't fail on any of the clog files this time, so now I am rerunning the vacuum freeze command and hoping like hell it works! If the current run of the vacuum freeze fails, I'll report back. Thanks again for everyone's help.
On Fri, Apr 8, 2011 at 10:01 PM, bricklen <bricklen@gmail.com> wrote: > Update on the status of the steps we took, which were: > - test on a hot standby by bringing it live, running the script, > determining the missing clog files, copying them into the live (hot > standby) pg_clog dir > > Now, on the master, copied the same old clog files into the production > *master*, ran vacuumdb -a -v -F. The step I should have taken on the > master before the vacuumdb -F would have been to run the > http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix script to see > if I was missing any clog files on the master. > That vacuum freeze step pointed out a clog file, I copied that into > the master pg_clog dir, ran the aforementioned script. It didn't fail > on any of the clog files this time, so now I am rerunning the vacuum > freeze command and hoping like hell it works! > > If the current run of the vacuum freeze fails, I'll report back. > > Thanks again for everyone's help. > The vacuumdb -a -v -F completed successfully this time. Cheers!
bricklen wrote: > On Fri, Apr 8, 2011 at 10:01 PM, bricklen <bricklen@gmail.com> wrote: > > Update on the status of the steps we took, which were: > > - test on a hot standby by bringing it live, running the script, > > determing the missing clog files, copying them into the live (hot > > standby) pg_clog dir > > > > Now, on the master, copied the same old clog files into the production > > *master*, ran vacuumdb -a -v -F. The step I should have taken on the > > master before the vacuumdb -F would have been to run the > > http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix script to see > > if I was missing any clog files on the master. > > That vacuum freeze step pointed out a clog file, I copied that into > > the master pg_clog dir, ran the aforementioned script. It didn't fail > > on any of the clog files this time, so now I am rerunning the vacuum > > freeze command and hoping like hell it works! > > > > If the current run of the vacuum freeze fails, I'll report back. > > > > Thanks again for everyone's help. > > > > The vacuumdb -a -v F completed successfully this time. YEA! -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Alvaro Herrera wrote: > > Why is it important to have the original pg_clog files around? Since > the transactions in question are below the freeze horizon, surely the > tuples that involve those transaction have all been visited by vacuum > and thus removed if they were leftover from aborted transactions or > deleted, no? So you could just fill those files with the 0x55 pattern > (signalling "all transactions are committed") and the net result should > be the same. No? > > Forgive me if I'm missing something. I haven't been following this > thread and I'm more than a little tired (but wanted to shoot this today > because I'm gonna be able to, until Monday). Well, in most cases vacuum (or SELECT?) is going to set those xids as committed on the tuple, but if there have been few deletes in the toast table, it is possible vacuum did not run. I think the fact that we have only three reported query error cases is because in most cases vacuum is already taking care of this as part of space reuse. relfrozenxid is not going to cause freeze to run, and therefore those xids, even though marked as committed, are still on the tuple, so we need this script to be run. In fact, if the tuple is marked as committed, do we even bother to mark the xids as fixed via vacuum freeze? Seems we don't have to. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote: > Alvaro Herrera wrote: > > > > Why is it important to have the original pg_clog files around? Since > > the transactions in question are below the freeze horizon, surely the > > tuples that involve those transaction have all been visited by vacuum > > and thus removed if they were leftover from aborted transactions or > > deleted, no? So you could just fill those files with the 0x55 pattern > > (signalling "all transactions are committed") and the net result should > > be the same. No? > > > > Forgive me if I'm missing something. I haven't been following this > > thread and I'm more than a little tired (but wanted to shoot this today > > because I'm gonna be able to, until Monday). To answer your other question, it is true we _probably_ could assume all the rows were committed, except that again, vacuum might not have run and the pages might not be full so single-page cleanup wasn't done either. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian <bruce@momjian.us> wrote: > Bruce Momjian wrote: >> Alvaro Herrera wrote: >> > >> > Why is it important to have the original pg_clog files around? Since >> > the transactions in question are below the freeze horizon, surely the >> > tuples that involve those transaction have all been visited by vacuum >> > and thus removed if they were leftover from aborted transactions or >> > deleted, no? So you could just fill those files with the 0x55 pattern >> > (signalling "all transactions are committed") and the net result should >> > be the same. No? >> > >> > Forgive me if I'm missing something. I haven't been following this >> > thread and I'm more than a little tired (but wanted to shoot this today >> > because I'm gonna be able to, until Monday). > > To answer your other question, it is true we _probably_ could assume all > the rows were committed, except that again, vacuum might not have run > and the pages might not be full so single-page cleanup wasn't done > either. OK, continuing the thought of just marking all the old clog files as "all committed"... Since it only affects "toast" tables, the only time the system (with normal queries) would check for a particular toast tuple, the tuple referring to it would have been committed, right? So forcing "all transactions committed" for the older clog segments might mean a scan on a *toast* heap might return tuples as committed when they might have been aborted, but the real table heap would never refer to those, right? a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Sat, Apr 09, 2011 at 09:05:42AM -0400, Aidan Van Dyk wrote: > On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Bruce Momjian wrote: > >> Alvaro Herrera wrote: > >> > > >> > Why is it important to have the original pg_clog files around? Since > >> > the transactions in question are below the freeze horizon, surely the > >> > tuples that involve those transaction have all been visited by vacuum > >> > and thus removed if they were leftover from aborted transactions or > >> > deleted, no? So you could just fill those files with the 0x55 pattern > >> > (signalling "all transactions are committed") and the net result should > >> > be the same. No? > >> > > >> > Forgive me if I'm missing something. I haven't been following this > >> > thread and I'm more than a little tired (but wanted to shoot this today > >> > because I'm gonna be able to, until Monday). > > > > To answer your other question, it is true we _probably_ could assume all > > the rows were committed, except that again, vacuum might not have run > > and the pages might not be full so single-page cleanup wasn't done > > either. > > OK, continuing the thought of just making all the old clog files as > "all committed"... > > Since it only affects "toast" tables, the only time the system (with > normal queries) would check for a particular toast tuple, the tuple > referring to it would have been committed, right? So forcing "all > transactions committed" for the older clog segments might mean a scan > on a *toast* heap might return tuples as committed when they might > have been aborted, but the real table heap would never refer to those, > right? Yes; it would be relatively harmless to retain some unreferenced TOAST chunks. However, "all xacts committed" is not equivalent to "all tuples visible". If the user rolled back a DELETE shortly before the pg_upgrade run, we need to recognize that outcome to keep any deleted TOAST entries visible.
Aidan Van Dyk wrote: > On Sat, Apr 9, 2011 at 7:03 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Bruce Momjian wrote: > >> Alvaro Herrera wrote: > >> > > >> > Why is it important to have the original pg_clog files around? Since > >> > the transactions in question are below the freeze horizon, surely the > >> > tuples that involve those transaction have all been visited by vacuum > >> > and thus removed if they were leftover from aborted transactions or > >> > deleted, no? So you could just fill those files with the 0x55 pattern > >> > (signalling "all transactions are committed") and the net result should > >> > be the same. No? > >> > > >> > Forgive me if I'm missing something. I haven't been following this > >> > thread and I'm more than a little tired (but wanted to shoot this today > >> > because I'm gonna be able to, until Monday). > > > > To answer your other question, it is true we _probably_ could assume all > > the rows were committed, except that again, vacuum might not have run > > and the pages might not be full so single-page cleanup wasn't done > > either. > > OK, continuing the thought of just making all the old clog files as > "all committed"... > > Since it only affects "toast" tables, the only time the system (with > normal queries) would check for a particular toast tuple, the tuple > referring to it would have been committed, right? So forcing "all > transactions committed" for the older clog segments might mean a scan > on a *toast* heap might return tuples as committed when they might > have been aborted, but the real table heap would never refer to those, > right? Uh, good point. I think you are right that you only get to a toast row from a non-aborted heap row. I think the problem might be in following the toast chain but even then I am unclear how that works. Anyone? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote: > Robert Haas wrote: > > On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > Robert Haas wrote: > > >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas > > >> <heikki.linnakangas@enterprisedb.com> wrote: > > >> >> I think the maintenance > > >> >> overhead of an invisible variable is too much. > > >> > > > >> > A simple GUC or command-line switch isn't much code. > > >> > > >> I like the idea of a command-line switch. > > > > > > If you want to do that you should generalize it as --binary-upgrade in > > > case we have other needs for it. > > > > Yeah. Or we could do a binary_upgrade GUC which has the effect of > > forcibly suppressing autovacuum, and maybe other things later. I > > think that's a lot less hazardous than fiddling with the autovacuum > > GUC. > > I like the idea of a command-line flag because it forces everything to > be affected, and cannot be turned on and off in sessions --- if you are > doing a binary upgrade, locked-down is good. :-) The attached patch adds a new postmaster/postgres binary upgrade mode (-b) which disables autovacuum, allows only super-user connections, and prevents pg_upgrade_support oid assignment when not in upgrade mode. It also modifies pg_upgrade to use this new mode rather than play with trying to stop autovacuum. This does fix a very rare bug that could happen if autovacuum_freeze_max_age were set to its maximum by the user. I think this should be applied to PG 9.1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true.
+ diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c new file mode 100644 index 2a0f50e..df2f2db *** a/contrib/pg_upgrade/server.c --- b/contrib/pg_upgrade/server.c *************** start_postmaster(ClusterInfo *cluster, b *** 173,178 **** --- 173,183 ---- const char *datadir; unsigned short port; bool exit_hook_registered = false; + #ifndef WIN32 + char *output_filename = log_opts.filename; + #else + char *output_filename = DEVNULL; + #endif bindir = cluster->bindir; datadir = cluster->pgdata; *************** start_postmaster(ClusterInfo *cluster, b *** 193,216 **** * same file because we get the error: "The process cannot access the file * because it is being used by another process." so we have to send all * other output to 'nul'. ! * ! * Using autovacuum=off disables cleanup vacuum and analyze, but freeze ! * vacuums can still happen, so we set autovacuum_freeze_max_age to its ! * maximum. We assume all datfrozenxid and relfrozen values are less than ! * a gap of 2000000000 from the current xid counter, so autovacuum will ! * not touch them. */ snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " ! "-o \"-p %d -c autovacuum=off " ! "-c autovacuum_freeze_max_age=2000000000\" " ! "start >> \"%s\" 2>&1" SYSTEMQUOTE, ! bindir, ! #ifndef WIN32 ! log_opts.filename, datadir, port, log_opts.filename); ! #else ! DEVNULL, datadir, port, DEVNULL); ! #endif exec_prog(true, "%s", cmd); /* wait for the server to start properly */ --- 198,212 ---- * same file because we get the error: "The process cannot access the file * because it is being used by another process." so we have to send all * other output to 'nul'. ! * Use binary upgrade mode on the server (-b), if supported. */ snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " ! "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE, ! bindir, output_filename, datadir, port, ! (GET_MAJOR_VERSION(cluster->major_version) >= 901) ? "-b" : "", ! log_opts.filename); ! 
exec_prog(true, "%s", cmd); /* wait for the server to start properly */ diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c new file mode 100644 index 8c5670f..870ddba *** a/src/backend/catalog/heap.c --- b/src/backend/catalog/heap.c *************** heap_create_with_catalog(const char *rel *** 1051,1057 **** * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (OidIsValid(binary_upgrade_next_heap_pg_class_oid) && (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE || relkind == RELKIND_FOREIGN_TABLE)) --- 1051,1058 ---- * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_heap_pg_class_oid) && (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE || relkind == RELKIND_FOREIGN_TABLE)) *************** heap_create_with_catalog(const char *rel *** 1059,1065 **** relid = binary_upgrade_next_heap_pg_class_oid; binary_upgrade_next_heap_pg_class_oid = InvalidOid; } ! else if (OidIsValid(binary_upgrade_next_toast_pg_class_oid) && relkind == RELKIND_TOASTVALUE) { relid = binary_upgrade_next_toast_pg_class_oid; --- 1060,1067 ---- relid = binary_upgrade_next_heap_pg_class_oid; binary_upgrade_next_heap_pg_class_oid = InvalidOid; } ! else if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_toast_pg_class_oid) && relkind == RELKIND_TOASTVALUE) { relid = binary_upgrade_next_toast_pg_class_oid; diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c new file mode 100644 index c79402c..a662cfc *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *************** index_create(Relation heapRelation, *** 790,796 **** * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! 
if (OidIsValid(binary_upgrade_next_index_pg_class_oid)) { indexRelationId = binary_upgrade_next_index_pg_class_oid; binary_upgrade_next_index_pg_class_oid = InvalidOid; --- 790,797 ---- * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_index_pg_class_oid)) { indexRelationId = binary_upgrade_next_index_pg_class_oid; binary_upgrade_next_index_pg_class_oid = InvalidOid; diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c new file mode 100644 index 08d8aa1..61a9322 *** a/src/backend/catalog/pg_enum.c --- b/src/backend/catalog/pg_enum.c *************** *** 21,26 **** --- 21,27 ---- #include "catalog/pg_enum.h" #include "catalog/pg_type.h" #include "storage/lmgr.h" + #include "miscadmin.h" #include "utils/builtins.h" #include "utils/fmgroids.h" #include "utils/rel.h" *************** restart: *** 311,317 **** } /* Get a new OID for the new label */ ! if (OidIsValid(binary_upgrade_next_pg_enum_oid)) { /* * Use binary-upgrade override for pg_enum.oid, if supplied. During --- 312,318 ---- } /* Get a new OID for the new label */ ! if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_enum_oid)) { /* * Use binary-upgrade override for pg_enum.oid, if supplied. During diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c new file mode 100644 index 9e35e73..80c1bfc *** a/src/backend/catalog/pg_type.c --- b/src/backend/catalog/pg_type.c *************** TypeShellMake(const char *typeName, Oid *** 125,131 **** tup = heap_form_tuple(tupDesc, values, nulls); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! if (OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; --- 125,131 ---- tup = heap_form_tuple(tupDesc, values, nulls); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; *************** TypeCreate(Oid newTypeOid, *** 430,436 **** if (OidIsValid(newTypeOid)) HeapTupleSetOid(tup, newTypeOid); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! else if (OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; --- 430,436 ---- if (OidIsValid(newTypeOid)) HeapTupleSetOid(tup, newTypeOid); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! else if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c new file mode 100644 index 85fe57f..362d26d *** a/src/backend/catalog/toasting.c --- b/src/backend/catalog/toasting.c *************** create_toast_table(Relation rel, Oid toa *** 157,163 **** * creation even if it seems not to need one. */ if (!needs_toast_table(rel) && ! !OidIsValid(binary_upgrade_next_toast_pg_class_oid)) return false; /* --- 157,164 ---- * creation even if it seems not to need one. */ if (!needs_toast_table(rel) && ! (!IsBinaryUpgrade || ! !OidIsValid(binary_upgrade_next_toast_pg_class_oid))) return false; /* *************** create_toast_table(Relation rel, Oid toa *** 202,208 **** namespaceid = PG_TOAST_NAMESPACE; /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! if (OidIsValid(binary_upgrade_next_toast_pg_type_oid)) { toast_typid = binary_upgrade_next_toast_pg_type_oid; binary_upgrade_next_toast_pg_type_oid = InvalidOid; --- 203,209 ---- namespaceid = PG_TOAST_NAMESPACE; /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_toast_pg_type_oid)) { toast_typid = binary_upgrade_next_toast_pg_type_oid; binary_upgrade_next_toast_pg_type_oid = InvalidOid; diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c new file mode 100644 index 1a20b0d..b3b6dc2 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *************** AssignTypeArrayOid(void) *** 1550,1556 **** Oid type_array_oid; /* Use binary-upgrade override for pg_type.typarray, if supplied. */ ! if (OidIsValid(binary_upgrade_next_array_pg_type_oid)) { type_array_oid = binary_upgrade_next_array_pg_type_oid; binary_upgrade_next_array_pg_type_oid = InvalidOid; --- 1550,1556 ---- Oid type_array_oid; /* Use binary-upgrade override for pg_type.typarray, if supplied. */ ! if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_array_pg_type_oid)) { type_array_oid = binary_upgrade_next_array_pg_type_oid; binary_upgrade_next_array_pg_type_oid = InvalidOid; diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c new file mode 100644 index 3f7d499..838d6eb *** a/src/backend/commands/user.c --- b/src/backend/commands/user.c *************** CreateRole(CreateRoleStmt *stmt) *** 388,394 **** * pg_largeobject_metadata contains pg_authid.oid's, so we use the * binary-upgrade override, if specified. */ ! if (OidIsValid(binary_upgrade_next_pg_authid_oid)) { HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid); binary_upgrade_next_pg_authid_oid = InvalidOid; --- 388,394 ---- * pg_largeobject_metadata contains pg_authid.oid's, so we use the * binary-upgrade override, if specified. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_authid_oid)) { HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid); binary_upgrade_next_pg_authid_oid = InvalidOid; diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c new file mode 100644 index 6e7f664..c0cf033 *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** PostmasterMain(int argc, char *argv[]) *** 529,535 **** * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ ! while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) { switch (opt) { --- 529,535 ---- * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ ! while ((opt = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) { switch (opt) { *************** PostmasterMain(int argc, char *argv[]) *** 541,546 **** --- 541,551 ---- SetConfigOption("shared_buffers", optarg, PGC_POSTMASTER, PGC_S_ARGV); break; + case 'b': + /* Undocumented flag used for binary upgrades */ + IsBinaryUpgrade = true; + break; + case 'D': userDoption = optarg; break; *************** ServerLoop(void) *** 1480,1487 **** if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter(); ! /* If we have lost the autovacuum launcher, try to start a new one */ ! if (AutoVacPID == 0 && (AutoVacuumingActive() || start_autovac_launcher) && pmState == PM_RUN) { --- 1485,1497 ---- if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter(); ! /* ! * If we have lost the autovacuum launcher, try to start a new one. ! * We don't want autovacuum to run in binary upgrade mode because ! * autovacuum might update relfrozenxid for empty tables before ! * the physical files are put in place. ! */ ! 
if (!IsBinaryUpgrade && AutoVacPID == 0 && (AutoVacuumingActive() || start_autovac_launcher) && pmState == PM_RUN) { *************** reaper(SIGNAL_ARGS) *** 2413,2419 **** */ if (WalWriterPID == 0) WalWriterPID = StartWalWriter(); ! if (AutoVacuumingActive() && AutoVacPID == 0) AutoVacPID = StartAutoVacLauncher(); if (XLogArchivingActive() && PgArchPID == 0) PgArchPID = pgarch_start(); --- 2423,2429 ---- */ if (WalWriterPID == 0) WalWriterPID = StartWalWriter(); ! if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0) AutoVacPID = StartAutoVacLauncher(); if (XLogArchivingActive() && PgArchPID == 0) PgArchPID = pgarch_start(); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c new file mode 100644 index 59b7666..a07661f *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** process_postgres_switches(int argc, char *** 3238,3244 **** * postmaster/postmaster.c (the option sets should not conflict) and with * the common help() function in main/main.c. */ ! while ((flag = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1) { switch (flag) { --- 3238,3244 ---- * postmaster/postmaster.c (the option sets should not conflict) and with * the common help() function in main/main.c. */ ! 
while ((flag = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1) { switch (flag) { *************** process_postgres_switches(int argc, char *** 3250,3255 **** --- 3250,3260 ---- SetConfigOption("shared_buffers", optarg, ctx, gucsource); break; + case 'b': + /* Undocumented flag used for binary upgrades */ + IsBinaryUpgrade = true; + break; + case 'D': if (secure) userDoption = strdup(optarg); diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c new file mode 100644 index 984ffd0..c4c4154 *** a/src/backend/utils/init/globals.c --- b/src/backend/utils/init/globals.c *************** pid_t PostmasterPid = 0; *** 85,90 **** --- 85,91 ---- */ bool IsPostmasterEnvironment = false; bool IsUnderPostmaster = false; + bool IsBinaryUpgrade = false; bool ExitOnAnyError = false; diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c new file mode 100644 index a4c5d4c..1f6fba5 *** a/src/backend/utils/init/postinit.c --- b/src/backend/utils/init/postinit.c *************** InitPostgres(const char *in_dbname, Oid *** 626,631 **** --- 626,641 ---- } /* + * Binary upgrades only allowed super-user connections + */ + if (IsBinaryUpgrade && !am_superuser) + { + ereport(FATAL, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to connect in binary upgrade mode"))); + } + + /* * The last few connections slots are reserved for superusers. Although * replication connections currently require superuser privileges, we * don't allow them to consume the reserved slots, which are intended for diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h new file mode 100644 index aa8cce5..9d19417 *** a/src/include/miscadmin.h --- b/src/include/miscadmin.h *************** do { \ *** 124,129 **** --- 124,130 ---- extern pid_t PostmasterPid; extern bool IsPostmasterEnvironment; extern PGDLLIMPORT bool IsUnderPostmaster; + extern bool IsBinaryUpgrade; extern bool ExitOnAnyError;
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Robert Haas wrote:
> > > On Thu, Mar 31, 2011 at 12:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > > Robert Haas wrote:
> > > >> On Thu, Mar 31, 2011 at 11:32 AM, Heikki Linnakangas
> > > >> <heikki.linnakangas@enterprisedb.com> wrote:
> > > >> >> I think the maintenance
> > > >> >> overhead of an invisible variable is too much.
> > > >> >
> > > >> > A simple GUC or command-line switch isn't much code.
> > > >>
> > > >> I like the idea of a command-line switch.
> > > >
> > > > If you want to do that you should generalize it as --binary-upgrade in
> > > > case we have other needs for it.
> > >
> > > Yeah.  Or we could do a binary_upgrade GUC which has the effect of
> > > forcibly suppressing autovacuum, and maybe other things later.  I
> > > think that's a lot less hazardous than fiddling with the autovacuum
> > > GUC.
> >
> > I like the idea of a command-line flag because it forces everything to
> > be affected, and cannot be turned on and off in sessions --- if you are
> > doing a binary upgrade, locked-down is good.  :-)
>
> The attached patch adds a new postmaster/postgres binary upgrade mode
> (-b) which disables autovacuum, allows only super-user connections, and
> prevents pg_upgrade_support oid assignment when not in upgrade mode.
> It also modifies pg_upgrade to use this new mode rather than play with
> trying to stop autovacuum.
>
> This does fix a very rare bug that could happen if
> autovacuum_freeze_max_age were set to its maximum by the user.
>
> I think this should be applied to PG 9.1.

One big problem with this patch is that it will not allow people to use
pg_upgrade when upgrading from 9.1 alpha to beta.  What I could do is to
use the mode only on the "new" cluster for 9.1.

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes:
>> The attached patch adds a new postmaster/postgres binary upgrade mode
>> (-b) which disables autovacuum, allows only super-user connections, and
>> prevents pg_upgrade_support oid assignment when not in upgrade mode.
>> It also modifies pg_upgrade to use this new mode rather than play with
>> trying to stop autovacuum.

> One big problem with this patch is that it will not allow people to use
> pg_upgrade when upgrading from 9.1 alpha to beta.

Huh?  Why would that be?  Seems like you've done something in the wrong
place if that's an issue.

			regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >> The attached patch adds a new postmaster/postgres binary upgrade mode
> >> (-b) which disables autovacuum, allows only super-user connections, and
> >> prevents pg_upgrade_support oid assignment when not in upgrade mode.
> >> It also modifies pg_upgrade to use this new mode rather than play with
> >> trying to stop autovacuum.
>
> > One big problem with this patch is that it will not allow people to use
> > pg_upgrade when upgrading from 9.1 alpha to beta.
>
> Huh?  Why would that be?  Seems like you've done something in the wrong
> place if that's an issue.

Yeah, it is complicated.  I don't really care if autovacuum runs on the
old cluster (we only move the files while the server is down).  We only
want autovacuum not to mess with the relfrozenxids we set on the new
cluster while the table file is empty.

The other issue is that the old alpha binary will not know about the -b
flag and hence will not start.

This all came up while we were looking for the relfrozenxid bug, which
turned out to involve TOAST tables and has since been fixed.  This is a
very small problem, so maybe we should just skip the fix for 9.1.

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
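[Editor's note] For context, the reason autovacuum can touch those restored relfrozenxids at all is the anti-wraparound path: a table is force-vacuumed (and its relfrozenxid advanced) once the age of its relfrozenxid exceeds autovacuum_freeze_max_age. Here is a minimal sketch of that trigger condition; the helper name is mine and PostgreSQL's modular 32-bit xid arithmetic is deliberately ignored for clarity, so this is an illustration, not the real autovacuum.c test.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified sketch of the anti-wraparound trigger: autovacuum forces a
 * freeze vacuum when relfrozenxid has fallen more than freeze_max_age
 * transactions behind the current xid counter.  Real PostgreSQL uses
 * modular xid arithmetic; this linear version is for illustration only. */
static bool
forces_freeze_vacuum(TransactionId next_xid,
                     TransactionId relfrozenxid,
                     uint32_t freeze_max_age)
{
    return (next_xid - relfrozenxid) > freeze_max_age;
}
```

This also shows why the earlier workaround set autovacuum_freeze_max_age to 2000000000: it widens the window so the restored relfrozenxids never look old enough to trip the condition, but only under the assumption that no value is more than 2000000000 xids behind the counter. The -b flag removes that assumption by never starting the launcher at all.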
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Huh?  Why would that be?  Seems like you've done something in the wrong
>> place if that's an issue.

> Yeah, it is complicated.  I don't really care if autovacuum runs on the
> old cluster (we only move the files while the server is down).  We only
> want autovacuum not to mess with the relfrozenxids we set on the new
> cluster while the table file is empty.

> The other issue is that the old alpha binary will not know about the -b
> flag and hence will not start.

Well, once again, why are you trying to do that?  It's not the source
postmaster that needs this flag.

			regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> Huh?  Why would that be?  Seems like you've done something in the wrong
> >> place if that's an issue.
>
> > Yeah, it is complicated.  I don't really care if autovacuum runs on the
> > old cluster (we only move the files while the server is down).  We only
> > want autovacuum not to mess with the relfrozenxids we set on the new
> > cluster while the table file is empty.
>
> > The other issue is that the old alpha binary will not know about the -b
> > flag and hence will not start.
>
> Well, once again, why are you trying to do that?  It's not the source
> postmaster that needs this flag.

Well, consider that this also locks out non-super-users, so I figured it
would be good to run the old and new clusters in the same binary upgrade
mode.  Again, we can do just the new cluster for 9.1.  I can also
control the behavior based on the catalog version number, which seems
the most logical.

-- 
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
On Thu, 2011-04-21 at 18:22 -0400, Bruce Momjian wrote:
> I can also control the
> behavior based on the catalog version number, which seems the most
> logical.

It seems like we want a simple "use -b if available; else don't".  Is
that right?  If so, switching based on the version seems reasonable.
However, can you get the information directly from the binary, rather
than trying to infer it from the catalog version?

Regards,
	Jeff Davis
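[Editor's note] The "use -b if available; else don't" decision can be sketched as a catalog-version gate on the options pg_upgrade passes through pg_ctl -o. Everything here is illustrative: the helper names are invented, and the threshold constant merely mirrors the style of pg_upgrade's existing TABLE_SPACE_SUBDIRS check rather than quoting the final patch.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative: the catalog version at which the postmaster learned -b.
 * The real value would match the commit's catversion bump. */
#define CAT_VER_BINARY_UPGRADE_SERVER_FLAG 201104221UL

/* Return the extra postmaster option for a cluster of the given catalog
 * version: " -b" when the server understands binary-upgrade mode,
 * otherwise an empty string so that old alpha binaries still start. */
static const char *
binary_upgrade_opt(unsigned long cat_ver)
{
    return (cat_ver >= CAT_VER_BINARY_UPGRADE_SERVER_FLAG) ? " -b" : "";
}

/* Build the server option string the way a start_postmaster()-style
 * routine might, appending -b only for new-enough clusters. */
static void
build_server_opts(char *buf, size_t len, int port, unsigned long cat_ver)
{
    snprintf(buf, len, "-p %d%s", port, binary_upgrade_opt(cat_ver));
}
```

With this gate, a 9.0-era cluster (say catalog version 201004261) yields "-p 50432" while a post-patch 9.1 cluster yields "-p 50433 -b", so the same code path works for both sides of the upgrade.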
On Apr 21, 2011, at 6:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Tom Lane wrote:
>>>> Huh?  Why would that be?  Seems like you've done something in the wrong
>>>> place if that's an issue.
>>
>>> Yeah, it is complicated.  I don't really care if autovacuum runs on the
>>> old cluster (we only move the files while the server is down).  We only
>>> want autovacuum not to mess with the relfrozenxids we set on the new
>>> cluster while the table file is empty.
>>
>>> The other issue is that the old alpha binary will not know about the -b
>>> flag and hence will not start.
>>
>> Well, once again, why are you trying to do that?  It's not the source
>> postmaster that needs this flag.
>
> Well, consider that this also locks out non-super users so I figured it
> would be good to run the old and new in the same binary upgrade mode.
> Again, we can do just the new cluster for 9.1.  I can also control the
> behavior based on the catalog version number, which seems the most
> logical.

I think you are over-engineering this.  Just use it for the new cluster
only, full stop, and you'll be right as rain.

...Robert
Robert Haas wrote:
> On Apr 21, 2011, at 6:22 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> Tom Lane wrote:
> >>>> Huh?  Why would that be?  Seems like you've done something in the wrong
> >>>> place if that's an issue.
> >>
> >>> Yeah, it is complicated.  I don't really care if autovacuum runs on the
> >>> old cluster (we only move the files while the server is down).  We only
> >>> want autovacuum not to mess with the relfrozenxids we set on the new
> >>> cluster while the table file is empty.
> >>
> >>> The other issue is that the old alpha binary will not know about the -b
> >>> flag and hence will not start.
> >>
> >> Well, once again, why are you trying to do that?  It's not the source
> >> postmaster that needs this flag.
> >
> > Well, consider that this also locks out non-super users so I figured it
> > would be good to run the old and new in the same binary upgrade mode.
> > Again, we can do just the new cluster for 9.1.  I can also control the
> > behavior based on the catalog version number, which seems the most
> > logical.
>
> I think you are over-engineering this.  Just use it for the new cluster
> only, full stop, and you'll be right as rain.

I thought some more about this, and I don't want autovacuum to run on
the old server either.  This is because pg_dumpall --binary-upgrade
--schema-only grabs the datfrozenxid for all the databases at the
start, then connects to each database to get the relfrozenxids.  I
don't want to risk any advancement of either of those during the
pg_dumpall run.

FYI, the existing code already doesn't allow autovacuum to run on the
old or new cluster: it sets autovacuum off and autovacuum_freeze_max_age
very high, so this is not a behavior change --- just a more formal way
of turning off autovacuum.

The attached patch uses the catalog version for the test; we already use
catalog version checking for the tablespace subdirectories.
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c new file mode 100644 index d1dc5db..7623c5e *** a/contrib/pg_upgrade/check.c --- b/contrib/pg_upgrade/check.c *************** check_cluster_compatibility(bool live_ch *** 264,270 **** /* Is it 9.0 but without tablespace directories? */ if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 && ! new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS) pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n" "because of backend API changes made during development.\n"); } --- 264,270 ---- /* Is it 9.0 but without tablespace directories? */ if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 && ! new_cluster.controldata.cat_ver < CAT_VER_TABLE_SPACE_SUBDIRS) pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n" "because of backend API changes made during development.\n"); } diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h new file mode 100644 index 5ca570e..70d74ba *** a/contrib/pg_upgrade/pg_upgrade.h --- b/contrib/pg_upgrade/pg_upgrade.h *************** *** 58,64 **** #define atooid(x) ((Oid) strtoul((x), NULL, 10)) /* OID system catalog preservation added during PG 9.0 development */ ! #define TABLE_SPACE_SUBDIRS 201001111 /* * Each relation is represented by a relinfo structure. --- 58,66 ---- #define atooid(x) ((Oid) strtoul((x), NULL, 10)) /* OID system catalog preservation added during PG 9.0 development */ ! #define CAT_VER_TABLE_SPACE_SUBDIRS 201001111 ! /* postmaster/postgres -b (binary_upgrade) flag added during PG 9.1 development */ ! #define CAT_VER_BINARY_UPGRADE_SERVER_FLAG 201104221 /* * Each relation is represented by a relinfo structure. 
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c new file mode 100644 index 2a0f50e..f7d7653 *** a/contrib/pg_upgrade/server.c --- b/contrib/pg_upgrade/server.c *************** start_postmaster(ClusterInfo *cluster, b *** 173,178 **** --- 173,183 ---- const char *datadir; unsigned short port; bool exit_hook_registered = false; + #ifndef WIN32 + char *output_filename = log_opts.filename; + #else + char *output_filename = DEVNULL; + #endif bindir = cluster->bindir; datadir = cluster->pgdata; *************** start_postmaster(ClusterInfo *cluster, b *** 193,216 **** * same file because we get the error: "The process cannot access the file * because it is being used by another process." so we have to send all * other output to 'nul'. ! * ! * Using autovacuum=off disables cleanup vacuum and analyze, but freeze ! * vacuums can still happen, so we set autovacuum_freeze_max_age to its ! * maximum. We assume all datfrozenxid and relfrozen values are less than ! * a gap of 2000000000 from the current xid counter, so autovacuum will ! * not touch them. */ snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " ! "-o \"-p %d -c autovacuum=off " ! "-c autovacuum_freeze_max_age=2000000000\" " ! "start >> \"%s\" 2>&1" SYSTEMQUOTE, ! bindir, ! #ifndef WIN32 ! log_opts.filename, datadir, port, log_opts.filename); ! #else ! DEVNULL, datadir, port, DEVNULL); ! #endif exec_prog(true, "%s", cmd); /* wait for the server to start properly */ --- 198,213 ---- * same file because we get the error: "The process cannot access the file * because it is being used by another process." so we have to send all * other output to 'nul'. ! * Use binary upgrade mode on the server (-b), if supported. */ snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " ! "-o \"-p %d%s\" start >> \"%s\" 2>&1" SYSTEMQUOTE, ! bindir, output_filename, datadir, port, ! (cluster->controldata.cat_ver >= ! CAT_VER_BINARY_UPGRADE_SERVER_FLAG) ? " -b" : "", ! 
log_opts.filename); ! exec_prog(true, "%s", cmd); /* wait for the server to start properly */ diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c new file mode 100644 index 8c5670f..870ddba *** a/src/backend/catalog/heap.c --- b/src/backend/catalog/heap.c *************** heap_create_with_catalog(const char *rel *** 1051,1057 **** * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (OidIsValid(binary_upgrade_next_heap_pg_class_oid) && (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE || relkind == RELKIND_FOREIGN_TABLE)) --- 1051,1058 ---- * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_heap_pg_class_oid) && (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE || relkind == RELKIND_FOREIGN_TABLE)) *************** heap_create_with_catalog(const char *rel *** 1059,1065 **** relid = binary_upgrade_next_heap_pg_class_oid; binary_upgrade_next_heap_pg_class_oid = InvalidOid; } ! else if (OidIsValid(binary_upgrade_next_toast_pg_class_oid) && relkind == RELKIND_TOASTVALUE) { relid = binary_upgrade_next_toast_pg_class_oid; --- 1060,1067 ---- relid = binary_upgrade_next_heap_pg_class_oid; binary_upgrade_next_heap_pg_class_oid = InvalidOid; } ! else if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_toast_pg_class_oid) && relkind == RELKIND_TOASTVALUE) { relid = binary_upgrade_next_toast_pg_class_oid; diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c new file mode 100644 index c79402c..a662cfc *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *************** index_create(Relation heapRelation, *** 790,796 **** * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! 
if (OidIsValid(binary_upgrade_next_index_pg_class_oid)) { indexRelationId = binary_upgrade_next_index_pg_class_oid; binary_upgrade_next_index_pg_class_oid = InvalidOid; --- 790,797 ---- * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_index_pg_class_oid)) { indexRelationId = binary_upgrade_next_index_pg_class_oid; binary_upgrade_next_index_pg_class_oid = InvalidOid; diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c new file mode 100644 index 08d8aa1..61a9322 *** a/src/backend/catalog/pg_enum.c --- b/src/backend/catalog/pg_enum.c *************** *** 21,26 **** --- 21,27 ---- #include "catalog/pg_enum.h" #include "catalog/pg_type.h" #include "storage/lmgr.h" + #include "miscadmin.h" #include "utils/builtins.h" #include "utils/fmgroids.h" #include "utils/rel.h" *************** restart: *** 311,317 **** } /* Get a new OID for the new label */ ! if (OidIsValid(binary_upgrade_next_pg_enum_oid)) { /* * Use binary-upgrade override for pg_enum.oid, if supplied. During --- 312,318 ---- } /* Get a new OID for the new label */ ! if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_enum_oid)) { /* * Use binary-upgrade override for pg_enum.oid, if supplied. During diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c new file mode 100644 index 9e35e73..80c1bfc *** a/src/backend/catalog/pg_type.c --- b/src/backend/catalog/pg_type.c *************** TypeShellMake(const char *typeName, Oid *** 125,131 **** tup = heap_form_tuple(tupDesc, values, nulls); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! if (OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; --- 125,131 ---- tup = heap_form_tuple(tupDesc, values, nulls); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; *************** TypeCreate(Oid newTypeOid, *** 430,436 **** if (OidIsValid(newTypeOid)) HeapTupleSetOid(tup, newTypeOid); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! else if (OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; --- 430,436 ---- if (OidIsValid(newTypeOid)) HeapTupleSetOid(tup, newTypeOid); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! else if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c new file mode 100644 index 85fe57f..362d26d *** a/src/backend/catalog/toasting.c --- b/src/backend/catalog/toasting.c *************** create_toast_table(Relation rel, Oid toa *** 157,163 **** * creation even if it seems not to need one. */ if (!needs_toast_table(rel) && ! !OidIsValid(binary_upgrade_next_toast_pg_class_oid)) return false; /* --- 157,164 ---- * creation even if it seems not to need one. */ if (!needs_toast_table(rel) && ! (!IsBinaryUpgrade || ! !OidIsValid(binary_upgrade_next_toast_pg_class_oid))) return false; /* *************** create_toast_table(Relation rel, Oid toa *** 202,208 **** namespaceid = PG_TOAST_NAMESPACE; /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! if (OidIsValid(binary_upgrade_next_toast_pg_type_oid)) { toast_typid = binary_upgrade_next_toast_pg_type_oid; binary_upgrade_next_toast_pg_type_oid = InvalidOid; --- 203,209 ---- namespaceid = PG_TOAST_NAMESPACE; /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_toast_pg_type_oid)) { toast_typid = binary_upgrade_next_toast_pg_type_oid; binary_upgrade_next_toast_pg_type_oid = InvalidOid; diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c new file mode 100644 index 1a20b0d..b3b6dc2 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *************** AssignTypeArrayOid(void) *** 1550,1556 **** Oid type_array_oid; /* Use binary-upgrade override for pg_type.typarray, if supplied. */ ! if (OidIsValid(binary_upgrade_next_array_pg_type_oid)) { type_array_oid = binary_upgrade_next_array_pg_type_oid; binary_upgrade_next_array_pg_type_oid = InvalidOid; --- 1550,1556 ---- Oid type_array_oid; /* Use binary-upgrade override for pg_type.typarray, if supplied. */ ! if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_array_pg_type_oid)) { type_array_oid = binary_upgrade_next_array_pg_type_oid; binary_upgrade_next_array_pg_type_oid = InvalidOid; diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c new file mode 100644 index 3f7d499..838d6eb *** a/src/backend/commands/user.c --- b/src/backend/commands/user.c *************** CreateRole(CreateRoleStmt *stmt) *** 388,394 **** * pg_largeobject_metadata contains pg_authid.oid's, so we use the * binary-upgrade override, if specified. */ ! if (OidIsValid(binary_upgrade_next_pg_authid_oid)) { HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid); binary_upgrade_next_pg_authid_oid = InvalidOid; --- 388,394 ---- * pg_largeobject_metadata contains pg_authid.oid's, so we use the * binary-upgrade override, if specified. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_authid_oid)) { HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid); binary_upgrade_next_pg_authid_oid = InvalidOid; diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c new file mode 100644 index 6e7f664..c0cf033 *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** PostmasterMain(int argc, char *argv[]) *** 529,535 **** * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ ! while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) { switch (opt) { --- 529,535 ---- * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ ! while ((opt = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) { switch (opt) { *************** PostmasterMain(int argc, char *argv[]) *** 541,546 **** --- 541,551 ---- SetConfigOption("shared_buffers", optarg, PGC_POSTMASTER, PGC_S_ARGV); break; + case 'b': + /* Undocumented flag used for binary upgrades */ + IsBinaryUpgrade = true; + break; + case 'D': userDoption = optarg; break; *************** ServerLoop(void) *** 1480,1487 **** if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter(); ! /* If we have lost the autovacuum launcher, try to start a new one */ ! if (AutoVacPID == 0 && (AutoVacuumingActive() || start_autovac_launcher) && pmState == PM_RUN) { --- 1485,1497 ---- if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter(); ! /* ! * If we have lost the autovacuum launcher, try to start a new one. ! * We don't want autovacuum to run in binary upgrade mode because ! * autovacuum might update relfrozenxid for empty tables before ! * the physical files are put in place. ! */ ! 
if (!IsBinaryUpgrade && AutoVacPID == 0 && (AutoVacuumingActive() || start_autovac_launcher) && pmState == PM_RUN) { *************** reaper(SIGNAL_ARGS) *** 2413,2419 **** */ if (WalWriterPID == 0) WalWriterPID = StartWalWriter(); ! if (AutoVacuumingActive() && AutoVacPID == 0) AutoVacPID = StartAutoVacLauncher(); if (XLogArchivingActive() && PgArchPID == 0) PgArchPID = pgarch_start(); --- 2423,2429 ---- */ if (WalWriterPID == 0) WalWriterPID = StartWalWriter(); ! if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0) AutoVacPID = StartAutoVacLauncher(); if (XLogArchivingActive() && PgArchPID == 0) PgArchPID = pgarch_start(); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c new file mode 100644 index 59b7666..a07661f *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** process_postgres_switches(int argc, char *** 3238,3244 **** * postmaster/postmaster.c (the option sets should not conflict) and with * the common help() function in main/main.c. */ ! while ((flag = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1) { switch (flag) { --- 3238,3244 ---- * postmaster/postmaster.c (the option sets should not conflict) and with * the common help() function in main/main.c. */ ! 
while ((flag = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1) { switch (flag) { *************** process_postgres_switches(int argc, char *** 3250,3255 **** --- 3250,3260 ---- SetConfigOption("shared_buffers", optarg, ctx, gucsource); break; + case 'b': + /* Undocumented flag used for binary upgrades */ + IsBinaryUpgrade = true; + break; + case 'D': if (secure) userDoption = strdup(optarg); diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c new file mode 100644 index 984ffd0..c4c4154 *** a/src/backend/utils/init/globals.c --- b/src/backend/utils/init/globals.c *************** pid_t PostmasterPid = 0; *** 85,90 **** --- 85,91 ---- */ bool IsPostmasterEnvironment = false; bool IsUnderPostmaster = false; + bool IsBinaryUpgrade = false; bool ExitOnAnyError = false; diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c new file mode 100644 index a4c5d4c..1f6fba5 *** a/src/backend/utils/init/postinit.c --- b/src/backend/utils/init/postinit.c *************** InitPostgres(const char *in_dbname, Oid *** 626,631 **** --- 626,641 ---- } /* + * Binary upgrades only allowed super-user connections + */ + if (IsBinaryUpgrade && !am_superuser) + { + ereport(FATAL, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to connect in binary upgrade mode"))); + } + + /* * The last few connections slots are reserved for superusers. Although * replication connections currently require superuser privileges, we * don't allow them to consume the reserved slots, which are intended for diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h new file mode 100644 index 53c684a..22926b0 *** a/src/include/catalog/catversion.h --- b/src/include/catalog/catversion.h *************** *** 53,58 **** */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 201104181 #endif --- 53,58 ---- */ /* yyyymmddN */ ! 
#define CATALOG_VERSION_NO 201104221 #endif diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h new file mode 100644 index aa8cce5..9d19417 *** a/src/include/miscadmin.h --- b/src/include/miscadmin.h *************** do { \ *** 124,129 **** --- 124,130 ---- extern pid_t PostmasterPid; extern bool IsPostmasterEnvironment; extern PGDLLIMPORT bool IsUnderPostmaster; + extern bool IsBinaryUpgrade; extern bool ExitOnAnyError;
Bruce Momjian <bruce@momjian.us> writes: > I thought some more about this and I don't want autovacuum to run on the > old server. This is because pg_dumpall --binary-upgrade --schema-only > grabs the datfrozenxid for all the databases at the start, then connects > to each database to gets the relfrozenxids. I don't want to risk any > advancement of either of those during the pg_dumpall run. Why? It doesn't really matter --- if you grab a value that is older than the latest, it's still valid. As Robert said, you're over-engineering this, and thereby introducing potential failure modes, for no gain. regards, tom lane
Bruce Momjian wrote: > > > Well, consider that this also locks out non-super users so I figured it > > > would be good to run the old and new in the same binary upgrade mode. > > > Again, we can do just the new cluster for 9.1. I can also control the > > > behavior based on the catalog version number, which seems the most > > > logical. > > > > I think you are over-engineering this. Just use it for the new cluster > > only, full stop, and you'll be right as rain. > > I thought some more about this and I don't want autovacuum to run on the > old server. This is because pg_dumpall --binary-upgrade --schema-only > grabs the datfrozenxid for all the databases at the start, then connects > to each database to gets the relfrozenxids. I don't want to risk any > advancement of either of those during the pg_dumpall run. > > FYI, the existing code already doesn't allow autovacuum to run on the old > or new cluster by setting autovacuum off and autovacuum_freeze_max_age > very high, so this is not a behavior change --- just a more formalized > way of turning off autovacuum. > > The attached patch uses catalog version to test; we use catalog version > checking already for tablespace subdirectories. OK, thinking even more, it seems I have to keep the old vacuum disable flag for pre-9.1 old clusters. This new patch does that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c new file mode 100644 index d1dc5db..7623c5e *** a/contrib/pg_upgrade/check.c --- b/contrib/pg_upgrade/check.c *************** check_cluster_compatibility(bool live_ch *** 264,270 **** /* Is it 9.0 but without tablespace directories? */ if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 && ! 
new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS) pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n" "because of backend API changes made during development.\n"); } --- 264,270 ---- /* Is it 9.0 but without tablespace directories? */ if (GET_MAJOR_VERSION(new_cluster.major_version) == 900 && ! new_cluster.controldata.cat_ver < CAT_VER_TABLE_SPACE_SUBDIRS) pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n" "because of backend API changes made during development.\n"); } diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h new file mode 100644 index 5ca570e..70d74ba *** a/contrib/pg_upgrade/pg_upgrade.h --- b/contrib/pg_upgrade/pg_upgrade.h *************** *** 58,64 **** #define atooid(x) ((Oid) strtoul((x), NULL, 10)) /* OID system catalog preservation added during PG 9.0 development */ ! #define TABLE_SPACE_SUBDIRS 201001111 /* * Each relation is represented by a relinfo structure. --- 58,66 ---- #define atooid(x) ((Oid) strtoul((x), NULL, 10)) /* OID system catalog preservation added during PG 9.0 development */ ! #define CAT_VER_TABLE_SPACE_SUBDIRS 201001111 ! /* postmaster/postgres -b (binary_upgrade) flag added during PG 9.1 development */ ! #define CAT_VER_BINARY_UPGRADE_SERVER_FLAG 201104221 /* * Each relation is represented by a relinfo structure. 
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c new file mode 100644 index 2a0f50e..ca4a957 *** a/contrib/pg_upgrade/server.c --- b/contrib/pg_upgrade/server.c *************** start_postmaster(ClusterInfo *cluster, b *** 173,178 **** --- 173,183 ---- const char *datadir; unsigned short port; bool exit_hook_registered = false; + #ifndef WIN32 + char *output_filename = log_opts.filename; + #else + char *output_filename = DEVNULL; + #endif bindir = cluster->bindir; datadir = cluster->pgdata; *************** start_postmaster(ClusterInfo *cluster, b *** 193,199 **** * same file because we get the error: "The process cannot access the file * because it is being used by another process." so we have to send all * other output to 'nul'. - * * Using autovacuum=off disables cleanup vacuum and analyze, but freeze * vacuums can still happen, so we set autovacuum_freeze_max_age to its * maximum. We assume all datfrozenxid and relfrozen values are less than --- 198,203 ---- *************** start_postmaster(ClusterInfo *cluster, b *** 202,216 **** */ snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " ! "-o \"-p %d -c autovacuum=off " ! "-c autovacuum_freeze_max_age=2000000000\" " ! "start >> \"%s\" 2>&1" SYSTEMQUOTE, ! bindir, ! #ifndef WIN32 ! log_opts.filename, datadir, port, log_opts.filename); ! #else ! DEVNULL, datadir, port, DEVNULL); ! #endif exec_prog(true, "%s", cmd); /* wait for the server to start properly */ --- 206,218 ---- */ snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " ! "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE, ! bindir, output_filename, datadir, port, ! (cluster->controldata.cat_ver >= ! CAT_VER_BINARY_UPGRADE_SERVER_FLAG) ? "-b" : ! "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000", ! log_opts.filename); ! 
exec_prog(true, "%s", cmd); /* wait for the server to start properly */ diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c new file mode 100644 index 8c5670f..870ddba *** a/src/backend/catalog/heap.c --- b/src/backend/catalog/heap.c *************** heap_create_with_catalog(const char *rel *** 1051,1057 **** * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (OidIsValid(binary_upgrade_next_heap_pg_class_oid) && (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE || relkind == RELKIND_FOREIGN_TABLE)) --- 1051,1058 ---- * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_heap_pg_class_oid) && (relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE || relkind == RELKIND_VIEW || relkind == RELKIND_COMPOSITE_TYPE || relkind == RELKIND_FOREIGN_TABLE)) *************** heap_create_with_catalog(const char *rel *** 1059,1065 **** relid = binary_upgrade_next_heap_pg_class_oid; binary_upgrade_next_heap_pg_class_oid = InvalidOid; } ! else if (OidIsValid(binary_upgrade_next_toast_pg_class_oid) && relkind == RELKIND_TOASTVALUE) { relid = binary_upgrade_next_toast_pg_class_oid; --- 1060,1067 ---- relid = binary_upgrade_next_heap_pg_class_oid; binary_upgrade_next_heap_pg_class_oid = InvalidOid; } ! else if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_toast_pg_class_oid) && relkind == RELKIND_TOASTVALUE) { relid = binary_upgrade_next_toast_pg_class_oid; diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c new file mode 100644 index c79402c..a662cfc *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *************** index_create(Relation heapRelation, *** 790,796 **** * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! 
if (OidIsValid(binary_upgrade_next_index_pg_class_oid)) { indexRelationId = binary_upgrade_next_index_pg_class_oid; binary_upgrade_next_index_pg_class_oid = InvalidOid; --- 790,797 ---- * Use binary-upgrade override for pg_class.oid/relfilenode, if * supplied. */ ! if (IsBinaryUpgrade && ! OidIsValid(binary_upgrade_next_index_pg_class_oid)) { indexRelationId = binary_upgrade_next_index_pg_class_oid; binary_upgrade_next_index_pg_class_oid = InvalidOid; diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c new file mode 100644 index 08d8aa1..61a9322 *** a/src/backend/catalog/pg_enum.c --- b/src/backend/catalog/pg_enum.c *************** *** 21,26 **** --- 21,27 ---- #include "catalog/pg_enum.h" #include "catalog/pg_type.h" #include "storage/lmgr.h" + #include "miscadmin.h" #include "utils/builtins.h" #include "utils/fmgroids.h" #include "utils/rel.h" *************** restart: *** 311,317 **** } /* Get a new OID for the new label */ ! if (OidIsValid(binary_upgrade_next_pg_enum_oid)) { /* * Use binary-upgrade override for pg_enum.oid, if supplied. During --- 312,318 ---- } /* Get a new OID for the new label */ ! if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_enum_oid)) { /* * Use binary-upgrade override for pg_enum.oid, if supplied. During diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c new file mode 100644 index 9e35e73..80c1bfc *** a/src/backend/catalog/pg_type.c --- b/src/backend/catalog/pg_type.c *************** TypeShellMake(const char *typeName, Oid *** 125,131 **** tup = heap_form_tuple(tupDesc, values, nulls); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! if (OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; --- 125,131 ---- tup = heap_form_tuple(tupDesc, values, nulls); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; *************** TypeCreate(Oid newTypeOid, *** 430,436 **** if (OidIsValid(newTypeOid)) HeapTupleSetOid(tup, newTypeOid); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! else if (OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; --- 430,436 ---- if (OidIsValid(newTypeOid)) HeapTupleSetOid(tup, newTypeOid); /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! else if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_type_oid)) { HeapTupleSetOid(tup, binary_upgrade_next_pg_type_oid); binary_upgrade_next_pg_type_oid = InvalidOid; diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c new file mode 100644 index 85fe57f..362d26d *** a/src/backend/catalog/toasting.c --- b/src/backend/catalog/toasting.c *************** create_toast_table(Relation rel, Oid toa *** 157,163 **** * creation even if it seems not to need one. */ if (!needs_toast_table(rel) && ! !OidIsValid(binary_upgrade_next_toast_pg_class_oid)) return false; /* --- 157,164 ---- * creation even if it seems not to need one. */ if (!needs_toast_table(rel) && ! (!IsBinaryUpgrade || ! !OidIsValid(binary_upgrade_next_toast_pg_class_oid))) return false; /* *************** create_toast_table(Relation rel, Oid toa *** 202,208 **** namespaceid = PG_TOAST_NAMESPACE; /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! if (OidIsValid(binary_upgrade_next_toast_pg_type_oid)) { toast_typid = binary_upgrade_next_toast_pg_type_oid; binary_upgrade_next_toast_pg_type_oid = InvalidOid; --- 203,209 ---- namespaceid = PG_TOAST_NAMESPACE; /* Use binary-upgrade override for pg_type.oid, if supplied. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_toast_pg_type_oid)) { toast_typid = binary_upgrade_next_toast_pg_type_oid; binary_upgrade_next_toast_pg_type_oid = InvalidOid; diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c new file mode 100644 index 1a20b0d..b3b6dc2 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *************** AssignTypeArrayOid(void) *** 1550,1556 **** Oid type_array_oid; /* Use binary-upgrade override for pg_type.typarray, if supplied. */ ! if (OidIsValid(binary_upgrade_next_array_pg_type_oid)) { type_array_oid = binary_upgrade_next_array_pg_type_oid; binary_upgrade_next_array_pg_type_oid = InvalidOid; --- 1550,1556 ---- Oid type_array_oid; /* Use binary-upgrade override for pg_type.typarray, if supplied. */ ! if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_array_pg_type_oid)) { type_array_oid = binary_upgrade_next_array_pg_type_oid; binary_upgrade_next_array_pg_type_oid = InvalidOid; diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c new file mode 100644 index 3f7d499..838d6eb *** a/src/backend/commands/user.c --- b/src/backend/commands/user.c *************** CreateRole(CreateRoleStmt *stmt) *** 388,394 **** * pg_largeobject_metadata contains pg_authid.oid's, so we use the * binary-upgrade override, if specified. */ ! if (OidIsValid(binary_upgrade_next_pg_authid_oid)) { HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid); binary_upgrade_next_pg_authid_oid = InvalidOid; --- 388,394 ---- * pg_largeobject_metadata contains pg_authid.oid's, so we use the * binary-upgrade override, if specified. */ ! 
if (IsBinaryUpgrade && OidIsValid(binary_upgrade_next_pg_authid_oid)) { HeapTupleSetOid(tuple, binary_upgrade_next_pg_authid_oid); binary_upgrade_next_pg_authid_oid = InvalidOid; diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c new file mode 100644 index 6e7f664..c0cf033 *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** PostmasterMain(int argc, char *argv[]) *** 529,535 **** * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ ! while ((opt = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) { switch (opt) { --- 529,535 ---- * tcop/postgres.c (the option sets should not conflict) and with the * common help() function in main/main.c. */ ! while ((opt = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:W:-:")) != -1) { switch (opt) { *************** PostmasterMain(int argc, char *argv[]) *** 541,546 **** --- 541,551 ---- SetConfigOption("shared_buffers", optarg, PGC_POSTMASTER, PGC_S_ARGV); break; + case 'b': + /* Undocumented flag used for binary upgrades */ + IsBinaryUpgrade = true; + break; + case 'D': userDoption = optarg; break; *************** ServerLoop(void) *** 1480,1487 **** if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter(); ! /* If we have lost the autovacuum launcher, try to start a new one */ ! if (AutoVacPID == 0 && (AutoVacuumingActive() || start_autovac_launcher) && pmState == PM_RUN) { --- 1485,1497 ---- if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter(); ! /* ! * If we have lost the autovacuum launcher, try to start a new one. ! * We don't want autovacuum to run in binary upgrade mode because ! * autovacuum might update relfrozenxid for empty tables before ! * the physical files are put in place. ! */ ! 
if (!IsBinaryUpgrade && AutoVacPID == 0 && (AutoVacuumingActive() || start_autovac_launcher) && pmState == PM_RUN) { *************** reaper(SIGNAL_ARGS) *** 2413,2419 **** */ if (WalWriterPID == 0) WalWriterPID = StartWalWriter(); ! if (AutoVacuumingActive() && AutoVacPID == 0) AutoVacPID = StartAutoVacLauncher(); if (XLogArchivingActive() && PgArchPID == 0) PgArchPID = pgarch_start(); --- 2423,2429 ---- */ if (WalWriterPID == 0) WalWriterPID = StartWalWriter(); ! if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0) AutoVacPID = StartAutoVacLauncher(); if (XLogArchivingActive() && PgArchPID == 0) PgArchPID = pgarch_start(); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c new file mode 100644 index 59b7666..a07661f *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** process_postgres_switches(int argc, char *** 3238,3244 **** * postmaster/postmaster.c (the option sets should not conflict) and with * the common help() function in main/main.c. */ ! while ((flag = getopt(argc, argv, "A:B:c:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1) { switch (flag) { --- 3238,3244 ---- * postmaster/postmaster.c (the option sets should not conflict) and with * the common help() function in main/main.c. */ ! 
while ((flag = getopt(argc, argv, "A:B:bc:D:d:EeFf:h:ijk:lN:nOo:Pp:r:S:sTt:v:W:-:")) != -1) { switch (flag) { *************** process_postgres_switches(int argc, char *** 3250,3255 **** --- 3250,3260 ---- SetConfigOption("shared_buffers", optarg, ctx, gucsource); break; + case 'b': + /* Undocumented flag used for binary upgrades */ + IsBinaryUpgrade = true; + break; + case 'D': if (secure) userDoption = strdup(optarg); diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c new file mode 100644 index 984ffd0..c4c4154 *** a/src/backend/utils/init/globals.c --- b/src/backend/utils/init/globals.c *************** pid_t PostmasterPid = 0; *** 85,90 **** --- 85,91 ---- */ bool IsPostmasterEnvironment = false; bool IsUnderPostmaster = false; + bool IsBinaryUpgrade = false; bool ExitOnAnyError = false; diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c new file mode 100644 index a4c5d4c..1f6fba5 *** a/src/backend/utils/init/postinit.c --- b/src/backend/utils/init/postinit.c *************** InitPostgres(const char *in_dbname, Oid *** 626,631 **** --- 626,641 ---- } /* + * Binary upgrades only allowed super-user connections + */ + if (IsBinaryUpgrade && !am_superuser) + { + ereport(FATAL, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to connect in binary upgrade mode"))); + } + + /* * The last few connections slots are reserved for superusers. Although * replication connections currently require superuser privileges, we * don't allow them to consume the reserved slots, which are intended for diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h new file mode 100644 index 53c684a..22926b0 *** a/src/include/catalog/catversion.h --- b/src/include/catalog/catversion.h *************** *** 53,58 **** */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 201104181 #endif --- 53,58 ---- */ /* yyyymmddN */ ! 
#define CATALOG_VERSION_NO 201104221 #endif diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h new file mode 100644 index aa8cce5..9d19417 *** a/src/include/miscadmin.h --- b/src/include/miscadmin.h *************** do { \ *** 124,129 **** --- 124,130 ---- extern pid_t PostmasterPid; extern bool IsPostmasterEnvironment; extern PGDLLIMPORT bool IsUnderPostmaster; + extern bool IsBinaryUpgrade; extern bool ExitOnAnyError;
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I thought some more about this and I don't want autovacuum to run on the > > old server. This is because pg_dumpall --binary-upgrade --schema-only > > grabs the datfrozenxid for all the databases at the start, then connects > > to each database to gets the relfrozenxids. I don't want to risk any > > advancement of either of those during the pg_dumpall run. > > Why? It doesn't really matter --- if you grab a value that is older > than the latest, it's still valid. As Robert said, you're > over-engineering this, and thereby introducing potential failure modes, > for no gain. Uh, I am kind of paranoid about pg_upgrade because it is trying to do something Postgres was never designed to do. I am a little worried that we would be assuming pg_dumpall always does the datfrozenxid first, and if we ever did it last, we would have relfrozenxids taken before the datfrozenxid. I am worried that if we don't prevent autovacuum on the old server, pg_upgrade will be more fragile to changes in other parts of the system. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Fri, 2011-04-22 at 17:34 -0400, Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > I thought some more about this and I don't want autovacuum to run on the > > > old server. This is because pg_dumpall --binary-upgrade --schema-only > > > grabs the datfrozenxid for all the databases at the start, then connects > > > to each database to gets the relfrozenxids. I don't want to risk any > > > advancement of either of those during the pg_dumpall run. > > > > Why? It doesn't really matter --- if you grab a value that is older > > than the latest, it's still valid. As Robert said, you're > > over-engineering this, and thereby introducing potential failure modes, > > for no gain. > > Uh, I am kind of paranoid about pg_upgrade because it is trying to do > something Postgres was never designed to do. I am a little worried that > we would be assuming that pg_dumpall always does the datfrozenxid first > and if we ever did it last we would have relfrozenxids before the > datfrozenxid. I am worried if we don't prevent autovacuum on the old > server that pg_upgrade will be more fragile to changes in other parts of > the system. If we back-patch the "-b" to 8.3, then we can always use it on both the old and new systems. Upgrading to the latest patch-level on both old and new should be a prerequisite for pg_upgrade anyway. That would turn the catalog check from a special case (use "-b" sometimes, other times don't; which could cause fragility and bugs), into just another sanity check with an easy workaround ("your postgres doesn't support '-b', upgrade to the latest patch-level before upgrading"). One of the things I like about the design of pg_upgrade is that it doesn't seem to have a lot of special cases for different version combinations. What do you think? Regards,Jeff Davis
Jeff Davis wrote: > On Fri, 2011-04-22 at 17:34 -0400, Bruce Momjian wrote: > > Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > I thought some more about this and I don't want autovacuum to run on the > > > > old server. This is because pg_dumpall --binary-upgrade --schema-only > > > > grabs the datfrozenxid for all the databases at the start, then connects > > > > to each database to gets the relfrozenxids. I don't want to risk any > > > > advancement of either of those during the pg_dumpall run. > > > > > > Why? It doesn't really matter --- if you grab a value that is older > > > than the latest, it's still valid. As Robert said, you're > > > over-engineering this, and thereby introducing potential failure modes, > > > for no gain. > > > > Uh, I am kind of paranoid about pg_upgrade because it is trying to do > > something Postgres was never designed to do. I am a little worried that > > we would be assuming that pg_dumpall always does the datfrozenxid first > > and if we ever did it last we would have relfrozenxids before the > > datfrozenxid. I am worried if we don't prevent autovacuum on the old > > server that pg_upgrade will be more fragile to changes in other parts of > > the system. > > If we back-patch the "-b" to 8.3, then we can always use it on both the > old and new systems. Upgrading to the latest patch-level on both old and > new should be a prerequisite for pg_upgrade anyway. > > That would turn the catalog check from a special case (use "-b" > sometimes, other times don't; which could cause fragility and bugs), > into just another sanity check with an easy workaround ("your postgres > doesn't support '-b', upgrade to the latest patch-level before > upgrading"). > > One of the things I like about the design of pg_upgrade is that it > doesn't seem to have a lot of special cases for different version > combinations. > > What do you think? 
Well, I am concerned that there isn't enough testing of the -b flag to be sure it has zero effect on a running server that is not doing a binary upgrade, which is why I liked doing it only in 9.1. And we would still need code to check whether the -b flag is supported. We can save this for 9.2 if people prefer, but we would still need a PG version check rather than a catalog version check. Of course, if people prefer a backpatch, we can do that, but I would need many more eyes on this patch. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > I thought some more about this and I don't want autovacuum to run on the > > > old server. This is because pg_dumpall --binary-upgrade --schema-only > > > grabs the datfrozenxid for all the databases at the start, then connects > > > to each database to gets the relfrozenxids. I don't want to risk any > > > advancement of either of those during the pg_dumpall run. > > > > Why? It doesn't really matter --- if you grab a value that is older > > than the latest, it's still valid. As Robert said, you're > > over-engineering this, and thereby introducing potential failure modes, > > for no gain. > > Uh, I am kind of paranoid about pg_upgrade because it is trying to do > something Postgres was never designed to do. I am a little worried that > we would be assuming that pg_dumpall always does the datfrozenxid first > and if we ever did it last we would have relfrozenxids before the > datfrozenxid. I am worried if we don't prevent autovacuum on the old > server that pg_upgrade will be more fragile to changes in other parts of > the system. Hold on, I overstated the fragility issue above. I now realize that the old system is not going to change and that I only need to worry about future changes, which are handled by the new -b flag. So maybe we can get away with only stopping autovacuum on the new server, but I would need someone to verify that, and it would be a change from the way 9.0 pg_upgrade operated, because it disabled autovacuum on the old and new servers with 99.9% reliability. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote: > Bruce Momjian wrote: > > Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > I thought some more about this and I don't want autovacuum to run on the > > > > old server. This is because pg_dumpall --binary-upgrade --schema-only > > > > grabs the datfrozenxid for all the databases at the start, then connects > > > > to each database to gets the relfrozenxids. I don't want to risk any > > > > advancement of either of those during the pg_dumpall run. > > > > > > Why? It doesn't really matter --- if you grab a value that is older > > > than the latest, it's still valid. As Robert said, you're > > > over-engineering this, and thereby introducing potential failure modes, > > > for no gain. > > > > Uh, I am kind of paranoid about pg_upgrade because it is trying to do > > something Postgres was never designed to do. I am a little worried that > > we would be assuming that pg_dumpall always does the datfrozenxid first > > and if we ever did it last we would have relfrozenxids before the > > datfrozenxid. I am worried if we don't prevent autovacuum on the old > > server that pg_upgrade will be more fragile to changes in other parts of > > the system. > > Hold, I overstated the fragility issue above. I now realize that the > old system is not going to change and that I only need to worry about > future changes, where are handled by the new -b flag, so maybe we can > get away with only stopping autovacuum on the new server, but I would > need someone to verify that, and this would be a change in the way 9.0 > pg_upgrade operated because it did disable autovacuum on the old and new > servers with 99.9% reliability. 
Well, having seen no replies, I am going to apply the version of the patch in a few days that keeps the old vacuum-disable behavior for older releases, and uses the -b flag for newer ones by testing the catalog version, e.g.: snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE, bindir, output_filename, datadir, port, (cluster->controldata.cat_ver >= BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" : "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000", log_opts.filename); I know people like that pg_upgrade doesn't care much about what version it is running on, but it is really pg_upgrade's ability to ignore changes made to the server that makes it useful, and this change makes pg_upgrade even more immune to such changes. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian wrote: > Well, having seen no replies, I am going to apply the version of the > patch in a few days that keeps the old vacuum-disable behavior for older > releases, and uses the -b flag for newer ones by testing the catalog > version, e.g.: > > snprintf(cmd, sizeof(cmd), > SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" " > "-o \"-p %d %s\" start >> \"%s\" 2>&1" SYSTEMQUOTE, > bindir, output_filename, datadir, port, > (cluster->controldata.cat_ver >= > BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" : > "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000", > log_opts.filename); > > I know people like that pg_upgrade doesn't care much about what version > it is running on, but it is really the ability of pg_upgrade to ignore > changes made to the server that is really why pg_upgrade is useful, and > this change makes pg_upgrade even more immune to such changes. Applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +