Thread: pg_upgrade may be mortally wounded
I re-enabled pg_upgrade this afternoon, thinking that it would be easier to use than dump/initdb/reload for coping with the pg_statistic change I'm about to commit. However, testing shows that it doesn't really work. The "upgraded" database behaves very strangely --- vacuum tends to fail, and I have seen duplicate listings for attributes of a relation in psql's \d listing, broken links between a relation and its indices, and other problems. I think the problem is that pg_upgrade no longer works in the presence of MVCC. In particular, forcibly moving the old database's pg_log into the new is probably a bad idea when there is no similarity between the sets of committed transaction numbers. I suspect the reason for the strange behaviors I've seen is that after the pg_log copy, the system no longer believes that all of the rows in the new database's system tables have been committed. Is it possible to make pg_upgrade work again, perhaps by requiring a vacuum on the old and/or new databases just before the move happens? Or must we consign pg_upgrade to the dustbin of history? regards, tom lane
> I re-enabled pg_upgrade this afternoon, thinking that it would be easier > to use than dump/initdb/reload for coping with the pg_statistic change > I'm about to commit. However, testing shows that it doesn't really > work. The "upgraded" database behaves very strangely --- vacuum tends > to fail, and I have seen duplicate listings for attributes of a relation > in psql's \d listing, broken links between a relation and its indices, > and other problems. > > I think the problem is that pg_upgrade no longer works in the presence > of MVCC. In particular, forcibly moving the old database's pg_log into > the new is probably a bad idea when there is no similarity between the > sets of committed transaction numbers. I suspect the reason for the > strange behaviors I've seen is that after the pg_log copy, the system no > longer believes that all of the rows in the new database's system tables > have been committed. > > Is it possible to make pg_upgrade work again, perhaps by requiring a > vacuum on the old and/or new databases just before the move happens? > Or must we consign pg_upgrade to the dustbin of history? I am unsure how MVCC would affect this. I will say that pg_upgrade does not work when the underlying table structure changes, though I don't think we have changed any of that. Strange. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
>> I think the problem is that pg_upgrade no longer works in the presence >> of MVCC. In particular, forcibly moving the old database's pg_log into >> the new is probably a bad idea when there is no similarity between the >> sets of committed transaction numbers. I suspect the reason for the >> strange behaviors I've seen is that after the pg_log copy, the system no >> longer believes that all of the rows in the new database's system tables >> have been committed. Some preliminary experiments suggest that vacuuming the new database just before moving the data files solves the problem --- at least, pg_upgrade seems to work then. I will commit this change, since it's very clear that pg_upgrade doesn't work without it. However, I'd sure like to hear Vadim's opinion before I trust pg_upgrade with MVCC very far... BTW, it seems to me that it is a good idea to kill and restart the postmaster immediately after pg_upgrade finishes. Otherwise there might be buffers in shared memory that do not reflect the actual contents of the corresponding pages of the relation files (now that pg_upgrade overwrote the files with other data). Another potential gotcha is that it'd be a really bad idea to let any other clients connect to the new database while it's being built. Looking at these two items together, it seems like the really safe way for pg_upgrade to operate would be *not* to start a postmaster for the new database until after pg_upgrade finishes; that is, the procedure would be "initdb; pg_upgrade; start postmaster". pg_upgrade would operate by invoking a standalone backend for initial table creation. This would guarantee no unwanted interference from other clients during the critical steps. The tricky part is that pg_dump output includes psql \connect commands, which AFAIK are not accepted by a standalone backend. We'd have to figure out another solution for those. Ideas? regards, tom lane PS: if you try to test pg_upgrade by running the regression database through it, and then "vacuum analyze" the result, you will observe a backend crash when vacuum gets to the table "c_star". This seems to be the fault of a bug that Chris Bitmead has complained of in the past. c_star has had a column added via inherited ALTER TABLE ADD COLUMN, and the output of pg_dump creates a database with a different column order for such a table than ADD COLUMN does. So, the reconstructed database schema does not match the table data that pg_upgrade has moved in. Ugh. But we already knew that inherited ADD COLUMN is pretty bogus. I wonder whether we shouldn't just disable it until it can be fixed properly...
> BTW, it seems to me that it is a good idea to kill and restart the > postmaster immediately after pg_upgrade finishes. Otherwise there might > be buffers in shared memory that do not reflect the actual contents of > the corresponding pages of the relation files (now that pg_upgrade > overwrote the files with other data). Honestly, I have been surprised at how well pg_upgrade worked in 6.4. I got very few complaints, and I think people used it. Your issue with buffer cache is a major one. Clearly, this would be a problem. However, it is my understanding that the buffer cache after initdb would only contain system table info, so if they pg_upgrade after that, there is no way they have bad stuf in the cache, right? > > Another potential gotcha is that it'd be a really bad idea to let any > other clients connect to the new database while it's being built. That is pretty obvious, and just basic sysadmin. > > Looking at these two items together, it seems like the really safe way > for pg_upgrade to operate would be *not* to start a postmaster for the > new database until after pg_upgrade finishes; that is, the procedure > would be "initdb; pg_upgrade; start postmaster". pg_upgrade would > operate by invoking a standalone backend for initial table creation. > This would guarantee no unwanted interference from other clients > during the critical steps. > > The tricky part is that pg_dump output includes psql \connect commands, > which AFAIK are not accepted by a standalone backend. We'd have to > figure out another solution for those. Ideas? > > regards, tom lane > > PS: if you try to test pg_upgrade by running the regression database > through it, and then "vacuum analyze" the result, you will observe a > backend crash when vacuum gets to the table "c_star". This seems to be > the fault of a bug that Chris Bitmead has complained of in the past. > c_star has had a column added via inherited ALTER TABLE ADD COLUMN, and > the output of pg_dump creates a database with a different column order > for such a table than ADD COLUMN does. So, the reconstructed database > schema does not match the table data that pg_upgrade has moved in. Ugh. > But we already knew that inherited ADD COLUMN is pretty bogus. I wonder > whether we shouldn't just disable it until it can be fixed properly... > And report a message to the user. Good idea. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> BTW, it seems to me that it is a good idea to kill and restart the >> postmaster immediately after pg_upgrade finishes. Otherwise there might >> be buffers in shared memory that do not reflect the actual contents of >> the corresponding pages of the relation files (now that pg_upgrade >> overwrote the files with other data). > Your issue with buffer cache is a major one. Clearly, this would be a > problem. However, it is my understanding that the buffer cache after > initdb would only contain system table info, so if they pg_upgrade after > that, there is no way they have bad stuf in the cache, right? Cached copies of system tables obviously are no problem, since pg_upgrade doesn't overwrite those. I'm concerned whether there can be cached copies of pages from user tables or indexes. Since we've just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack is right), it seems at least possible that this would happen. Now all those user tables will be empty (zero-length files), so there is nothing to cache. But the user indexes are *not* zero-length --- it looks like they are at least 2 pages long even when empty. So there seems to be a real risk of having a cached copy of one of the pages of a user index while pg_upgrade is overwriting the index file with new data... regards, tom lane
> Cached copies of system tables obviously are no problem, since > pg_upgrade doesn't overwrite those. I'm concerned whether there can > be cached copies of pages from user tables or indexes. Since we've > just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack > is right), it seems at least possible that this would happen. > > Now all those user tables will be empty (zero-length files), so there is > nothing to cache. But the user indexes are *not* zero-length --- it looks > like they are at least 2 pages long even when empty. So there seems > to be a real risk of having a cached copy of one of the pages of a user > index while pg_upgrade is overwriting the index file with new data... Oh, I see. That would be a problem. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> ... So there seems >> to be a real risk of having a cached copy of one of the pages of a user >> index while pg_upgrade is overwriting the index file with new data... > Oh, I see. That would be a problem. OK, then what do you think of the idea of changing pg_upgrade to use a standalone backend, so that no postmaster is running while it runs? That'd eliminate the shared-memory-cache issue and also prevent accidental interference from other clients. There's an awk script in there already that processes the pg_dump script, so maybe we could change it to look for \connect commands and replace them by re-executions of the backend. BTW, do you think it's really necessary for the awk script to remove COPY commands? There shouldn't be any unwanted copies in there in the first place, if the user made the dump with -s per instructions... regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> ... So there seems > >> to be a real risk of having a cached copy of one of the pages of a user > >> index while pg_upgrade is overwriting the index file with new data... > > > Oh, I see. That would be a problem. > > OK, then what do you think of the idea of changing pg_upgrade to use > a standalone backend, so that no postmaster is running while it runs? > That'd eliminate the shared-memory-cache issue and also prevent > accidental interference from other clients. > > There's an awk script in there already that processes the pg_dump > script, so maybe we could change it to look for \connect commands > and replace them by re-executions of the backend. That is risky. How do we know what flags to pass to the stand-alone backend? In most cases, there is not a backend running after a initdb. In fact, you can't have postmaster running during initdb. I recommend they be told in the instructions, and after the pg_upgrade finished to print something reminding them to start and stop the postmaster. Because each backend flushes dirty pages on exit, after each psql finishes, it has already updated the files with dirty pages, so starting/stopping postmaster will not cause the replaced tables to be modified, and then the cache will be empty. > BTW, do you think it's really necessary for the awk script to remove > COPY commands? There shouldn't be any unwanted copies in there in > the first place, if the user made the dump with -s per instructions... But we don't know that they did that. Maybe they found pg_upgrade _after_ they performed the pg_dump. Very likely. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom, did we address this. I forgot. > Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> BTW, it seems to me that it is a good idea to kill and restart the > >> postmaster immediately after pg_upgrade finishes. Otherwise there might > >> be buffers in shared memory that do not reflect the actual contents of > >> the corresponding pages of the relation files (now that pg_upgrade > >> overwrote the files with other data). > > > Your issue with buffer cache is a major one. Clearly, this would be a > > problem. However, it is my understanding that the buffer cache after > > initdb would only contain system table info, so if they pg_upgrade after > > that, there is no way they have bad stuf in the cache, right? > > Cached copies of system tables obviously are no problem, since > pg_upgrade doesn't overwrite those. I'm concerned whether there can > be cached copies of pages from user tables or indexes. Since we've > just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack > is right), it seems at least possible that this would happen. > > Now all those user tables will be empty (zero-length files), so there is > nothing to cache. But the user indexes are *not* zero-length --- it looks > like they are at least 2 pages long even when empty. So there seems > to be a real risk of having a cached copy of one of the pages of a user > index while pg_upgrade is overwriting the index file with new data... > > regards, tom lane > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Tom, did we address this. I forgot. No, it's still an open issue as far as I'm concerned. I was hoping to hear something from Vadim about how pg_upgrade could work safely under MVCC... regards, tom lane >> Bruce Momjian <maillist@candle.pha.pa.us> writes: >>>>> BTW, it seems to me that it is a good idea to kill and restart the >>>>> postmaster immediately after pg_upgrade finishes. Otherwise there might >>>>> be buffers in shared memory that do not reflect the actual contents of >>>>> the corresponding pages of the relation files (now that pg_upgrade >>>>> overwrote the files with other data). >> >>>> Your issue with buffer cache is a major one. Clearly, this would be a >>>> problem. However, it is my understanding that the buffer cache after >>>> initdb would only contain system table info, so if they pg_upgrade after >>>> that, there is no way they have bad stuf in the cache, right? >> >> Cached copies of system tables obviously are no problem, since >> pg_upgrade doesn't overwrite those. I'm concerned whether there can >> be cached copies of pages from user tables or indexes. Since we've >> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack >> is right), it seems at least possible that this would happen. >> >> Now all those user tables will be empty (zero-length files), so there is >> nothing to cache. But the user indexes are *not* zero-length --- it looks >> like they are at least 2 pages long even when empty. So there seems >> to be a real risk of having a cached copy of one of the pages of a user >> index while pg_upgrade is overwriting the index file with new data...
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > Tom, did we address this. I forgot. > > No, it's still an open issue as far as I'm concerned. I was hoping to > hear something from Vadim about how pg_upgrade could work safely under > MVCC... > > regards, tom lane Would a solution to this be to add instructions to pg_upgrade to require the user to stop and restart the postmaster? Seems like that is the only solution unless we do that stop of postmater inside pg_upgrade, but that seems risky. > > > >> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >>>>> BTW, it seems to me that it is a good idea to kill and restart the > >>>>> postmaster immediately after pg_upgrade finishes. Otherwise there might > >>>>> be buffers in shared memory that do not reflect the actual contents of > >>>>> the corresponding pages of the relation files (now that pg_upgrade > >>>>> overwrote the files with other data). > >> > >>>> Your issue with buffer cache is a major one. Clearly, this would be a > >>>> problem. However, it is my understanding that the buffer cache after > >>>> initdb would only contain system table info, so if they pg_upgrade after > >>>> that, there is no way they have bad stuf in the cache, right? > >> > >> Cached copies of system tables obviously are no problem, since > >> pg_upgrade doesn't overwrite those. I'm concerned whether there can > >> be cached copies of pages from user tables or indexes. Since we've > >> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack > >> is right), it seems at least possible that this would happen. > >> > >> Now all those user tables will be empty (zero-length files), so there is > >> nothing to cache. But the user indexes are *not* zero-length --- it looks > >> like they are at least 2 pages long even when empty. So there seems > >> to be a real risk of having a cached copy of one of the pages of a user > >> index while pg_upgrade is overwriting the index file with new data... > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > Tom, did we address this. I forgot. > > No, it's still an open issue as far as I'm concerned. I was hoping to > hear something from Vadim about how pg_upgrade could work safely under > MVCC... > I don't think there is going to be any way to fix the incorrect postmaster buffers without restarting the postmaster, so I have added this to the bottom of pg_upgrade: echo "You must stop/start the postmaster before doing anything else." and have re-organized the instructions to tell them to stop/start the postmaster right after running pg_upgrade. As it is, 6.5.* upgrades can not use it, and 6.6 can not use it either because base structures will change. It does allow 6.5 people moving to other 6.5 releases to use initdb to get new features. Let's close this item. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
New instructions now say that you must stop/restart postmaster after upgrade. That should fix problem because all index buffers are not dirty, so stop/start just clears out buffers. > Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> BTW, it seems to me that it is a good idea to kill and restart the > >> postmaster immediately after pg_upgrade finishes. Otherwise there might > >> be buffers in shared memory that do not reflect the actual contents of > >> the corresponding pages of the relation files (now that pg_upgrade > >> overwrote the files with other data). > > > Your issue with buffer cache is a major one. Clearly, this would be a > > problem. However, it is my understanding that the buffer cache after > > initdb would only contain system table info, so if they pg_upgrade after > > that, there is no way they have bad stuf in the cache, right? > > Cached copies of system tables obviously are no problem, since > pg_upgrade doesn't overwrite those. I'm concerned whether there can > be cached copies of pages from user tables or indexes. Since we've > just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack > is right), it seems at least possible that this would happen. > > Now all those user tables will be empty (zero-length files), so there is > nothing to cache. But the user indexes are *not* zero-length --- it looks > like they are at least 2 pages long even when empty. So there seems > to be a real risk of having a cached copy of one of the pages of a user > index while pg_upgrade is overwriting the index file with new data... > > regards, tom lane > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026