Thread: pg_upgrade may be mortally wounded

pg_upgrade may be mortally wounded

From
Tom Lane
Date:
I re-enabled pg_upgrade this afternoon, thinking that it would be easier
to use than dump/initdb/reload for coping with the pg_statistic change
I'm about to commit.  However, testing shows that it doesn't really
work.  The "upgraded" database behaves very strangely --- vacuum tends
to fail, and I have seen duplicate listings for attributes of a relation
in psql's \d listing, broken links between a relation and its indices,
and other problems.

I think the problem is that pg_upgrade no longer works in the presence
of MVCC.  In particular, forcibly moving the old database's pg_log into
the new is probably a bad idea when there is no similarity between the
sets of committed transaction numbers.  I suspect the reason for the
strange behaviors I've seen is that after the pg_log copy, the system no
longer believes that all of the rows in the new database's system tables
have been committed.

Is it possible to make pg_upgrade work again, perhaps by requiring a
vacuum on the old and/or new databases just before the move happens?
Or must we consign pg_upgrade to the dustbin of history?
        regards, tom lane


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
> I re-enabled pg_upgrade this afternoon, thinking that it would be easier
> to use than dump/initdb/reload for coping with the pg_statistic change
> I'm about to commit.  However, testing shows that it doesn't really
> work.  The "upgraded" database behaves very strangely --- vacuum tends
> to fail, and I have seen duplicate listings for attributes of a relation
> in psql's \d listing, broken links between a relation and its indices,
> and other problems.
> 
> I think the problem is that pg_upgrade no longer works in the presence
> of MVCC.  In particular, forcibly moving the old database's pg_log into
> the new is probably a bad idea when there is no similarity between the
> sets of committed transaction numbers.  I suspect the reason for the
> strange behaviors I've seen is that after the pg_log copy, the system no
> longer believes that all of the rows in the new database's system tables
> have been committed.
> 
> Is it possible to make pg_upgrade work again, perhaps by requiring a
> vacuum on the old and/or new databases just before the move happens?
> Or must we consign pg_upgrade to the dustbin of history?

I am unsure how MVCC would affect this.  I will say that pg_upgrade does
not work when the underlying table structure changes, though I don't
think we have changed any of that.  Strange.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Tom Lane
Date:
>> I think the problem is that pg_upgrade no longer works in the presence
>> of MVCC.  In particular, forcibly moving the old database's pg_log into
>> the new is probably a bad idea when there is no similarity between the
>> sets of committed transaction numbers.  I suspect the reason for the
>> strange behaviors I've seen is that after the pg_log copy, the system no
>> longer believes that all of the rows in the new database's system tables
>> have been committed.

Some preliminary experiments suggest that vacuuming the new database
just before moving the data files solves the problem --- at least,
pg_upgrade seems to work then.  I will commit this change, since it's
very clear that pg_upgrade doesn't work without it.

However, I'd sure like to hear Vadim's opinion before I trust pg_upgrade
with MVCC very far...


BTW, it seems to me that it is a good idea to kill and restart the
postmaster immediately after pg_upgrade finishes.  Otherwise there might
be buffers in shared memory that do not reflect the actual contents of
the corresponding pages of the relation files (now that pg_upgrade
overwrote the files with other data).

Another potential gotcha is that it'd be a really bad idea to let any
other clients connect to the new database while it's being built.

Looking at these two items together, it seems like the really safe way
for pg_upgrade to operate would be *not* to start a postmaster for the
new database until after pg_upgrade finishes; that is, the procedure
would be "initdb; pg_upgrade; start postmaster".  pg_upgrade would
operate by invoking a standalone backend for initial table creation.
This would guarantee no unwanted interference from other clients
during the critical steps.

The tricky part is that pg_dump output includes psql \connect commands,
which AFAIK are not accepted by a standalone backend.  We'd have to
figure out another solution for those.  Ideas?
        regards, tom lane

PS: if you try to test pg_upgrade by running the regression database
through it, and then "vacuum analyze" the result, you will observe a
backend crash when vacuum gets to the table "c_star".  This seems to be
the fault of a bug that Chris Bitmead has complained of in the past.
c_star has had a column added via inherited ALTER TABLE ADD COLUMN, and
the output of pg_dump creates a database with a different column order
for such a table than ADD COLUMN does.  So, the reconstructed database
schema does not match the table data that pg_upgrade has moved in.  Ugh.
But we already knew that inherited ADD COLUMN is pretty bogus.  I wonder
whether we shouldn't just disable it until it can be fixed properly...


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
> BTW, it seems to me that it is a good idea to kill and restart the
> postmaster immediately after pg_upgrade finishes.  Otherwise there might
> be buffers in shared memory that do not reflect the actual contents of
> the corresponding pages of the relation files (now that pg_upgrade
> overwrote the files with other data).

Honestly, I have been surprised at how well pg_upgrade worked in 6.4.  I
got very few complaints, and I think people used it.

Your issue with buffer cache is a major one.  Clearly, this would be a
problem.   However, it is my understanding that the buffer cache after
initdb would only contain system table info, so if they pg_upgrade after
that, there is no way they have bad stuf in the cache, right?


> 
> Another potential gotcha is that it'd be a really bad idea to let any
> other clients connect to the new database while it's being built.

That is pretty obvious, and just basic sysadmin.

> 
> Looking at these two items together, it seems like the really safe way
> for pg_upgrade to operate would be *not* to start a postmaster for the
> new database until after pg_upgrade finishes; that is, the procedure
> would be "initdb; pg_upgrade; start postmaster".  pg_upgrade would
> operate by invoking a standalone backend for initial table creation.
> This would guarantee no unwanted interference from other clients
> during the critical steps.
> 
> The tricky part is that pg_dump output includes psql \connect commands,
> which AFAIK are not accepted by a standalone backend.  We'd have to
> figure out another solution for those.  Ideas?
> 
>             regards, tom lane
> 
> PS: if you try to test pg_upgrade by running the regression database
> through it, and then "vacuum analyze" the result, you will observe a
> backend crash when vacuum gets to the table "c_star".  This seems to be
> the fault of a bug that Chris Bitmead has complained of in the past.
> c_star has had a column added via inherited ALTER TABLE ADD COLUMN, and
> the output of pg_dump creates a database with a different column order
> for such a table than ADD COLUMN does.  So, the reconstructed database
> schema does not match the table data that pg_upgrade has moved in.  Ugh.
> But we already knew that inherited ADD COLUMN is pretty bogus.  I wonder
> whether we shouldn't just disable it until it can be fixed properly...
> 

And report a message to the user.  Good idea.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> BTW, it seems to me that it is a good idea to kill and restart the
>> postmaster immediately after pg_upgrade finishes.  Otherwise there might
>> be buffers in shared memory that do not reflect the actual contents of
>> the corresponding pages of the relation files (now that pg_upgrade
>> overwrote the files with other data).

> Your issue with buffer cache is a major one.  Clearly, this would be a
> problem.   However, it is my understanding that the buffer cache after
> initdb would only contain system table info, so if they pg_upgrade after
> that, there is no way they have bad stuf in the cache, right?

Cached copies of system tables obviously are no problem, since
pg_upgrade doesn't overwrite those.  I'm concerned whether there can
be cached copies of pages from user tables or indexes.  Since we've
just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack
is right), it seems at least possible that this would happen.

Now all those user tables will be empty (zero-length files), so there is
nothing to cache.  But the user indexes are *not* zero-length --- it looks
like they are at least 2 pages long even when empty.  So there seems
to be a real risk of having a cached copy of one of the pages of a user
index while pg_upgrade is overwriting the index file with new data...
        regards, tom lane


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
> Cached copies of system tables obviously are no problem, since
> pg_upgrade doesn't overwrite those.  I'm concerned whether there can
> be cached copies of pages from user tables or indexes.  Since we've
> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack
> is right), it seems at least possible that this would happen.
> 
> Now all those user tables will be empty (zero-length files), so there is
> nothing to cache.  But the user indexes are *not* zero-length --- it looks
> like they are at least 2 pages long even when empty.  So there seems
> to be a real risk of having a cached copy of one of the pages of a user
> index while pg_upgrade is overwriting the index file with new data...

Oh, I see.  That would be a problem.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> ... So there seems
>> to be a real risk of having a cached copy of one of the pages of a user
>> index while pg_upgrade is overwriting the index file with new data...

> Oh, I see.  That would be a problem.

OK, then what do you think of the idea of changing pg_upgrade to use
a standalone backend, so that no postmaster is running while it runs?
That'd eliminate the shared-memory-cache issue and also prevent
accidental interference from other clients.

There's an awk script in there already that processes the pg_dump
script, so maybe we could change it to look for \connect commands
and replace them by re-executions of the backend.

BTW, do you think it's really necessary for the awk script to remove
COPY commands?  There shouldn't be any unwanted copies in there in
the first place, if the user made the dump with -s per instructions...
        regards, tom lane


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> ... So there seems
> >> to be a real risk of having a cached copy of one of the pages of a user
> >> index while pg_upgrade is overwriting the index file with new data...
> 
> > Oh, I see.  That would be a problem.
> 
> OK, then what do you think of the idea of changing pg_upgrade to use
> a standalone backend, so that no postmaster is running while it runs?
> That'd eliminate the shared-memory-cache issue and also prevent
> accidental interference from other clients.
> 
> There's an awk script in there already that processes the pg_dump
> script, so maybe we could change it to look for \connect commands
> and replace them by re-executions of the backend.

That is risky.  How do we know what flags to pass to the stand-alone
backend?  In most cases, there is not a backend running after a initdb. 
In fact, you can't have postmaster running during initdb.  I recommend
they be told in the instructions, and after the pg_upgrade finished to
print something reminding them to start and stop the postmaster. 
Because each backend flushes dirty pages on exit, after each psql
finishes, it has already updated the files with dirty pages, so
starting/stopping postmaster will not cause the replaced tables to be
modified, and then the cache will be empty.

> BTW, do you think it's really necessary for the awk script to remove
> COPY commands?  There shouldn't be any unwanted copies in there in
> the first place, if the user made the dump with -s per instructions...

But we don't know that they did that.  Maybe they found pg_upgrade
_after_ they performed the pg_dump.  Very likely.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
Tom, did we address this.  I forgot.


> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> BTW, it seems to me that it is a good idea to kill and restart the
> >> postmaster immediately after pg_upgrade finishes.  Otherwise there might
> >> be buffers in shared memory that do not reflect the actual contents of
> >> the corresponding pages of the relation files (now that pg_upgrade
> >> overwrote the files with other data).
> 
> > Your issue with buffer cache is a major one.  Clearly, this would be a
> > problem.   However, it is my understanding that the buffer cache after
> > initdb would only contain system table info, so if they pg_upgrade after
> > that, there is no way they have bad stuf in the cache, right?
> 
> Cached copies of system tables obviously are no problem, since
> pg_upgrade doesn't overwrite those.  I'm concerned whether there can
> be cached copies of pages from user tables or indexes.  Since we've
> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack
> is right), it seems at least possible that this would happen.
> 
> Now all those user tables will be empty (zero-length files), so there is
> nothing to cache.  But the user indexes are *not* zero-length --- it looks
> like they are at least 2 pages long even when empty.  So there seems
> to be a real risk of having a cached copy of one of the pages of a user
> index while pg_upgrade is overwriting the index file with new data...
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, did we address this.  I forgot.

No, it's still an open issue as far as I'm concerned.  I was hoping to
hear something from Vadim about how pg_upgrade could work safely under
MVCC...
        regards, tom lane


>> Bruce Momjian <maillist@candle.pha.pa.us> writes:
>>>>> BTW, it seems to me that it is a good idea to kill and restart the
>>>>> postmaster immediately after pg_upgrade finishes.  Otherwise there might
>>>>> be buffers in shared memory that do not reflect the actual contents of
>>>>> the corresponding pages of the relation files (now that pg_upgrade
>>>>> overwrote the files with other data).
>> 
>>>> Your issue with buffer cache is a major one.  Clearly, this would be a
>>>> problem.   However, it is my understanding that the buffer cache after
>>>> initdb would only contain system table info, so if they pg_upgrade after
>>>> that, there is no way they have bad stuf in the cache, right?
>> 
>> Cached copies of system tables obviously are no problem, since
>> pg_upgrade doesn't overwrite those.  I'm concerned whether there can
>> be cached copies of pages from user tables or indexes.  Since we've
>> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack
>> is right), it seems at least possible that this would happen.
>> 
>> Now all those user tables will be empty (zero-length files), so there is
>> nothing to cache.  But the user indexes are *not* zero-length --- it looks
>> like they are at least 2 pages long even when empty.  So there seems
>> to be a real risk of having a cached copy of one of the pages of a user
>> index while pg_upgrade is overwriting the index file with new data...


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Tom, did we address this.  I forgot.
>
> No, it's still an open issue as far as I'm concerned.  I was hoping to
> hear something from Vadim about how pg_upgrade could work safely under
> MVCC...
>
>             regards, tom lane

Would a solution to this be to add instructions to pg_upgrade to require
the user to stop and restart the postmaster?  Seems like that is the
only solution unless we do that stop of postmater inside pg_upgrade, but
that seems risky.

>
>
> >> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >>>>> BTW, it seems to me that it is a good idea to kill and restart the
> >>>>> postmaster immediately after pg_upgrade finishes.  Otherwise there might
> >>>>> be buffers in shared memory that do not reflect the actual contents of
> >>>>> the corresponding pages of the relation files (now that pg_upgrade
> >>>>> overwrote the files with other data).
> >>
> >>>> Your issue with buffer cache is a major one.  Clearly, this would be a
> >>>> problem.   However, it is my understanding that the buffer cache after
> >>>> initdb would only contain system table info, so if they pg_upgrade after
> >>>> that, there is no way they have bad stuf in the cache, right?
> >>
> >> Cached copies of system tables obviously are no problem, since
> >> pg_upgrade doesn't overwrite those.  I'm concerned whether there can
> >> be cached copies of pages from user tables or indexes.  Since we've
> >> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack
> >> is right), it seems at least possible that this would happen.
> >>
> >> Now all those user tables will be empty (zero-length files), so there is
> >> nothing to cache.  But the user indexes are *not* zero-length --- it looks
> >> like they are at least 2 pages long even when empty.  So there seems
> >> to be a real risk of having a cached copy of one of the pages of a user
> >> index while pg_upgrade is overwriting the index file with new data...
>


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Tom, did we address this.  I forgot.
>
> No, it's still an open issue as far as I'm concerned.  I was hoping to
> hear something from Vadim about how pg_upgrade could work safely under
> MVCC...
>

I don't think there is going to be any way to fix the incorrect
postmaster buffers without restarting the postmaster, so I have added
this to the bottom of pg_upgrade:

    echo "You must stop/start the postmaster before doing anything else."

and have re-organized the instructions to tell them to stop/start the
postmaster right after running pg_upgrade.

As it is, 6.5.* upgrades can not use it, and 6.6 can not use it either
because base structures will change.  It does allow 6.5 people moving to
other 6.5 releases to use initdb to get new features.

Let's close this item.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] pg_upgrade may be mortally wounded

From
Bruce Momjian
Date:
New instructions now say that you must stop/restart postmaster after
upgrade.  That should fix problem because all index buffers are not
dirty, so stop/start just clears out buffers.


> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> BTW, it seems to me that it is a good idea to kill and restart the
> >> postmaster immediately after pg_upgrade finishes.  Otherwise there might
> >> be buffers in shared memory that do not reflect the actual contents of
> >> the corresponding pages of the relation files (now that pg_upgrade
> >> overwrote the files with other data).
> 
> > Your issue with buffer cache is a major one.  Clearly, this would be a
> > problem.   However, it is my understanding that the buffer cache after
> > initdb would only contain system table info, so if they pg_upgrade after
> > that, there is no way they have bad stuf in the cache, right?
> 
> Cached copies of system tables obviously are no problem, since
> pg_upgrade doesn't overwrite those.  I'm concerned whether there can
> be cached copies of pages from user tables or indexes.  Since we've
> just done a bunch of CREATE INDEXes (and a VACUUM, if my latest hack
> is right), it seems at least possible that this would happen.
> 
> Now all those user tables will be empty (zero-length files), so there is
> nothing to cache.  But the user indexes are *not* zero-length --- it looks
> like they are at least 2 pages long even when empty.  So there seems
> to be a real risk of having a cached copy of one of the pages of a user
> index while pg_upgrade is overwriting the index file with new data...
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026