Thread: [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

[PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

From
Marti Raudsepp
Date:
Hi list

One of my databases failed to upgrade successfully and produced this error in the copying phase:

error while copying relation "pg_catalog.pg_largeobject" ("/srv/ssd/PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No such file or directory

Turns out this happens when either the "postgres" or "template1" databases have been moved to a non-default tablespace. pg_dumpall does not dump attributes (such as tablespace) for these databases; pg_upgrade queries the new cluster about the tablespace for these relations and builds a broken destination path for the copy/link operation.

The least bad solution seems to be generating ALTER DATBASE SET TABLESPACE commands for these from pg_dumpall. Previously a --globals-only dump didn't generate psql \connect commands, but you can't run SET TABLESPACE from within the same database you're connected to. So to move "postgres", it needs to connect to "template1" and vice versa. That seems fine for the purpose of pg_upgrade which can assume a freshly created cluster with both databases intact.

I have implemented a proof of concept patch for this. Currently I'm only tackling the binary upgrade failure and not general pg_dumpall.

Alternatively, we could allow SET TABLESPACE in the current database, which seems less ugly to me. A code comment says "Obviously can't move the tables of my own database", but it's not obvious to me why. If I'm the only connected backend, it seems that any caches and open files could be invalidated. But I don't know how big of an undertaking that would be.

----

While working on this, I also noticed other things that pg_dumpall (and this pg_upgrade) misses:
* Nothing at all is dumped for the template0 database, although ACLs, settings and the tablespace can be changed by the user
* template1 & postgres databases retain ACLs and settings, but not attributes like TABLESPACE or CONNECTION LIMIT. Other attributes like LC_COLLATE and LC_CTYPE can't be changed without recreating the DB, but those don't  matter for the pg_upgrade case anyway.

It seems those would be good material for another patch?

Regards,
Marti Raudsepp

Attachment
On Fri, Jun 19, 2015 at 12:10 PM, Marti Raudsepp <marti@juffo.org> wrote:
> One of my databases failed to upgrade successfully and produced this error
> in the copying phase:
>
> error while copying relation "pg_catalog.pg_largeobject"
> ("/srv/ssd/PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No
> such file or directory
>
> Turns out this happens when either the "postgres" or "template1" databases
> have been moved to a non-default tablespace. pg_dumpall does not dump
> attributes (such as tablespace) for these databases; pg_upgrade queries the
> new cluster about the tablespace for these relations and builds a broken
> destination path for the copy/link operation.
>
> The least bad solution seems to be generating ALTER DATBASE SET TABLESPACE
> commands for these from pg_dumpall. Previously a --globals-only dump didn't
> generate psql \connect commands, but you can't run SET TABLESPACE from
> within the same database you're connected to. So to move "postgres", it
> needs to connect to "template1" and vice versa. That seems fine for the
> purpose of pg_upgrade which can assume a freshly created cluster with both
> databases intact.
>
> I have implemented a proof of concept patch for this. Currently I'm only
> tackling the binary upgrade failure and not general pg_dumpall.

I haven't looked at the patch, but this seems like a good thing to
fix.  Hopefully Bruce will take a look soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, Jun 22, 2015 at 9:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 19, 2015 at 12:10 PM, Marti Raudsepp <marti@juffo.org> wrote:
>> One of my databases failed to upgrade successfully and produced this error
>> in the copying phase:
>>
>> error while copying relation "pg_catalog.pg_largeobject"
>> ("/srv/ssd/PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No
>> such file or directory

> I haven't looked at the patch, but this seems like a good thing to
> fix.  Hopefully Bruce will take a look soon.

Ping? This has gone a month without a reply.

Should I add this patch to a commitfest? (I was under the impression
that bugfixes don't normally go through the commitfest process)

Regards,
Marti Raudsepp



On Mon, Jul 20, 2015 at 4:31 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Mon, Jun 22, 2015 at 9:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 19, 2015 at 12:10 PM, Marti Raudsepp <marti@juffo.org> wrote:
>>> One of my databases failed to upgrade successfully and produced this error
>>> in the copying phase:
>>>
>>> error while copying relation "pg_catalog.pg_largeobject"
>>> ("/srv/ssd/PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No
>>> such file or directory
>
>> I haven't looked at the patch, but this seems like a good thing to
>> fix.  Hopefully Bruce will take a look soon.
>
> Ping? This has gone a month without a reply.
>
> Should I add this patch to a commitfest? (I was under the impression
> that bugfixes don't normally go through the commitfest process)

After a certain amount of time without anything happening, I would
recommend just adding it to a CF to have it get attention. I imagine
that it is one of the reasons why there is as well a category "Bug
Fix".
-- 
Michael



On Mon, Jul 20, 2015 at 8:20 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> After a certain amount of time without anything happening, I would
> recommend just adding it to a CF to have it get attention. I imagine
> that it is one of the reasons why there is as well a category "Bug
> Fix".

+1.  I would recommend adding it to the CF *immediately* to have it
get attention.   The CF app is basically our patch tracker.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, Jul 22, 2015 at 5:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> +1.  I would recommend adding it to the CF *immediately* to have it
> get attention.   The CF app is basically our patch tracker.

Thanks, I have done so now: https://commitfest.postgresql.org/6/313/

Regards,
Marti



On Fri, Jun 19, 2015 at 07:10:40PM +0300, Marti Raudsepp wrote:
> Hi list

Sorry I am just getting this report.  Thanks to the people who "stalled"
for me.

> One of my databases failed to upgrade successfully and produced this error in
> the copying phase:
>
> error while copying relation "pg_catalog.pg_largeobject" ("/srv/ssd/
> PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No such file or
> directory

As with all good bug reports, there are multiple bugs here.  ;-)  Notice
the second path has an invalid prefix:  "/PG_9.4_201409291/1/12130" ---
obviously something is seriously wrong.  The failure of pg_dumpall to
move 'postgres' and 'template1' databases to the new tablespace cannot
explain that, i.e. I could understand pg_upgrade looking for
pg_largeobject in the default data directory, or in the new one, but not
in a path that doesn't even exist.  pg_upgrade tracks old and new
tablespaces separately, so there is obviously something wrong.

And I found it, patch attached.  The pg_upgrade code was testing for the
tablespace of the _old_ object, then assigning the old and _new_
tablespaces based on that.  The first patch fixes that, and should be
backpatched to all supported pg_upgrade versions.

> Turns out this happens when either the "postgres" or "template1" databases have
> been moved to a non-default tablespace. pg_dumpall does not dump attributes
> (such as tablespace) for these databases; pg_upgrade queries the new cluster
> about the tablespace for these relations and builds a broken destination path
> for the copy/link operation.
>
> The least bad solution seems to be generating ALTER DATBASE SET TABLESPACE
> commands for these from pg_dumpall. Previously a --globals-only dump didn't
> generate psql \connect commands, but you can't run SET TABLESPACE from within
> the same database you're connected to. So to move "postgres", it needs to
> connect to "template1" and vice versa. That seems fine for the purpose of
> pg_upgrade which can assume a freshly created cluster with both databases
> intact.

Yes, seems like a good solution.

> I have implemented a proof of concept patch for this. Currently I'm only
> tackling the binary upgrade failure and not general pg_dumpall.
>
> Alternatively, we could allow SET TABLESPACE in the current database, which
> seems less ugly to me. A code comment says "Obviously can't move the tables of
> my own database", but it's not obvious to me why. If I'm the only connected
> backend, it seems that any caches and open files could be invalidated. But I
> don't know how big of an undertaking that would be.

Your submitted patch, attached, also looks good, and should be
backpatched.  My only question is whether this should be for all runs of
pg_dumpall, not just in binary upgrade mode.  Comments?

Once we agree on this, I will apply these to all back branches and run
tests by moving template1 before the upgrade to make sure it works for
all PG versions.

> pg_upgrade) misses:
> * Nothing at all is dumped for the template0 database, although ACLs, settings
> and the tablespace can be changed by the user

Uh, we _assume_ no one is connecting to template0, but you are right
people can change things _about_ template0.  It would be nice to
preserve those.

> * template1 & postgres databases retain ACLs and settings, but not attributes
> like TABLESPACE or CONNECTION LIMIT. Other attributes like LC_COLLATE and
> LC_CTYPE can't be changed without recreating the DB, but those don't  matter
> for the pg_upgrade case anyway.
>
> It seems those would be good material for another patch?

Agreed.  I am not sure how we have gotten this far with so few
complaints about this problem.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

From
Andres Freund
Date:
Hi Bruce,

Are you actually waiting for review in this thread? Or is the CF entry
more of a reminder?

Andres



Re: [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

From
Bruce Momjian
Date:
On Thu, Sep  3, 2015 at 01:03:30PM +0200, Andres Freund wrote:
> Hi Bruce,
>
> Are you actually waiting for review in this thread? Or is the CF entry
> more of a reminder?

Oh, I have a commit-fest entry for this?  Well, whoever did that was
doing the right thing so things are not forgotten.  Yeah!  :-)

Anyway, I was going to work on this once I had read my email backlog,
but because of the commit-fest entry, I worked on it today instead.

I have developed the attached patch which fixes the pg_upgrade problem,
and the pg_dumpall problem with postgres and template1 in non-default
tablespaces.  I modified the pg_dumpall patch with the following changes:

*  It is a general pg_dumpall bug, not specific to binary upgrade mode,
so the new code will be used in non-binary upgrade mode too.

* I hard-coded the "connect template1" string, as it is a constant (no
need for %s and fmtId()).

*  You can't mix fprintf() and appendPQExpBuffer() and get the output in
the specified order, i.e. fprintf will come out first.  Now, in your
case, it didn't matter, but it isn't clean either.

*  I wanted the original database connection to be restored right after
the switch.

The new output looks like this:

    \connect postgres
    ALTER DATABASE template1 SET TABLESPACE tt;
    \connect template1

Based on our previous policy, these are both bugs and cause either
errors or inaccurate restores, so I plan to apply the attached patch to
all back branches.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

From
Hyongtae Lim
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

This looks good to me.
I've done some tests(pg_dumpall & restore with sql, upgrade test(8.4, 9.x, ...)) and there was no problems.
so I'm marking this as ready for committer.

Regards,
Hyongtae

The new status of this patch is: Ready for Committer



Re: Re: [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

From
Bruce Momjian
Date:
On Fri, Sep 11, 2015 at 12:41:09AM +0000, Hyongtae Lim wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            not tested
> 
> This looks good to me.
> I've done some tests(pg_dumpall & restore with sql, upgrade test(8.4, 9.x, ...)) and there was no problems.
> so I'm marking this as ready for committer.

Thanks.  I am going to commit it in the next 24 hours.  Thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Re: [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

From
Bruce Momjian
Date:
On Thu, Sep 10, 2015 at 09:59:41PM -0400, Bruce Momjian wrote:
> On Fri, Sep 11, 2015 at 12:41:09AM +0000, Hyongtae Lim wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:       tested, passed
> > Spec compliant:           tested, passed
> > Documentation:            not tested
> > 
> > This looks good to me.
> > I've done some tests(pg_dumpall & restore with sql, upgrade test(8.4, 9.x, ...)) and there was no problems.
> > so I'm marking this as ready for committer.
> 
> Thanks.  I am going to commit it in the next 24 hours.  Thanks.

Patch applied and marked as committed in the commitfest app.

One odd thing I found in exhaustive testing is that pre-9.3, we use a
line starting with \connect to identify when the pg_dumpall globals end
and the per-database information begins.  (In 9.3+, we do per-database
dumps to allow parallel dump/restore.)  I modified the patch for pre-9.3
to prefix a space for \connect lines when we are jumping in and out of
the database so we can alter its tablespace.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +