Thread: Re: [BUGS] 9.3beta2: Failure to pg_upgrade

Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Alvaro Herrera
Date:
Jesse Denardo wrote:

> $ 9.2_dev/bin/pg_controldata data

> Latest checkpoint's NextMultiXactId:  2982
> Latest checkpoint's NextMultiOffset:  6479

So what's happening here is that the MultiXact 2982 lives in a SLRU page
that doesn't exist.  pg_upgrade didn't copy the pg_multixact files from
the old cluster, because they are not compatible; instead it just sets
the values in pg_control.  As soon as a new multixact is to be created,
things fail because the code is not prepared to deal with the
possibility that the underlying SLRU files have not been extended during
normal operation.
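
To make the failure concrete, here is a rough sketch of where MultiXactId
2982 would live in pg_multixact/offsets. The constants are assumptions and
are not taken from Jesse's report: the default 8 kB block size, one 4-byte
offset entry per MultiXactId, and 32 pages per SLRU segment file. The helper
name locate_offset_entry is made up for this illustration. The members area
uses a different layout and is not modeled here.

```python
# Assumed constants: default 8 kB block size, a 4-byte offset entry per
# MultiXactId, and 32 pages per SLRU segment file.
BLCKSZ = 8192
OFFSET_ENTRY_SIZE = 4
OFFSETS_PER_PAGE = BLCKSZ // OFFSET_ENTRY_SIZE  # 2048 entries per page
PAGES_PER_SEGMENT = 32


def locate_offset_entry(multi):
    """Return (segment file name, page number, entry index) for a MultiXactId.

    Hypothetical helper for illustration; not a PostgreSQL function.
    """
    page = multi // OFFSETS_PER_PAGE
    entry = multi % OFFSETS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    # Segment files are named with four uppercase hex digits.
    return "%04X" % segment, page, entry


if __name__ == "__main__":
    print(locate_offset_entry(2982))  # ('0000', 1, 934)
```

Under these assumptions, 2982 falls on page 1 of segment 0000. If the new
cluster's offsets file only contains page 0 (which is presumably what a
freshly initialized cluster has), the first attempt to write that entry
touches a page that was never created.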

I see two ways to deal with this:

1. On each multixact creation, verify whether the pages we're trying to
modify do in fact exist.  If they don't, create them.

2. At startup, verify the "next" multixact values, and extend the files
if necessary.

I think (1) is not a very good idea because it will cause too large an
impact at runtime, when it is not really necessary.  I lean more towards
(2).  On IM, Bruce suggested instead:

2a. Same as (2), but only do it in pg_upgrade's usage of postgres'
binary-upgrade mode (postgres -b).  Thus this will be done once during
the upgrade process and not every time the system starts up.


As it turns out, I have a patched slru.c that adds a new function to
verify whether a page exists on disk.  I created this for the commit
timestamp module, for the BDR branch, but I think it's what we need
here.
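
As an illustration of the idea behind (2) and (2a), the check-then-extend
step could be sketched as below. This is not the slru.c patch itself, which
is not reproduced in this thread; it is a simplified model that treats an
SLRU directory as plain segment files. The names page_exists_on_disk and
ensure_page_exists are hypothetical, and the block size and pages-per-segment
values are assumed defaults.

```python
import os
import tempfile

BLCKSZ = 8192            # assumed default block size
PAGES_PER_SEGMENT = 32   # assumed pages per SLRU segment file


def segment_path(slru_dir, page):
    """Path of the segment file that would hold the given page."""
    return os.path.join(slru_dir, "%04X" % (page // PAGES_PER_SEGMENT))


def page_exists_on_disk(slru_dir, page):
    """True if the segment file exists and is long enough to hold the page."""
    path = segment_path(slru_dir, page)
    if not os.path.exists(path):
        return False
    end_of_page = (page % PAGES_PER_SEGMENT + 1) * BLCKSZ
    return os.path.getsize(path) >= end_of_page


def ensure_page_exists(slru_dir, page):
    """Zero-extend the segment file to cover the page; True if it grew."""
    if page_exists_on_disk(slru_dir, page):
        return False
    path = segment_path(slru_dir, page)
    end_of_page = (page % PAGES_PER_SEGMENT + 1) * BLCKSZ
    current = os.path.getsize(path) if os.path.exists(path) else 0
    # Simplification: any earlier missing pages in the segment are
    # zero-filled as well.
    with open(path, "ab") as f:
        f.write(b"\0" * (end_of_page - current))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Simulate a fresh cluster: only page 0 of segment 0000 exists.
        with open(os.path.join(d, "0000"), "wb") as f:
            f.write(b"\0" * BLCKSZ)
        print(page_exists_on_disk(d, 1))  # page holding multixact 2982
        print(ensure_page_exists(d, 1))
        print(page_exists_on_disk(d, 1))
```

Running the check once at startup (or only in binary-upgrade mode, as in 2a)
keeps the cost off the multixact creation path, which is the objection to
option (1).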

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> As it turns out, I have a patched slru.c that adds a new function to
> verify whether a page exists on disk.  I created this for the commit
> timestamp module, for the BDR branch, but I think it's what we need
> here.

Here's a patch that should fix the problem.  Jesse, if you're able to
test it, please give it a run and let me know if it works for you.  I
was able to upgrade an installation containing a problem that should
reproduce yours.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Andres Freund
Date:
On 2013-08-02 18:17:43 -0400, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > As it turns out, I have a patched slru.c that adds a new function to
> > verify whether a page exists on disk.  I created this for the commit
> > timestamp module, for the BDR branch, but I think it's what we need
> > here.
> 
> Here's a patch that should fix the problem.  Jesse, if you're able to
> test it, please give it a run and let me know if it works for you.  I
> was able to upgrade an installation containing a problem that should
> reproduce yours.

Wouldn't it be easier to make pg_upgrade fudge pg_control to have a safe
NextMultiXactId/Offset using pg_resetxlog?

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-08-02 18:17:43 -0400, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> > 
> > > As it turns out, I have a patched slru.c that adds a new function to
> > > verify whether a page exists on disk.  I created this for the commit
> > > timestamp module, for the BDR branch, but I think it's what we need
> > > here.
> > 
> > Here's a patch that should fix the problem.  Jesse, if you're able to
> > test it, please give it a run and let me know if it works for you.  I
> > was able to upgrade an installation containing a problem that should
> > reproduce yours.
> 
> Wouldn't it be easier to make pg_upgrade fudge pg_control to have a safe
> NextMultiXactId/Offset using pg_resetxlog?

I don't understand.  pg_upgrade already fudges pg_control to have a safe
next multi, namely the same value used by the old cluster.  The reason
to preserve this value is that we must ensure no older value is
consulted in pg_multixact: those might be present in tuples that were
locked in the old cluster.  (To be precise, this is the value to set as
oldest multi, not next multi.  But of course, the next multi must be
greater than that one.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Bruce Momjian
Date:
On Fri, Aug  2, 2013 at 11:20:37PM -0400, Jesse Denardo wrote:
> Alvaro,
> 
> I applied the patch and tried upgrading again, and everything seemed to work as
> expected. We are now up and running the beta!

Yeah, great, thanks everyone!

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Andres Freund
Date:
On 2013-08-02 22:25:36 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-08-02 18:17:43 -0400, Alvaro Herrera wrote:
> > > Alvaro Herrera wrote:
> > > 
> > > > As it turns out, I have a patched slru.c that adds a new function to
> > > > verify whether a page exists on disk.  I created this for the commit
> > > > timestamp module, for the BDR branch, but I think it's what we need
> > > > here.
> > > 
> > > Here's a patch that should fix the problem.  Jesse, if you're able to
> > > test it, please give it a run and let me know if it works for you.  I
> > > was able to upgrade an installation containing a problem that should
> > > reproduce yours.
> > 
> > Wouldn't it be easier to make pg_upgrade fudge pg_control to have a safe
> > NextMultiXactId/Offset using pg_resetxlog?
> 
> I don't understand.  pg_upgrade already fudges pg_control to have a safe
> next multi, namely the same value used by the old cluster.  The reason
> to preserve this value is that we must ensure no older value is
> consulted in pg_multixact: those might be present in tuples that were
> locked in the old cluster.  (To be precise, this is the value to set as
> oldest multi, not next multi.  But of course, the next multi must be
> greater than that one.)

I am suggesting to set them to a greater value than in the old cluster,
computed so it's guaranteed that they are proper page boundaries. Then
the situation described upthread shouldn't occur anymore, right?
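
A minimal sketch of the rounding being proposed here, for the offsets area
only. It assumes 2048 offset entries per page (8 kB block, 4-byte entries);
the function name is hypothetical, and wraparound of the MultiXactId counter
is ignored. NextMultiOffset would need the same treatment against the
members-per-page value, which is not modeled.

```python
OFFSETS_PER_PAGE = 2048  # assumed: 8192-byte block / 4-byte offset entry


def round_up_to_page_boundary(multi, per_page=OFFSETS_PER_PAGE):
    """Smallest multiple of per_page that is >= multi.

    Hypothetical helper; ignores counter wraparound.
    """
    return -(-multi // per_page) * per_page


if __name__ == "__main__":
    old_next_multi = 2982
    new_next_multi = round_up_to_page_boundary(old_next_multi)
    print(new_next_multi)                   # 4096
    print(new_next_multi - old_next_multi)  # 1114 IDs skipped
```

With NextMultiXactId 2982 this yields 4096, the first entry of page 2, at
the cost of skipping 1114 multixact IDs. The idea is that the first multixact
created after the upgrade would then start a new page, so the regular
page-extension path would create it.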

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Jesse Denardo
Date:
Alvaro,

I applied the patch and tried upgrading again, and everything seemed to work as expected. We are now up and running the beta!


--
Jesse Denardo


On Fri, Aug 2, 2013 at 10:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Andres Freund wrote:
> On 2013-08-02 18:17:43 -0400, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> >
> > > As it turns out, I have a patched slru.c that adds a new function to
> > > verify whether a page exists on disk.  I created this for the commit
> > > timestamp module, for the BDR branch, but I think it's what we need
> > > here.
> >
> > Here's a patch that should fix the problem.  Jesse, if you're able to
> > test it, please give it a run and let me know if it works for you.  I
> > was able to upgrade an installation containing a problem that should
> > reproduce yours.
>
> Wouldn't it be easier to make pg_upgrade fudge pg_control to have a safe
> NextMultiXactId/Offset using pg_resetxlog?

I don't understand.  pg_upgrade already fudges pg_control to have a safe
next multi, namely the same value used by the old cluster.  The reason
to preserve this value is that we must ensure no older value is
consulted in pg_multixact: those might be present in tuples that were
locked in the old cluster.  (To be precise, this is the value to set as
oldest multi, not next multi.  But of course, the next multi must be
greater than that one.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [BUGS] 9.3beta2: Failure to pg_upgrade

From
Alvaro Herrera
Date:
Jesse Denardo wrote:
> Alvaro,
> 
> I applied the patch and tried upgrading again, and everything seemed to
> work as expected. We are now up and running the beta!

Pushed, thanks.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services