Thread: Postgresql 9.1 replication failing

Postgresql 9.1 replication failing

From

Jim Buttafuoco

Date:

01 December 2011, 17:41:44

All,

I have a large PG 9.1.1 server (over 1TB of data) and replica using log shipping.  I had some hardware issues on the
replica system and now I am getting the following in my pg_log/* files.  Same 2 lines over and over since yesterday.

2011-12-01 07:46:30 EST  >LOG:  restored log file "000000010000028E000000E5" from archive
2011-12-01 07:46:30 EST  >LOG:  incorrect resource manager data checksum in record at 28E/E555E1B8

Anything I can do on the replica or do I have to start over?

Finally, I know this is not the correct list, I tried general with no answer.

Thanks
Jim
___________________________________________________________

Jim Buttafuoco
jim@contacttelecom.com
603-647-7170 ext. 2222 - Office
603-490-3409 - Cell
jimbuttafuoco - Skype

Re: Postgresql 9.1 replication failing

From

Robert Haas

Date:

01 December 2011, 18:00:12

On Thu, Dec 1, 2011 at 1:41 PM, Jim Buttafuoco <jim@contacttelecom.com> wrote:
> 2011-12-01 07:46:30 EST  >LOG:  restored log file "000000010000028E000000E5" from archive
> 2011-12-01 07:46:30 EST  >LOG:  incorrect resource manager data checksum in record at 28E/E555E1B8
>
> Anything I can do on the replica or do I have to start over?

I think you want to rebuild the standby.  Even if you could repair the
damaged WAL record, how can you have any confidence that there is no
other corruption?

Note that rsync has some options to only copy the changed data, which
might greatly accelerate resyncing the standby from the master.
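
As a rough illustration of that rsync-based resync, here is a minimal Python sketch that only builds the three commands involved and prints them; it runs nothing. The host name, the data directory paths, the backup label and the choice of exclusions are all assumptions for illustration and are not taken from this thread. It assumes the standby is shut down and that rsync is run on the standby, pulling from the master.

```python
def resync_steps(master_host, master_pgdata, standby_pgdata, label="standby_resync"):
    """Return the commands for one resync pass, in order, as argv lists.

    Nothing is executed. All arguments are hypothetical placeholders.
    """
    # Put the master into backup mode so the copied files form a valid base backup.
    start_backup = ["psql", "-h", master_host, "-c",
                    "SELECT pg_start_backup('%s', true);" % label]

    # By default rsync skips files whose size and modification time match and
    # sends only the differing parts of the remaining files over the network.
    copy = ["rsync", "-a", "--delete",
            "--exclude=pg_xlog/*",        # WAL is taken from the archive, not copied
            "--exclude=postmaster.pid",   # the master's pid file must not be copied
            "--exclude=recovery.conf",    # keep the standby's own recovery settings
            "%s:%s/" % (master_host, master_pgdata.rstrip("/")),
            "%s/" % standby_pgdata.rstrip("/")]

    # Leave backup mode once the copy has finished.
    stop_backup = ["psql", "-h", master_host, "-c", "SELECT pg_stop_backup();"]

    return [start_backup, copy, stop_backup]


# Hypothetical host and paths, shown only to make the output concrete.
for step in resync_steps("master.example.com",
                         "/var/lib/pgsql/9.1/data",
                         "/var/lib/pgsql/9.1/data"):
    print(" ".join(step))
```

Because excluded files are left alone on the receiving side, any stale WAL files already in the standby's pg_xlog would have to be cleared separately before restarting recovery.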

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Postgresql 9.1 replication failing

From

Jerry Sievers

Date:

01 December 2011, 18:03:13

Jim Buttafuoco <jim@contacttelecom.com> writes:

> All,
>
> I have a large PG 9.1.1 server (over 1TB of data) and replica using log shipping.  I had some hardware issues on the
> replica system and now I am getting the following in my pg_log/* files.  Same 2 lines over and over since yesterday.
>
> 2011-12-01 07:46:30 EST  >LOG:  restored log file "000000010000028E000000E5" from archive
> 2011-12-01 07:46:30 EST  >LOG:  incorrect resource manager data checksum in record at 28E/E555E1B8
>
> Anything I can do on the replica or do I have to start over?

Inspect that WAL segment, or possibly the one immediately following it,
in comparison to another copy if you still have it on the master or a
central WAL repository.
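
To see which segment file holds the record named in the log, the location 28E/E555E1B8 can be mapped to a file name. A minimal sketch, assuming the default 16MB segment size and timeline 1 (both are assumptions; the function name is made up for illustration):

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size (assumed for this build)


def segment_for_lsn(lsn, timeline=1):
    """Map a location such as '28E/E555E1B8' to (segment file name, offset in file)."""
    log_hex, offset_hex = lsn.split("/")
    log_id = int(log_hex, 16)        # high part: the 4GB "log file" number
    rec_off = int(offset_hex, 16)    # low part: byte offset inside that log file
    seg_no = rec_off // SEGMENT_SIZE
    # Segment file names are three 8-digit hex fields: timeline, log id, segment.
    name = "%08X%08X%08X" % (timeline, log_id, seg_no)
    return name, rec_off % SEGMENT_SIZE


name, offset = segment_for_lsn("28E/E555E1B8")
print(name, hex(offset))  # 000000010000028E000000E5 0x55e1b8
```

The result matches the file named in the "restored log file" line, so the failing record sits inside that segment, at byte offset 0x55E1B8.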

A standby crashing while copying in a WAL segment and/or syncing
one to disk could result in random corruption.

If you have another copy of the segment and it does not compare equal to
the one your standby is trying to read, try that other copy.
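
One way to do that comparison is to check the size and a checksum of each copy. A minimal sketch, assuming the default 16MB segment size; the function names and the report layout are made up for illustration:

```python
import hashlib
import os

EXPECTED_SIZE = 16 * 1024 * 1024  # default WAL segment size (assumed)


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_segments(path_a, path_b):
    """Compare two copies of what should be the same WAL segment."""
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    return {
        "size_a": size_a,
        "size_b": size_b,
        # A copy interrupted by a crash may be shorter than a full segment.
        "sizes_ok": size_a == size_b == EXPECTED_SIZE,
        "identical": sha256_of(path_a) == sha256_of(path_b),
    }
```

Calling compare_segments() with the path of the standby's copy and the path of the copy from the master or archive returns a small report. If "identical" is false, the copies differ and the other copy is worth trying.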

> Finally, I know this is not the correct list, I tried general with no answer.

The admin list is probably the right one for such a post.

HTH

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 305.321.1144

Re: Postgresql 9.1 replication failing

From

Jim Buttafuoco

Date:

01 December 2011, 18:09:28

The WAL file on the master is long gone. How would one inspect the WAL segment? Any way to have PG "move" on?

Re: Postgresql 9.1 replication failing

From

Simon Riggs

Date:

01 December 2011, 20:08:48

On Thu, Dec 1, 2011 at 7:09 PM, Jim Buttafuoco <jim@contacttelecom.com> wrote:

> the WAL file on the master is long gone, how would one inspect the WAL segment? Any way to have PG "move" on?

Regenerate the master.

--

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgresql 9.1 replication failing

From

Simon Riggs

Date:

01 December 2011, 20:09:56

On Thu, Dec 1, 2011 at 9:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Dec 1, 2011 at 7:09 PM, Jim Buttafuoco <jim@contacttelecom.com>
> wrote:
>>
>> the WAL file on the master is long gone, how would one inspect the WAL
>> segment?  Any way to have PG "move" on?
>
>
> Regenerate the master.

typo: regenerate *from* the master

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Postgresql 9.1 replication failing

From

Jim Buttafuoco

Date:

01 December 2011, 20:11:48

Simon,

What do you mean, start over with a base backup?

Jim

Re: Postgresql 9.1 replication failing

From

desmodemone

Date:

01 December 2011, 21:54:04

Hello Jim,
I think you have no other option: if the archived WAL files are corrupted and there is no way to restore them,
you need to recreate the standby starting from a base backup.

Kind Regards
