[GENERAL] PostgreSQL mirroring from RPM install to RPM install-revisited - Mailing list pgsql-general

From Richard Brosnahan
Subject [GENERAL] PostgreSQL mirroring from RPM install to RPM install-revisited
Msg-id f68449c3-a011-41b8-abc0-04a0d4392c1d@me.com
Responses Re: [GENERAL] PostgreSQL mirroring from RPM install to RPMinstall-revisited  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi all,

Way back in December I posted a question about mirroring from an RPM-installed PostgreSQL (binary) to a source-built PostgreSQL of the same version (9.4.1 --> 9.4.1). Both servers are running OEL6.

I won't copy the entire thread from before, as the situation has changed a bit. The biggest changes are that I have root on the slave, temporarily, and I've installed PostgreSQL on the slave using yum (also binary).

I've followed all the instructions found here:

https://www.postgresql.org/docs/9.4/static/warm-standby.html#STREAMING-REPLICATION


The slave is running PostgreSQL 9.4.11 and was installed using yum. It runs fine after I've run initdb and set things up. The master was also installed from rpm binaries, but the installers used Puppet. That version is 9.4.1. Yes, I know I should be using the exact same version, but I couldn't find 9.4.1 in the PostgreSQL yum repo. 


When I replace the slave's data directory using pg_basebackup, as described in the mirroring instructions, PostgreSQL won't start.


I get a checksum error from pg_ctl:

2016-12-15 08:27:14.520 PST >FATAL: incorrect checksum in control file


Previously, Tom Lane suggested I try this:

You could try using pg_controldata to compare the pg_control contents;
it should be willing to print field values even if it thinks the checksum
is bad. It would be interesting to see (a) what the master's
pg_controldata prints about its pg_control, (b) what the slave's
pg_controldata prints about pg_control from a fresh initdb there, and
(c) what the slave's pg_controldata prints about the copied pg_control.


For Tom's requests (a) and (b), I can provide good pg_controldata output from the master with production data, and from the slave right after initdb. I'll provide that on request.
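
The comparison Tom describes for (a) and (b) could be sketched roughly as follows. This is only a minimal Python sketch, assuming each pg_controldata output has been saved as text and that every field is printed as a "label: value" line. The function names parse_controldata and diff_controldata are hypothetical, and the sample values are made up for illustration; they are not real output from either server.

```python
def parse_controldata(text):
    """Parse 'label: value' lines of pg_controldata output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip the CRC warning banner; it is not a control-file field.
        if line.startswith("WARNING:"):
            continue
        # Split on the first colon only, since values (e.g. timestamps)
        # may contain colons themselves.
        label, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue
        fields[label.strip()] = value.strip()
    return fields


def diff_controldata(a_text, b_text):
    """Return {label: (value_in_a, value_in_b)} for every field that differs."""
    a = parse_controldata(a_text)
    b = parse_controldata(b_text)
    diffs = {}
    for label in sorted(set(a) | set(b)):
        if a.get(label) != b.get(label):
            diffs[label] = (a.get(label), b.get(label))
    return diffs


# Made-up sample values, for illustration only.
master_output = """\
pg_control version number:            111
Database block size:                  8192
"""
slave_output = """\
pg_control version number:            222
Database block size:                  8192
"""

for label, (master_value, slave_value) in diff_controldata(
    master_output, slave_output
).items():
    print(f"{label}: master={master_value} slave={slave_value}")
```

In practice the two strings would be read from files holding the saved output of each server. Fields expected to differ between any two clusters (system identifier, checkpoint locations, timestamps) would show up as well, so the interesting entries are layout-related ones such as version numbers and block sizes.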


For Tom's request (c), I get this from the slave after the data is copied:

$ pg_controldata
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

Segmentation fault (core dumped)


With the new yum installation on the slave, I get the same result: a core dump.


Tom Lane then suggested:

$ gdb path/to/pg_controldata
gdb> run /apps/database/postgresql-data
(wait for it to report segfault)
gdb> bt


Since I now have gdb, I can do that:

$ gdb /usr/pgsql-9.4/bin/pg_controldata
-bash: gdb: command not found
-bash-4.1$ gdb /usr/pgsql-9.4/bin/pg_controldata
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/pgsql-9.4/bin/pg_controldata...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install postgresql94-server-9.4.11-1PGDG.rhel6.x86_64
(gdb) run /var/lib/pgsql/9.4/data
Starting program: /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/9.4/data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

Program received signal SIGSEGV, Segmentation fault.
0x00000033d20a3a15 in __strftime_internal () from /lib64/libc.so.6
(gdb) bt
#0  0x00000033d20a3a15 in __strftime_internal () from /lib64/libc.so.6
#1  0x00000033d20a5a36 in strftime_l () from /lib64/libc.so.6
#2  0x00000000004015c7 in ?? ()
#3  0x00000033d201ed1d in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000401349 in ?? ()
#5  0x00007fffffffe518 in ?? ()
#6  0x000000000000001c in ?? ()
#7  0x0000000000000002 in ?? ()
#8  0x00007fffffffe751 in ?? ()
#9  0x00007fffffffe773 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb)


pg_controldata shouldn't be core dumping. 


Should I give up trying to use 9.4.1 and 9.4.11 as master/slave? 


My options appear to be:

1. Upgrade the master to 9.4.11, which will be VERY DIFFICULT given its Puppet install and the difficulty I have getting root access to our servers.

2. Downgrade the slave. This is easier than option 1, but I would need to find a yum repo that has that version.

3. Make what I have work, somehow.


Any assistance would be greatly appreciated!

-- 

Richard Brosnahan
