[GENERAL] Re: [GENERAL] PostgreSQL mirroring from RPM install to RPM install-revisited - Mailing list pgsql-general

From Richard Brosnahan
Subject [GENERAL] Re: [GENERAL] PostgreSQL mirroring from RPM install to RPM install-revisited
Date
Msg-id dbf1e17a-6e86-43d8-a980-342622d976b4@me.com
Responses Re: [GENERAL] PostgreSQL mirroring from RPM install to RPMinstall-revisited  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Thanks for the response Adrian,

Both servers are pretty much identical. 

uname -a
master
Linux devtmbm178 2.6.32-642.6.2.el6.x86_64 #1 SMP Tue Oct 25 13:37:48 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux

slave
Linux devtmbm176 2.6.32-642.11.1.el6.x86_64 #1 SMP Tue Nov 15 09:40:59 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

Since the last message, I've downgraded PostgreSQL to 9.4.1 on the slave, using 
rpm -Uvh --oldpackage [file names]

I had wisely kept copies of the rpm files for PostgreSQL 9.4.1 for OEL6 and used those. rpm did the downgrade without issue, and I tested the 9.4.1 PostgreSQL installation. The minimal testing I did after the install (initdb, starting the server, psql, etc.) worked fine.

I then stopped the new slave PostgreSQL instance and proceeded with the instructions for creating a slave.
I again used pg_basebackup:

postgres $ pg_basebackup -D /var/lib/pgsql/9.4/data --write-recovery-conf -h devtmbm178.unix.gsm1900.org -U pgrepuser -p 5432 -W


NOTICE:  pg_stop_backup complete, all required WAL segments have been archived


This executed without incident. 


After verifying and modifying postgresql.conf and recovery.conf, I attempted to start PostgreSQL. Again, this was not successful.


postgres $ pg_ctl start

server starting

-bash-4.1$ < 2017-02-17 12:13:53.176 PST >FATAL:  incorrect checksum in control file


postgres $ pg_controldata

WARNING: Calculated CRC checksum does not match value stored in file.

Either the file is corrupt, or it has a different layout than this program

is expecting.  The results below are untrustworthy.


Segmentation fault (core dumped)
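
Since pg_controldata itself crashes on this file, the first few fields can be read straight from the raw bytes instead. The following is only a minimal sketch, assuming the 9.4 layout in which global/pg_control begins with an 8-byte system identifier followed by two 4-byte integers (pg_control_version and catalog_version_no). The function name and the returned dictionary shape are made up for illustration. It decodes the header in both byte orders, so the values can be compared against a pg_control from a fresh initdb on the same machine.

```python
import struct


def read_control_header(raw: bytes) -> dict:
    """Decode the first 16 bytes of pg_control in both byte orders.

    Assumed layout (not verified against the master's build):
    uint64 system_identifier, uint32 pg_control_version,
    uint32 catalog_version_no.
    """
    if len(raw) < 16:
        raise ValueError("need at least 16 bytes of pg_control")
    decoded = {}
    for name, prefix in (("little", "<"), ("big", ">")):
        sysid, ctl_version, cat_version = struct.unpack(prefix + "QII", raw[:16])
        decoded[name] = {
            "system_identifier": sysid,
            "pg_control_version": ctl_version,
            "catalog_version_no": cat_version,
        }
    return decoded


# Usage sketch (the path is the data directory used in this thread):
# with open("/var/lib/pgsql/9.4/data/global/pg_control", "rb") as f:
#     print(read_control_header(f.read(16)))
```

If the copied file's values match the fresh-initdb values in neither byte order, that would point toward a different file layout rather than simple corruption, though this check alone cannot prove either.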


Now I'm really unhappy. Same server architecture, same PostgreSQL versions. No joy!



-- 

Richard Brosnahan

On Feb 17, 2017, at 10:43 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 02/16/2017 04:39 PM, Richard Brosnahan wrote:
Hi all,

Way back in December I posted a question about mirroring from an RPM
installed PostgreSQL (binary) to a source built PostgreSQL, with the
same version (9.4.1 --> 9.4.1). Both servers are running OEL6.

I went back to the previous threads and I could not find if you ever
said whether the two systems are using the same hardware architecture or
not? Vincent Veyron asked but I can't find a response.


I won't copy the entire thread from before, as the situation has changed
a bit. The biggest changes are that I have root on the slave,
temporarily, and I've installed PostgreSQL on the slave using yum (also
binary).

I've followed all the instructions found here:

https://www.postgresql.org/docs/9.4/static/warm-standby.html#STREAMING-REPLICATION


The slave is running PostgreSQL 9.4.11 and was installed using yum.
It runs fine after I've run initdb and set things up. The master was
also installed from rpm binaries, but the installers used Puppet. That
version is 9.4.1. Yes, I know I should be using the exact same version,
but I couldn't find 9.4.1 in the PostgreSQL yum repo.
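
As I read the warm-standby documentation linked above, log shipping between different major releases is not possible, while differing minor releases are likely to work but are not formally supported. A small sketch of that distinction, assuming plain dotted version strings such as "9.4.1" (the helper names are hypothetical):

```python
def split_version(version: str):
    """Split a PostgreSQL version string into (major, minor).

    Before version 10 the major release is the first two numbers
    (e.g. 9.4); from 10 onward it is the first number only.
    """
    parts = [int(p) for p in version.strip().split(".")]
    if parts[0] >= 10:
        major, rest = tuple(parts[:1]), parts[1:]
    else:
        major, rest = tuple(parts[:2]), parts[2:]
    minor = rest[0] if rest else 0
    return major, minor


def compare_versions(primary: str, standby: str) -> str:
    """Describe how two server versions relate for replication purposes."""
    p_major, p_minor = split_version(primary)
    s_major, s_minor = split_version(standby)
    if p_major != s_major:
        return "different major release"
    if p_minor != s_minor:
        return "same major release, different minor release"
    return "identical"


print(compare_versions("9.4.1", "9.4.11"))
```

For the two servers described here, this reports the same major release with different minor releases.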


When I replace its data directory using pg_basebackup, as part of the
mirroring instructions, PostgreSQL won't start.


I get a checksum error, from pg_ctl.

2016-12-15 08:27:14.520 PST >FATAL: incorrect checksum in control file


Previously, Tom Lane suggested I try this:

You could try using pg_controldata to compare the pg_control contents;
it should be willing to print field values even if it thinks the checksum
is bad. It would be interesting to see (a) what the master's
pg_controldata prints about its pg_control, (b) what the slave's
pg_controldata prints about pg_control from a fresh initdb there, and
(c) what the slave's pg_controldata prints about the copied pg_control.


For Tom's requests (a and b), I can provide good output from
pg_controldata from the master with production data, and from the slave
right after initdb. I'll provide that on request.
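
Once outputs (a) and (b) have been saved to text files, the comparison Tom describes could be done along these lines. This is a rough sketch that assumes pg_controldata prints one "label: value" pair per line; the function names are invented for illustration.

```python
def parse_controldata(text: str) -> dict:
    """Turn pg_controldata output into a {label: value} dictionary."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("WARNING"):
            continue  # skip the CRC warning banner, it is not a field
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a "label: value" line
        fields[label.strip()] = value.strip()
    return fields


def diff_controldata(a: dict, b: dict) -> dict:
    """Return {label: (value_in_a, value_in_b)} for every differing field."""
    diffs = {}
    for label in sorted(set(a) | set(b)):
        va, vb = a.get(label), b.get(label)
        if va != vb:
            diffs[label] = (va, vb)
    return diffs


# Usage sketch (file names are placeholders):
# master = parse_controldata(open("master_controldata.txt").read())
# slave = parse_controldata(open("slave_initdb_controldata.txt").read())
# for label, (m, s) in diff_controldata(master, slave).items():
#     print(f"{label}: master={m!r} slave={s!r}")
```

Fields that legitimately differ between two clusters, such as the system identifier and checkpoint locations, will show up as well; the interesting ones are version numbers and the compile-time settings like block size.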


For Tom's request (c), I get this from the slave after the data is copied:

$ pg_controldata

WARNING: Calculated CRC checksum does not match value stored in file.

Either the file is corrupt, or it has a different layout than this program

is expecting. The results below are untrustworthy.


Segmentation fault (core dumped)


With this new installation on the slave, I get the same result: a core dump.


Tom Lane then suggested:

$ gdb path/to/pg_controldata

gdb> run /apps/database/postgresql-data

(wait for it to report segfault)

gdb> bt


Since I now have gdb, I can do that:

$ gdb /usr/pgsql-9.4/bin/pg_controldata

-bash: gdb: command not found

-bash-4.1$ gdb /usr/pgsql-9.4/bin/pg_controldata

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)

Copyright (C) 2010 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-redhat-linux-gnu".

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>...

Reading symbols from /usr/pgsql-9.4/bin/pg_controldata...(no debugging
symbols found)...done.

Missing separate debuginfos, use: debuginfo-install
postgresql94-server-9.4.11-1PGDG.rhel6.x86_64

(gdb) run /var/lib/pgsql/9.4/data

Starting program: /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/9.4/data

WARNING: Calculated CRC checksum does not match value stored in file.

Either the file is corrupt, or it has a different layout than this program

is expecting. The results below are untrustworthy.



Program received signal SIGSEGV, Segmentation fault.

0x00000033d20a3a15 in __strftime_internal () from /lib64/libc.so.6

(gdb) bt

#0 0x00000033d20a3a15 in __strftime_internal () from /lib64/libc.so.6

#1 0x00000033d20a5a36 in strftime_l () from /lib64/libc.so.6

#2 0x00000000004015c7 in ?? ()

#3 0x00000033d201ed1d in __libc_start_main () from /lib64/libc.so.6

#4 0x0000000000401349 in ?? ()

#5 0x00007fffffffe518 in ?? ()

#6 0x000000000000001c in ?? ()

#7 0x0000000000000002 in ?? ()

#8 0x00007fffffffe751 in ?? ()

#9 0x00007fffffffe773 in ?? ()

#10 0x0000000000000000 in ?? ()

(gdb)


pg_controldata shouldn't be core dumping.
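
One plausible reading of the backtrace, offered as a guess rather than a diagnosis: if the file really has a different layout, the bytes interpreted as a timestamp may be garbage, the time conversion may fail, and formatting the failed result would crash inside strftime. The defensive pattern that avoids this kind of crash looks roughly like the sketch below. It is written in Python purely to illustrate the idea and is not how pg_controldata is implemented; the function name is hypothetical.

```python
import time


def format_control_time(timestamp: int) -> str:
    """Format a timestamp read from an untrusted file.

    Returns "???" instead of failing when the value cannot be
    converted to a calendar time (for example, garbage bytes).
    """
    try:
        return time.strftime("%c", time.localtime(timestamp))
    except (OverflowError, OSError, ValueError):
        return "???"


print(format_control_time(0))        # a valid timestamp formats normally
print(format_control_time(2 ** 62))  # an absurd value is reported as "???"
```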


Should I give up trying to use 9.4.1 and 9.4.11 as master/slave?

My options appear to be:

1. Upgrade the master to 9.4.11, which will be VERY DIFFICULT given its
Puppet install, and the difficulty I have getting root access to our
servers.

2. Downgrade the slave. This is easier than option 1, but I would need to
find a yum repo that has that version.

3. Make what I have work, somehow.

Any assistance would be greatly appreciated!

--

Richard Brosnahan



--
Adrian Klaver
adrian.klaver@aklaver.com
