Re: [GENERAL] PostgreSQL mirroring from RPM install to RPMinstall-revisited - Mailing list pgsql-general

From Richard Brosnahan
Subject Re: [GENERAL] PostgreSQL mirroring from RPM install to RPMinstall-revisited
Msg-id 856A7E1A-1F59-4237-A2E3-9E49242BFC68@mac.com
In response to Re: [GENERAL] PostgreSQL mirroring from RPM install to RPMinstall-revisited  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi again Adrian,

Facepalm...

The master server was not installed by me. I was assured by the installer guy that it was version 9.4.1 and 64 bit. 

Facepalm... I managed to get enough access to that server to discover they had installed the 32 bit version of PostgreSQL. Who knows why? This explains everything about my issues with the 64 bit PostgreSQL on the slave. It's difficult to get access to our servers, so try not to blame me and think "Why didn't he do that first?" Still, I should have tried harder to get access.
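For anyone who lands here with the same symptoms, the following is a minimal sketch of why a control file written by a 32 bit build can look corrupt to a 64 bit build. The field list is hypothetical and only loosely modelled on a control-file header; it is not PostgreSQL's actual struct. The assumption is the usual C ABI difference: a 64 bit integer is aligned to 4 bytes on i386 but to 8 bytes on x86_64, so the compiler inserts padding on one platform and not the other.

```python
import struct

# Hypothetical header: 64-bit id, two 32-bit versions, 32-bit state,
# 64-bit timestamp, 64-bit checkpoint location. Not the real PostgreSQL layout.
PACKED_32 = "<QIIiqQ"    # i386-style: int64 aligned to 4 bytes, no padding
PADDED_64 = "<QIIi4xqQ"  # x86_64-style: 4 pad bytes so int64 lands on an 8-byte boundary


def write_32bit_style(sysid, version, catversion, state, timestamp, checkpoint):
    """Serialize the fields the way a 32 bit build would lay them out."""
    return struct.pack(PACKED_32, sysid, version, catversion, state,
                       timestamp, checkpoint)


def read_64bit_style(buf):
    """Parse the same bytes the way a 64 bit build would expect them."""
    size = struct.calcsize(PADDED_64)
    # A real file has more data after the header; zero-fill stands in for it here.
    return struct.unpack(PADDED_64, buf.ljust(size, b"\0")[:size])


if __name__ == "__main__":
    raw = write_32bit_style(1, 942, 201409291, 6, 1487300000, 0x123456789)
    fields = read_64bit_style(raw)
    print("sizes:", struct.calcsize(PACKED_32), "vs", struct.calcsize(PADDED_64))
    print("timestamp written: 1487300000, timestamp read:", fields[4])
```

Every field after the padding point is read from the wrong offset, so a checksum computed over the expected layout will not match, and a timestamp field can come back as a nonsensical value. That is at least consistent with the crash inside strftime shown in the backtrace below, though the sketch does not prove that is the exact mechanism.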

The PostgreSQL documentation clearly states that the two servers have to be the same architecture (both 32 bit or both 64 bit). Further, when I searched Google for the errors I was seeing, I found a number of people with similar issues, and they were fighting with 32 bit vs 64 bit PostgreSQL installs.
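One way to check the architecture of an installed binary without trusting anyone's word is to read the ELF header directly. This is a minimal sketch, assuming a Linux ELF executable; the function names are made up for illustration. The fifth byte of an ELF file (EI_CLASS) is 1 for 32 bit and 2 for 64 bit.

```python
import sys

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # values of the EI_CLASS byte


def elf_class_from_ident(ident):
    """Return '32-bit' or '64-bit' given the first bytes of an ELF file."""
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return ELF_CLASSES.get(ident[4], "unknown")


def elf_class(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class_from_ident(f.read(5))


if __name__ == "__main__":
    # Pass the binary to inspect on the command line.
    for binary in sys.argv[1:]:
        print(binary, elf_class(binary))
```

Running it on both machines against the server binary (for a PGDG RPM install that would presumably be something like /usr/pgsql-9.4/bin/postgres, by analogy with the pg_controldata path used below) should make a 32 bit vs 64 bit mismatch obvious before any replication setup is attempted.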

I wasted a LOT of time trying to track this down. I'm sorry I wasted other people's time too. 

Anyhow, I uninstalled PostgreSQL on the slave, and reinstalled the 32 bit version. Then I followed the instructions for setting up the slave, and it all works.

Plenty to do, including setting up proper monitoring, and documentation. It's great we have a hot standby, but if nobody knows how to use it in case the master goes away, it's not so great. 
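As a starting point for the monitoring part, here is a minimal sketch of turning two WAL locations into a lag figure in bytes. It assumes the locations have already been fetched as text, for example from pg_current_xlog_location() on the master and pg_last_xlog_replay_location() on the standby in 9.4; the helper names are hypothetical. A location is printed as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64 bit position.

```python
def lsn_to_bytes(lsn):
    """Convert a WAL location such as '0/3000060' to an absolute byte position."""
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replay_lag_bytes(master_lsn, standby_lsn):
    """Bytes of WAL the standby still has to replay (0 when caught up)."""
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn)


if __name__ == "__main__":
    # Example values only; real ones come from the two servers.
    print(replay_lag_bytes("0/3000060", "0/3000000"))
```

A monitoring check could alert when this number stays above some threshold, or when the standby cannot be queried at all.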

THANK YOU for your assistance!


On Feb 17, 2017, at 10:43 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 02/16/2017 04:39 PM, Richard Brosnahan wrote:
Hi all,

Way back in December I posted a question about mirroring from an RPM
installed PostgreSQL (binary) to a source built PostgreSQL, with the
same version (9.4.1 --> 9.4.1). Both servers are running OEL6.

I went back to the previous threads and could not find whether you ever said the two systems are using the same hardware architecture. Vincent Veyron asked, but I can't find a response.


I won't copy the entire thread from before, as the situation has changed
a bit. The biggest changes are that I have root on the slave,
temporarily, and I've installed PostgreSQL on the slave using yum (also
binary).

I've followed all the instructions found here:

https://www.postgresql.org/docs/9.4/static/warm-standby.html#STREAMING-REPLICATION


The slave is running PostgreSQL 9.4.11 and was installed using yum.
It runs fine after I've run initdb and set things up. The master was
also installed from rpm binaries, but the installers used Puppet. That
version is 9.4.1. Yes, I know I should be using the exact same version,
but I couldn't find 9.4.1 in the PostgreSQL yum repo.


When I replace its data directory using pg_basebackup, as part of the
mirroring instructions, PostgreSQL won't start.


I get a checksum error, from pg_ctl.

2016-12-15 08:27:14.520 PST >FATAL: incorrect checksum in control file


Previously, Tom Lane suggested I try this:

You could try using pg_controldata to compare the pg_control contents;
it should be willing to print field values even if it thinks the checksum
is bad. It would be interesting to see (a) what the master's
pg_controldata prints about its pg_control, (b) what the slave's
pg_controldata prints about pg_control from a fresh initdb there, and
(c) what the slave's pg_controldata prints about the copied pg_control.


For Tom's requests (a and b), I can provide good output from
pg_controldata from the master with production data, and from the slave
right after initdb. I'll provide that on request.


For Tom's request (c) I get this from the slave, after data is copied.

$ pg_controldata
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.


Segmentation fault (core dumped)


With this new installation on the slave, I get the same result: a core dump.


Tom Lane then suggested:

$ gdb path/to/pg_controldata
gdb> run /apps/database/postgresql-data
(wait for it to report segfault)
gdb> bt


Since I now have gdb, I can do that:

$ gdb /usr/pgsql-9.4/bin/pg_controldata
-bash: gdb: command not found
-bash-4.1$ gdb /usr/pgsql-9.4/bin/pg_controldata
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/pgsql-9.4/bin/pg_controldata...(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install postgresql94-server-9.4.11-1PGDG.rhel6.x86_64
(gdb) run /var/lib/pgsql/9.4/data
Starting program: /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/9.4/data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

Program received signal SIGSEGV, Segmentation fault.
0x00000033d20a3a15 in __strftime_internal () from /lib64/libc.so.6
(gdb) bt
#0  0x00000033d20a3a15 in __strftime_internal () from /lib64/libc.so.6
#1  0x00000033d20a5a36 in strftime_l () from /lib64/libc.so.6
#2  0x00000000004015c7 in ?? ()
#3  0x00000033d201ed1d in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000401349 in ?? ()
#5  0x00007fffffffe518 in ?? ()
#6  0x000000000000001c in ?? ()
#7  0x0000000000000002 in ?? ()
#8  0x00007fffffffe751 in ?? ()
#9  0x00007fffffffe773 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb)


pg_controldata shouldn't be core dumping.


Should I give up trying to use 9.4.1 and 9.4.11 as master/slave?

My options appear to be

1 Upgrade the master to 9.4.11, which will be VERY DIFFICULT given its
Puppet install, and the difficulty I have getting root access to our
servers.

2 Downgrade the slave. This is easier than option 1, but I would need to
find a yum repo that has that version.

3 Make what I have work, somehow.

Any assistance would be greatly appreciated!

--

Richard Brosnahan



-- 
Adrian Klaver
adrian.klaver@aklaver.com
