Home > mailing lists

Re: Check that streaming replica received all data after master shutdown - Mailing list pgsql-hackers

From	Sameer Kumar
Subject	Re: Check that streaming replica received all data after master shutdown
Date	January 15, 2015 08:43:59
Msg-id	CADp-Sm7LWp-9Ngk5BHcfR1WSwnTtCNmdFkqROA0pD50fjGF5kw@mail.gmail.com Whole thread Raw
In response to	Re: Check that streaming replica received all data after master shutdown (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: Check that streaming replica received all data after master shutdown
List	pgsql-hackers

Tree view

On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 01/13/2015 12:11 PM, Vladimir Borodin wrote:

05 янв. 2015 г., в 18:15, Vladimir Borodin <root@simply.name> написал(а):

Hi all.

I have a simple script for planned switchover of PostgreSQL (9.3 and 9.4) master to one of its replicas. This script checks a lot of things before doing it and one of them is that all data from master has been received by replica that is going to be promoted. Right now the check is done like below:

On the master:

postgres@pgtest03d ~ $ psql -t -A -c 'select pg_current_xlog_location();'
0/33000090
postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
pg_control version number: 937
Catalog version number: 201306121
Database system identifier: 6061800518091528182
Database cluster state: shut down
pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
Latest checkpoint location: 0/34000028
Prior checkpoint location: 0/33000028
Latest checkpoint's REDO location: 0/34000028
Latest checkpoint's REDO WAL file: 0000001B0000000000000034
Latest checkpoint's TimeLineID: 27
postgres@pgtest03d ~ $

On the replica (after shutdown of master):

postgres@pgtest03g ~ $ psql -t -A -c "select pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/34000028');"
104
postgres@pgtest03g ~ $

These 104 bytes seems to be the size of shutdown checkpoint record (as I can understand from pg_xlogdump output).

postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump -s 0/33000090 -t 27
rmgr: XLOG len (rec/tot): 0/ 32, tx: 0, lsn: 0/33000090, prev 0/33000028, bkp: 0000, desc: xlog switch
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/34000028, prev 0/33000090, bkp: 0000, desc: checkpoint: redo 0/34000028; tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL: error in WAL record at 0/34000028: record with zero length at 0/34000090

postgres@pgtest03g ~/9.3/data/pg_xlog $

I’m not sure that these 104 bytes will always be 104 bytes to have a strict equality while checking. Could it change in the future? Or is there a better way to understand that streaming replica received all data after master shutdown? The check that pg_xlog_location_diff returns 104 bytes seems a bit strange.

Don't rely on it being 104 bytes. It can vary across versions, and across different architectures.

You could simply check that the standby's pg_last_xlog_replay_location() > master's "Latest checkpoint location", and not care about the exact difference.

I believe there were some changes made in v9.3 which will wait for pending WALs to be replicated before a fast and smart shutdown (of master) can close the replication connection.

http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459

pgsql-hackers by date:

From: Peter Geoghegan
Date: 15 January 2015, 08:40:38
Subject: Re: hung backends stuck in spinlock heavy endless loop

From: Alexander Korotkov
Date: 15 January 2015, 09:51:08
Subject: Re: Fillfactor for GIN indexes

Re: Check that streaming replica received all data after master shutdown - Mailing list pgsql-hackers

Previous

Next