Re: [HACKERS] Broken hint bits (freeze) - Mailing list pgsql-hackers

From Sergey Burladyan
Subject Re: [HACKERS] Broken hint bits (freeze)
Date
Msg-id 87mv98af8j.fsf@seb.koffice.internal
Whole thread Raw
In response to Re: [HACKERS] Broken hint bits (freeze)  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [HACKERS] Broken hint bits (freeze)
Re: [HACKERS] Broken hint bits (freeze)
List pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:

> !      against the old primary and standby clusters.  Verify that the
> !      <quote>Latest checkpoint location</> values match in all clusters.

For "Log-Shipping only" standby server this cannot be satisfied, because
last WAL from master (with shutdown checkpoint) never archived. For
example (git master):
==== postgresql.conf ===
port = 5430
shared_buffers = 32MB
wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f "$ARH/%f" && ( echo "arch %p"; cp %p "$ARH/%f"; )'
max_wal_senders = 5
hot_standby = on
log_line_prefix = '%t '
log_checkpoints = on
lc_messages = C
========================

==== pg_control ====
pg_control version number:            1002
Catalog version number:               201705301
Database system identifier:           6432034080221219745
Database cluster state:               shut down
pg_control last modified:             Fri Jun 16 03:57:22 2017
Latest checkpoint location:           0/D000028
Prior checkpoint location:            0/1604878
Latest checkpoint's REDO location:    0/D000028
Latest checkpoint's REDO WAL file:    00000001000000000000000D
====================

==== WALs archive ====
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000003
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000004
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000005
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000006
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000007
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000008
-rw------- 1 sergey users 16777216 Jun 16 03:57 000000010000000000000009
-rw------- 1 sergey users 16777216 Jun 16 03:57 00000001000000000000000A
-rw------- 1 sergey users 16777216 Jun 16 03:57 00000001000000000000000B
-rw------- 1 sergey users 16777216 Jun 16 03:57 00000001000000000000000C
======================

==== logfile ====
arch pg_wal/00000001000000000000000A
arch pg_wal/00000001000000000000000B
2017-06-16 00:57:21 GMT LOG:  received fast shutdown request
2017-06-16 00:57:21 GMT LOG:  aborting any active transactions
2017-06-16 00:57:21 GMT LOG:  shutting down
arch pg_wal/00000001000000000000000C
2017-06-16 00:57:21 GMT LOG:  checkpoint starting: shutdown immediate
2017-06-16 00:57:22 GMT LOG:  checkpoint complete: wrote 4058 buffers (99.1%); 0 WAL file(s) added, 0 removed, 0
recycled;write=0.033 s, sync=0.949 s, total=1.144 s; sync files=32, longest=0.598 s, average=0.029 s; distance=190445
kB,estimate=190445 kB
 
2017-06-16 00:57:22 GMT LOG:  database system is shut down
=================

There is no 00000001000000000000000D in archive and after shutdown,
standby can only be at it previous restartpoint (0/1604878) because it
does not receive latest checkpoint (0/D000028) from master.

So, after shutdown master and "Log-Shipping only" standby, it always "one
checkpoint early" then master and "Latest checkpoint location" never
match for it.

I think this must be mentioned somehow in documentation. 


> !     <para>
> !      Also, if upgrading standby servers, change <varname>wal_level</>
> !      to <literal>replica</> in the <filename>postgresql.conf</> file on
> !      the new cluster.
>       </para>
>      </step>

I am not sure how this help.

wal_level is reset by pg_resetxlog during pg_upgrade, so it does not
depend on postgresql.conf. After pg_upgrade wal_level always is
'minimal', that is why you must start and stop new master before rsync:

==== output ====
$ "$bin"/pg_controldata "$ver" | grep wal_level
wal_level setting:                    replica

$ "$bin"/pg_resetwal "$ver"
Write-ahead log reset

$ "$bin"/pg_controldata "$ver" | grep wal_level
wal_level setting:                    minimal
================

If you rsync standby now (without start/stop new master after pg_upgrade)
you will send pg_control with wal_level=minimal into it and after that
standby abort on startup:
==== standby logfile ====
2017-06-16 01:22:14 GMT LOG:  entering standby mode
2017-06-16 01:22:14 GMT WARNING:  WAL was generated with wal_level=minimal, data may be missing
2017-06-16 01:22:14 GMT HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
2017-06-16 01:22:14 GMT FATAL:  hot standby is not possible because wal_level was not set to "replica" or higher on the
masterserver
 
2017-06-16 01:22:14 GMT HINT:  Either set wal_level to "replica" on the master, or turn off hot_standby here.
2017-06-16 01:22:14 GMT LOG:  startup process (PID 27916) exited with exit code 1
=================

PS: Thank you for answer, Bruce!

-- 
Sergey Burladyan



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [HACKERS] logical replication read-only slave
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Get stuck when dropping a subscription duringsynchronizing table