Thread: BUG #15638: pg_basebackup with --wal-method=stream incorrectly generates WAL segment created during backup
BUG #15638: pg_basebackup with --wal-method=stream incorrectly generates WAL segment created during backup
The following bug has been logged on the website: Bug reference: 15638 Logged by: Andre Piwoni Email address: apiwoni@webmd.net PostgreSQL version: 10.5 Operating system: CentOS 7.3 Description: When new slave is created by taking base backup from the primary using pg_basebackup with --wal-method=stream option the WAL file generated during the backup is different (as compared with diff or cmp command) than that on the master and in WAL archive directory. Furthermore, this file does not exist in pg_wal/archive_status with .done extension on new slave, though it exists in pg_wal directory, resulting in failed attempt to archive this file when slave node is promoted as master node. 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002 pg_basebackup with --wal-method=fetch option correctly creates this WAL segment on slave in pg_wal/archive_status with .done extension and in pg_wal directory and its contents are equivalent to those on the old primary and in WAL archive directory. Here are contents of old_primary, new_primary and WAL archive: OLD PRIMARY /var/lib/pgsql/10/data/pg_wal/: -rw-------. 1 postgres postgres 16777216 Feb 15 13:39 000000010000000000000001 -rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000002 -rw-------. 1 postgres postgres 302 Feb 15 13:40 000000010000000000000002.00000028.backup -rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000003 drwx------. 2 postgres postgres 133 Feb 15 13:40 archive_status /var/lib/pgsql/10/data/pg_wal/archive_status: -rw-------. 1 postgres postgres 0 Feb 15 13:39 000000010000000000000001.done -rw-------. 1 postgres postgres 0 Feb 15 13:40 000000010000000000000002.00000028.backup.done -rw-------. 1 postgres postgres 0 Feb 15 13:40 000000010000000000000002.done WAL ARCHIVE -rw-rw-rw-. 1 root root 16777216 Feb 15 13:39 000000010000000000000001 -rw-rw-rw-. 1 root root 16777216 Feb 15 13:40 000000010000000000000002 -rw-rw-rw-. 1 root root 302 Feb 15 13:40 000000010000000000000002.00000028.backup NEW PRIMARY (created with --wal-method=stream) /var/lib/pgsql/10/data/pg_wal/: -rw-------. 1 postgres root 16777216 Feb 15 13:40 000000010000000000000002 -rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000003 drwx------. 2 postgres root 6 Feb 15 13:40 archive_status /var/lib/pgsql/10/data/pg_wal/archive_status: total 0 Please note lack of 000000010000000000000002.done file in /var/lib/pgsql/10/data/pg_wal/archive_status. When using backup with --wal-method=fetch this file is created in /var/lib/pgsql/10/data/pg_wal/archive_status and NEW_PRIMARY won't try to archive it, which would fail because it has been archived when NEW_PRIMARY was created as slave from OLD_PRIMARY. Consequence of all this is that failover does not work correctly.
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote: > When new slave is created by taking base backup from the primary using > pg_basebackup with --wal-method=stream option the WAL file generated during > the backup is different (as compared with diff or cmp command) than that on > the master and in WAL archive directory. Furthermore, this file does not > exist in pg_wal/archive_status with .done extension on new slave, though it > exists in pg_wal directory, resulting in failed attempt to archive this file > when slave node is promoted as master node. > 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was: > test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp > pg_wal/000000010000000000000002 > /mnt/pgsql/archive/000000010000000000000002 How do you promote your standby? In Postgres 10, the last, partial WAL segment of a past timeline generated at promotion is renamed .partial to avoid any conflicts, so as this should normally not happen if you do not use archive_mode = always. Please note that your archive command is not safe. For one, it does not sync the archived segment before archive_command returns to the backend.. -- Michael
Attachment
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote:
> When new slave is created by taking base backup from the primary using
> pg_basebackup with --wal-method=stream option the WAL file generated during
> the backup is different (as compared with diff or cmp command) than that on
> the master and in WAL archive directory. Furthermore, this file does not
> exist in pg_wal/archive_status with .done extension on new slave, though it
> exists in pg_wal directory, resulting in failed attempt to archive this file
> when slave node is promoted as master node.
> 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was:
> test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
> pg_wal/000000010000000000000002
> /mnt/pgsql/archive/000000010000000000000002
How do you promote your standby? In Postgres 10, the last, partial
WAL segment of a past timeline generated at promotion is renamed
.partial to avoid any conflicts, so as this should normally not
happen if you do not use archive_mode = always.
Please note that your archive command is not safe. For one, it does
not sync the archived segment before archive_command returns to the
backend..
--
Michael
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On 2/16/19 6:25 PM, Michael Paquier wrote:
On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote:When new slave is created by taking base backup from the primary using pg_basebackup with --wal-method=stream option the WAL file generated during the backup is different (as compared with diff or cmp command) than that on the master and in WAL archive directory. Furthermore, this file does not exist in pg_wal/archive_status with .done extension on new slave, though it exists in pg_wal directory, resulting in failed attempt to archive this file when slave node is promoted as master node. 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002How do you promote your standby? In Postgres 10, the last, partial WAL segment of a past timeline generated at promotion is renamed .partial to avoid any conflicts, so as this should normally not happen if you do not use archive_mode = always.
This issue concerns only segment that contains wal records from pg_basebackup. For example, I have reproduced this issue on master branch getting the following content of archive directory after standby promoting:
000000010000000000000001.node1
000000010000000000000002.00000028.backup.node1
000000010000000000000002.node1
000000010000000000000002.node2
000000010000000000000003.node1
000000010000000000000004.node1
000000010000000000000005.partial.node2
000000020000000000000005.node2
00000002.history.node2
Each wal segment specifically was appended with .<node> suffix to distinguish its source. Initially the node1 was master, node2 - standby. The segment 000000010000000000000002 has the following content:
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000028, prev 0/015DB060, desc: RUNNING_XACTS nextXid 474 latestCompletedXid 473 oldestRunningXid 474
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/02000060, prev 0/02000028, desc: CHECKPOINT_ONLINE redo 0/2000028; tli 1; prev tli 1; fpw true; xid 0:474; oid 12690; multi 1; offset 0; oldest xid 467 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 474; online
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 0/020000D0, prev 0/02000060, desc: BACKUP_END 0/2000028
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/020000F8, prev 0/020000D0, desc: SWITCH
On the current state of master branch the content of two segments are the same:
$ md5sum 000000010000000000000002.node1 000000010000000000000002.node2
252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node1
252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node2
But on PG 10.6 the contents are differentiated on the tails:
$ cmp 000000010000000000000002.node1 000000010000000000000002.node2
000000010000000000000002.node1 000000010000000000000002.node2 differ: byte 131073, line 1
while logical part (prefix) are the same - the lsn of last record (SWITCH) in this segment is 0/02000130 and have length 24 bytes.
Is it correct behavior? What would be the correct canonical representation of archive command taking into account this issue?
-- Regards, Maksim Milyutin
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On 2/16/19 6:25 PM, Michael Paquier wrote:
On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote:When new slave is created by taking base backup from the primary using pg_basebackup with --wal-method=stream option the WAL file generated during the backup is different (as compared with diff or cmp command) than that on the master and in WAL archive directory. Furthermore, this file does not exist in pg_wal/archive_status with .done extension on new slave, though it exists in pg_wal directory, resulting in failed attempt to archive this file when slave node is promoted as master node. 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002How do you promote your standby? In Postgres 10, the last, partial WAL segment of a past timeline generated at promotion is renamed .partial to avoid any conflicts, so as this should normally not happen if you do not use archive_mode = always.
This issue concerns only segment that contains wal records from pg_basebackup. For example, I have reproduced this issue on master branch getting the following content of archive directory after standby promoting:
000000010000000000000001.node1
000000010000000000000002.00000028.backup.node1
000000010000000000000002.node1
000000010000000000000002.node2
000000010000000000000003.node1
000000010000000000000004.node1
000000010000000000000005.partial.node2
000000020000000000000005.node2
00000002.history.node2Each wal segment specifically was appended with .<node> suffix to distinguish its source. Initially the node1 was master, node2 - standby. The segment 000000010000000000000002 has the following content:
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000028, prev 0/015DB060, desc: RUNNING_XACTS nextXid 474 latestCompletedXid 473 oldestRunningXid 474
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/02000060, prev 0/02000028, desc: CHECKPOINT_ONLINE redo 0/2000028; tli 1; prev tli 1; fpw true; xid 0:474; oid 12690; multi 1; offset 0; oldest xid 467 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 474; online
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 0/020000D0, prev 0/02000060, desc: BACKUP_END 0/2000028
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/020000F8, prev 0/020000D0, desc: SWITCH
On the current state of master branch the content of two segments are the same:
$ md5sum 000000010000000000000002.node1 000000010000000000000002.node2
252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node1
252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node2But on PG 10.6 the contents are differentiated on the tails:
$ cmp 000000010000000000000002.node1 000000010000000000000002.node2
000000010000000000000002.node1 000000010000000000000002.node2 differ: byte 131073, line 1
while logical part (prefix) are the same - the lsn of last record (SWITCH) in this segment is 0/02000130 and have length 24 bytes.
Is it correct behavior? What would be the correct canonical representation of archive command taking into account this issue?
-- Regards, Maksim Milyutin
Andre Piwoni
Sr. Software Developer, BI/Database
WebMD Health Services
Mobile: 801.541.4722
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master when failover happens. archive_mode is set to "on" and not "always".I repoint slave to the master by stopping it, updating recovery.conf and restarting it. Let me know if I'm doing it wrong.I think this problem is created before promotion when new slave is created using pg_basebackup with --wal-method=stream and manifests when actual promotion happens.What I'm trying to say it does not seem that .partial extension is the issue here but lack of .done extension.Wouldn't you agree that given directory contents below, when failover happens and new slave is promoted to the primary then with fetch mode master would should not archive 000000010000000000000002 segment because it is marked as .done but in stream method segment is not marked as .done so it would be attempted to be archived again? This obviously fails because segment has been archived by initial master. If you disagree, can you explain the purpose behind .done extension in pg_wal/archive_status directory?Below are contents of directories after base backup but before promoting new slave to master status..PRIMARY/var/lib/pgsql/10/data/pg_wal/:total 49156-rw-------. 1 postgres postgres 16777216 Feb 15 13:39 000000010000000000000001-rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000002-rw-------. 1 postgres postgres 302 Feb 15 13:40 000000010000000000000002.00000028.backup-rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000003drwx------. 2 postgres postgres 133 Feb 15 13:40 archive_status/var/lib/pgsql/10/data/pg_wal/archive_status:total 0-rw-------. 1 postgres postgres 0 Feb 15 13:39 000000010000000000000001.done-rw-------. 1 postgres postgres 0 Feb 15 13:40 000000010000000000000002.00000028.backup.done-rw-------. 1 postgres postgres 0 Feb 15 13:40 000000010000000000000002.doneWAL ARCHIVE-rw-rw-rw-. 1 root root 16777216 Feb 15 13:39 000000010000000000000001-rw-rw-rw-. 1 root root 16777216 Feb 15 13:40 000000010000000000000002-rw-rw-rw-. 1 root root 302 Feb 15 13:40 000000010000000000000002.00000028.backupNEW SLAVE (stream method)-rw-------. 1 postgres root 16777216 Feb 15 13:40 000000010000000000000002-rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000003drwx------. 2 postgres root 6 Feb 15 13:40 archive_status/var/lib/pgsql/10/data/pg_wal/archive_status:total 0NEW SLAVE (stream method)-rw-------. 1 postgres root 16777216 Feb 15 13:40 000000010000000000000002-rw-------. 1 postgres postgres 16777216 Feb 15 13:40 000000010000000000000003drwx------. 2 postgres root 6 Feb 15 13:40 archive_status/var/lib/pgsql/10/data/pg_wal/archive_status:total 0NEW SLAVE (fetch method)-rw-------. 1 postgres root 16777216 Feb 15 13:24 000000010000000000000002-rw-------. 1 postgres postgres 16777216 Feb 15 13:24 000000010000000000000003drwx------. 2 postgres root 43 Feb 15 13:24 archive_status/var/lib/pgsql/10/data/pg_wal/archive_status:total 0-rw-------. 1 postgres root 0 Feb 15 13:24 000000010000000000000002.doneEventual failover(s) produced this structure on master:/var/lib/pgsql/10/data/pg_wal/archive_status:total 0-rw-------. 1 postgres postgres 0 Feb 15 14:15 000000010000000000000002.readyHence attempt to archive again:2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002On Mon, Feb 18, 2019 at 9:03 AM Maksim Milyutin <milyutinma@gmail.com> wrote:On 2/16/19 6:25 PM, Michael Paquier wrote:
On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote:When new slave is created by taking base backup from the primary using pg_basebackup with --wal-method=stream option the WAL file generated during the backup is different (as compared with diff or cmp command) than that on the master and in WAL archive directory. Furthermore, this file does not exist in pg_wal/archive_status with .done extension on new slave, though it exists in pg_wal directory, resulting in failed attempt to archive this file when slave node is promoted as master node. 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002How do you promote your standby? In Postgres 10, the last, partial WAL segment of a past timeline generated at promotion is renamed .partial to avoid any conflicts, so as this should normally not happen if you do not use archive_mode = always.
This issue concerns only segment that contains wal records from pg_basebackup. For example, I have reproduced this issue on master branch getting the following content of archive directory after standby promoting:
000000010000000000000001.node1
000000010000000000000002.00000028.backup.node1
000000010000000000000002.node1
000000010000000000000002.node2
000000010000000000000003.node1
000000010000000000000004.node1
000000010000000000000005.partial.node2
000000020000000000000005.node2
00000002.history.node2Each wal segment specifically was appended with .<node> suffix to distinguish its source. Initially the node1 was master, node2 - standby. The segment 000000010000000000000002 has the following content:
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000028, prev 0/015DB060, desc: RUNNING_XACTS nextXid 474 latestCompletedXid 473 oldestRunningXid 474
rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/02000060, prev 0/02000028, desc: CHECKPOINT_ONLINE redo 0/2000028; tli 1; prev tli 1; fpw true; xid 0:474; oid 12690; multi 1; offset 0; oldest xid 467 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 474; online
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 0/020000D0, prev 0/02000060, desc: BACKUP_END 0/2000028
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/020000F8, prev 0/020000D0, desc: SWITCH
On the current state of master branch the content of two segments are the same:
$ md5sum 000000010000000000000002.node1 000000010000000000000002.node2
252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node1
252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node2But on PG 10.6 the contents are differentiated on the tails:
$ cmp 000000010000000000000002.node1 000000010000000000000002.node2
000000010000000000000002.node1 000000010000000000000002.node2 differ: byte 131073, line 1
while logical part (prefix) are the same - the lsn of last record (SWITCH) in this segment is 0/02000130 and have length 24 bytes.
Is it correct behavior? What would be the correct canonical representation of archive command taking into account this issue?
-- Regards, Maksim Milyutin--Andre Piwoni
Sr. Software Developer, BI/Database
WebMD Health Services
Mobile: 801.541.4722
Andre Piwoni
Sr. Software Developer, BI/Database
WebMD Health Services
Mobile: 801.541.4722
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On Tue, Feb 19, 2019 at 09:25:57AM -0800, Andre Piwoni wrote: > I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master > when failover happens. archive_mode is set to "on" and not "always". > I repoint slave to the master by stopping it, updating recovery.conf and > restarting it. Let me know if I'm doing it wrong. As long as you stop the primary cleanly (stop or smart mode) so as the primary has the possibility to send its shutdown checkpoint record to the standby and makes sure that the standby has flushed the record, that's safe. > I think this problem is created before promotion when new slave is created > using pg_basebackup with --wal-method=stream and manifests when actual > promotion happens. > What I'm trying to say it does not seem that .partial extension is the > issue here but lack of .done extension. Well, sure. If you begin by reusing an old backup, you have a risk to potentially archive the same segment multiple times if you use the same archive location for all your servers. Since 9.5 this can get even more complication as archive_mode has gained an "always" mode which makes also standbys archive segments while in recovery to give the users a switch for more archiving redundancy, which is useful when working with async standbys across multiple sites. My point is that this stuff has always worked this way. And people around do not actually complain about the difference made for archive_status/ when using the stream of fetch methods with pg_basebackup. From what I can see as well, your archive_command is actually unsafe on many points, so my take is that you should more carefully design it, or rely on an existing backup solution developed by experts in PostgreSQL backups. And no, it is not safe to change a behavior that other people may rely heavily on for their solutions since pg_basebackup got smarter with its stream mode. -- Michael
Attachment
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On Tue, Feb 19, 2019 at 09:25:57AM -0800, Andre Piwoni wrote:
> I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master
> when failover happens. archive_mode is set to "on" and not "always".
> I repoint slave to the master by stopping it, updating recovery.conf and
> restarting it. Let me know if I'm doing it wrong.
As long as you stop the primary cleanly (stop or smart mode) so as the
primary has the possibility to send its shutdown checkpoint record to
the standby and makes sure that the standby has flushed the record,
that's safe.
> I think this problem is created before promotion when new slave is created
> using pg_basebackup with --wal-method=stream and manifests when actual
> promotion happens.
> What I'm trying to say it does not seem that .partial extension is the
> issue here but lack of .done extension.
Well, sure. If you begin by reusing an old backup, you have a risk to
potentially archive the same segment multiple times if you use the
same archive location for all your servers. Since 9.5 this can get
even more complication as archive_mode has gained an "always" mode
which makes also standbys archive segments while in recovery to give
the users a switch for more archiving redundancy, which is useful when
working with async standbys across multiple sites. My point is that
this stuff has always worked this way. And people around do not
actually complain about the difference made for archive_status/ when
using the stream of fetch methods with pg_basebackup. From what I can
see as well, your archive_command is actually unsafe on many points,
so my take is that you should more carefully design it, or rely on an
existing backup solution developed by experts in PostgreSQL backups.
And no, it is not safe to change a behavior that other people may rely
heavily on for their solutions since pg_basebackup got smarter with
its stream mode.
--
Michael
Andre Piwoni
Sr. Software Developer, BI/Database
WebMD Health Services
Mobile: 801.541.4722
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On Wed, Feb 20, 2019 at 09:58:12AM -0800, Andre Piwoni wrote: > Also, what do you mean that my archive command is not safe? It is straight > from PostrgreSQL docs and other resources: > https://www.postgresql.org/docs/10/continuous-archiving.html > https://www.opsdash.com/blog/postgresql-wal-archiving-backup.html The docs are not the best thing on this side... One immediate problem one could note is that cp does not flush the segment. So if your host crashes before the OS flushes the copied data then the WAL segment data is lost. The flush needs to happen before the archive command returns its status to the server. > It seems you agree that segment created during pg_basebackup with stream > mode, which is not marked with .done extension, would be attempted to be > archived again. I also understand stream method during backup is default so > changing this behavior may have an impact. If there's any free solution you > could recommend for backup strategy please let me know. pgBackRest and repmgr are tools mentioned on the lists. -- Michael
Attachment
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On 2/19/19 9:08 PM, Andre Piwoni wrote:
By the way, you can assert this structure, including now explained diff in WAL segment taken during backup, for fetch vs stream WAL method by creating master and then creating new slave by taking base backup using both methods:pg_basebackup -h new_slave -D /var/lib/pgsql/10/data -v -P -R -U replication --wal-method=streampg_basebackup -h new_slave -D /var/lib/pgsql/10/data -v -P -R -U replication --wal-method=fetch
I have verified on different versions of PostgreSQL the identity of conflict segments received from master node and from pg_basebackup --wal-method=stream. On PG11+ they are binary identical. However on PG10 and lower the segments are binary different. The logical content are the same while meaningless tails differ.
In case of pg_basebackup --wal-method=fetch the segments are entirely the same. And this case doesn't produce any problem in archiving.
I think the backup tools as well as custom archive scripts have to take account of these points.
-- Regards, Maksim Milyutin
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On 2/19/19 9:08 PM, Andre Piwoni wrote:
By the way, you can assert this structure, including now explained diff in WAL segment taken during backup, for fetch vs stream WAL method by creating master and then creating new slave by taking base backup using both methods:pg_basebackup -h new_slave -D /var/lib/pgsql/10/data -v -P -R -U replication --wal-method=streampg_basebackup -h new_slave -D /var/lib/pgsql/10/data -v -P -R -U replication --wal-method=fetch
I have verified on different versions of PostgreSQL the identity of conflict segments received from master node and from pg_basebackup --wal-method=stream. On PG11+ they are binary identical. However on PG10 and lower the segments are binary different. The logical content are the same while meaningless tails differ.
In case of pg_basebackup --wal-method=fetch the segments are entirely the same. And this case doesn't produce any problem in archiving.
I think the backup tools as well as custom archive scripts have to take account of these points.-- Regards, Maksim Milyutin
Andre Piwoni
Sr. Software Developer, BI/Database
WebMD Health Services
Mobile: 801.541.4722
Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectlygenerates WAL segment created during backup
On 2/21/19 9:57 PM, Andre Piwoni wrote: > Were you able to confirm that with fetch method slave created with > --wal-method=fetch had record in pg_wal/archive_status directory with > .done extension after running pg_basebackup and no record with > --wal-method=stream? I'm also wondering if PG11 has the same behavior. Yes, it's so. Such behavior is also on PG11 and master branch. -- Regards, Maksim Milyutin