Re: Race condition in recovery? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in recovery?
Msg-id 20210524.134709.805985657416573716.horikyota.ntt@gmail.com
In response to Re: Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Race condition in recovery?
List pgsql-hackers
At Sun, 23 May 2021 21:37:58 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Sun, May 23, 2021 at 2:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sat, May 22, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 
> I have created a tap test based on Robert's test.sh script.  It
> reproduces the issue.  I am new with perl so this still needs some
> cleanup/improvement, but at least it shows the idea.

I'm not sure I'm following the discussion here, but if we were trying
to reproduce Dilip's case using a base backup, we would need such a
broken archive command when using pg_basebackup with -Xnone, because
the current version of pg_basebackup waits for all required WAL
segments to be archived when connecting to a standby with -Xnone.  I
haven't reconfirmed which version that fix went into, but by just
using -X stream instead of "none" we successfully miss the first
segment of the new timeline in the upstream archive, though we then
need to erase pg_wal in the backup.  Either the broken archive command
or erasing the cascade's pg_wal is required for the behavior to occur.
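
For reference, here is a rough, untested sketch of the -Xnone variant
mentioned above, in the same TAP style as the attached script.  The
name "skip_cp" is a hypothetical placeholder for the broken archive
command, i.e. one that keeps the needed segment out of the upstream
archive; the backup_options argument is the one shown commented out in
the attached script.

# Hypothetical -Xnone variant of the corresponding steps in the attached
# script: archive with a deliberately broken command instead of plain cp.
$node_standby_1->append_conf(
    'postgresql.conf', qq(
archive_mode=always
archive_command='skip_cp "%p" "$archivedir_standby_1/%f"'
));
$node_standby_1->start;

# With -Xnone the backup contains no WAL at all, so the manual pg_wal
# cleanup done in the attached script is not needed.
$node_standby_1->backup($backup_name, backup_options => ['-Xnone']);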

The attached script shows how the -X stream approach looks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
# Copyright (c) 2021, PostgreSQL Global Development Group

# Minimal test reproducing the recovery issue with a cascading standby
use Cwd;
use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;

# Initialize primary node
my $node_primary = get_new_node('primary');
# Enable streaming replication connections from standbys.
$node_primary->init(allows_streaming => 1);
$node_primary->append_conf(
    'postgresql.conf', qq(
wal_keep_size=128MB
));
$node_primary->start;

my $backup_name = 'my_backup';

# Take backup
$node_primary->backup($backup_name);

my $node_standby_1 = get_new_node('standby_1');
$node_standby_1->init_from_backup($node_primary, $backup_name,
                                  allows_streaming => 1, has_streaming => 1);
my $archivedir_standby_1 = $node_standby_1->archive_dir;
$node_standby_1->append_conf(
    'postgresql.conf', qq(
archive_mode=always
archive_command='cp "%p" "$archivedir_standby_1/%f"'
));
$node_standby_1->start;


# Take backup of standby 1
# NB: Use -Xnone so that pg_wal is empty.
#$node_standby_1->backup($backup_name, backup_options => ['-Xnone']);
$node_standby_1->backup($backup_name);

# Promote the standby.
$node_standby_1->psql('postgres', 'SELECT pg_promote()');

# clean up pg_wal from the backup
my $pgwaldir = $node_standby_1->backup_dir . "/" . $backup_name . "/pg_wal";
opendir my $dh, $pgwaldir or die "failed to open $pgwaldir";
while (my $f = readdir($dh))
{
    unlink("$pgwaldir/$f") if (-f "$pgwaldir/$f");
}
closedir($dh);

# Create cascading standby but don't start it yet.
# NB: Must set up both streaming and archiving.
my $node_cascade = get_new_node('cascade');
$node_cascade->init_from_backup($node_standby_1, $backup_name,
    has_streaming => 1);
$node_cascade->append_conf(
    'postgresql.conf', qq(
restore_command = 'cp "$archivedir_standby_1/%f" "%p"'
log_line_prefix = '%m [%p:%b] %q%a '
archive_mode=off
));
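
# At this point the cascade's pg_wal is empty (erased from the backup
# above) and the upstream archive lacks the first segment of
# standby_1's new timeline, which is what lets the issue reproduce.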


# Start cascade node
$node_cascade->start;

# Create some content on the promoted standby 1
$node_standby_1->safe_psql('postgres',
    "CREATE TABLE tab_int AS SELECT 1 AS a");

# Wait for the cascade standby to catch up
$node_standby_1->wait_for_catchup($node_cascade, 'replay',
    $node_standby_1->lsn('replay'));

ok(1, 'test'); # it's a success if we get here.
