Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
Msg-id 0A3221C70F24FB45833433255569204D1F656653@G01JPEXMBYT05
In response to Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
List pgsql-hackers
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Amit Kapila
> I have tried using attached script multiple times on latest 9.2 code, but
> couldn't reproduce the issue.  Please find the log attached with this mail.
> Apart from log file, below prints appear:
> 
> WARNING: enabling "trust" authentication for local connections You can
> change this by editing pg_hba.conf or using the option -A, or --auth-local
> and --auth-host, the next time you run initdb.
> 20075/20075 kB (100%), 1/1 tablespace
> NOTICE:  pg_stop_backup complete, all required WAL segments have been
> archived
> 20079/20079 kB (100%), 1/1 tablespace
> 
> Let me know, if some parameters need to be tweaked to reproduce the issue?
> 
> 
> It seems that the patch proposed is good, but it is better if somebody other
> than you can reproduce the issue and verify if the patch fixes the same.
> 

Thank you for reviewing the code and testing.  Hmm, we were able to reproduce the problem on PostgreSQL 9.2.19.  The
script's stdout is attached as test.log, and its stderr is as follows:
 

WARNING: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the
option -A, or --auth-local and --auth-host, the next time you run initdb.
 
20099/20099 kB (100%), 1/1 tablespace
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
20103/20103 kB (100%), 1/1 tablespace

The sizes that pg_basebackup outputs are a bit different from yours.  I don't see a reason for this.  The test script
explicitly specifies the database encoding and locale, so an encoding difference doesn't seem to be the cause.  The
target problem occurs only when a WAL record crosses a WAL segment boundary, so a subtle change in WAL record volume
could prevent the problem from happening.
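To illustrate why a few bytes of difference matter, here is a minimal sketch of the boundary condition.  It is a
simplified model, not PostgreSQL code: it assumes a flat byte position, the default 16 MB segment size, and ignores
WAL page headers; the function name crosses_segment_boundary is hypothetical.

```python
SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def crosses_segment_boundary(start: int, length: int, seg_size: int = SEG_SIZE) -> bool:
    """Return True if a record of `length` bytes starting at byte `start`
    spills over into the next WAL segment (page headers ignored)."""
    first_seg = start // seg_size
    last_seg = (start + length - 1) // seg_size
    return first_seg != last_seg


# Shifting the same 100-byte record by a few bytes changes the outcome.
print(crosses_segment_boundary(SEG_SIZE - 100, 100))  # ends exactly at the boundary
print(crosses_segment_boundary(SEG_SIZE - 96, 100))   # spills into the next segment
```

This is why a slightly different WAL volume on another machine can make the test pass without the fix.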
 

Anyway, could you retry with the attached test.sh?  It just changes restore_command.

If the problem occurs, the following pair of lines appears in the server log of the cascading standby.  Could you check
for it?

LOG:  restored log file "000000020000000000000003" from archive
LOG:  out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, offset 0
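For reference, here is a minimal sketch of how the two lines relate.  It assumes the pre-9.3 WAL file name layout of
three 8-hex-digit fields (timeline, log ID, segment), and models the check implied by the message text: a page's
timeline ID must not go backwards.  The helper names are hypothetical and this is not the actual xlog.c code.

```python
from typing import NamedTuple, Optional


class WalFileName(NamedTuple):
    timeline: int
    log_id: int
    segment: int


def parse_wal_file_name(name: str) -> WalFileName:
    """Split a 24-hex-digit WAL file name into timeline, log ID and segment
    (assumed pre-9.3 layout TTTTTTTTXXXXXXXXYYYYYYYY)."""
    if len(name) != 24:
        raise ValueError(f"unexpected WAL file name: {name!r}")
    return WalFileName(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


def check_page_tli(last_tli: int, page_tli: int, f: WalFileName, offset: int) -> Optional[str]:
    """Simplified model: report an error if the page's timeline ID is lower
    than the timeline ID seen on the previous page."""
    if page_tli < last_tli:
        return (f"out-of-sequence timeline ID {page_tli} (after {last_tli}) "
                f"in log file {f.log_id}, segment {f.segment}, offset {offset}")
    return None


wal = parse_wal_file_name("000000020000000000000003")
print(wal)  # timeline 2, log file 0, segment 3
print(check_page_tli(last_tli=2, page_tli=1, f=wal, offset=0))
```

So the file restored from timeline 2's archive is segment 3 of log file 0, and its first page carries timeline ID 1.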

Regards
Takayuki Tsunakawa


