Re: Streaming Replication Error - Mailing list pgsql-admin

From Jeff Janes
Subject Re: Streaming Replication Error
Date
Msg-id CAMkU=1yxVevNqs6rvWAyt5Le+OVM8ZdWHJ0MWKguMmrvnYgpLg@mail.gmail.com
Whole thread Raw
In response to RE: Streaming Replication Error  (<soumik.bhattacharjee@kpn.com>)
Responses RE: Streaming Replication Error  (<soumik.bhattacharjee@kpn.com>)
List pgsql-admin
On Fri, Dec 20, 2019 at 2:12 PM <soumik.bhattacharjee@kpn.com> wrote:

-          Did a pg_restore on MASTER (The existing instance - MASTER)

-          The SLAVE  has all the data after the restore was done in MASTER – both were in sync.

How did you determine that it had all the data?  I suspect that they fell out of sync towards the end of the pg_restore, and your method just couldn't detect this fact.  At least, I can't think of anything which would cause them to lose sync exactly at the end of pg_restore.  Maybe it was just that the first checkpoint after pg_restore was finished caused the necessary WAL files to be recycled.  It could have just as easily been a checkpoint running during the pg_restore which caused the problem, but by luck it was not.
 

Please let me know if I did something wrong above step highlighted ?


I don't think you did anything objectively wrong.  You could argue that using wal_keep_segments rather than a replication slot was wrong, or you could say it was not wrong but just a calculated risk.  In this case, it seems the risk was realized.  Using a replication slot would also be a risk, the risk in that case being that the streaming to replica can't keep up, and so pg_wal fills up to capacity and crashes the master.  You have to decide what risk you would rather take.
 

Does that mean I cannot refresh the MASTER anytime which should replicate to SLAVE?


Usually the master is your production server.  Why would you be refreshing it?  Where would you be refreshing it from?  What other server exists that contains a higher level of truth than what your master production server already has?

You can use a replication slot, you can increase wal_keep_segments to a larger value (although there is no way to know with certainty ahead of time what value will be large enough), or you can just deal with the risk that your replica may occasionally lose sync and need to be recreated.  You might also be able to change the topology so your current replica and current master both stream from the higher-source-of-truth server, rather than cascading changes, first logically and then physically. There is no correct answer, you have to understand and weigh the balance of risks for yourself.

Cheers,

Jeff

pgsql-admin by date:

Previous
From:
Date:
Subject: RE: Streaming Replication Error
Next
From:
Date:
Subject: RE: Streaming Replication Error