Reliable WAL file shipping over unreliable network - Mailing list pgsql-admin

From Nagy László Zsolt
Subject Reliable WAL file shipping over unreliable network
Date
Msg-id dd116172-3ddf-ee56-57f3-6e18d49abf8e@shopzeus.com
Responses RE: Reliable WAL file shipping over unreliable network  (Alvaro Aguayo Garcia-Rada <aaguayo@opensysperu.com>)
Re: Reliable WAL file shipping over unreliable network  (scott ribe <scott_ribe@elevated-dev.com>)
Re: Reliable WAL file shipping over unreliable network  (Ray Stell <stellr@vt.edu>)
Re: Reliable WAL file shipping over unreliable network  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-admin
  Hello!

Let's suppose that a replication master is writing a WAL segment file
into a directory. That directory is mounted on the replication slave. Is
it possible that the slave will try to read a WAL segment file that is
not yet fully flushed to disk? I did not find any requirement in the
official documentation about this. How does it work? Do I have to copy
segments to temp files, and rename them when they are fully flushed to
disk? Or is it okay to have half complete files in the archive dir for a
while?
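
The "copy to a temp file, then rename" idea above can be sketched as follows. This is only a minimal illustration, not a statement of what PostgreSQL requires: the function name `ship_segment` and the `.tmp-` prefix are made up for the example, and it assumes the temp file and the final file live on the same filesystem, so that the rename is atomic.

```python
import os
import shutil
import tempfile


def ship_segment(src_path, archive_dir):
    """Copy src_path into archive_dir so a reader never sees a partial file.

    Hypothetical helper: the data is written under a temporary name,
    flushed to disk, and only then renamed to the real segment name.
    """
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    # Create the temp file inside archive_dir so the rename below stays
    # on one filesystem (an assumption of this sketch).
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as dst, open(src_path, "rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the partial temp file if the copy did not complete.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

With this pattern, anything that looks only for files with a proper segment name either finds the complete file or finds nothing. Note that `tempfile.mkstemp` creates the file readable only by its owner, so the permissions may need adjusting if a different user reads the archive.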

Actually I cannot mount the slave's archive directory on the master,
because the network is not reliable. The WAL files need to be copied
from the master to the slave. The network can go down in the middle of a
copy operation. I can write a program that detects this, and retries the
copy when the network comes back. But what happens on the slave side?
What will a replication slave do if it sees a half complete WAL file?
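
The "detect the failure and retry" program mentioned above could look roughly like this. It is a sketch under assumptions: `retry_copy` and its parameters are hypothetical names, network failures are assumed to surface as `OSError`, and the delay between attempts doubles each time.

```python
import time


def retry_copy(copy_once, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call copy_once() until it succeeds or max_attempts is reached.

    copy_once is any callable that performs a single transfer attempt
    and raises OSError on failure (an assumption of this sketch).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return copy_once()
        except OSError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Here `copy_once` would be whatever performs one transfer, ideally one that puts the file under its final name only after the whole copy has succeeded, so that a failed attempt leaves nothing behind for the slave to pick up.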

Is there a utility that is widely used for WAL file shipping, and
addresses these problems? I know that PostgreSQL itself does not care
about how the log files are shipped, but I suspect that there are
robust, proven methods for shipping WAL files over unreliable networks.

And finally: if I also enable streaming replication, then it seems that
log file shipping is not needed at all. If I omit archive_command and
restore_command from the configs, and set up only the replication slots
and primary_conninfo, then it seems to work just fine. But when the
network goes down for a while, the slave goes out of sync and cannot
recover. It was not clear to me from the documentation, but am I right
that I can combine log file shipping with streaming replication, and
achieve small replication delays plus the ability to recover after a
longer period of network outage?
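
For concreteness, the combined setup asked about here would look roughly like the fragment below. It is an illustrative sketch only: the archive path, host name, user, and slot name are placeholders, and the `cp` commands stand in for whatever shipping method is actually used.

```
# --- master: postgresql.conf ---
wal_level = replica
archive_mode = on
# Placeholder command; %p is the segment's path, %f its file name.
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'

# --- slave: recovery.conf (postgresql.conf from PostgreSQL 12 on) ---
standby_mode = 'on'   # before PostgreSQL 12; later releases use a standby.signal file
primary_conninfo = 'host=master.example.com user=replicator'   # placeholder values
primary_slot_name = 'standby1_slot'                            # placeholder slot name
restore_command = 'cp /mnt/archive/%f %p'                      # placeholder command
```

The intent of such a setup is that the slave streams from the master while the connection is up, and can fall back to fetching archived segments with restore_command when it is not.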

Thanks,

   Laszlo




