Thread: pg_basebackup on slave running for a long time
Hi Team, We have set up a PostgreSQL master-slave topology with streaming replication. One of the steps for setting up streaming replication is to run pg_basebackup on the slave against the master. For every subsequent update of this system, that step is repeated: the slave's existing data copy is deleted and pg_basebackup is run again. With a data size of over 500GB, copying the data from master to slave takes a lot of time. We are looking for some optimization so that the whole data set does not have to be copied on every update of the system. Is there a way to do that? Can somebody throw some light on this? Subhankar Chattopadhyay Bangalore, India
On 11/20/2016 9:29 PM, Subhankar Chattopadhyay wrote: > We have setup PostgreSQL master-slave topology with Streaming > replication setup. > One of the steps for setting up streaming replication is to do > pg_basebackup on slave from master. > > For subsequent update of this database, this step is repeated every > time, deleting the existing data copy of slave and running > pg_basebackup again. > > For a huge data size of over 500GB, it takes a lot of time to copy > the data from master to slave. > We were looking for some optimization technique so that it doesnt have > to copy the whole data in every update of the system. > > Is there a way to do that? Can somebody throw some light on this? if you have streaming replication, why do you delete it and start over ??!? the streaming replication should replicate all updates of the master in near realtime to the slave(s). -- john r pierce, recycling bits in santa cruz
Hi John, Thanks for the reply. In situations where the slave is behind the master, if I don't start over, will it catch up automatically? I am using version 9.4. On Mon, Nov 21, 2016 at 11:22 AM, John R Pierce <pierce@hogranch.com> wrote: > On 11/20/2016 9:29 PM, Subhankar Chattopadhyay wrote: >> >> We have setup PostgreSQL master-slave topology with Streaming >> replication setup. >> One of the steps for setting up streaming replication is to do >> pg_basebackup on slave from master. >> >> For subsequent update of this database, this step is repeated every >> time, deleting the existing data copy of slave and running >> pg_basebackup again. >> >> For a huge data size of over 500GB, it takes a lot of time to copy >> the data from master to slave. >> We were looking for some optimization technique so that it doesnt have >> to copy the whole data in every update of the system. >> >> Is there a way to do that? Can somebody throw some light on this? > > > if you have streaming replication, why do you delete it and start over ??!? > > the streaming replication should replicate all updates of the master in near > realtime to the slave(s). > > > -- > john r pierce, recycling bits in santa cruz -- Subhankar Chattopadhyay Bangalore, India
On 11/20/2016 10:00 PM, Subhankar Chattopadhyay wrote: > Thanks for reply. In situations where slave is behind master, if I > don't start over, will it catch up automatically? > I am using 9.4 version. it should stay within a few seconds under normal conditions. why is it falling behind, is your write workload too high for the speed of the connection between the hosts? or is the slave shut down for some period of time? If the slave is intermittently offline, and if you're using replication slots (that was a new feature in 9.4), then the master will hold a queue of pending data as long as is needed until the slave catches up again. Alternately, you can increase wal_keep_segments for a long enough interval to cover the worst case time the slave is offline, or you can implement a wal archive that the slave can recover from when resuming streaming replication. See https://www.postgresql.org/docs/9.4/static/warm-standby.html for discussions of these various options. -- john r pierce, recycling bits in santa cruz
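To make the replication-slot option concrete, here is a minimal sketch of the 9.4 setup John describes; the slot name 'standby1', host name, and user are placeholders, not values from this thread:

    -- on the master: create a physical replication slot for the standby
    -- (requires max_replication_slots > 0 in postgresql.conf)
    SELECT * FROM pg_create_physical_replication_slot('standby1');

    # in the standby's recovery.conf: stream from the master using that slot
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com user=replicator'
    primary_slot_name = 'standby1'

With a slot the master retains every WAL segment the standby has not yet consumed, so a standby that stays down for a long time can make pg_xlog on the master grow until the standby reconnects.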
Hi John, We are using the database in a cloud setup and the nodes are running in VMs. The slave can fall behind for various reasons, as you have stated, like the slave being shut down or a high write workload. We don't use replication slots, but wal_keep_segments is high enough for us, 5000 to be exact. In that case, we can do pg_basebackup only when necessary, that is, only when the slave is behind. We can check that with the query "SELECT state FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be the correct way to do it? On Mon, Nov 21, 2016 at 11:42 AM, John R Pierce <pierce@hogranch.com> wrote: > On 11/20/2016 10:00 PM, Subhankar Chattopadhyay wrote: >> >> Thanks for reply. In situations where slave is behind master, if I >> don't start over, will it catch up automatically? >> I am using 9.4 version. > > > it should stay within a few seconds under normal conditions. why is it > falling behind, is your write workload too high for the speed of the > connection between the hosts? or is the slave shut down for some period of > time? > > If the slave is intermittently offline, and if you're using replication > slots (that was a new feature in 9.4), then the master will hold a queue of > pending data as long as is needed until the slave catches up again. > > Alternately, you can increase wal_keep_segments for a long enough interval > to cover the worst case time the slave is offline, or you can implement a > wal archive that the slave can recover from when resuming streaming > replication. > > See https://www.postgresql.org/docs/9.4/static/warm-standby.html for > discussions of these various options. > > > -- > john r pierce, recycling bits in santa cruz -- Subhankar Chattopadhyay Bangalore, India
On 11/20/2016 10:37 PM, Subhankar Chattopadhyay wrote: > We are using the database in a cloud setup and the nodes are running > in VMs. The slave can fall behind for various reasons as you have > stated, like slave is shut down or high write workload. > > We don't use replication slot but the wal_keep_segments is high enough > for us, 5000 to be exact. > > In that case, we can do pg_basebackup only when necessary, like only > when slave is behind. We can check that from the query "SELECT state > FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be > correct way to do it? maintaining a wal archive, and configuring the slaves so they can recover from it when they are restarted will likely take care of things if they are getting so far behind that 5000 wal segments is insufficient.. I'm not that familiar with pg_stat_replication, I'm not sure what that query is telling you. others can probably chime in with more info on this. -- john r pierce, recycling bits in santa cruz
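For reference, a sketch of what pg_stat_replication can show on 9.4: the view only lists standbys that are currently connected and streaming, and the lag of each one can be computed in bytes, for example:

    -- run on the master; one row per connected standby
    SELECT application_name,
           state,
           pg_xlog_location_diff(pg_current_xlog_location(),
                                 replay_location) AS replay_lag_bytes
    FROM pg_stat_replication;

An empty result only means that no standby is streaming at that moment; it does not by itself tell you whether a stopped standby will still be able to catch up when it comes back.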
Yes so if the slave is behind I need to start over pgbasebackup. I saw according to the documentation this query gives us the replication state. Can somebody tell me if this would be sufficient to know if I need to start over the backup ? > On 11/20/2016 10:37 PM, Subhankar Chattopadhyay wrote: >> We are using the database in a cloud setup and the nodes are running >> in VMs. The slave can fall behind for various reasons as you have >> stated, like slave is shut down or high write workload. >> We don't use replication slot but the wal_keep_segments is high enough >> for us, 5000 to be exact. >> In that case, we can do pg_basebackup only when necessary, like only >> when slave is behind. We can check that from the query "SELECT state >> FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be >> correct way to do it? > maintaining a wal archive, and configuring the slaves so they can recover from it when they are restarted will likely take care of things if they are getting so far behind that 5000 wal segments is insufficient.. > I'm not that familiar with pg_stat_replication, I'm not sure what that query is telling you. others can probably chime in with more info on this. > -- > john r pierce, recycling bits in santa cruz
On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote: > Yes so if the slave is behind I need to start over pgbasebackup. I saw > according to the documentation this query gives us the replication state. > Can somebody tell me if this would be sufficient to know if I need to start > over the backup ? if the slave is behind but is catching up, no, restarting replication would be overkill. only if the slave gets so far behind that it can't catch up, and in that case, a wal archive would be a better choice than a new base backup. I've never run into these problems as I run on dedicated hardware servers, which don't have all these reliability and performance problems. a complete server failure requiring a full rebuild is something that would happen less than annually. -- john r pierce, recycling bits in santa cruz
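One way to estimate whether a stopped slave is already too far behind (a sketch, assuming the default 16 MB WAL segment size): wal_keep_segments = 5000 means the master keeps roughly 5000 x 16 MB, about 80 GB, of recent WAL. Compare the standby's last replayed position with the master's current position, for example:

    -- on the standby, while it is still running: where WAL replay has reached
    SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();

    -- on the master: the gap in bytes between its current position and the
    -- standby's position ('0/5000000' is a placeholder for the LSN reported above)
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/5000000');

If the gap approaches the amount of WAL the master retains and there is no archive or replication slot, streaming cannot resume and a fresh base backup (or an rsync-based resync) is needed.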
So, the question here is: while I apply an update on the slave, how do I know whether it will be able to catch up or whether I need a WAL archive? Is there a way I can determine this? In my case, while the update is being applied on the slave, the database process will be stopped, so the query, even if it gives a correct value, won't help. Can anybody help here? On Mon, Nov 21, 2016 at 12:40 PM, John R Pierce <pierce@hogranch.com> wrote: > On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote: > > Yes so if the slave is behind I need to start over pgbasebackup. I saw > according to the documentation this query gives us the replication state. > Can somebody tell me if this would be sufficient to know if I need to start > over the backup ? > > > > > if the slave is behind but is catching up, no, restarting replication would > be overkill. only if the slave gets so far behind that it can't catch up, > and in that case, a wal archive would be a better choice than a new base > backup. > > I've never run into these problems as I run on dedicated hardware servers, > which don't have all these reliability and performance problems. a > complete server failure requiring a full rebuild is something that would > happen less than annually. > > > > -- > john r pierce, recycling bits in santa cruz -- Subhankar Chattopadhyay Bangalore, India
John, Can you explain the WAL archive procedure? How can it be set up so that the slave never goes out of sync, even if the master deletes the WAL files? On Tue, Nov 22, 2016 at 4:04 PM, Subhankar Chattopadhyay <subho.atg@gmail.com> wrote: > So, the question here is while I apply update on Slave, how do I know > if if it will be able to catch up or I need Wal archive? Is there a > way I can determine this? In my case, while applying update on slave, > the db process will be stopped, so the query, even if it gives correct > value, won't help. Can anybody help in here? > > On Mon, Nov 21, 2016 at 12:40 PM, John R Pierce <pierce@hogranch.com> wrote: >> On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote: >> >> Yes so if the slave is behind I need to start over pgbasebackup. I saw >> according to the documentation this query gives us the replication state. >> Can somebody tell me if this would be sufficient to know if I need to start >> over the backup ? >> >> >> >> >> if the slave is behind but is catching up, no, restarting replication would >> be overkill. only if the slave gets so far behind that it can't catch up, >> and in that case, a wal archive would be a better choice than a new base >> backup. >> >> I've never run into these problems as I run on dedicated hardware servers, >> which don't have all these reliability and performance problems. a >> complete server failure requiring a full rebuild is something that would >> happen less than annually. >> >> >> >> -- >> john r pierce, recycling bits in santa cruz > > -- > Subhankar Chattopadhyay > Bangalore, India -- Subhankar Chattopadhyay Bangalore, India
On 11/22/2016 2:34 AM, Subhankar Chattopadhyay wrote: > So, the question here is while I apply update on Slave, how do I know > if if it will be able to catch up or I need Wal archive? Is there a > way I can determine this? In my case, while applying update on slave, > the db process will be stopped, so the query, even if it gives correct > value, won't help. Can anybody help in here? if the slave is set up with the proper recovery commands to fetch from the WAL archive, then when the slave is woken up after a slumber it will attempt to recover as many WALs as it can from the archive before it resumes streaming. This will happen automatically. -- john r pierce, recycling bits in santa cruz
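A minimal recovery.conf sketch for 9.4 of the behaviour John describes (host name and archive path are placeholders): with restore_command set, the standby replays whatever segments it can fetch from the archive and then falls back to streaming from the master:

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com user=replicator'
    restore_command = 'cp /mnt/server/archivedir/%f %p'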
On 11/22/2016 3:41 AM, Subhankar Chattopadhyay wrote: > John, > > Can you explain the Wal Archive procedure, how it can be setup so that > the slave never goes out of sync, even if master deletes the WAL > files? The WAL archive will typically be a separate file server that both the master and slave can reach... it could be accessed via NFS or via scp or whatever is appropriate for your environment. The master is configured with an archive command (cp in the case of nfs, or scp for ssh/scp, or whatever) to copy WAL segments to the archive. The slave is set up with a recovery command (cp, scp, etc) to fetch from this same archive. The archive will continue to grow without limit if you don't do some cleanup on it. one strategy is to periodically (weekly? monthly?) do a base backup of the master (possibly by using rsync or another file copy method, rather than pg_basebackup), and keep 2 of these full backups, and all wal archives since the beginning of the oldest one. with this backup + archive, you can initialize a new slave without bothering the master (rsync or scp or cp the latest backup, then let the slave recover from the wal archive). this backup+archive will also let you do point-in-time-recovery (aka PITR). say something catastrophic happens and the data in the master is bad after some point in time (maybe a jr admin accidentally clobbers key data, but the app kept running). you can restore the last good base backup, and recover up to but not including the point in time of the transaction that clobbered your data. -- john r pierce, recycling bits in santa cruz
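On the master side, the archiving John describes is driven by a couple of postgresql.conf settings; the path below is a placeholder and mirrors the stock example from the PostgreSQL documentation:

    # postgresql.conf on the master
    wal_level = hot_standby
    archive_mode = on
    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

For the PITR case, the same restore_command shown above is combined in recovery.conf with a stop point such as recovery_target_time = '2016-11-22 09:00:00' (an illustrative timestamp), optionally with recovery_target_inclusive = false to stop just before the offending transaction.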
Thanks John, Well that clarifies about archive a lot! > On 11/22/2016 3:41 AM, Subhankar Chattopadhyay wrote: >> John, >> Can you explain the Wal Archive procedure, how it can be setup so that >> the slave never goes out of sync, even if master deletes the WAL >> files? > The WAL archive will typically be a separate file server that both the master and slave can reach... it could be accessed via NFS or via scp or whatever is appropriate for your environment. The master is configured with an archive command (cp in the case of nfs, or scp for ssh/scp, or whatever) to copy WAL segments to the archive. The slave is setup with an recovery command (cp, scp, etc) to fetch from this same archive. > The archive will continue grow without limit if you don't do some cleanup on it. one strategy is to periodically (weekly? monthly?) do a base backup of the master (possibly by using rsync or another file copy method, rather than pg_basebackup), and keep 2 of these full backups, and all wal archives since the beginning of the oldest one. with this backup + archive, you can initialize a new slave without bothering the master (rsync or scp or cp the latest backup, then let the slave recover from the wal archive). > this backup+archive will also let you do point-in-time-recovery (aka PITR). say something catastrophic happens and the data in the master is bad after some point in time (maybe a jr admin accidentally clobbers key data, but the app kept running). you can restore the last good base backup, and recover up to but not including the point in time of the transaction that clobbered your data. > -- > john r pierce, recycling bits in santa cruz