Thread: better architecture?
i need some advice:
1. for our small business, i have a master (dev) - slave (prod) setup.
i develop using the master. i get data from other people's dev (mssql) databases. i also get data from their prod (mssql) databases. i replicate everything on slave.

apps using the master can connect only to dev databases, and apps using the slave can connect only to prod databases. so when it's time to go from dev to prod, i simply change that connection. no restart.
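for example, the switch is just repointing the app's connection string (hostnames and database names below are made up, not our real ones):

    # dev web server -> dev database on the master
    host=master-vm port=5432 dbname=app_dev user=appuser

    # prod web server -> prod database on the slave (read-only)
    host=slave-vm port=5432 dbname=app_prod user=appuser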
comments?
2. what happens if the master and slave are rebooted at different times?
3. i also need to auto-promote slave to master if master fails (without using repmgr or postgres-r or even postgres-xl). how?
On 11/19/2014 05:58 PM, zach cruise wrote:
> i need some advice:
>
> 1. for our small business, i have a master (dev) - slave (prod) setup.
> i develop using the master. i get data from other people's dev
> (mssql) databases. i also get data from their prod (mssql) databases. i
> replicate everything on slave.
>
> apps using the master can connect only to dev databases, and apps using
> the slave can connect only to prod databases. so when it's time to go
> from dev to prod, i simply change that connection. no restart.
>
> comments?

Yes, I am confused. What replication method are you using? The built-in
methods, Slony, Bucardo, etc.? The production users cannot enter or update
records? If you have replication set up master -> slave, how can there be a
difference between the two? Not sure where the mssql databases fit into
this?

> 2. what happens if the master and slave are rebooted at different times?
>
> 3. i also need to auto-promote slave to master if master fails (without
> using repmgr or postgres-r or even postgres-xl). how?

Answers for 2 & 3 are dependent on answers to the above questions.
combining replies for the list:

On 11/19/14, Charles Zaffery <charlesz@focusschoolsoftware.com> wrote:
> 2 and 3 can be covered by this:
> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster

does something similar exist for windows?

On 11/20/14, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Thu, Nov 20, 2014 at 10:58 AM, zach cruise <zachc1980@gmail.com> wrote:
>> 2. what happens if the master and slave are rebooted at different times?
> What do you mean by that? If replication is impacted?

eg if i were to reboot the vmware running the master in the evening, and
reboot the vmware running the slave in the night, how would they sync up?

On 11/20/14, Adrian Klaver wrote:
> What replication method are you using? The built-in methods, Slony,
> Bucardo, etc.?

built in

> The production users cannot enter or update records?

they can't. slave is read-only.

> If you have replication set up master -> slave, how can there be a
> difference between the two?

there isn't. both contain dev and prod databases. users connect to the dev
databases from the dev web server, and to the prod databases from the prod
web server.

> Not sure where the mssql databases fit into this?

our corporate partners use them. when i need to query against them, i
import.

>> 2. what happens if the master and slave are rebooted at different times?
>>
>> 3. i also need to auto-promote slave to master if master fails (without
>> using repmgr or postgres-r or even postgres-xl). how?
> Answers for 2 & 3 are dependent on answers to the above questions.
On 11/20/2014 08:00 AM, zach cruise wrote:
> On 11/20/14, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Thu, Nov 20, 2014 at 10:58 AM, zach cruise <zachc1980@gmail.com> wrote:
>>> 2. what happens if the master and slave are rebooted at different times?
>> What do you mean by that? If replication is impacted?
> eg if i were to reboot the vmware running the master in the evening, and
> reboot the vmware running the slave in the night, how would they sync up?

Well, it would depend on your setup and the load on the master. Assuming
streaming replication, a simple explanation:

1) If the master is down and the slave is up, the slave will stall at
whatever the last WAL sent was. When the master comes back up, the slave
will catch up as new WALs are generated.

2) If the slave is down and the master is up, the master will keep on
creating WALs. The issue is that WALs are recycled over time, so given a
significant load on the master and extended downtime for the slave, it is
possible that when the slave comes back up a WAL it needs is no longer
available, and it will start throwing errors. One way to tune this is to
raise wal_keep_segments (integer):

http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html

> there isn't. both contain dev and prod databases. users connect to the
> dev databases from the dev web server, and to the prod databases from
> the prod web server.

Crossed wires on my part, I was reading "databases" and thinking database
clusters.

>> 2. what happens if the master and slave are rebooted at different times?
>>
>> 3. i also need to auto-promote slave to master if master fails (without
>> using repmgr or postgres-r or even postgres-xl). how?
> Answers for 2 & 3 are dependent on answers to the above questions.

For failover see:

http://www.postgresql.org/docs/9.3/interactive/warm-standby-failover.html

"PostgreSQL does not provide the system software required to identify a
failure on the primary and notify the standby database server. Many such
tools exist and are well integrated with the operating system facilities
required for successful failover, such as IP address migration."

So if you are looking for auto-promote, you will need to look at
third-party tools or write your own script.
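For example, something like this in postgresql.conf on the master (the
number is only illustrative; size it to your write load and the longest
slave outage you want to survive):

    # postgresql.conf on the master
    wal_keep_segments = 256   # keep at least 256 x 16MB segments in pg_xlog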
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Well, it would depend on your setup and the load on the master. Assuming
> streaming replication, a simple explanation:

yes streaming replication.

> 2) If the slave is down and the master is up, the master will keep on
> creating WALs. The issue is that WALs are recycled over time [...] One
> way to tune this is to raise wal_keep_segments (integer):
>
> http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html

while the WAL archive directory has to be shared with both master and
slave, should the WAL archive directory be independent of them, ie should
it not go down with either of them? if it has to go down with one, it seems
it'd be best for the WAL archive directory to go down with the slave?

> So if you are looking for auto-promote, you will need to look at
> third-party tools or write your own script.

while i can always use "pg_ctl promote", any recommendations for windows?
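by "pg_ctl promote" i mean running something like this on the slave (the
data directory path is just an example):

    pg_ctl promote -D "C:\PostgreSQL\9.3\data"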
On 11/20/2014 11:02 AM, zach cruise wrote:
> while the WAL archive directory has to be shared with both master and
> slave, should the WAL archive directory be independent of them, ie
> should it not go down with either of them? if it has to go down with
> one, it seems it'd be best for the WAL archive directory to go down
> with the slave?

So I am to understand that you have WAL archiving set up also? Again, a
simplified version:

The ideal situation is you have a third machine that holds the WAL
archives. The issue is that if the master cannot archive a WAL, it will
keep it around until it can. So depending on load and outage length, you
can end up with disk space issues on the master should it not be able to
clear the WALs. For more info see:

http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html

> while i can always use "pg_ctl promote", any recommendations for windows?

Not from me, I do not run Postgres on Windows, so I will be of no help
there.
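For reference, archiving is driven by archive_command on the master. The
9.3 docs give a Windows example along these lines (the target path is
whatever your archive machine exposes; illustrative here):

    # postgresql.conf on the master
    archive_mode = on
    archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"'  # Windows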
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> So I am to understand that you have WAL archiving set up also?

yes, slave gets updates from stream and WAL. if either fails, the other
will update.

> The ideal situation is you have a third machine that holds the WAL
> archives. [...] For more info see:
>
> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html

to be clear- i'd change my 2-VM setup {"1. master (dev) - 2. slave (prod)"}
to 3 VMs {"1. master (dev) - 2. slave (prod) - 3. archive (wal)"}.

but what do i gain?

as it is, in the worst case, VMs can always be restored "fairly quickly"
for our use.
On 11/20/2014 12:30 PM, zach cruise wrote:
> to be clear- i'd change my 2-VM setup {"1. master (dev) - 2. slave
> (prod)"} to 3 VMs {"1. master (dev) - 2. slave (prod) - 3. archive
> (wal)"}.
>
> but what do i gain?

Extra protection against failure, maybe. So:

             ---> WAL Archive ---
             |                  |
             |    Streaming     |
    master ---  ------------------> slave

If the direct link between the master and slave goes down, the slave can
still get WALs from the archive. If the archive machine goes down, you
still have the direct link. If you take the slave down, the master can
still push WALs to the archive.

This assumes the 'machines' are actually separated and connecting through
different networks. You say you are using VMs, but not where they are
running. If they are all running on the same machine, through the same
network link, then you really do not have protection against network
issues. The same if the host machine goes down. This is one of those
pen-and-paper times, when you sketch out the arrangement and start doing
what-ifs.
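To have the slave use both paths, recovery.conf on the slave combines
streaming and archive recovery, roughly like this (conninfo and path are
illustrative):

    # recovery.conf on the slave
    standby_mode = 'on'
    primary_conninfo = 'host=master-vm port=5432 user=replicator'  # streaming
    restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'     # archive fallback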
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> This assumes the 'machines' are actually separated and connecting
> through different networks. [...] This is one of those pen-and-paper
> times, when you sketch out the arrangement and start doing what-ifs.

master, slave and archive can be 3 separate VMs on 1 host, with their
clones on 2nd and 3rd hosts.

a follow-up question on WAL recycling ("When WAL archiving is being done,
the log segments must be archived before being recycled or removed", from
http://www.postgresql.org/docs/9.3/static/wal-configuration.html):

say streaming is off-
* if both master and archive are down, slave is still up and running. yes?
* if master writes data when archive is down, it will copy over to slave
when archive is back up. yes?
* but if WAL is recycled before archive is back up, it will not copy over
to slave. yes?

see, my concern with a separate archive is that if archive is down and
master gets stuck retrying to push the same segment again and again, there
may be a problem in recovery when archive is back up. no?
On 11/20/2014 04:57 PM, zach cruise wrote:

First of all, the below is really in need of a whiteboard/paper diagram to
keep track of the moving parts. That being said, here it goes:

> master, slave and archive can be 3 separate VMs on 1 host, with their
> clones on 2nd and 3rd hosts.

I can see the above being a potential nightmare. I am not sure how you
ensure that the cloning process results in clones that exactly mirror the
state of the originals at a particular point in time. Failing in that
would seem to me to lead to no end of issues in the replication process.

> a follow-up question on WAL recycling ("When WAL archiving is being
> done, the log segments must be archived before being recycled or
> removed", from
> http://www.postgresql.org/docs/9.3/static/wal-configuration.html):
>
> say streaming is off-
> * if both master and archive are down, slave is still up and running.
> yes?

Yes.

> * if master writes data when archive is down, it will copy over to
> slave when archive is back up. yes?

If streaming is off and you are doing archiving, then it will copy over to
the archive.

> * but if WAL is recycled before archive is back up, it will not copy
> over to slave. yes?

The issue here, as pointed out previously, is that the WALs will stack up
on the master because it will not be able to archive them. So then you run
into a potential out-of-space issue on the master. From here:

http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html

"It is important that the archive command return zero exit status if and
only if it succeeds. Upon getting a zero result, PostgreSQL will assume
that the file has been successfully archived, and will remove or recycle
it. However, a nonzero status tells PostgreSQL that the file was not
archived; it will try again periodically until it succeeds."

...

"While designing your archiving setup, consider what will happen if the
archive command fails repeatedly because some aspect requires operator
intervention or the archive runs out of space. For example, this could
occur if you write to tape without an autochanger; when the tape fills,
nothing further can be archived until the tape is swapped. You should
ensure that any error condition or request to a human operator is reported
appropriately so that the situation can be resolved reasonably quickly.
The pg_xlog/ directory will continue to fill with WAL segment files until
the situation is resolved. (If the file system containing pg_xlog/ fills
up, PostgreSQL will do a PANIC shutdown. No committed transactions will be
lost, but the database will remain offline until you free some space.)"

> see, my concern with a separate archive is that if archive is down and
> master gets stuck retrying to push the same segment again and again,
> there may be a problem in recovery when archive is back up. no?

See above.
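The stock Unix example from that page shows the exit-status contract in
practice; the test makes the command return nonzero (and therefore retry
later) rather than silently overwrite an existing segment:

    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix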
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> The issue here, as pointed out previously, is that the WALs will stack
> up on the master because it will not be able to archive them. So then
> you run into a potential out-of-space issue on the master. [...]
>
>> see, my concern with a separate archive is that if archive is down and
>> master gets stuck retrying to push the same segment again and again,
>> there may be a problem in recovery when archive is back up. no?
>
> See above.

also, streaming is working (checked with wireshark) and WAL is being
written to by the master, but if i comment out streaming in pg_hba.conf,
WAL is not copying over to the slave?!

how can i verify that WAL archiving is working for the slave?
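(the pg_hba.conf line i comment out on the master looks like this - the
user and slave address are examples:

    host  replication  replicator  192.168.1.20/32  md5

)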
On 11/21/2014 07:38 PM, zach cruise wrote:
> also, streaming is working (checked with wireshark) and WAL is being
> written to by the master, but if i comment out streaming in pg_hba.conf,
> WAL is not copying over to the slave?!

Do you have a restore_command set on the slave?

http://www.postgresql.org/docs/9.3/static/archive-recovery-settings.html#RESTORE-COMMAND

Per the docs here:

http://www.postgresql.org/docs/9.3/static/standby-settings.html

standby_mode (boolean)

    Specifies whether to start the PostgreSQL server as a standby. If
    this parameter is on, the server will not stop recovery when the end
    of archived WAL is reached, but will keep trying to continue recovery
    by fetching new WAL segments using restore_command and/or by
    connecting to the primary server as specified by the primary_conninfo
    setting.

> how can i verify that WAL archiving is working for the slave?

Look in the logs or use the following functions:

http://www.postgresql.org/docs/9.3/interactive/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL
On 11/22/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Do you have a restore_command set on the slave?
>
> http://www.postgresql.org/docs/9.3/static/archive-recovery-settings.html#RESTORE-COMMAND

yes (recovery.conf on slave: restore_command = 'copy "\\path\\%f" "%p"')

> Per the docs here:
>
> http://www.postgresql.org/docs/9.3/static/standby-settings.html
>
> standby_mode (boolean) [...]

yes (recovery.conf on slave: standby_mode = 'on')

>> how can i verify that WAL archiving is working for the slave?
>
> Look in the logs or use the following functions:
>
> http://www.postgresql.org/docs/9.3/interactive/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL

here're my outputs:

    pg_is_in_recovery()              --> t
    pg_last_xlog_receive_location()  --> "5/D..."
    pg_last_xlog_replay_location()   --> "5/D..."
    pg_last_xact_replay_timestamp()  --> "2014-..."
    pg_is_xlog_replay_paused()       --> f
    pg_xlog_replay_pause()
    pg_xlog_replay_resume()
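(a quick catch-up check on the slave - true means everything received has
been replayed:

    SELECT pg_last_xlog_receive_location() = pg_last_xlog_replay_location();

)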