Thread: better architecture?

better architecture?

From: zach cruise
Date:
i need some advice:

1. for our small business, i have a master (dev) - slave (prod) setup.
i develop using the master. i get data from other people's dev (mssql) databases. i also get data from their prod (mssql) databases. i replicate everything on slave.

apps using the master can connect only to dev databases, and apps using the slave can connect only to prod databases. so when it's time to go from dev to prod, i simply change that connection. no restart.
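
for example, going live is just flipping the app's connection settings (host and database names here are made up):

    dev:  host=master-vm dbname=myapp_dev  user=webapp
    prod: host=slave-vm  dbname=myapp_prod user=webapp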

comments?

2. what happens if master-slave are rebooted at different times?

3. i also need to auto-promote slave to master if master fails (without using repmgr or postgres-r or even postgres-xl). how?


Re: better architecture?

From: Adrian Klaver
Date:
On 11/19/2014 05:58 PM, zach cruise wrote:
> i need some advice:
>
> 1. for our small business, i have a master (dev) - slave (prod) setup.
> i develop using the master. i get data from other people's dev
> (mssql) databases. i also get data from their prod (mssql) databases. i
> replicate everything on slave.
>
> apps using the master can connect only to dev databases, and apps using
> the slave can connect only to prod databases. so when it's time to go
> from dev to prod, i simply change that connection. no restart.
>
> comments?

Yes, I am confused.

What replication method are you using?
The built-in methods, Slony, Bucardo, etc.?

The production users cannot enter or update records?

If you have replication set up master -> slave, how can there be a
difference between the two?

Not sure where the mssql databases fit into this?

>
> 2. what happens if master-slave are rebooted at different times?
>
> 3. i also need to auto-promote slave to master if master fails (without
> using repmgr or postgres-r or even postgres-xl). how?

Answers for 2 & 3 are dependent on answers to the above questions.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: better architecture?

From: zach cruise
Date:
combining replies for the list:


On 11/19/14, Charles Zaffery <charlesz@focusschoolsoftware.com> wrote:
> 2 and 3 can be covered by this:
> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
does something similar exist for windows?


On 11/20/14, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Thu, Nov 20, 2014 at 10:58 AM, zach cruise <zachc1980@gmail.com> wrote:
>> 2. what happens if master-slave are rebooted at different times?
> What do you mean by that? If replication is impacted?
e.g. if i were to reboot the vmware vm running the master in the evening,
and reboot the vmware vm running the slave at night, how would they
sync up?


On 11/20/14, Adrian Klaver wrote:
> What replication method are you using?
> The built-in methods, Slony, Bucardo, etc.?
built in

> The production users cannot enter or update records?
they can't. slave is read-only.

> If you have replication set up master -> slave, how can there be a difference between the two?
there isn't. both contain dev and prod databases. users connect to the
dev databases from the dev web server, and to the prod databases from
the prod web server.

> Not sure where the mssql databases fit into this?
our corporate partners use them. when i need to query against them, i import.

>> 2. what happens if master-slave are rebooted at different times?
>>
>> 3. i also need to auto-promote slave to master if master fails (without
>> using repmgr or postgres-r or even postgres-xl). how?
> Answers for 2 & 3 are dependent on answers to the above questions.


Re: better architecture?

From: Adrian Klaver
Date:
On 11/20/2014 08:00 AM, zach cruise wrote:
> combining replies for the list:
>
>
> On 11/19/14, Charles Zaffery <charlesz@focusschoolsoftware.com> wrote:
>> 2 and 3 can be covered by this:
>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
> does something similar exist for windows?
>
>
> On 11/20/14, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Thu, Nov 20, 2014 at 10:58 AM, zach cruise <zachc1980@gmail.com> wrote:
>>> 2. what happens if master-slave are rebooted at different times?
>> What do you mean by that? If replication is impacted?
> eg if i were to reboot the vmware running the master in the evening,
> and reboot the vmware running the slave in the night, how would they
> sync up?

Well it would depend on your setup and the load on the master. Assuming
streaming replication. Simple explanation:

1) If the master is down and slave is up then the slave will stall at
whatever point the last WAL was sent. When the master comes back up it will
catch up as new WALs are generated.

2) If the slave is down and the master is up, the master will keep on
creating WALs. The issue is that WALs are recycled over time, so given a
significant load on the master and extended downtime for the slave it is
possible that when the slave comes back up a WAL it needs is no longer
available and it will start throwing errors. One way to tune this is to
modify wal_keep_segments (integer):

http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html
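
As a sketch, in postgresql.conf on the master (the value here is only
illustrative, size it to your write load and expected outage window):

    wal_keep_segments = 64   # keep at least 64 x 16MB WAL segments for lagging standbys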

>
>
> On 11/20/14, Adrian Klaver wrote:
>> What replication method are you using?
>> The built-in methods, Slony, Bucardo, etc.?
> built in
>
>> The production users cannot enter or update records?
> they can't. slave is read-only.
>
>> If you have replication set up master -> slave, how can there be a difference between the two?
> there isn't. both contain dev and prod databases. users connect to the
> dev databases from the dev web server, and to the prod databases from
> the prod web server.

Crossed wires on my part, I was reading databases and thinking database
clusters.

>
>> Not sure where the mssql databases fit into this?
> our corporate partners use them. when i need to query against them, i import.
>
>>> 2. what happens if master-slave are rebooted at different times?
>>>
>>> 3. i also need to auto-promote slave to master if master fails (without
>>> using repmgr or postgres-r or even postgres-xl). how?
>> Answers for 2 & 3 are dependent on answers to the above questions.

For failover see:

http://www.postgresql.org/docs/9.3/interactive/warm-standby-failover.html

"PostgreSQL does not provide the system software required to identify a
failure on the primary and notify the standby database server. Many such
tools exist and are well integrated with the operating system facilities
required for successful failover, such as IP address migration."

So if you are looking for auto-promote you will need to look at third
party tools or writing your own script.
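
The promote step itself is a one-liner (data directory path hypothetical):

    pg_ctl promote -D /path/to/data

The hard part for any home-grown script is reliably deciding that the
master is really down before running it.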

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: better architecture?

From: zach cruise
Date:
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 11/20/2014 08:00 AM, zach cruise wrote:
>> combining replies for the list:
>>
>>
>> On 11/19/14, Charles Zaffery <charlesz@focusschoolsoftware.com> wrote:
>>> 2 and 3 can be covered by this:
>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>> does something similar exist for windows?
>>
>>
>> On 11/20/14, Michael Paquier <michael.paquier@gmail.com> wrote:
>>> On Thu, Nov 20, 2014 at 10:58 AM, zach cruise <zachc1980@gmail.com>
>>> wrote:
>>>> 2. what happens if master-slave are rebooted at different times?
>>> What do you mean by that? If replication is impacted?
>> eg if i were to reboot the vmware running the master in the evening,
>> and reboot the vmware running the slave in the night, how would they
>> sync up?
>
> Well it would depend on your setup and the load on the master. Assuming
> streaming replication. Simple explanation:
yes streaming replication.
>
> 1) If the master is down and slave is up then the slave will stall at
> whatever point the last WAL was sent. When the master comes back up it will
> catch up as new WALs are generated.
>
> 2) If the slave is down and the master is up, the master will keep on
> creating WALs. The issue is that WALs are recycled over time, so given a
> significant load on the master and extended downtime for the slave it is
> possible that when the slave comes back up a WAL it needs is no longer
> available and it will start throwing errors. One way to tune this is to
> modify wal_keep_segments (integer):
>
> http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html
while the WAL archive directory has to be shared with both master and
slave, should the WAL archive directory be independent of them, i.e.
should it not go down with either of them? if it has to go down with
one, it seems it'd be best for the WAL archive directory to go down with
slave?


>> On 11/20/14, Adrian Klaver wrote:
>>> What replication method are you using?
>>> The built-in methods, Slony, Bucardo, etc.?
>> built in
>>
>>> The production users cannot enter or update records?
>> they can't. slave is read-only.
>>
>>> If you have replication set up master -> slave, how can there be a
>>> difference between the two?
>> there isn't. both contain dev and prod databases. users connect to the
>> dev databases from the dev web server, and to the prod databases from
>> the prod web server.
>
> Crossed wires on my part, I was reading databases and thinking database
> clusters.
>
>>
>>> Not sure where the mssql databases fit into this?
>> our corporate partners use them. when i need to query against them, i
>> import.
>>
>>>> 2. what happens if master-slave are rebooted at different times?
>>>>
>>>> 3. i also need to auto-promote slave to master if master fails (without
>>>> using repmgr or postgres-r or even postgres-xl). how?
>>> Answers for 2 & 3 are dependent on answers to the above questions.
>
> For failover see:
>
> http://www.postgresql.org/docs/9.3/interactive/warm-standby-failover.html
>
> "PostgreSQL does not provide the system software required to identify a
> failure on the primary and notify the standby database server. Many such
> tools exist and are well integrated with the operating system facilities
> required for successful failover, such as IP address migration."
>
> So if you are looking for auto-promote you will need to look at third
> party tools or writing your own script.
while i can always use "pg_ctl promote", any recommendations for windows?


Re: better architecture?

From: Adrian Klaver
Date:
On 11/20/2014 11:02 AM, zach cruise wrote:
> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> On 11/20/2014 08:00 AM, zach cruise wrote:
>>> combining replies for the list:
>>>

>>
>> Well it would depend on your setup and the load on the master. Assuming
>> streaming replication. Simple explanation:
> yes streaming replication.
>>
>> 1) If the master is down and slave is up then the slave will stall at
>> whatever point the last WAL was sent. When the master comes back up it will
>> catch up as new WALs are generated.
>>
>> 2) If the slave is down and the master is up, the master will keep on
>> creating WALs. The issue is that WALs are recycled over time, so given a
>> significant load on the master and extended downtime for the slave it is
>> possible that when the slave comes back up a WAL it needs is no longer
>> available and it will start throwing errors. One way to tune this is to
>> modify wal_keep_segments (integer):
>>
>> http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html
> while the WAL archive directory has to be shared with both master and
> slave, should the WAL archive directory be independent of them, i.e.
> should it not go down with either of them? if it has to go down with
> one, it seems it'd be best for the WAL archive directory to go down with
> slave?
>


So I am to understand that you have WAL archiving set up also?

Again a simplified version:

The ideal situation is you have a third machine that has the WAL
archives. The issue is that if the master cannot archive a WAL it will
keep it around until it can. So depending on load and outage you can end up
with disk space issues on the master should it not be able to clear the WALs.

For more info see:

http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
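
As a sketch, the master side of that could look like this in
postgresql.conf (the share name is hypothetical; the command must return
0 only on success):

    archive_mode = on
    archive_command = 'copy "%p" "\\\\archive-vm\\wal_archive\\%f"'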


>

>>
>> For failover see:
>>
>> http://www.postgresql.org/docs/9.3/interactive/warm-standby-failover.html
>>
>> "PostgreSQL does not provide the system software required to identify a
>> failure on the primary and notify the standby database server. Many such
>> tools exist and are well integrated with the operating system facilities
>> required for successful failover, such as IP address migration."
>>
>> So if you are looking for auto-promote you will need to look at third
>> party tools or writing your own script.
> while i can always use "pg_ctl promote", any recommendations for windows?

Not from me, I do not run Postgres on Windows so I will be of no help there.

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: better architecture?

From: zach cruise
Date:
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 11/20/2014 11:02 AM, zach cruise wrote:
>> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> On 11/20/2014 08:00 AM, zach cruise wrote:
>>>> combining replies for the list:
>>>>
>
>>> Well it would depend on your setup and the load on the master. Assuming
>>> streaming replication. Simple explanation:
>> yes streaming replication.
>>>
>>> 1) If the master is down and slave is up then the slave will stall at
>>> whatever point the last WAL was sent. When the master comes back up it will
>>> catch up as new WALs are generated.
>>>
>>> 2) If the slave is down and the master is up, the master will keep on
>>> creating WALs. The issue is that WALs are recycled over time, so given a
>>> significant load on the master and extended downtime for the slave it is
>>> possible that when the slave comes back up a WAL it needs is no longer
>>> available and it will start throwing errors. One way to tune this is to
>>> modify wal_keep_segments (integer):
>>>
>>> http://www.postgresql.org/docs/9.3/interactive/runtime-config-replication.html
>> while the WAL archive directory has to be shared with both master and
>> slave, should the WAL archive directory be independent of them, i.e.
>> should it not go down with either of them? if it has to go down with
>> one, it seems it'd be best for the WAL archive directory to go down with
>> slave?
>
> So I am to understand that you have WAL archiving set up also?
yes, slave gets updates from the stream and from the WAL archive. if
either fails, the other will keep it updated.

> Again a simplified version:
>
> The ideal situation is you have a third machine that has the WAL
> archives. The issue is that if the master cannot archive a WAL it will
> keep it around until it can. So depending on load and outage you can end up
> with disk space issues on the master should it not be able to clear the WALs.
>
> For more info see:
>
> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
(prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup - 3.
archive (wal)"}.

but what do i gain?

as it is, in the worst case, VMs can always be restored "fairly
quickly" for our use.

>>> For failover see:
>>>
>>> http://www.postgresql.org/docs/9.3/interactive/warm-standby-failover.html
>>>
>>> "PostgreSQL does not provide the system software required to identify a
>>> failure on the primary and notify the standby database server. Many such
>>> tools exist and are well integrated with the operating system facilities
>>> required for successful failover, such as IP address migration."
>>>
>>> So if you are looking for auto-promote you will need to look at third
>>> party tools or writing your own script.
>> while i can always use "pg_ctl promote", any recommendations for windows?
>
> Not from me, I do not run Postgres on Windows so I will be of no help
> there.


Re: better architecture?

From: Adrian Klaver
Date:
On 11/20/2014 12:30 PM, zach cruise wrote:

>>
>> For more info see:
>>
>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
> to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
> (prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup - 3.
> archive (wal)"}.
>
> but what do i gain?

Extra protection against failure, maybe.

So:


           +---> WAL Archive ---+
           |                    |
           |     Streaming      v
master ----+------------------> slave


If the direct link between the master and slave goes down, the slave can
still get WALs from the archive. If the archive machine goes down you
still have the direct link. If you take the slave down the master can
still push WALs to the archive. This assumes the 'machines' are actually
separated and connecting through different networks. You say you are
using VMs, but not where they are running. If they are all running on
the same machine running through the same network link then you really
do not have protection against network issues. The same if the host
machine goes down. This is one of those pen and paper times, when you
sketch out the arrangement and start doing what ifs.


>
> as it is, in the worst case, VMs can always be restored "fairly
> quickly" for our use.
>
>>>> For failover see:
>>>>
>>>> http://www.postgresql.org/docs/9.3/interactive/warm-standby-failover.html
>>>>
>>>> "PostgreSQL does not provide the system software required to identify a
>>>> failure on the primary and notify the standby database server. Many such
>>>> tools exist and are well integrated with the operating system facilities
>>>> required for successful failover, such as IP address migration."
>>>>
>>>> So if you are looking for auto-promote you will need to look at third
>>>> party tools or writing your own script.
>>> while i can always use "pg_ctl promote", any recommendations for windows?
>>
>> Not from me, I do not run Postgres on Windows so I will be of no help
>> there.
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: better architecture?

From: zach cruise
Date:
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 11/20/2014 12:30 PM, zach cruise wrote:
>>>
>>> For more info see:
>>>
>>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>> to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
>> (prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup - 3.
>> archive (wal)"}.
>>
>> but what do i gain?
>
> Extra protection against failure, maybe.
>
> So:
>
>            +---> WAL Archive ---+
>            |                    |
>            |     Streaming      v
> master ----+------------------> slave
>
> If the direct link between the master and slave goes down, the slave can
> still get WALs from the archive. If the archive machine goes down you
> still have the direct link. If you take the slave down the master can
> still push WALs to the archive. This assumes the 'machines' are actually
> separated and connecting through different networks. You say you are
> using VMs, but not where they are running. If they are all running on
> the same machine running through the same network link then you really
> do not have protection against network issues. The same if the host
> machine goes down. This is one of those pen and paper times, when you
> sketch out the arrangement and start doing what ifs.

master, slave and archive can be 3 separate VMs on 1 host, with their
clones on 2nd and 3rd hosts.

a follow-up question on WAL recycling: ("When WAL archiving is being
done, the log segments must be archived before being recycled or
removed" from http://www.postgresql.org/docs/9.3/static/wal-configuration.html)

say streaming is off-
* if both master and archive are down, slave is still up and running. yes?
* if master writes data when archive is down, it will copy over to
slave when archive is back up. yes?
* but if WAL is recycled before archive is back up, it will not copy
over to slave. yes?
see, my concern with a separate archive is: if archive is down and
master gets stuck retrying to push the same segment again and again,
there may be a problem in recovery when archive is back up. no?


Re: better architecture?

From: Adrian Klaver
Date:
On 11/20/2014 04:57 PM, zach cruise wrote:
> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> On 11/20/2014 12:30 PM, zach cruise wrote:
>>>>
>>>> For more info see:
>>>>
>>>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>>> to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
>>> (prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup - 3.
>>> archive (wal)"}.
>>>
>>> but what do i gain?
>>
>> Extra protection against failure, maybe.
>>
>> So:
>>
>>            +---> WAL Archive ---+
>>            |                    |
>>            |     Streaming      v
>> master ----+------------------> slave
>>
>> If the direct link between the master and slave goes down, the slave can
>> still get WALs from the archive. If the archive machine goes down you
>> still have the direct link. If you take the slave down the master can
>> still push WALs to the archive. This assumes the 'machines' are actually
>> separated and connecting through different networks. You say you are
>> using VMs, but not where they are running. If they are all running on
>> the same machine running through the same network link then you really
>> do not have protection against network issues. The same if the host
>> machine goes down. This is one of those pen and paper times, when you
>> sketch out the arrangement and start doing what ifs.

First of all, the below is really in need of a whiteboard/paper diagram to
keep track of the moving parts. That being said, here it goes:

>
> master, slave and archive can be 3 separate VMs on 1 host, with their
> clones on 2nd and 3rd hosts.

I can see the above being a potential nightmare. I am not sure how you
ensure that the cloning process results in clones that exactly mirror
the state of the originals at a particular point in time. Failing in
that would seem to me to lead to no end of issues in the replication
process.

>
> a follow-up question on WAL recycling: ("When WAL archiving is being
> done, the log segments must be archived before being recycled or
> removed" from http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
>
> say streaming is off-
> * if both master and archive are down, slave is still up and running. yes?

Yes.

> * if master writes data when archive is down, it will copy over to
> slave when archive is back up. yes?

If streaming is off and you are doing archiving then it will copy over
to the archive.

> * but if WAL is recycled before archive is back up, it will not copy
> over to slave. yes?

The issue here as pointed out previously is that the WALs will stack up
on the master because it will not be able to archive them. So then you
run into a potential out-of-space issue on the master. From here:

http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html

"It is important that the archive command return zero exit status if and
only if it succeeds. Upon getting a zero result, PostgreSQL will assume
that the file has been successfully archived, and will remove or recycle
it. However, a nonzero status tells PostgreSQL that the file was not
archived; it will try again periodically until it succeeds."

...

"While designing your archiving setup, consider what will happen if the
archive command fails repeatedly because some aspect requires operator
intervention or the archive runs out of space. For example, this could
occur if you write to tape without an autochanger; when the tape fills,
nothing further can be archived until the tape is swapped. You should
ensure that any error condition or request to a human operator is
reported appropriately so that the situation can be resolved reasonably
quickly. The pg_xlog/ directory will continue to fill with WAL segment
files until the situation is resolved. (If the file system containing
pg_xlog/ fills up, PostgreSQL will do a PANIC shutdown. No committed
transactions will be lost, but the database will remain offline until
you free some space.)"


> see my concern with a separate archive is if archive is down and
> master gets stuck retrying to push the same segment again and again,
> there may be a problem in recovery when archive is back up. no?

See above.

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: better architecture?

From: zach cruise
Date:
On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 11/20/2014 04:57 PM, zach cruise wrote:
>> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> On 11/20/2014 12:30 PM, zach cruise wrote:
>>>>>
>>>>> For more info see:
>>>>>
>>>>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>>>> to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
>>>> (prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup - 3.
>>>> archive (wal)"}.
>>>>
>>>> but what do i gain?
>>>
>>> Extra protection against failure, maybe.
>>>
>>> So:
>>>
>>>            +---> WAL Archive ---+
>>>            |                    |
>>>            |     Streaming      v
>>> master ----+------------------> slave
>>>
>>> If the direct link between the master and slave goes down, the slave can
>>> still get WALs from the archive. If the archive machine goes down you
>>> still have the direct link. If you take the slave down the master can
>>> still push WALs to the archive. This assumes the 'machines' are actually
>>> separated and connecting through different networks. You say you are
>>> using VMs, but not where they are running. If they are all running on
>>> the same machine running through the same network link then you really
>>> do not have protection against network issues. The same if the host
>>> machine goes down. This is one of those pen and paper times, when you
>>> sketch out the arrangement and start doing what ifs.
>
> First of all, the below is really in need of whiteboard/paper diagram to
> keep track of the moving parts. That being said here it goes:
>
>>
>> master, slave and archive can be 3 separate VMs on 1 host, with their
>> clones on 2nd and 3rd hosts.
>
> I can see the above being a potential nightmare. I am not sure how you
> ensure that the cloning process results in clones that exactly mirror
> the state of the originals at a particular point in time. Failing in
> that would seem to me to lead to no end of issues in the replication
> process.
>
>>
>> a follow-up question on WAL recycling: ("When WAL archiving is being
>> done, the log segments must be archived before being recycled or
>> removed" from
>> http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
>>
>> say streaming is off-
>> * if both master and archive are down, slave is still up and running.
>> yes?
>
> Yes.
>
>> * if master writes data when archive is down, it will copy over to
>> slave when archive is back up. yes?
>
> If streaming is off and you are doing archiving then it will copy over
> to the archive.
>
>> * but if WAL is recycled before archive is back up, it will not copy
>> over to slave. yes?
>
> The issue here as pointed out previously is that the WALs will stack up
> on the master because it will not be able to archive them. So then you
> run into a potential out-of-space issue on the master. From here:
>
> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>
> "It is important that the archive command return zero exit status if and
> only if it succeeds. Upon getting a zero result, PostgreSQL will assume
> that the file has been successfully archived, and will remove or recycle
> it. However, a nonzero status tells PostgreSQL that the file was not
> archived; it will try again periodically until it succeeds."
>
> ...
>
> "While designing your archiving setup, consider what will happen if the
> archive command fails repeatedly because some aspect requires operator
> intervention or the archive runs out of space. For example, this could
> occur if you write to tape without an autochanger; when the tape fills,
> nothing further can be archived until the tape is swapped. You should
> ensure that any error condition or request to a human operator is
> reported appropriately so that the situation can be resolved reasonably
> quickly. The pg_xlog/ directory will continue to fill with WAL segment
> files until the situation is resolved. (If the file system containing
> pg_xlog/ fills up, PostgreSQL will do a PANIC shutdown. No committed
> transactions will be lost, but the database will remain offline until
> you free some space.)"
>
>
>> see my concern with a separate archive is if archive is down and
>> master gets stuck retrying to push the same segment again and again,
>> there may be a problem in recovery when archive is back up. no?
>
> See above.

also streaming is working (checked with wireshark) and WAL is being
written to by master, but if i comment out streaming in pg_hba.conf,
WAL is not copying over to slave?!

how can i verify that WAL archiving is working for slave?


Re: better architecture?

From: Adrian Klaver
Date:
On 11/21/2014 07:38 PM, zach cruise wrote:
> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> On 11/20/2014 04:57 PM, zach cruise wrote:
>>> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>>> On 11/20/2014 12:30 PM, zach cruise wrote:
>>>>>>
>>>>>> For more info see:
>>>>>>
>>>>>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>>>>> to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
>>>>> (prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup - 3.
>>>>> archive (wal)"}.
>>>>>
>>>>> but what do i gain?
>>>>
>>>> Extra protection against failure, maybe.
>>>>
>>>> So:
>>>>
>>>>            +---> WAL Archive ---+
>>>>            |                    |
>>>>            |     Streaming      v
>>>> master ----+------------------> slave
>>>>
>>>> If the direct link between the master and slave goes down, the slave can
>>>> still get WALs from the archive. If the archive machine goes down you
>>>> still have the direct link. If you take the slave down the master can
>>>> still push WALs to the archive. This assumes the 'machines' are actually
>>>> separated and connecting through different networks. You say you are
>>>> using VMs, but not where they are running. If they are all running on
>>>> the same machine running through the same network link then you really
>>>> do not have protection against network issues. The same if the host
>>>> machine goes down. This is one of those pen and paper times, when you
>>>> sketch out the arrangement and start doing what ifs.
>>
>> First of all, the below is really in need of whiteboard/paper diagram to
>> keep track of the moving parts. That being said here it goes:
>>
>>>
>>> master, slave and archive can be 3 separate VMs on 1 host, with their
>>> clones on 2nd and 3rd hosts.
>>
>> I can see the above being a potential nightmare. I am not sure how you
>> ensure that the cloning process results in clones that exactly mirror
>> the state of the originals at a particular point in time. Failing in
>> that would seem to me to lead to no end of issues in the replication
>> process.
>>
>>>
>>> a follow-up question on WAL recycling: ("When WAL archiving is being
>>> done, the log segments must be archived before being recycled or
>>> removed" from
>>> http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
>>>
>>> say streaming is off-
>>> * if both master and archive are down, slave is still up and running.
>>> yes?
>>
>> Yes.
>>
>>> * if master writes data when archive is down, it will copy over to
>>> slave when archive is back up. yes?
>>
>> If streaming is off and you are doing archiving then it will copy over
>> to the archive.
>>
>>> * but if WAL is recycled before archive is back up, it will not copy
>>> over to slave. yes?
>>
>> The issue here as pointed out previously is that the WALs will stack up
>> on the master because it will not be able to archive them. So then you
>> run into a potential out-of-space issue on the master. From here:
>>
>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>>
>> "It is important that the archive command return zero exit status if and
>> only if it succeeds. Upon getting a zero result, PostgreSQL will assume
>> that the file has been successfully archived, and will remove or recycle
>> it. However, a nonzero status tells PostgreSQL that the file was not
>> archived; it will try again periodically until it succeeds."
>>
>> ...
>>
>> "While designing your archiving setup, consider what will happen if the
>> archive command fails repeatedly because some aspect requires operator
>> intervention or the archive runs out of space. For example, this could
>> occur if you write to tape without an autochanger; when the tape fills,
>> nothing further can be archived until the tape is swapped. You should
>> ensure that any error condition or request to a human operator is
>> reported appropriately so that the situation can be resolved reasonably
>> quickly. The pg_xlog/ directory will continue to fill with WAL segment
>> files until the situation is resolved. (If the file system containing
>> pg_xlog/ fills up, PostgreSQL will do a PANIC shutdown. No committed
>> transactions will be lost, but the database will remain offline until
>> you free some space.)"
>>
>>
>>> see my concern with a separate archive is if archive is down and
>>> master gets stuck retrying to push the same segment again and again,
>>> there may be a problem in recovery when archive is back up. no?
>>
>> See above.
>
> also streaming is working (checked with wireshark) and WAL is being
> written to by master, but if i comment out streaming in pg_hba.conf,
> WAL is not copying over to slave?!

Do you have a restore_command set on the slave?:

http://www.postgresql.org/docs/9.3/static/archive-recovery-settings.html#RESTORE-COMMAND

per the docs here:

http://www.postgresql.org/docs/9.3/static/standby-settings.html

standby_mode (boolean)

     Specifies whether to start the PostgreSQL server as a standby. If
this parameter is on, the server will not stop recovery when the end of
archived WAL is reached, but will keep trying to continue recovery by
fetching new WAL segments using restore_command and/or by connecting to
the primary server as specified by the primary_conninfo setting.
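
Putting those together, a minimal recovery.conf on the slave might look
like this (connection settings and share name hypothetical):

    standby_mode = 'on'
    primary_conninfo = 'host=master-vm port=5432 user=replicator'
    restore_command = 'copy "\\\\archive-vm\\wal_archive\\%f" "%p"'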


>
> how can i verify that WAL archiving is working for slave?

Look in the logs or use the following functions:

http://www.postgresql.org/docs/9.3/interactive/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL
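
For example, run on the slave (all of these exist in 9.3):

    SELECT pg_is_in_recovery();               -- t while running as a standby
    SELECT pg_last_xlog_receive_location();   -- last WAL location received from the master
    SELECT pg_last_xlog_replay_location();    -- last WAL location replayed
    SELECT pg_last_xact_replay_timestamp();   -- commit time of the last replayed transaction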

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: better architecture?

From: zach cruise
Date:
On 11/22/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 11/21/2014 07:38 PM, zach cruise wrote:
>> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> On 11/20/2014 04:57 PM, zach cruise wrote:
>>>> On 11/20/14, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>>>> On 11/20/2014 12:30 PM, zach cruise wrote:
>>>>>>>
>>>>>>> For more info see:
>>>>>>>
>>>>>>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>>>>>> to be clear- i change my 2 VMs setup {"1. master (dev) - 2. slave
>>>>>> (prod) setup"} to 3 VMs {"1. master (dev) - 2. slave (prod) setup -
>>>>>> 3.
>>>>>> archive (wal)"}.
>>>>>>
>>>>>> but what do i gain?
>>>>>
>>>>> Extra protection against failure, maybe.
>>>>>
>>>>> So:
>>>>>
>>>>>            +---> WAL Archive ---+
>>>>>            |                    |
>>>>>            |     Streaming      v
>>>>> master ----+------------------> slave
>>>>>
>>>>> If the direct link between the master and slave goes down, the slave
>>>>> can
>>>>> still get WALs from the archive. If the archive machine goes down you
>>>>> still have the direct link. If you take the slave down the master can
>>>>> still push WALs to the archive. This assumes the 'machines' are
>>>>> actually
>>>>> separated and connecting through different networks. You say you are
>>>>> using VMs, but not where they are running. If they are all running on
>>>>> the same machine running through the same network link then you really
>>>>> do not have protection against network issues. The same if the host
>>>>> machine goes down. This is one of those pen and paper times, when you
>>>>> sketch out the arrangement and start doing what ifs.
>>>
>>> First of all, the below is really in need of whiteboard/paper diagram to
>>> keep track of the moving parts. That being said here it goes:
>>>
>>>>
>>>> master, slave and archive can be 3 separate VMs on 1 host, with their
>>>> clones on 2nd and 3rd hosts.
>>>
>>> I can see the above being a potential nightmare. I am not sure how you
>>> ensure that the cloning process results in clones that exactly mirror
>>> the state of the originals at a particular point in time. Failing in
>>> that would seem to me to lead to no end of issues in the replication
>>> process.
>>>
>>>>
>>>> a follow-up question on WAL recycling: ("When WAL archiving is being
>>>> done, the log segments must be archived before being recycled or
>>>> removed" from
>>>> http://www.postgresql.org/docs/9.3/static/wal-configuration.html)
>>>>
>>>> say streaming is off-
>>>> * if both master and archive are down, slave is still up and running.
>>>> yes?
>>>
>>> Yes.
>>>
>>>> * if master writes data when archive is down, it will copy over to
>>>> slave when archive is back up. yes?
>>>
>>> If streaming is off and you are doing archiving then it will copy over
>>> to the archive.
>>>
>>>> * but if WAL is recycled before archive is back up, it will not copy
>>>> over to slave. yes?
>>>
>>> The issue here as pointed out previously is that the WALs will stack up
>>> on the master because it will not be able to archive them. So then you
>>> run into a potential out-of-space issue on the master. From here:
>>>
>>> http://www.postgresql.org/docs/9.3/interactive/continuous-archiving.html
>>>
>>> "It is important that the archive command return zero exit status if and
>>> only if it succeeds. Upon getting a zero result, PostgreSQL will assume
>>> that the file has been successfully archived, and will remove or recycle
>>> it. However, a nonzero status tells PostgreSQL that the file was not
>>> archived; it will try again periodically until it succeeds."
>>>
>>> ...
>>>
>>> "While designing your archiving setup, consider what will happen if the
>>> archive command fails repeatedly because some aspect requires operator
>>> intervention or the archive runs out of space. For example, this could
>>> occur if you write to tape without an autochanger; when the tape fills,
>>> nothing further can be archived until the tape is swapped. You should
>>> ensure that any error condition or request to a human operator is
>>> reported appropriately so that the situation can be resolved reasonably
>>> quickly. The pg_xlog/ directory will continue to fill with WAL segment
>>> files until the situation is resolved. (If the file system containing
>>> pg_xlog/ fills up, PostgreSQL will do a PANIC shutdown. No committed
>>> transactions will be lost, but the database will remain offline until
>>> you free some space.)"
>>>
>>>
>>>> see my concern with a separate archive is if archive is down and
>>>> master gets stuck retrying to push the same segment again and again,
>>>> there may be a problem in recovery when archive is back up. no?
>>>
>>> See above.
>>
>> also streaming is working (checked with wireshark) and WAL is being
>> written to by master, but if i comment out streaming in pg_hba.conf,
>> WAL is not copying over to slave?!
>
> Do you have a restore_command set on the slave?:
>
> http://www.postgresql.org/docs/9.3/static/archive-recovery-settings.html#RESTORE-COMMAND

yes (recovery.conf on slave: restore_command = 'copy "\\path\\%f" "%p"')

> per the docs here:
>
> http://www.postgresql.org/docs/9.3/static/standby-settings.html
>
> standby_mode (boolean)
>
>      Specifies whether to start the PostgreSQL server as a standby. If
> this parameter is on, the server will not stop recovery when the end of
> archived WAL is reached, but will keep trying to continue recovery by
> fetching new WAL segments using restore_command and/or by connecting to
> the primary server as specified by the primary_conninfo setting.

yes (recovery.conf on slave: standby_mode = 'on')

>> how can i verify that WAL archiving is working for slave?
>
> Look in the logs or use the following functions:
>
> http://www.postgresql.org/docs/9.3/interactive/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL

here're my outputs:
pg_is_in_recovery() --> t
pg_last_xlog_receive_location() --> "5/D..."
pg_last_xlog_replay_location() --> "5/D..."
pg_last_xact_replay_timestamp() --> "2014-..."

pg_is_xlog_replay_paused() --> f
pg_xlog_replay_pause()
pg_xlog_replay_resume()
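
fwiw, a rough replay-lag check built from the same functions (only
meaningful while master is taking writes):

    SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;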