Re: streaming replication - pgsql 9.2.3 - Mailing list pgsql-admin

From Jason Mathis
Subject Re: streaming replication - pgsql 9.2.3
Date
Msg-id etPan.534c59b5.2eb141f2.226e@palos
Whole thread Raw
In response to Re: streaming replication - pgsql 9.2.3  (Scott Whitney <scott@journyx.com>)
List pgsql-admin
Yes, you cannot use streaming replication for PITR. Although you don’t necessarily need a standby server up all the time to do a PITR and in fact may defeat the purpose. For example if that bad thing happens, depending on the time when you found out it could have been applied to the standby server. Then you are stuck going back to just doing the PITR again. While there are some other uses this is the splitting moment of DR/HA. 

As long as you have a base backup(not a dump!) and all the wal files stored somewhere (NAS/S3/another server) you can go to any point in time. It may take some time to get there, but you can do it. 

If you can lose 24 hours of data and take nightly dumps, then you are go to good and probably do not need archiving. Although if you can’t lose more than x minutes/hours/seconds of incoming data then you probably need archiving.

hope that clears it up. 

On April 14, 2014 at 3:31:39 PM, Scott Whitney (scott@journyx.com) wrote:

Only one clarification here, and, Jason, correct me if I am wrong.

You cannot easily use a steaming replication server for PITR. Ideally you would want a WAL replay server that is set to standby (which would allow you to play forward any number of WALs to a point in time that you specify) for PITR, and a replication server for HA/failover.

In my example, I _could_ lose up to 24hrs data, as my pig _ dumps happen nightly. A "for each in/drop database" malicious (or stupid) statement would replicate in real time to my servers, and not having a standby ready to replay the WALs just prior to that point would require use of my nightly backups.

Had I also a standby server that I could instruct to replay the WALs, I could get to the proper PITR (minus whatever my archive timeout is).

So...like I said. I see no value in WAL archiving for a strictly - replication setup, but that MUST be coupled with a valid DR plan.


-------- Original message --------
From: Jason Mathis
Date:04/14/2014 4:14 PM (GMT-06:00)
To: Jim Mercer ,Scott Whitney ,Scott Whitney
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] streaming replication - pgsql 9.2.3

Just keep in mind that while wal archiving and replication sound very similar they are intended for two different things. 

Wal Archiving = backup/DR/PITR/standby
Replication = High availability/reporting 

You don’t need to set up archiving for streaming replication, although you may want to; I think scott explains that well. If something bad happens like a mistaken dml or ddl statement, that will get replicated (streaming replication works nice and fast;) and you can not go back to recover, unless you do something tricky with the keep_segments; don’t rely on that. Although, if you are even thinking about doing something tricky with the keep_segments then just setup archiving and forget about it. 

With archiving you can go back to ANY point in time. With replication you can have a new server w/data up and ready in seconds or off-load read queries for reporting. Use one, the other, or both!

Hope that explains it? Let us know if not. 

-jason 


On April 14, 2014 at 10:58:29 AM, Scott Whitney (scott@journyx.com) wrote:

My particular opinion on this sometimes causes "a bit of a stir," let us say, so take this as my opinion and my use case.

I have:
1) Production server
2) "Non-production" (demos/training) used to replicate #1
3) Offsite "internal production" used to replicate #1 and #2

So,
#2 replicates #1 (gigabit back end)
#3 replicates #2 _and_ #1 (65Mbps back end)

Each does their own individual pg_dump of each db nightly for DR/backup purposes.

I do not archive the WAL files and I feel it's more trouble than it is worth. Rather, I set:

wal_keep_segments = 500

500 on my production server gives me about 30 hours, give or take.

I know this because I watch how quickly I fill up a 16MB log over the course
of several says.

So...why keep WAL files in a replicated environment?

I only see them as useful for one of 2 reasons.

a) In theory you could do something clever and have your replicated server set up to "be behind"
a certain amount, and you could likely use those WAL files for a PITR at some point in the past
(in between your current replication time and the latest WAL file). Maybe. Don't know. Never tried.
This would be useful if you "lost" your production cluster at some point in between those 2 windows,
and you were fast enough to notice it. Maybe.

b) If your server stops replicating (out of space, can't connect, fell behind due to bandwidth are
the 3 scenarios I have run into more than once), and I get to 502 WAL files required, in order to
get my slave back to good, I then have to take #3 offline, do my pg_startbackup(), rsync, and bring it
back online in order to get replication functioning again. If I _had_ archived those files, in theory it would
eventually catch up rather than missing a current WAL file that is needed for replication.

That's my .02 on the matter. I've been running it this way for about a year, now, and it works great for me.



Hi,

I have set up streaming replication between two servers, using the various
walkthroughs the googlepedia found for me.

At this point, it certainly seems that the replication is working, and I am
able to actually blow away the slave, and have it rebuild, resync and be back
in action as a read-only source.

I am confused about the purpose and/or point of the WAL files in the context
of streaming replication.

The documentation seems to be quite insistent that the 'archive_command' should
be enabled on the master, and the 'restore_command' should be enabled on the
slave.

On the master, I made my own script which is fairly pedantic:
- copy the WAL file from $PGDATA/pg_xlog into $MASTER:$ARCHDIR
- rsync file(s) from $MASTER:$ARCHDIR to $SLAVE:$ARCHDIR
- remove copy of file in $MASTER:$ARCHDIR
(i don't have a network share between then, so, i am rsyncing)

On the slave, I made a similarly pedandic script:
- copy the file from $SLAVE:$ARCHDIR to $PGDATA/pg_xlog/RECOVERYXLOG

The script on the master runs with regularity, and i am seeing the WAL files
show up in $SLAVE:$ARCHDIR

however, on the slave side, i'm seeing some issues:

- the filename being handed to the script is sometimes for a non-existent file
- it appears to be the next file in sequence, but that filename hasn't even
been generated on the master yet.
- it appears, that if there are multiple files in $SLAVE:$ARCHDIR, they might
actually overwrite themselves when copied to pg_xlog/RECOVERYXLOG

these are the logs my script is generating, if i do a stop/start of postgres:

Apr 14 12:19:20 dt-pgsql-001 archive-slave.sh: notice: rtn=0 - cp [/export/pg-archive/00000001000000220000007F] /export/postgresql/data/main/[pg_xlog/RECOVERYXLOG]
Apr 14 12:19:20 dt-pgsql-001 archive-slave.sh: notice: rtn=0 - cp [/export/pg-archive/000000010000002200000080] /export/postgresql/data/main/[pg_xlog/RECOVERYXLOG]
Apr 14 12:19:23 dt-pgsql-001 archive-slave.sh: error: no such file[/export/pg-archive/000000010000002200000081]
Apr 14 12:19:23 dt-pgsql-001 archive-slave.sh: error: no such file[/export/pg-archive/000000010000002200000081]


am i doing something wrong?

please advise.


--
Jim Mercer     Reptilian Research      jim@reptiles.org    +1 416 410-5633
"He who dies with the most toys is nonetheless dead"


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


This transmission contains confidential and privileged information intended solely for the party identified above. If you receive this message in error, you must not use it or convey it to others. Please destroy it immediately and contact the sender at (303) 386-3955 or by return e-mail to the sender.

This transmission contains confidential and privileged information intended solely for the party identified above. If you receive this message in error, you must not use it or convey it to others. Please destroy it immediately and contact the sender at (303) 386-3955 or by return e-mail to the sender.

pgsql-admin by date:

Previous
From: Jim Mercer
Date:
Subject: Re: streaming replication - pgsql 9.2.3
Next
From: Jason Mathis
Date:
Subject: Re: streaming replication - pgsql 9.2.3