Thread: WAL segment question

WAL segment question

From
"Donald Fraser"
Date:
Our company has two sites, a master and a disaster recovery site.
I am trying to assess whether data has been lost during a fail-over, due to the asynchronous method of transporting WAL file data between sites (we use DRBD).
 
Disaster recovery node start-up log:
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  record with zero length at 0/7C62F52C
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  redo done at 0/7C62F4E8
 
My question is: given the above WAL segment information, can I use it to decide whether information was lost in the master database site, at the point of failure, without starting the database up?
 
Regards
Donald Fraser.
 
 

Re: WAL segment question

From
Decibel!
Date:
On Sat, Aug 04, 2007 at 12:00:52AM +0100, Donald Fraser wrote:
> Our company has two sites, a master and a disaster recovery site.
> I am trying to assess whether data has been lost during a fail-over, due to the asynchronous method of transporting
WALfile data between sites (we use DRBD). 
>
> Disaster recovery node start-up log:
> user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  record with zero length at 0/7C62F52C
> user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  redo done at 0/7C62F4E8
>
> My question is: given the above WAL segment information, can I use it to decide whether information was lost in the
masterdatabase site, at the point of failure, without starting the database up? 

I'm not sure, but you could try md5-ing the appropriate WAL file on the
master and slave; if the files are identical then you shouldn't have
lost any data... if they are then you probably have.
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Attachment

Re: WAL segment question

From
"Simon Riggs"
Date:
On Sat, 2007-08-04 at 00:00 +0100, Donald Fraser wrote:
> Our company has two sites, a master and a disaster recovery site.
> I am trying to assess whether data has been lost during a fail-over,
> due to the asynchronous method of transporting WAL file data between
> sites (we use DRBD).
>
> Disaster recovery node start-up log:
> user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  record
> with zero length at 0/7C62F52C
> user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  redo
> done at 0/7C62F4E8
>
> My question is: given the above WAL segment information, can I use it
> to decide whether information was lost in the master database site, at
> the point of failure, without starting the database up?

Currently there is no built-in means of finding out the time of the last
record transferred, if the primary stays down. You need to use xlogdump
to locate the time of the last commit/abort record.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: WAL segment question

From
Chander Ganesan
Date:
Donald Fraser wrote:
Our company has two sites, a master and a disaster recovery site.
I am trying to assess whether data has been lost during a fail-over, due to the asynchronous method of transporting WAL file data between sites (we use DRBD).
 
Disaster recovery node start-up log:
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  record with zero length at 0/7C62F52C
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG:  redo done at 0/7C62F4E8
 
My question is: given the above WAL segment information, can I use it to decide whether information was lost in the master database site, at the point of failure, without starting the database up?
 
Regards
Donald Fraser.
DRBD is a synchronous method of transferring data - it's not asynchronous.  Assuming you are using Protocol C you should find that both are always in sync.  If you are using protocol B then things will be in sync so long as the remote system didn't crash, and if you are using Protocol A then things should be in sync as long as the local system didn't crash.
-- 
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC  27560
919-463-0999/866-229-3386
http://www.otg-nc.com

Re: WAL segment question

From
"Donald Fraser"
Date:
----- Original Message -----
From: Chander Ganesan
> DRBD is a synchronous method of transferring data - it's not asynchronous.
Assuming you are using Protocol C you should find that both are always in
sync.  If you are using protocol B then things will be in sync so long as
the remote system didn't crash, and if you are using Protocol A then things
should be in sync as long as the local system didn't crash.

Apologies if you thought I was insinuating DRBD is categorically
asynchronous.
But you assumed wrong, we use protocol A, because we don't have a fast WAN
link < 10Mbps.
According to the DRBD documentation:
Protocol A: write IO is reported as completed, if it has reached local disk
and local tcp send buffer.

Which I interpret as: A for Asynchronous, which means I can complete writing
data and start writing the next set of data before the first set of data has
been synchronised at the remote site.

This is the cost of our system - we are willing to loose data for improved
performance of our master server. We've had one failure in 7 years that I've
been at the company... Many thanks to Linux and more importantly PostgreSQL!

One day when our local telecomms company stop the monopoly on leased data
lines, we may upgrade our DR link to a 100Mbps or even 1Gbps link, in which
case, protocol B or C will be a better solution. Unfortunately I think
there's a better chance that the cost of postage will go down first.