Thread: WAL segment question
Our company has two sites, a master and a disaster recovery site.
I am trying to assess whether data has been lost during a fail-over, due to the asynchronous method of transporting WAL file data between sites (we use DRBD).
Disaster recovery node start-up log:
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: record with zero length at 0/7C62F52C
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: redo done at 0/7C62F4E8
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: redo done at 0/7C62F4E8
My question is: given the above WAL segment information, can I use it to decide whether information was lost in the master database site, at the point of failure, without starting the database up?
Regards
Donald Fraser.
On Sat, Aug 04, 2007 at 12:00:52AM +0100, Donald Fraser wrote: > Our company has two sites, a master and a disaster recovery site. > I am trying to assess whether data has been lost during a fail-over, due to the asynchronous method of transporting WALfile data between sites (we use DRBD). > > Disaster recovery node start-up log: > user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: record with zero length at 0/7C62F52C > user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: redo done at 0/7C62F4E8 > > My question is: given the above WAL segment information, can I use it to decide whether information was lost in the masterdatabase site, at the point of failure, without starting the database up? I'm not sure, but you could try md5-ing the appropriate WAL file on the master and slave; if the files are identical then you shouldn't have lost any data... if they are then you probably have. -- Decibel!, aka Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Attachment
On Sat, 2007-08-04 at 00:00 +0100, Donald Fraser wrote: > Our company has two sites, a master and a disaster recovery site. > I am trying to assess whether data has been lost during a fail-over, > due to the asynchronous method of transporting WAL file data between > sites (we use DRBD). > > Disaster recovery node start-up log: > user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: record > with zero length at 0/7C62F52C > user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: redo > done at 0/7C62F4E8 > > My question is: given the above WAL segment information, can I use it > to decide whether information was lost in the master database site, at > the point of failure, without starting the database up? Currently there is no built-in means of finding out the time of the last record transferred, if the primary stays down. You need to use xlogdump to locate the time of the last commit/abort record. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Donald Fraser wrote:
DRBD is a synchronous method of transferring data - it's not asynchronous. Assuming you are using Protocol C you should find that both are always in sync. If you are using protocol B then things will be in sync so long as the remote system didn't crash, and if you are using Protocol A then things should be in sync as long as the local system didn't crash. Our company has two sites, a master and a disaster recovery site.I am trying to assess whether data has been lost during a fail-over, due to the asynchronous method of transporting WAL file data between sites (we use DRBD).Disaster recovery node start-up log:user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: record with zero length at 0/7C62F52C
user= pid=9907 timestamp=[2007-08-01 17:09:51 BST] tid= LOG: redo done at 0/7C62F4E8My question is: given the above WAL segment information, can I use it to decide whether information was lost in the master database site, at the point of failure, without starting the database up?RegardsDonald Fraser.
-- Chander Ganesan Open Technology Group, Inc. One Copley Parkway, Suite 210 Morrisville, NC 27560 919-463-0999/866-229-3386 http://www.otg-nc.com
----- Original Message ----- From: Chander Ganesan > DRBD is a synchronous method of transferring data - it's not asynchronous. Assuming you are using Protocol C you should find that both are always in sync. If you are using protocol B then things will be in sync so long as the remote system didn't crash, and if you are using Protocol A then things should be in sync as long as the local system didn't crash. Apologies if you thought I was insinuating DRBD is categorically asynchronous. But you assumed wrong, we use protocol A, because we don't have a fast WAN link < 10Mbps. According to the DRBD documentation: Protocol A: write IO is reported as completed, if it has reached local disk and local tcp send buffer. Which I interpret as: A for Asynchronous, which means I can complete writing data and start writing the next set of data before the first set of data has been synchronised at the remote site. This is the cost of our system - we are willing to loose data for improved performance of our master server. We've had one failure in 7 years that I've been at the company... Many thanks to Linux and more importantly PostgreSQL! One day when our local telecomms company stop the monopoly on leased data lines, we may upgrade our DR link to a 100Mbps or even 1Gbps link, in which case, protocol B or C will be a better solution. Unfortunately I think there's a better chance that the cost of postage will go down first.