streaming replication and data file consistency - Mailing list pgsql-general

From Matt Savona
Subject streaming replication and data file consistency
Msg-id CAKuu0C=uUYZUzxP6KcLNrx3KG0eKQOSg_SEMMLWpdaEufp=1tg@mail.gmail.com
List pgsql-general
Hi all,

I am currently running PostgreSQL 9.2.1 with streaming replication: one primary, one standby. Once an hour, a job compares pg_current_xlog_location() on the primary against pg_last_xlog_replay_location() on the standby to ensure the standby is not lagging too far behind. So far everything is working great.
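
For reference, the comparison the hourly job makes can be sketched roughly as follows. This is a minimal illustration, not the actual job: the function names and the 16 MB threshold are hypothetical, and the location strings are assumed to have already been fetched from each server. It also assumes the pre-9.3 convention in which each logical xlog file spans 0xFF000000 bytes; on 9.3 and later the multiplier would be 2**32. On 9.2, pg_xlog_location_diff() can do the same arithmetic server-side.

```python
# Assumption: on 9.2 and earlier, each logical xlog id covers 0xFF000000 bytes
# (255 segments of 16 MB). Pass 2**32 instead for 9.3 and later.
BYTES_PER_LOGID_92 = 0xFF000000

# Hypothetical alert threshold: one 16 MB WAL segment.
MAX_LAG_BYTES = 16 * 1024 * 1024


def parse_xlog_location(location, bytes_per_logid=BYTES_PER_LOGID_92):
    """Convert an 'X/Y' xlog location string into an absolute byte position."""
    logid_hex, offset_hex = location.strip().split("/")
    return int(logid_hex, 16) * bytes_per_logid + int(offset_hex, 16)


def replay_lag_bytes(primary_location, standby_location,
                     bytes_per_logid=BYTES_PER_LOGID_92):
    """Bytes of WAL the standby still has to replay to reach the primary."""
    return (parse_xlog_location(primary_location, bytes_per_logid)
            - parse_xlog_location(standby_location, bytes_per_logid))


def standby_is_healthy(primary_location, standby_location,
                       max_lag=MAX_LAG_BYTES):
    """True when the standby's replay lag is within the allowed threshold."""
    return replay_lag_bytes(primary_location, standby_location) <= max_lag
```

With made-up inputs, replay_lag_bytes("0/3000060", "0/3000000") gives a lag of 96 bytes, which is well under the threshold.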

I noticed, however, that even though the cluster is consistently in sync, the md5sums and modification timestamps of many of my data files differ. For example:

PRIMARY

# stat pgsql/data/base/16385/17600
  File: `pgsql/data/base/16385/17600'
  Size: 3112960         Blocks: 6080       IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 39167976    Links: 1
Access: (0600/-rw-------)  Uid: (   26/postgres)   Gid: (   26/postgres)
Access: 2012-10-22 10:05:29.314607927 -0400
Modify: 2012-10-22 09:48:03.770209170 -0400
Change: 2012-10-22 09:48:03.770209170 -0400

# md5sum pgsql/data/base/16385/17600
5fb7909ea14ab7aa9636b31df5679bd4  pgsql/data/base/16385/17600

STANDBY

# stat pgsql/data/base/16385/17600
  File: `pgsql/data/base/16385/17600'
  Size: 3112960         Blocks: 6080       IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 134229639   Links: 1
Access: (0600/-rw-------)  Uid: (   26/postgres)   Gid: (   26/postgres)
Access: 2012-10-22 10:05:25.361235742 -0400
Modify: 2012-10-22 09:50:29.674567827 -0400
Change: 2012-10-22 09:50:29.674567827 -0400

# md5sum pgsql/data/base/16385/17600
9deeb7b446c12fbb5745d4d282113d3c  pgsql/data/base/16385/17600
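
A whole-file md5sum only says that the two copies differ somewhere. A page-by-page comparison would show how widespread the differences are. The following is a minimal sketch, assuming the default 8192-byte block size (at which the 3112960-byte file above would be 380 blocks) and assuming both copies have been placed on the same machine while the servers are stopped; the function name is hypothetical.

```python
BLOCK_SIZE = 8192  # assumption: PostgreSQL's default block size for this build


def differing_pages(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the block numbers at which two copies of a relation file differ."""
    differing = []
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        block_number = 0
        while True:
            page_a = file_a.read(block_size)
            page_b = file_b.read(block_size)
            # Stop only when both files are exhausted, so a length
            # mismatch is reported as differing trailing blocks.
            if not page_a and not page_b:
                break
            if page_a != page_b:
                differing.append(block_number)
            block_number += 1
    return differing
```

Calling differing_pages() on the primary's and standby's copies of base/16385/17600 would return the list of block numbers whose contents are not byte-identical; an empty list means the files match.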

I am curious about this because when both systems are healthy and I wish to swap primaries, I bring the primary and the standby down and do a full rsync of the data/ directory from the old primary to the new primary. However, because the data files differ, the rsync run takes a very long time.

My questions are:
  1) While the xlog locations on the primary and standby remain consistent, are the data files structured differently internally on the two servers?
  2) Is this expected, and if so, what causes them to diverge?

Thanks in advance for helping me understand this behavior!

- Matt
