I am currently running PostgreSQL 9.2.1 with streaming replication: one primary, one standby. Once an hour, a job compares pg_current_xlog_location on the primary against pg_last_xlog_replay_location on the standby to ensure the standby is not lagging too far behind the primary. So far everything is working great.
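To illustrate the kind of comparison the hourly job performs, here is a minimal Python sketch, not the actual job. The function names and the 16 MB threshold are hypothetical. It assumes the two location strings (in the `X/Y` hexadecimal form those functions return) have already been fetched from each server, and it assumes the pre-9.3 WAL layout, in which each logical xlog id spans 0xFF000000 bytes rather than a full 2^32.

```python
# Assumption: pre-9.3 servers use 255 segments of 16 MB per logical xlog id,
# i.e. 0xFF000000 bytes. Pass 2**32 instead for 9.3 and later.
PRE_93_BYTES_PER_LOGID = 0xFF000000


def xlog_location_to_bytes(location, bytes_per_logid=PRE_93_BYTES_PER_LOGID):
    """Convert a location string such as '2/1A000000' to an absolute byte position."""
    logid_hex, offset_hex = location.strip().split("/")
    return int(logid_hex, 16) * bytes_per_logid + int(offset_hex, 16)


def replication_lag_bytes(primary_location, standby_location,
                          bytes_per_logid=PRE_93_BYTES_PER_LOGID):
    """Bytes of WAL the standby has not yet replayed (primary minus standby)."""
    return (xlog_location_to_bytes(primary_location, bytes_per_logid)
            - xlog_location_to_bytes(standby_location, bytes_per_logid))


def standby_is_lagging(primary_location, standby_location,
                       max_lag_bytes=16 * 1024 * 1024):
    """True if the standby is more than max_lag_bytes behind (threshold is illustrative)."""
    return replication_lag_bytes(primary_location, standby_location) > max_lag_bytes
```

For example, `standby_is_lagging("0/3000000", "0/3000000")` is false because both servers report the same position. A check like this only confirms that the standby has replayed the same WAL; it says nothing about whether the files on disk are byte-for-byte identical.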
I noticed, however, that although the cluster is consistently in sync, the md5sums and modification timestamps of many of my data files differ between the primary and the standby. For example:
I am curious about this because, when both systems are healthy and I wish to swap primaries, I will bring the primary and the standby down and do a full rsync of the data/ directory from the old primary to the new primary. However, because the data files differ, the rsync run takes a very long time.
My questions are: 1) While the xlog location between primary and standby remains consistent, are the data files, internally, structured differently between primary and standby? 2) Is this expected, and if so, what causes them to diverge?
Thanks in advance for helping me understand this behavior!