Thread: replication recovery/startup question
Hi,
I started replicating my Postgres9.1.3 server on Friday, and it seems to be working well.
The master server database is quite large (somewhere in the order of 85GB) so the rsync to copy the base_backup took a while to complete.
However, looking at the log files and ‘ps’ I’m wondering if the replication has caught up or is still catching up.
Master :
postgres 3964 3926 0 Jun21 ? 00:00:32 postgres: archiver process last was 000000010000003D00000055
postgres 15080 3926 0 Jun22 ? 00:02:39 postgres: wal sender process dbrepl xxx.xxx.xxx.xxx(46496) streaming 3D/560752D0
Slave :
postgres 5048 5040 0 Jun22 ? 00:11:37 postgres: startup process recovering 000000010000003D00000056
postgres 6714 5040 0 Jun22 ? 00:06:24 postgres: wal receiver process streaming 3D/560752D0
The log files in ~data/pg_xlog show :
ls -lart
<list truncated>
-rw------- 1 postgres postgres 16777216 Jun 25 13:36 000000010000003F0000004C
-rw------- 1 postgres postgres 16777216 Jun 25 13:36 000000010000003F0000004D
-rw------- 1 postgres postgres 16777216 Jun 25 13:37 000000010000003F00000050
-rw------- 1 postgres postgres 16777216 Jun 25 13:46 000000010000003F00000052
-rw------- 1 postgres postgres 16777216 Jun 25 14:24 000000010000003F00000053
-rw------- 1 postgres postgres 16777216 Jun 25 14:50 000000010000003F00000054
-rw------- 1 postgres postgres 16777216 Jun 25 14:59 000000010000003F00000055
-rw------- 1 postgres postgres 16777216 Jun 25 15:03 000000010000003F00000056
-rw------- 1 postgres postgres 16777216 Jun 25 15:06 000000010000003F00000057
-rw------- 1 postgres postgres 16777216 Jun 25 15:10 000000010000003F00000058
drwx------ 4 postgres postgres 69632 Jun 25 15:24 .
-rw------- 1 postgres postgres 16777216 Jun 25 15:29 000000010000003D00000055
-rw------- 1 postgres postgres 16777216 Jun 25 15:30 000000010000003D00000056
Why would the output from ‘ls’ show older filenames (0....1....3D...xx) as newer in date than the “0....1....3F...xx” filenames?
Does Postgres re-cycle old log filenames ?
Does the output from ‘ps’ mean the master/slave are in sync, or is the slave really still playing catchup (based on the names of the logfiles in pg_xlog) ?
Cheers,
Rob.
On 06/25/2012 11:40 AM, Rob Cowell wrote: > Why would the output from ‘ls’ show older filenames (0....1....3D...xx) > as newer in date than the “0....1....3F...xx” filenames? > > Does Postgres re-cycle old log filenames ? It recycles old log files. If you turn on log_checkpoints, you can see how many and how often. It will list a count of recycled WAL files at each checkpoint, along with how many of the old ones were just deleted instead. The weird pattern in the timestamps you're seeing is a state in the middle of doing that, and yes they look quite weird sometimes. The files are noteed as reusable, get re-initialized to hold new data (they're not overwritten completely with zeros like new WAL files are), and renamed to a new segment number. And each of those steps has a corresponding flush to disk step which makes sure the filesystem metadata is updated. Some of the middle states there are unusual. > Does the output from ‘ps’ mean the master/slave are in sync, or is the > slave really still playing catchup (based on the names of the logfiles > in pg_xlog) ? Your example was in sync, with the file names just being odd due to the implementation of WAL file recycling. You might also check pg_stat_replication to get an easier view of things, rather than relying on ps. ps is correct, it's just harder to check. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com