Thread: replication recovery/startup question

replication recovery/startup question

From
Rob Cowell
Date:

Hi,

 

I started replicating my Postgres9.1.3 server on Friday, and it seems to be working well.

The master server database is quite large (somewhere in the order of 85GB) so the rsync to copy the base_backup took a while to complete.

 

However, looking at the log files and ‘ps’ I’m wondering if the replication has caught up or is still catching up.

 

Master :

postgres  3964  3926  0 Jun21 ?        00:00:32 postgres: archiver process   last was 000000010000003D00000055

postgres 15080  3926  0 Jun22 ?        00:02:39 postgres: wal sender process dbrepl xxx.xxx.xxx.xxx(46496) streaming 3D/560752D0

 

Slave :

postgres  5048  5040  0 Jun22 ?        00:11:37 postgres: startup process   recovering 000000010000003D00000056

postgres  6714  5040  0 Jun22 ?        00:06:24 postgres: wal receiver process   streaming 3D/560752D0

 

The log files in ~data/pg_xlog show :

 

ls -lart

<list truncated>

-rw-------  1 postgres postgres 16777216 Jun 25 13:36 000000010000003F0000004C

-rw-------  1 postgres postgres 16777216 Jun 25 13:36 000000010000003F0000004D

-rw-------  1 postgres postgres 16777216 Jun 25 13:37 000000010000003F00000050

-rw-------  1 postgres postgres 16777216 Jun 25 13:46 000000010000003F00000052

-rw-------  1 postgres postgres 16777216 Jun 25 14:24 000000010000003F00000053

-rw-------  1 postgres postgres 16777216 Jun 25 14:50 000000010000003F00000054

-rw-------  1 postgres postgres 16777216 Jun 25 14:59 000000010000003F00000055

-rw-------  1 postgres postgres 16777216 Jun 25 15:03 000000010000003F00000056

-rw-------  1 postgres postgres 16777216 Jun 25 15:06 000000010000003F00000057

-rw-------  1 postgres postgres 16777216 Jun 25 15:10 000000010000003F00000058

drwx------  4 postgres postgres    69632 Jun 25 15:24 .

-rw-------  1 postgres postgres 16777216 Jun 25 15:29 000000010000003D00000055

-rw-------  1 postgres postgres 16777216 Jun 25 15:30 000000010000003D00000056

 

 

Why would the output from ‘ls’ show older filenames (0....1....3D...xx) as newer in date than the “0....1....3F...xx” filenames?

Does Postgres re-cycle old log filenames ?

 

Does the output from ‘ps’ mean the master/slave are in sync, or is the slave really still playing catchup (based on the names of the logfiles in pg_xlog) ?

 

Cheers,

Rob.

Re: replication recovery/startup question

From
Greg Smith
Date:
On 06/25/2012 11:40 AM, Rob Cowell wrote:
> Why would the output from ‘ls’ show older filenames (0....1....3D...xx)
> as newer in date than the “0....1....3F...xx” filenames?
>
> Does Postgres re-cycle old log filenames ?

It recycles old log files.  If you turn on log_checkpoints, you can see
how many and how often.  It will list a count of recycled WAL files at
each checkpoint, along with how many of the old ones were just deleted
instead.

The weird pattern in the timestamps you're seeing is a state in the
middle of doing that, and yes they look quite weird sometimes.  The
files are noteed as reusable, get re-initialized to hold new data
(they're not overwritten completely with zeros like new WAL files are),
and renamed to a new segment number.  And each of those steps has a
corresponding flush to disk step which makes sure the filesystem
metadata is updated.  Some of the middle states there are unusual.

> Does the output from ‘ps’ mean the master/slave are in sync, or is the
> slave really still playing catchup (based on the names of the logfiles
> in pg_xlog) ?

Your example was in sync, with the file names just being odd due to the
implementation of WAL file recycling.  You might also check
pg_stat_replication to get an easier view of things, rather than relying
on ps.  ps is correct, it's just harder to check.


--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com