Re: Streaming Replication Randomly Locking Up - Mailing list pgsql-general

From John DeSoi
Subject Re: Streaming Replication Randomly Locking Up
Date
Msg-id C9B41E27-B487-411F-A1BD-9FDC9340E5C3@pgedit.com
In response to Streaming Replication Randomly Locking Up  (Andrew Berman <rexxe98@gmail.com>)
Responses Re: Streaming Replication Randomly Locking Up  (Andrew Berman <rexxe98@gmail.com>)
List pgsql-general
On Aug 15, 2013, at 1:07 PM, Andrew Berman <rexxe98@gmail.com> wrote:

> I'm having an issue where streaming replication just randomly stops working.  I haven't been able to find anything in the logs which points to an issue, but the Postgres process shows a "waiting" status on the slave:
>
> postgres  5639  0.1 24.3 3428264 2970236 ?     Ss   Aug14   1:54 postgres: startup process   recovering 000000010000053D0000003F waiting
> postgres  5642  0.0 21.4 3428356 2613252 ?     Ss   Aug14   0:30 postgres: writer process
> postgres  5659  0.0  0.0 177524   788 ?        Ss   Aug14   0:03 postgres: stats collector process
> postgres  7159  1.2  0.1 3451360 18352 ?       Ss   Aug14  17:31 postgres: wal receiver process   streaming 549/216B3730
>
> The replication works great for days, but randomly seems to lock up and replication halts.  I verified that the two databases were out of sync with a query on both of them.  Has anyone experienced this issue before?
>
> Here are some relevant config settings:
>
> Master:
>
> wal_level = hot_standby
> checkpoint_segments = 32
> checkpoint_completion_target = 0.9
> archive_mode = on
> archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
> max_wal_senders = 2
> wal_keep_segments = 32

I recently posted about the same thing -- replication just stops after working OK for days or weeks, no errors in the logs on master or slave.

It appears I solved it by adding --timeout=30 to my rsync command. My guess was some kind of network hang and then rsync would just wait forever and never return.
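
A minimal sketch of that change, assuming it is applied to the archive_command quoted above. The host and path are taken from the quoted configuration; the rsync command actually used for this fix is not shown in the thread, so treat this line as illustrative only:

```conf
# postgresql.conf on the master (assumed: same archive_command as quoted above)
# --timeout=30 makes rsync give up if no data moves for 30 seconds
archive_command = 'rsync -a --timeout=30 %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
```

With an I/O timeout set, a stalled transfer makes rsync exit with a non-zero status instead of blocking indefinitely, and PostgreSQL retries an archive_command that fails.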

John DeSoi, Ph.D.


