Re: Streaming Replication Randomly Locking Up - Mailing list pgsql-general

From	Andrew Berman
Subject	Re: Streaming Replication Randomly Locking Up
Date	August 15, 2013 22:38:27
Msg-id	CAEVpa74hs1sK1-uM9GGvu5aqgmK+y+aPTrHz8ZvzV_krU=OT1g@mail.gmail.com
In response to	Re: Streaming Replication Randomly Locking Up (Lonni J Friedman <netllama@gmail.com>)
List	pgsql-general

Yep, that's the first thing I'm going to do.

On Thu, Aug 15, 2013 at 12:34 PM, Lonni J Friedman <netllama@gmail.com> wrote:

I'd suggest enhancing your logging to include time/datestamps for
every entry, and also the client hostname. That will help to rule
in/out those 'unexpected EOF' errors.
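A minimal Python sketch of why the extra fields help. It assumes, hypothetically, that log_line_prefix is set to '%t %h ' (timestamp plus client host or IP). The function name eof_errors_by_host and the sample addresses are illustrative only. It counts the two connection-error messages quoted on the slave in this thread per client host, so a burst coming from one application server can be told apart from trouble on the replication link.

```python
import re
from collections import Counter

# Hypothetical prefix, for illustration only:
#   log_line_prefix = '%t %h '
# which makes a client-connection line look like
#   2013-08-15 12:22:00 PDT 10.0.0.5 LOG:  unexpected EOF on client connection
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)"  # %t: date, time, zone
    r" (?P<host>\S+)"                                     # %h: client host or IP
    r" LOG:\s+(?P<msg>.*)$"
)

# The two messages reported in the slave log in this thread.
EOF_MESSAGES = (
    "unexpected EOF on client connection",
    "could not receive data from client",
)


def eof_errors_by_host(lines):
    """Count connection-error LOG lines per client host.

    Lines without a host (background processes, where %h expands to an
    empty string) do not match the pattern and are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("msg").startswith(EOF_MESSAGES):
            counts[m.group("host")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = [
        "2013-08-15 12:22:00 PDT 10.0.0.5 LOG:  unexpected EOF on client connection",
        "2013-08-15 12:22:01 PDT 10.0.0.5 LOG:  could not receive data from client: Connection reset by peer",
        "2013-08-15 12:24:00 PDT  LOG:  checkpoint starting: time",
    ]
    for host, n in eof_errors_by_host(sample).most_common():
        print(f"{host}\t{n}")
```

If every error comes from hosts that only run ad hoc queries, the errors are probably unrelated to the replication stalls.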

On Thu, Aug 15, 2013 at 12:22 PM, Andrew Berman <rexxe98@gmail.com> wrote:
> The only thing I see that is a possibility for the issue is in the slave
> log:
>
> LOG: unexpected EOF on client connection
> LOG: could not receive data from client: Connection reset by peer
>
> I don't know if that's related or not as it could just be somebody running a
> query. The log file does seem to be riddled with these but the replication
> failures don't happen constantly.
>
> As far as I know I'm not swallowing any errors. The logging is all set as
> the default:
>
> log_destination = 'stderr'
> logging_collector = on
> #client_min_messages = notice
> #log_min_messages = warning
> #log_min_error_statement = error
> #log_min_duration_statement = -1
> #log_checkpoints = off
> #log_connections = off
> #log_disconnections = off
> #log_error_verbosity = default
>
> I'm going to have a look at the NICs to make sure there's no issue there.
>
> Thanks again for your help!
>
>
> On Thu, Aug 15, 2013 at 11:51 AM, Lonni J Friedman <netllama@gmail.com> wrote:
>>
>> Are you certain that there are no relevant errors in the database logs
>> (on both master & slave)? Also, are you sure that you didn't
>> misconfigure logging such that errors wouldn't appear?
>>
>> On Thu, Aug 15, 2013 at 11:45 AM, Andrew Berman <rexxe98@gmail.com> wrote:
>> > Hi Lonni,
>> >
>> > Yes, I am using PG 9.1.9.
>> > Yes, 1 slave syncing from the master
>> > CentOS 6.4
>> > I don't see any network or hardware issues (e.g. NIC) but will look more
>> > into this. They are communicating on a private network and switch.
>> >
>> > I forgot to mention that after I restart the slave, everything syncs right
>> > back up and all is working again, so if it is a network issue, the
>> > replication is just stopping after some hiccup instead of retrying and
>> > resuming when things are back up.
>> >
>> > Thanks!
>> >
>> >
>> >
>> > On Thu, Aug 15, 2013 at 11:32 AM, Lonni J Friedman <netllama@gmail.com> wrote:
>> >>
>> >> I've never seen this happen. Looks like you might be using 9.1? Are
>> >> you up to date on all the 9.1.x releases?
>> >>
>> >> Do you have just 1 slave syncing from the master?
>> >> Which OS are you using?
>> >> Did you verify that there aren't any network problems between the
>> >> slave & master?
>> >> Or hardware problems (like the NIC dying, or dropping packets)?
>> >>
>> >>
>> >> On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman <rexxe98@gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm having an issue where streaming replication just randomly stops working.
>> >> > I haven't been able to find anything in the logs that points to an issue,
>> >> > but the Postgres process shows a "waiting" status on the slave:
>> >> >
>> >> > postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
>> >> > postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
>> >> > postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
>> >> > postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
>> >> >
>> >> > The replication works great for days, but randomly seems to lock up and
>> >> > replication halts. I verified that the two databases were out of sync with
>> >> > a query on both of them. Has anyone experienced this issue before?
>> >> >
>> >> > Here are some relevant config settings:
>> >> >
>> >> > Master:
>> >> >
>> >> > wal_level = hot_standby
>> >> > checkpoint_segments = 32
>> >> > checkpoint_completion_target = 0.9
>> >> > archive_mode = on
>> >> > archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
>> >> > max_wal_senders = 2
>> >> > wal_keep_segments = 32
>> >> >
>> >> > Slave:
>> >> >
>> >> > wal_level = hot_standby
>> >> > checkpoint_segments = 32
>> >> > #checkpoint_completion_target = 0.5
>> >> > hot_standby = on
>> >> > max_standby_archive_delay = -1
>> >> > max_standby_streaming_delay = -1
>> >> > #wal_receiver_status_interval = 10s
>> >> > #hot_standby_feedback = off
>> >> >
>> >> > Thank you for any help you can provide!
>> >> >
>> >> > Andrew
>> >> >
>
>

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand https://netllama.linux-sxs.org
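The ps listing in the original report already contains two WAL positions. One is the file the startup process is replaying (000000010000053D0000003F, marked "waiting"). The other is the location the WAL receiver has streamed up to (549/216B3730). Below is a rough Python sketch for comparing them. It assumes the pre-9.3 WAL layout used by 9.1: 16 MB segments, and 255 usable segments per logical log file because segment FF was skipped. The helper names are hypothetical. On a live 9.1 standby the same two positions can also be read with pg_last_xlog_receive_location() and pg_last_xlog_replay_location().

```python
# Assumptions (PostgreSQL 9.1, i.e. before 9.3):
#  * default 16 MB WAL segments;
#  * each logical log file ("xlogid") holds 255 segments, because segment FF
#    was skipped before 9.3.
SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 255


def parse_wal_filename(name):
    """'000000010000053D0000003F' -> (timeline, xlogid, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def parse_lsn(lsn):
    """'549/216B3730' -> (xlogid, byte offset within that logical file)."""
    xlogid, xrecoff = lsn.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def segments_behind(replay_file, received_lsn):
    """Whole segments between the file being replayed and the received LSN."""
    _, rep_log, rep_seg = parse_wal_filename(replay_file)
    rcv_log, rcv_off = parse_lsn(received_lsn)
    rcv_seg = rcv_off // SEG_SIZE  # segment that contains the received LSN
    return (rcv_log - rep_log) * SEGS_PER_XLOGID + (rcv_seg - rep_seg)


if __name__ == "__main__":
    # Values copied from the ps listing in the original report.
    behind = segments_behind("000000010000053D0000003F", "549/216B3730")
    print(behind, "segments, about", behind * SEG_SIZE // 2**30, "GiB")
    # prints: 3030 segments, about 47 GiB
```

With the values from the listing, this estimates that replay was about 3030 segments (roughly 47 GiB) behind what had been received. That suggests WAL was still arriving when the listing was taken, and it was replay that had stalled. Treat the number as an estimate, since the startup process reports only a file name, not an exact offset.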
