On Fri, Oct 2, 2015 at 6:52 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
>
>
> On Fri, Oct 2, 2015 at 4:58 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
>>
>>
>> On Fri, Oct 2, 2015 at 2:14 PM, Maxim Boguk <maxim.boguk@gmail.com>
>> wrote:
>>
>>> > This backtrace is not indicating that this process is waiting on a
>>> > relation lock; it is resolving a recovery conflict while removing
>>> > tuples, and it will kill the conflicting virtual transaction, depending
>>> > on how max_standby_streaming_delay or max_standby_archive_delay are set,
>>> > if the conflict lasts too long. Did you change the default of those
>>> > parameters, which is 30s, to -1? This would mean that the standby waits
>>> > indefinitely.
>>>
>>>
>>> The problem is that the startup process has a conflict with a query which
>>> is itself blocked on (waiting for) the startup process: the query cannot
>>> proceed because it is waiting for a lock held by the startup process,
>>> while the startup process is waiting for that query to finish. So, as I
>>> understand the situation, it's an undetected deadlock condition.
>>>
>>> PS: there is no other activity on the database during that problem
>>> except the blocked query.
>>>
>>
>> Don't you have other queries running in parallel with the one you are
>> describing as "stuck" on the standby that prevent replay from moving on,
>> like a long-running transaction working on the relation involved? Are you
>> sure that you did not set max_standby_streaming_delay to -1?
>> --
>> Michael
>>
>
> During the problem period only one query was running on the database (the
> one listed in the initial report) and nothing more, and that query was in
> a waiting state according to pg_stat_activity.
> pg_locks shows that the query is waiting for an AccessShareLock on relation
> 17987, while at the same time the startup process holds an
> AccessExclusiveLock on the same relation and is waiting for something. No
> other activity is going on on the replica.
> And yes, max_standby_streaming_delay is set to -1; as a result the
> replication process had been stuck forever on a query from an external
> monitoring tool until I killed that query, but the situation repeated a
> few hours later.
>
>
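(For reference, the settings discussed above can be checked directly on the
standby with a query along these lines; boot_val is the built-in default,
30s for both parameters, while setting is the active value, which is -1 for
max_standby_streaming_delay in my case:)

    SELECT name, setting, unit, boot_val
    FROM pg_settings
    WHERE name IN ('max_standby_streaming_delay',
                   'max_standby_archive_delay');
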
One last addition to the report. The same database replica (server), the
same monitoring queries, everything else the same, but running PostgreSQL
9.2.13, does not have that problem.
It had been used for a few years without such issues; the problem only
appeared after the whole cluster was upgraded to version 9.4.4.
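
In case it is useful, the lock situation described above can be seen with a
query roughly like this one (17987 is the relation OID from my report; the
startup process shows up in pg_locks with its AccessExclusiveLock granted,
but has no matching row in pg_stat_activity since it is not a regular
backend):

    SELECT l.pid,
           l.mode,
           l.granted,
           a.waiting,
           a.state,
           left(a.query, 60) AS query
    FROM pg_locks l
    LEFT JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE l.locktype = 'relation'
      AND l.relation = 17987::oid
    ORDER BY l.granted DESC, l.pid;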
> --
> Maxim Boguk
>