Thread: BUG #15313: Waiting `startup` process on a streaming replica

BUG #15313: Waiting `startup` process on a streaming replica

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15313
Logged by:          Victor Yegorov
Email address:      vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system:   Ubuntu 14.04.5 LTS (trusty)
Description:

We have one customer, that get one of their HotStandby replicas to be stuck
via `startup` process deadlock against session.
Case can be reproduced, we've installed dbg packages and collected
backtraces (I will attach them in a followup e-mail).

This happens during schema changes on the master. Processes is being far
from optimal, as they re-create all triggers,
views and materialized views in the scope. We've recommended this customer
to avoid such heavy schema changes.

Still, as we have evidence at hands, here comes the report.


Re: BUG #15313: Waiting `startup` process on a streaming replica

From
Victor Yegorov
Date:
вт, 7 авг. 2018 г. в 16:36, PG Bug reporting form <noreply@postgresql.org>:

Bug reference:      15313
Logged by:          Victor Yegorov
Email address:      vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system:   Ubuntu 14.04.5 LTS (trusty)
Description:       

In the attached files, 22607 is pid of a waiting session (we have had around 100 sessions in such state)
and 22591 is a pid of a startup process.

 
--
Victor Yegorov
Attachment

Re: BUG #15313: Waiting `startup` process on a streaming replica

From
Maxim Boguk
Date:


On Tue, Aug 7, 2018 at 4:40 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
вт, 7 авг. 2018 г. в 16:36, PG Bug reporting form <noreply@postgresql.org>:

Bug reference:      15313
Logged by:          Victor Yegorov
Email address:      vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system:   Ubuntu 14.04.5 LTS (trusty)
Description:       

In the attached files, 22607 is pid of a waiting session (we have had around 100 sessions in such state)
and 22591 is a pid of a startup process.

 
--
Victor Yegorov


Some additional notes about this problem:

1)deadlock_timeout cannot not resolve this problem or simple cannot detect it properly.
2)max_standby_streaming_delay doesn't work as well (with sufficiently high rate of new problem queries from application).

Only one way to fix it on the loaded replica is preventing all new incoming connections via pg_hba.conf, killing all locked queries and verify that the offending part of wal had been replayed. No built-in mechanisms designed to deal with such issues work in that case. Only manual intervention.

I seen such issues like 10 times over last year on different projects. It isn't once per lifetime issue unfortunately.




--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/

Phone RU: +7  985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"