Re: Reindex "locked" standby database - Mailing list pgsql-general

From Michael Paquier
Subject Re: Reindex "locked" standby database
Date
Msg-id Ybli/z1eOBwmomgV@paquier.xyz
In response to Reindex "locked" standby database  (Martín Fernández <fmartin91@gmail.com>)
Responses Re: Reindex "locked" standby database  (Mladen Gogala <gogala.mladen@gmail.com>)
Re: Reindex "locked" standby database  (Martín Fernández <fmartin91@gmail.com>)
List pgsql-general
On Wed, Dec 15, 2021 at 12:15:27AM -0300, Martín Fernández wrote:
> The reindex went fine in the primary database and in one of our
> standbys. The other standby that we also operate for some reason
> ended up in a state where all transactions were locked by the WAL
> process and the WAL process was not able to make any progress. In
> order to solve this issue we had to move traffic from the “bad”
> standby to the healthy one and then kill all transactions that were
> running in the “bad” standby. After that, replication was able to
> resume successfully.

You are referring to the startup process that replays WAL, right?
Without knowing the type of workload your primary and standbys are
facing, or the configuration you are using on both
(hot_standby_feedback, for one), I cannot say for sure, but this
could be a conflict caused by a concurrent vacuum.
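
As a rough sketch of how that information could be gathered on the
standby, the Python snippet below keeps a few diagnostic queries as
strings and adds two small helpers for reading their results. The
catalog views and settings named in the SQL exist in PostgreSQL
(backend_type requires version 10 or later); the helper names
(dominant_conflict, waiting_locks) and the dict-per-row shape are
assumptions made for illustration, and running the queries through a
driver is left out. Note that pg_stat_database_conflicts only counts
queries that were cancelled, so it may stay at zero if replay was
simply waiting.

```python
# Sketch only: the SQL is meant to be run on the standby; helper names are hypothetical.

# Per-database counts of standby queries cancelled because of recovery conflicts.
CONFLICTS_SQL = """
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
"""

# Settings that influence how WAL replay and standby queries interact.
SETTINGS_SQL = """
SELECT name, setting
FROM pg_settings
WHERE name IN ('hot_standby_feedback',
               'max_standby_streaming_delay',
               'max_standby_archive_delay');
"""

# Locks held or awaited by each process, including the startup process.
LOCKS_SQL = """
SELECT a.pid, a.backend_type, a.wait_event_type, a.wait_event,
       l.locktype, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
ORDER BY l.granted, a.pid;
"""

CONFLICT_COLUMNS = (
    "confl_tablespace",
    "confl_lock",
    "confl_snapshot",
    "confl_bufferpin",
    "confl_deadlock",
)


def dominant_conflict(row):
    """Return the conflict column with the highest count, or None if all are zero.

    `row` is assumed to be a dict keyed by column name.
    """
    counts = {col: int(row.get(col) or 0) for col in CONFLICT_COLUMNS}
    name, value = max(counts.items(), key=lambda item: item[1])
    return name if value > 0 else None


def waiting_locks(rows):
    """Keep only the lock rows that have not been granted yet."""
    return [row for row in rows if not row.get("granted")]
```

Comparing the output of SETTINGS_SQL between the healthy and the
"bad" standby would be a natural first check, since a difference in
hot_standby_feedback or in the max_standby_*_delay values changes how
long replay waits for conflicting queries.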

Seeing where things got stuck could also be useful, perhaps with a
backtrace of the stuck process and some information about what it was
doing at the time.
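
One possible way to capture such a backtrace, sketched in Python
under the assumption that gdb and PostgreSQL debug symbols are
installed on the standby host and that you have permission to attach
to the server processes. The helper name gdb_backtrace_command is
hypothetical; the gdb options used (-p, -batch, -ex) are standard
ones.

```python
# Sketch only: builds a gdb invocation; it does not attach to anything by itself.

# Finds the PID of the startup process (backend_type exists in PostgreSQL 10+).
STARTUP_PID_SQL = """
SELECT pid
FROM pg_stat_activity
WHERE backend_type = 'startup';
"""


def gdb_backtrace_command(pid):
    """Return an argument list that makes gdb print one backtrace and exit."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to the process, -batch exits after the commands run,
    # -ex bt prints the current call stack.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]
```

The resulting list can be passed to subprocess.run() on the standby
host. Attaching pauses the process briefly, so this is best done
while the problem is actually occurring.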

> I’m just trying to understand what could have caused this issue. I
> was not able to identify any queries in the standby that would be
> locking the WAL process. Any insight would be more than welcome!

That's not going to be easy without more information, I am afraid.
--
Michael

