Re: BUG #13559: WAL replay stuck after DROP VIEW - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #13559: WAL replay stuck after DROP VIEW
Date
Msg-id CAB7nPqQqX8sunfTXXsOCDkrVnrW4-i46wZb5Sn6buA-szq1y-w@mail.gmail.com
Whole thread Raw
In response to BUG #13559: WAL replay stuck after DROP VIEW  (maciek@heroku.com)
Responses Re: BUG #13559: WAL replay stuck after DROP VIEW  (Maciek Sakrejda <maciek@heroku.com>)
List pgsql-bugs
On Tue, Aug 11, 2015 at 7:31 AM,  <maciek@heroku.com> wrote:
> We had some code in production that automatically dropped and recreated
> views periodically. This database also has a replica that serves some
> moderately intensive queries (read: on the order of several minutes). This
> generally this works fine, but we ran into an issue the other day where the
> startup process on the replica was holding a bunch of AccessExclusive locks
> on these views (presumably due to the DROP) and would not progress even
> though there were no conflicting queries (there may very well have been
> queries against these views at one point, but not not when I looked--all the
> locks held by the startup process showed up as granted in pg_locks). This
> resolved when we restarted the replica.

Yes, possible. Access exclusive lock is taken when dropping the
relation before removing it from heap.

> We're on 9.4.1, but skimming through the 9.4.2-through-9.4.4 release notes,
> I don't see anything relevant.
>
> Could this be an outstanding bug? For what it's worth, we've been running
> the view drop / recreate for about 9 months, totalling probably ~240 drops /
> creates and this is the first time we've run into an issue.

The closest fix post-9.4.4 fix in this area is this commit:
commit: bab959906911c97437f410a03b0346e6dd28d528
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Sun, 2 Aug 2015 20:09:05 +0300
Fix race condition that lead to WALInsertLock deadlock with commit_delay.

But this fixed a problem on master where queries got stuck because of
the WAL insert lock patch that has been introduced in 9.4 because of a
race condition between two backends.

Do you have some remnants of the logs when this problem happened? Or
even better, a stack trace to understand where the startup process got
stucked? That's perhaps unrelated, but were you using commit_delay?

I have been playing with pgbench, creating and dropping views with a
couple of hundred connections on 9.4.1 but could not reproduce the
issue.
Regards,
--
Michael

pgsql-bugs by date:

Previous
From: ismaelbezerra@gmail.com
Date:
Subject: BUG #13558: PANIC: stuck spinlock
Next
From: Maciek Sakrejda
Date:
Subject: Re: BUG #13559: WAL replay stuck after DROP VIEW