Re: Something is rotten in the state of Denmark... - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Something is rotten in the state of Denmark...
Date
Msg-id 14837.1427993642@sss.pgh.pa.us
Whole thread Raw
In response to Re: Something is rotten in the state of Denmark...  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Something is rotten in the state of Denmark...
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 1, 2015 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've not fully tracked it down, but I think that the blame falls on the
>> MVCC-snapshots-for-catalog-scans patch; it appears that it's trying to
>> read pg_am's pg_class entry with a snapshot that's too old, possibly
>> because it assumes that sinval signaling is alive which I think ain't so.

> Hmm, sorry for the bug.  It looks to me like sinval signaling gets
> started up at the beginning of InitPostgres().  Perhaps something like
> this?

Yeah, it does, so you'd think this would work.  I've now tracked down
exactly what's happening, and it's this: while we're reading pg_authid
during PerformAuthentication, we establish a catalog snapshot.  Later on,
we read pg_database to find out what DB we're connecting to.  If any
sinval messages for unshared system catalogs arrive at that point, they'll
effectively be ignored, *because our MyDatabaseId is still zero and so
doesn't match the incoming message*.  That's okay as far as the catcaches
are concerned, cause they're empty, and the relcache only knows about
shared catalogs so far.  But InvalidateCatalogSnapshot doesn't get called
by LocalExecuteInvalidationMessage, and so we think we can plow ahead with
the old snapshot.

It looks to me like an appropriate fix would be as attached; thoughts?
(I've tested this with a delay during relation lock acquisition and
concurrent whole-database VACUUM FULLs, and so far been unable to break
it.)

            regards, tom lane

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1e646a1..6aa6868 100644
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 853,858 ****
--- 853,866 ----
      MyProc->databaseId = MyDatabaseId;

      /*
+      * We may have already established a catalog snapshot while trying to read
+      * pg_authid; but until we have set up MyDatabaseId, we won't react to
+      * incoming sinval messages properly, so we won't realize it if the
+      * snapshot has been invalidated.  Best to assume it's no good anymore.
+      */
+     InvalidateCatalogSnapshot();
+
+     /*
       * Now, take a writer's lock on the database we are trying to connect to.
       * If there is a concurrently running DROP DATABASE on that database, this
       * will block us until it finishes (and has committed its update of

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: POLA violation with \c service=
Next
From: Tom Lane
Date:
Subject: Re: vac truncation scan problems