On Wed, Apr 1, 2015 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've been able to reproduce this. The triggering event seems to be that
> the "VACUUM FULL pg_am" in vacuum.sql has to happen while another backend
> is starting up. With a ten-second delay inserted at the bottom of
> PerformAuthentication(), it's trivial to hit it manually. The reason we'd
> not seen this before the rolenames.sql test was added is that none of the
> other tests that run concurrently with vacuum.sql perform mid-test
> reconnections, or ever have AFAIR. So as long as they all managed to
> start up before vacuum.sql got to the dangerous step, no problem.
>
> I've not fully tracked it down, but I think that the blame falls on the
> MVCC-snapshots-for-catalog-scans patch; it appears that it's trying to
> read pg_am's pg_class entry with a snapshot that's too old, possibly
> because it assumes that sinval signaling is alive which I think ain't so.

Hmm, sorry for the bug. It looks to me like sinval signaling gets
started up at the beginning of InitPostgres(). Perhaps something like
this?

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 7cfa0cf..47c4002 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -303,7 +303,7 @@ GetNonHistoricCatalogSnapshot(Oid relid)
 		 * Mark new snapshost as valid.  We must do this last, in case an
 		 * ERROR occurs inside GetSnapshotData().
 		 */
-		CatalogSnapshotStale = false;
+		CatalogSnapshotStale = !IsNormalProcessingMode();
 	}
 
 	return CatalogSnapshot;
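
To spell out the intent of that one-line change, here is a minimal
standalone sketch. It is not PostgreSQL code: every name in it
(ProcessingMode, get_catalog_snapshot, snapshot_clock, and so on) is a
hypothetical stand-in, and "taking a snapshot" is modeled as bumping a
counter. The idea is only that the cached snapshot is never marked valid
until the backend is in normal processing mode, on the assumption that
invalidation messages cannot be relied on before then.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real PostgreSQL definitions. */
typedef enum { MODE_INIT, MODE_NORMAL } ProcessingMode;

typedef struct { unsigned long taken_at; } Snapshot;

static ProcessingMode current_mode = MODE_INIT;
static unsigned long  snapshot_clock = 0;      /* bumped for each fresh snapshot */
static Snapshot       cached_snapshot;
static bool           cached_snapshot_stale = true;

/* Models an invalidation message arriving for a catalog. */
static void invalidate_catalog_snapshot(void)
{
    cached_snapshot_stale = true;
}

/* Models the backend moving from startup into normal processing. */
static void set_processing_mode(ProcessingMode mode)
{
    current_mode = mode;
}

static Snapshot *get_catalog_snapshot(void)
{
    if (cached_snapshot_stale)
    {
        /* Take a fresh snapshot (modeled as a counter bump). */
        cached_snapshot.taken_at = ++snapshot_clock;

        /*
         * Mark the snapshot valid last, and only if we are in normal
         * processing mode. During startup the flag stays true, so the
         * next call takes a fresh snapshot again instead of trusting
         * a cache that nothing would invalidate.
         */
        cached_snapshot_stale = (current_mode != MODE_NORMAL);
    }

    assert(cached_snapshot.taken_at > 0);
    return &cached_snapshot;
}
```

With this shape, repeated lookups during startup each get a new
snapshot, and caching only kicks in once normal processing begins.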
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company