On 2013-06-27 00:12:07 -0700, Hitoshi Harada wrote:
> > Two, until we get MVCC catalog scans, it's not safe to update any
> > system catalog tuple without an AccessExclusiveLock on some locktag
> > that will prevent concurrent catalog scans for that tuple. Under
> > SnapshotNow semantics, concurrent readers can fail to see that the
> > object is present at all, leading to mysterious failures - especially
> > if some of the object's catalog scans are seen and others are missed.
> >
> >
> > So what I'm saying above is take AccessExclusiveLock on swapping relfile
> in catalog. This doesn't violate your statement, I suppose. I'm actually
> still skeptical about MVCC catalog, because even if you can make catalog
> lookup MVCC, relfile on the filesystem is not MVCC. If session 1 changes
> relfilenode in pg_class and commit transaction, delete the old relfile from
> the filesystem, but another concurrent session 2 that just took a snapshot
> before 1 made such change keeps running and tries to open this relation,
> grabbing the old relfile and open it from filesystem -- ERROR: relfile not
> found.
We can play cute tricks akin to what CREATE INDEX CONCURRENTLY currently
does, i.e. wait for all other relations that could have possibly seen
the old relfilenode (they must have at least a share lock on the
relation) before dropping the actual storage.
The reason we cannot currently do that in most scenarios is that we
cannot perform transactional/mvcc updates of non-exclusively locked
objects due to the SnapshotNow problems of seeing multiple or no
versions of a row during a single scan.
Greetings,
Andres Freund
-- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services