On Tue, Feb 22, 2011 at 10:18:36PM -0500, Robert Haas wrote:
> On Wed, Feb 2, 2011 at 12:20 AM, Noah Misch <noah@leadboat.com> wrote:
> >> CREATE TABLE account (account_id SERIAL PRIMARY KEY, name CHARACTER VARYING
> >> NOT NULL);
> >> CREATE TABLE foo (account_id INTEGER NOT NULL REFERENCES account
> >> (account_id), stuff CHARACTER VARYING);
> >
> >> DROP TABLE foo;
> >
> >> That query shows that the DROP takes an AccessExclusiveLock on account.
> >> ?This isn't totally unexpected but it is unfortunate because it means we
> >> have to wait for a downtime window to maintain constraints even if they are
> >> not really in use.
> >
> > PostgreSQL 9.1 will contain changes to make similar operations, though not that
> > one, take ShareRowExclusiveLock instead of AccessExclusiveLock. ?Offhand, the
> > same optimization probably could be arranged for it with minimal fuss. ?If
> > "account" is heavily queried but seldom changed, that might be enough for you.
>
> The problem is that constraints can affect the query plan. If a
> transaction sees the constraint in the system catalogs (under
> SnapshotNow) but the table data doesn't conform (under some earlier
> snapshot) and if the chosen plan depends on the validity of the
> constraint, then we've got trouble. At least when running at READ
> COMMITTED, taking an AccessExclusiveLock protects us against that
> hazard (I'm not exactly sure what if anything protects us at higher
> isolation levels... but I hope there is something).
AccessExclusiveLock does not prevent that problem. We're already on thin ice in
this regard:
-- session 1
CREATE TABLE t (x) AS SELECT NULL::int;
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT 1;
-- session 2
DELETE FROM t;
ALTER TABLE t ALTER x SET NOT NULL;
-- session 1
TABLE t;
With contortions, we can coax the same from READ COMMITTED:
-- session 1
CREATE TABLE t (x) AS SELECT NULL::int;
CREATE FUNCTION pg_temp.f() RETURNS int LANGUAGE sql
STABLE -- reuse snapshot
AS 'SELECT 1; TABLE t'; -- extra statement to avoid inlining
VALUES (pg_sleep(15), pg_temp.f());
-- session 2
DELETE FROM t;
ALTER TABLE t ALTER x SET NOT NULL;
The catalogs say x is NOT NULL, but we read a NULL value just the same. I'm not
sure what anomalies this permits today, if any, but it's in the same vein.