Re: 7.2.3 vacuum bug - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: 7.2.3 vacuum bug
Date
Msg-id 200211020350.gA23oW910502@candle.pha.pa.us
Whole thread Raw
In response to Re: 7.2.3 vacuum bug  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 7.2.3 vacuum bug  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Is this a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:
> Rod Taylor <rbt@rbt.ca> writes:
> > ERROR:  RelationClearRelation: relation 11584078 deleted while still in
> > use 
> 
> > I've been unable to come up with a test case that will cause the
> > problem, seems to be timing related.  The queries that are currently
> > running when these errors occur do a lot or work with temp tables that
> > are frequently truncated.
> 
> Hm.  vacuum.c tries to avoid this class of problem:
> 
>     /*
>      * Race condition -- if the pg_class tuple has gone away since the
>      * last time we saw it, we don't need to vacuum it.
>      */
>     if (!SearchSysCacheExists(RELOID,
>                               ObjectIdGetDatum(relid),
>                               0, 0, 0))
>     {
>         CommitTransactionCommand(true);
>         return true;            /* okay 'cause no data there */
>     }
> 
>     ...
> 
>     onerel = relation_open(relid, lmode);
> 
> but on reflection it's clear that this doesn't really prevent a race
> condition.  If the table is already exclusive-locked by a DROP TABLE
> that hasn't committed yet (eg, the implicit DROP that happens when temp
> tables are cleared out at backend exit), then the syscache lookup will
> go fine, but the relation_open() routine blocks waiting for lock and
> eventually fails.
> 
> What would probably work better is to first lock the relation OID,
> then see if we can open the relation or not.
> 
> Thinking further, it's really kinda bogus that LockRelation() works on
> an already-opened Relation; if possible we should acquire the lock
> before attempting to create a relcache entry.  (We only need to know the
> OID and the relisshared status before we can make a locktag, so it'd be
> possible to acquire the lock using only the contents of the pg_class row.)
> Not sure how much code restructuring might be involved to make this
> happen, but it'd be worth thinking about for 7.4.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: elog(PANIC) should abort()?
Next
From: Bruce Momjian
Date:
Subject: Re: 7.3B3 psql talking to a 7.2.3 server?