Thread: findDependentObjects() mutual exclusion vs. MVCC catalog scans

findDependentObjects() mutual exclusion vs. MVCC catalog scans

From
Noah Misch
Date:
Consider this sequence of commands:

    create type rowtype as (c int, d int);
    create temp table t of rowtype;
    \c -
    drop type rowtype cascade;

Since the switch to MVCC catalog scans, it exhibits the following error about
10% of the time on my system:

CREATE TYPE
CREATE TABLE
You are now connected to database "test" as user "nm".
ERROR:  XX000: cache lookup failed for relation 17009
LOCATION:  getRelationDescription, objectaddress.c:2186

With "\c", in general, you may end up executing commands under the new session
before the old backend has finished exiting.  For this test case specifically,
the two backends' attempts to drop table "t" regularly overlap.  The old
backend would drop it within RemoveTempRelationsCallback(), and the new
backend would cascade from "rowtype" to drop it.  findDependentObjects() deals
with concurrent deletion attempts by acquiring a lock on each object it will
delete, then calling systable_recheck_tuple() to determine whether another
deleter was successful while the current backend was waiting for the lock.
systable_recheck_tuple() uses the scan snapshot, which really only works if
that snapshot is SnapshotNow or some other that changes its decision in
response to concurrent transaction commits.  The switch to MVCC snapshots left
this mutual exclusion protocol ineffective.

Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC
snapshot and recheck against that.  I believe it would also be fully safe to
use SnapshotNow here; however, I'm assuming we would otherwise manage to
remove SnapshotNow entirely.

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Attachment

Re: findDependentObjects() mutual exclusion vs. MVCC catalog scans

From
Robert Haas
Date:
On Tue, Jul 16, 2013 at 9:50 AM, Noah Misch <noah@leadboat.com> wrote:
> Consider this sequence of commands:
>
>     create type rowtype as (c int, d int);
>     create temp table t of rowtype;
>     \c -
>     drop type rowtype cascade;
>
> Since the switch to MVCC catalog scans, it exhibits the following error about
> 10% of the time on my system:
>
> CREATE TYPE
> CREATE TABLE
> You are now connected to database "test" as user "nm".
> ERROR:  XX000: cache lookup failed for relation 17009
> LOCATION:  getRelationDescription, objectaddress.c:2186
>
> With "\c", in general, you may end up executing commands under the new session
> before the old backend has finished exiting.  For this test case specifically,
> the two backends' attempts to drop table "t" regularly overlap.  The old
> backend would drop it within RemoveTempRelationsCallback(), and the new
> backend would cascade from "rowtype" to drop it.  findDependentObjects() deals
> with concurrent deletion attempts by acquiring a lock on each object it will
> delete, then calling systable_recheck_tuple() to determine whether another
> deleter was successful while the current backend was waiting for the lock.
> systable_recheck_tuple() uses the scan snapshot, which really only works if
> that snapshot is SnapshotNow or some other that changes its decision in
> response to concurrent transaction commits.  The switch to MVCC snapshots left
> this mutual exclusion protocol ineffective.
>
> Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC
> snapshot and recheck against that.  I believe it would also be fully safe to
> use SnapshotNow here; however, I'm assuming we would otherwise manage to
> remove SnapshotNow entirely.

I recommend reworking the header comment to avoid mention of
SnapshotNow, since if we get rid of SnapshotNow, the reference might
not be too clear to far-future hackers.

+    /*
+     * For a scan using a non-MVCC snapshot like SnapshotSelf, we would simply
+     * reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
+     */
+    freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));

This comment is not very clear, because it doesn't describe what the
code actually does, but rather speculates about what the code could do
if the intention of some future caller were different.  I recommend
adding Assert(IsMVCCSnapshot(scan->xs_snapshot)) and changing the
comment to something like this: "For now, we don't handle the case of
a non-MVCC scan snapshot.  This is adequate for existing uses of this
function, but might need to be changed in the future."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: findDependentObjects() mutual exclusion vs. MVCC catalog scans

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 16, 2013 at 9:50 AM, Noah Misch <noah@leadboat.com> wrote:
>> Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC
>> snapshot and recheck against that.  I believe it would also be fully safe to
>> use SnapshotNow here; however, I'm assuming we would otherwise manage to
>> remove SnapshotNow entirely.

I agree with Robert's comments, and in addition suggest that this code
needs a comment about why it's safe to use the snapshot without doing
RegisterSnapshot or equivalent.
        regards, tom lane



Re: findDependentObjects() mutual exclusion vs. MVCC catalog scans

From
Andres Freund
Date:
On 2013-07-16 09:50:07 -0400, Noah Misch wrote:
> With "\c", in general, you may end up executing commands under the new session
> before the old backend has finished exiting.  For this test case specifically,
> the two backends' attempts to drop table "t" regularly overlap.  The old
> backend would drop it within RemoveTempRelationsCallback(), and the new
> backend would cascade from "rowtype" to drop it.  findDependentObjects() deals
> with concurrent deletion attempts by acquiring a lock on each object it will
> delete, then calling systable_recheck_tuple() to determine whether another
> deleter was successful while the current backend was waiting for the lock.
> systable_recheck_tuple() uses the scan snapshot, which really only works if
> that snapshot is SnapshotNow or some other that changes its decision in
> response to concurrent transaction commits.  The switch to MVCC snapshots left
> this mutual exclusion protocol ineffective.

Nice catch.

I wonder though, isn't that code unsafe in other ways as well? What if
the pg_depend entry was rewritten inbetween? Consider somebody doing
something like REASSIGN OWNED concurrently with a DROP. The DROP
possibly will cascade to an entry which changed the owner already. And
the recheck will then report that the object doesn't exist anymore and
abort since it does a simple HeapTupleSatisfiesVisibility() and doesn't
follow the ctid chain if the tuple has been updated...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: findDependentObjects() mutual exclusion vs. MVCC catalog scans

From
Noah Misch
Date:
On Tue, Jul 16, 2013 at 05:56:10PM +0200, Andres Freund wrote:
> On 2013-07-16 09:50:07 -0400, Noah Misch wrote:
> > With "\c", in general, you may end up executing commands under the new session
> > before the old backend has finished exiting.  For this test case specifically,
> > the two backends' attempts to drop table "t" regularly overlap.  The old
> > backend would drop it within RemoveTempRelationsCallback(), and the new
> > backend would cascade from "rowtype" to drop it.  findDependentObjects() deals
> > with concurrent deletion attempts by acquiring a lock on each object it will
> > delete, then calling systable_recheck_tuple() to determine whether another
> > deleter was successful while the current backend was waiting for the lock.
> > systable_recheck_tuple() uses the scan snapshot, which really only works if
> > that snapshot is SnapshotNow or some other that changes its decision in
> > response to concurrent transaction commits.  The switch to MVCC snapshots left
> > this mutual exclusion protocol ineffective.
> 
> Nice catch.
> 
> I wonder though, isn't that code unsafe in other ways as well? What if
> the pg_depend entry was rewritten inbetween? Consider somebody doing
> something like REASSIGN OWNED concurrently with a DROP. The DROP
> possibly will cascade to an entry which changed the owner already. And
> the recheck will then report that the object doesn't exist anymore and
> abort since it does a simple HeapTupleSatisfiesVisibility() and doesn't
> follow the ctid chain if the tuple has been updated...

I'm not seeing a problem with that particular route.  Say we're examining a
pg_depend tuple where the referencing object is a table and the referenced
object is a role, the table's owner.  We're dropping the role and cascade to
the table.  If the REASSIGNED OWNED assigns the table to a different role,
then we are correct to treat the dependency as gone.  If it's the same role
("REASSIGNED OWNED BY alice TO alice", pointless but permitted), several of
the rename implementations short-circuit that case and don't change catalog
entries.  But even if that optimization were omitted, shdepChangeDep() will
have blocked against our previously-acquired deletion lock on the role.  Code
that adds or updates a dependency without locking both objects of the new
pg_depend tuple is buggy independently.

That being said, there may well be a related mechanism that can slip past the
locking here.  My brain turns to mush when I ponder findDependentObjects() too
thoroughly.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: findDependentObjects() mutual exclusion vs. MVCC catalog scans

From
Noah Misch
Date:
On Tue, Jul 16, 2013 at 11:27:02AM -0400, Robert Haas wrote:
> I recommend reworking the header comment to avoid mention of
> SnapshotNow, since if we get rid of SnapshotNow, the reference might
> not be too clear to far-future hackers.
> 
> +    /*
> +     * For a scan using a non-MVCC snapshot like SnapshotSelf, we would simply
> +     * reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
> +     */
> +    freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));
> 
> This comment is not very clear, because it doesn't describe what the
> code actually does, but rather speculates about what the code could do
> if the intention of some future caller were different.  I recommend
> adding Assert(IsMVCCSnapshot(scan->xs_snapshot)) and changing the
> comment to something like this: "For now, we don't handle the case of
> a non-MVCC scan snapshot.  This is adequate for existing uses of this
> function, but might need to be changed in the future."

On Tue, Jul 16, 2013 at 11:35:48AM -0400, Tom Lane wrote:
> I agree with Robert's comments, and in addition suggest that this code
> needs a comment about why it's safe to use the snapshot without doing
> RegisterSnapshot or equivalent.

Committed with hopefully-better comments.  Thanks.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com