Thread: a modest improvement to get_object_address()

a modest improvement to get_object_address()

From
Robert Haas
Date:
I'd like to propose the attached patch, which changes
get_object_address() in a manner similar to what we did in
RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.
 The basic idea is that, if we look up an object name, acquire the
corresponding lock, and then find that the object was dropped during
the lock wait, we retry the whole operation instead of emitting a
baffling error message.  Example:

rhaas=# create schema x;
CREATE SCHEMA
rhaas=# begin;
BEGIN
rhaas=# drop schema x;
DROP SCHEMA

Then, in another session:

rhaas=# comment on schema x is 'doodle';

Then, in the first session:

rhaas=# commit;
COMMIT

At this point, the first session must error out.  The current code
produces this:

ERROR:  cache lookup failed for class 2615 object 16386 subobj 0

With the attached patch, you instead get:

ERROR:  schema "x" does not exist

...which is obviously quite a bit nicer.

Also, if the concurrent transaction drops and creates the schema
instead of just dropping it, the new code will allow the operation to
succeed (with the expected results) rather than failing.

Objections?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: a modest improvement to get_object_address()

From
Cédric Villemain
Date:
2011/11/9 Robert Haas <robertmhaas@gmail.com>:
> I'd like to propose the attached patch, which changes
> get_object_address() in a manner similar to what we did in
> RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.
>  The basic idea is that, if we look up an object name, acquire the
> corresponding lock, and then find that the object was dropped during
> the lock wait, we retry the whole operation instead of emitting a
> baffling error message.  Example:
>
> rhaas=# create schema x;
> CREATE SCHEMA
> rhaas=# begin;
> BEGIN
> rhaas=# drop schema x;
> DROP SCHEMA
>
> Then, in another session:
>
> rhaas=# comment on schema x is 'doodle';
>
> Then, in the first session:
>
> rhaas=# commit;
> COMMIT
>
> At this point, the first session must error out.  The current code
> produces this:
>
> ERROR:  cache lookup failed for class 2615 object 16386 subobj 0
>
> With the attached patch, you instead get:
>
> ERROR:  schema "x" does not exist
>
> ...which is obviously quite a bit nicer.
>
> Also, if the concurrent transaction drops and creates the schema
> instead of just dropping it, the new code will allow the operation to
> succeed (with the expected results) rather than failing.
>
> Objections?

Maybe I miss something but:
The ERROR message is misleading:  the schema 'x' does exist. And also
why a drop schema would fail and a drop+create would success ?!


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


Re: a modest improvement to get_object_address()

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> Maybe I miss something but:
> The ERROR message is misleading:  the schema 'x' does exist.

No, it doesn't.  The concurrent transaction has dropped it.

> And also
> why a drop schema would fail and a drop+create would success ?!

Because you can't comment on a schema that doesn't exist any more, but
you can comment on one that does.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: a modest improvement to get_object_address()

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'd like to propose the attached patch, which changes
> get_object_address() in a manner similar to what we did in
> RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.

I would think you need to drop the now-useless lock, and I sure hope
that RangeVarGetRelid does likewise.
        regards, tom lane


Re: a modest improvement to get_object_address()

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'd like to propose the attached patch, which changes
>> get_object_address() in a manner similar to what we did in
>> RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7.
>
> I would think you need to drop the now-useless lock, and I sure hope
> that RangeVarGetRelid does likewise.

It doesn't currently.  The now-useless lock doesn't really hurt
anything, aside from taking up space in the lock table.  But we can
certainly make it (and this) do that, if you think it's worth the
extra lines of code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: a modest improvement to get_object_address()

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I would think you need to drop the now-useless lock, and I sure hope
>> that RangeVarGetRelid does likewise.

> It doesn't currently.  The now-useless lock doesn't really hurt
> anything, aside from taking up space in the lock table.

Well, there are corner cases where the object OID gets reused during
the lifetime of the transaction, and then the lock *does* do something
(and what it does would be bad).  But taking up extra space in the
finite-size lock table is sufficient reason IMO to drop the lock.
It's not like these are performance-critical code paths.
        regards, tom lane


Re: a modest improvement to get_object_address()

From
Cédric Villemain
Date:
2011/11/9 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>> Maybe I miss something but:

I read that the error was produced by first session and didn't check
carefuly (it fails silently in 9.0! and 'works' as expected in 9.1)

No objection, but I would like to still be able to diagnose the same
things as in the past, can you make it clear that the schema/object
just disappear ? (else we don't know if the relation just never exists
or was drop while we were waiting)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


Re: a modest improvement to get_object_address()

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I would think you need to drop the now-useless lock, and I sure hope
>>> that RangeVarGetRelid does likewise.
>
>> It doesn't currently.  The now-useless lock doesn't really hurt
>> anything, aside from taking up space in the lock table.
>
> Well, there are corner cases where the object OID gets reused during
> the lifetime of the transaction, and then the lock *does* do something
> (and what it does would be bad).  But taking up extra space in the
> finite-size lock table is sufficient reason IMO to drop the lock.
> It's not like these are performance-critical code paths.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: a modest improvement to get_object_address()

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 10:50 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> 2011/11/9 Robert Haas <robertmhaas@gmail.com>:
>> On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain
>> <cedric.villemain.debian@gmail.com> wrote:
>>> Maybe I miss something but:
>
> I read that the error was produced by first session and didn't check
> carefuly (it fails silently in 9.0! and 'works' as expected in 9.1)
>
> No objection, but I would like to still be able to diagnose the same
> things as in the past, can you make it clear that the schema/object
> just disappear ? (else we don't know if the relation just never exists
> or was drop while we were waiting)

I don't see a clean way to do that, and I'm not convinced it's a good
idea anyway.  I think that if we start generating different error
messages based on whether or not a lock wait was involved at some
point in the operation, we're going to drive ourselves nuts.  There
are a lot of places where that can happen.

e.g. Suppose that you have a table with a unique index on column a.
Transaction A deletes the tuple where a = 1. Transaction B attempts to
insert a new tuple with a = 1, and blocks.  Now if A commits, B will
succeed, but if A rolls back, B will abort.  Had transaction A not
existed, B would simply abort at once.  But the error message will not
indicate which of the two it was, and I don't thinkit needs to.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: a modest improvement to get_object_address()

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> e.g. Suppose that you have a table with a unique index on column a.
> Transaction A deletes the tuple where a = 1. Transaction B attempts to

That's DML, I agree with you there, no need.  In DML we have MVCC.

Back to the problem you raised, it's DDL and we're sitting in between
SnapshotNow and catalog cache entries.  Not so comfy.  I would guess
that the problem (I confess didn't read carefully enough) happens after
having done a cache lookup when trying to use its result?

Could we check the object still exists as part of the cache lookup, or
would that mean we don't have a cache anymore?  Or is the answer related
to consuming invalidation messages before returning a stale entry from
the cache?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: a modest improvement to get_object_address()

From
Robert Haas
Date:
On Wed, Nov 9, 2011 at 3:40 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Back to the problem you raised, it's DDL and we're sitting in between
> SnapshotNow and catalog cache entries.  Not so comfy.  I would guess
> that the problem (I confess didn't read carefully enough) happens after
> having done a cache lookup when trying to use its result?

There's a test case in the original post, but yes, the problem happens
when something changes between the time you do the catcache lookup and
the time you acquire the lock.  This is not a new problem; I'm just
trying to give a more intelligible error message - and avoid
unnecessary failures, as in the case where two concurrent DROP IF
EXISTS operations target the same object and one of them unnecessarily
rolls back.

> Could we check the object still exists as part of the cache lookup, or
> would that mean we don't have a cache anymore?  Or is the answer related
> to consuming invalidation messages before returning a stale entry from
> the cache?

All of that is way beyond the scope of what I'm doing here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company