Thread: refactoring comment.c

refactoring comment.c

From
Robert Haas
Date:
At PGCon, we discussed the possibility that a minimal SE-PostgreSQL
implementation would need little more than a hook in
ExecCheckRTPerms() [which we've since added] and a security label
facility [for which KaiGai has submitted a patch].  I actually sat
down to write the security label patch myself while we were in Ottawa,
but quickly ran into difficulties: while the hook we have now can't do
anything useful with objects other than relations, it's pretty clear
from previous discussions on this topic that the demand for labels on
other kinds of objects is not going to go away.  Rather than adding
additional syntax to every object type in the system (some of which
don't even have ALTER commands at present), I suggested basing the
syntax on the existing COMMENT syntax.  After some discussion[1], we
seem to have settled on the following:

SECURITY LABEL [ FOR <provider> ] ON <object class> <object name> IS '<label>';

At present, there are some difficulties with generalizing this syntax
to other object types.  As I found out when I initially set out to
write this patch, it'd basically require duplicating all of comment.c,
which is an unpleasant prospect, because that file is big and crufty;
it has a large amount of internal duplication.  Furthermore, the
existing locking mechanism that we're using for comments is known to
be inadequate[2].  Dropping a comment while someone else is in the
midst of commenting on it leaves orphaned comments lying around in
pg_(sh)description that could later be inherited by a new object.
That's a minor nuisance for comments and would be nice to fix, but is
obviously a far larger problem for security labels, where even a small
chance of randomly mislabeling an object is no good.

So I wrote a patch.  The attached patch factors out all of the code in
comment.c that is responsible for translating parser representations
into a new file parser/parse_object.c, leaving just the
comment-specific stuff in commands/comment.c.  It also adds
appropriate locking, so that concurrent COMMENT/DROP scenarios don't
leave behind leftovers.  It's a fairly large patch, but the changes
are extremely localized: comment.c gets a lot smaller, and
parse_object.c gets bigger by a slightly smaller amount.

Any comments?  (ha ha ha...)

[1] http://archives.postgresql.org/pgsql-hackers/2010-07/msg01328.php
[2] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00351.php

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment

Re: refactoring comment.c

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of vie ago 06 11:02:58 -0400 2010:

> Any comments?  (ha ha ha...)

Interesting idea.  The patch looks fine on a quick once-over.  Two small
things: this comment

+    /*
+     * Databases, tablespaces, and roles are cluster-wide objects, so any
+     * comments on those objects are record in the shared pg_shdescription
+     * catalog.  Comments on all other objects are recorded in pg_description.
+     */

says "record" where it should say "recorded".

Also, it strikes me that perhaps the ObjectAddress struct definition
should be moved to the new header file; seems a more natural place for
it (but then, it seems like a lot of files will need to include the new
header, so perhaps that should be left to a subsequent patch).

Thirdly, is it just me or just could replace a lot of code in DropFoo
functions with this new auxiliary code?  Seems it's just missing
"missing_ok" support ...

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 6, 2010 at 11:36 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of vie ago 06 11:02:58 -0400 2010:
>
>> Any comments?  (ha ha ha...)
>
> Interesting idea.  The patch looks fine on a quick once-over.

Thanks for taking a look.

> Two small
> things: this comment
>
> +    /*
> +     * Databases, tablespaces, and roles are cluster-wide objects, so any
> +     * comments on those objects are record in the shared pg_shdescription
> +     * catalog.  Comments on all other objects are recorded in pg_description.
> +     */
>
> says "record" where it should say "recorded".

Thanks, good eye.

> Also, it strikes me that perhaps the ObjectAddress struct definition
> should be moved to the new header file; seems a more natural place for
> it (but then, it seems like a lot of files will need to include the new
> header, so perhaps that should be left to a subsequent patch).

I thought about that, but erred on the side of being conservative and
didn't move it.  I like the idea, though.

> Thirdly, is it just me or just could replace a lot of code in DropFoo
> functions with this new auxiliary code?  Seems it's just missing
> "missing_ok" support ...

I am not sure how much code it would save you at the level of the
individual DropFoo() functions; I'd have to look through them more
carefully.  But now that you mention it, what about getting rid of all
of the individual parse nodes for drop statements?  Right now we have:

DropTableSpaceStmt
DropFdwStmt
DropForeignServerStmt
DropUserMappingStmt
DropPLangStmt
DropRoleStmt
DropStmt (table, sequence, view, index, domain, conversion, schema,
text search {parser, dictionary, template, configuration}
DropPropertyStmt (rules and triggers)
DropdbStmt (capitalized differently, just for fun)
DropCastStmt
RemoveFuncStmt (so you can't find it by grepping for Drop!)
RemoveOpClassStmt
RemoveOpFamilyStmt

It seems like we could probably unify all of these into a single
DropStmt, following the same pattern as CommentStmt, although I think
perhaps that should be a follow-on patch rather than doing it as part
of this one.  GRANT also has some code to translate object names to
OIDs, which I thought might be able to use this machinery as well,
though I haven't really checked whether it makes sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Simon Riggs
Date:
On Fri, 2010-08-06 at 11:02 -0400, Robert Haas wrote:
> At PGCon, we discussed the possibility that a minimal SE-PostgreSQL
> implementation would need little more than a hook in
> ExecCheckRTPerms() [which we've since added] and a security label
> facility [for which KaiGai has submitted a patch].  I actually sat
> down to write the security label patch myself while we were in Ottawa,
> but quickly ran into difficulties: while the hook we have now can't do
> anything useful with objects other than relations, it's pretty clear
> from previous discussions on this topic that the demand for labels on
> other kinds of objects is not going to go away.  Rather than adding
> additional syntax to every object type in the system (some of which
> don't even have ALTER commands at present), I suggested basing the
> syntax on the existing COMMENT syntax.  After some discussion[1], we
> seem to have settled on the following:
> 
> SECURITY LABEL [ FOR <provider> ] ON <object class> <object name> IS '<label>';

I understand the concept and it seems like it might work. Not too keen
on pretending a noun is a verb. That leads to erroring.

<verb> SECURITY LABEL? verb = CREATE, ADD, ...

Can't objects have more than one label?

How will you set default security labels on objects?

Where do you define labels?

Will there be a new privilege to define this? Presumably object owners
would not be able to set that themselves, otherwise you could create an
object, add a security label to it and then use it to see other things
at that level.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Training and Services



Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 6, 2010 at 12:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I understand the concept and it seems like it might work. Not too keen
> on pretending a noun is a verb. That leads to erroring.
>
> <verb> SECURITY LABEL? verb = CREATE, ADD, ...
>
> Can't objects have more than one label?
>
> How will you set default security labels on objects?
>
> Where do you define labels?
>
> Will there be a new privilege to define this? Presumably object owners
> would not be able to set that themselves, otherwise you could create an
> object, add a security label to it and then use it to see other things
> at that level.

Uh, these are all good questions, but I think they'd be more
appropriate on the thread about the security label patch, to which I
linked in my previous email.  Many of them have already been discussed
there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
KaiGai Kohei
Date:
(2010/08/07 0:02), Robert Haas wrote:
> At PGCon, we discussed the possibility that a minimal SE-PostgreSQL
> implementation would need little more than a hook in
> ExecCheckRTPerms() [which we've since added] and a security label
> facility [for which KaiGai has submitted a patch].  I actually sat
> down to write the security label patch myself while we were in Ottawa,
> but quickly ran into difficulties: while the hook we have now can't do
> anything useful with objects other than relations, it's pretty clear
> from previous discussions on this topic that the demand for labels on
> other kinds of objects is not going to go away.  Rather than adding
> additional syntax to every object type in the system (some of which
> don't even have ALTER commands at present), I suggested basing the
> syntax on the existing COMMENT syntax.  After some discussion[1], we
> seem to have settled on the following:
>
> SECURITY LABEL [ FOR<provider>  ] ON<object class>  <object name>  IS '<label>';
>
> At present, there are some difficulties with generalizing this syntax
> to other object types.  As I found out when I initially set out to
> write this patch, it'd basically require duplicating all of comment.c,
> which is an unpleasant prospect, because that file is big and crufty;
> it has a large amount of internal duplication.  Furthermore, the
> existing locking mechanism that we're using for comments is known to
> be inadequate[2].  Dropping a comment while someone else is in the
> midst of commenting on it leaves orphaned comments lying around in
> pg_(sh)description that could later be inherited by a new object.
> That's a minor nuisance for comments and would be nice to fix, but is
> obviously a far larger problem for security labels, where even a small
> chance of randomly mislabeling an object is no good.
>
> So I wrote a patch.  The attached patch factors out all of the code in
> comment.c that is responsible for translating parser representations
> into a new file parser/parse_object.c, leaving just the
> comment-specific stuff in commands/comment.c.  It also adds
> appropriate locking, so that concurrent COMMENT/DROP scenarios don't
> leave behind leftovers.  It's a fairly large patch, but the changes
> are extremely localized: comment.c gets a lot smaller, and
> parse_object.c gets bigger by a slightly smaller amount.
>
> Any comments?  (ha ha ha...)
>
> [1] http://archives.postgresql.org/pgsql-hackers/2010-07/msg01328.php
> [2] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00351.php
>

Thanks for your efforts.
I believe the get_object_address() enables to implement security
label features on various kind of database objects.

I tried to look at the patch. Most part is fine, but I found out
two issues.

On the object_exists(), when we verify existence of a large object,
it needs to scan pg_largeobject_metadata, instead of pg_largeobject.
When we implement pg_largeobject_metadata catalog, we decided to set
LargeObjectRelationId on object.classId due to the backend compatibility.

|    /*
|     * For object types that have a relevant syscache, we use it; for
|     * everything else, we'll have to do an index-scan.  This switch
|     * sets either the cache to be used for the syscache lookup, or the
|     * index to be used for the index scan.
|     */
|    switch (address.classId)
|    {
|        case RelationRelationId:
|            cache = RELOID;
|            break;
|              :
|        case LargeObjectRelationId:
|            indexoid = LargeObjectMetadataOidIndexId;
|            break;
|              :
|     }
|
|    /* Found a syscache? */
|    if (cache != -1)
|        return SearchSysCacheExists1(cache, ObjectIdGetDatum(address.objectId));
|
|    /* No syscache, so examine the table directly. */
|    Assert(OidIsValid(indexoid));
|    ScanKeyInit(&skey[0], ObjectIdAttributeNumber, BTEqualStrategyNumber,
|                F_OIDEQ, ObjectIdGetDatum(address.objectId));
|    rel = heap_open(address.classId, AccessShareLock);                     ^^^^^^^^^^^^^^^ <- It tries to open
pg_largeobject
|    sd = systable_beginscan(rel, indexoid, true, SnapshotNow, 1, skey);
|    found = HeapTupleIsValid(systable_getnext(sd));
|    systable_endscan(sd);
|    heap_close(rel, AccessShareLock);
|    return found;
| }


Although no caller invokes get_object_address() with lockmode = NoLock,
isn't it necessary to skip locking if NoLock was given.

|    /*
|     * If we're dealing with a relation or attribute, then the relation is
|     * already locked.  If we're dealing with any other type of object, we need
|     * to lock it and then verify that it still exists.
|     */
|    if (address.classId != RelationRelationId)
|    {
|        if (IsSharedRelation(address.classId))
|            LockSharedObject(address.classId, address.objectId, 0, lockmode);
|        else
|            LockDatabaseObject(address.classId, address.objectId, 0, lockmode);
|        /* Did it go away while we were waiting for the lock? */
|        if (!object_exists(address))
|            elog(ERROR, "cache lookup failed for class %u object %u subobj %d",
|                 address.classId, address.objectId, address.objectSubId);
|    }

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 6, 2010 at 11:15 PM, KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:
> [brief review]

OK, here's an updated patch:

1. I fixed the typo Alvaro spotted.

2. I haven't done anything about moving the definition of
ObjectAddress elsewhere, as Alvaro suggested, because I'm not sure
quite where it ought to go.  I still think it's a good idea, though
I'm not dead set on it, either.  Suggestions?

3. I fixed the issue Kaigai Kohei spotted, regarding
LargeObjectRelationId vs. LargeObjectMetadataRelationId, by adding a
grotty hack.  However, I feel that I'm not so much adding a new grotty
hack as working around an existing grotty hack which was added for
reasons I'm unclear on.  Is there a pg_upgrade-related reason not to
revert the original hack instead?

4. In response to Kaigai Kohei's complaint about lockmode possibly
being NoLock, I've just added an Assert() that it isn't, in lieu of
trying to do something sensible in that case.  I can't at present
think of a situation in which being able to call it that way would be
useful, and the Assert() seems like it ought to be enough warning to
anyone coming along later that they'd better think twice before
thinking that will work.

5. Since I'm hoping Tom will read this, I ran it through filterdiff.  :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment

Re: refactoring comment.c

From
KaiGai Kohei
Date:
(2010/08/16 11:50), Robert Haas wrote:
> On Fri, Aug 6, 2010 at 11:15 PM, KaiGai Kohei<kaigai@kaigai.gr.jp>  wrote:
>> [brief review]
> 
> OK, here's an updated patch:
> 
> 1. I fixed the typo Alvaro spotted.
> 
> 2. I haven't done anything about moving the definition of
> ObjectAddress elsewhere, as Alvaro suggested, because I'm not sure
> quite where it ought to go.  I still think it's a good idea, though
> I'm not dead set on it, either.  Suggestions?
> 
> 3. I fixed the issue Kaigai Kohei spotted, regarding
> LargeObjectRelationId vs. LargeObjectMetadataRelationId, by adding a
> grotty hack.  However, I feel that I'm not so much adding a new grotty
> hack as working around an existing grotty hack which was added for
> reasons I'm unclear on.  Is there a pg_upgrade-related reason not to
> revert the original hack instead?
> 

When we were developing largeobject access controls, Tom Lane commented
as follows:

* Re: [HACKERS] [PATCH] Largeobject access controls http://marc.info/?l=pgsql-hackers&m=125548822906571&w=2
| I notice that the patch decides to change the pg_description classoid for
| LO comments from pg_largeobject's OID to pg_largeobject_metadata's.  This
| will break existing clients that look at pg_description (eg, pg_dump and
| psql, plus any other clients that have any intelligence about comments,
| for instance it probably breaks pgAdmin).  And there's just not a lot of
| return that I can see.  I agree that using pg_largeobject_metadata would
| be more consistent given the new catalog layout, but I'm inclined to think
| we should stick to the old convention on compatibility grounds.  Given
| that choice, for consistency we'd better also use pg_largeobject's OID not
| pg_largeobject_metadata's in pg_shdepend entries for LOs.

He concerned about existing applications which have knowledge about internal
layout of system catalogs, then I fixed up the patch according to the suggestion.

> 4. In response to Kaigai Kohei's complaint about lockmode possibly
> being NoLock, I've just added an Assert() that it isn't, in lieu of
> trying to do something sensible in that case.  I can't at present
> think of a situation in which being able to call it that way would be
> useful, and the Assert() seems like it ought to be enough warning to
> anyone coming along later that they'd better think twice before
> thinking that will work.
> 
> 5. Since I'm hoping Tom will read this, I ran it through filterdiff.  :-)
> 
-- 
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: refactoring comment.c

From
Alvaro Herrera
Date:
Excerpts from KaiGai Kohei's message of lun ago 16 00:19:54 -0400 2010:
> (2010/08/16 11:50), Robert Haas wrote:

> When we were developing largeobject access controls, Tom Lane commented
> as follows:
> 
> * Re: [HACKERS] [PATCH] Largeobject access controls
>   http://marc.info/?l=pgsql-hackers&m=125548822906571&w=2
> | I notice that the patch decides to change the pg_description classoid for
> | LO comments from pg_largeobject's OID to pg_largeobject_metadata's.  This
> | will break existing clients that look at pg_description (eg, pg_dump and
> | psql, plus any other clients that have any intelligence about comments,
> | for instance it probably breaks pgAdmin).  And there's just not a lot of
> | return that I can see.  I agree that using pg_largeobject_metadata would
> | be more consistent given the new catalog layout, but I'm inclined to think
> | we should stick to the old convention on compatibility grounds.  Given
> | that choice, for consistency we'd better also use pg_largeobject's OID not
> | pg_largeobject_metadata's in pg_shdepend entries for LOs.
> 
> He concerned about existing applications which have knowledge about internal
> layout of system catalogs, then I fixed up the patch according to the suggestion.

I think that with this patch we have the return for the change that we
didn't have previously.  A patch that changes it should also fix pg_dump
and psql at the same time, but IMHO it doesn't make sense to keep adding
grotty hacks on top of it.

Maybe we could do with a grotty hack in obj_description() instead?
(...checks...) 
Oh, it's defined as a SQL function directly in pg_proc.h :-(

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> 5. Since I'm hoping Tom will read this, I ran it through filterdiff.  :-)

OK, I looked ;-).  It mostly looks good, but of course I've got some
opinions...

> 2. I haven't done anything about moving the definition of
> ObjectAddress elsewhere, as Alvaro suggested, because I'm not sure
> quite where it ought to go.  I still think it's a good idea, though
> I'm not dead set on it, either.  Suggestions?

I think the problem is you're trying to put this into backend/parser
which is not really the right place for it.  It's an execution-time
animal not a parse-time animal.  I would put it into backend/catalog,
perhaps named objectaddress.c, and similarly the header file would be
objectaddress.h.  Then it would be reasonable to move struct
ObjectAddress into this header and have dependency.h #include it.
There might be some other stuff in dependency.c that more naturally
belongs here, too.

> 3. I fixed the issue Kaigai Kohei spotted, regarding
> LargeObjectRelationId vs. LargeObjectMetadataRelationId, by adding a
> grotty hack.  However, I feel that I'm not so much adding a new grotty
> hack as working around an existing grotty hack which was added for
> reasons I'm unclear on.  Is there a pg_upgrade-related reason not to
> revert the original hack instead?

It's not pg_upgrade (nor psql, nor pg_dump) we are protecting here.
It's third-party applications that might understand the contents of
pg_description, pg_depend, etc.  I think that hack is quite small
and localized enough to live with, rather than causing a flag day
for an unknown number of clients by changing the representation.

+     /*
+      * Translate the parser representation which identifies this object into
+      * an ObjectAddress. get_object_address() will throw an error if the
+      * object does not exist.
+      */
+     address = get_object_address(stmt->objtype, stmt->objname, stmt->objargs,
+                                  &relation, ShareUpdateExclusiveLock);

I think this comment should also explicitly mention that we're getting a
lock to protect against concurrent DROPs.

+ /*
+  * Translate an object name and arguments (as passed by the parser) to an
+  * ObjectAddress.
+  *
+  * The returned object will be locked using the specified lockmode.  If a
+  * sub-object is looked up, the parent object will be locked instead.
+  *
+  * We don't currently provide a function to release the locks acquired here;
+  * typically, the lock must be held until commit to guard against a concurrent
+  * drop operation.
+  */
+ ObjectAddress
+ get_object_address(ObjectType objtype, List *objname, List *objargs,
+                    Relation *relp, LOCKMODE lockmode)

This comment's a bit shy of a load too, since it totally fails to
mention the relp argument, or specify what the caller is required to
do with it, or explain how the locking on the relation works.

As a matter of style, I'd suggest putting the single externally callable
function (ie get_object_address) at the top of the file not the bottom.
People shouldn't have to read through the entire file before finding out
what API it is supposed to provide to the outside world.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Mon, Aug 16, 2010 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. I haven't done anything about moving the definition of
>> ObjectAddress elsewhere, as Alvaro suggested, because I'm not sure
>> quite where it ought to go.  I still think it's a good idea, though
>> I'm not dead set on it, either.  Suggestions?
>
> I think the problem is you're trying to put this into backend/parser
> which is not really the right place for it.  It's an execution-time
> animal not a parse-time animal.  I would put it into backend/catalog,
> perhaps named objectaddress.c, and similarly the header file would be
> objectaddress.h.  Then it would be reasonable to move struct
> ObjectAddress into this header and have dependency.h #include it.
> There might be some other stuff in dependency.c that more naturally
> belongs here, too.

If this isn't parse analysis, then you and I have very different ideas
of what parse analysis is.  And under this theory, what are routines
like LookupAggNameTypeNames() doing in src/backend/parser?

I'll make the rest of the changes you suggest...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 16, 2010 at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the problem is you're trying to put this into backend/parser
>> which is not really the right place for it.

> If this isn't parse analysis, then you and I have very different ideas
> of what parse analysis is.

Maybe so, but the parser is expected to put out a representation that
will still be valid when the command is executed some time later.
That is exactly why utility statements have the barely-more-than-source
parsetree representation they do: because we do not hold locks on the
objects from parsing to execution, we could not expect an OID-level
representation to remain good.  This is a lot different from what we
do with DML statements, but there are good reasons for it.

I repeat my observation that this code doesn't belong in /parser.
The code you're replacing was not in /parser, and that was because
it didn't belong there, not because somebody didn't understand the
system structure.
        regards, tom lane


Re: refactoring comment.c

From
Tom Lane
Date:
I wrote:
> Maybe so, but the parser is expected to put out a representation that
> will still be valid when the command is executed some time later.

Rereading this, I see I didn't make my point very clearly.  The reason
this code doesn't belong in parser/ is that there's no prospect the
parser itself would ever use it.  ObjectAddress is an execution-time
creature because we don't want utility statement representations to be
resolved to OID-level detail before they execute.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Tue, Aug 17, 2010 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Maybe so, but the parser is expected to put out a representation that
>> will still be valid when the command is executed some time later.
>
> Rereading this, I see I didn't make my point very clearly.  The reason
> this code doesn't belong in parser/ is that there's no prospect the
> parser itself would ever use it.  ObjectAddress is an execution-time
> creature because we don't want utility statement representations to be
> resolved to OID-level detail before they execute.

Well, that is a good reason for doing it your way, but I'm slightly
fuzzy on why we need a crisp separation between parse-time and
execution-time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 17, 2010 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Rereading this, I see I didn't make my point very clearly. �The reason
>> this code doesn't belong in parser/ is that there's no prospect the
>> parser itself would ever use it. �ObjectAddress is an execution-time
>> creature because we don't want utility statement representations to be
>> resolved to OID-level detail before they execute.

> Well, that is a good reason for doing it your way, but I'm slightly
> fuzzy on why we need a crisp separation between parse-time and
> execution-time.

I don't insist that the separation has to be crisp.  I'm merely saying
that putting a large chunk of useful-only-at-execution-time code into
backend/parser is the Wrong Thing.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Tue, Aug 17, 2010 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Aug 17, 2010 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Rereading this, I see I didn't make my point very clearly.  The reason
>>> this code doesn't belong in parser/ is that there's no prospect the
>>> parser itself would ever use it.  ObjectAddress is an execution-time
>>> creature because we don't want utility statement representations to be
>>> resolved to OID-level detail before they execute.
>
>> Well, that is a good reason for doing it your way, but I'm slightly
>> fuzzy on why we need a crisp separation between parse-time and
>> execution-time.
>
> I don't insist that the separation has to be crisp.  I'm merely saying
> that putting a large chunk of useful-only-at-execution-time code into
> backend/parser is the Wrong Thing.

OK, but there should be a reason for that.  For example, if there are
circumstances when we parse a statement, and then time passes, and
then we execute it later, that's a good argument for what you're
saying here.  But otherwise, the fact that these bits of barely-parsed
stuff get passed all over the backend seems like an inconvenient wart.I was actually thinking of proposing that we make
morethings happen 
during the parsing process and postpone less to the execution phase,
and I need to make sure that I understand why you don't want to do
that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 17, 2010 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't insist that the separation has to be crisp. �I'm merely saying
>> that putting a large chunk of useful-only-at-execution-time code into
>> backend/parser is the Wrong Thing.

> OK, but there should be a reason for that.  For example, if there are
> circumstances when we parse a statement, and then time passes, and
> then we execute it later, that's a good argument for what you're
> saying here.

Yeah, and that's exactly what happens with utility statements that (for
example) get into the plan cache.

>  I was actually thinking of proposing that we make more things happen
> during the parsing process and postpone less to the execution phase,
> and I need to make sure that I understand why you don't want to do
> that.

The reason to not do that is that you'd have to hold more locks,
and do more management of those locks, in order to protect the
more-thoroughly-analyzed statements from parsing to execution.  In the
case of utility statements, particularly ones that affect a lot of
objects, I think that would be a seriously bad idea.  In fact it would
fly in the face of lessons we learned the hard way.  The whole of
parser/parse_utilcmd.c is code that we used to execute "at parse time",
and had to rearrange to postpone to execution time, in order to avoid
nasty bugs.  It's still in backend/parser because it fits well there
and uses a lot of parser infrastructure that's shared with parse-time
analysis of DML statements.  But neither of those statements hold for
the ObjectAddress resolution code.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Tue, Aug 17, 2010 at 2:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Aug 17, 2010 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I don't insist that the separation has to be crisp.  I'm merely saying
>>> that putting a large chunk of useful-only-at-execution-time code into
>>> backend/parser is the Wrong Thing.
>
>> OK, but there should be a reason for that.  For example, if there are
>> circumstances when we parse a statement, and then time passes, and
>> then we execute it later, that's a good argument for what you're
>> saying here.
>
> Yeah, and that's exactly what happens with utility statements that (for
> example) get into the plan cache.

OK.  I'll have to look through that code some time.

Here's v3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment

Re: refactoring comment.c

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of mié ago 18 21:32:48 -0400 2010:

> Here's v3.

The header comment in objectaddress.c contains a funny mistake: it says
it works with ObjectAddresses.  However, ObjectAddresses is a different
type altogether, so I recommend not using that as plural for
ObjectAddress.  Maybe "ObjectAddress objects"?  :-D

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: refactoring comment.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Robert Haas's message of mié ago 18 21:32:48 -0400 2010:
>> Here's v3.

> The header comment in objectaddress.c contains a funny mistake: it says
> it works with ObjectAddresses.  However, ObjectAddresses is a different
> type altogether, so I recommend not using that as plural for
> ObjectAddress.  Maybe "ObjectAddress objects"?  :-D

Alternatively, maybe ObjectAddresses was a bad choice of type name,
and it should be ObjectAddressList or ObjectAddressArray or some such.
But changing that might be more trouble than it's worth.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Thu, Aug 19, 2010 at 11:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Excerpts from Robert Haas's message of mié ago 18 21:32:48 -0400 2010:
>>> Here's v3.
>
>> The header comment in objectaddress.c contains a funny mistake: it says
>> it works with ObjectAddresses.  However, ObjectAddresses is a different
>> type altogether, so I recommend not using that as plural for
>> ObjectAddress.  Maybe "ObjectAddress objects"?  :-D
>
> Alternatively, maybe ObjectAddresses was a bad choice of type name,
> and it should be ObjectAddressList or ObjectAddressArray or some such.
> But changing that might be more trouble than it's worth.

Yeah, I think it was a bad choice of type name.  If I were otherwise
touching that code, I'd probably advocate for changing it, but since
I'm not, I'm inclined to just reword the comment.  It might be
something to keep in mind if we ever overhaul that part of the system,
though, since at that point anything that must be back-patched will
have merge conflicts anyway.

Any other kibitzing before I commit this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Any other kibitzing before I commit this?

Sure ...

+  * If the object is a relation or a child object of a relation (e.g. an
+  * attribute or contraint, *relp will set to point to that relation).  This

Parenthesis in the wrong place here, grammar and spelling not much better.
Also, I still feel that this comment could do better about explaining the
behavior, particularly with respect to locking.  Perhaps say

+  * If the target object is a relation or a child object of a relation
+  * (e.g. an attribute or constraint), the relation is also opened, and *relp
+  * receives the open relcache entry pointer; otherwise *relp is set to NULL.
+  * This is a bit grotty but it makes life simpler, since the caller will
+  * typically need the relcache entry too.  Caller must close the relcache
+  * entry when done with it.  The relation is locked with the specified
+  * lockmode if the target object is the relation itself or an attribute,
+  * but for other child objects, only AccessShareLock is acquired on the
+  * relation.


+     ScanKeyInit(&skey[0], ObjectIdAttributeNumber, BTEqualStrategyNumber,
+                 F_OIDEQ, ObjectIdGetDatum(address.objectId));

There's a standard convention for the layout of ScanKeyInit calls, and
this isn't it.  Trivial, I know, but it's better to make similar code
look similar.

There's no longer any need for a diff in src/backend/parser/Makefile.

+ #define OBJECTADDRESS_H
+ 
+ #include "nodes/parsenodes.h"
+ #include "nodes/pg_list.h"
+ #include "storage/lock.h"
+ #include "utils/rel.h"

You shouldn't need pg_list.h here, as parsenodes.h surely includes it.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Thu, Aug 19, 2010 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Any other kibitzing before I commit this?
>
> Sure ...
>
> [kibitzing]

v4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment

Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 19, 2010 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Any other kibitzing before I commit this?
>> 
>> Sure ...
>> 
>> [kibitzing]

> v4.

For my money, you could just have said "Applied with suggested changes".
They were pretty darn trivial.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 12:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Aug 19, 2010 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> Any other kibitzing before I commit this?
>>>
>>> Sure ...
>>>
>>> [kibitzing]
>
>> v4.
>
> For my money, you could just have said "Applied with suggested changes".
> They were pretty darn trivial.

Boy, tough to please.  Complaints if you commit it too soon,
complaints if you don't commit it soon enough.

Anyway, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Peter Eisentraut
Date:
On fre, 2010-08-27 at 07:49 -0400, Robert Haas wrote:
> Anyway, committed.

I suppose this is responsible for this:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -Werror -Wno-inline -I../../../src/include -D_GNU_SOURCE
-I/usr/include/libxml2   -c -o objectaddress.o objectaddress.c -MMD -MP
-MF .deps/objectaddress.Po
cc1: warnings being treated as errors
objectaddress.c: In function ‘get_object_address’:
objectaddress.c:102: error: ‘address.objectId’ may be used uninitialized
in this function
objectaddress.c:102: error: ‘address.classId’ may be used uninitialized
in this function
make[1]: *** [objectaddress.o] Error 1



Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 4:16 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On fre, 2010-08-27 at 07:49 -0400, Robert Haas wrote:
>> Anyway, committed.
>
> I suppose this is responsible for this:
>
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
> -fwrapv -g -Werror -Wno-inline -I../../../src/include -D_GNU_SOURCE
> -I/usr/include/libxml2   -c -o objectaddress.o objectaddress.c -MMD -MP
> -MF .deps/objectaddress.Po
> cc1: warnings being treated as errors
> objectaddress.c: In function ‘get_object_address’:
> objectaddress.c:102: error: ‘address.objectId’ may be used uninitialized
> in this function
> objectaddress.c:102: error: ‘address.classId’ may be used uninitialized
> in this function
> make[1]: *** [objectaddress.o] Error 1

I suppose this is unhappy because it things elog(ERROR) might return?
For whatever reason, I don't get this on my machine.  But I'll try
plugging that hole and perhaps that will fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I suppose this is unhappy because it things elog(ERROR) might
> return?
It looks more like this code uses it without initialization:       case OBJECT_INDEX:       case OBJECT_SEQUENCE:
caseOBJECT_TABLE:       case OBJECT_VIEW:           relation =               get_relation_by_qualified_name(objtype,
objname,
lockmode);           address.classId = RelationRelationId;           address.objectId = RelationGetRelid(relation);
     address.objectSubId = 0;           break;
 
-Kevin


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 5:35 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> I suppose this is unhappy because it things elog(ERROR) might
>> return?
>
> It looks more like this code uses it without initialization:
>
>        case OBJECT_INDEX:
>        case OBJECT_SEQUENCE:
>        case OBJECT_TABLE:
>        case OBJECT_VIEW:
>            relation =
>                get_relation_by_qualified_name(objtype, objname,
> lockmode);
>            address.classId = RelationRelationId;
>            address.objectId = RelationGetRelid(relation);
>            address.objectSubId = 0;
>            break;

OK, I must be missing something.  Go through that a little slower?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> OK, I must be missing something.  Go through that a little slower?
No, my mistake.  Please ignore.
I am getting the same warnings, but I think your take on the cause
must be right.
-Kevin



Re: refactoring comment.c

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I suppose this is unhappy because it things elog(ERROR) might
> return?
If I set these fields right in front of the elog(ERROR) the warnings
go away for me.  (Hopefully this observation will make up, to some
extent, for my earlier brain fart.)
-Kevin


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 5:57 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> I suppose this is unhappy because it things elog(ERROR) might
>> return?
>
> If I set these fields right in front of the elog(ERROR) the warnings
> go away for me.  (Hopefully this observation will make up, to some
> extent, for my earlier brain fart.)

I set them right after the ERROR so that those statements needn't
actually be executed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I set them right after the ERROR so that those statements needn't
> actually be executed.
Good thinking.
I checked and that also eliminates the warnings for me.  (No
surprise there, of course.)
-Kevin


Re: refactoring comment.c

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of vie ago 27 18:07:55 -0400 2010:
> On Fri, Aug 27, 2010 at 5:57 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
> > Robert Haas <robertmhaas@gmail.com> wrote:
> >> I suppose this is unhappy because it things elog(ERROR) might
> >> return?
> >
> > If I set these fields right in front of the elog(ERROR) the warnings
> > go away for me.  (Hopefully this observation will make up, to some
> > extent, for my earlier brain fart.)
> 
> I set them right after the ERROR so that those statements needn't
> actually be executed.

Didn't we inject some smarts so that the compiler would notice that
elog(ERROR) doesn't return?  Why isn't it working here?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: refactoring comment.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Didn't we inject some smarts so that the compiler would notice that
> elog(ERROR) doesn't return?

No.  If you know a portable (as in "works on every compiler") way
to do that, we could talk.  If only some compilers understand it,
we'll probably end up worse off --- the ones that don't understand it
will still need things like these unreachable assignments, while the
ones that do understand will start warning about unreachable code.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 9:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Didn't we inject some smarts so that the compiler would notice that
>> elog(ERROR) doesn't return?
>
> No.  If you know a portable (as in "works on every compiler") way
> to do that, we could talk.  If only some compilers understand it,
> we'll probably end up worse off --- the ones that don't understand it
> will still need things like these unreachable assignments, while the
> ones that do understand will start warning about unreachable code.

What's a bit odd about this is that I do get warnings for this kind of
thing in general, which get turned into errors since I compile with
-Werror; and in fact the version of the patch that I committed has no
fewer than four places where there is a comment that says "placate
compiler".  But for some reason the compiler I used to develop this
patch (gcc-4.2.1 i686-apple-darwin10) did not complain about this
case, for reasons that are not quite clear to me.

Which I guess also goes to your point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> What's a bit odd about this is that I do get warnings for this kind of
> thing in general, which get turned into errors since I compile with
> -Werror; and in fact the version of the patch that I committed has no
> fewer than four places where there is a comment that says "placate
> compiler".  But for some reason the compiler I used to develop this
> patch (gcc-4.2.1 i686-apple-darwin10) did not complain about this
> case, for reasons that are not quite clear to me.

gcc has been able to detect possibly-uninitialized scalar variables
for many years, but only fairly-recent versions seem to apply the
same type of logic to fields of local structs.  I've also noticed
that sometimes it can only spot the potential problem after inlining
a function that sets the local variable, and so a more recent version
and/or a more aggressive -O setting can also affect whether you get
a warning.  In short: this warning is a lot more context sensitive
than you might guess.
        regards, tom lane


Re: refactoring comment.c

From
Robert Haas
Date:
On Fri, Aug 27, 2010 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What's a bit odd about this is that I do get warnings for this kind of
>> thing in general, which get turned into errors since I compile with
>> -Werror; and in fact the version of the patch that I committed has no
>> fewer than four places where there is a comment that says "placate
>> compiler".  But for some reason the compiler I used to develop this
>> patch (gcc-4.2.1 i686-apple-darwin10) did not complain about this
>> case, for reasons that are not quite clear to me.
>
> gcc has been able to detect possibly-uninitialized scalar variables
> for many years, but only fairly-recent versions seem to apply the
> same type of logic to fields of local structs.

I thought it might be something like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: refactoring comment.c

From
Martijn van Oosterhout
Date:
On Fri, Aug 27, 2010 at 09:35:55PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Didn't we inject some smarts so that the compiler would notice that
> > elog(ERROR) doesn't return?
>
> No.  If you know a portable (as in "works on every compiler") way
> to do that, we could talk.  If only some compilers understand it,
> we'll probably end up worse off --- the ones that don't understand it
> will still need things like these unreachable assignments, while the
> ones that do understand will start warning about unreachable code.

We've been here before:

http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg72113.html

The problem appears to be mainly avoiding double evaluation, as
otherwise it's just some macro tweaking. On gcc you can avoid the
double evaluation also, but it's other compilers which we also need to
support.

If we really wanted to we could arrange for GCC to throw an error if
the first argument to elog was non-constant, which would prevent bugs
creeping in due to double evaluation. That still won't help users of
other compilers though.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle