Re: Command Triggers patch v18 - Mailing list pgsql-hackers

From Dimitri Fontaine
Subject Re: Command Triggers patch v18
Date
Msg-id 87vclm5w0t.fsf@hi-media-techno.com
Whole thread Raw
In response to Re: Command Triggers patch v18  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Command Triggers patch v18
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> I'm thinking of things like extension whitelisting.  When some
> unprivileged user says "CREATE EXTENSION harmless", and harmless is
> marked as superuser-only, we might like to have a hook that gets
> called *at permissions-checking time* and gets to say, oh, well, that
> extension is on the white-list, so we're going to allow it.  I think
> you can come up with similar cases for other commands, where in
> general the operation is restricted to superusers or database owners
> or table owners but in specific cases you want to allow others to do
> it.

I did that another way in previous incarnations of the patch, which was
to allow for INSTEAD OF event trigger backed by a SECURITY DEFINER
function. When the extension is whitelisted, prevent against recursion
then CREATE EXTENSION in the security definer function, then signal that
the execution should now be aborted.

That was too dangerous given the lack of policy about where exactly the
user code is fired, but I think we could now implement that for some of
the event timing specs we're listing. Only some of them, I guess only
those that are happening before we lock the objects.

I would then prefer using the INSTEAD OF words that are way more easy to
grasp than AT.

>>>>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
>>>>> EXECUTE PROCEDURE function_name(args);
>>>>
>>>>  create event trigger prohibit_some_ddl
>>>>     preceding <timing spec>
>>>>          when tag in ('CREATE TABLE', 'ALTER TABLE')
>>>>       execute procedure throw_an_error();
>
> I guess that would make sense if you think there would ever be more
> than one choice for <trigger variable>.  I'm not immediately seeing a
> use case for that, though - I was explicitly viewing the syntax foo

So, the variables in question are tag, objectid, objectname, schemaname
and from a very recent email context. On reflexion, I think the variable
here would only be either tag or context, and that's it.

> More generally, my thought on the structure of this is that you're
> going to have certain toplevel events, many of which will happen at
> only a single place in the code, like "an object got dropped" or "a
> DDL command started" or "a DDL command ended".  So we give those
> names, like sql_drop, ddl_command_start, and ddl_command_end.  Inside

I really dislike mixing sql_drop and ddl_command_start as being the same
kind of objects here, even if I can bend my head in the right angle and
see that it's a fair view when looking at how it's implemented. I can't
see a way to explain that to users without having to explain them how
drop cascade is implemented.

So my proposal here is to “fake” a “proper“ subcommand thanks to the new
context variable. If you DROP TYPE foo CASCADE and that in turn drops a
function foo_in(), then an event trigger is fired with
  context = 'DROP TYPE'  tag = 'DROP FUNCTION'

Same idea when you DROP TABLE … CASCADE and a SEQUENCE and a bunch of
index need to disappear too, you get an usual event trigger fired with
the context set to 'DROP TABLE' this time.

I don't think we need to arrange for explicitly publishing the context
specific information here. If we need to, we have to find the right
timing spec where we can guarantee still being in the top level command
and where we already have the details filled in, then users can attach a
trigger here and register the information for themselves.

> your trigger procedure, the set of magic variables that is available
> will depend on which toplevel event you set the trigger on, but
> hopefully all firings of that toplevel event can provide the same
> magic variables.  For example, at ddl_command_start time, you're just
> gonna get the command tag, but at ddl_command_end time you will get
> the command tag plus maybe some other stuff.

With my proposal above, you could get the same set of information when
being called as a toplevel event or a subevent (one where the context is
not null). That would mean adding object name and schema name lokkups in
the drop cascade code, though. We can also decide not to do that extra
lookup and just publish the object id which we certainly do have.

This way, the timing spec of a sub-event can still be of the same kind
as the top-level event ones, we still have before and after lock entry
points, same with lookup if we add that feature, etc.

> Now, we COULD stop there.  I mean, we could document that you can
> create a trigger on ddl_command_start and every DDL command will fire
> that trigger, and if the trigger doesn't care about some DDL
> operations, then it can look at the command tag and return without
> doing anything for the operations it doesn't care about.  The only
> real disadvantage of doing it that way is speed, and maybe a bit of
> code complexity within the trigger.  So my further thought was that

Within my “context proposal”, you also lose the ability to refer to sub
events as plain events with a context, which I find so much cleaner.

> we'd allow you to specify an optional filter list to restrict which
> events would fire the trigger, and the exact meaning of that filter
> list would vary depending on the toplevel event you've chosen - i.e.
> for ddl_command_start, the filter would be a list of commands, but for
> sql_drop it would be a list of object types.  We could make that more
> complicated if we think that an individual toplevel event will need
> more than one kind of filtering.  For example, if you wanted to filter
> sql_drop events based on the object type AND/OR the schema name, then
> the syntax I proposed would obviously not be adequate.  I'm just not
> convinced we need that, especially because you'd then need to set up a
> dependency between the command trigger and the schema, since obviously
> the command trigger must be dropped if the schema is.  Maybe there are
> better examples; I just can't think of any.

This WHEN syntax allows us to add support for more things in the way
you're describing down the road, and as you say for first version of
that we can limit ourselves to only supporting two variables, context
and tag, and a single operation, in against a list of string literals.

Thinking some more about it we need to add the AND operator in between
several expressions here. We don't need to implement OR because that's
easy to implement by creating more than one event trigger that calls the
same function. We could choose to implement not in, too.

Given the scope of this mini expression language, we can easily bypass
calling the executor in v1 here, and reconsider later if we want to
allow calling a UDF in the WHEN clause… I don't think it's an easy
feature to add in, though.

>> The namespace is often resolved first, sometime even before permission
>> checks are done. See CreateConversionCommand() for an example of that,
>> but really that happens about everywhere IIRC.
>
> Ah!  I was thinking of ALTER commands, but you're right: CREATE
> commands are different.  In many ways a CREATE command is more like an
> operation on the schema than it is an operation on the object itself
> (since that doesn't exist yet).  Those all need to have permissions
> checks and locking *on the schema* intertwined with the schema lookup,
> as RangeVarGetAndCheckCreationNamespace() does.  Unfortunately, the
> equivalent code for non-relations is broken, and I didn't have time to
> fix it in time for 9.2.  I think that would be a good thing to get
> done for 9.3, though: it's only about 2 days worth of work, I think.

Meanwhile we can easily enough refrain from exposing those timing specs
to CREATE command events, though.

> Yeah.  This might even be a case where we should write the
> documentation first and then make the implementation match it, rather
> than the other way around.

I still think we might want to double check how the code is actually
written in each case (create vs alter vs drop, relation commands vs
other types of objects, etc), so that will end up intertwined.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


pgsql-hackers by date:

Previous
From: Daniel Farina
Date:
Subject: Re: HTTP Frontend? (and a brief thought on materialized views)
Next
From: Pavel Stehule
Date:
Subject: Re: poll: CHECK TRIGGER?