Thread: Command Triggers patch v18
Hi,

I guess I sent v17 a little early, considering that we now already have
v18, including support for CREATE TABLE AS and SELECT INTO, thanks to the
work of Andres and Tom.

There were some spurious tags in the sgml files in v17, which I cleaned
up too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 20 March 2012 17:49, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> I guess I sent v17 a little early, considering that we now already have
> v18, including support for CREATE TABLE AS and SELECT INTO, thanks to the
> work of Andres and Tom.
>
> There were some spurious tags in the sgml files in v17, which I cleaned
> up too.

The new command triggers work correctly.

Having looked at your regression tests, you don't seem to have enough
"before" triggers in the tests. There's no test for before CREATE TABLE,
CREATE TABLE AS or SELECT INTO. In my tests I have 170 unique command
triggers, but there are only 44 in the regression test. Is there a
reason why there aren't many tests?

A problem still outstanding is that when I build the docs, CREATE
COMMAND TRIGGER is listed after COMMAND TRIGGER in
html/sql-commands.html. I recall you mentioned you didn't have this
issue on your side. Can you just confirm this again? I believe I've
located the cause of this problem: in doc/src/sgml/reference.sgml the
ALTER/CREATE/DROP COMMAND TRIGGER references are placed below their
respective plain-trigger counterparts. Putting them back into
alphabetical order corrects the issue.

On the pg_cmdtrigger page in the documentation, I'd recommend the
following changes:

s/Trigger's name/Trigger name/
s/Command TAG/Command tag/
s/The OID of the function that is called by this command trigger./The OID of the function called by this command trigger./

Also, "contype" should read "ctgtype".

Note that I haven't tested PL/Perl, PL/Python or PL/Tcl yet.

Regards

--
Thom
Thom Brown <thom@linux.com> writes:
> The new command triggers work correctly.

Thanks for your continued testing :)

> Having looked at your regression tests, you don't seem to have enough
> "before" triggers in the tests. There's no test for before CREATE
> TABLE, CREATE TABLE AS or SELECT INTO. In my tests I have 170 unique
> command triggers, but there are only 44 in the regression test. Is
> there a reason why there aren't many tests?

Now that we share the same code for ANY triggers and specific ones, I
guess we could drop a lot of specific command triggers from the
regression tests.

> A problem still outstanding is that when I build the docs, the CREATE

I would like to get back to code-level review now if at all possible,
and I will integrate your suggestions here into the next patch revision
if another one is needed.

The only point yet to be addressed from Andres's last round is about the
API around CommandFiresTriggers() and the memory context we use there.
We're missing an explicit Reset call, and to be able to have one we need
a more complex API, because of the way RemoveObjects() and
RemoveRelations() work.

We would need to add no-reset APIs and an entry point to manually reset
the memory context, which currently gets disposed of at the same time as
its parent context, the one that's been set up before entering
standard_ProcessUtility().

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
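For illustration, a minimal sketch of what such a no-reset API could
look like; every name in it is an assumption made for the sake of the
example, not code from the patch:

    /*
     * Hypothetical no-reset variant: a caller that loops over several
     * target objects (as RemoveObjects()/RemoveRelations() do) fires
     * the triggers once per object without resetting the per-command
     * memory context, then resets it explicitly after the loop.
     */
    ListCell   *cell;

    foreach(cell, stmt->objects)
    {
        /* ... look up the object, fill in cmd.objectname etc. ... */

        if (CommandFiresTriggersNoReset(&cmd))    /* assumed API */
            ExecBeforeCommandTriggers(&cmd);      /* assumed API */

        /* ... perform the actual drop ... */
    }
    ResetCommandTriggerContext(&cmd);             /* assumed API */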
On Friday, March 23, 2012 04:32:02 PM Dimitri Fontaine wrote:
> I would like to get back to code-level review now if at all possible,
> and I will integrate your suggestions here into the next patch revision
> if another one is needed.

Ok, I will give it another go.

Btw, I just wanted to alert you to being careful when checking in the
expect files ;)

 NOTICE: snitch: BEFORE any DROP TRIGGER
-ERROR: unexpected name list length (3)
+NOTICE: snitch: BEFORE DROP TRIGGER <NULL> foo_trigger
+NOTICE: snitch: AFTER any DROP TRIGGER

create conversion test for 'utf8' to 'sjis' from utf8_to_sjis;
j

you had an apparently unnoticed error in there ;)

1.

    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("command trigger \"%s\" does not exist, skipping",
                        trigname)));

The "skipping" part looks like a copy/pasto...

2. In PLy_exec_command_trigger you're doing a PG_TRY() which looks
pointless in the current incarnation. Did you intend to add something in
the catch? I think without doing a decref of pltdata both in the success
and the failure path you're leaking memory.

3. In plpython: why do you pass objectId/pltobjectname/... as "NULL"
instead of None? Using a string for it seems like a bad form of in-band
signalling to me.

4. Not sure whether InitCommandContext is the best place to suppress
command trigger usage for some commands. That seems rather unintuitive
to me. But perhaps the implementation ease is big enough...

That's everything new I found... Not bad, I think. After this somebody
else should take a look at it, I think (committer or not).

> The only point yet to be addressed from Andres's last round is about
> the API around CommandFiresTriggers() and the memory context we use
> there. We're missing an explicit Reset call, and to be able to have one
> we need a more complex API, because of the way RemoveObjects() and
> RemoveRelations() work.
>
> We would need to add no-reset APIs and an entry point to manually reset
> the memory context, which currently gets disposed of at the same time
> as its parent context, the one that's been set up before entering
> standard_ProcessUtility().

Not sure if you're expecting further input from me about that?

Greetings,

Andres
--
Andres Freund
http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
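Regarding point 2, a minimal sketch of the decref pattern being
suggested, assuming the usual PL/Python calling conventions; the keyword
name and variable names are guesses, not the patch's actual code:

    /*
     * Sketch only: give the PG_TRY() a purpose by releasing the Python
     * trigger-data dict on the error path as well as on success.
     */
    PG_TRY();
    {
        rv = PLy_procedure_call(proc, "__pltdata__", pltdata);
    }
    PG_CATCH();
    {
        Py_DECREF(pltdata);     /* don't leak the dict on error */
        PG_RE_THROW();
    }
    PG_END_TRY();

    Py_DECREF(pltdata);         /* ...nor on success */
    Py_XDECREF(rv);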
On Tue, Mar 20, 2012 at 1:49 PM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> I guess I sent v17 a little early, considering that we now already have
> v18, including support for CREATE TABLE AS and SELECT INTO, thanks to the
> work of Andres and Tom.
>
> There were some spurious tags in the sgml files in v17, which I cleaned
> up too.

There are spurious whitespace changes in the documentation. Some of
these are of the following form:

-        return_arr
+        return_arr

create_trigger.sgml adds a stray blank line as well.

I think that the syntax for enabling or disabling a command trigger
should not use the keyword SET. So just:

ALTER COMMAND TRIGGER name ENABLE [ ALWAYS | REPLICA ];
ALTER COMMAND TRIGGER name DISABLE;

That way is more parallel with the existing syntax for ordinary
triggers.

+     The name to give the new trigger. This must be distinct from the name
+     of any other trigger for the same table. The name cannot be
+     schema-qualified.

"For the same table" is a copy-and-pasteo.

-       /* Look up object address. */
+       /* Look up object address. */

Spurious diff.

I think that get_object_address_cmdtrigger should be turned into a case
within get_object_address_unqualified; I can't see a reason for it to
have its own function.

+       elog(ERROR, "could not find tuple for command trigger %u", trigOid);

The standard phrasing is "cache lookup failed for command trigger %u".

+/*
+ * Functions to execute the command triggers.
+ *
+ * We call the functions that matches the command triggers definitions in
+ * alphabetical order, and give them those arguments:
+ *
+ *  command tag, text
+ *  objectId, oid
+ *  schemaname, text
+ *  objectname, text
+ *
+ */

This will not survive pgindent, unless you surround it with ------ and
------. More generally, it would be nice if you could pgindent the whole
patch and then fix anything that breaks in such a way that it will
survive the next pgindent run.

+        * if (CommandFiresTriggers(cmd)

Missing paren.

+       /* before or instead of command trigger might have cancelled the command */

Leftovers.

@@ -169,7 +169,7 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
-
+
        /* if the timing was not set explicitly, set default value */
        es.timing = (timing_set) ? es.timing : es.analyze;

Spurious diff. All the changes to ruleutils.c appear spurious. Ditto the
change to src/test/regress/sql/triggers.sql.

In the pg_dump support, it seems you're going to try to execute an empty
query if this is run against a pre-9.2 server. It seems like you should
just return, or something like that. What do we do in comparable cases
elsewhere?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
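For reference, the pgindent-safe comment style being referred to looks
roughly like this (a sketch of how the quoted block could be wrapped,
not the patch's actual text):

    /*-------------------------------------------------------------------------
     * Functions to execute the command triggers.
     *
     * We call the functions that match the command trigger definitions in
     * alphabetical order, and pass them these arguments:
     *
     *   command tag, text
     *   objectId,    oid
     *   schemaname,  text
     *   objectname,  text
     *-------------------------------------------------------------------------
     */

pgindent leaves comment blocks alone when they are fenced with dashes
this way, so the hand-aligned argument list survives a re-run.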
On Mon, Mar 26, 2012 at 9:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> [ various trivial issues ]

OK, now I got that out of my system. Now on to bigger topics.

I am extremely concerned about the way in which this patch arranges to
invoke command triggers. You've got call sites spattered all over the
place, and I think that's going to be, basically, an unfixable mess and
a breeding ground for bugs. On a first read-through:

1. BEFORE ALTER TABLE triggers are fired in AlterTable(). However, an
ALTER TABLE statement does not necessarily call AlterTable() once and
only once. The real top-level logic for AlterTable is in
ProcessUtility(), which runs transformAlterTableStmt() to generate a
list of commands and then either calls AlterTable() or recursively
invokes ProcessUtility() for each one. This means that if IF EXISTS is
used and the table does not exist, then BEFORE command triggers won't
get invoked at all. On the other hand, if multiple commands are
specified, then I think AlterTable() may get invoked either once or more
than once, depending on exactly which commands are specified; and we
might manage to fire some CREATE INDEX command triggers or whatnot as
well, again depending on exactly what that ALTER TABLE command is doing.

2. BEFORE CREATE TABLE triggers seem to have similar issues; see
transformCreateStmt.

rhaas=# create table foo (a serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "foo_a_seq" for serial column "foo.a"
NOTICE:  snitch: BEFORE CREATE SEQUENCE public.foo_a_seq [<NULL>]
NOTICE:  snitch: AFTER CREATE SEQUENCE public.foo_a_seq [16392]
NOTICE:  snitch: BEFORE CREATE TABLE public.foo [<NULL>]
NOTICE:  snitch: BEFORE CREATE INDEX public.foo_pkey [<NULL>]
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
NOTICE:  snitch: AFTER CREATE INDEX public.foo_pkey [16398]
NOTICE:  snitch: BEFORE ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE:  snitch: AFTER ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE:  snitch: AFTER CREATE TABLE public.foo [16394]
CREATE TABLE

3. RemoveRelations() and RemoveObjects() similarly take the position
that when the object does not exist, command triggers need not fire.
This seems entirely arbitrary. CREATE EXTENSION IF NOT EXISTS, however,
takes the opposite (and probably correct) position that even if we
decide not to create the extension, we should still fire command
triggers. In a similar vein, AlterFunctionOwner_oid() skips firing the
command triggers if the old and new owners happen to be the same, but
other forms of ALTER FUNCTION (e.g. ALTER FUNCTION .. COST) fire
triggers even if the old and new parameters are the same; and
AlterForeignDataWrapperOwner_internal does NOT skip firing command
triggers just because the old and new owners are the same.

4. RemoveRelations() and RemoveObjects() also take the position that a
statement like "DROP TABLE foo, bar" should fire each relevant BEFORE
command trigger twice, then drop both objects, then fire each relevant
AFTER command trigger twice. I think that's wrong. It's 100% clear that
the user executed one, and only one, command. The only real argument for
it is that if we were to only fire command triggers once, we wouldn't be
able to pass identifying information about the objects being dropped,
since the argument-passing mechanism only has room to pass details about
a single object. I think that means that the argument-passing mechanism
needs improvement, not that we should redefine one command as two
commands.
Something like CREATE SCHEMA foo CREATE VIEW bar ... has the same
problem: the user only entered one command, but we're firing command
triggers as if they entered multiple commands. This case is possibly
more defensible than the other one, but note that the two aren't
consistent with each other as regards the order of trigger firing, and I
actually think that they're both wrong, and that only the toplevel
command should fire triggers.

5. It seems desirable for BEFORE command triggers to fire at a
consistent point during command execution, but currently they don't. For
example, BEFORE DROP VIEW triggers don't fire until we've verified that
the view exists, is a view, and that we have permission to drop it, but
LOAD triggers fire much earlier, before we've really checked anything at
all. And ALTER TABLE is somewhere in the middle: we fire the BEFORE
trigger after checking permissions on the main table, but before all
permissions checks are done, viz:

rhaas=> alter table p add foreign key (a) references p2 (a);
NOTICE:  snitch: BEFORE ALTER TABLE public.p [16418]
ERROR:  permission denied for relation p2

6. Another pretty hideous aspect of the CREATE TABLE behavior is that
AFTER triggers are fired from a completely different place than BEFORE
triggers. It's not this patch's fault that our CREATE TABLE and ALTER
TABLE code is a Rube Goldberg machine, but finding some place to jam
each trigger invocation in that happens to (mostly) work as the code
exists today is not sufficiently future-proof.

I am not too sure the above list is exhaustive.

I think that what's leading you down the garden path here is the desire
to pass as much information as possible to each command trigger. But for
that, we could simply put the hooks in ProcessUtility(), adding perhaps
a bit of logic to deal with recursive invocations of ProcessUtility() to
handle subcommands, and call it good. In fact, I think that for AFTER
triggers, you could make that work anyway - every time you enter
ProcessUtility(), you push something onto a global stack, and then
individual commands annotate that data structure with details such as
the OIDs of objects being operated on, and after processing the command
we fire the after triggers from ProcessUtility(). That way, no matter
how anybody changes the code in the future, we're guaranteed that AFTER
triggers will always fire exactly once per command, and the worst thing
that happens if someone fails to make the right changes in some other
part of the code is that the AFTER trigger doesn't get all the command
details, and even that seems less likely since a lot of the early exit
paths won't have those details to supply anyway. (A rough sketch of this
stack idea follows below.)

BEFORE triggers are a lot harder, because there you want to gather a
certain amount of information and then fire the trigger. There's more
than a little bit of tension there. The longer you delay firing the
command trigger, the more information you can reliably supply - but on
the other hand, the more chance that things will error out before we
even reach the command trigger firing site. I think this is going to
cause problems in the future as people want to supply more details to
the command trigger - people who want more details to, say, selectively
deny commands will want to fire the command trigger later in the
execution sequence, but people who want to use it for auditing will want
to fire it earlier in the sequence, to log all attempts.
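To make that stack idea concrete, here is a rough sketch; every name in
it (the struct, its fields, ExecAfterCommandTriggers) is hypothetical,
not code from the patch:

    /*
     * Sketch: a global stack of in-progress utility commands. An entry
     * is pushed on entry to ProcessUtility(); the individual command
     * annotates it with object details as they become known; AFTER
     * triggers fire exactly once per command on the way out.
     */
    typedef struct CommandStackEntry
    {
        const char *commandTag;     /* e.g. "CREATE TABLE" */
        Oid         objectId;       /* filled in by the command */
        const char *schemaname;
        const char *objectname;
        struct CommandStackEntry *parent;   /* enclosing (sub)command */
    } CommandStackEntry;

    static CommandStackEntry *commandStack = NULL;

    static void
    process_utility_command(Node *parsetree)  /* stands in for ProcessUtility */
    {
        CommandStackEntry entry = {NULL, InvalidOid, NULL, NULL, commandStack};

        commandStack = &entry;

        /* ... execute the command; it fills in entry.objectId and
         * friends as the details become known ... */

        commandStack = entry.parent;

        /* fire AFTER triggers once per command, with whatever details
         * the command managed to supply */
        ExecAfterCommandTriggers(&entry);
    }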
I don't know how to resolve this tension, but it seems to me that we'd
better try to pick a rule and follow it consistently; and we'd better
make sure that the rule is clear, documented, and well-defined.

More nitpicking:

- Creating a command trigger on a nonexistent function fails with "cache
lookup failed for function 0".
- tablespace.c has useless leftover changes.
- One of the examples in the docs says "RETURNS command trigger" instead
of "RETURNS command_trigger".
- The "DROP COMMAND TRIGGER" documentation contains leftovers that refer
to a parameter called "command", which no longer exists.
- The example in the "DROP COMMAND TRIGGER" documentation fails to
execute, as the syntax is no longer recognized.
- check_cmdtrigger_name is mostly duplicative of get_cmdtrigger_oid. I
think it can be rewritten as if (OidIsValid(get_cmdtrigger_oid(...)))
ereport(...).
- I believe that all the changes to dropdb() can be reverted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
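Spelled out, that check_cmdtrigger_name rewrite might look roughly like
this; the missing_ok flag on get_cmdtrigger_oid is an assumption about
the patch's API, not confirmed against it:

    /*
     * Sketch of the suggested rewrite: delegate the lookup to
     * get_cmdtrigger_oid() instead of duplicating the syscache scan.
     */
    static void
    check_cmdtrigger_name(const char *trigname)
    {
        if (OidIsValid(get_cmdtrigger_oid(trigname, true)))  /* true = missing_ok */
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_OBJECT),
                     errmsg("command trigger \"%s\" already exists",
                            trigname)));
    }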
Hi,

First things first, thanks for the review! I'm working on a new version
of the patch to fix all the specific comments you called "nitpicking"
here and in your previous email. This new patch will also implement a
single list of triggers to run in alphabetical order, not split by
ANY/specific command.

Robert Haas <robertmhaas@gmail.com> writes:
> I am extremely concerned about the way in which this patch arranges to
> invoke command triggers. You've got call sites spattered all over the
> place, and I think that's going to be, basically, an unfixable mess
> and a breeding ground for bugs. On a first read-through:

In the first versions of the patch I did try to have a single point
where to integrate the feature, and that didn't work out. If you want to
publish information such as object id, name and schema, you need to have
specialized code spread out everywhere.

Then, about where to call the trigger: it's a per-command decision with
a general policy. Your comments here are of two kinds, mostly about
having to guess the policy because it's not explicit, and some specifics
that are either in question or not following the policy.

The policy I've been willing to install is that command triggers should
get fired once the basic error checking is done. That's not perfect for
auditing purposes *if you want to log all attempts*, but it's good
enough to audit all commands that had an impact on your system, and you
still can block commands. Also, in most commands you can't get the
object id and name before the basic error checking is done.

Now, about the IF NOT EXISTS case: with the policy just described
there's no reason to trigger user code, but I can also see your position
here.

> 1. BEFORE ALTER TABLE triggers are fired in AlterTable(). However, an

I used to have that in utility.c, but Andres had me move it out, and it
seems like I should get back to utility.c for the reasons you're
mentioning and that I missed.

> 2. BEFORE CREATE TABLE triggers seem to have similar issues; see
> transformCreateStmt.
>
> rhaas=# create table foo (a serial primary key);
> NOTICE:  CREATE TABLE will create implicit sequence "foo_a_seq" for
> serial column "foo.a"
> NOTICE:  snitch: BEFORE CREATE SEQUENCE public.foo_a_seq [<NULL>]
> NOTICE:  snitch: AFTER CREATE SEQUENCE public.foo_a_seq [16392]
> NOTICE:  snitch: BEFORE CREATE TABLE public.foo [<NULL>]
> NOTICE:  snitch: BEFORE CREATE INDEX public.foo_pkey [<NULL>]
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "foo_pkey" for table "foo"
> NOTICE:  snitch: AFTER CREATE INDEX public.foo_pkey [16398]
> NOTICE:  snitch: BEFORE ALTER SEQUENCE public.foo_a_seq [16392]
> NOTICE:  snitch: AFTER ALTER SEQUENCE public.foo_a_seq [16392]
> NOTICE:  snitch: AFTER CREATE TABLE public.foo [16394]
> CREATE TABLE

That's meant to be a feature of the command trigger system; it's been
asked about by a lot of people. Having triggers fire for subcommands is
possibly The Right Thing™, or at least the most popular one.

> 3. RemoveRelations() and RemoveObjects() similarly take the position
> that when the object does not exist, command triggers need not fire.
> This seems entirely arbitrary. CREATE EXTENSION IF NOT EXISTS,
> however, takes the opposite (and probably correct) position that even
> if we decide not to create the extension, we should still fire command
> triggers. In a similar vein, AlterFunctionOwner_oid() skips firing
> the command triggers if the old and new owners happen to be the same,
> but other forms of ALTER FUNCTION (e.g. ALTER FUNCTION ..
> COST) fire
> triggers even if the old and new parameters are the same; and
> AlterForeignDataWrapperOwner_internal does NOT skip firing command
> triggers just because the old and new owners are the same.

We integrate here with the code as it stands; I didn't question the
error management already existing. Do we need to revise that in the
scope of the command triggers patch?

> 4. RemoveRelations() and RemoveObjects() also take the position that a
> statement like "DROP TABLE foo, bar" should fire each relevant BEFORE
> command trigger twice, then drop both objects, then fire each relevant
> AFTER command trigger twice. I think that's wrong. It's 100% clear

See above, it's what users are asking us to implement as a feature.

> 5. It seems desirable for BEFORE command triggers to fire at a
> consistent point during command execution, but currently they don't.

The policy should be to fire the triggers once the usual error checking
is done.

> For example, BEFORE DROP VIEW triggers don't fire until we've verified
> that the view exists, is a view, and that we have permission to drop
> it, but LOAD triggers fire much earlier, before we've really checked
> anything at all. And ALTER TABLE is somewhere in the middle: we fire
> the BEFORE trigger after checking permissions on the main table, but
> before all permissions checks are done, viz:

I will rework LOAD, and ALTER TABLE too, which is more work as you can
imagine, because now we have to insinuate the command trigger calling
into each branch of what ALTER TABLE is able to implement.

> 6. Another pretty hideous aspect of the CREATE TABLE behavior is that
> AFTER triggers are fired from a completely different place than BEFORE
> triggers. It's not this patch's fault that our CREATE TABLE and
> ALTER TABLE code is a Rube Goldberg machine, but finding some place to
> jam each trigger invocation in that happens to (mostly) work as the
> code exists today is not sufficiently future-proof.

I'm willing to make it better; what are your ideas?

> command we fire the after triggers from ProcessUtility(). That way,
> no matter how anybody changes the code in the future, we're guaranteed
> that AFTER triggers will always fire exactly once per command, and the

From last year's cluster hackers meeting, and then several mails on this
list, it appears that we don't want to do it the way you propose, which
indeed would be simpler to implement.

> the other hand, the more chance that things will error out before we
> even reach the command trigger firing site. I think this is going to

I think that's a feature. You skip calling the command triggers when you
know you won't run the command at all.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Mar 27, 2012 at 4:27 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> In the first versions of the patch I did try to have a single point
> where to integrate the feature, and that didn't work out. If you want to
> publish information such as object id, name and schema, you need to have
> specialized code spread out everywhere.
> [...]
>
> That's meant to be a feature of the command trigger system; it's been
> asked about by a lot of people. Having triggers fire for subcommands is
> possibly The Right Thing™, or at least the most popular one.
> [...]
>
> I will rework LOAD, and ALTER TABLE too, which is more work as you can
> imagine, because now we have to insinuate the command trigger calling
> into each branch of what ALTER TABLE is able to implement.
> [...]
>
> From last year's cluster hackers meeting, and then several mails on this
> list, it appears that we don't want to do it the way you propose, which
> indeed would be simpler to implement.
> [...]
>
> I think that's a feature. You skip calling the command triggers when you
> know you won't run the command at all.

I am coming more and more strongly to the conclusion that we're going in
the wrong direction here. It seems to me that you're spending an
enormous amount of energy implementing something that goes by the name
COMMAND TRIGGER when what you really want is an EVENT TRIGGER.
Apparently, you don't want a callback every time someone types CREATE
TABLE: you want a callback every time a table gets created. You don't
want a callback every time someone types DROP FUNCTION; you want a
callback every time a function gets dropped. If the goal here is to do
replication, you'd more than likely even want such callbacks to occur
when the function is dropped indirectly via CASCADE.

In the interest of getting event triggers, you're arguing that we should
contort the definition of the word "command" to include subcommands, but
not necessarily commands that turn out to be a no-op, and maybe things
that are sort of like what commands do even though nobody actually ever
executed a command by that name. I just don't think that's a good idea.
We either have triggers on commands, and they execute once per command,
period; or we have triggers on events, and they execute every time that
event happens.

As it turns out, two people have already put quite a bit of work into
designing and implementing what amounts to an event trigger system for
PostgreSQL: me, and KaiGai Kohei. It's called the ObjectAccessHook
mechanism, and it fires every time we create or drop an object. It
passes the type of object created or dropped, the OID of the object, and
the column number also in the case of a column. The major drawback of
this mechanism is that you have to write the code you want to execute in
C, and you can't arrange for it to be executed via a DDL command;
instead, you have to set a global variable from your shared library's
_PG_init() function. However, I don't think that would be very hard to
fix. We could simply replace the InvokeObjectAccessHook() macro with a
function that calls a list of triggers pulled from the catalog.

I think there are a couple of advantages of this approach over what
you've got right now. First, there are a bunch of tested hook points
that are already committed. They have well-defined semantics that are
easy to understand: every time we create or drop an object, you get a
callback with these arguments. Second, KaiGai Kohei is interested in
adding more hook points in the future to service sepgsql.
If the command triggers code and the sepgsql code both use the same set
of hook points then (1) we won't clutter the code with multiple sets of
hook points and (2) any features that get added for purposes of command
triggers will also benefit sepgsql, and vice versa. Since the two of you
are trying to solve very similar problems, it would be nice if both of
you were pulling in the same direction. Third, and most importantly, it
seems to match the semantics you want, which is a callback every time
something *happens* rather than a callback every time someone *types
something*.

Specifically, I propose the following plan:

- Rename COMMAND TRIGGER to EVENT TRIGGER. Rewrite the documentation to
say that we're going to fire triggers every time an *event* happens.
Rewrite the code to put the firing mechanism inside
InvokeObjectAccessHook, which will become a function rather than a
macro.

- Change the list of supported trigger types to match what the
ObjectAccessHook mechanism understands, which means, at present,
post-create and drop. It might even make sense to forget about having
separate hooks for every type of object that can be created or dropped
and instead just let people say CREATE EVENT TRIGGER name ON { CREATE |
DROP } EXECUTE PROCEDURE function_name ( arguments ).

- Once that's done, start adding object-access-hook invocations (and
thus, the corresponding command trigger functionality) to whatever other
operations you'd like to have support it.

I realize this represents a radical design change from what you have
right now, but what you have right now is messy and ill-defined, and I
don't think you can easily fix it. You're exposing great gobs of
implementation details, which means that, inevitably, every time anyone
wants to refactor some code, the semantics of command triggers are going
to change, or else the developer will have to go to great lengths to
ensure that they don't. I don't think either of those things is going to
make anyone very happy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
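As a sketch of the first step of that plan, turning the macro into a
function might look roughly like this; the hook signature follows the
9.2-era ObjectAccessHook mechanism as described above, and
FireEventTriggers is a hypothetical name for the catalog-driven part:

    /*
     * Sketch: replace the InvokeObjectAccessHook() macro with a
     * function that calls both the C-level hook and any event triggers
     * registered in the catalog.
     */
    void
    InvokeObjectAccessHook(ObjectAccessType access,
                           Oid classId, Oid objectId, int subId)
    {
        /* existing hook, set from a module's _PG_init() */
        if (object_access_hook)
            (*object_access_hook) (access, classId, objectId, subId);

        /* hypothetical: look up and fire matching event triggers */
        FireEventTriggers(access, classId, objectId, subId);
    }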
On Tuesday, March 27, 2012 02:55:47 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 4:27 AM, Dimitri Fontaine
> <dimitri@2ndquadrant.fr> wrote:
> > In the first versions of the patch I did try to have a single point
> > where to integrate the feature, and that didn't work out. If you want to
> > publish information such as object id, name and schema, you need to have
> > specialized code spread out everywhere.
> > [...]
> >
> > That's meant to be a feature of the command trigger system; it's been
> > asked about by a lot of people. Having triggers fire for subcommands is
> > possibly The Right Thing™, or at least the most popular one.
> > [...]
> >
> > I will rework LOAD, and ALTER TABLE too, which is more work as you can
> > imagine, because now we have to insinuate the command trigger calling
> > into each branch of what ALTER TABLE is able to implement.
> > [...]
> >
> > From last year's cluster hackers meeting, and then several mails on this
> > list, it appears that we don't want to do it the way you propose, which
> > indeed would be simpler to implement.
> > [...]
> >
> > I think that's a feature. You skip calling the command triggers when you
> > know you won't run the command at all.
>
> I am coming more and more strongly to the conclusion that we're going
> in the wrong direction here. It seems to me that you're spending an
> enormous amount of energy implementing something that goes by the name
> COMMAND TRIGGER when what you really want is an EVENT TRIGGER.
> Apparently, you don't want a callback every time someone types CREATE
> TABLE: you want a callback every time a table gets created.

Not necessarily. One use-case is doing something *before* something
happens, like enforcing naming conventions, data types et al during
development/schema migration.

> In the
> interest of getting event triggers, you're arguing that we should
> contort the definition of the word "command" to include subcommands,
> but not necessarily commands that turn out to be a no-op, and maybe
> things that are sort of like what commands do even though nobody
> actually ever executed a command by that name. I just don't think
> that's a good idea. We either have triggers on commands, and they
> execute once per command, period; or we have triggers on events, and
> they execute every time that event happens.

I don't think that's a very helpful definition. What on earth would you
want to do with plain passing of

CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();

So some decomposition seems to be necessary. Getting the level right
sure ain't totally easy. It might be helpful to pass in the context from
which something like that happened.

> As it turns out, two people have already put quite a bit of work into
> designing and implementing what amounts to an event trigger system for
> PostgreSQL: me, and KaiGai Kohei. It's called the ObjectAccessHook
> mechanism, and it fires every time we create or drop an object. It
> passes the type of object created or dropped, the OID of the object,
> and the column number also in the case of a column. The major
> drawback of this mechanism is that you have to write the code you want
> to execute in C, and you can't arrange for it to be executed via a DDL
> command; instead, you have to set a global variable from your shared
> library's _PG_init() function. However, I don't think that would be
> very hard to fix. We could simply replace the
> InvokeObjectAccessHook() macro with a function that calls a list of
> triggers pulled from the catalog.
Which would basically require including pg_dump in the backend to
implement replication and logging. I don't think librarifying pg_dump is
a fair burden at all.

Also, I have a *very hard* time imagining how to sensibly implement
replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object
access hooks. There is just not enough context. How would you discern
between an ADD COLUMN and a CREATE TABLE? They look very similar or even
identical on a catalog level.

I also have the strong feeling that all this would expose implementation
details *at least* as much as command triggers. A slight change in the
order of catalog modification would be *way* harder to hide via the
object hook than something of a similar scale via command triggers.

> I think there are a couple of advantages of this approach over what
> you've got right now. First, there are a bunch of tested hook points
> that are already committed. They have well-defined semantics that are
> easy to understand: every time we create or drop an object, you get a
> callback with these arguments. Second, KaiGai Kohei is interested in
> adding more hook points in the future to service sepgsql. If the
> command triggers code and the sepgsql code both use the same set of
> hook points then (1) we won't clutter the code with multiple sets of
> hook points and (2) any features that get added for purposes of
> command triggers will also benefit sepgsql, and vice versa. Since
> the two of you are trying to solve very similar problems, it would be
> nice if both of you were pulling in the same direction. Third, and
> most importantly, it seems to match the semantics you want, which is a
> callback every time something *happens* rather than a callback every
> time someone *types something*.

Obviously unifying both on the implementation level would be nice, but I
don't really see it happening. The ObjectAccessHook mechanism seems too
low-level to me.

> Specifically, I propose the following plan:
>
> - Rename COMMAND TRIGGER to EVENT TRIGGER. Rewrite the documentation
> to say that we're going to fire triggers every time an *event*
> happens. Rewrite the code to put the firing mechanism inside
> InvokeObjectAccessHook, which will become a function rather than a
> macro.

That would mean extending the ObjectAccessHook mechanism to handle:

- altering objects
- before events (which would make the current data-passing mechanism
  moot; you can't look in the catalogs yet)
- passing names (otherwise simple triggers are *WAY* too hard to write)
- giving enough useful context to know what's happening. Knowing that a
  new row was created is just about pointless in itself if you don't
  know why/from where.
- ...

Which would - at least as far as I can see - essentially morph into the
command triggers patch.

> I realize this represents a radical design change from what you have
> right now, but what you have right now is messy and ill-defined, and I
> don't think you can easily fix it. You're exposing great gobs of
> implementation details, which means that, inevitably, every time anyone
> wants to refactor some code, the semantics of command triggers are
> going to change, or else the developer will have to go to great
> lengths to ensure that they don't. I don't think either of those
> things is going to make anyone very happy.

Hm. I agree that there is some work needed to make the whole handling
more consistent. But I don't think it's as bad as you paint it.

I don't want to brush off your review here - you definitely do raise
valid points.
I don't particularly like the current implementation very much. But I
simply fail to see a less annoying way.

It's also not that I am just defending a colleague - you might have
noticed the @2ndquadrant - I got interested in this patch quite a bit
before that was on the horizon.

Greetings,

Andres
--
Andres Freund
http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 27, 2012 at 9:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Not necessarily. One use-case is doing something *before* something
> happens, like enforcing naming conventions, data types et al during
> development/schema migration.

That is definitely a valid use case. The only reason why we don't have
something like that built into the ObjectAccessHook framework is because
we haven't yet figured out a clean way to make it work. Most of KaiGai's
previous attempts involved passing a bunch of garbage selected
apparently at random into the hook function, and I rejected that as
unmaintainable. Dimitri's code here doesn't have that problem - it
passes in a consistent set of information across the board. But I still
think it's unmaintainable, because there's no consistency about when
triggers get invoked, or whether they get invoked at all. We need
something that combines the strengths of both approaches.

I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place. For example, one thing
that KaiGai has wanted to do (and I bet Dimitri would love to be able to
do it too, and I'm almost sure that Dan Farina would like to do it) is
override the built-in security policy for particular commands. I think
that KaiGai only wants to be able to deny things that would normally be
allowed, but I suspect some of those other folks would also like to be
able to allow things that would normally be denied. For those use cases,
you want to get control at permissions-checking time. However, where
Dimitri has the hooks right now, BEFORE triggers for ALTER commands fire
after you've already looked up the object that you're manipulating. That
has the advantage of allowing you to use the OID of the object, rather
than its name, to make policy decisions; but by that time it's too late
to cancel a denial-of-access by the first-line permissions checks.

Dimitri also mentioned wanting to be able to cancel the actual operation
- and presumably, do something else instead, like go execute it on a
different node - and I think that's another valid use case for this kind
of trigger. It's going to take some work, though, to figure out what the
right set of control points is, and it's probably going to require some
refactoring of the existing code, which is a mess that I have been
slowly trying to clean up.

>> In the interest of getting event triggers, you're arguing that we should
>> contort the definition of the word "command" to include subcommands,
>> but not necessarily commands that turn out to be a no-op, and maybe
>> things that are sort of like what commands do even though nobody
>> actually ever executed a command by that name. I just don't think
>> that's a good idea. We either have triggers on commands, and they
>> execute once per command, period; or we have triggers on events, and
>> they execute every time that event happens.

> I don't think that's a very helpful definition. What on earth would you
> want to do with plain passing of
> CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
>
> So some decomposition seems to be necessary. Getting the level right sure
> ain't totally easy.
> It might be helpful to pass in the context from which something like that
> happened.

I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)

> Which would basically require including pg_dump in the backend to
> implement replication and logging.
> I don't think librarifying pg_dump is a fair burden at all.

I don't either, but that strikes me as a largely unrelated problem. As
you may recall, I've complained that too much of that functionality is
in the client in the past, and I haven't changed my opinion on that. But
how is that any different with Dimitri's approach? You can get a
callback AFTER CREATE TABLE, and you'll get the table name. Now what? If
you get the trigger in C you can get the node tree, but that's hardly
any better. You're still going to need to do some pretty tricky push-ups
to get reliable replication. It's not at all evident to me that the
parse tree is any better a place to start than the system catalog
representation; in fact, I would argue that it's probably much worse,
because you'll have to exactly replicate whatever the backend did to
decide what catalog entries to create, or you'll get drift between
servers.

> Also I have a *very hard* time imagining how to sensibly implement
> replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object
> access hooks. There is just not enough context. How would you discern
> between an ADD COLUMN and a CREATE TABLE? They look very similar or
> even identical on a catalog level.

That can be fixed using the optional argument to InvokeObjectAccessHook,
similar to what we've done for differentiating internal drops from other
drops.

> I also have the strong feeling that all this would expose implementation
> details *at least* as much as command triggers. A slight change in the
> order of catalog modification would be *way* harder to hide via the
> object hook than something of a similar scale via command triggers.

I don't see that. Right now, when you do something like CREATE TABLE foo
(a serial primary key), the exact order in which the objects are created
is apparent from the order of the trigger firing. The order of BEFORE
vs. AFTER triggers is also drawn out of a hat. So I don't see how it's
any worse. It also avoids a lot of definitional ambiguity. If you say
that we're going to have a trigger on the CREATE SEQUENCE command, then
what happens when the user creates a sequence via some other method? The
current patch says that we should handle that by calling the CREATE
SEQUENCE trigger if it happens to be convenient because we're going
through the same code path that a normal CREATE SEQUENCE would have gone
through, but if it uses a different code path then let's not bother.
Otherwise, how do you explain the fact that a DROP .. CASCADE doesn't
fire triggers for the subordinate objects dropped, but CREATE TABLE does
fire triggers for the subordinate objects created?

If you trigger on events, then things are a lot more cut-and-dried. You
fire the trigger when the event happens. If the event doesn't happen,
you don't fire the trigger. If an implementation change causes objects
to be created when they weren't before, or not created when they were
before, then, yes, trigger firing behavior will change, but it will be
clear that the new behavior is correct. Right now, future developers
modifying this code have no reasonable basis for deciding whether or not
a trigger ought to fire in a given situation. There's no consistent
rule, no specification, and no documentation for how that's intended to
happen.

> Hm. I agree that there is some work needed to make the whole handling
> more consistent. But I don't think it's as bad as you paint it.
>
> I don't want to brush off your review here - you definitely do raise
> valid points.
> I don't particularly like the current implementation very much. But I
> simply fail to see a less annoying way.
>
> It's also not that I am just defending a colleague - you might have
> noticed the @2ndquadrant - I got interested in this patch quite a bit
> before that was on the horizon.

Yep, understood, and I'd have the same complaints about this patch that
I do presently if it were submitted by someone who worked for
EnterpriseDB, or some independent third party.

I do think it's a bit of a shame that more people didn't pay attention
to and help improve the ObjectAccessHook machinery as we were working on
that. I mean, Dimitri is not the first or last person to want to get
control during DDL operations, and KaiGai's already done a lot of work
figuring out how to make it work reasonably. Pre-create hooks don't
exist in that machinery not because nobody wants them, but because it's
hard. This whole problem is hard. It would be wrong to paint it as a
problem that is unsolvable or not valuable, but it would be equally
wrong to expect that it's easy or that anyone's first attempt (mine,
yours, Dimitri's, KaiGai's, or Tom Lane's) is going to fall painlessly
into place without anyone needing to sweat a little blood.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I am coming more and more strongly to the conclusion that we're going
> in the wrong direction here. It seems to me that you're spending an
> enormous amount of energy implementing something that goes by the name
> COMMAND TRIGGER when what you really want is an EVENT TRIGGER.

No. The following two features are not allowed by what you call an EVENT
TRIGGER, yet are the very reason why I started working on COMMAND
TRIGGERs:

 - BEFORE COMMAND TRIGGER
 - Having the command string available in the command trigger

Now, because of scheduling, the current patch has been reduced not to
include the second feature yet, which is a good trade-off for now. Yet
it's entirely possible to implement such a feature as an extension once
9.2 is out, given the current COMMAND TRIGGER patch.

> I realize this represents a radical design change from what you have
> right now, but what you have right now is messy and ill-defined and I

That's only because I've not been making the hard choices alone; I
wanted to be able to speak about them here, and the only time that
discussion happens is when serious hands-on code review is being done.
My take? Let's make the hard decisions together.

Mechanisms are implemented. The plural is what is causing problems here,
but that also means we can indeed implement several policies now. I've
been proposing a non-messy policy in a previous mail, which I realize
the patch is not properly implementing now. I'd think moving the patch
to implement said policy (or another one after discussion) is the next
step.

> don't think you can easily fix it. You're exposing great gobs of
> implementation details which means that, inevitably, every time anyone
> wants to refactor some code, the semantics of command triggers are
> going to change, or else the developer will have to go to great
> lengths to ensure that they don't. I don't think either of those
> things is going to make anyone very happy.

I guess you can't really have your cake and eat it too, right?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes:
> I actually think that, to really meet all needs here, we may need the
> ability to get control in more than one place. For example, one thing
> that KaiGai has wanted to do (and I bet Dimitri would love to be able
> to do it too, and I'm almost sure that Dan Farina would like to do it)
> is override the built-in security policy for particular commands. I

I had that in a previous version of the patch, and removed it because
you were concerned about our ability to review it in time for 9.2, which
has obviously been the right decision. That was called INSTEAD OF
command triggers, and they could call a SECURITY DEFINER function.

> I agree that it's not a very helpful decision, but I'm not the one who
> said we wanted command triggers rather than event triggers. :-)

Color me unconvinced about event triggers. That's not answering my use
cases.

> that. But how is that any different with Dimitri's approach? You can
> get a callback AFTER CREATE TABLE, and you'll get the table name. Now
> what? If you get the trigger in C you can get the node tree, but
> that's hardly any better. You're still going to need to do some
> pretty tricky push-ups to get reliable replication. It's not at all

What you do with the parse tree is rewrite the command. It's possible to
do, but would entail exposing the internal parser state, which Tom
objects to. I'm now thinking that can be maintained as a C extension.

> evident to me that the parse tree is any better a place to start than
> the system catalog representation; in fact, I would argue that it's
> probably much worse, because you'll have to exactly replicate whatever
> the backend did to decide what catalog entries to create, or you'll
> get drift between servers.

Try to build a command string from the catalogs… even if you can store a
snapshot of them before and after the command. Remember that you might
want to “replicate” to things that are NOT a PostgreSQL server.

> ambiguity. If you say that we're going to have a trigger on the
> CREATE SEQUENCE command, then what happens when the user creates a
> sequence via some other method? The current patch says that we should
> handle that by calling the CREATE SEQUENCE trigger if it happens to be
> convenient because we're going through the same code path that a
> normal CREATE SEQUENCE would have gone through, but if it uses a
> different code path then let's not bother. Otherwise, how do you

Yes, the current set of which commands fire which triggers is explained
by how the code is written wrt standard_ProcessUtility() calls. We could
mark re-entrant calls and disable the command trigger feature there; it
would not be our first backend global variable in flight.

> Dimitri is not the first or last person to want to get control during
> DDL operations, and KaiGai's already done a lot of work figuring out
> how to make it work reasonably. Pre-create hooks don't exist in that
> machinery not because nobody wants them, but because it's hard. This

I've been talking with KaiGai about using the command trigger
infrastructure to implement his control hooks, while reviewing one of
his patches, and he said that's not low-level enough for him.

> whole problem is hard. It would be wrong to paint it as a problem
> that is unsolvable or not valuable, but it would be equally wrong to
> expect that it's easy or that anyone's first attempt (mine, yours,
> Dimitri's, KaiGai's, or Tom Lane's) is going to fall painlessly into
> place without anyone needing to sweat a little blood.
Sweating over that feature is a good summary of a whole lot of my and
some others' time lately.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi,

On Tuesday, March 27, 2012 04:29:58 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 9:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Not necessarily. One use-case is doing something *before* something
> > happens, like enforcing naming conventions, data types et al during
> > development/schema migration.
>
> That is definitely a valid use case. The only reason why we don't
> have something like that built into the ObjectAccessHook framework is
> because we haven't yet figured out a clean way to make it work. Most
> of KaiGai's previous attempts involved passing a bunch of garbage
> selected apparently at random into the hook function, and I rejected
> that as unmaintainable. Dimitri's code here doesn't have that problem
> - it passes in a consistent set of information across the board. But
> I still think it's unmaintainable, because there's no consistency
> about when triggers get invoked, or whether they get invoked at all.
> We need something that combines the strengths of both approaches.

Yes. I still think the likeliest way towards that is defining some
behaviour we want regarding

* naming conflict handling
* locking

and then religiously making sure the patch adheres to that. That might
need some refactoring of existing code (like putting a part of the
utility.c handling of create table into its own function and such), but
should be doable.

I think BEFORE command triggers ideally should run

* before permission checks
* before locking
* before internal checks are done (naming conflicts, type checks and
  such)

Obviously some things will be caught before that (parse analysis of some
commands), and I think we won't be able to fully stop that; it's not
totally consistent now, and while doing some work in the scope of this
patch seems sensible, it cannot do over everything wrt this.

Looking up oids and such before calling the trigger doesn't seem to be
problematic from my pov. Command triggers are a superuser-only facility;
the additional information they gain is no problem. Thinking about that,
I think we should enforce that command trigger functions be security
definer functions.

> I actually think that, to really meet all needs here, we may need the
> ability to get control in more than one place.

Not sure what you mean by that. Before/after permission checks,
before/after consistency checks? That sort of thing?

> For example, one thing
> that KaiGai has wanted to do (and I bet Dimitri would love to be able
> to do it too, and I'm almost sure that Dan Farina would like to do it)
> is override the built-in security policy for particular commands.

Dim definitely seems to want that: https://github.com/dimitri/pgextwlist ;)

> I think that KaiGai only wants to be able to deny things that would
> normally be allowed, but I suspect some of those other folks would
> also like to be able to allow things that would normally be denied.

Denying seems to be easier than allowing stuff safely...

> For those use cases, you want to get control at permissions-checking
> time. However, where Dimitri has the hooks right now, BEFORE triggers
> for ALTER commands fire after you've already looked up the object that
> you're manipulating. That has the advantage of allowing you to use
> the OID of the object, rather than its name, to make policy decisions;
> but by that time it's too late to cancel a denial-of-access by the
> first-line permissions checks.

Why? Just throw an access-denied exception? Unless it's done after the
locking, that won't be visible by anything but timing?
Additional granting is more complex though, I am definitely with you
there. That will probably need INSTEAD triggers, which I don't see for
now. Those will have their own share of problems.

> Dimitri also mentioned wanting to be able to cancel the actual operation
> - and presumably, do something else instead, like go execute it on a
> different node, and I think that's another valid use case for this kind
> of trigger. It's going to take some work, though, to figure out what the
> right set of control points is, and it's probably going to require some
> refactoring of the existing code, which is a mess that I have been
> slowly trying to clean up.

I commend your bravery...

> >> In the interest of getting event triggers, you're arguing that we should
> >> contort the definition of the word "command" to include subcommands,
> >> but not necessarily commands that turn out to be a no-op, and maybe
> >> things that are sort of like what commands do even though nobody
> >> actually ever executed a command by that name. I just don't think
> >> that's a good idea. We either have triggers on commands, and they
> >> execute once per command, period; or we have triggers on events, and
> >> they execute every time that event happens.
>
> > I don't think that's a very helpful definition. What on earth would you
> > want to do with plain passing of
> > CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
> > So some decomposition seems to be necessary. Getting the level right sure
> > ain't totally easy.
> > It might be helpful to pass in the context from which something like that
> > happened.
>
> I agree that it's not a very helpful decision, but I'm not the one who
> said we wanted command triggers rather than event triggers. :-)

Well, but on the other hand, what will you do with a "pg_attribute row
added" event? It doesn't even have an oid ;)

> > Which would basically require including pg_dump in the backend to
> > implement replication and logging. I don't think librarifying pg_dump
> > is a fair burden at all.
>
> I don't either, but that strikes me as a largely unrelated problem.
> As you may recall, I've complained that too much of that functionality
> is in the client in the past, and I haven't changed my opinion on
> that.

I am not really happy with that either. But if we start putting all that
into this patch we will never get anywhere. I think moving that nearer
to the server in some form is a MAJOR project on its own. Also, I don't
see it solving differential stuff at all.

But how would you do something like logging the originating statement
with the object access hook machinery?

> But how is that any different with Dimitri's approach? You can
> get a callback AFTER CREATE TABLE, and you'll get the table name. Now
> what? If you get the trigger in C you can get the node tree, but
> that's hardly any better. You're still going to need to do some
> pretty tricky push-ups to get reliable replication. It's not at all
> evident to me that the parse tree is any better a place to start than
> the system catalog representation; in fact, I would argue that it's
> probably much worse, because you'll have to exactly replicate whatever
> the backend did to decide what catalog entries to create, or you'll
> get drift between servers.

Well, the problem is that with just catalog access it's just about
impossible to generate differential changes from them. Also, the
catalogs between clusters *won't be the same* in a large part of the
use-cases. Oids will basically never match.
Matching the intent, which is mostly visible in the parsetree, seems to be a whole lot more helpful. I am pretty confident, given time, that I could write a fairly complete differential SQL generation for the usual parts of DDL from the parsetree alone in 2-3 days with a C-level trigger. I am also very sure that I wouldn't be able to even remotely get that far with catalog-only access.

> > Also I have a *very hard* time imagining how to sensibly implement replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object access hooks. There is just not enough context. How would you discern between an ADD COLUMN and a CREATE TABLE? They look very similar or even identical on a catalog level.
> That can be fixed using the optional argument to InvokeObjectAccessHook, similar to what we've done for differentiating internal drops from other drops.

Several of the points where those hooks are invoked don't even seem to have the necessary information for that... Doing that seems to involve passing heaps of additional state through the backend.

> > I also have the strong feeling that all this would expose implementation details *at least* as much as command triggers. A slight change in the order of catalog modification would be *way* harder to hide via the object hooks than something of a similar scale via command triggers.
> I don't see that. Right now, when you do something like CREATE TABLE foo (a serial primary key), the exact order in which the objects are created is apparent from the order of the trigger firing. The order of BEFORE vs. AFTER triggers is also drawn out of a hat. So I don't see how it's any worse. It also avoids a lot of definitional ambiguity. If you say that we're going to have a trigger on the CREATE SEQUENCE command, then what happens when the user creates a sequence via some other method? The current patch says that we should handle that by calling the CREATE SEQUENCE trigger if it happens to be convenient because we're going through the same code path that a normal CREATE SEQUENCE would have gone through, but if it uses a different code path then let's not bother.
> Otherwise, how do you explain the fact that a DROP .. CASCADE doesn't fire triggers for the subordinate objects dropped, but CREATE TABLE does fire triggers for the subordinate objects created?

I complained about that several times. As I recall, Dim basically agreed but didn't want to do it now because it involves passing more data around. Didn't convince me much, but I think I could live with it. Adding proper command trigger calling via that doesn't seem to be too complex to me.

> If you trigger on events, then things are a lot more cut-and-dried. You fire the trigger when the event happens. If the event doesn't happen, you don't fire the trigger. If an implementation change causes objects to be created when they weren't before, or not created when they were before, then, yes, trigger firing behavior will change, but it will be clear that the new behavior is correct. Right now, future developers modifying this code have no reasonable basis for deciding whether or not a trigger ought to fire in a given situation. There's no consistent rule, no specification, and no documentation for how that's intended to happen.

This stuff definitely needs to be properly specified. And I am sorry that I didn't point this out earlier. I guess I just looked at too many iterations of this to still have the big picture.

> > Hm.
> > I agree that there is some work needed to make the whole handling more consistent. But I don't think it's as bad as you paint it.
> >
> > I don't want to brush off your review here - you definitely do raise valid points. I don't particularly like the current implementation very much. But I simply fail to see a less annoying way.
> >
> > It's also not that I am just defending a colleague - you might have noticed the @2ndquadrant - I got interested in this patch quite a bit before that was on the horizon.
> Yep, understood, and I'd have the same complaints about this patch that I do presently if it were submitted by someone who worked for EnterpriseDB, or some independent third party.

I didn't want to suggest that you do! I just wanted to make it explicit that it's not somebody asking me via external channels (except maybe Dim asking for reviews on #postgresql on irc) to do a round of review for some company reason.

> I do think it's a bit of a shame that more people didn't pay attention to and help improve the ObjectAccessHook machinery as we were working on that. I mean, Dimitri is not the first or last person to want to get control during DDL operations, and KaiGai's already done a lot of work figuring out how to make it work reasonably.
> Pre-create hooks don't exist in that machinery not because nobody wants them, but because it's hard.

I am not sure the machinery is a very good fit for pre-create hooks. That's no argument against the machinery, but against reusing it for this use-case.

> This whole problem is hard. It would be wrong to paint it as a problem that is unsolvable or not valuable, but it would be equally wrong to expect that it's easy or that anyone's first attempt (mine, yours, Dimitri's, KaiGai's, or Tom Lane's) is going to fall painlessly into place without anyone needing to sweat a little blood.

If I appeared a bit grumpy about your review, it's mostly because I would like to get the feature into 9.3, because it would have saved me many hours of pain in the past, and the review pushed it further away from that.

The review also didn't come in that early - but that's definitely not your fault; Tom and you did a lot of work on many patches, and unfortunately for us you can't scale limitlessly. You shouldn't need to, but reality seems to tell us otherwise :(

Andres

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 27, 2012 at 11:05 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>> I agree that it's not a very helpful decision, but I'm not the one who said we wanted command triggers rather than event triggers. :-)
>
> Color me unconvinced about event triggers. That's not answering my use cases.

Could we get a list of those use cases, maybe on a wiki page somewhere, and add to it all the use cases that KaiGai Kohei or others who are interested in this are aware of? Or maybe we can just discuss via email, but it's going to be hard to agree that we've got something that meets the requirements or doesn't if we're all imagining different sets of requirements. The main use cases I can think of are:

1. Replication. That is, after we perform a DDL operation, we call a trigger and tell it what we did, so that it can make a record of that information and ship it to another machine, where it will arrange to do the same thing on the remote side.

2. Redirection. That is, before we perform a DDL operation, we call a trigger and tell it what we've been asked to do, so that it can go execute the request elsewhere (e.g. the master rather than the Hot Standby slave).

3. Additional security or policy checks. That is, before we perform a DDL operation, we call a trigger and tell it what we've been asked to do, so that it can throw an error if it doesn't like our credentials or, say, the naming convention we've used for our column names. (A sketch of this one follows below.)

4. Relaxation of security checks. That is, we let a trigger get control at permissions-checking time and let it make the go-or-no-go decision in lieu of the usual permissions-checking code.

5. Auditing. Either when something is attempted (i.e. before) or after it happens, we log the attempt/event somewhere.

Anything else?

In that list of use cases, it seems to me that you want BEFORE and AFTER triggers to have somewhat different firing points. For the BEFORE cases, you really need the command trigger to fire early, and once. For example, if someone says "ALTER TABLE dimitri ADD COLUMN fontaine INTEGER, DROP COLUMN IF EXISTS haas", permissions on dimitri should really only get checked once, not once for each subcommand. That's the point at which you need to get control for #3 or #4, and it would be workable for #5 as well; I'm less sure about #2.

On the other hand, for the AFTER cases I've listed here, I think you really want to know what *actually* happened, not what somebody thought about doing. You want to know which tables, sequences, etc. *actually* got created or dropped, not the ones that the user mentioned. If the user mentioned a table but we didn't drop it (and we also didn't error out, because IF EXISTS is used), none of the AFTER cases really care; if we dropped other stuff (because of CASCADE), the AFTER cases may very well care.

Another thing to think about with respect to deciding on the correct firing points is that you can't easily fire the trigger after we've identified the object in question but before we've checked permissions on it, which otherwise seems like an awfully desirable thing to do for use cases 3, 4, and 5 from the above list, and maybe 2 as well. We don't want to try to take locks on objects that the current user doesn't have permission to access, because then a user with no permissions whatsoever on an object can interfere with access by authorized users.
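To make #3 concrete, here's the sort of thing I imagine people writing - using the CREATE COMMAND TRIGGER syntax from Dimitri's patch, and guessing at the argument list the trigger function would receive (that part is hypothetical):

    -- hypothetical calling convention: tag, object OID, schema, name
    create function enforce_naming
      (cmd_tag text, objectid oid, schemaname text, objectname text)
     returns void language plpgsql as $$
    begin
      -- reject table names that don't match the site convention
      if objectname !~ '^[a-z][a-z0-9_]*$' then
        raise exception 'name "%" violates naming policy', objectname;
      end if;
    end;
    $$;

    create command trigger naming_policy
      before create table execute procedure enforce_naming();

Whether that fires before or after the name lookup and permission checks is exactly the question at issue here.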
On the flip side, we can't reliably check permissions before we've locked the object, because somebody else might rename or drop it after we check permissions and before we get the lock. Noah Misch invented a clever technique that I then used to fix a bunch of these problems in 9.2: the fixed code (sadly, not all cases are fixed yet, due to the fact that we ran out of time in the development cycle) looks up the object name, checks permissions (erroring out if the check fails), and then locks the object. Once it gets the lock, it checks whether any shared-invalidation messages have been processed since the point just before we looked up the object name. If so, it redoes the name lookup. If the referent of the name has not changed, we're done; if it has, we drop the old lock and relock the new object and loop around again, not being content until we're sure that the object we locked is still the referent of the name. This leads to much more logical behavior than the old way of doing things, and not incidentally gets rid of a lot of errors of the form "cache lookup failed for relation %u" that users of existing releases will remember, probably not too fondly.

However, it's got serious implications for triggers that want to relax security policy, because the scope of what you can do inside that loop is pretty limited. You can't really do anything to the relation while you're checking permissions on it, because you haven't locked it yet. If you injected a trigger there, it would have to be similarly limited, and I don't know how we'd enforce that, and it would have to be prepared to get called multiple times for the same operation to handle the retry logic, which would be awkward for other cases, like auditing.

For cases where people just want to impose extra security or policy checks, we could possibly just punt and do them after the lock is taken, as Dimitri's patch does. That has the disadvantage that you can possibly obtain a lock on an object that you don't have permissions on, but maybe that's tolerable. To understand the issue here, think about LOCK TABLE foo. If the table isn't otherwise locked and you lack permissions, this will fail immediately. But in 9.1, it will try to AccessExclusiveLock the table first and then check permissions. As soon as it gets the lock, it will fail, but if other people are using the table, those will block the AccessExclusiveLock, but the pending AccessExclusiveLock will also block any new locks from being granted. So essentially all new operations against the table get blocked until all the existing transactions that have touched that table commit or roll back, even though the guy doing the LOCK TABLE has no privileges. Bummer.

However, I think we could possibly live with some limitations of this kind for *additional* checks imposed by command triggers; we'll still be protected against someone getting a table lock when they have no privileges at all. It's a lot more awkward when someone wants to *relax* privilege checks, though - I'm fairly confident that we *don't* want to go back to the pre-9.2 behavior of locking first and asking questions later.

> Try to build a command string from the catalogs… even if you can store a snapshot of them before and after the command. Remember that you might want to “replicate” to things that are NOT a PostgreSQL server.

For CREATE commands, I am sure this is doable, although there are also other ways. For ALTER commands, I agree that you need to pass detailed information about what is being altered and how.
But that drags you well away from the idea of a command trigger. At a minimum, you're going to want to know about each subcommand of ALTER TABLE. However, I think it's unrealistic to suppose we're going to get all of that straightened out in time for 9.2. KaiGai took a whole release cycle to get working support for just CREATE and DROP.

>> ambiguity. If you say that we're going to have a trigger on the CREATE SEQUENCE command, then what happens when the user creates a sequence via some other method? The current patch says that we should handle that by calling the CREATE SEQUENCE trigger if it happens to be convenient because we're going through the same code path that a normal CREATE SEQUENCE would have gone through, but if it uses a different code path then let's not bother. Otherwise, how do you
>
> Yes, the current set of which commands fire which triggers is explained by how the code is written wrt standard_ProcessUtility() calls. We could mark re-entrant calls and disable the command trigger feature; it would not be our first backend global variable in flight.

Agreed, although I think just adding an argument to ProcessUtility would be enough.

>> Dimitri is not the first or last person to want to get control during DDL operations, and KaiGai's already done a lot of work figuring out how to make it work reasonably. Pre-create hooks don't exist in that machinery not because nobody wants them, but because it's hard. This
>
> I've been talking with KaiGai about using the Command Trigger infrastructure to implement his control hooks, while reviewing one of his patches, and he said that's not low-level enough for him.

Right. That worries me. The infrastructure KaiGai is using is low-level in two senses. First, it requires you to write a C module, load it into the backend, and use _PG_init to frob global variables. That is, it's not user-friendly; you have to do very low-level things to get access to the feature. Second, it's fine-grained. A single command can potentially hit those hooks many times; each individual operation triggers a hook call, not the command as a whole. When KaiGai says that command triggers wouldn't work for his purposes, he's talking about the second issue, not the first one. It might be less efficient for him to define his hooks as C-callable functions, register them in pg_proc, and install them using CREATE COMMAND TRIGGER, but it would work. No problem. However, the firing points you've chosen would not be useful for his purposes. I'm beating a dead horse here, but I think that's because you've got the wrong firing points, not because his needs are somehow very different from everything that anyone else wants to do.

> Sweating over that feature is a good summary of a whole lot of my and some others' time lately.

Understood.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I still think the likeliest way towards that is defining some behaviour we want regarding
> * naming conflict handling
> * locking
>
> And then religiously make sure the patch adheres to that. That might need some refactoring of existing code (like putting a part of the utility.c handling of create table into its own function and such), but should be doable.
>
> I think BEFORE command triggers ideally should run
> * before permission checks
> * before locking
> * before internal checks are done (naming conflicts, type checks and such)

It is possible to do this, and it would actually be much easier and less invasive to implement than what Dimitri has done here, but the downside is that you won't have done the name lookup either. See my last email to Dimitri for a long explanation of why that restriction is not easily circumventable: name lookups, locking, and permission checks are necessarily and inextricably intertwined. Still, if others agree this is useful, I think it would be a lot easier to get right than what we have now.

> Obviously some things will be caught before that (parse analysis of some commands) and I think we won't be able to fully stop that, but it's not totally consistent now, and while doing some work along those lines in this patch seems sensible, the patch cannot redo everything in this respect.

Some of what we're now doing as part of parse analysis should really be reclassified. For example, the ProcessUtility branch that handles T_CreateStmt and T_CreateForeignTableStmt should really be split out as a separate function that lives in tablecmds.c, and I think at least some of what's in transformCreateStmt should be moved to tablecmds.c as well. The current arrangement makes it really hard to guarantee things like "the name gets looked up just once", which seems obviously desirable, since strange things will surely happen if the whole command doesn't agree on which OID it's operating on.

> Looking up oids and such before calling the trigger doesn't seem to be problematic from my pov. Command triggers are a superuser-only facility; any additional information they gain is no problem.
> Thinking about that, I think we should enforce that command trigger functions are security definer functions.

I don't see any benefit from that restriction.

>> I actually think that, to really meet all needs here, we may need the ability to get control in more than one place.
> Not sure what you mean by that. Before/after permission checks, before/after consistency checks? That sort of thing?

Yes. For example, above you proposed a kind of trigger that fires really early - basically after parsing but before everything else. What Dimitri has implemented is a much later trigger that is still before the meat of the command, but not before *everything*. And then there's an AFTER trigger, which could fire either after an individual piece or stage of the command, or after the whole command is complete, either of which seems potentially useful depending on the circumstances. I almost think that the BEFORE/AFTER name serves to confuse rather than to clarify, in this case. It's really a series of specific hook points: whenever we parse a command (but before we execute it), after security and sanity checks but before we begin executing the command, before or after various subcommands, after the whole command is done, and maybe a few others.
When we say BEFORE or AFTER, we implicitly assume that we want at most two of the things from that list, and I am not at all sure that's going to be enough.

One thing I had thought about doing, in the context of sepgsql, and we may yet do it, is create a hook that gets invoked whenever we need to decide whether it's OK to perform an action on an object. For example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook both "is it OK to add a foreign key to table X?" and also "is it OK to make a foreign key refer to table Y"? This doesn't fit into the command-trigger framework at all, but it's definitely useful for sepgsql, and I bet it's good for other things, too - maybe not that specific example, but that kind of thing.

>> I think that KaiGai only wants to be able to deny things that would normally be allowed, but I suspect some of those other folks would also like to be able to allow things that would normally be denied.
> Denying seems to be easier than allowing stuff safely...

Yes.

>> For those use cases, you want to get control at permissions-checking time. However, where Dimitri has the hooks right now, BEFORE triggers for ALTER commands fire after you've already looked up the object that you're manipulating. That has the advantage of allowing you to use the OID of the object, rather than its name, to make policy decisions; but by that time it's too late to cancel a denial-of-access by the first-line permissions checks.
> Why? Just throw an access denied exception? Unless it's done after the locking, that won't be visible by anything but timing?

I think you misread what I wrote. It's still possible to deny things, but it's too late to allow something that an earlier check has already denied. Right now, sepgsql actually does its denying very late, from the post-create hook. That's not ideal, of course, but it works. I know KaiGai would like it to happen earlier, to avoid wasting work.

> Additional granting is more complex though, I am definitely with you there. That will probably need INSTEAD triggers, which I don't see happening for now. Those will have their own share of problems.

Right.

>> I agree that it's not a very helpful decision, but I'm not the one who said we wanted command triggers rather than event triggers. :-)
> Well, but on the other hand, what will you do with a "pg_attribute row added" event? It doesn't even have an oid ;)

We refer to everything using ObjectAddress notation - the OID of the system catalog that contains objects of that type, the OID of the object itself, and a "sub-object ID" which is 0 in general but the column number in that specific case. What you can do is go fetch the pg_attribute row and then do whatever you like - log it, throw an error, etc. It's a hook that gets invoked when you add a column. If it's important to differentiate between a column that gets added at CREATE TABLE time and one that gets added by ALTER TABLE .. ADD COLUMN, that could be done. It's a lot easier to deal with triggers that fire in cases you don't care about than to deal with triggers that don't fire in cases you do care about. The former you can just ignore.
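For a column, that address is (pg_class, table OID, attribute number), and resolving it from SQL is a one-liner - the parameter values here are just placeholders for whatever the hook would hand you:

    -- look up the column a (classid = pg_class, objid, objsubid) address names
    prepare col_lookup(oid, int2) as
      select attname, format_type(atttypid, atttypmod) as coltype
        from pg_attribute
       where attrelid = $1   -- objid: the table's OID
         and attnum   = $2;  -- objsubid: the column number

    -- or, for a human-readable description of any ObjectAddress:
    select pg_describe_object('pg_class'::regclass, 'mytable'::regclass, 1);

(pg_describe_object() is already there in 9.1; 'mytable' is only an example object.)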
>>> Which would basically require including pg_dump in the backend to implement replication and logging. I don't think librarifying pg_dump is a fair burden at all.
>> I don't either, but that strikes me as a largely unrelated problem. As you may recall, I've complained that too much of that functionality is in the client in the past, and I haven't changed my opinion on that.
> I am not really happy with that either. But if we start putting all that into this patch we will never get anywhere.

Just to be clear, I wasn't proposing that.

> I think moving that nearer to the server in some form is a MAJOR project on its own.
> Also I don't see it solving the differential stuff at all.

True.

> But how would you do something like logging the originating statement with the object access hook machinery?

You can't, but you can do a lot of other useful stuff that command triggers as currently envisioned won't allow. I'm starting to think that maybe BEFORE triggers need to fire on commands and AFTER triggers on events? Can we define "start of command execution" as an event? I don't think we need two completely different facilities, but I really want to make sure we have a plan for capturing events. I can't imagine how replication is gonna work if we can't log exactly what we did, rather than just the raw query string. Statement-based replication is sometimes useful, especially in heterogeneous environments, but it's always kinda sucky, in that there are edge cases that break. If you want to know what schema the server is going to use for a newly created object before it creates it, you've got to do that calculation yourself, and now you've created a risk of getting a different answer. If you want to capture all the drops and replay them on the slave, you have to be able to handle CASCADE.

> Well, the problem is that with just catalog access it's just about impossible to generate differential changes. Also the catalogs between clusters *won't be the same* in a large part of the use-cases. OIDs will basically never match.

True. ALTER commands are going to need additional information. But some of that information also won't be available until quite late. For example, an ALTER on a parent might cascade down to children, and a command trigger might want to know exactly which tables were affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF; it's more like DURING, and at a very specific point DURING.

> Matching the intent, which is mostly visible in the parsetree, seems to be a whole lot more helpful.

But note that there are a lot of possible things you can't tell from the parse-tree without deducing them. Sure, for LISTEN, the parse tree is fine. But for ALTER TABLE or DROP the parse tree doesn't even tell you what object you're operating on (at least, not unless the user includes an explicit schema-qualification), let alone the whole list of objects the command is ultimately going to process due to cascading, inheritance, etc. You can reimplement that logic in your trigger but that's both fragile and laborious.

>>> Also I have a *very hard* time imagining how to sensibly implement replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object access hooks. There is just not enough context. How would you discern between an ADD COLUMN and a CREATE TABLE? They look very similar or even identical on a catalog level.
>> That can be fixed using the optional argument to InvokeObjectAccessHook, similar to what we've done for differentiating internal drops from other drops.
> Several of the points where those hooks are invoked don't even seem to have the necessary information for that... Doing that seems to involve passing heaps of additional state through the backend.
I've been trying pretty hard to avoid that, but I agree that there are possible stumbling blocks there. I don't think it's an insurmountable problem, though. We just need to start with a clean definition of what we want the behavior to be; then, we may need to refactor somewhat to get there. The problem with the current code is that it's just accepting whatever fits nicely into the current code as a specification-in-fact, and I don't think that's a good basis for a feature on which people will want to build complex and robust systems.

> If I appeared a bit grumpy about your review, it's mostly because I would like to get the feature into 9.3, because it would have saved me many hours of pain in the past, and the review pushed it further away from that.
> The review also didn't come in that early - but that's definitely not your fault; Tom and you did a lot of work on many patches, and unfortunately for us you can't scale limitlessly. You shouldn't need to, but reality seems to tell us otherwise :(

Yeah. I probably should have looked at this sooner, but (1) I was busy, (2) it seemed like a lot of active development was happening, and (3) Thom was reviewing away. And I hate reviewing code when there are new versions coming out every day, because obviously there's no point in my doing final pre-commit hacking when that's happening. I did do some big-picture review that resulted in some significant scope-trimming early in the CommitFest, but I wish now that I'd dived into it a bit more. I find it rather regrettable that there were so few people doing review this CommitFest. I spent a lot of time on things that could have been done by other people, and the result is that some things like this have gone rather late.

All that having been said, if I had my druthers, we would have bounced this patch out of the CommitFest on day 1, because it has long been and continues to be my belief that getting a major feature done in one CommitFest is unrealistic, unless we're willing to have that CommitFest take as long as 2 or 3 CommitFests, which I dislike. The last CommitFest is hard enough without last-minute major feature submissions; and you'll note that the latest any feature of mine has ever been committed is late January, and not because I ignore everyone else to work on my own stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On Tuesday, March 27, 2012 07:34:46 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I still think the likeliest way towards that is defining some behaviour we want regarding
> > * naming conflict handling
> > * locking
> >
> > And then religiously make sure the patch adheres to that. That might need some refactoring of existing code (like putting a part of the utility.c handling of create table into its own function and such), but should be doable.
> >
> > I think BEFORE command triggers ideally should run
> > * before permission checks
> > * before locking
> > * before internal checks are done (naming conflicts, type checks and such)
> It is possible to do this, and it would actually be much easier and less invasive to implement than what Dimitri has done here, but the downside is that you won't have done the name lookup either. See my last email to Dimitri for a long explanation of why that restriction is not easily circumventable: name lookups, locking, and permission checks are necessarily and inextricably intertwined.

Read your other mail. Coming back to it, I think the above might be too restrictive to be useful for a big part of the use-cases. So your idea of more hook points below has some merit.

> Still, if others agree this is useful, I think it would be a lot easier to get right than what we have now.

I think it's pretty important to have something that's usable rather easily, which requires names to be resolved - and thus permission checks already performed and (some) locks already acquired. I think quite a few of the possible usages need the facility to be as simple as possible and won't care about already-acquired locks/names.

> Some of what we're now doing as part of parse analysis should really be reclassified. For example, the ProcessUtility branch that handles T_CreateStmt and T_CreateForeignTableStmt should really be split out as a separate function that lives in tablecmds.c, and I think at least some of what's in transformCreateStmt should be moved to tablecmds.c as well. The current arrangement makes it really hard to guarantee things like "the name gets looked up just once", which seems obviously desirable, since strange things will surely happen if the whole command doesn't agree on which OID it's operating on.

Yes, I agree, most of that should go from utility.c.

> > Looking up oids and such before calling the trigger doesn't seem to be problematic from my pov. Command triggers are a superuser-only facility; any additional information they gain is no problem.
> > Thinking about that, I think we should enforce that command trigger functions are security definer functions.
> I don't see any benefit from that restriction.

The reason I was thinking it might be a good idea is that they might get more knowledge passed to them than the user could get at directly otherwise. Especially if we extend the scheme to more places/information.

> >> I actually think that, to really meet all needs here, we may need the ability to get control in more than one place.
> > Not sure what you mean by that. Before/after permission checks, before/after consistency checks? That sort of thing?
> Yes. For example, above you proposed a kind of trigger that fires really early - basically after parsing but before everything else. What Dimitri has implemented is a much later trigger that is still before the meat of the command, but not before *everything*.
> And then there's an AFTER trigger, which could fire either after an individual piece or stage of the command, or after the whole command is complete, either of which seems potentially useful depending on the circumstances. I almost think that the BEFORE/AFTER name serves to confuse rather than to clarify, in this case. It's really a series of specific hook points: whenever we parse a command (but before we execute it), after security and sanity checks but before we begin executing the command, before or after various subcommands, after the whole command is done, and maybe a few others. When we say BEFORE or AFTER, we implicitly assume that we want at most two of the things from that list, and I am not at all sure that's going to be enough.

You might have a point there. Not sure if the complexity of explaining all the different hook points is worth the pain. What are the phases we have:

* after parse
  * logging
* after resolving names
* after checking permissions (intertwined with the former)
  * override permission check? INSTEAD?
* after locking (intertwined with the former)
  * except that it needs to be connected to resolving the name/permission check, this doesn't seem to be an attractive hook point
* after validation
  * additional validation likely wants to hook here
* after execution
  * replication might want to hook here (rough sketch below)

Am I missing an interesting phase? Allowing all that will probably require a substantial refactoring of commands/ and catalog/.

> One thing I had thought about doing, in the context of sepgsql, and we may yet do it, is create a hook that gets invoked whenever we need to decide whether it's OK to perform an action on an object. For example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook both "is it OK to add a foreign key to table X?" and also "is it OK to make a foreign key refer to table Y"? This doesn't fit into the command-trigger framework at all, but it's definitely useful for sepgsql, and I bet it's good for other things, too - maybe not that specific example, but that kind of thing.

It's a neat feature, yes. Seems to be orthogonal, yes.

> >> I agree that it's not a very helpful decision, but I'm not the one who said we wanted command triggers rather than event triggers. :-)
> > Well, but on the other hand, what will you do with a "pg_attribute row added" event? It doesn't even have an oid ;)
> We refer to everything using ObjectAddress notation - the OID of the system catalog that contains objects of that type, the OID of the object itself, and a "sub-object ID" which is 0 in general but the column number in that specific case. What you can do is go fetch the pg_attribute row and then do whatever you like - log it, throw an error, etc. It's a hook that gets invoked when you add a column. If it's important to differentiate between a column that gets added at CREATE TABLE time and one that gets added by ALTER TABLE .. ADD COLUMN, that could be done. It's a lot easier to deal with triggers that fire in cases you don't care about than to deal with triggers that don't fire in cases you do care about. The former you can just ignore.

I don't believe that distinction can be done that easily without more impact than the CT patch. I think you need a surprisingly high amount of context to know when to ignore a trigger. Especially as it's not exactly easy to transfer knowledge from one trigger invocation to the next.
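To flesh out the "after execution" phase from the list above: the kind of thing I want to end up being able to write is roughly this, in the patch's syntax - and note the function's argument list is my guess at a calling convention that is still being discussed:

    create table ddl_audit (
      logged_at  timestamptz not null default now(),
      username   text not null default current_user,
      cmd_tag    text,
      schemaname text,
      objectname text
    );

    -- hypothetical calling convention: tag, object OID, schema, name
    create function log_ddl
      (cmd_tag text, objectid oid, schemaname text, objectname text)
     returns void language plpgsql as $$
    begin
      insert into ddl_audit(cmd_tag, schemaname, objectname)
      values (cmd_tag, schemaname, objectname);
    end;
    $$;

    create command trigger log_any_command
      after any command execute procedure log_ddl();

A replication daemon could then tail ddl_audit - but, as argued above, for anything beyond the trivial cases it needs the parse tree, not just the object names.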
> > But how would you do something like logging the originating statement with the object access hook machinery?
> You can't, but you can do a lot of other useful stuff that command triggers as currently envisioned won't allow. I'm starting to think that maybe BEFORE triggers need to fire on commands and AFTER triggers on events? Can we define "start of command execution" as an event? I don't think we need two completely different facilities, but I really want to make sure we have a plan for capturing events. I can't imagine how replication is gonna work if we can't log exactly what we did, rather than just the raw query string. Statement-based replication is sometimes useful, especially in heterogeneous environments, but it's always kinda sucky, in that there are edge cases that break.

I don't think creating *new* DDL from the parsetree can really count as statement-based replication. And again, I don't think implementing that will cost that much effort.

How would it help us to know - as individual events! - which tuples were created where? Reassembling that will be a huge mess. I basically fail to see *any* use case besides access checking.

> If you want to know what schema the server is going to use for a newly created object before it creates it, you've got to do that calculation yourself, and now you've created a risk of getting a different answer. If you want to capture all the drops and replay them on the slave, you have to be able to handle CASCADE.

I would like to have CASCADE handled, yes.

> > Well, the problem is that with just catalog access it's just about impossible to generate differential changes. Also the catalogs between clusters *won't be the same* in a large part of the use-cases. OIDs will basically never match.
> True. ALTER commands are going to need additional information. But some of that information also won't be available until quite late. For example, an ALTER on a parent might cascade down to children, and a command trigger might want to know exactly which tables were affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF; it's more like DURING, and at a very specific point DURING.

Well, *I* would like a separate command trigger to be fired for sub-actions, but to see them qualified as such...

> > Matching the intent, which is mostly visible in the parsetree, seems to be a whole lot more helpful.
> But note that there are a lot of possible things you can't tell from the parse-tree without deducing them. Sure, for LISTEN, the parse tree is fine. But for ALTER TABLE or DROP the parse tree doesn't even tell you what object you're operating on (at least, not unless the user includes an explicit schema-qualification), let alone the whole list of objects the command is ultimately going to process due to cascading, inheritance, etc. You can reimplement that logic in your trigger but that's both fragile and laborious.

Resolving objects to the schema-qualified variant isn't that much of a problem (see below). Inheritance is, I definitely can see that. For lots of simple use-cases even something that has some restrictions attached would be *WAY* better than the current situation, where all that is not possible at all.

I fear a bit that this discussion is leading to something that's never going to materialize because it would require a huge amount of work to get there.
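On the schema-qualification point: for an unqualified name, the target schema is computable from the active search path - a deliberately naive sketch that ignores corner cases like temp schemas and missing CREATE rights:

    -- the first existing schema in the effective search_path is where
    -- an unqualified CREATE will put the object
    select (current_schemas(false))[1] as creation_target_schema;

The point is just that a trigger *can* compute this; whether it always agrees with what the backend would do is exactly the "risk of getting a different answer" Robert mentions above.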
> >> That can be fixed using the optional argument to InvokeObjectAccessHook, similar to what we've done for differentiating internal drops from other drops.
> > Several of the points where those hooks are invoked don't even seem to have the necessary information for that... Doing that seems to involve passing heaps of additional state through the backend.
> I've been trying pretty hard to avoid that, but I agree that there are possible stumbling blocks there. I don't think it's an insurmountable problem, though.
> We just need to start with a clean definition of what we want the behavior to be; then, we may need to refactor somewhat to get there. The problem with the current code is that it's just accepting whatever fits nicely into the current code as a specification-in-fact, and I don't think that's a good basis for a feature on which people will want to build complex and robust systems.

I agree that we need consistency there.

> I find it rather regrettable that there were so few people doing review this CommitFest. I spent a lot of time on things that could have been done by other people, and the result is that some things like this have gone rather late.

I find that sad too, even though I am part of the crowd that didn't do enough. Thanks for the work!

Andres

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 27, 2012 at 4:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > Looking up oids and such before calling the trigger doesn't seem to be problematic from my pov. Command triggers are a superuser-only facility; any additional information they gain is no problem.
>> > Thinking about that, I think we should enforce that command trigger functions are security definer functions.
>> I don't see any benefit from that restriction.
> The reason I was thinking it might be a good idea is that they might get more knowledge passed to them than the user could get at directly otherwise. Especially if we extend the scheme to more places/information.

As long as the superuser gets to decide whether or not a given function is installed as a command trigger, there's no harm in allowing him to make it either SECURITY INVOKER or SECURITY DEFINER. I agree that making it SECURITY DEFINER will often be useful and appropriate; I just see no reason to enforce that restriction programmatically.

> What are the phases we have:
>
> * after parse
>   * logging
> * after resolving names
> * after checking permissions (intertwined with the former)
>   * override permission check? INSTEAD?
> * after locking (intertwined with the former)
>   * except that it needs to be connected to resolving the name/permission check, this doesn't seem to be an attractive hook point
> * after validation
>   * additional validation likely wants to hook here
> * after execution
>   * replication might want to hook here
>
> Am I missing an interesting phase?

I'd sort of categorize it like this:

- pre-parse
- post-parse
- at permissions-checking time
- post permissions-checking/name-lookup, before starting the main work of the command, i.e. pre-execution
- "event" type triggers that happen when specific actions are performed (e.g. CREATE, DROP, etc.) or as subcommands fire (e.g. in ALTER TABLE, we could fire an alter-table-subcommand trigger every time we execute a subcommand)
- post-execution

> Allowing all that will probably require a substantial refactoring of commands/ and catalog/.

Probably. But we don't need to allow it all at once. What we need to do is pick one or two things that are relatively easily done, implement those first, and then build on it. Pre-parse, post-parse, CREATE-event, DROP-event, and post-execution hooks are all pretty easy to implement without major refactoring, I think. OTOH, post-permissions-checking-pre-execution is going to be hard to implement without refactoring, and ALTER-event hooks are going to be hard, too.

> I think you need a surprisingly high amount of context to know when to ignore a trigger. Especially as it's not exactly easy to transfer knowledge from one trigger invocation to the next.

I'm not convinced, but maybe it would be easier to resolve this in the context of a specific proposal.

> I don't think creating *new* DDL from the parsetree can really count as statement-based replication. And again, I don't think implementing that will cost that much effort.
> How would it help us to know - as individual events! - which tuples were created where? Reassembling that will be a huge mess. I basically fail to see *any* use case besides access checking.

I think if you'd said this to me two years ago, I would have believed you, but poking through this code in the last year or two, I've come to the conclusion that there are a lot of subtle things that get worked out after parse time, during execution.
Aside from things like recursing down the inheritance hierarchy and maybe creating some indexes or sequences when creating a table, there are also subtle things like working out exactly what index we're creating: name, access method, operator class, collation, etc. And I'm pretty sure that if we examine the code carefully we'll find there are a bunch more.

> I fear a bit that this discussion is leading to something that's never going to materialize because it would require a huge amount of work to get there.

Again, the way to avoid that is to tackle the simple cases first and then work on the harder cases after that, but I don't think that's what the current patch is doing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
>> I think BEFORE command triggers ideally should run
>> * before permission checks
>> * before locking
>> * before internal checks are done (naming conflicts, type checks and such)
>
> It is possible to do this, and it would actually be much easier and less invasive to implement than what Dimitri has done here, but the downside is that you won't have done the name lookup either.

There's a trade-off decision to take here, and it was taken differently in previous versions of the patch. You can either run the trigger very early or have the specific information details.

The way to have both and keep your sanity - and that is what was implemented in the patch - is to have ANY command triggers run before the big switch in standard_ProcessUtility() and provide only the command tag and parse tree, and to have the specific triggers called after permission, locking, and internal checks are done.

I've been asked to have a single call site for ANY and specific triggers, which means you can't have BEFORE triggers running either before or after those steps.

Now I can see why implementing that on top of the ANY command feature is surprising enough to want to not do it this way. Maybe the answer is to use another keyword to be able to register command triggers that run that early, without any specific information other than the command tag.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support
On Wed, Mar 28, 2012 at 4:12 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think BEFORE command triggers ideally should run
>>> * before permission checks
>>> * before locking
>>> * before internal checks are done (naming conflicts, type checks and such)
>>
>> It is possible to do this, and it would actually be much easier and less invasive to implement than what Dimitri has done here, but the downside is that you won't have done the name lookup either.
>
> There's a trade-off decision to take here, and it was taken differently in previous versions of the patch. You can either run the trigger very early or have the specific information details.
>
> The way to have both and keep your sanity - and that is what was implemented in the patch - is to have ANY command triggers run before the big switch in standard_ProcessUtility() and provide only the command tag and parse tree, and to have the specific triggers called after permission, locking, and internal checks are done.
>
> I've been asked to have a single call site for ANY and specific triggers, which means you can't have BEFORE triggers running either before or after those steps.
>
> Now I can see why implementing that on top of the ANY command feature is surprising enough to want to not do it this way. Maybe the answer is to use another keyword to be able to register command triggers that run that early, without any specific information other than the command tag.

Yeah, I think so. I objected to the way you had it because of the inconsistency, not because I think it's a useless place to fire triggers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 28, 2012 at 7:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Now I can see why implementing that on top of the ANY command feature is surprising enough to want to not do it this way. Maybe the answer is to use another keyword to be able to register command triggers that run that early, without any specific information other than the command tag.
>
> Yeah, I think so. I objected to the way you had it because of the inconsistency, not because I think it's a useless place to fire triggers.

Further thought: I think maybe we shouldn't use keywords at all for this, and instead use descriptive strings like post-parse or pre-execution or command-start, because I bet in the end we're going to end up with a bunch of them (create-schema-subcommand-start? alter-table-subcommand-start?), and it would be nice not to hassle with the grammar every time we want to add a new one. To make this work with the grammar, we could either (1) enforce that each is exactly one word or (2) require them to be quoted or (3) require those that are not a single word to be quoted. I tend to think #2 might be the best option in this case, but...

I've also had another general thought about safety: we need to make sure that we're only firing triggers from places where it's safe for user code to perform arbitrary actions. In particular, we have to assume that any triggers we invoke will AcceptInvalidationMessages() and CommandCounterIncrement(); and we probably can't do it at all (or at least not without some additional safeguard) in places where the code does CheckTableNotInUse(), up through the point where it's once again safe to access the table, since the trigger might try to do just that. We also need to think about re-entrancy: that is, in theory, the trigger might execute any other DDL command - e.g. it might drop the table that we've been asked to rename. I suspect that some of the current BEFORE trigger stuff might fall down on that last one, since the existing code not-unreasonably expects that once it's locked the table, the catalog entries can't go away. Maybe it all happens to work out OK, but I don't think we can count on that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> Further thought: I think maybe we shouldn't use keywords at all for this, and instead use descriptive strings like post-parse or pre-execution or command-start, because I bet in the end we're going to end up with a bunch of them (create-schema-subcommand-start? alter-table-subcommand-start?), and it would be nice not to hassle with the grammar every time we want to add a new one. To make this work with the grammar, we could either (1) enforce that each is exactly one word or (2) require them to be quoted or (3) require those that are not a single word to be quoted. I tend to think #2 might be the best option in this case, but...

What about:

  create command trigger foo before prepare alter table …
  create command trigger foo before start of alter table …
  create command trigger foo before execute alter table …
  create command trigger foo before end of alter table …

  create command trigger foo when prepare alter table …
  create command trigger foo when start alter table …
  create command trigger foo when execute of alter table …
  create command trigger foo when end of alter table …

  create command trigger foo preceding alter table …
  create command trigger foo preceding alter table … deferred
  create command trigger foo preceding alter table … immediate

  create command trigger foo following parser of alter table …
  create command trigger foo preceding execute of alter table …

  create command trigger foo following alter table …

  create command trigger foo before alter table nowait …

I'm not sure how many hooks we can really export here, but I guess we have enough existing keywords to describe when they get to run (parser, mapping, lock, check, begin, analyze, access, disable, not exists, do, end, escape, extract, fetch, following, search…)

> I've also had another general thought about safety: we need to make sure that we're only firing triggers from places where it's safe for user code to perform arbitrary actions. In particular, we have to assume that any triggers we invoke will AcceptInvalidationMessages() and CommandCounterIncrement(); and we probably can't do it at all (or at least not without some additional safeguard) in places where the code does CheckTableNotInUse(), up through the point where it's once again safe to access the table, since the trigger might try to do just that. We also

I've been trying to do that already.

> need to think about re-entrancy: that is, in theory, the trigger might execute any other DDL command - e.g. it might drop the table that we've been asked to rename. I suspect that some of the current BEFORE

That's why I implemented ALTER COMMAND TRIGGER ... SET DISABLE in the first place, so that you could run the same command from the trigger itself without infinite recursion.

> trigger stuff might fall down on that last one, since the existing code not-unreasonably expects that once it's locked the table, the catalog entries can't go away. Maybe it all happens to work out OK, but I don't think we can count on that.

I didn't think of a DROP TABLE in a command trigger running BEFORE ALTER TABLE, say. That looks evil. It could be documented as yet another way to shoot yourself in the foot though?

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support
On Wed, Mar 28, 2012 at 3:41 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> What about:
>
>  create command trigger foo before prepare alter table …
>  create command trigger foo before start of alter table …
>  create command trigger foo before execute alter table …
>  create command trigger foo before end of alter table …
>
>  create command trigger foo when prepare alter table …
>  create command trigger foo when start alter table …
>  create command trigger foo when execute of alter table …
>  create command trigger foo when end of alter table …
>
>  create command trigger foo preceding alter table …
>  create command trigger foo preceding alter table … deferred
>  create command trigger foo preceding alter table … immediate
>
>  create command trigger foo following parser of alter table …
>  create command trigger foo preceding execute of alter table …
>
>  create command trigger foo following alter table …
>
>  create command trigger foo before alter table nowait …
>
> I'm not sure how many hooks we can really export here, but I guess we have enough existing keywords to describe when they get to run (parser, mapping, lock, check, begin, analyze, access, disable, not exists, do, end, escape, extract, fetch, following, search…)

I am sure that we could find a way to beat this with a stick until it behaves, but I don't really like that idea. It seems to me to be a case of adding useless parser bloat. Prior to 9.0, when we introduced the new EXPLAIN syntax, new EXPLAIN options were repeatedly proposed and partly rejected on the grounds that they weren't important enough to justify adding new keywords. We've now got EXPLAIN (COSTS, BUFFERS, TIMING, FORMAT) and there will probably be more in the future, and the parser overhead of adding a new one is zero. I think we should learn from that lesson: when you may want to have a bunch of different options and it's not too clear what they're all going to be named, designing things in a way that avoids a dependency on the main parser having the right keywords leads to less patch rejection and more getting stuff done.

>> I've also had another general thought about safety: we need to make sure that we're only firing triggers from places where it's safe for user code to perform arbitrary actions. In particular, we have to assume that any triggers we invoke will AcceptInvalidationMessages() and CommandCounterIncrement(); and we probably can't do it at all (or at least not without some additional safeguard) in places where the code does CheckTableNotInUse(), up through the point where it's once again safe to access the table, since the trigger might try to do just that. We also
>
> I've been trying to do that already.
>
>> need to think about re-entrancy: that is, in theory, the trigger might execute any other DDL command - e.g. it might drop the table that we've been asked to rename. I suspect that some of the current BEFORE
>
> That's why I implemented ALTER COMMAND TRIGGER ... SET DISABLE in the first place, so that you could run the same command from the trigger itself without infinite recursion.
>
>> trigger stuff might fall down on that last one, since the existing code not-unreasonably expects that once it's locked the table, the catalog entries can't go away. Maybe it all happens to work out OK, but I don't think we can count on that.
>
> I didn't think of a DROP TABLE in a command trigger running BEFORE ALTER TABLE, say. That looks evil. It could be documented as yet another way to shoot yourself in the foot though?
Well, it depends somewhat on how it fails. If it fails by crashing the server, for example, I don't think that's going to fly. My suspicion is that it won't do that, but what it might do is fail in some pretty odd and unpredictable ways, possibly leading to catalog corruption, which I don't feel too good about either. Think about not just ALTER vs. DROP but also ALTER vs. ALTER. It's probably easier to add guards against this kind of thing than it is to prove that it's not going to do anything too wacky; the obvious idea that comes to mind is to bump the command counter after returning from the last trigger (if that doesn't happen already) and then verify that the tuple is still present and has whatever other properties we've already checked and are counting on, and if not chuck an error. I think that for a first version of this feature it probably would be smart to trim this back to something that doesn't involve surgery on the guts of every command; then, we can extend incrementally. Nothing you've proposed seems impossible to me, but most of the really interesting things are hard, and it would be much easier to handle patches intended to cater to more complex use cases if the basic infrastructure were already committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
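Robert's EXPLAIN-options analogy suggests a concrete shape for this. A minimal C sketch, with invented enum and function names (nothing from the actual patch): if event names arrive as plain strings, each new firing point is a new strcmp branch rather than a new keyword in gram.y, which is exactly why the EXPLAIN (COSTS, BUFFERS, ...) options carry zero parser overhead.

    /* Hypothetical firing points; purely illustrative names. */
    typedef enum CommandTriggerTiming
    {
        CMD_TRIGGER_FIRED_AT_COMMAND_START,
        CMD_TRIGGER_FIRED_AT_COMMAND_END
    } CommandTriggerTiming;

    /*
     * Resolve an event name supplied as a string literal, in the same
     * spirit as ExplainQuery()'s option handling: a lookup done at DDL
     * time instead of a keyword wired into the grammar.
     */
    static CommandTriggerTiming
    parse_event_name(const char *name)
    {
        if (strcmp(name, "command_start") == 0)
            return CMD_TRIGGER_FIRED_AT_COMMAND_START;
        if (strcmp(name, "command_end") == 0)
            return CMD_TRIGGER_FIRED_AT_COMMAND_END;

        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("unrecognized event name \"%s\"", name)));
        return CMD_TRIGGER_FIRED_AT_COMMAND_START;  /* keep compiler quiet */
    }

The design trade-off is the one the thread keeps circling: a string lookup rejects bad names at DDL time rather than parse time, in exchange for never having to touch the grammar again.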
Robert Haas <robertmhaas@gmail.com> writes: > I am sure that we could find a way to beat this with a stick until it > behaves, but I don't really like that idea. It seems to me to be a [...] > we should learn from that lesson:

I first wanted to make sure that reusing existing parser keywords wouldn't be good enough for our case, as I find that solution more elegant. If we can't anticipate all the future places where we would want to add support for calling command triggers, then yes, you're right, something more flexible is needed.

I'll go make that happen, and still need input here. We first want to have command triggers on specific commands or ANY command, and we want to implement 3 places from where to fire them.

Here's a new syntax proposal to cope with that:

  create command trigger before COMMAND_STEP of alter table
         execute procedure snitch();

- before the process utility switch, with only command tag and parse tree

  create command trigger foo before start of alter table
         execute procedure snitch();

- before running the command, after having done basic error checks, security checks, name lookups and locking, with all information

  create command trigger before execute of alter table
         execute procedure snitch();

- after having run the command

  create command trigger foo before end of alter table
         execute procedure snitch();

Some other places make sense to add support for, but working within the deadlines we have and with the current patch, those 3 points are what's easy enough to implement now.

>> I didn't think of DROP TABLE in a command trigger running BEFORE ALTER >> TABLE, say. That looks evil. It could be documented as yet another way >> to shoot yourself in the foot though?
>
> I think that for a first version of this feature it probably would be > smart to trim this back to something that doesn't involve surgery on > the guts of every command; then, we can extend incrementally.

A simple enough solution here would be to register a list of disabled commands per object id before running the user defined function, and check against that list before allowing it to run. As we have command trigger support for all the commands we want to be able to block, the lookup and ereport() call can be implemented in InitCommandContext() where we already apply some limitations (like no command triggers on command trigger commands).

Once that is in place it's possible to refine the pre-condition checks on a per command basis and stop adding the command to the blacklist.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 29 March 2012 13:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > I'll go make that happen, and still need input here. We first want to > have command triggers on specific commands or ANY command, and we want > to implement 3 places from where to fire them. > > Here's a new syntax proposal to cope with that: > > create command trigger before COMMAND_STEP of alter table > execute procedure snitch(); > > - before the process utility switch, with only command tag and parse > tree > > create command trigger foo before start of alter table > execute procedure snitch(); > > - before running the command, after having done basic error checks, > security checks, name lookups and locking, with all information > > create command trigger before execute of alter table > execute procedure snitch(); > > - after having run the command > > create command trigger foo before end of alter table > execute procedure snitch(); Is it necessary to add this complexity in this version? Can't we keep it simple but in a way that allows the addition of this later? The testing of all these new combinations sounds like a lot of work. -- Thom
On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown <thom@linux.com> wrote: > On 29 March 2012 13:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> I'll go make that happen, and still need input here. We first want to >> have command triggers on specific commands or ANY command, and we want >> to implement 3 places from where to fire them. >> >> Here's a new syntax proposal to cope with that: >> >> create command trigger before COMMAND_STEP of alter table >> execute procedure snitch(); >> >> - before the process utility switch, with only command tag and parse >> tree >> >> create command trigger foo before start of alter table >> execute procedure snitch(); >> >> - before running the command, after having done basic error checks, >> security checks, name lookups and locking, with all information >> >> create command trigger before execute of alter table >> execute procedure snitch(); >> >> - after having run the command >> >> create command trigger foo before end of alter table >> execute procedure snitch(); > > Is it necessary to add this complexity in this version? Can't we keep > it simple but in a way that allows the addition of this later? The > testing of all these new combinations sounds like a lot of work. I concur. This is way more complicated than we should be trying to do in version 1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 29, 2012 at 8:30 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> I'll go make that happen, and still need input here. We first want to
> have command triggers on specific commands or ANY command, and we want
> to implement 3 places from where to fire them.
>
> Here's a new syntax proposal to cope with that:
>
> create command trigger before COMMAND_STEP of alter table
> execute procedure snitch();
>
> - before the process utility switch, with only command tag and parse
> tree
>
> create command trigger foo before start of alter table
> execute procedure snitch();
>
> - before running the command, after having done basic error checks,
> security checks, name lookups and locking, with all information
>
> create command trigger before execute of alter table
> execute procedure snitch();
>
> - after having run the command
>
> create command trigger foo before end of alter table
> execute procedure snitch();

One thought is that it might be better to say AT or ON or WHEN rather than BEFORE, since BEFORE END is just a little strange; and also because a future hook point might be something like "permissions-checking", and we really want it to be AT permissions-checking time, not BEFORE permissions-checking time.

If I got to pick, I'd pick this syntax:

  CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
      EXECUTE PROCEDURE function_name(args);

Then you could eventually imagine:

  CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start
      EXECUTE PROCEDURE throw_an_error();
  CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table)
      EXECUTE PROCEDURE throw_an_error();
  CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table)
      EXECUTE PROCEDURE record_table_drop();
  CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension)
      EXECUTE PROCEDURE make_heroku_happy();
  CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table)
      EXECUTE PROCEDURE record_alter_table_subcommand();

Piece by piece:

- I prefer EVENT to COMMAND, because the start or end or any other point during the execution of a command can be viewed as an event; however, it is pretty clear that not everything that can be viewed as an event upon which we might like triggers to fire can be viewed as a command, even if we have a somewhat well-defined notion of subcommands. I continue to think that there is no sense in which creating a sequence to back a serial column or dropping an object due to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an event. Anything we want can be an event. I think if we don't get this right in the first version we're going to be stuck with a misnomer forever. :-(

- It is pretty clear that for triggers that are supposed to fire at the start or end of a command, it's useful to be able to specify which commands it fires for. However, there must be other types of events (as in my postulated sql_drop case above) where the name of the command is not relevant - people aren't going to find all the objects that got dropped as a result of a toplevel drop table command; they're going to want to find all the tables that got dropped regardless of which incantation did it. Also keep in mind that you can do things like use ALTER TABLE to rename a view (and there are other cases of that sort of confusion with domains vs. types, aggregates vs. functions, etc), so being able to filter based on object type seems clearly better in a bunch of real-world cases.
And, even if you don't believe that specific example, a general syntax leaves us with freedom to maneuver if we discover other needs in the future. Writing alter_table etc. as tokens rather than as ALTER TABLE also has the further advantages of (1) greatly decreasing the parser footprint of this feature and (2) relieving everyone who adds commands in the future of the need to also change the command-trigger grammar. - I don't think that triggers at what Dimitri is calling "before execute" are a good idea at all for the first version of this feature, because there is a lot of stuff that might break and examining that should really be a separate project from getting the basic infrastructure in place. However, I also don't like the name. Unlike queries, DDL commands don't have clearly separate planning and execution phases, so saying that something happens before execution isn't really very clear. As previously discussed, the hook points in the latest patch are not all entirely consistent between one command and the next, not only in the larger ways I've already pointed out but also in some smaller ways. Different commands do different kinds of integrity checks and the firing points are not always in exactly the same place in that sequence. Of course, not all the commands do all the checks in the same order, so some possibly-not-trivial refactoring is probably required to get that consistent. Moreover, it doesn't seem out of reach to think that in the future we might draw a more clear line again between pre-execution and execution proper, to solve problems like the tendency of the current code to look up the same name more than once and pray that it gets the same OID back every time. So I think that when we get to the point of sticking hooks in at this location, we should call them something much more specific than before-execute hooks. I'm not sure exactly what. Maybe something like after_ddl_name_lookup or something like that. The DDL name lookup might move slightly earlier or later as a result of refactoring, but it can't disappear and it can't cease to involve locking and permissions checking. The set of things that you can reasonably do there (additional security checks, auditing) can't really change much either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown <thom@linux.com> wrote: >> On 29 March 2012 13:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >>> I'll go make that happen, and still need input here. We first want to >>> have command triggers on specific commands or ANY command, and we want >>> to implement 3 places from where to fire them. >>> >>> Here's a new syntax proposal to cope with that: >> >> Is it necessary to add this complexity in this version? Can't we keep >> it simple but in a way that allows the addition of this later? The >> testing of all these new combinations sounds like a lot of work. > > I concur. This is way more complicated than we should be trying to do > in version 1. I'm at a loss here. This proposal was so that we can reach a commonly agreed minimal solution and design in first version. There's no new piece of infrastructure to add, the syntax is changed only to open the road for later, I'm not changing where the current command triggers are to be called (except for those which are misplaced). So, please help me here: what do we want to have in 9.3? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 29 March 2012 16:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown <thom@linux.com> wrote: >>> On 29 March 2012 13:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >>>> I'll go make that happen, and still need input here. We first want to >>>> have command triggers on specific commands or ANY command, and we want >>>> to implement 3 places from where to fire them. >>>> >>>> Here's a new syntax proposal to cope with that: >>> >>> Is it necessary to add this complexity in this version? Can't we keep >>> it simple but in a way that allows the addition of this later? The >>> testing of all these new combinations sounds like a lot of work. >> >> I concur. This is way more complicated than we should be trying to do >> in version 1. > > I'm at a loss here. This proposal was so that we can reach a commonly > agreed minimal solution and design in first version. There's no new > piece of infrastructure to add, the syntax is changed only to open the > road for later, I'm not changing where the current command triggers are > to be called (except for those which are misplaced). > > So, please help me here: what do we want to have in 9.3?

Perhaps I misunderstood. It was the addition of the fine-grained event options (parse, execute etc.) I saw as new. If you're saying those are just possible options for later that won't be in this version, I'm fine with that. If those are to make it for 9.2, then creating the necessary test cases and possible fixes sounds infeasible in such a short space of time. Please disregard if this is not the case.

-- Thom
Robert Haas <robertmhaas@gmail.com> writes:
>> create command trigger before COMMAND_STEP of alter table
>> execute procedure snitch();
>
> One thought is that it might be better to say AT or ON or WHEN rather > than BEFORE, since BEFORE END is just a little strange; and also > because a future hook point might be something like > "permissions-checking", and we really want it to be AT > permissions-checking time, not BEFORE permissions-checking time.

Yeah I tried different spellings and almost sent a version using AT or WHEN, but it appeared to me not to be specific enough: AT permission checking time does not exist, it either happens before or after permission checking. I played with using “preceding” rather than before, too, maybe you would like that better.

So I would document the different steps and their ordering, then use only BEFORE as a qualifier, and add a pseudo step which is the end of the command.

> If I got to pick, I'd pick this syntax:
>
> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
> EXECUTE PROCEDURE function_name(args);
>
> Then you could eventually imagine:
>
> CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start
> EXECUTE PROCEDURE throw_an_error();
> CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table)
> EXECUTE PROCEDURE throw_an_error();
> CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table)
> EXECUTE PROCEDURE record_table_drop();
> CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension)
> EXECUTE PROCEDURE make_heroku_happy();
> CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table)
> EXECUTE PROCEDURE record_alter_table_subcommand();

I would amend that syntax to allow for a WHEN <expr> much like in the DML trigger case, where the expression can play with the variables exposed at the time of calling the trigger.

  create event trigger prohibit_some_ddl
      preceding <timing spec>
      when tag in ('CREATE TABLE', 'ALTER TABLE')
      execute procedure throw_an_error();

We must also take some time to describe the timing specs carefully, their order of operation and which information are available for triggers in each of them. See below.

I also don't like the way you squeeze in the drop cascade support here by requalifying it as a timing spec, which it's clearly not.

On subcommand_start, what is the tag set to? the main command or the subcommand? if the former, where do you find the current subcommand tag? I don't think subcommand_start should be a timing spec either.

> Piece by piece:
>
> - I prefer EVENT to COMMAND, because the start or end or any other > point during the execution of a command can be viewed as an event; > however, it is pretty clear that not everything that can be viewed as > an event upon which we might like triggers to fire can be viewed as a > command, even if we have a somewhat well-defined notion of > subcommands. I continue to think that there is no sense in which > creating a sequence to back a serial column or dropping an object due > to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an > event. Anything we want can be an event. I think if we don't get > this right in the first version we're going to be stuck with a > misnomer forever. :-(

I can buy naming what we're building here EVENT TRIGGERs, albeit with some caveats: what I implemented here only caters for commands for which we have a distinct parse tree.
That's what sub commands are in effect and why it supports create schema sub commands but not all alter table variants, and not cascaded drops.

Also, implementing cascade operations as calling process utility with a complete parse tree is not going to be a little project on its own. We can also punt that and document that triggers fired on cascaded drops are not receiving a parse tree, only the command tag and object id and object name and schema name, or even just the command tag and object id in order not to incur too many additional name lookups.

Lastly, EVENT opens the road to thinking about transaction event triggers, on begin|commit|rollback etc. Do we want to take that road?

> - It is pretty clear that for triggers that are supposed to fire at > the start or end of a command, it's useful to be able to specify which > commands it fires for. However, there must be other types of events > (as in my postulated sql_drop case above) where the name of the > command is not relevant - people aren't going to find all the objects > that got dropped as a result of a toplevel drop table command; they're > going to want to find all the tables that got dropped regardless of > which incantation did it. Also keep in mind that you can do things > like use ALTER TABLE to rename a view (and there are other cases of > that sort of confusion with domains vs. types, aggregates vs. > functions, etc), so being able to filter based on object type seems > clearly better in a bunch of real-world cases. And, even if you don't > believe that specific example, a general syntax leaves us with freedom > to maneuver if we discover other needs in the future. Writing > alter_table etc. as tokens rather than as ALTER TABLE also has the > further advantages of (1) greatly decreasing the parser footprint of > this feature and (2) relieving everyone who adds commands in the > future of the need to also change the command-trigger grammar.

Well that's a very little burden considering that you will need to add support for firing the triggers at the right time with the right piece of information provided, but why not. I'm also not convinced that the complexity of implementing proper error management is worth simplifying the parser too much.

I still think that maybe we should bite the bullet and have two separate parsers, one dedicated for ExplainableStmt and transaction control and such, and one for DDL operations. It seems to me the trade offs here are not the same at all: the performance versus user friendliness ratio is very different. Of course, that's a whole different project now and won't happen for 9.3, nor is it a pre-requisite for this patch. Fortunately.

> - I don't think that triggers at what Dimitri is calling "before > execute" are a good idea at all for the first version of this feature, > because there is a lot of stuff that might break and examining that > should really be a separate project from getting the basic > infrastructure in place. However, I also don't like the name. Unlike

I smell confusion, I meant for "before execute" command triggers to be the ones I've already implemented in the current patch as "BEFORE COMMAND" triggers. That is, they fire after permission checks, error checks and name lookups and locking.

> queries, DDL commands don't have clearly separate planning and > execution phases, so saying that something happens before execution > isn't really very clear.
> As previously discussed, the hook points in

The reason for that is that I was still trying to reuse the current set of keywords rather than already trying to use another more flexible mechanism. That's about it.

> the latest patch are not all entirely consistent between one command > and the next, not only in the larger ways I've already pointed out but > also in some smaller ways. Different commands do different kinds of

I had not set a solid policy here yet, as I figured out that whatever I would decide to implement would get thrown away at this stage of the review.

> integrity checks and the firing points are not always in exactly the > same place in that sequence. Of course, not all the commands do all > the checks in the same order, so some possibly-not-trivial refactoring > is probably required to get that consistent. Moreover, it doesn't

I wouldn't be surprised if we have some adjusting to do here, but the current code in this light is clearly the result of copy/pastes. So it's quite ok even if not exactly on purpose.

> seem out of reach to think that in the future we might draw a more > clear line again between pre-execution and execution proper, to solve > problems like the tendency of the current code to look up the same > name more than once and pray that it gets the same OID back every > time. So I think that when we get to the point of sticking hooks in > at this location, we should call them something much more specific > than before-execute hooks. I'm not sure exactly what. Maybe

I think we should describe the steps we would like to have now already, then filter out those that are out of scope for 9.2 and implement the others. We would only document the implemented steps, I guess. So we seem to have:

 1. start
 2. permissions_check
 3. consistency_check
 4. acquire_lock
 5. execute, for lack of a better term
 6. end

The main problem with that scheme is that while it works fine for simple commands, complex ones will have to do lookups and take locks in the middle of the permission and consistency checks, and I don't think we're going to be able to change that.

One possible answer is to say that for some commands, they all are in fact the same event and your event triggers will in fact get run from the same place in the command implementation, but still in that order. So we can't guarantee at large that a consistency_check trigger is always run before we did name lookups and acquired locks.

> something like after_ddl_name_lookup or something like that. The DDL > name lookup might move slightly earlier or later as a result of > refactoring, but it can't disappear and it can't cease to involve > locking and permissions checking. The set of things that you can > reasonably do there (additional security checks, auditing) can't > really change much either.

Exactly. In theory at least :)

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
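One way to read the six steps in Dimitri's message above is as a total order that a C implementation could carry around. The names below are illustrative assumptions only, and, as he notes, a complex command may have to fire several of them back to back from a single call site while preserving only their relative order:

    /* Hypothetical internal naming of the proposed steps. */
    typedef enum CommandStep
    {
        CMD_STEP_START,                 /* before any lookup or lock */
        CMD_STEP_PERMISSIONS_CHECK,
        CMD_STEP_CONSISTENCY_CHECK,
        CMD_STEP_ACQUIRE_LOCK,
        CMD_STEP_EXECUTE,
        CMD_STEP_END                    /* pseudo step after the command */
    } CommandStep;

    /*
     * A command whose implementation interleaves lookups, locks and
     * checks can still preserve the relative order by firing a run of
     * consecutive steps from one place.
     */
    static void
    FireStepTriggers(CommandStep first, CommandStep last, const char *tag)
    {
        CommandStep step;

        for (step = first; step <= last; step++)
            ExecCommandTriggers(step, tag);     /* hypothetical helper */
    }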
On Thu, Mar 29, 2012 at 11:49 AM, Thom Brown <thom@linux.com> wrote: > On 29 March 2012 16:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Thu, Mar 29, 2012 at 9:01 AM, Thom Brown <thom@linux.com> wrote: >>>> On 29 March 2012 13:30, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >>>>> I'll go make that happen, and still need input here. We first want to >>>>> have command triggers on specific commands or ANY command, and we want >>>>> to implement 3 places from where to fire them. >>>>> >>>>> Here's a new syntax proposal to cope with that: >>>> >>>> Is it necessary to add this complexity in this version? Can't we keep >>>> it simple but in a way that allows the addition of this later? The >>>> testing of all these new combinations sounds like a lot of work. >>> >>> I concur. This is way more complicated than we should be trying to do >>> in version 1. >> >> I'm at a loss here. This proposal was so that we can reach a commonly >> agreed minimal solution and design in first version. There's no new >> piece of infrastructure to add, the syntax is changed only to open the >> road for later, I'm not changing where the current command triggers are >> to be called (except for those which are misplaced). >> >> So, please help me here: what do we want to have in 9.3? > > Perhaps I misunderstood. It was the addition of the fine-grained event > options (parse, execute etc.) I saw as new. If you're saying those are > just possible options for later that won't be in this version, I'm > fine with that. If those are to make it for 9.2, then creating the > necessary test cases and possible fixes sounds infeasible in such a > short space of time. Please disregard if this is not the case.

So... I've said repeatedly and over a long period of time that development of this feature wasn't started early enough in the cycle to get it finished in time for 9.2. I think that I've identified some pretty serious issues in the discussion we've had so far, especially (1) the points at which triggers are fired aren't consistent between commands, (2) not much thought has been given to what happens if, say, a DDL trigger performs a DDL operation on the table the outer DDL command is due to modify, and (3) we are eventually going to want to trap a much richer set of events than can be captured by the words "before" and "after". Now, you could view this as me throwing up roadblocks to validate my previously-expressed opinion that this wasn't going to get done, but I really, honestly believe that these are important issues and that getting them right is more important than getting something done now.

If we still want to try to squeeze something into 9.2, I recommend stripping out everything except for what Dimitri called the before-any-command firing point. In other words, add a way to run a procedure after parsing of a command but before any name lookups have been done, any permissions checks have been done, or any locks have been taken. The usefulness of such a hook is of course limited but it is also a lot less invasive than the patch I recently reviewed and probably a lot safer. I actually think it's wise to do that as a first step even if it doesn't make 9.2, because it is much easier to build features like this incrementally and even a patch that does that will be reasonably complicated and difficult to review.
Parenthetically, what Dimitri previously called the after-any-command firing point, all the way at the end of the statement but without any specific details about the object the statement operated on, seems just as good for a first step, maybe better, so that would be a fine foundation from my point of view as well. The stuff that happens somewhere in the middle, even just after locking and permissions checking, is more complex and I think that should be phase 2 regardless of which phase ends up in which release cycle.

I'm not sure whether Dimitri's question about 9.3 was a typo for 9.2 or whether that's what he was actually asking about. But to answer that question for 9.3, I think we have time to build an extremely extensive infrastructure that covers a huge variety of use cases, and I think there is hardly any reasonable proposal that is off the table as far as that goes. We have a year to get that work done, and a year is a long time, and with incremental progress at each CommitFest we can do a huge amount. I can also say that I would be willing to put in some serious work during the 9.3 cycle to accelerate that progress, too, especially if (ahem) some other people can return the favor by reviewing my patches. :-) I could see us adding the functionality described above in one CommitFest and then spending the next three adding more whiz-bango frammishes and ending up with something really nice.

Right now, though, we are very crunched for time, and probably shouldn't be entertaining anything that requires a tenth the code churn that this patch probably does; if we are going to do anything at all, it had better be as simple and uninvasive as we can make it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
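A rough sketch of the stripped-down first step recommended above: fire user triggers at the very top of standard_ProcessUtility(), before any name lookup, permission check, or lock. Only CreateCommandTag() and the 9.2-era standard_ProcessUtility() signature are real; the two trigger-firing helpers are invented names, not the patch's actual API.

    void
    standard_ProcessUtility(Node *parsetree, const char *queryString,
                            ParamListInfo params, bool isTopLevel,
                            DestReceiver *dest, char *completionTag)
    {
        const char *tag = CreateCommandTag(parsetree);

        /*
         * ddl_command_start: nothing has been looked up, checked or
         * locked yet, so only the command tag and the raw parse tree
         * can be exposed to user code.
         */
        ExecDdlCommandStartTriggers(tag, parsetree);    /* hypothetical */

        switch (nodeTag(parsetree))
        {
                /* ... per-command dispatch, unchanged ... */
            default:
                break;
        }

        /* ddl_command_end: fires after the fact, same limited detail. */
        ExecDdlCommandEndTriggers(tag, parsetree);      /* hypothetical */
    }

The appeal of this placement is that it touches no per-command code at all, which is what makes it a plausible candidate under time pressure.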
Robert Haas <robertmhaas@gmail.com> writes: > I've said repeatedly and over a long period of time that development > of this feature wasn't started early enough in the cycle to get it > finished in time for 9.2. I think that I've identified some pretty

That could well be, yeah.

> serious issues in the discussion we've had so far, especially (1) the > points at which triggers are fired aren't consistent between commands, > (2) not much thought has been given to what happens if, say, a DDL > trigger performs a DDL operation on the table the outer DDL command is > due to modify, and (3) we are eventually going to want to trap a much > richer set of events than can be captured by the words "before" and > "after". Now, you could view this as me throwing up roadblocks to > validate my previously-expressed opinion that this wasn't going to get > done, but I really, honestly believe that these are important issues > and that getting them right is more important than getting something > done now.

(1) is not hard to fix as soon as we set a policy, which is the most pressing thing we need to do in my mind, whatever we manage to commit in 9.2.

(2) can be addressed with a simple blacklist that would be set just before calling user defined code, and cleaned when done running it. When in place the blacklist lookup is easy to implement in a central place (utility.c or cmdtrigger.c) and ereport() when the current command is in the blacklist.

e.g. alter table would blacklist alter table and drop table commands on the current object id

(3) we need to continue designing that, yes. I think we can have a first set of events defined now, even if we don't implement support for all of them readily.

> If we still want to try to squeeze something into 9.2, I recommend > stripping out everything except for what Dimitri called the > before-any-command firing point. In other words, add a way to run a

I would like that we can make that consistent rather than throw it away, or maybe salvage a part of the commands we support here. It's easy enough, if tiresome, to document which commands are supported in which event timing.

> procedure after parsing of a command but before any name lookups have > been done, any permissions checks have been done, or any locks have > been taken. The usefulness of such a hook is of course limited but it > is also a lot less invasive than the patch I recently reviewed and > probably a lot safer. I actually think it's wise to do that as a > first step even if it doesn't make 9.2, because it is much easier to > build features like this incrementally and even a patch that does that > will be reasonably complicated and difficult to review.

Yes.

> Parenthetically, what Dimitri previously called the after-any-command > firing point, all the way at the end of the statement but without any > specific details about the object the statement operated on, seems > just as good for a first step, maybe better, so that would be a fine > foundation from my point of view as well. The stuff that happens

Now that fetching the information is implemented, I guess that we could still provide for it when firing event triggers at that timing spec. Of course that means a bigger patch to review when compared to not having the feature, but Thom did spend loads of time to test this part of the implementation.

> somewhere in the middle, even just after locking and permissions > checking, is more complex and I think that should be phase 2 > regardless of which phase ends up in which release cycle.

Mmmm, ok.
> I'm not sure whether Dimitri's question about 9.3 was a typo for 9.2 Typo :) I appreciate your offer, though :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
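A minimal sketch of the blacklist idea in point (2) above, with invented names throughout (the struct, the list, and the check function are all assumptions, not the patch's actual API): the outer command records which (tag, oid) pairs may not be re-entered just before calling user code, and a central check raises an error if a trigger tries anyway.

    /* Hypothetical entry: one command/object pair a trigger may not run. */
    typedef struct BlacklistedCommand
    {
        const char *tag;        /* e.g. "ALTER TABLE", "DROP TABLE" */
        Oid         objectId;   /* object the outer command operates on */
    } BlacklistedCommand;

    static List *command_blacklist = NIL;

    /* Would be called from a central place such as utility.c before
     * dispatching any command, per Dimitri's description. */
    static void
    CheckCommandNotBlacklisted(const char *tag, Oid objectId)
    {
        ListCell   *lc;

        foreach(lc, command_blacklist)
        {
            BlacklistedCommand *entry = (BlacklistedCommand *) lfirst(lc);

            if (entry->objectId == objectId &&
                strcmp(entry->tag, tag) == 0)
                ereport(ERROR,
                        (errcode(ERRCODE_OBJECT_IN_USE),
                         errmsg("cannot run %s from a command trigger on the same object",
                                tag)));
        }
    }

So, for example, ALTER TABLE would push ("ALTER TABLE", relid) and ("DROP TABLE", relid) before firing its triggers, and pop them afterwards.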
On Thu, Mar 29, 2012 at 12:17 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> create command trigger before COMMAND_STEP of alter table >>> execute procedure snitch(); >> >> One thought is that it might be better to say AT or ON or WHEN rather >> than BEFORE, since BEFORE END is just a little strange; and also >> because a future hook point might be something like >> "permissions-checking", and we really want it to be AT >> permissions-checking time, not BEFORE permissions-checking time. > > Yeah I tried different spellings and almost sent a version using AT or > WHEN, but it appeared to me not to be specific enough: AT permission > checking time does not exist, it either happens before or after > permission checking. I played with using “preceding” rather than before, > too, maybe you would like that better.

Well, preceding and before are synonyms, so I don't see any advantage in that change. But I really did mean AT permissions_checking time, not before or after it. That is, we'd have a hook where instead of doing something like this:

  aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT);

...we'd call a superuser-supplied trigger function and let it make the call. This requires a clear separation of permissions and integrity checking that we are currently lacking, and probably quite a bit of other engineering too, but I think we should assume that we're going to do it at some point because it seems pretty clear that there is a huge amount of pent-up demand for being able to do things like this.

>> If I got to pick, I'd pick this syntax:
>>
>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
>> EXECUTE PROCEDURE function_name(args);
>>
>> Then you could eventually imagine:
>>
>> CREATE EVENT TRIGGER prohibit_all_ddl ON ddl_command_start
>> EXECUTE PROCEDURE throw_an_error();
>> CREATE EVENT TRIGGER prohibit_some_ddl ON ddl_command_start (alter_table)
>> EXECUTE PROCEDURE throw_an_error();
>> CREATE EVENT TRIGGER replicate_table_drops ON sql_drop (table)
>> EXECUTE PROCEDURE record_table_drop();
>> CREATE EVENT TRIGGER allow_whitelisted_extensions ON permissions_check (create_extension)
>> EXECUTE PROCEDURE make_heroku_happy();
>> CREATE EVENT TRIGGER replicate_at_subcommands ON subcommand_start (alter_table)
>> EXECUTE PROCEDURE record_alter_table_subcommand();
>
> I would amend that syntax to allow for a WHEN <expr> much like in the
> DML trigger case, where the expression can play with the variables
> exposed at the time of calling the trigger.
>
> create event trigger prohibit_some_ddl
>     preceding <timing spec>
>     when tag in ('CREATE TABLE', 'ALTER TABLE')
>     execute procedure throw_an_error();

We could do it that way, but the tag in ('CREATE TABLE', 'ALTER TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to need to start up and shut down the executor to figure out whether the trigger has to fire. If we are going to do it that way then we need to carefully define what variables are available and what values they get. I think that most people's event-filtering needs will be simple enough to be served by a more declarative syntax.

> We must also take some time to describe the timing specs carefully, > their order of operation and which information are available for > triggers in each of them. See below.
>
> I also don't like the way you squeeze in the drop cascade support here > by requalifying it as a timing spec, which it's clearly not.
> On subcommand_start, what is the tag set to? the main command or the > subcommand? if the former, where do you find the current subcommand tag? > I don't think subcommand_start should be a timing spec either.

That's up for grabs, but I was thinking the subcommand. For example, consider:

  alter table t alter column a add column b int, disable trigger all;

I would imagine this having a firing sequence something like this:

  ddl_command_start:alter_table
  ddl_command_permissions_check:alter_table
  ddl_command_name_lookup:alter_table
  alter_table_subcommand_prep:add_column
  alter_table_subcommand_prep:disable_trigger_all
  alter_table_subcommand_start:add_column
  sql_create:column
  alter_table_subcommand_end:add_column
  alter_table_subcommand_start:disable_trigger_all
  alter_table_subcommand_end:disable_trigger_all
  ddl_command_end:alter_table

Lots of room for argument there, of course.

>> Piece by piece: >> >> - I prefer EVENT to COMMAND, because the start or end or any other >> point during the execution of a command can be viewed as an event; >> however, it is pretty clear that not everything that can be viewed as >> an event upon which we might like triggers to fire can be viewed as a >> command, even if we have a somewhat well-defined notion of >> subcommands. I continue to think that there is no sense in which >> creating a sequence to back a serial column or dropping an object due >> to DROP .. CASCADE is a subcommand; it is, however, pretty clearly an >> event. Anything we want can be an event. I think if we don't get >> this right in the first version we're going to be stuck with a >> misnomer forever. :-(
>
> I can buy naming what we're building here EVENT TRIGGERs, albeit with > some caveats: what I implemented here only caters for commands for which > we have a distinct parse tree. That's what sub commands are in effect > and why it supports create schema sub commands but not all alter table > variants, and not cascaded drops.
>
> Also, implementing cascade operations as calling process utility with a > complete parse tree is not going to be a little project on its own. We > can also punt that and document that triggers fired on cascaded drops > are not receiving a parse tree, only the command tag and object id and > object name and schema name, or even just the command tag and object id > in order not to incur too many additional name lookups.

Yeah, I would not try to pass the parse tree to every event trigger, just the ones where it's well-defined. For drops, I would just say, hey, this is what we dropped. The user can log it or ereport, but that's about it. Still, that's clearly enough to do a lot of really interesting stuff, replication being the most obvious example.

> Lastly, EVENT opens the road to thinking about transaction event > triggers, on begin|commit|rollback etc. Do we want to take that road?

I wouldn't rule it out. If we can make it work, I think it'd make a lot of people happy. Not all combinations are practical, in all likelihood, like I don't think you can have a trigger that fires before you begin the transaction, because you need an error-handling context before you can do anything. But you might be able to have post-begin and pre-commit triggers.

> Well that's a very little burden considering that you will need to add > support for firing the triggers at the right time with the right piece > of information provided, but why not. I'm also not convinced that the > complexity of implementing proper error management is worth simplifying > the parser too much.
Fair point. My main motivation is really that I'd like to have a syntax that doesn't presume very much about what sorts of firing points we might have. For many purposes, it makes sense to key on which command is executing, but for some it might make sense to cue on subcommand, type of SQL object affected, or something else entirely. If we can leave the door open to such use cases, I think we will be happier, because I bet they will come along in time if not right now, and it will make everything simpler if we haven't painted ourselves into a corner syntax-wise.

[..snip..]

> So we seem to have:
>
>  1. start
>  2. permissions_check
>  3. consistency_check
>  4. acquire_lock
>  5. execute, for lack of a better term
>  6. end
>
> The main problem with that scheme is that while it works fine for simple > commands, complex ones will have to do lookups and take locks in the > middle of the permission and consistency checks, and I don't think we're > going to be able to change that.

I think that you've got that a bit mixed up: consistency checks typically happen after acquiring locks, not before, because until you've locked any consistency checks you do are so much rubbish, as the situation can change under you at any time. Also, ALL commands need to do lookups and take locks in the middle of permissions checks; any that don't are buggy. SOME commands also need to do such things during execution, as in the case of ALTER TABLE .. ADD FOREIGN KEY, which locks and checks the main table first and then eventually gets around to checking the other table.

> One possible answer is to say that for some commands, they all are in > fact the same event and your event triggers will in fact get run from > the same place in the command implementation, but still in that order.

Yes, I think there may be cases where it works out that way. Or possibly some events aren't defined for some command types at all, or a little of both. I think we can and should do whatever makes the behavior least surprising from a user perspective.

> So we can't guarantee at large that a consistency_check trigger is > always run before we did name lookups and acquired locks.

Is this a typo, can't for can? If not, I'm confused, because it seems like you were saying we could and should guarantee that, and I agree that we can and should.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
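To make the AT permissions_checking idea from earlier in this exchange concrete at one existing call site, here is a hedged sketch: when a trigger is defined for the event, it takes over the decision normally made by pg_class_aclcheck(). Only pg_class_aclcheck(), GetUserId(), aclcheck_error() and get_rel_name() are real backend calls; the two trigger helpers are invented names.

    AclResult   aclresult;

    /*
     * Let a superuser-supplied event trigger make the call instead of
     * the built-in ACL machinery, when one is defined for this event.
     */
    if (HasPermissionsCheckTrigger(reloid))             /* hypothetical */
        aclresult = ExecPermissionsCheckTrigger(reloid, GetUserId(),
                                                ACL_SELECT);
    else
        aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT);

    if (aclresult != ACLCHECK_OK)
        aclcheck_error(aclresult, ACL_KIND_CLASS, get_rel_name(reloid));

As the exchange notes, this only works once permissions checking is cleanly separated from integrity checking, since the trigger must not be able to skip checks that protect catalog consistency.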
Apropos of nothing and since I haven't found a particularly good time to say this amidst all the technical discussion, I appreciate very much all the work you've been putting into this.

On Thu, Mar 29, 2012 at 12:42 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> serious issues in the discussion we've had so far, especially (1) the >> points at which triggers are fired aren't consistent between commands, >> (2) not much thought has been given to what happens if, say, a DDL >> trigger performs a DDL operation on the table the outer DDL command is >> due to modify, and (3) we are eventually going to want to trap a much >> richer set of events than can be captured by the words "before" and >> "after". Now, you could view this as me throwing up roadblocks to >> validate my previously-expressed opinion that this wasn't going to get >> done, but I really, honestly believe that these are important issues >> and that getting them right is more important than getting something >> done now. > > (1) is not hard to fix as soon as we set a policy, which is the most > pressing thing we need to do in my mind, whatever we manage to > commit in 9.2.

I agree that setting a policy is enormously important, not so sure about how easy it is to fix, but we shall see. My primary and overriding goal here is to make sure that we don't make any design decisions, in the syntax or otherwise, that foreclose the ability to add policies that we've not yet dreamed of to future versions of the code. It seems vanishingly unlikely that the first commit will cover all the use cases that people care about, and only slightly more likely that we'll even be able to list them all at that point. Perhaps I'm wrong and we already know what they are, but I'd like to bet against any of us - and certainly me - being that smart.

In terms of policies, there are two that seem to me to be very clear: at the very beginning of ProcessUtility, and at the very end of it, *excluding* recursive calls to ProcessUtility that are intended to handle subcommands. I think we could call the first start or command_start or ddl_command_start or post-parse or pre-locking or something along those lines, and the second end or command_end or ddl_command_end. I think it might be worth including "ddl" in there because the scope of the current patch really is just DDL (as opposed to say DCL) and having a separate set of firing points for DCL does not seem dumb. If we ever try to do anything with transaction control as you suggested upthread that's likely to also require special handling. Calling it, specifically, ddl_command_start makes the scope very clear.

> (2) can be addressed with a simple blacklist that would be set just > before calling user defined code, and cleaned when done running it. > When in place the blacklist lookup is easy to implement in a central > place (utility.c or cmdtrigger.c) and ereport() when the current > command is in the blacklist. > > e.g. alter table would blacklist alter table and drop table commands > on the current object id

Here we're talking about a firing point that we might call ddl_post_name_lookup or something along those lines. I would prefer to handle this by making the following rules - here I'm assuming that we're talking about the case where the object in question is a relation:

1. The trigger fires immediately after RangeVarGetRelidExtended and before any other checks are performed, and especially before any CheckTableNotInUse().
This last is important, because what matters is whether the table's not in use AFTER all possible user-supplied code is executed. If the trigger opens or closes cursors, for example, what matters is whether there are any cursors left open *after* the triggers complete, not whether there were any open on entering the trigger. The same is true of any other integrity check: if there's a chance that the trigger could change something that affects whether the integrity constraint gets fired, then we'd better be darn sure to fire the trigger before we check the integrity constraint. 2. If we fire any triggers at this point, we CommandCounterIncrement() and then re-do RangeVarGetRelidExtended() with the same arguments we passed before. If it doesn't return the same OID we got the first time, we abort with some kind of serialization error. We need to be a little careful about the phrasing of the error message, because it's possible for this to change during command execution even without command triggers if somebody creates a table earlier in the search path with the same unqualified name that was passed to the command, but I think it's OK for the existence of a command-trigger to cause a spurious abort in that very narrow case, as long as the error message includes some appropriate weasel language. Alternatively, we could try to avoid that corner case by rechecking only that a tuple with the right OID is still present and still has the correct relkind, but that seems like it might be a little less bullet-proof for no real functional gain. There is some difficulty with non-relation objects because we don't really do proper locking there. For those we should, I think, rearrange things so that the permissions check happens right after the name lookup if it doesn't already, and then have the ddl_post_name_lookup trigger fire right after that. After firing the triggers we should redo the name lookup and complain if the result has changed. If somebody improves the locking there in the future, these cases will become fully symmetrical with the above. As compared with your proposed blacklist solution, I think this has the advantage of (1) being more localized in terms of code modification and (2) being slightly more bullet-proof. For example, suppose somebody manually deletes a row out of pg_class from the trigger. Yeah, I know they shouldn't do that, and yeah, it's not our job to make those cases work, but if we can tolerate them better for the same amount of work (or less) then I'd rather do it. > (3) we need to continue designing that, yes. I think we can have a first > set of events defined now, even if we don't implement support for > all of them readily. No argument. >> Parenthetically, what Dimitri previously called the after-any-command >> firing point, all the way at the end of the statement but without any >> specific details about the object the statement operated on, seems >> just as good for a first step, maybe better, so that would be a fine >> foundation from my point of view as well. The stuff that happens > > Now that fetching the information is implemented, I guess that we could > still provide for it when firing event trigger at that timing spec. Of > course that means a bigger patch to review when compared to not having > the feature, but Thom did spend loads of time to test this part of the > implementation. I agree that end-of-command triggers should be able to get a full report about what happened during the command. 
I think that will take a bunch more work to get right, even though you've whittled down the scope of what information is to be provided quite a bit. I think it's important to get some of the other things we've been talking about (see above) hammered out first, so I haven't spent a lot of time yet figuring out exactly what I think might break in the current implementation, but I'm going to analyze it more once we've got the basics done.

It's worth noting that this can be added in a backward-compatible way - i.e. if PG 9.x supports end-of-command triggers that don't get any information besides the command tag, PG 9.(x+1) can set a bunch of magic variables to provide that information. I'm glad you ended up revising things to use the magic-variable approach rather than a list of specific parameters for just this reason.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
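A minimal sketch of rules 1 and 2 above for the relation case. RangeVarGetRelid() (the 9.2-era three-argument form), CommandCounterIncrement() and ereport() are real backend calls; the trigger-firing helper is an invented name standing in for whatever the patch would provide.

    Oid         relid;

    /*
     * Rule 1: resolve and lock the name, then fire the triggers before
     * any other integrity check such as CheckTableNotInUse().
     */
    relid = RangeVarGetRelid(relation, lockmode, false);

    if (ExecDdlPostNameLookupTriggers(relid))       /* hypothetical */
    {
        Oid         new_relid;

        /*
         * Rule 2: make the triggers' effects visible, then redo the
         * lookup with the same arguments and insist on the same answer.
         */
        CommandCounterIncrement();
        new_relid = RangeVarGetRelid(relation, lockmode, false);

        if (new_relid != relid)
            ereport(ERROR,
                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                     errmsg("relation \"%s\" changed during command trigger execution",
                            relation->relname)));
    }

    /* only now: CheckTableNotInUse() and the rest of the command */

The error message would need the "weasel language" Robert mentions, since an unrelated table created earlier in the search path can trip the same recheck.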
Robert Haas <robertmhaas@gmail.com> writes: > Well, preceding and before are synonyms, so I don't see any advantage > in that change. But I really did mean AT permissions_checking time, > not before or after it. That is, we'd have a hook where instead of > doing something like this: > > aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT); > > ...we'd call a superuser-supplied trigger function and let it make the

Wow. I don't think I like that at all. It's indeed powerful, but how do you go explaining and fixing unanticipated behavior with such things in production? It looks too much like an invitation to break a very careful design where each facility has to prove itself to get in.

So yes, that's not at all what I was envisioning, I still think that most event specs for event triggers are going to have to happen in between our different internal implementations of a given facility. Maybe we should even use BEFORE in cases I had in mind and AT for cases like the one you're picturing?

>>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
>>> EXECUTE PROCEDURE function_name(args);
>>
>> I would amend that syntax to allow for a WHEN <expr> much like in the
>> DML trigger case, where the expression can play with the variables
>> exposed at the time of calling the trigger.
>>
>> create event trigger prohibit_some_ddl
>>     preceding <timing spec>
>>     when tag in ('CREATE TABLE', 'ALTER TABLE')
>>     execute procedure throw_an_error();
>
> We could do it that way, but the tag in ('CREATE TABLE', 'ALTER > TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to > need to start up and shut down the executor to figure out whether the > trigger has to fire. If we are going to do it that way then we need > to carefully define what variables are available and what values they > get. I think that most people's event-filtering needs will be simple > enough to be served by a more declarative syntax.

We could then maybe restrict that idea so that the syntax is more like

  when <trigger variable> in (<literal>[, ...])

so that we just have to store the array of strings and support only one operation there. We might want to go as far as special casing the object id to store an oid vector there rather than a text array, but we could also decide not to support per-oid command triggers in the first release and remove that from the list of accepted <trigger variable>s.

>> I also don't like the way you squeeze in the drop cascade support here >> by requalifying it as a timing spec, which it's clearly not. >> >> On subcommand_start, what is the tag set to? the main command or the >> subcommand? if the former, where do you find the current subcommand tag? >> I don't think subcommand_start should be a timing spec either.
>
> That's up for grabs, but I was thinking the subcommand. For example, consider:
>
> alter table t alter column a add column b int, disable trigger all;
>
> I would imagine this having a firing sequence something like this:
>
> ddl_command_start:alter_table
> ddl_command_permissions_check:alter_table
> ddl_command_name_lookup:alter_table
> alter_table_subcommand_prep:add_column
> alter_table_subcommand_prep:disable_trigger_all
> alter_table_subcommand_start:add_column
> sql_create:column
> alter_table_subcommand_end:add_column
> alter_table_subcommand_start:disable_trigger_all
> alter_table_subcommand_end:disable_trigger_all
> ddl_command_end:alter_table
>
> Lots of room for argument there, of course.
Yeah well, in my mind the alter table actions are not subcommands yet because they don't go back to standard_ProcessUtility(). The commands in CREATE SCHEMA do that, same for create table foo(id serial primary key) and its sequence and index.

We could maybe add another variable called "context" in the command trigger procedure API that would generally be NULL and would be set to the main command tag if we're running a subcommand. We could still support alter table commands as sub commands as far as the event triggers are concerned, and document that you don't get a parse tree there even when implementing the trigger in C.

> Yeah, I would not try to pass the parse tree to every event trigger, > just the ones where it's well-defined. For drops, I would just say, > hey, this is what we dropped. The user can log it or ereport, but > that's about it. Still, that's clearly enough to do a lot of really > interesting stuff, replication being the most obvious example.

Yes the use case is important enough to want to support it. Now, to square it into the design. I think it should fire as a “normal” event trigger, with the context set to e.g. DROP TYPE, the tag set to this object classid (e.g. DROP FUNCTION). It's easy enough to implement given that CreateCommandTag() already does a switch on ((DropStmt *) parsetree)->removeType, we just need to move that code into another function and reuse it at the right places. It's only missing DROP COLUMN apparently, I would have to add that.

So you would know that the trigger is firing for a DROP FUNCTION that happens as part of a DROP TYPE, and I guess you can deduce it's happening because of a CASCADE behavior, right? If there's room for doubt we need to expose a boolean variable named “behavior” that would clarify the situation; maybe it's just too late, but I don't see a need for that.

>> Lastly, EVENT opens the road to thinking about transaction event >> triggers, on begin|commit|rollback etc. Do we want to take that road? > > I wouldn't rule it out. If we can make it work, I think it'd make a > lot of people happy. Not all combinations are practical, in all > likelihood, like I don't think you can have a trigger that fires > before you begin the transaction, because you need an error-handling > context before you can do anything. But you might be able to have > post-begin and pre-commit triggers.

Good. Not this release, though, of course.

> Fair point. My main motivation is really that I'd like to have a > syntax that doesn't presume very much about what sorts of firing > points we might have. For many purposes, it makes sense to key on

IIUC the technique we're using in the EXPLAIN options case, it's easy to stay flexible as long as we're providing only single words here, or at least a simple “parser unit” thing. Hence the use of underscores in the timing spec names.

>> So we seem to have:
>>
>>  1. start
>>  2. permissions_check
>>  3. consistency_check
>>  4. acquire_lock
>>  5. execute, for lack of a better term
>>  6. end
>>
>> The main problem with that scheme is that while it works fine for simple >> commands, complex ones will have to do lookups and take locks in the >> middle of the permission and consistency checks, and I don't think we're >> going to be able to change that.
>
> I think that you've got that a bit mixed up: consistency checks > typically happen after acquiring locks, not before, because until > you've locked any consistency checks you do are so much rubbish, as > the situation can change under you at any time. Also, ALL commands

Ah right.
> need to do lookups and take locks in the middle of permissions checks; > any that don't are buggy. SOME commands also need to do such things > during execution, as in the case of ALTER TABLE .. ADD FOREIGN KEY, > which locks and checks the main table first and then eventually gets > around to checking the other table. The namespace is often resolved first, sometimes even before permission checks are done. See CreateConversionCommand() for an example of that, but really that happens about everywhere IIRC. >> One possible answer is to say that for some commands, they all are in >> fact the same event and your event triggers will in fact get run from >> the same place in the command implementation, but still in that order. > > Yes, I think there may be cases where it works out that way. Or > possibly some events aren't defined for some command types at all, or > a little of both. I think we can and should do whatever makes the > behavior least surprising from a user perspective. Good, agreed. That will make it into a nice documentation table. >> So we can't guarantee at large that a consistency_check trigger is >> always run before we did name lookups and acquired locks. > > Is this a typo, can't for can? If not, I'm confused, because it seems > like you were saying we could and should guarantee that, and I agree > that we can and should. We don't share the same view of the current state of the code: I don't think we can guarantee that even if you think we should, unfortunately. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
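[To make the restricted WHEN clause discussed above concrete, here is roughly what such a trigger definition could look like; the timing-spec name and the set of accepted variables were still being debated at this point, so this is an illustrative sketch rather than final syntax:

    -- hypothetical syntax: one variable, one IN operator, string literals only
    create event trigger prohibit_some_ddl
        on ddl_command_start
        when tag in ('CREATE TABLE', 'ALTER TABLE')
        execute procedure throw_an_error();

Because the filter reduces to a variable name plus an array of string literals, the server can evaluate it with simple string comparisons and never needs to start up the executor.]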
Robert Haas <robertmhaas@gmail.com> writes: > Apropos of nothing and since I haven't found a particularly good time > to say this in amidst all the technical discussion, I appreciate very > much all the work you've been putting into this. Hey, thanks, I very much appreciate your support here! >> (1) is not hard to fix as soon as we set a policy, which the most >> pressing thing we need to do in my mind, whatever we manage to >> commit in 9.2. > > I agree that setting a policy is enormously important, not so sure > about how easy it is to fix, but we shall see. I think your proposal on the table fixes it (the new event timing specs); it's all about fine-tuning now. > My primary and overriding goal here is to make sure that we don't make > any design decisions, in the syntax or otherwise, that foreclose the > ability to add policies that we've not yet dreamed of to future Yeah, we're not hard-coding the list of possible event timings in the syntax. > In terms of policies, there are two that seem to me to be very clear: > at the very beginning of ProcessUtility, and at the very end of it, > *excluding* recursive calls to ProcessUtility that are intended to > handle subcommands. I think we could call the first start or > command_start or ddl_command_start or post-parse or pre-locking or > something along those lines, and the second end or command_end or > ddl_command_end. I think it might be worth including "ddl" in there > because the scope of the current patch really is just DDL (as opposed > to say DCL) and having a separate set of firing points for DCL does > not seem dumb. If we ever try to do anything with transaction control > as you suggested upthread that's likely to also require special > handling. Calling it, specifically, ddl_command_start makes the scope > very clear. I'm not sure about that, really. First, the only reason why DCL commands are not supported in the current patch is that roles are global objects, and we don't want role-related behavior to depend on which database you're currently connected to. Then, I guess any utility command implementation will share the same principle of having a series of steps and allowing a user-defined function to get called in between some of them. So adding support for commands that won't fit in the DDL timing specs we're trying to design now amounts to adding timing specs for those commands. I'm not sold on the timing specs containing ddl or dcl or other qualifiers; really, I find that ugly, but I won't fight over it. >> (2) can be addressed with a simple blacklist that would be set just >> before calling user defined code, and cleaned when done running it. >> When in place the blacklist lookup is easy to implement in a central >> place (utility.c or cmdtrigger.c) and ereport() when the current >> command is in the blacklist. >> >> e.g. alter table would blacklist alter table and drop table commands >> on the current object id > > Here we're talking about a firing point that we might call > ddl_post_name_lookup or something along those lines. I would prefer > to handle this by making the following rules - here I'm assuming > that we're talking about the case where the object in question is a > relation: > > 1. The trigger fires immediately after RangeVarGetRelidExtended and > before any other checks are performed, and especially before any > CheckTableNotInUse(). This last is important, because what matters is > whether the table's not in use AFTER all possible user-supplied code > is executed. 
If the trigger opens or closes cursors, for example, > what matters is whether there are any cursors left open *after* the > triggers complete, not whether there were any open on entering the > trigger. The same is true of any other integrity check: if there's a > chance that the trigger could change something that affects whether > the integrity constraint gets fired, then we'd better be darn sure to > fire the trigger before we check the integrity constraint. Ack. > 2. If we fire any triggers at this point, we CommandCounterIncrement() > and then re-do RangeVarGetRelidExtended() with the same arguments we > passed before. If it doesn't return the same OID we got the first > time, we abort with some kind of serialization error. We need to be a > little careful about the phrasing of the error message, because it's > possible for this to change during command execution even without > command triggers if somebody creates a table earlier in the search > path with the same unqualified name that was passed to the command, > but I think it's OK for the existence of a command-trigger to cause a > spurious abort in that very narrow case, as long as the error message > includes some appropriate weasel language. Alternatively, we could > try to avoid that corner case by rechecking only that a tuple with the > right OID is still present and still has the correct relkind, but that > seems like it might be a little less bullet-proof for no real > functional gain. I can buy having a special error case here: it's about concurrent DDL, and while we need to get it right, errors are often the best we can do. Now it seems to me you're already describing the ultimate goal of being able to recheck the pre-conditions once the user-defined code has returned, whereas I'm proposing a blunt tool for version 1 in order to defensively avoid that class of problems. > There is some difficulty with non-relation objects because we don't > really do proper locking there. For those we should, I think, > rearrange things so that the permissions check happens right after the > name lookup if it doesn't already, and then have the > ddl_post_name_lookup trigger fire right after that. After firing the > triggers we should redo the name lookup and complain if the result has > changed. If somebody improves the locking there in the future, these > cases will become fully symmetrical with the above. Again, next version, right? Rearranging things is gigantic code churn. > As compared with your proposed blacklist solution, I think this has > the advantage of (1) being more localized in terms of code > modification and (2) being slightly more bullet-proof. For example, > suppose somebody manually deletes a row out of pg_class from the > trigger. Yeah, I know they shouldn't do that, and yeah, it's not our > job to make those cases work, but if we can tolerate them better for > the same amount of work (or less) then I'd rather do it. It's more sophisticated, no doubt. It's at least an order of magnitude more work, too, from the simple fact that it begins with proofreading every command's implementation in order to “rearrange things”. >> (3) we need to continue designing that, yes. I think we can have a first >> set of events defined now, even if we don't implement support for >> all of them readily. > > No argument. Cool :) > I agree that end-of-command triggers should be able to get a full > report about what happened during the command. 
I think that will take > a bunch more work to get right, even though you've whittled down the > scope of what information is to be provided quite a bit. I think it's > important to get some of the other things we've been talking about > (see above) hammered out first, so I haven't spent a lot of time yet > figuring out exactly what I think might break in the current > implementation, but I'm going to analyze it more once we've got the > basics done. It's worth noting that this can be added in a OK, so I will soon be able to reshuffle the code with the current design, adding two simple entry points for command triggers and revising the internal APIs to match. Then with some luck we get back the intermediate call that used to be spelled "before command". To be able to have the “full report” in the end-of-command case, we will need to keep the current command trigger call sites, which are spread across every command implementation. > backward-compatible way - i.e. if PG 9.x supports end-of-command > triggers that don't get any information besides the command tag, PG > 9.(x+1) can set a bunch of magic variables to provide that > information. I'm glad you ended up revising things to use the > magic-variable approach rather than a list of specific parameters for > just this reason. Yeah, it's better this way even if it means implementing the same thing for each PL in core, so four times… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
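[As a rough sketch of the magic-variable approach discussed above, a PL/pgSQL trigger function might look like the following; the return type name and the tag variable are placeholders for whatever the patch finally exposes:

    create or replace function snitch() returns event_trigger
    language plpgsql as $$
    begin
        -- tag: hypothetical magic variable carrying the command tag
        raise notice 'command trigger fired for %', tag;
    end;
    $$;

The appeal of this style is exactly the point made above: later releases can expose additional magic variables without changing the function signature, so existing trigger functions keep working.]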
On Thu, Mar 29, 2012 at 4:21 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Well, preceding and before are synonyms, so I don't see any advantage >> in that change. But I really did mean AT permissions_checking time, >> not before or after it. That is, we'd have a hook where instead of >> doing something like this: >> >> aclresult = pg_class_aclcheck(reloid, GetUserId(), ACL_SELECT); >> >> ...we'd call a superuser-supplied trigger function and let it make the > > Wow. I don't think I like that at all. It's indeed powerful, but how do > you go explaining and fixing unanticipated behavior with such things in > production? It looks too much like an invitation to break a very careful > design where each facility has to rove itself to get in. I'm thinking of things like extension whitelisting. When some unprivileged user says "CREATE EXTENSION harmless", and harmless is marked as superuser-only, we might like to have a hook that gets called *at permissions-checking time* and gets to say, oh, well, that extension is on the white-list, so we're going to allow it. I think you can come up with similar cases for other commands, where in general the operation is restricted to superusers or database owners or table owners but in specific cases you want to allow others to do it. >>>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) >>>> EXECUTE PROCEDURE function_name(args); >>> >>> I would amend that syntax to allow for a WHEN <expr> much like in the >>> DML trigger case, where the expression can play with the variables >>> exposed at the time of calling the trigger. >>> >>> create event trigger prohibit_some_ddl >>> preceding <timing spec> >>> when tag in ('CREATE TABLE', 'ALTER TABLE') >>> execute procedure throw_an_error(); >> >> We could do it that way, but the tag in ('CREATE TABLE', 'ALTER >> TABLE') syntax will be tricky to evaluate. I'd really prefer NOT to >> need to start up and shut down the executor to figure out whether the >> trigger has to fire. If we are going to do it that way then we need >> to carefully define what variables are available and what values they >> get. I think that most people's event-filtering needs will be simple >> enough to be served by a more declarative syntax. > > We could then maybe restrict that idea so that the syntax is more like > > when <trigger variable> in (<literal>[, ...]) > > So that we just have to store the array of strings and support only one > operation there. We might want to go as far as special casing the object > id to store an oid vector there rather than a text array, but we could > also decide not to support per-oid command triggers in the first > release and remove that from the list of accepted <trigger variable>s. I guess that would make sense if you think there would ever be more than one choice for <trigger variable>. I'm not immediately seeing a use case for that, though - I was explicitly viewing the syntax foo (bar, baz) to mean foo when <the-only-trigger-variable-that-makes-sense-given-that-we-are-talking-about-a-trigger-on-foo> in (bar, baz). More generally, my thought on the structure of this is that you're going to have certain toplevel events, many of which will happen at only a single place in the code, like "an object got dropped" or "a DDL command started" or "a DDL command ended". So we give those names, like sql_drop, ddl_command_start, and ddl_command_end. 
Inside your trigger procedure, the set of magic variables that is available will depend on which toplevel event you set the trigger on, but hopefully all firings of that toplevel event can provide the same magic variables. For example, at ddl_command_start time, you're just gonna get the command tag, but at ddl_command_end time you will get the command tag plus maybe some other stuff. Now, we COULD stop there. I mean, we could document that you can create a trigger on ddl_command_start and every DDL command will fire that trigger, and if the trigger doesn't care about some DDL operations, then it can look at the command tag and return without doing anything for the operations it doesn't care about. The only real disadvantage of doing it that way is speed, and maybe a bit of code complexity within the trigger. So my further thought was that we'd allow you to specify an optional filter list to restrict which events would fire the trigger, and the exact meaning of that filter list would vary depending on the toplevel event you've chosen - i.e. for ddl_command_start, the filter would be a list of commands, but for sql_drop it would be a list of object types. We could make that more complicated if we think that an individual toplevel event will need more than one kind of filtering. For example, if you wanted to filter sql_drop events based on the object type AND/OR the schema name, then the syntax I proposed would obviously not be adequate. I'm just not convinced we need that, especially because you'd then need to set up a dependency between the command trigger and the schema, since obviously the command trigger must be dropped if the schema is. Maybe there are better examples; I just can't think of any. >> Fair point. My main motivation is really that I'd like to have a >> syntax that doesn't presume very much about what sorts of firing >> points we might have. For many purposes, it makes sense to key on > > IIUC the technique we're using in the EXPLAIN options case, it's easy to > stay flexible as long as we're providing only single words here, or at > least a simple “parser unit” thing. Hence the use of underscores in the > timing spec names. Yes. >> need to do lookups and take locks in the middle of permissions checks; >> any that don't are buggy. SOME commands also need to do such things >> during execution, as in the case of ALTER TABLE .. ADD FOREIGN KEY, >> which locks and checks the main table first and then eventually gets >> around to checking the other table. > > The namespace is often resolved first, sometime even before permission > checks are done. See CreateConversionCommand() for an example of that, > but really that happens about everywhere IIRC. Ah! I was thinking of ALTER commands, but you're right: CREATE commands are different. In many ways a CREATE command is more like an operation on the schema than it is an operation on the object itself (since that doesn't exist yet). Those all need to have permissions checks and locking *on the schema* intertwined with the schema lookup, as RangeVarGetAndCheckCreationNamespace() does. Unfortunately, the equivalent code for non-relations is broken, and I didn't have time to fix it in time for 9.2. I think that would be a good thing to get done for 9.3, though: it's only about 2 days worth of work, I think. >>> One possible answer is to say that for some commands, they all are in >>> fact the same event and your event triggers will in fact get run from >>> the same place in the command implementation, but still in that order. 
>> >> Yes, I think there may be cases where it works out that way. Or >> possibly some events aren't defined for some command types at all, or >> a little of both. I think we can and should do whatever makes the >> behavior least surprising from a user perspective. > > Good, agreed. That will make it into a nice documentation table. Yeah. This might even be a case where we should write the documentation first and then make the implementation match it, rather than the other way around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
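[For comparison, the filter-list flavor of the syntax described in this mail would come out something like the sketch below, where the meaning of the parenthesized list depends on the toplevel event; every name here is illustrative only, since this shape was never finalized:

    -- for ddl_command_start the filter is a list of command tags...
    create event trigger log_ddl
        on ddl_command_start (create_table, alter_table)
        execute procedure log_it();

    -- ...while for sql_drop it would be a list of object types
    create event trigger log_drops
        on sql_drop (table, index)
        execute procedure log_it();
]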
Robert Haas <robertmhaas@gmail.com> writes: > I'm thinking of things like extension whitelisting. When some > unprivileged user says "CREATE EXTENSION harmless", and harmless is > marked as superuser-only, we might like to have a hook that gets > called *at permissions-checking time* and gets to say, oh, well, that > extension is on the white-list, so we're going to allow it. I think > you can come up with similar cases for other commands, where in > general the operation is restricted to superusers or database owners > or table owners but in specific cases you want to allow others to do > it. I did that another way in previous incarnations of the patch, which was to allow for an INSTEAD OF event trigger backed by a SECURITY DEFINER function. When the extension is whitelisted, you guard against recursion, run CREATE EXTENSION from the security definer function, then signal that execution of the original command should be aborted. That was too dangerous given the lack of policy about where exactly the user code is fired, but I think we could now implement that for some of the event timing specs we're listing. Only some of them, I guess: only those that happen before we lock the objects. I would then prefer using the INSTEAD OF keywords, which are much easier to grasp than AT. >>>>> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...]) >>>>> EXECUTE PROCEDURE function_name(args); >>>> >>>> create event trigger prohibit_some_ddl >>>> preceding <timing spec> >>>> when tag in ('CREATE TABLE', 'ALTER TABLE') >>>> execute procedure throw_an_error(); > > I guess that would make sense if you think there would ever be more > than one choice for <trigger variable>. I'm not immediately seeing a > use case for that, though - I was explicitly viewing the syntax foo So, the variables in question are tag, objectid, objectname, schemaname and, from a very recent email, context. On reflection, I think the variable here would only be either tag or context, and that's it. > More generally, my thought on the structure of this is that you're > going to have certain toplevel events, many of which will happen at > only a single place in the code, like "an object got dropped" or "a > DDL command started" or "a DDL command ended". So we give those > names, like sql_drop, ddl_command_start, and ddl_command_end. Inside I really dislike mixing sql_drop and ddl_command_start as being the same kind of objects here, even if I can bend my head at the right angle and see that it's a fair view when looking at how it's implemented. I can't see a way to explain that to users without having to explain to them how drop cascade is implemented. So my proposal here is to “fake” a “proper” subcommand thanks to the new context variable. If you DROP TYPE foo CASCADE and that in turn drops a function foo_in(), then an event trigger is fired with context = 'DROP TYPE' tag = 'DROP FUNCTION' Same idea when you DROP TABLE … CASCADE and a SEQUENCE and a bunch of indexes need to disappear too: you get a usual event trigger fired with the context set to 'DROP TABLE' this time. I don't think we need to arrange for explicitly publishing the context-specific information here. If we need to, we have to find the right timing spec where we can guarantee still being in the top level command and where we already have the details filled in; then users can attach a trigger there and record the information for themselves. 
> your trigger procedure, the set of magic variables that is available > will depend on which toplevel event you set the trigger on, but > hopefully all firings of that toplevel event can provide the same > magic variables. For example, at ddl_command_start time, you're just > gonna get the command tag, but at ddl_command_end time you will get > the command tag plus maybe some other stuff. With my proposal above, you could get the same set of information when being called as a toplevel event or a subevent (one where the context is not null). That would mean adding object name and schema name lookups in the drop cascade code, though. We can also decide not to do that extra lookup and just publish the object id, which we certainly do have. This way, the timing spec of a sub-event can still be of the same kind as the top-level event ones: we still have before- and after-lock entry points, same with lookup if we add that feature, etc. > Now, we COULD stop there. I mean, we could document that you can > create a trigger on ddl_command_start and every DDL command will fire > that trigger, and if the trigger doesn't care about some DDL > operations, then it can look at the command tag and return without > doing anything for the operations it doesn't care about. The only > real disadvantage of doing it that way is speed, and maybe a bit of > code complexity within the trigger. So my further thought was that Without my “context proposal”, you also lose the ability to refer to sub-events as plain events with a context, which I find so much cleaner. > we'd allow you to specify an optional filter list to restrict which > events would fire the trigger, and the exact meaning of that filter > list would vary depending on the toplevel event you've chosen - i.e. > for ddl_command_start, the filter would be a list of commands, but for > sql_drop it would be a list of object types. We could make that more > complicated if we think that an individual toplevel event will need > more than one kind of filtering. For example, if you wanted to filter > sql_drop events based on the object type AND/OR the schema name, then > the syntax I proposed would obviously not be adequate. I'm just not > convinced we need that, especially because you'd then need to set up a > dependency between the command trigger and the schema, since obviously > the command trigger must be dropped if the schema is. Maybe there are > better examples; I just can't think of any. This WHEN syntax allows us to add support for more things in the way you're describing down the road, and as you say, for a first version we can limit ourselves to supporting only two variables, context and tag, and a single operation, IN against a list of string literals. Thinking some more about it, we need to add the AND operator between several expressions here. We don't need to implement OR because that's easy to achieve by creating more than one event trigger that calls the same function. We could choose to implement NOT IN, too. Given the scope of this mini expression language, we can easily bypass calling the executor in v1 here, and reconsider later if we want to allow calling a UDF in the WHEN clause… I don't think it's an easy feature to add in, though. >> The namespace is often resolved first, sometimes even before permission >> checks are done. See CreateConversionCommand() for an example of that, >> but really that happens about everywhere IIRC. > > Ah! I was thinking of ALTER commands, but you're right: CREATE > commands are different. 
In many ways a CREATE command is more like an > operation on the schema than it is an operation on the object itself > (since that doesn't exist yet). Those all need to have permissions > checks and locking *on the schema* intertwined with the schema lookup, > as RangeVarGetAndCheckCreationNamespace() does. Unfortunately, the > equivalent code for non-relations is broken, and I didn't have time to > fix it in time for 9.2. I think that would be a good thing to get > done for 9.3, though: it's only about 2 days worth of work, I think. Meanwhile we can easily enough refrain from exposing those timing specs to CREATE command events, though. > Yeah. This might even be a case where we should write the > documentation first and then make the implementation match it, rather > than the other way around. I still think we might want to double check how the code is actually written in each case (create vs alter vs drop, relation commands vs other types of objects, etc), so that will end up intertwined. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
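[Under the context proposal just described, telling a toplevel drop from a cascaded one needs nothing more than a null test in the trigger function. A sketch in PL/pgSQL, where context and tag stand for the hypothetical magic variables under discussion:

    create or replace function track_drops() returns event_trigger
    language plpgsql as $$
    begin
        if context is null then
            raise notice 'toplevel command: %', tag;
        else
            -- e.g. tag = 'DROP FUNCTION' cascading from context = 'DROP TYPE'
            raise notice '% fired as part of %', tag, context;
        end if;
    end;
    $$;
]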
On Fri, Mar 30, 2012 at 4:32 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > I did that another way in previous incarnations of the patch, which was > to allow for an INSTEAD OF event trigger backed by a SECURITY DEFINER > function. When the extension is whitelisted, you guard against recursion, > run CREATE EXTENSION from the security definer function, then signal that > execution of the original command should be aborted. > > That was too dangerous given the lack of policy about where exactly the > user code is fired, but I think we could now implement that for some of > the event timing specs we're listing. Only some of them, I guess: only > those that happen before we lock the objects. Oh, right: I remember that now. I still think it's a bad way to do it, because the trigger potentially has a lot of work to do to reconstruct a working command string, and it still ends up getting executed by the wrong user. For CREATE EXTENSION it's not that bad, because the arguments to the command are so simple, but of course any time we extend the CREATE EXTENSION syntax, the trigger needs to know about it too whether it's security-relevant or not, and doing something similar with, say, ALTER TABLE would be ridiculously complicated. I think there is a use case for what you called an INSTEAD OF trigger, but I don't believe in this one. It seems to me that there's a lot of power in being able to *just* intercept the security decision and then let the rest of the command go about its business. Of course, you have to avoid conflating security checks (like, you must own the table in order to drop a column) with integrity checks (like, you can't drop a column from pg_class), but I think that's not very hard to get right. >> More generally, my thought on the structure of this is that you're >> going to have certain toplevel events, many of which will happen at >> only a single place in the code, like "an object got dropped" or "a >> DDL command started" or "a DDL command ended". So we give those >> names, like sql_drop, ddl_command_start, and ddl_command_end. Inside > > I really dislike mixing sql_drop and ddl_command_start as being the same > kind of objects here, even if I can bend my head at the right angle and > see that it's a fair view when looking at how it's implemented. I can't > see a way to explain that to users without having to explain to them how > drop cascade is implemented. > > So my proposal here is to “fake” a “proper” subcommand thanks to the new > context variable. If you DROP TYPE foo CASCADE and that in turn drops a > function foo_in(), then an event trigger is fired with > > context = 'DROP TYPE' > tag = 'DROP FUNCTION' > > Same idea when you DROP TABLE … CASCADE and a SEQUENCE and a bunch of > indexes need to disappear too: you get a usual event trigger fired with > the context set to 'DROP TABLE' this time. > > I don't think we need to arrange for explicitly publishing the > context-specific information here. If we need to, we have to find the right > timing spec where we can guarantee still being in the top level command > and where we already have the details filled in; then users can attach a > trigger there and record the information for themselves. I'm not sure I understand how you're using the words "context" and "tag". I think for a drop trigger I would want the function to receive this information: type of object dropped, OID of object dropped, column number in the case of a column drop, flag indicating whether it's a toplevel drop or a cascaded drop. 
I wouldn't object to also making the currently-in-context toplevel command tag available, but I think most drop triggers wouldn't really care, so I wouldn't personally spend much implementation effort on it if it turns out to be hard. But in general, I don't really know what a "proper" subcommand is or why some subcommands should be more proper than others, or why we should even be concerned about whether something is a subcommand at all. I think it's fine and useful to have triggers that fire in those kinds of places, but I don't see why we should limit ourselves to that. For applications like replication, auditing, and enhanced security, the parse tree and subcommand/non-subcommand status of a particular operation are irrelevant. What you need is an exact description of the operation that got performed (e.g. the default on table X column Y got dropped); you might be able to reverse-engineer that from the parse tree, but it's much better to have the system pass you the information you need more directly. Certainly, there are cases where you might want to have the parse tree, or even the raw command text, available, but I'm not even convinced that those cases will be the most commonly used ones unless, of course, they're the only ones we offer, in which case everyone will go down that path by necessity. >> your trigger procedure, the set of magic variables that is available >> will depend on which toplevel event you set the trigger on, but >> hopefully all firings of that toplevel event can provide the same >> magic variables. For example, at ddl_command_start time, you're just >> gonna get the command tag, but at ddl_command_end time you will get >> the command tag plus maybe some other stuff. > > With my proposal above, you could get the same set of information when > being called as a toplevel event or a subevent (one where the context is > not null). That would mean adding object name and schema name lookups in > the drop cascade code, though. We can also decide not to do that extra > lookup and just publish the object id, which we certainly do have. > > This way, the timing spec of a sub-event can still be of the same kind > as the top-level event ones: we still have before- and after-lock entry > points, same with lookup if we add that feature, etc. Again, I'm not understanding the distinction between toplevel events and sub-events. I don't see any need for such a distinction. I think there are just events, and some of them happen at command start/end and others happen somewhere in the middle. As long as it's a safe and useful place to fire a trigger, who cares? > Given the scope of this mini expression language, we can easily bypass > calling the executor in v1 here, and reconsider later if we want to > allow calling a UDF in the WHEN clause… I don't think it's an easy > feature to add in, though. Or a necessary one. AFAICS, the main benefit of WHEN clauses on regular triggers is that you can prevent the AFTER trigger queue from getting huge, and maybe save a little on the cost of invoking a trigger function just to exit again. But neither of those should be relevant here - nobody does that much DDL, and anybody writing command triggers should understand that this is advanced magic not intended for beginners. Wizards below level 10 need not apply. BTW, in reference to your other email, the reason I suggested ddl_command_start rather than just command_start is because we already know that there are cases here we'll never be able to handle, like before-begin. 
I think it's better to be explicit about what the scope of the feature is (i.e. DDL) rather than just calling it command_start and, oh, by the way, the commands we either don't know how to support or didn't get around to supporting yet aren't included. The latter approach is basically a guaranteed compatibility break every time we get around to adding a few more commands, and I'm not keen on that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
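[Even without the permissions-checking hook, a blunter cousin of the whitelisting idea is already expressible with the pieces on the table: a start trigger that simply rejects CREATE EXTENSION for anything not listed. The sketch below assumes an objectname magic variable carrying the extension name, and the WHEN syntax is still hypothetical; what it cannot do, and what the hook described above would add, is let a non-superuser install a superuser-only extension:

    create table extension_whitelist(extname text primary key);

    create or replace function check_extension() returns event_trigger
    language plpgsql as $$
    begin
        if not exists (select 1 from extension_whitelist
                        where extname = objectname) then
            raise exception 'extension % is not whitelisted', objectname;
        end if;
    end;
    $$;

    create event trigger extension_gate
        on ddl_command_start
        when tag in ('CREATE EXTENSION')
        execute procedure check_extension();
]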
Robert Haas <robertmhaas@gmail.com> writes: > Oh, right: I remember that now. I still think it's a bad way to do > it, because the trigger potentially has a lot of work to do to > reconstruct a working command string, and it still ends up getting > executed by the wrong user. For CREATE EXTENSION it's not that bad, That's true, I'm only kind of saying that the INSTEAD OF keyword still makes sense (instead of security_checks). I agree that the feature is simpler to use the way you propose it. >> context = 'DROP TYPE' >> tag = 'DROP FUNCTION' > > I'm not sure I understand how you're using the words "context" and Yeah, context is not explicit; we could call it "toplevel": the command tag of the command that the user typed. When toplevel is null, the event trigger is fired on a command the user sent; when it's not null, the trigger is fired on some inner command operation. > "tag". I think for a drop trigger I would want the function to > receive this information: type of object dropped, OID of object > dropped, column number in the case of a column drop, flag indicating > whether it's a toplevel drop or a cascaded drop. I wouldn't object to > also making the currently-in-context toplevel command tag available, > but I think most drop triggers wouldn't really care, so I wouldn't > personally spend much implementation effort on it if it turns out to > be hard. I'm not sure it would be hard as I'm only seeing a single depth possible here, so a single per-backend string static variable would do. > But in general, I don't really know what a "proper" subcommand is or > why some subcommands should be more proper than others, or why we > should even be concerned about whether something is a subcommand at > all. I think it's fine and useful to have triggers that fire in those > kinds of places, but I don't see why we should limit ourselves to > that. For applications like replication, auditing, and enhanced > security, the parse tree and subcommand/non-subcommand status of a > particular operation are irrelevant. What you need is an exact Not really. When replicating you could perfectly well say that you only replicate the toplevel DROP because the replica will also do the cascade dance and you might have decided not to replicate all related objects on the other side. The information you really don't want to miss is when only the cascaded object is part of the replication, not the main one. That was not covered by my previous patch but now we have a way to cover it. > description of the operation that got performed (e.g. the default on > table X column Y got dropped); you might be able to reverse-engineer > that from the parse tree, but it's much better to have the system pass > you the information you need more directly. Certainly, there are > cases where you might want to have the parse tree, or even the raw > command text, available, but I'm not even convinced that that those > cases will be the most commonly used ones unless, of course, they're > the only ones we offer, in which case everyone will go down that path > by necessity. There are far too many variants and cases of our commands to be able to extract their parameters in a flat way (a bunch of variables compared to a nested description à la JSON or XML), and I don't think such a flat representation is going to be much better than the parse tree. 
Now, we will later be able to offer a normalized rewritten command string from the parse tree to the user, but I don't see us adding support for that from cascaded drops, one more reason why I like to expose them as subcommands. > Again, I'm not understanding the distinction between toplevel events > and sub-events. I don't see any need for such a distinction. I think > there are just events, and some of them happen at command start/end > and others happen somewhere in the middle. As long as it's a safe and > useful place to fire a trigger, who cares? I guess your head is too deep in the code side of things as opposed to the SQL user's viewpoint. Maybe my attempt to reconcile both views is not appropriate, but I really do think it is. >> Given the scope of this mini expression language, we can easily bypass >> calling the executor in v1 here, and reconsider later if we want to >> allow calling a UDF in the WHEN clause… I don't think it's an easy >> feature to add in, though. > > Or a necessary one. AFAICS, the main benefit of WHEN clauses on Exactly. > regular triggers is that you can prevent the AFTER trigger queue from > getting huge, and maybe save a little on the cost of invoking a > trigger function just to exit again. But neither of those should be > relevant here - nobody does that much DDL, and anybody writing command > triggers should understand that this is advanced magic not intended > for beginners. Wizards below level 10 need not apply. Here it's only a facility to manage your event trigger code organization, and I insisted on having it in the syntax because I don't see another place in the event trigger grammar to state which command you're targeting. The command tag certainly cannot be considered as another kind of event. As much as I like the event timing idea and appreciate that it nicely solves some problems that were hairy with the previous, too-simple BEFORE/AFTER syntax, I want to keep it sane. An event timing is a generic named place in the code where we “hook” the execution of one or several user-defined functions. This place is generic because a whole series of commands provide the hooking place, such as command_start, command_end, permissions_check, consistency_check, etc.

I foresee a matrix in the documentation in the form of:

 timing        | command_start | pcheck | check | command_end | …
 command /     |               |        |       |             | …
---------------+---------------+--------+-------+-------------+----
 GRANT         |       ✔       |   ✘    |   ✘   |      ✔      |
 CREATE TABLE  |       ✔       |   ✔    |   ✔   |      ✔      |
 ALTER TABLE   |       ✔       |   ✔    |   ✔   |      ✔      |
 …

I don't think GRANT will be there in 9.2, by the way, but it helps to show commands that won't support every event, to express the idea, I guess. > BTW, in reference to your other email, the reason I suggested > ddl_command_start rather than just command_start is because we already > know that there are cases here we'll never be able to handle, like > before-begin. I think it's better to be explicit about what the scope I agree that any given command will not be able to support every timing spec; what we need is to document them. That said, I don't see any gain in spelling command_start differently depending on which kind of command it applies to. > of the feature is (i.e. DDL) rather than just calling it command_start > and, oh, by the way, the commands we either don't know how to support > or didn't get around to supporting yet aren't included. 
The latter > approach is basically a guaranteed compatibility break every time we > get around to adding a few more commands, and I'm not keen on that. I don't buy that argument. What would be the difference between adding support for DCL commands and adding a new DDL command in terms of compatibility breakage for event triggers that are firing on all commands? In both cases they will get called on unanticipated events; the fact that one event is called ddl_command_start and the other dcl_command_start does not appear to me to be relevant in the least. I might be missing something, though. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
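[As a sketch of the mini expression language under discussion, AND combines filters on different variables, while OR across variables falls out of simply creating two triggers that call the same procedure; the syntax and the context/tag variables are still hypothetical here:

    -- AND: fire only for a DROP FUNCTION cascading from a DROP TYPE
    create event trigger cascaded_function_drops
        on command_start
        when context in ('DROP TYPE') and tag in ('DROP FUNCTION')
        execute procedure snitch();

    -- OR across variables: two triggers, one procedure
    create event trigger snitch_a on command_start
        when context in ('DROP SCHEMA') execute procedure snitch();
    create event trigger snitch_b on command_start
        when tag in ('DROP FUNCTION') execute procedure snitch();
]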
On Fri, Mar 30, 2012 at 11:19 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Yeah, context is not explicit; we could call it "toplevel": the command > tag of the command that the user typed. When toplevel is null, the event > trigger is fired on a command the user sent; when it's not null, the > trigger is fired on some inner command operation. How about calling it "command tag"? I think both context and toplevel are inconsistent with other uses of those terms: context is an error-reporting field, among other things; and we don't care about the toplevel command, just the command-tag of the one we're executing - e.g. if DROP fires a command trigger which invokes CREATE which fires another command trigger, the inner one is going to get CREATE not DROP. Or at least so I presume. >> "tag". I think for a drop trigger I would want the function to >> receive this information: type of object dropped, OID of object >> dropped, column number in the case of a column drop, flag indicating >> whether it's a toplevel drop or a cascaded drop. I wouldn't object to >> also making the currently-in-context toplevel command tag available, >> but I think most drop triggers wouldn't really care, so I wouldn't >> personally spend much implementation effort on it if it turns out to >> be hard. > > I'm not sure it would be hard as I'm only seeing a single depth possible > here, so a single per-backend string static variable would do. See above example: I am pretty sure you need a stack. >> But in general, I don't really know what a "proper" subcommand is or >> why some subcommands should be more proper than others, or why we >> should even be concerned about whether something is a subcommand at >> all. I think it's fine and useful to have triggers that fire in those >> kinds of places, but I don't see why we should limit ourselves to >> that. For applications like replication, auditing, and enhanced >> security, the parse tree and subcommand/non-subcommand status of a >> particular operation are irrelevant. What you need is an exact > > Not really. When replicating you could perfectly well say that you only > replicate the toplevel DROP because the replica will also do the cascade > dance and you might have decided not to replicate all related objects on > the other side. I would not want my replication system issuing cascaded drops, because if the sides don't match it might cascade to something on the remote side that it doesn't cascade to on the local side, which exceeds my tolerance for scary behavior. > The information you really don't want to miss is when only the > cascaded object is part of the replication, not the main one. That was > not covered by my previous patch but now we have a way to cover it. Also true. >> description of the operation that got performed (e.g. the default on >> table X column Y got dropped); you might be able to reverse-engineer >> that from the parse tree, but it's much better to have the system pass >> you the information you need more directly. Certainly, there are >> cases where you might want to have the parse tree, or even the raw >> command text, available, but I'm not even convinced that those >> cases will be the most commonly used ones unless, of course, they're >> the only ones we offer, in which case everyone will go down that path >> by necessity. 
> > There are far too many variants and cases of our commands to be able to > extract their parameters in a flat way (a bunch of variables compared to > a nested description à la JSON or XML), and I don't think such a flat > representation is going to be much better than the parse tree. I strongly disagree. I think we'll find that with the right choice of hook points, the number of variables that need to be exposed is quite compact. Indeed, I'd venture to say that needing to pass lots and lots of information is evidence that you've made a poor choice of hook point. > Now, we will later be able to offer a normalized rewritten command > string from the parse tree to the user, but I don't see us adding support > for that from cascaded drops, one more reason why I like to expose them > as subcommands. > >> Again, I'm not understanding the distinction between toplevel events >> and sub-events. I don't see any need for such a distinction. I think >> there are just events, and some of them happen at command start/end >> and others happen somewhere in the middle. As long as it's a safe and >> useful place to fire a trigger, who cares? > > I guess your head is too deep in the code side of things as opposed > to the SQL user's viewpoint. Maybe my attempt to reconcile both views is > not appropriate, but I really do think it is. > >>> Given the scope of this mini expression language, we can easily bypass >>> calling the executor in v1 here, and reconsider later if we want to >>> allow calling a UDF in the WHEN clause… I don't think it's an easy >>> feature to add in, though. >> >> Or a necessary one. AFAICS, the main benefit of WHEN clauses on > > Exactly. > >> regular triggers is that you can prevent the AFTER trigger queue from >> getting huge, and maybe save a little on the cost of invoking a >> trigger function just to exit again. But neither of those should be >> relevant here - nobody does that much DDL, and anybody writing command >> triggers should understand that this is advanced magic not intended >> for beginners. Wizards below level 10 need not apply. > > Here it's only a facility to manage your event trigger code > organization, and I insisted on having it in the syntax because I don't > see another place in the event trigger grammar to state which > command you're targeting. The command tag certainly cannot be > considered as another kind of event. No, but whether or not you mention it in the CREATE TRIGGER syntax has nothing to do with whether it's available as a magic parameter inside the procedure. Those things ought to be independent. I imagine that the stuff that is accessible from inside the trigger will be richer than what you can do in the trigger syntax itself. > As much as I like the event timing idea and appreciate that it nicely > solves some problems that were hairy with the previous, too-simple > BEFORE/AFTER syntax, I want to keep it sane. > > An event timing is a generic named place in the code where we “hook” the > execution of one or several user-defined functions. This place is generic > because a whole series of commands provide the hooking place, such > as command_start, command_end, permissions_check, consistency_check, etc. 
>
> I foresee a matrix in the documentation in the form of:
>
>  timing        | command_start | pcheck | check | command_end | …
>  command /     |               |        |       |             | …
> ---------------+---------------+--------+-------+-------------+----
>  GRANT         |       ✔       |   ✘    |   ✘   |      ✔      |
>  CREATE TABLE  |       ✔       |   ✔    |   ✔   |      ✔      |
>  ALTER TABLE   |       ✔       |   ✔    |   ✔   |      ✔      |
>  …
>
> I don't think GRANT will be there in 9.2, by the way, but it helps to
> show commands that won't support every event, to express the idea, I
> guess.

I'm still not sold on the idea of lumping together every command under a single command_start event. I can't see anyone wanting to hook anything that broad. Don't forget that we need to document not only which triggers will fire but also what magic variables they'll get. A dcl_command_start hook could conceivably get the list of privileges being granted or revoked, but a general command_start trigger is going to need a different set of magic variables depending on the actual command type. 
Robert Haas <robertmhaas@gmail.com> writes: > How about calling it "command tag"? I think both context and toplevel > are inconsistent with other uses of those terms: context is an > error-reporting field, among other things; and we don't care about the > toplevel command, just the command-tag of the one we're executing - > e.g. if DROP fires a command trigger which invokes CREATE which fires > another command trigger, the inner one is going to get CREATE not > DROP. Or at least so I presume. It's not about that, though; it's about a DROP TYPE that cascades to a DROP FUNCTION, or a DROP SCHEMA that cascades to 10 DROP TABLEs. I want to know in each cascaded DROP TABLE that it's happening as a result of DROP SCHEMA ... CASCADE, so I'm calling that a top-level command. > See above example: I am pretty sure you need a stack. In the next version, certainly. As of now I'm willing to start a new stack in each command executed in a command trigger. That means 9.2 will only expose the first level of the stack, I guess. Also, there's a difference between CASCADE (no new command emitted) and an event trigger that executes a new top-level command. > I would not want my replication system issuing cascaded drops, because > if the sides don't match it might cascade to something on the remote > side that it doesn't cascade to on the local side, which exceeds my > tolerance for scary behavior. Well, it depends on what you're achieving with replication; that term covers so many different use cases… What I want core to provide is the mechanism that allows implementing the replication you need. >> There are far too many variants and cases of our commands to be able to >> extract their parameters in a flat way (a bunch of variables compared to >> a nested description à la JSON or XML), and I don't think such a flat >> representation is going to be much better than the parse tree. > > I strongly disagree. I think we'll find that with the right choice of > hook points, the number of variables that need to be exposed is quite > compact. Indeed, I'd venture to say that needing to pass lots and > lots of information is evidence that you've made a poor choice of hook > point. Currently we're exposing a very limited set of variables. So I think we're good in your book. > No, but whether or not you mention it in the CREATE TRIGGER syntax has > nothing to do with whether it's available as a magic parameter inside > the procedure. Those things ought to be independent. I imagine that > the stuff that is accessible from inside the trigger will be richer > than what you can do in the trigger syntax itself. Exactly, we're in agreement here. > I'm still not sold on the idea of lumping together every command under > a single command_start event. I can't see anyone wanting to hook > anything that broad. Don't forget that we need to document not only > which triggers will fire but also what magic variables they'll get. A Yeah, and command start triggers are only going to have tag, toplevel tag or whatever the right name of that is, and the parse tree if they're written in C. And that's it. The command start event triggers are typically fired directly from utility.c. > dcl_command_start hook could conceivably get the list of privileges > being granted or revoked, but a general command_start trigger is going > to need a different set of magic variables depending on the actual > command type. 
I think it might be simpler and more clear to say that > each event type provides these variables, rather than having to > conditionalize it based on the command type. That's going to be true for other event timing specs, but not for the command start as I picture it. > See above - generally, I think that it's useful for a command trigger > to know that it's being called because of a DDL event, rather than > some command that could be doing anything. Also, I think that wanting > to hook "all DDL commands" is likely to be a useful thing to do, and > without having to explicitly list 50 command names... Yeah, just omit the WHEN clause then. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Apr 1, 2012 at 7:22 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> See above example: I am pretty sure you need a stack. > > In next version, certainly. As of now I'm willing to start a new stack > in each command executed in a command trigger. That means 9.2 will only > expose the first level of the stack, I guess. I think we're talking past each other. If someone executes DDL command A and the command trigger executes DDL command B which fires another command trigger, then the command trigger for A needs to see the information relevant to A both before and after the command trigger for B executes. So you can't just store all the context information in a flat global variable, because otherwise when the trigger for B executes it will clobber the values for A. You need to do a save/restore so that when the execution of the trigger on A resumes, it still sees the right stuff. > Well it depends on what you're achieving with replication, this term > includes so many different use cases… What I want core to provide is the > mechanism that allows implementing the replication you need. Agreed. >> See above - generally, I think that it's useful for a command trigger >> to know that it's being called because of a DDL event, rather than >> some command that could be doing anything. Also, I think that wanting >> to hook "all DDL commands" is likely to be a useful thing to do, and >> without having to explicitly list 50 command names... > > Yeah, just omit the WHEN clause then. Well, that's "absolutely everything" rather than just "all DDL". What's the use case for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
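[A concrete nesting makes the save/restore point easy to see. In the sketch below (tag again standing in for a hypothetical magic variable), the trigger itself runs a DDL command, so the trigger machinery re-enters while the outer firing is still in progress:

    create or replace function audit_ddl() returns event_trigger
    language plpgsql as $$
    begin
        raise notice 'firing for %', tag;
        if tag = 'DROP TABLE' then
            -- this inner command fires the trigger again with tag set to
            -- 'CREATE TABLE'; once it returns, the outer firing must still
            -- see 'DROP TABLE', which a single flat global cannot provide
            execute 'create table if not exists drop_log(at timestamptz)';
            raise notice 'back in the outer firing: %', tag;
        end if;
    end;
    $$;
]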
Robert Haas <robertmhaas@gmail.com> writes: > I think we're talking past each other. If someone executes DDL > command A and the command trigger executes DDL command B which fires > another command trigger, then the command trigger for A needs to see > the information relevant to A both before and after the command > trigger for B executes. So you can't just store all the context > information in a flat global variable, because otherwise when the > trigger for B executes it will clobber the values for A. You need to > do a save/restore so that when the execution of the trigger on A > resumes, it still sees the right stuff. Agreed. I was trying to say that we won't have all of this in 9.2. >> Yeah, just omit the WHEN clause then. > > Well, that's "absolutely everything" rather than just "all DDL". > What's the use case for that? You can still filter in the trigger code. A use case is forbidding any command that's not purely data-related from running in certain situations, such as live server migrations. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
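[A catch-all trigger of the kind just mentioned, with the filtering done in the procedure body instead of a WHEN clause, might be sketched like this; the tag variable and the ddl_command_start event name are the hypothetical pieces discussed throughout the thread:

    create or replace function freeze_ddl() returns event_trigger
    language plpgsql as $$
    begin
        -- during a live migration, nothing but plain data traffic is allowed
        raise exception 'DDL is frozen during migration, % rejected', tag;
    end;
    $$;

    -- no WHEN clause: fires for every command the event covers
    create event trigger migration_freeze
        on ddl_command_start
        execute procedure freeze_ddl();
]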