Thread: Command Triggers
Hi,

As proposed by Jan Wieck on hackers and discussed in Ottawa at the Clusters Hackers Meeting, the most (only?) workable way to ever have DDL triggers is to have "command triggers" instead.

Rather than talking about catalogs and the like, it's all about firing a registered user-defined function before or after processing the utility command, or instead of processing it. This naturally involves a new catalog (pg_cmdtrigger), some new subcommands of CREATE and DROP TRIGGER, and some support code for calling the function at the right time.

The right place to hook this code in seems to be ProcessUtility(), which is a central place for about all the commands that we handle. An exception to that rule would be SELECT INTO and CREATE TABLE AS SELECT, and my proposal would be to add specific call sites to the functions I've provided in the attached patch rather than try to contort them into being good citizens as utility commands.

The ProcessUtility() integration currently looks like this:

    const char *commandTag = CreateCommandTag(parsetree);

    if (ExecBeforeOrInsteadOfCommandTriggers(parsetree, commandTag) > 0)
        return;

    ... current code ...

    ExecAfterCommandTriggers(parsetree, commandTag);

So I guess adding some call sites is manageable.

Now, the patch contains a proof-of-concept implementation with a small amount of documentation and regression tests, in order to get an agreement on the principles that we already discussed.

The other part of the command trigger facility is what information we should give to the user-defined functions, and in which form. I've settled on always passing 4 text arguments: the command string; the parse tree as a nodeToString() representation (Jan believes that this is the easiest form we can provide for code consumption, and I tend to agree); the schema name, or NULL if the targeted object of the command is not qualified; and the object name (or NULL if that does not apply).
    CREATE FUNCTION cmdtrigger_notice
     (IN cmd_string     text,
      IN cmd_nodestring text,
      IN schemaname     text,
      IN relname        text)
     RETURNS void LANGUAGE plpgsql
    AS $$
    BEGIN
      RAISE NOTICE 'cmd_string: %', cmd_string;
    END;
    $$;

    CREATE TRIGGER cmdtrigger_notice
           AFTER COMMAND CREATE TABLE
         EXECUTE PROCEDURE cmdtrigger_notice();

The v1 patch attached contains an implementation for some commands only. We need to add rewriting facilities for those commands we want to support in the proposed model, because of multi-query support in the protocol and dynamic queries in EXECUTE, e.g. (though I admit not having had a look at the EXECUTE implementation). So each supported command has a cost, and the ones I claim to support in the grammar in the patch do not yet see complete support: first, I'm missing some outfuncs and readfuncs (but I believe we can complete those using a script on the source code), so the cmd_nodestring argument is currently always NULL. Second, I didn't complete the rewriting of some more complex commands such as CREATE TABLE and ALTER TABLE. Note that we can't reuse that much of ruleutils.c because it's written to work from what we store in the catalogs rather than from a parse tree object.

So, any comment on the approach before I complete the rewriting of the commands, the out/read funcs, and add more commands to the trigger support code?

  https://github.com/dimitri/postgres/compare/master...command_triggers

  $ git diff --stat master..
 doc/src/sgml/ref/create_trigger.sgml       |  97 +++++-
 doc/src/sgml/ref/drop_trigger.sgml         |  19 +-
 src/backend/catalog/Makefile               |   2 +-
 src/backend/catalog/dependency.c           |  47 ++-
 src/backend/catalog/objectaddress.c        |   8 +
 src/backend/commands/Makefile              |   4 +-
 src/backend/commands/cmdtrigger.c          | 580 ++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c              |  32 ++
 src/backend/nodes/equalfuncs.c             |  28 ++
 src/backend/nodes/outfuncs.c               | 405 ++++++++++++++++++
 src/backend/parser/gram.y                  |  70 +++-
 src/backend/tcop/utility.c                 |  44 ++
 src/backend/utils/adt/ruleutils.c          | 613 +++++++++++++++++++++++++++-
 src/include/catalog/dependency.h           |   1 +
 src/include/catalog/indexing.h             |   5 +
 src/include/catalog/pg_cmdtrigger.h        |  59 +++
 src/include/commands/cmdtrigger.h          |  43 ++
 src/include/commands/defrem.h              |  14 +
 src/include/nodes/nodes.h                  |   2 +
 src/include/nodes/parsenodes.h             |  28 ++
 src/include/parser/kwlist.h                |   1 +
 src/test/regress/expected/sanity_check.out |   3 +-
 src/test/regress/expected/triggers.out     |  35 ++
 src/test/regress/sql/triggers.sql          |  35 ++
 24 files changed, 2157 insertions(+), 18 deletions(-)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Attachment
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> As proposed by Jan Wieck on hackers and discussed in Ottawa at the
> Clusters Hackers Meeting, the most (only?) workable way to ever have DDL
> triggers is to have "command triggers" instead.
[...]
> The v1 patch attached contains implementation for some commands only.

Here's a v2 which is only about cleaning up the merge from master. The recent DROP rework conflicted badly with the command triggers patch, quite obviously. I didn't reimplement DROP COMMAND TRIGGER in terms of the new facilities yet, though.

> So, any comment on the approach before I complete the rewriting of the
> commands, the out/read funcs, and add more commands to the trigger
> support code?
>
> https://github.com/dimitri/postgres/compare/master...command_triggers

FWIW (scheduling, mainly), I don't think I'm going to spend more time on this work until I get some comments, because all that remains to be done is about building on top of what I've already been doing here. Well, I'd like to implement command triggers on replicas too, but barring psql support and a bunch of commands not there yet, that's about it as far as features go. Oh, and having expression rewriting from the parse tree (default, check) actually work (rather than segfault) is still a TODO too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Attachment
Hi Dimitri, hi all,

On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote:
> As proposed by Jan Wieck on hackers and discussed in Ottawa at the
> Clusters Hackers Meeting, the most (only?) workable way to ever have DDL
> triggers is to have "command triggers" instead.
> Rather than talking about catalogs and the like, it's all about firing a
> registered user-defined function before or after processing the utility
> command, or instead of processing it. This naturally involves a new
> catalog (pg_cmdtrigger) and some new subcommands of CREATE and DROP
> TRIGGER, and some support code for calling the function at the right
> time.

Great ;) FWIW I think that's the way forward as well.

The patch obviously isn't meant to be committable yet, so I am not going to do a very detailed code-level review. Attached is a patch with low-level targeted comments and such that I noticed while reading it. At this stage the important things are high-level, so I will try to concentrate on those:

== the trigger function ==

I don't like the set of arguments passed to the trigger functions very much. To cite an example of yours, those currently are:

* cmd_string text
* cmd_nodestring text
* schemaname text
* relname text

I think at least a

* command_tag text

is missing. Sure, you can disambiguate it by creating a function for every command type, but that seems pointlessly complex for many applications. And adding it should be trivial as it's already tracked ;)

Currently the examples refer to the object name as relname, which I dislike as that seems to be too restricted to one use case. The code is already doing it correctly (as objectname) though ;)

Why is there explicit documentation of not checking the arguments of said trigger function? That seems a bit strange to me.

schemaname currently is mightily unusable because whether it's sent or not depends on whether the table was created with a schema name specified or not. That doesn't seem to be a good approach.
That brings me to the next point:

== normalization of statements ==

Currently the normalization is a bit confusing. E.g. for:

  CREATE SCHEMA barf;
  SET search_path = barf;
  CREATE TYPE bart AS ENUM ('a', 'b');
  CREATE TABLE bar(a int, b bigint, c serial, d text, e varchar, "F" text, g bart, h int4);

one gets

  CREATE TABLE bar (a pg_catalog.int4,b pg_catalog.int8,c serial,d text,e pg_catalog.varchar,F text,g bart,h int4);

which points out two problems:

* inconsistent schema prefixing
* no quoting

IMO the output should fully schema-qualify everything, including operators. Yes, that's rather ugly, but anything else seems too likely to lead to problems. As an alternative it would be possible to pass the current search path, but that doesn't seem to be the best approach to me.

Currently CHECK() constraints are not decodable, but inside those the same quoting/qualifying rules should apply.

Then there is nice stuff like CREATE TABLE .... (LIKE ...); What shall that return in future? I vote for showing it as the plain CREATE TABLE where all columns are specified.

That touches another related topic: Dim's work in adding utility command support to nodeToString and friends will possibly make that code somewhat command-trigger specific. Is there any other usage we envision?

== grammar ==

* multiple actions: I think it would be sensible to allow multiple actions on which to trigger to be specified, just as you can for normal triggers. I also would like an ALL option.

* options: Currently the grammar allows options to be passed to command triggers. Do we want to keep that? If yes, with the same arcane set of datatypes as in data triggers? If yes, it needs to be wired up.

* schema qualification: An option to add triggers only to a specific schema would be rather neat for many scenarios. I am not totally sure if it's worth the hassle of defining what happens in the edge cases of e.g. moving from one schema into another, though.

== locking ==

Currently there is no locking strategy at all.
Which e.g. means that there is no protection against two sessions changing stuff concurrently, or such. Was that just left out - which is ok - or did you miss it? I think it would be ok to just always grab row-level locks there.

== permissions ==

Command triggers should only be allowed for the database owner.

== recursion ==

In contrast to data triggers, command triggers cannot change what is done. They can either prevent it from running or not. Which IMO is fine. The question I have is whether it would be a good idea to make it easy to prevent recursion... Especially if the triggers won't be per schema.

== calling the trigger ==

IMO the current call site in ProcessUtility isn't that good. I think it would make more sense to move it into standard_ProcessUtility. It should be *after* the check_xact_readonly check in there, in my opinion.

I also don't think it's a good idea to run the ExecBeforeOrInsteadOfCommandTriggers stuff there, because that's also the path that COMMIT/BEGIN and other stuff take. Those are pretty "fast path". I suggest making two switches in standard_ProcessUtility: one for the non-command-trigger stuff, which returns on success, and one after that. Triggers are checked only after the first switch.

I wonder whether it's the right choice to run triggers on untransformed input. I.e. CREATE TABLE ... (LIKE ...) seems to only make sense to me after transforming.

Also, you're repeatedly transforming the parse tree into a string. It would be better to do that only once instead of once per trigger...

Ok, that's the first round of high-level stuff... Cool work!

Some lower-level annotations:

* many functions should be static but are not, especially in cmdtrigger.c
* either tgenable's datatype or its readfunc is wrong (watch the compiler ;))
* ruleutils.c:
  * generic routine for IF EXISTS, CASCADE, ...

Greetings,

Andres
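[Editor's note: the two-switch control flow suggested above can be sketched as follows. This is only an illustration under stated assumptions — every type and function here is a stand-in, not the real PostgreSQL backend API, and the fast-path/triggerable command sets are reduced to one example each.]

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for backend types; invented for this sketch only. */
typedef enum NodeTag { T_TransactionStmt, T_CreateStmt } NodeTag;
typedef struct Node { NodeTag tag; } Node;
#define nodeTag(n) ((n)->tag)

/* Stubs: pretend no INSTEAD OF trigger fires, and after-triggers are no-ops. */
static int  ExecBeforeOrInsteadOfCommandTriggers(Node *parsetree) { (void) parsetree; return 0; }
static void ExecAfterCommandTriggers(Node *parsetree) { (void) parsetree; }

/*
 * Two switches: the first handles fast-path statements (BEGIN, COMMIT, ...)
 * and returns before any command-trigger lookup; only statements that fall
 * through to the second switch pay for the trigger machinery.
 */
static const char *
standard_ProcessUtility_sketch(Node *parsetree)
{
    /* first switch: fast-path commands, no trigger overhead */
    switch (nodeTag(parsetree))
    {
        case T_TransactionStmt:
            return "fast path, no trigger lookup";
        default:
            break;
    }

    /* command-trigger machinery runs only for the remaining commands */
    if (ExecBeforeOrInsteadOfCommandTriggers(parsetree) > 0)
        return "instead-of trigger ran";

    /* second switch: the triggerable utility commands */
    switch (nodeTag(parsetree))
    {
        case T_CreateStmt:
            /* ... run the actual command ... */
            break;
        default:
            break;
    }

    ExecAfterCommandTriggers(parsetree);
    return "command executed, after triggers fired";
}
```

The point of the layout is that the cost of the trigger lookup is paid only by commands that can actually have command triggers attached.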
Hi,

On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote:
> exception to that rule would be SELECT INTO and CREATE TABLE AS SELECT,
> and my proposal would be to add specific call sites to the functions
> I've provided in the attached patch rather than try to contort them into
> being a good citizen as a utility command.

I would very much prefer to make them utility commands and rip out the executor support for that. It doesn't look that complicated, and would allow us to get rid of the partially duplicated DefineRelation and related stuff from the executor, where it's violating my aesthetic feelings ;)

Is there something making that especially hard that I overlooked? The infrastructure for doing so seems to be there.

Andres
Andres Freund <andres@anarazel.de> writes:
> On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote:
>> exception to that rule would be SELECT INTO and CREATE TABLE AS SELECT,
>> and my proposal would be to add specific call sites to the functions
>> I've provided in the attached patch rather than try to contort them into
>> being a good citizen as a utility command.

> I would very much prefer to make them utility commands and rip out the
> executor support for that.

> Is there something making that especially hard that I overlooked? The
> infrastructure for doing so seems to be there.

Well, I think the main problem is going to be shunting the query down the right parsing track (SELECT versus utility-statement) early enough. Making this work cleanly would be a bigger deal than I think you're thinking.

But yeah, it would be nicer if the executor didn't have to worry about this. +1 if you're willing to take on the work.

regards, tom lane
On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I would very much prefer to make them utility commands and rip out the
> > executor support for that.
> > Is there something making that especially hard that I overlooked? The
> > infrastructure for doing so seems to be there.
>
> Well, I think the main problem is going to be shunting the query down
> the right parsing track (SELECT versus utility-statement) early enough.
> Making this work cleanly would be a bigger deal than I think you're
> thinking.

Yes. I forgot how ugly SELECT INTO is...

> But yeah, it would be nicer if the executor didn't have to worry about
> this. +1 if you're willing to take on the work.

I won't promise anything, but I will play around with the grammar and see if I find a nice enough way.

Andres
On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote:
> Well, I think the main problem is going to be shunting the query down
> the right parsing track (SELECT versus utility-statement) early enough.
> Making this work cleanly would be a bigger deal than I think you're
> thinking.

Obviously that depends on the definition of clean... Changing the grammar to make that explicit seems to involve a bit too many changes at first glance. The cheap way out seems to be to make the decision in analyze.c:transformQuery. Would that be an acceptable way forward?

Andres
On Sat, Nov 26, 2011 at 7:20 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> FWIW (scheduling mainly), I don't think I'm going to spend more time on
> this work until I get some comments, because all that remains to be done
> is about building on top of what I've already been doing here.

+1 overall

One of the main use cases for me is the ability to

* prevent all commands
* prevent extension commands - to maintain a stable execution environment

Those are especially important because in 9.2 DDL commands will cause additional locking overheads, so preventing DDL will be essential to keeping performance stable in high-transaction-rate databases.

So I'd like to see a few more triggers that work out of the box for those cases (perhaps wrapped by a function?). It would also allow a more useful man page example of how to use this.

On the patch: the header comment for cmdtrigger.c needs updating. Comments for DropCmdTrigger etc. look like you forgot to update them after cut and paste. Comments say "For any given command tag, you can have either Before and After triggers, or Instead Of triggers, not both.", which seems like a great thing to put in the docs.

Looks good.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@anarazel.de> writes:
> On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote:
>> Making this work cleanly would be a bigger deal than I think you're
>> thinking.

> Obviously that depends on the definition of clean...
> Changing the grammar to make that explicit seems to involve a bit too many
> changes at first glance. The cheap way out seems to be to make the decision
> in analyze.c:transformQuery.
> Would that be an acceptable way forward?

ITYM transformStmt, but yeah, somewhere around there is probably reasonable. The way I'm imagining this would work is that IntoClause disappears from Query altogether: analyze.c would build a utility statement CreateTableAs, pull the IntoClause out of the SelectStmt structure and put it into the utility statement, and then proceed much as we do for ExplainStmt (and for the same reasons).

regards, tom lane
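[Editor's note: the rewrite Tom describes can be sketched with stand-in types. Nothing below is the real backend definition — the struct fields and the function name are assumptions made for illustration; the real node would carry more fields.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types, not the real parsenodes.h definitions. */
typedef struct IntoClause { const char *relname; } IntoClause;
typedef struct SelectStmt { IntoClause *intoClause; /* ... target list etc. ... */ } SelectStmt;

/* Sketch of the utility statement: IntoClause leaves Query/SelectStmt
 * and lives in a dedicated utility node instead. */
typedef struct CreateTableAsStmt
{
    SelectStmt *query;  /* the bare SELECT, its intoClause now cleared */
    IntoClause *into;   /* destination table, hoisted out of the SELECT */
} CreateTableAsStmt;

/* analyze.c would build the utility statement, pull the IntoClause out of
 * the SelectStmt, and then proceed much as it does for ExplainStmt. */
static CreateTableAsStmt
transformSelectInto(SelectStmt *stmt)
{
    CreateTableAsStmt ctas;

    ctas.into = stmt->intoClause;
    stmt->intoClause = NULL;    /* the SELECT no longer carries the INTO */
    ctas.query = stmt;
    return ctas;
}
```

The payoff is the same as for EXPLAIN: the executor only ever sees a plain SELECT, and all table-creation concerns live in utility-statement processing.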
Hi,

Hm. I just noticed a relatively big hole in the patch: the handling of deletion of dependent objects currently is nonexistent, because those don't go through ProcessUtility...

I'm currently thinking about the best way to implement that nicely. For other commands the original command string is passed - that obviously won't work for this case...

Andres
Hi all,

There is also the point of how permission checks on the actual commands (as opposed to permission checks on modifying command triggers) and such are handled:

BEFORE and INSTEAD triggers will currently be called independently of whether the user is actually allowed to perform said action (which is inconsistent with data triggers), and independently of whether the object they concern exists.

I wonder if anybody considers that a problem?

Andres
Hi,

First things first: thank you Andres for a great review, I do appreciate it. Please find attached a newer version of the patch. The github repository is also updated.

Andres Freund <andres@anarazel.de> writes:
> I think at least a
> * command_tag text

Added.

> Why is there explicit documentation of not checking the arguments of said
> trigger function? That seems to be a bit strange to me.

The code is searching for a function with the given name and 5 text arguments, whatever you say in the command. That could be changed later on if there's a need.

> schemaname currently is mightily unusable because whether it's sent or not
> depends on whether the table was created with a schema name specified or
> not. That doesn't seem to be a good approach.

I'm not sure about that, we're spitting out what the user entered.

> IMO the output should fully schema-qualify everything, including operators.
> Yes, that's rather ugly, but anything else seems too likely to lead to
> problems.

Will look into qualifying names.

> As an alternative it would be possible to pass the current search path,
> but that doesn't seem to be the best approach to me.

The trigger runs with the same search_path settings as the command, right?

> Then there is nice stuff like CREATE TABLE .... (LIKE ...);
> What shall that return in future? I vote for showing it as the plain CREATE
> TABLE where all columns are specified.

I don't think so. Think about the main use cases for this patch, auditing and replication. In both cases you want to deal with what the user said, not what the system understood.

> I think it would be sensible to allow multiple actions on which to trigger
> to be specified, just as you can for normal triggers. I also would like an
> ALL option.

Both are now implemented. When dealing with an ANY command trigger, your procedure can get fired on a command for which we don't have internal support for back parsing etc., so only the command tag argument is not null.
I think that's the way to go here, as I don't want to think we need to implement full support for command triggers on ALTER OPERATOR FAMILY from the get-go.

> Currently the grammar allows options to be passed to command triggers. Do
> we want to keep that? If yes, with the same arcane set of datatypes as
> in data triggers?
> If yes, it needs to be wired up.

In fact it's not the case, you always get called with the same 5 arguments, all nullable now that we have ANY COMMAND support.

> * schema qualification:
> An option to add triggers only to a specific schema would be rather neat
> for many scenarios.
> I am not totally sure if it's worth the hassle of defining what happens in
> the edge cases of e.g. moving from one schema into another, though.

I had written down support for a WHEN clause in command triggers, but the exact design has yet to be worked out. I'd like to have this facility, but I'd like it even more if that could happen in a later patch.

> Currently there is no locking strategy at all. Which e.g. means that there
> is no protection against two sessions changing stuff concurrently, or such.
>
> Was that just left out - which is ok - or did you miss it?

I left out the locking as of now. I tried to fix some of it in the new patch, but I will need to spend more time on that down the road.

> Command triggers should only be allowed for the database owner.

Yes, that needs to happen soon, along with pg_dump and psql support.

> In contrast to data triggers, command triggers cannot change what is done.
> They can either prevent it from running or not. Which IMO is fine.
> The question I have is whether it would be a good idea to make it easy to
> prevent recursion... Especially if the triggers won't be per schema.

You already have

  ALTER TRIGGER foo ON COMMAND create table DISABLE;

that you can use from the trigger procedure itself. Of course, locking is an issue if you want to go parallel on running commands with recursive triggers attached.
I think we might be able to skip solving that problem in the first run.

> IMO the current call site in ProcessUtility isn't that good. I think it
> would make more sense to move it into standard_ProcessUtility. It should
> be *after* the check_xact_readonly check in there, in my opinion.

Makes sense, will fix in the next revision.

> I also don't think it's a good idea to run the
> ExecBeforeOrInsteadOfCommandTriggers stuff there, because that's also the
> path that COMMIT/BEGIN and other stuff take. Those are pretty "fast path".
>
> I suggest making two switches in standard_ProcessUtility: one for the
> non-command-trigger stuff, which returns on success, and one after that.
> Triggers are checked only after the first switch.

Agreed.

> Also, you're repeatedly transforming the parse tree into a string. It
> would be better to do that only once instead of once per trigger...

It was only done once per before and instead-of triggers (you can't have both on the same command), and another time for any and all after triggers, meaning at most twice. It's now down to only once, at most.

> * many functions should be static but are not, especially in cmdtrigger.c

Fixed.

> * either tgenable's datatype or its readfunc is wrong (watch the compiler ;))
> * ruleutils.c:
>   * generic routine for IF EXISTS, CASCADE, ...

Will have to wait until the next revision.

Thank you again for a great review, regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Attachment
Excerpts from Andres Freund's message of vie dic 02 19:09:47 -0300 2011:

> Hi all,
>
> There is also the point of how permission checks on the actual commands
> (as opposed to permission checks on modifying command triggers) and such
> are handled:
>
> BEFORE and INSTEAD triggers will currently be called independently of
> whether the user is actually allowed to perform said action (which is
> inconsistent with data triggers), and independently of whether the object
> they concern exists.
>
> I wonder if anybody considers that a problem?

Hmm, we currently even have a patch (or is it already committed?) to avoid locking objects before we know the user has permission on the object. Getting to the point of calling the trigger would surely be even worse.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Dec 2, 2011 at 7:09 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Hmm, we currently even have a patch (or is it already committed?) to
> avoid locking objects before we know the user has permission on the
> object. Getting to the point of calling the trigger would surely be
> even worse.

I committed a first round of cleanup in that area, but unfortunately there is a lot more to be done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Saturday, December 03, 2011 01:09:48 AM Alvaro Herrera wrote:
> Hmm, we currently even have a patch (or is it already committed?) to
> avoid locking objects before we know the user has permission on the
> object. Getting to the point of calling the trigger would surely be
> even worse.

Well, calling the trigger won't allow them to lock the object. It doesn't even confirm the existence of the table.

Andres
Simon Riggs <simon@2ndQuadrant.com> writes:
> Those are especially important because in 9.2 DDL commands will cause
> additional locking overheads, so preventing DDL will be essential to
> keeping performance stable in high-transaction-rate databases.

The patch now implements "any command" triggers, and you have the command tag to branch on in the function if you need to.

  CREATE TRIGGER noddl
     INSTEAD OF ANY COMMAND
        EXECUTE PROCEDURE cancel_any_ddl();

> So I'd like to see a few more triggers that work out of the box for
> those cases (perhaps wrapped by a function?). It would also allow a
> more useful man page example of how to use this.

We could solve that by providing an extension implementing ready-to-use command triggers. One that allows you to easily switch the capability of running commands on and off seems like a good candidate.

That said, with the current coding, you can't have both an instead-of trigger on "any command" and a before or after trigger on any given command, so installing that extension would prevent you from any other usage of command triggers.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
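[Editor's note: for concreteness, here is a sketch of what cancel_any_ddl() could look like against the patched server. The argument names and their order are assumptions taken from the five-argument interface discussed upthread (command_tag plus the original four); nothing in this thread fixes them, so treat this as illustrative only.]

```sql
CREATE FUNCTION cancel_any_ddl
 (IN command_tag    text,
  IN cmd_string     text,
  IN cmd_nodestring text,
  IN schemaname     text,
  IN objectname     text)
 RETURNS void LANGUAGE plpgsql
AS $$
BEGIN
  -- An INSTEAD OF command trigger replaces the command entirely, so
  -- raising an error here is what actually cancels the DDL.
  RAISE EXCEPTION 'DDL is disabled on this database, rejecting: %',
                  command_tag;
END;
$$;

CREATE TRIGGER noddl
   INSTEAD OF ANY COMMAND
      EXECUTE PROCEDURE cancel_any_ddl();
```

With something like this installed, any utility command would fail with the exception above until the trigger is disabled or dropped.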
On Saturday, December 03, 2011 12:04:57 AM Dimitri Fontaine wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think at least a
> > * command_tag text
>
> Added.
>
> > Why is there explicit documentation of not checking the arguments of said
> > trigger function? That seems to be a bit strange to me.
>
> The code is searching for a function with the given name and 5 text
> arguments, whatever you say in the command. That could be changed later
> on if there's a need.

Forget it, I was too dumb to remember the code when I wrote the above ;)

> > schemaname currently is mightily unusable because whether it's sent or not
> > depends on whether the table was created with a schema name specified or
> > not. That doesn't seem to be a good approach.
>
> I'm not sure about that, we're spitting out what the user entered.

No, we're not. We're writing out something that is semantically the same as what the user entered. At several places NULL/NOT NULL and such get added.

> > As an alternative it would be possible to pass the current search path,
> > but that doesn't seem to be the best approach to me.
>
> The trigger runs with the same search_path settings as the command, right?

For many things it makes sense to specify the search path of a function upon creation of the function. Even more so if you have security definer functions... Otherwise you can get into trouble via operators and such...

> > Then there is nice stuff like CREATE TABLE .... (LIKE ...);
> > What shall that return in future? I vote for showing it as the plain
> > CREATE TABLE where all columns are specified.
>
> I don't think so. Think about the main use cases for this patch,
> auditing and replication. In both cases you want to deal with what the
> user said, not what the system understood.

Is that so?
In the replication case the origin table may very well look different on the master compared to the standby, which is about impossible to diagnose with the above behaviour.

> > I think it would be sensible to allow multiple actions on which to trigger
> > to be specified, just as you can for normal triggers. I also would like
> > an ALL option.
>
> Both are now implemented.

Cool, will check.

> When dealing with an ANY command trigger, your procedure can get fired
> on a command for which we don't have internal support for back parsing
> etc., so only the command tag argument is not null. I think that's the way
> to go here, as I don't want to think we need to implement full support for
> command triggers on ALTER OPERATOR FAMILY from the get-go.

That's ok with me.

> > Currently the grammar allows options to be passed to command triggers. Do
> > we want to keep that? If yes, with the same arcane set of datatypes as
> > in data triggers?
> > If yes, it needs to be wired up.
>
> In fact it's not the case, you always get called with the same 5
> arguments, all nullable now that we have ANY COMMAND support.

I think you misunderstood me. Check the following excerpt from gram.y:

  CreateCmdTrigStmt: CREATE TRIGGER name TriggerActionTime COMMAND
                     trigger_command_list EXECUTE PROCEDURE
                     func_name '(' TriggerFuncArgs ')'

You have TriggerFuncArgs there, but you ignore them.

> > * schema qualification:
> > An option to add triggers only to a specific schema would be rather neat
> > for many scenarios.
> > I am not totally sure if it's worth the hassle of defining what happens in
> > the edge cases of e.g. moving from one schema into another, though.
>
> I had written down support for a WHEN clause in command triggers, but
> the exact design has yet to be worked out. I'd like to have this
> facility, but I'd like it even more if that could happen in a later
> patch.

Hm. I am not sure a generic WHEN clause is needed. In my opinion schema qualification would be enough... A later patch is fine imo.
> > In contrast to data triggers, command triggers cannot change what is done.
> > They can either prevent it from running or not. Which IMO is fine.
> > The question I have is whether it would be a good idea to make it easy to
> > prevent recursion... Especially if the triggers won't be per schema.
>
> You already have
>
>   ALTER TRIGGER foo ON COMMAND create table DISABLE;
>
> that you can use from the trigger procedure itself. Of course, locking
> is an issue if you want to go parallel on running commands with
> recursive triggers attached. I think we might be able to skip solving
> that problem in the first run.

Hm. Not yet convinced. I need to think about it a bit.

Have fun ;)

Andres
On Friday, December 02, 2011 03:09:55 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote:
> >> Making this work cleanly would be a bigger deal than I think you're
> >> thinking.
> >
> > Obviously that depends on the definition of clean...
> >
> > Changing the grammar to make that explicit seems to involve a bit too
> > many changes at first glance. The cheap way out seems to be to make
> > the decision in analyze.c:transformQuery.
> > Would that be an acceptable way forward?
>
> ITYM transformStmt, but yeah, somewhere around there is probably
> reasonable. The way I'm imagining this would work is that IntoClause
> disappears from Query altogether: analyze.c would build a utility
> statement CreateTableAs, pull the IntoClause out of the SelectStmt
> structure and put it into the utility statement, and then proceed much
> as we do for ExplainStmt (and for the same reasons).

Ok. Because my book turned out to be boring, I started looking at this.

I wonder if it wouldn't be better to do the check directly in raw_parser(). The advantage would be that it would magically fix the issue of not logging CTAS/SELECT when using log_statement = ddl, and we don't need a snapshot or such there, so that shouldn't be a problem.

Any arguments against doing so?

Andres
On Friday, December 02, 2011 03:09:55 AM Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote: > >> Making this work cleanly would be a bigger deal than I think you're > >> thinking. > > > > Obviously that depends on the definition of clean... > > > > Changing the grammar to make that explicit seems to involve a bit too > > many changes on a first glance. The cheap way out seems to be to make > > the decision in analyze.c:transformQuery. > > Would that be an acceptable way forward? > > ITYM transformStmt, but yeah, somewhere around there is probably > reasonable. The way I'm imagining this would work is that IntoClause > disappears from Query altogether: analyze.c would build a utility > statement > CreateTableAs, pull the IntoClause out of the SelectStmt structure and > put it into the utility statement, and then proceed much as we do for > ExplainStmt (and for the same reasons). I have a patch which basically does all this minus some cleanup. I currently do the differentiation for SELECT INTO (not CREATE AS, that's done in gram.y) in raw_parser, but that's quick to move if others don't agree on that. I have two questions now: First, does anybody think it would be worth getting rid of the duplication from OpenIntoRel (formerly from execMain.c) in regard to DefineRelation()? I noticed that there already is some divergence between both. E.g. CREATE TABLE frak TABLESPACE pg_global AS SELECT 1; is possible while it would be forbidden via a plain CREATE TABLE. (I will send a fix for this separately). Secondly, I am currently wondering whether it would be a good idea to use the ModifyTable infrastructure for doing the insertion instead of a separate DestReceiver infrastructure that's only used for CTAS. It looks a bit too complicated to do this without removing the bulk insert and HEAP_INSERT_SKIP_WAL optimizations. Andres
Andres Freund <andres@anarazel.de> writes: > I have two questions now: > First, does anybody think it would be worth getting rid of the duplication > from OpenIntoRel (formerly from execMain.c) in regard to DefineRelation()? That's probably reasonable to do, since as you say it would remove the opportunity for bugs-of-omission in the CTAS table creation step. OTOH, if you find yourself having to make any significant changes to DefineRelation, then maybe not. > Secondly, I am currently wondering whether it would be a good idea to use the > ModifyTable infrastructure for doing the insertion instead of a separate > DestReceiver infrastructure that's only used for CTAS. I think this is probably a bad idea; it will complicate matters and buy little. There's not a lot of stuff needed for the actual data insertion step, since we know the table can't have any defaults, constraints, triggers, etc as yet. regards, tom lane
Hi, Attached is a first version of the patch. On Sunday, December 04, 2011 05:34:44 PM Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I have two questions now: > > > > First, does anybody think it would be worth getting rid of the > > duplication from OpenIntoRel (formerly from execMain.c) in regard to > > DefineRelation()? > > That's probably reasonable to do, since as you say it would remove the > opportunity for bugs-of-omission in the CTAS table creation step. > OTOH, if you find yourself having to make any significant changes to > DefineRelation, then maybe not. Building a CreateStmt seems to work well enough so far. The only problem I found with that approach so far is that: CREATE TABLE collate_test2 ( a int, b text COLLATE "POSIX" ); CREATE TABLE collate_test1 ( a int, b text COLLATE "C" NOT NULL ); CREATE TABLE test_u AS SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2; -- fail failed with: ERROR: no collation was derived for column "b" with collatable type text HINT: Use the COLLATE clause to set the collation explicitly. "works" now. I am currently setting ColumnDef.collOid of new columns to the attcollation of the QueryDesc's column. Unfortunately they have a different meaning... > > Secondly, I am currently wondering whether it would be a good idea to use > > the ModifyTable infrastructure for doing the insertion instead of a separate > > DestReceiver infrastructure that's only used for CTAS. > I think this is probably a bad idea; it will complicate matters and buy > little. There's not a lot of stuff needed for the actual data insertion > step, since we know the table can't have any defaults, constraints, > triggers, etc as yet. I got to the same conclusion. Remaining problems are: * how to tell ExecContextForcesOids which oid we want * implementing CREATE TABLE AS
EXECUTE without duplicating ExecuteQuery * the attcollation setting problems from above * returning better error messages for using INTO in places where it's not allowed Comments about the direction of the patch? Andres
On Sat, Dec 03, 2011 at 01:26:22AM +0100, Andres Freund wrote: > On Saturday, December 03, 2011 01:09:48 AM Alvaro Herrera wrote: > > Excerpts from Andres Freund's message of vie dic 02 19:09:47 -0300 2011: > > > Hi all, > > > > > > There is also the point about how permission checks on the actual > > > commands (in comparison of modifying command triggers) and such are > > > handled: > > > > > > BEFORE and INSTEAD will currently be called independently of the fact > > > whether the user is actually allowed to do said action (which is > > > inconsistent with data triggers) and independent of whether the object > > > they concern exists. > > > > > > I wonder if anybody considers that a problem? > > > > Hmm, we currently even have a patch (or is it already committed?) to > > avoid locking objects before we know the user has permission on the > > object. Getting to the point of calling the trigger would surely be > > even worse. > Well, calling the trigger won't allow them to lock the object. It doesn't even > confirm the existence of the table. Didn't I see a discussion in passing about the possibility of using these command triggers to implement some aspects of se-pgsql? In that case, you'd need the above behavior. Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
Hi, Please find an update attached, v4, fixing most remaining items. Next steps are better docs and support for more commands (along with finishing the currently supported ones), and a review of the locking behavior. If you want to just scroll over the patch to get an impression of what's in there rather than try out the attachment, follow this URL: https://github.com/dimitri/postgres/compare/master...command_triggers Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Will look into qualifying names. I'm now qualifying relation names even if they have not been entered with a namespace qualifier. What do you think? The other components are left alone; I think the internal APIs for qualifying all kinds of objects from the parse tree and current context are mostly missing. >> As an alternative it would be possible to pass the current search path but >> that doesn't seem to be the best approach to me; > > The trigger runs with the same search_path settings as the command, right? Maybe that's good enough for command triggers? >> Command triggers should only be allowed for the database owner. > > Yes, that needs to happen soon, along with pg_dump and psql support. All three are implemented in the attached new revision of the patch. >> Imo the current callsite in ProcessUtility isn't that good. I think it would >> make more sense moving it into standard_ProcessUtility. It should be *after* >> the check_xact_readonly check in there in my opinion. > > Makes sense, will fix in the next revision. Done too. It's better this way, thank you. >> I also don't think it's a good idea to run the >> ExecBeforeOrInsteadOfCommandTriggers stuff there because that's also the path >> that COMMIT/BEGIN and other stuff take. Those are pretty "fast path". >> >> I suggest making two switches in standard_ProcessUtility, one for the non- >> command trigger stuff which returns on success and one after that. Only after >> the first triggers are checked. > > Agreed. That's about the way I've done it.
Please note that doing it this way means that a ProcessUtility_hook can decide whether or not the command triggers are going to be fired, and that's true for BEFORE, AFTER and INSTEAD OF command triggers. I think that's the way to go, though. >> * Either tgenable's datatype or its readfunc is wrong (watch the compiler ;)) >> * ruleutils.c: >> * generic routine for IF EXISTS, CASCADE, ... > > Will have to wait until next revision. Fixed. Well, the generic routine would only be called twice and would only share a rather short expression, so it will have to wait until I add support for more commands. There's a regression tests gotcha: the command trigger tests run in parallel with the inheritance tests, which makes it impossible to predict whether the trigger on the command CREATE TABLE will spit out a notice in the inheritance tests. I don't know how that is usually avoided, but I guess it involves picking some other example command that doesn't conflict with the parallel tests of that section? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachment
Hi Peter, On Sunday, December 04, 2011 08:01:34 PM Andres Freund wrote: > On Sunday, December 04, 2011 05:34:44 PM Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > I have two questions now: > > > > > > First, does anybody think it would be worth getting rid of the > > > duplication from OpenIntoRel (formerly from execMain.c) in regard to > > > DefineRelation()? > > > > That's probably reasonable to do, since as you say it would remove the > > opportunity for bugs-of-omission in the CTAS table creation step. > > OTOH, if you find yourself having to make any significant changes to > > DefineRelation, then maybe not. > Building a CreateStmt seems to work well enough so far. > The only problem with that approach so far that I found is that: > CREATE TABLE collate_test2 > ( > a int, > b text COLLATE "POSIX" > ); > > CREATE TABLE collate_test1 > ( > a int, > b text COLLATE "C" NOT NULL > ); > > CREATE TABLE test_u AS SELECT a, b FROM collate_test1 UNION ALL SELECT a, > b FROM collate_test2; -- fail > > failed with: > ERROR: no collation was derived for column "b" with collatable type text > HINT: Use the COLLATE clause to set the collation explicitly. > "works" now. Could you explain why the above should fail? After all, the UNION is valid outside the CREATE TABLE and you can even sort on b. That tidbit is the only thing I couldn't quickly solve since the last submission... Andres
Hi, Andres Freund <andres@anarazel.de> writes: > Hm. I just noticed a relatively big hole in the patch: The handling of > deletion of dependent objects currently is nonexistent because they don't go > through ProcessUtility... That's the reason why we're talking about “command triggers” rather than “DDL triggers”. We don't intend to fire the triggers at each DDL operation happening on the server, but for each command. This restriction still allows us to provide a very useful feature when checked against the main use cases we target here. Those are auditing, and replication (the replay will also cascade), and a generic enough SUDO facility (because the trigger function can well be SECURITY DEFINER). We could also add a 'cascading bool' parameter to the trigger function API and have that always false in 9.2, then choose what to fill the other parameters with in a later release. The obvious risk would be to decide that we need another API, in which case we didn't make a good move after all. My current feeling and vote is thus to leave that alone and document the restriction. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On sön, 2011-12-11 at 04:26 +0100, Andres Freund wrote: > > Building a CreateStmt seems to work well enough so far. > > The only problem with that approach so far that I found is that: > > > CREATE TABLE collate_test2 > > ( > > a int, > > b text COLLATE "POSIX" > > ); > > > > CREATE TABLE collate_test1 > > ( > > a int, > > b text COLLATE "C" NOT NULL > > ); > > > > CREATE TABLE test_u AS SELECT a, b FROM collate_test1 UNION ALL SELECT a, > > b FROM collate_test2; -- fail > > > > failed with: > > ERROR: no collation was derived for column "b" with collatable type text > > HINT: Use the COLLATE clause to set the collation explicitly. > > "works" now. > Could you explain why the above should fail? After all the UNION is valid > outside the CREATE TABLE and you can even sort on b. That would be strange, because earlier in the test file there is also SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail The union itself is valid, but because it combines two different collations, the collation derivation for the column is "unknown", and so it cannot be ordered. And we made the implementation decision to not allow creating columns with unknown collation.
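Concretely, the HINT's workaround is to put an explicit COLLATE clause on one arm of the UNION; under standard PostgreSQL collation derivation rules (independent of the patch under discussion) an explicit collation beats the conflicting implicit ones, so the CTAS column ends up with a well-defined collation:

```sql
-- With an explicit COLLATE on one branch, the union column derives
-- collation "C" explicitly, so the created column is well-defined.
CREATE TABLE test_u AS
  SELECT a, b COLLATE "C" AS b FROM collate_test1
  UNION ALL
  SELECT a, b FROM collate_test2;
```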
On Sunday, December 11, 2011 08:09:36 PM Peter Eisentraut wrote: > On sön, 2011-12-11 at 04:26 +0100, Andres Freund wrote: > > > Building a CreateStmt seems to work well enough so far. > > > The only problem with that approach so far that I found is that: > > > > > > CREATE TABLE collate_test2 > > > ( > > > > > > a int, > > > > > > b text COLLATE "POSIX" > > > > > > ); > > > > > > CREATE TABLE collate_test1 > > > ( > > > > > > a int, > > > > > > b text COLLATE "C" NOT NULL > > > > > > ); > > > > > > CREATE TABLE test_u AS SELECT a, b FROM collate_test1 UNION ALL SELECT > > > a, b FROM collate_test2; -- fail > > > > > > failed with: > > > ERROR: no collation was derived for column "b" with collatable type > > > text HINT: Use the COLLATE clause to set the collation explicitly. > > > "works" now. > > > > Could you explain why the above should fail? After all the UNION is valid > > outside the CREATE TABLE and you can even sort on b. > > That would be strange, because earlier in the test file there is also > > SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 > ORDER BY 2; -- fail Yea. I didn't realize that it only fails during execution if there actually are rows present (confusing behaviour, btw). > The union itself is valid, but because it combines two different > collations, the collation derivation for the column is "unknown", and so > it cannot be ordered. And we made the implementation decision to not > allow creating columns with unknown collation. I copied that behaviour. It's a bit complicated by the fact that GetColumnDefCollation cannot be taught to accept an InvalidOid, so I had to duplicate the check in CreateTableAs... I attached the - from my side - final version of the patch. I dislike two things about it: * code duplication due to error handling. Before making the error messages for various illegal SELECT INTOs the patch actually shrank the code size...
If anybody has a good idea to avoid duplicating that loop around SelectStmt->ops I would be happy. * new executor flags to define whether oids should be returned Andres
On Sun, Dec 11, 2011 at 1:55 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Andres Freund <andres@anarazel.de> writes: >> Hm. I just noticed a relatively big hole in the patch: The handling of >> deletion of dependent objects currently is nonexistant because they don't go >> through ProcessUtility... > > That's the reason why we're talking about “command triggers” rather than > “DDL triggers”. We don't intend to fire the triggers at each DDL > operation happening on the server, but for each command. > > This restriction still allows us to provide a very useful feature when > checked against the main use cases we target here. Those are auditing, > and replication (the replay will also CASCADEs), and a generic enough > SUDO facility (because the trigger function can well be SECURITY > DEFINER). I haven't yet thought about your specific proposal here in enough to have a fully-formed opinion, but I am a little nervous that this may turn out to be one of those cases where the obvious API ends up working less well than might have been supposed. For example, consider the "auditing" use case, and suppose that the user wants to log a message (to a table, or some other separate logging facility) every time someone creates an index. In order to do that, they're going to need to trap not only CREATE INDEX but also CREATE TABLE and ALTER CONSTRAINT, and in the latter two cases they're then going to have to then write some grotty logic to grovel through the statement or its node tree in order to figure out what indexes are being created. Also, if they want to log the name of the new index in cases where the user doesn't specify it, they're going to have to duplicate the backend logic which generates the index name, which is going to be a pain in the ass and a recipe for future bugs (e.g. because we might change the algorithm at some point, or the trigger might see a different view of the world than the actual command, due to intervening DDL). 
This is something that has come up a lot in the context of sepgsql: it's not really enough to know what the toplevel command is in some cases, because whether or not the operation is allowed depends on exactly what it's doing, which depends on things like which schema gets used to create the table, and which tablespaces get used to create the table and its indexes, and perhaps (given Peter's pending work) what types are involved. You can deduce all of these things from the command text, but it requires duplicating nontrivial amounts of backend code, which is undesirable from a maintainability point of view. I think the same issues are likely to arise in the context of auditing, as in the example above, and even for replication: if, for example, the index name that is chosen on the master happens to exist on the standby with a different definition, replaying the actual command text might succeed or fail depending on how the command is phrased. Maybe it's OK for the slave to just choose a different name for that index, but now a subsequent DROP INDEX command that gets replayed across all servers might end up dropping logically unrelated indexes on different machines. Now, on the other hand, the conceptual simplicity of this approach is very appealing, and I can certainly see people who would never consider writing a ProcessUtility_hook using this type of facility. However, I'm worried that it will be difficult to use as a foundation for robust, general-purpose tools. Is that enough reason to reject the whole approach? I'm not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
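To make the auditing example concrete, a user of the proposed interface would need something along these lines (hypothetical names; the AFTER COMMAND syntax is this thread's proposal, not committed PostgreSQL), and would still have to dig the index definitions out of cmd_nodestring in the CREATE TABLE and ALTER TABLE cases:

```sql
-- Hypothetical auditing sketch in the proposed command trigger syntax;
-- ddl_audit is an illustrative table, not part of the patch.
CREATE FUNCTION audit_ddl
  (cmd_string text, cmd_nodestring text, schemaname text, objectname text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO ddl_audit (logged_at, usename, command)
  VALUES (now(), current_user, cmd_string);
END;
$$;

-- Trapping CREATE INDEX alone misses indexes created as side effects:
CREATE TRIGGER audit_create_index AFTER COMMAND CREATE INDEX
  EXECUTE PROCEDURE audit_ddl();
CREATE TRIGGER audit_create_table AFTER COMMAND CREATE TABLE
  EXECUTE PROCEDURE audit_ddl();  -- PRIMARY KEY / UNIQUE create indexes
CREATE TRIGGER audit_alter_table AFTER COMMAND ALTER TABLE
  EXECUTE PROCEDURE audit_ddl();  -- ADD CONSTRAINT ... UNIQUE does too
```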
Excerpts from Robert Haas's message of lun dic 12 13:32:45 -0300 2011: > On Sun, Dec 11, 2011 at 1:55 PM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: > > Andres Freund <andres@anarazel.de> writes: > >> Hm. I just noticed a relatively big hole in the patch: The handling of > >> deletion of dependent objects currently is nonexistant because they don't go > >> through ProcessUtility... > > > > That's the reason why we're talking about “command triggers” rather than > > “DDL triggers”. We don't intend to fire the triggers at each DDL > > operation happening on the server, but for each command. > > > > This restriction still allows us to provide a very useful feature when > > checked against the main use cases we target here. Those are auditing, > > and replication (the replay will also CASCADEs), and a generic enough > > SUDO facility (because the trigger function can well be SECURITY > > DEFINER). > > I haven't yet thought about your specific proposal here in enough to > have a fully-formed opinion, but I am a little nervous that this may > turn out to be one of those cases where the obvious API ends up > working less well than might have been supposed. For example, > consider the "auditing" use case, and suppose that the user wants to > log a message (to a table, or some other separate logging facility) > every time someone creates an index. In order to do that, they're > going to need to trap not only CREATE INDEX but also CREATE TABLE and > ALTER CONSTRAINT, Yeah. I remember mentioning the ability of CREATE SCHEMA to embed all sort of object creation commands in a single top-level command, and being handwaved away by Jan. "Nobody knows about that", I was told. However, if this is to work in a robust way, these things should not be ignored. In particular, firing triggers on each top-level command _is_ going to get some objects ignored; the people writing the triggers _are_ going to forget to handle CREATE SCHEMA properly most of the time. 
-- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Monday, December 12, 2011 05:32:45 PM Robert Haas wrote: > On Sun, Dec 11, 2011 at 1:55 PM, Dimitri Fontaine > > <dimitri@2ndquadrant.fr> wrote: > > Andres Freund <andres@anarazel.de> writes: > >> Hm. I just noticed a relatively big hole in the patch: The handling of > >> deletion of dependent objects currently is nonexistant because they > >> don't go through ProcessUtility... > > > > That's the reason why we're talking about “command triggers” rather than > > “DDL triggers”. We don't intend to fire the triggers at each DDL > > operation happening on the server, but for each command. > > > > This restriction still allows us to provide a very useful feature when > > checked against the main use cases we target here. Those are auditing, > > and replication (the replay will also CASCADEs), and a generic enough > > SUDO facility (because the trigger function can well be SECURITY > > DEFINER). > > I haven't yet thought about your specific proposal here in enough to > have a fully-formed opinion, but I am a little nervous that this may > turn out to be one of those cases where the obvious API ends up > working less well than might have been supposed. What are your thoughts about a "not-obvious api"? > For example, > consider the "auditing" use case, and suppose that the user wants to > log a message (to a table, or some other separate logging facility) > every time someone creates an index. In order to do that, they're > going to need to trap not only CREATE INDEX but also CREATE TABLE and > ALTER CONSTRAINT, and in the latter two cases they're then going to > have to then write some grotty logic to grovel through the statement > or its node tree in order to figure out what indexes are being > created. Also, if they want to log the name of the new index in cases > where the user doesn't specify it, they're going to have to duplicate > the backend logic which generates the index name, which is going to be > a pain in the ass and a recipe for future bugs (e.g. 
because we might > change the algorithm at some point, or the trigger might see a > different view of the world than the actual command, due to > intervening DDL). I thought about using transformed statements before passing them to the triggers to avoid exactly that issue. That would make stuff like CREATE TABLE blub (LIKE), constraints and such transparent. It would mean quite some additional effort though (because for several statements the transformation is not really separated). > This is something that has come up a lot in the context of sepgsql: > it's not really enough to know what the toplevel command is in some > cases, because whether or not the operation is allowed depends on > exactly what it's doing, which depends on things like which schema > gets used to create the table, and which tablespaces get used to > create the table and its indexes, and perhaps (given Peter's pending > work) what types are involved. You can deduce all of these things > from the command text, but it requires duplicating nontrivial amounts > of backend code, which is undesirable from a maintainability point of > view. I think the same issues are likely to arise in the context of > auditing, as in the example above, and even for replication: if, for > example, the index name that is chosen on the master happens to exist > on the standby with a different definition, replaying the actual > command text might succeed or fail depending on how the command is > phrased. Maybe it's OK for the slave to just choose a different name > for that index, but now a subsequent DROP INDEX command that gets > replayed across all servers might end up dropping logically unrelated > indexes on different machines. Well, you shouldn't grovel through the text - you do get passed the parsetree which should make accessing that far easier. I can imagine somebody writing some shiny layer which lets you grovel through that with some XPath-alike API.
> Now, on the other hand, the conceptual simplicity of this approach is > very appealing, and I can certainly see people who would never > consider writing a ProcessUtility_hook using this type of facility. > However, I'm worried that it will be difficult to use as a foundation > for robust, general-purpose tools. Is that enough reason to reject > the whole approach? I'm not sure. That's where I am a bit unsure as well. I think the basic approach is sound but it might need some work in several areas (transformed statements, dependencies, ...). On the other hand: perfect is the enemy of good... Having looked at it I don't think dependency handling is that much effort. Haven't looked at the separation of the transformation phase. Andres
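A rough sketch of the parse-tree groveling Andres mentions, assuming the four-argument trigger function signature proposed upthread and a simplified cmd_nodestring format (nodeToString() emits text like {CREATESTMT :relation {RANGEVAR :schemaname public :relname foo} ...}); the regular-expression matching here deliberately ignores nesting and quoting, so it is a demonstration only:

```sql
-- Hypothetical sketch: pulling one field out of the nodeToString() text
-- with substring() and a POSIX regex (standard PostgreSQL functions);
-- a real consumer would want a proper parser for the node string.
CREATE FUNCTION grovel_relname
  (cmd_string text, cmd_nodestring text, schemaname text, objectname text)
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
  relname text;
BEGIN
  -- Crude: grabs the first :relname token, wherever it nests.
  relname := substring(cmd_nodestring from ':relname ([^ }]+)');
  RAISE NOTICE 'relname from parse tree: %', relname;
END;
$$;
```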
Alvaro Herrera <alvherre@commandprompt.com> writes: > Yeah. I remember mentioning the ability of CREATE SCHEMA to embed all > sort of object creation commands in a single top-level command, and > being handwaved away by Jan. "Nobody knows about that", I was told. In fact the CREATE SCHEMA implementation fires a ProcessUtility command per element, see src/backend/commands/schemacmds.c:122 * Execute each command contained in the CREATE SCHEMA. Since the grammar * allows only utility commands in CREATE SCHEMA, there is no need to pass * them through parse_analyze() or the rewriter; we can just hand them * straight to ProcessUtility. > However, if this is to work in a robust way, these things should not be > ignored. In particular, firing triggers on each top-level command _is_ > going to get some objects ignored; the people writing the triggers _are_ > going to forget to handle CREATE SCHEMA properly most of the time. My current thinking is that the tool as proposed in the patch is useful enough now already, and that we should cover its limitations in the documentation. And also be somehow verbose about the nodeToString() output, where to find almost all you need. Down the road, we might decide that either what we've done covers enough needs (after all, we managed without the feature for years) or decide to have internal command implementations (a create index from inside a create table statement, say) go through ProcessUtility. Also note that currently, anyone motivated enough to write a ProcessUtility hook (in C) will face the same problem here: the hook will not get called for any event where the command trigger would not get called either. So while I understand your concerns here, I think they are rather theoretical, as we can already add support for create table, alter table, and drop table to slony, londiste and bucardo with the existing patch, and also implement create extension whitelisting and allow non-superusers to install selected C-coded extensions.
FWIW, those are the two main use cases I'm after. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
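The CREATE SCHEMA behaviour cited above can be illustrated like this; per the comment in schemacmds.c, each embedded element is handed to ProcessUtility on its own, so under the patch's design a command trigger on CREATE TABLE would fire for t even though the user typed a single top-level command (the trigger behaviour described is the thread's proposal, not committed PostgreSQL):

```sql
-- Standard CREATE SCHEMA syntax with embedded elements.  Each element
-- below is dispatched through ProcessUtility separately, so per-command
-- triggers would see a CREATE TABLE and a CREATE VIEW here.
CREATE SCHEMA app
  CREATE TABLE t (id int PRIMARY KEY)
  CREATE VIEW v AS SELECT id FROM t;
```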
On Mon, Dec 12, 2011 at 11:49 AM, Andres Freund <andres@anarazel.de> wrote: >> I haven't yet thought about your specific proposal here in enough to >> have a fully-formed opinion, but I am a little nervous that this may >> turn out to be one of those cases where the obvious API ends up >> working less well than might have been supposed. > What are your thoughts about a "not-obvious api"? It seems to me (and it may seem differently to other people) that what most people who want to trap DDL events really want to do is either (a) get control at the permissions-checking stage so that they can override the system's default access control policy, either to allow something that would normally be denied (as in Dimitri's sudo example) or deny something that would normally be allowed (as with sepgsql, or the example in the documentation for Dimitri's patch showing how to enforce relation naming conventions) or else (b) perform some additional steps after completing the operation (e.g. audit it, replicate it, apply a security label to it). Much of the selinux-related work that KaiGai Kohei and I have been doing over the last couple of years has revolved around where to put those hooks and what information to pass to them. For example, we added some post-creation hooks where the system basically says "we just created a <object-type> and its OID is <newly-assigned-oid>". I'm not going to present this as a model of usability (because it isn't). However, I think the underlying model is worth considering. It's not a command trigger, because it doesn't care what command the user executed that led to the object getting created. It's more like an event system - whenever a table gets created (no matter how exactly that happens), you get a callback. I think that's probably closer to what people want for the additional-steps-after-completing-the-operation use case. Typically, you're not really interested in what the user typed: you want to know what the system decided to do about it. 
The "get control at the permissions checking stage" use case is a bit harder. We've added some hooks for that as well, but really only for DML thus far, and I'm not convinced we've got the right design even there. Part of the complexity here is that there are actually a lot of different permissions rules depending on exactly what operation you are performing - you might think of REINDEX TABLE, ALTER TABLE .. ALTER COLUMN .. RENAME and DROP TABLE as requiring the same permissions (i.e. ownership) but in fact they're all slightly different. It would be nice if the permissions checking machinery were better separated from the code that actually does the work, so that you could rip and replace just that part. Dimitri's idea of using a security definer function as a DDL instead-of trigger is an interesting one, but does that leave newly created objects owned by the wrong user? Or what if you want user A to be able to do some but not all of the things user B can do? Drilling an optimally-sized hole through the existing permissions system is unfortunately not very simple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > It seems to me (and it may seem differently to other people) that what > most people who want to trap DDL events really want to do is either [ detailed analysis, mostly right on spot ] Yeah, I'm proposing a rather crude tool. I think it's still solving real problems we have now, and that no other tool nor design is solving for us. So for me the real questions are: do we want the feature in that form? Are we avoiding painting ourselves into a corner? I think both answers are positive, because I want the feature badly enough to have spent time working on it, and it's able to solve two problems with one stone already. Also, the only way to have that feature in an extension implementing ProcessUtility_hook is duplicating what I've been doing, minus grammar support (just because you can't). And that's not the kind of effort that either slony or londiste will put into their own project; the amount of work wouldn't be justified for this return alone, as history is telling us. (There's a game changer here though, which is the idea of doing “command triggers” as opposed to “ddl triggers” or even “catalog triggers”, thanks Jan.) Then, we can expand the trigger function signature down the road and keep the current one as compatibility support. For example we could add a cascading boolean argument and decide whether or not to call the trigger function when cascading based upon the trigger's procedure signature. So I believe it's somewhat coarse or crude, still useful enough, and not painting us into a corner. > using a security definer function as a DDL instead-of trigger is an > interesting one, but does that leave newly created objects owned by > the wrong user? Or what if you want user A to be able to do some but > not all of the things user B can do? Drilling an optimally-sized hole > through the existing permissions system is unfortunately not very > simple.
You can still use ALTER OBJECT … OWNER TO …; from within the security definer function to allow the setup you're interested in. I guess you could also make the code conditional on current_user, but I didn't check that we have both the user id and the effective user id available from the SQL level (yet). All in all, I understand your reticence here, but I don't share it, Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
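[A minimal sketch of that idea, under a couple of assumptions: the four-argument signature from the proposal, the INSTEAD OF COMMAND grammar implied by ExecBeforeOrInsteadOfCommandTriggers(), and a made-up function name. Inside a SECURITY DEFINER function, session_user is still the caller, which is what the ownership hand-back relies on.]

```sql
-- Security definer function: run the DDL with the definer's rights,
-- then hand the new object back to the calling user so it is not
-- left owned by the definer.
CREATE FUNCTION run_ddl_as_owner (
  IN cmd_string text, IN cmd_nodestring text,
  IN schemaname text, IN relname text
) RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
  EXECUTE cmd_string;
  EXECUTE format('ALTER TABLE %I.%I OWNER TO %I',
                 coalesce(schemaname, 'public'), relname, session_user);
END;
$$;

CREATE TRIGGER run_ddl_as_owner
  INSTEAD OF COMMAND CREATE TABLE
  EXECUTE PROCEDURE run_ddl_as_owner();
```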
On 12/12/2011 11:32 AM, Robert Haas wrote: > I haven't yet thought about your specific proposal here in enough to > have a fully-formed opinion, but I am a little nervous that this may > turn out to be one of those cases where the obvious API ends up > working less well than might have been supposed. There's another cautionary tale from the sepgsql history worth mentioning here, which surely I don't have to remind you about. Making the goal for a first shippable subset include proof you can solve the hardest problems in that area can lead to a long road without committing anything. With sepgsql, that was focusing on the worst of the ALTER TABLE issues. As Dimitri was pointing out, the name change to Command Triggers includes a sort of admission that DDL Triggers are too hard to solve in all cases yet. We shouldn't be as afraid to introduce APIs that are aimed at developers who currently have none. Yes, there's a risk that will end with "...and this one has to be broken in the next release because of this case we didn't see". We can't be so afraid of that that we do nothing, especially when the users who would be impacted by that theoretical case are currently suffering from an even worse problem than that. To provide the big picture infrastructure tools that people are desperate for now, PostgreSQL needs to get a lot more agile when it comes to revving hooks whose main consumers are not regular application programs. They're the administrators of the system instead. I know what I was just railing against is not what you were arguing for, you just triggered a stored rant of mine. [Bad trigger joke goes here] Regardless, thoughts on where the holes are here are appreciated. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Tue, Dec 13, 2011 at 8:25 AM, Greg Smith <greg@2ndquadrant.com> wrote: > There's another cautionary tale from the sepgsql history worth mentioning > here, which surely I don't have to remind you about. Making the goal for a > first shippable subset include proof you can solve the hardest problems in > that area can lead to a long road without committing anything. With > sepgsql, that was focusing on the worst of the ALTER TABLE issues. As > Dimitri was pointing out, the name change to Command Triggers includes a > sort of admission that DDL Triggers are too hard to solve in all cases yet. > We shouldn't be as afraid to introduce APIs that are aimed at developers > who currently have none. > > Yes, there's a risk that will end with "...and this one has to be broken in > the next release because of this case we didn't see". Well, the problem is that just because something better comes along doesn't mean we'll actually deprecate and remove the old functionality. We still have contrib/xml2, even though the docs say we're "planning" to remove it in 8.4. Tom even rewrote the memory handling, because it was easier to rewrite a module he probably doesn't intrinsically care much about than to convince people we should remove something that was both planned for deprecation anyway and a huge security risk because it crashed if you looked at it sideways. And we still have rules, so people read the documentation and say to themselves "hmm, should I use triggers or rules for this project?". And elsewhere we're discussing whether and under what conditions it would be suitable to get rid of recovery.conf, which almost everyone seems to agree is a poor design, largely AFAICT because third-party tools find recovery.conf a convenient way to circumvent the need to rewrite postgresql.conf, which is a pain in the neck because of our insistence that it has to contain arbitrary user comments. 
In other words, more often than not, we are extremely reluctant to remove or modify features of even marginal utility because there will certainly be somebody, somewhere who is relying on the old behavior. Of course, it does go the other way sometimes: we removed old-style VACUUM FULL (which was useful if you were short of disk space and long on time), flat-file authentication (which was used by third party tools), and made removing OIDs require a table rewrite. Still, I think it's entirely appropriate to be cautious about adding new features that might not turn out to be the design we really want to have. Odds are good that we will end up supporting them for backward compatibility reasons for many, many years. Now, all that having been said, I also agree that the perfect can be the enemy of the good, and we go there frequently. The question I'm asking is not whether the feature is perfect, but whether it's adequate for even the most basic things people might want to do with it. Dimitri says that he wants it so that we can add support for CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and Londiste. My fear is that it won't turn out to be adequate to that task, because there won't actually be enough information in the CREATE TABLE statement to do the same thing on all servers. In particular, you won't have the index or constraint names, and you might not have the schema or tablespace information either. But maybe we ought to put the question to the intended audience for the feature - is there a Slony developer in the house? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 13, 2011 at 10:53 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Dec 13, 2011 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Now, all that having been said, I also agree that the perfect can be >> the enemy of the good, and we go there frequently. The question I'm >> asking is not whether the feature is perfect, but whether it's >> adequate for even the most basic things people might want to do with >> it. Dimitri says that he wants it so that we can add support for >> CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and >> Londiste. My fear is that it won't turn out to be adequate to that >> task, because there won't actually be enough information in the CREATE >> TABLE statement to do the same thing on all servers. In particular, >> you won't have the index or constraint names, and you might not have >> the schema or tablespace information either. > > But, you could query all that out from the system catalogs right? You could probably get a lot of it that way, although first you'll have to figure out which schema to look up the name in. It seems likely that everyone who uses the trigger will need to write that code, though, and they'll all have different implementations with different bugs, because many of them probably really want the facility that you write in your next sentence: > Maybe a better facility should exist to convert a table name to a > create table statement than hand rolling it or invoking pg_dump, but > that's a separate issue. > > This feature fills an important niche given that you can't hook RI > triggers to system catalogs...it comes up (in short, +1). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
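[The schema-resolution step alone is easy to get subtly wrong, which is the point about everyone writing the same code with different bugs. A hedged sketch of what each trigger consumer would end up rolling by hand, using only the standard system catalogs (nothing from the patch); 'foo' stands in for the unqualified name the trigger received:]

```sql
-- Resolve the unqualified name 'foo' the way the backend would,
-- honoring the session's search_path, then pull a few of the
-- definition details the command string itself does not carry.
SELECT n.nspname AS schemaname,
       c.relname,
       t.spcname  AS tablespace
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
 WHERE c.relname = 'foo'
   AND c.relkind = 'r'
   AND pg_table_is_visible(c.oid);
```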
On Tue, Dec 13, 2011 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Now, all that having been said, I also agree that the perfect can be > the enemy of the good, and we go there frequently. The question I'm > asking is not whether the feature is perfect, but whether it's > adequate for even the most basic things people might want to do with > it. Dimitri says that he wants it so that we can add support for > CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and > Londiste. My fear is that it won't turn out to be adequate to that > task, because there won't actually be enough information in the CREATE > TABLE statement to do the same thing on all servers. In particular, > you won't have the index or constraint names, and you might not have > the schema or tablespace information either. But, you could query all that out from the system catalogs right? Maybe a better facility should exist to convert a table name to a create table statement than hand rolling it or invoking pg_dump, but that's a separate issue. This feature fills an important niche given that you can't hook RI triggers to system catalogs...it comes up (in short, +1). merlin
On Tue, Dec 13, 2011 at 9:59 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Now, all that having been said, I also agree that the perfect can be > the enemy of the good, and we go there frequently. The question I'm > asking is not whether the feature is perfect, but whether it's > adequate for even the most basic things people might want to do with > it. Dimitri says that he wants it so that we can add support for > CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and > Londiste. My fear is that it won't turn out to be adequate to that > task, because there won't actually be enough information in the CREATE > TABLE statement to do the same thing on all servers. In particular, > you won't have the index or constraint names, and you might not have > the schema or tablespace information either. But maybe we ought to > put the question to the intended audience for the feature - is there a > Slony developer in the house? Yeah, I'm not certain yet what is being provided, nor how it corresponds with what would be needed. - It's probably not sufficient to capture the raw statement, as that gets invoked within a context replete with GUC values, and you may need to duplicate that context/environment on a replica. Mind you, a command trigger is doubtless capable of querying GUCs to duplicate relevant portions of the environment. - What I'd much rather have is a form of the query that is replete with Full Qualification, so that "create table foo (id serial primary key, data text not null unique default 'better replace this', dns_data dnsrr not null);" may be transformed into a safer form like: "create table public.foo (id serial primary key, data text not null unique default 'better replace this'::text, dns_data dns_rr.dnsrr not null);" What's not clear, yet, is which transformations are troublesome. 
For instance, if there's already a sequence called "foo_id_seq", then the sequence defined for that table winds up being "foo_id_seq1", and it's not quite guaranteed that *that* would be identical across databases. But perhaps it's sufficient to implement what, of COMMAND TRIGGERS, can be done, and we'll see, as we proceed, whether or not it's enough. It's conceivable that a first implementation won't be enough to implement DDL triggers for Slony, and that we'd need to ask for additional functionality that doesn't make it in until 9.3. That seems better, to me, than putting it on the shelf, and having functionality in neither 9.2 nor 9.3... -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Tue, Dec 13, 2011 at 12:29 PM, Christopher Browne <cbbrowne@gmail.com> wrote: > But perhaps it's sufficient to implement what, of COMMAND TRIGGERS, > can be done, and we'll see, as we proceed, whether or not it's enough. > > It's conceivable that a first implementation won't be enough to > implement DDL triggers for Slony, and that we'd need to ask for > additional functionality that doesn't make it in until 9.3. That > seems better, to me, than putting it on the shelf, and having > functionality in neither 9.2 nor 9.3... The thing is, I don't really see the approach Dimitri is taking as being something that we can extend to meet the requirement you just laid out. So it's not like, OK, let's do this, and we'll improve it later. It's, let's do this, and then later do something completely different, and that other thing will be the one that really solves the problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/13/2011 09:59 AM, Robert Haas wrote: > Well, the problem is that just because something better comes along > doesn't mean we'll actually deprecate and remove the old > functionality. The examples you gave fall into three groups, and I think it's useful to demarcate how they're different. Please pardon me going wildly off topic before coming back. If you drop xml2 or rules, people lose something. It's primarily the PostgreSQL developers who gain something. You can make a case that people who won't get sucked into doing something wrong with rules one day will gain something, but those people are a future speculation; they're not here asking to be saved from a problem they don't know will happen yet. This sort of deprecation battle is nearly impossible to win. One of the reasons I placed a small bet helping sponsor PGXN is that I hope it allows some of this should-be-deprecated stuff to move there usefully. Let the people who use it maintain it moving forward if they feel it's important. The recovery.conf change and other attempts to reorganize the postgresql.conf are in a second category. These break scripts, without providing an immediate and obvious gain to everyone. Some say it's better, some say it's worse; from the list traffic it seems like a zero-sum game. The burden is on the person advocating the change to justify it if there's not a clear win for everyone. You might note that my latest attitude toward this area is to provide the mechanism I want as a new option, and not even try to argue about removing the old thing anymore. This lets implementation ideas battle it out in the wild. Let's say a year from now everyone who hasn't switched to using a conf.d directory approach looks like an old stick in the mud. 
Then maybe I pull the sheet I have an enormous bikeshed hidden behind, waiting for just that day.[1] When VACUUM FULL was rewritten, it took a recurring large problem that had impacted a lot of people, and replaced it with something much better for most cases. A significantly smaller number of people lost something that was slightly useful. There weren't as many complaints because the thing that was lost was replaced with something better by most metrics. Different, but better. This third category of changes is much easier to sell. We have another such discontinuity coming with pg_stat_activity. The changes Scott Mead's patch kicked off make it different and better. Anyone who has a tool using the old thing can look at the new design and say "well, that makes the whole 'what state is the connection in' logic I used to worry about go away; that's progress even if it breaks my old scripts". People need to get something that offsets the breakage to keep griping down. Anyone who argues against those sort of changes has a challenging battle on their hands. If there is a Command Trigger implementation that Slony etc. use, and we discover a limitation that requires an API break, that's OK so long as it's expected that will fall into the last category. Breakage to add support for something new should mean a feature clearly gained, something lost, and a net benefit to most consumers of that feature. People then accept it rather than block obvious forward progress. We don't want to get too confused between what makes for good progress on that sort of thing and the hard deprecation problems. (Not that I'm saying you are here, just pointing out it happens) > Dimitri says that he wants it so that we can add support for > CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and > Londiste. My fear is that it won't turn out to be adequate to that > task, because there won't actually be enough information in the CREATE > TABLE statement to do the same thing on all servers. 
These are all good things to look into, please keep those test set ideas coming and hopefully we'll get some more input on this. But let's say this rolls out seeming good enough, and later someone discovers some weird index thing that isn't supported. If that's followed by "here's a new API; it breaks your old code, but it allows supporting that index you couldn't deal with before", that's unlikely to get shot down by that API's consumers. What you wouldn't be able to do is say "this new API doesn't work right, let's just yank it out". Your concerns about making sure at least the fundamentals hold here are on point though. [1] Look at that, I can now say that 100% of the programs I compose e-mail with now have "bikeshed" added to their dictionary. I don't bother with this often, but there are entries for "PostgreSQL" and "committer" there too.[2] [2] Would you believe a Google search for "committer" shows the PostgreSQL wiki page as its second hit? That's only behind the Wikipedia link, and ahead of the FreeBSD, Chromium, Apache, and Mozilla pages on that topic. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On 12/13/2011 9:59 AM, Robert Haas wrote: > it. Dimitri says that he wants it so that we can add support for > CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and > Londiste. My fear is that it won't turn out to be adequate to that > task, because there won't actually be enough information in the CREATE > TABLE statement to do the same thing on all servers. In particular, > you won't have the index or constraint names, and you might not have > the schema or tablespace information either. But maybe we ought to > put the question to the intended audience for the feature - is there a > Slony developer in the house? I agree. While it is one of the most "asked for" features among the trigger based replication systems, I fear that an incomplete solution will cause more problems than it solves. It is far easier to tell people "DDL doesn't propagate automatically, do this instead ..." than to try to support a limited list of commands that may or may not propagate as intended. And all sorts of side effects, like search_path, user names and even the existing schema in the replica, can cause any given DDL "string" to actually do something completely different than what happened on the origin. On top of that, the PostgreSQL main project has a built in replication solution that doesn't need any of this. There is no need for anyone but us trigger replication folks to keep command triggers in sync with all other features. I don't think it is going to be reliable enough any time soon to make this the default for any of the trigger based replication systems. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Robert Haas <robertmhaas@gmail.com> writes: > it. Dimitri says that he wants it so that we can add support for > CREATE TABLE, ALTER TABLE, and DROP TABLE to Slony, Bucardo, and > Londiste. My fear is that it won't turn out to be adequate to that > task, because there won't actually be enough information in the CREATE > TABLE statement to do the same thing on all servers. In particular, > you won't have the index or constraint names, and you might not have > the schema or tablespace information either. In my experience of managing lots of trigger based replications (more than 100 nodes in half a dozen different projects), what I can tell from the field is that I don't care about index and constraint names. Being able to replay, on the subscriber, the same CREATE TABLE statement that the provider just executed is perfectly fine for my use cases. Again, that's a caveat of the first implementation: you can't have subcommand support without forcing subcommands through ProcessUtility, and that's a much more invasive patch. Maybe we will need that later. Also it's quite easy to add support for the CREATE INDEX command, including index name support, and ALTER TABLE is already in progress. So we can document how to organize your DDL scripts for them to just work with the replication system. And you can even implement a command trigger that enforces respecting the limits (RAISE EXCEPTION when the CREATE TABLE command is embedding primary key creation rather than using a separate command for that). As for the schema, you can easily get the current search_path setting from the command trigger and force it to the same value on the subscriber before replaying the commands (hint: add current search_path to the event you're queuing for replay). 
select setting from pg_settings where name = 'search_path'; I appreciate that some use cases won't be possible to implement with the first version of this patch, really, but I believe we have enough use cases that are possible to implement with it that it's worth providing the feature. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
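[The enforcement idea could be sketched like this, assuming the patch's four-argument signature and a BEFORE COMMAND variant of the grammar shown earlier in the thread. The regexp test on the command string is deliberately crude; a real check would inspect cmd_nodestring once it is populated.]

```sql
-- Hypothetical policy trigger: refuse CREATE TABLE statements that
-- embed primary key creation, so the PK arrives as a separate,
-- replicable command.
CREATE FUNCTION enforce_ddl_policy (
  IN cmd_string text, IN cmd_nodestring text,
  IN schemaname text, IN relname text
) RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  IF cmd_string ~* 'primary\s+key' THEN
    RAISE EXCEPTION 'create the primary key with a separate command so it replicates';
  END IF;
END;
$$;

CREATE TRIGGER enforce_ddl_policy
  BEFORE COMMAND CREATE TABLE
  EXECUTE PROCEDURE enforce_ddl_policy();
```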
Christopher Browne <cbbrowne@gmail.com> writes: > - What I'd much rather have is a form of the query that is replete > with Full Qualification, so that "create table foo (id serial primary > key, data text not null unique default 'better replace this', dns_data > dnsrr not null);" may be transformed into a safer form like: "create > table public.foo (id serial primary key, data text not null unique > default 'better replace this'::text, dns_data dns_rr.dnsrr not null);" Andres made the same comment and I've begun working on that. The facilities for fully qualifying object names are not always available in a form that we can use from the parse tree, but that's a SMOP. > What's not clear, yet, is which transformations are troublesome. For > instance, if there's already a sequence called "foo_id_seq", then the > sequence defined for that table winds up being "foo_id_seq1", and it's > not quite guaranteed that *that* would be identical across databases. > > But perhaps it's sufficient to implement what, of COMMAND TRIGGERS, > can be done, and we'll see, as we proceed, whether or not it's enough. I'm not convinced that having the same sequence names on the subscribers is something we need here. Let me detail that, because maybe I've just uncovered a major misunderstanding about the use case I'm interested in. I mostly use trigger based replication in cases where it's not possible to implement failover or switchover with that technique, because I'm doing cross-replication or more complex architectures. Failover is handled with WAL based techniques (wal shipping, streaming rep, etc). So I don't care that much about the sub object names (constraints, indexes, sequences): I need them to get created on the subscriber and that's about it. I think that's an important enough use case here. > It's conceivable that a first implementation won't be enough to > implement DDL triggers for Slony, and that we'd need to ask for > additional functionality that doesn't make it in until 9.3. 
That > seems better, to me, than putting it on the shelf, and having > functionality in neither 9.2 nor 9.3... Or maybe Slony would end up relying partly on the command trigger facility and implementing the missing pieces in its own code base. Like it did with the txid_snapshot data type some years ago, for example. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Jan Wieck <JanWieck@Yahoo.com> writes: > I agree. While it is one of the most "asked for" features among the trigger > based replication systems, I fear that an incomplete solution will cause > more problems than it solves. It is far easier to tell people "DDL doesn't > propagate automatically, do this instead ..." than to try to support a > limited list of commands, that may or may not propagate as intended. And all Nothing stops you from checking that the command you want to replicate is indeed supported and refusing to run it on the provider when it's not, that's what command triggers are for :) > sorts of side effects, like search_path, user names and even the existing > schema in the replica can cause any given DDL "string" to actually do > something completely different than what happened on the origin. Grab those on the provider from pg_settings and the like in the command trigger and restore them on the subscriber before applying the command? > On top of that, the PostgreSQL main project has a built in replication > solution that doesn't need any of this. There is no need for anyone, but us > trigger replication folks, to keep command triggers in sync with all other > features. You can't implement cross replication with built-in replication. I have yet to work on a medium sized project where I don't need both streaming replication, wal archiving, and a trigger based replication system. > I don't think it is going to be reliable enough any time soon to make this > the default for any of the trigger based replication systems. You need to add "yet" and "without some work in the client implementation". Last time I read such comments, it was about extensions. We still shipped something in 9.1, and by your measure, it's quite broken. 
When you implement an SQL-only extension (no .so) you still have to use a Makefile and you need to deploy your scripts on the server filesystem beforehand; it's not possible to install or update the extension from an SQL connection (even when doing only self-contained SQL commands). Also, the dependency system is not solving many real-world problems, apart from the very simplest ones that we can play with in contrib/. You can't even list shared objects (.so, .dll) that have been loaded so far in a session and reference the extension they belong to. Yet, I bet that some people will find that the extension system as we have it in 9.1 still is useful enough to have been released in there. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support PS: yes I intend to be working on fixing those extension limitations and caveats with a series of patches.
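[The capture-and-restore suggestion from earlier in this message could be sketched as follows, again assuming the four-argument signature; the replication_queue table is made up for illustration:]

```sql
-- Hypothetical provider-side trigger function: queue the command
-- string together with the search_path needed to replay it faithfully.
CREATE FUNCTION queue_command_with_settings (
  IN cmd_string text, IN cmd_nodestring text,
  IN schemaname text, IN relname text
) RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO replication_queue (command, search_path)
  SELECT cmd_string, setting
    FROM pg_settings WHERE name = 'search_path';
END;
$$;

-- Subscriber side, before replaying a queued command:
--   SET search_path TO <queued search_path>;
--   then execute <queued command>.
```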
Excerpts from Dimitri Fontaine's message of mié dic 14 07:22:21 -0300 2011: > Again, that's a caveat of the first implementation, you can't have sub > commands support without forcing them through ProcessUtility and that's > a much more invasive patch. Maybe we will need that later. I can get behind this argument: force all stuff through ProcessUtility for regularity, and not necessarily in the first patch for this feature. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Dec 14, 2011 at 9:05 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Dimitri Fontaine's message of mié dic 14 07:22:21 -0300 2011: >> Again, that's a caveat of the first implementation, you can't have sub >> commands support without forcing them through ProcessUtility and that's >> a much more invasive patch. Maybe we will need that later. > > I can get behind this argument: force all stuff through ProcessUtility > for regularity, and not necessarily in the first patch for this feature. That seems like a pretty heavy dependency on an implementation detail that we might want to change at some point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> I can get behind this argument: force all stuff through ProcessUtility >> for regularity, and not necessarily in the first patch for this feature. > > That seems like a pretty heavy dependency on an implementation detail > that we might want to change at some point. Given ProcessUtility_hook, how much of an implementation detail rather than a public API are we talking about? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Dec 14, 2011 at 5:44 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> I can get behind this argument: force all stuff through ProcessUtility >>> for regularity, and not necessarily in the first patch for this feature. >> >> That seems like a pretty heavy dependency on an implementation detail >> that we might want to change at some point. > > Given ProcessUtility_hook, how much of an implementation detail rather > than a public API are we talking about? I think that a hook and an SQL command are not on the same level. Hooks are not documented; SQL commands are. You may, of course, disagree. But the basic point is that it seems pretty weird to say, on the one hand, these are command triggers. They fire when you execute a command. So the CREATE TABLE trigger will fire when someone says CREATE TABLE. But then we say, oh, well if you use CREATE SCHEMA foo CREATE TABLE blah ... we will fire the trigger anyway. Now it's not a command trigger any more; it's a trigger that fires when you perform a certain operation - e.g. create a table. Unless, of course, you create the table using CREATE TABLE AS SELECT or SELECT .. INTO. Then it doesn't fire. Unless we decide to make those utility commands, which I think was just recently under discussion, in which case it will suddenly start firing for those operations. So now something that is, right now, essentially an implementation detail which we can rearrange as we like turns into a user-visible behavior change. And what if some day we want to change it back? I think it would be a much better idea to decree from the beginning that we're trapping the *operation* of creating a table, rather than the *command* create table. Then, the behavior is clear and well-defined from day one, and it doesn't depend on how we happen to implement things internally right at the moment. If there's some way of creating a table without firing the trigger, it's a bug. 
If there's some way of firing the trigger without attempting to create a table, it's a bug. That might require some more thought about what information to pass to the trigger function (e.g. if it's a SELECT .. INTO, you're not going to have pregenerated SQL that starts with the words "CREATE TABLE") but the fact that gives much more clear definition to the core feature seems like a big plus in my book. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thursday, December 15, 2011 03:36:25 PM Robert Haas wrote: > On Wed, Dec 14, 2011 at 5:44 PM, Dimitri Fontaine > > <dimitri@2ndquadrant.fr> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >>> I can get behind this argument: force all stuff through ProcessUtility > >>> for regularity, and not necessarily in the first patch for this > >>> feature. > >> That seems like a pretty heavy dependency on an implementation detail > >> that we might want to change at some point. > > Given ProcessUtility_hook, how much of an implementation detail rather > > than a public API are we talking about? > I think that a hook and an SQL command are not on the same level. > Hooks are not documented; SQL commands are. You may, of course, > disagree. I am not disagreeing. > But the basic point is that it seems pretty weird to say, on the one > hand, these are command triggers. They fire when you execute a > command. So the CREATE TABLE trigger will fire when someone says > CREATE TABLE. But then we say, oh, well if you use CREATE SCHEMA foo > CREATE TABLE blah ... we will fire the trigger anyway. Now it's not a > command trigger any more; it's a trigger that fires when you perform a > certain operation - e.g. create a table. Unless, of course, you > create the table using CREATE TABLE AS SELECT or SELECT .. INTO. Then > it doesn't fire. Unless we decide to make those utility commands, > which I think was just recently under discussion, in which case it > will suddenly start firing for those operations. Command triggers were the initial reason for me doing that conversion. > So now something > that is, right now, an essentially an implementation detail which we > can rearrange as we like turns into a user-visible behavior change. > And what if some day we want to change it back? > I think it would be a much better idea to decree from the beginning > that we're trapping the *operation* of creating a table, rather than > the *command* create table. 
Then, the behavior is clear and > well-defined from day one, and it doesn't depend on how we happen to > implement things internally right at the moment. I am with you there. > If there's some way > of creating a table without firing the trigger, it's a bug. If > there's some way of firing the trigger without attempting to create a > table, it's a bug. Again. > That might require some more thought about what > information to pass to the trigger function (e.g. if it's a SELECT .. > INTO, you're not going to have pregenerated SQL that starts with the > words "CREATE TABLE") but the fact that this gives a much clearer > definition to the core feature seems like a big plus in my book. Not sure what you're getting at here? The command tag in Dim's patch is already independent of the form of the actual statement that was run. A big +1 on the general direction of this email. I dislike reducing the scope of command triggers to "top level commands run by the actual user" because imo that would reduce the possible scope of the patch rather much. On the other hand I think delivering a complete patch covering just about anything triggered anywhere is not really realistic. Not sure what the best way to continue is. Andres
Hi, Thank you, Robert, for this continued review, I think we're making good progress in a community discussion that needs to happen! Robert Haas <robertmhaas@gmail.com> writes: >> Given ProcessUtility_hook, how much of an implementation detail rather >> than a public API are we talking about? > > I think that a hook and an SQL command are not on the same level. > Hooks are not documented; SQL commands are. You may, of course, > disagree. Mmmm, yes. I think that we should document hooks, and I'm going to propose soon a pg_extension_module() SRF that lists all currently loaded modules and which extension implements them, and I've begun thinking about what it would take to be able to list what hooks each module is implementing. That would require a hook registration API, and would allow much easier security auditing of a setup you don't know beforehand. So I think that hooks not being documented is an implementation detail here :) > But the basic point is that it seems pretty weird to say, on the one > hand, these are command triggers. They fire when you execute a > command. So the CREATE TABLE trigger will fire when someone says > CREATE TABLE. But then we say, oh, well if you use CREATE SCHEMA foo > CREATE TABLE blah ... we will fire the trigger anyway. Now it's not a Yes, this CREATE SCHEMA <any create command> is weird, and unique, so it's not that difficult to document, I think. And if we fix things by having all subcommands go through ProcessUtility and command triggers, then it's easy to document as the new rule. My documentation idea would then be explaining what is a command and what is a subcommand, and then the current rule (command triggers do not support subcommands) and then the exception to that rule (CREATE SCHEMA subcommands do, in fact, support command triggers). 
If we decide that we should in fact support (nested) subcommands in command triggers, then we will be in a position to prepare a patch implementing that with a documentation change that the rule is now that command triggers do in fact support subcommands. When the command trigger is called on a subcommand, I think you will want both the main command string and the subcommand string, and we have to research if what you need isn't a whole stack of commands, because I'm not sure you can only have 1 level deep nesting here. That would be done with a special API for the trigger functions, and with a specific syntax, because it seems to me that having trigger code that is only ever called on a top-level command is a good thing to have. create trigger foo after command CREATE TABLE … create trigger foo after top level command CREATE TABLE … create trigger foo after nested command CREATE TABLE … Either we add support for that kind of syntax now (and not implementing it in 9.3 would seem weird as hell) or we instruct pg_dump and pg_upgrade to switch from current syntax to the new one (add in the “top level” keywords) when we do implement the feature down the road. > command trigger any more; it's a trigger that fires when you perform a > certain operation - e.g. create a table. Unless, of course, you > create the table using CREATE TABLE AS SELECT or SELECT .. INTO. Then > it doesn't fire. Unless we decide to make those utility commands, > which I think was just recently under discussion, in which case it > will suddenly start firing for those operations. So now something Andres has been sending a complete patch to fix that old oddity of the parser. And again, that's a very narrow exception to the usual state of things, and as such, easy to document in the list of things that don't fall into the general rule. 
Even when fixed, though, you have 3 different command tags to deal with, and we identify the command triggers on command tags, so that a command trigger on CREATE TABLE will not be called when doing SELECT INTO, nor when doing CREATE TABLE AS. Also, I think that the command triggers in 9.2 will not address each and every command that PostgreSQL offers (think “alter operator family”), so that we will need to maintain an exhaustive list of supported commands, identified by command tags. > that is, right now, essentially an implementation detail which we > can rearrange as we like turns into a user-visible behavior change. > And what if some day we want to change it back? Yes, we need to make a decision about that now. Do we want any “operation” to go through ProcessUtility so that hooks and command triggers can get called? I think it's a good idea in the long run, and Alvaro seems to think so too. That entails changing the backend code to build a Statement and then call ProcessUtility on it rather than calling the internal functions directly, and that also means some more DDL are subject to being logged on the server logs. Removing those “cross-modules” #include might be a good thing in the long run though. And we don't have to do that unless we want to make subcommands available to ProcessUtility_hook and command triggers. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
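Dimitri's point about the three command tags means that, under the syntax proposed in this thread, covering every way of ending up with a new table would take one trigger per tag. A hedged sketch (audit_ddl() is a hypothetical function following the patch's four-argument convention, not anything in the patch itself):

```sql
-- One trigger per command tag: SELECT INTO and CREATE TABLE AS carry
-- their own tags, so a CREATE TABLE trigger alone would not see them.
CREATE TRIGGER audit_create_table    AFTER COMMAND CREATE TABLE
    EXECUTE PROCEDURE audit_ddl();
CREATE TRIGGER audit_create_table_as AFTER COMMAND CREATE TABLE AS
    EXECUTE PROCEDURE audit_ddl();
CREATE TRIGGER audit_select_into     AFTER COMMAND SELECT INTO
    EXECUTE PROCEDURE audit_ddl();
```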
On Thursday, December 15, 2011 04:53:15 PM Dimitri Fontaine wrote: > Hi, > > Thank you, Robert, for this continued review, I think we're making good > progress in a community discussion that needs to happen! > > Robert Haas <robertmhaas@gmail.com> writes: > >> Given ProcessUtility_hook, how much of an implementation detail rather > >> than a public API are we talking about? > > > > I think that a hook and an SQL command are not on the same level. > > Hooks are not documented; SQL commands are. You may, of course, > > disagree. > Mmmm, yes. I think that we should document hooks, and I'm going to > propose soon a pg_extension_module() SRF that lists all currently loaded > modules and which extension implements them, and I've begun thinking > about what it would take to be able to list what hooks each module is > implementing. I don't really see that as possible/realistic. > > But the basic point is that it seems pretty weird to say, on the one > > hand, these are command triggers. They fire when you execute a > > command. So the CREATE TABLE trigger will fire when someone says > > CREATE TABLE. But then we say, oh, well if you use CREATE SCHEMA foo > > CREATE TABLE blah ... we will fire the trigger anyway. Now it's not a > > Yes, this CREATE SCHEMA <any create command> is weird, and unique, so > it's not that difficult to document, I think. And if we fix things by > having all subcommands go through ProcessUtility and command triggers, > then it's easy to document as the new rule. I don't think I ever saw that one in real-world DDL scripts ;) > My documentation idea would then be explaining what is a command and > what is a subcommand, and then the current rule (command triggers do not > support subcommands) and then the exception to that rule (CREATE SCHEMA > subcommands do, in fact, support command triggers). 
> If we decide that we should in fact support (nested) subcommands in > command triggers, then we will be in a position to prepare a patch > implementing that with a documentation change that the rule is now that > command triggers do in fact support subcommands. > When the command trigger is called on a subcommand, I think you will > want both the main command string and the subcommand string, and we have > to research if what you need isn't a whole stack of commands, because > I'm not sure you can only have 1 level deep nesting here. I don't think you need that. If needed you will have to build the data structure in $pl. > That would be done with a special API for the trigger functions, and > with a specific syntax, because it seems to me that having a trigger > code that is only ever called on a top-level command is a good thing to > have. > > create trigger foo after command CREATE TABLE … > create trigger foo after top level command CREATE TABLE … > create trigger foo after nested command CREATE TABLE … > > Either we add support for that kind of syntax now (and not implementing > it in 9.3 would seem weird as hell) or we instruct pg_dump and > pg_upgrade to switch from current syntax to the new one (add in the “top > level” keywords) when we do implement the feature down the road. I personally think there should only be one variant which is always called. With a parameter that indicates whether it's a subcommand or not. Why would you ever only want top level commands? > > that is, right now, essentially an implementation detail which we > > can rearrange as we like turns into a user-visible behavior change. > > And what if some day we want to change it back? > Yes, we need to make a decision about that now. Do we want any > “operation” to go through ProcessUtility so that hooks and command > triggers can get called? I personally think that's a good idea for most stuff. I don't see that for alter table subcommands and such though. 
> I think it's a good idea in the long run, and Alvaro seems to be > thinking it is too. That entails changing the backend code to build a > Statement then call ProcessUtility on it rather than calling the > internal functions directly, and that also means some more DDL are > subject to being logged on the server logs. Removing those > “cross-modules” #include might be a good thing in the long run though. Uhm. I don't think building strings is the way to go here. I think building *Stmt nodes is better. Andres
Andres Freund <andres@anarazel.de> writes: >> create trigger foo after command CREATE TABLE … >> create trigger foo after top level command CREATE TABLE … >> create trigger foo after nested command CREATE TABLE … >> >> Either we add support for that kind of syntax now (and not implementing >> it in 9.3 would seem weird as hell) or we instruct pg_dump and >> pg_upgrade to switch from current syntax to the new one (add in the “top >> level” keywords) when we do implement the feature down the road. > I personally think there should only be one variant which is always called. > With a parameter that indicates whether it's a subcommand or not. > > Why would you ever only want top level commands? Because the command "create table foo(id primary key)" could then fire your "index" and "constraint" command triggers twice, and if all you do is store the command string in a table for auditing purposes, maybe you just don't care. >> Yes, we need to make a decision about that now. Do we want any >> “operation” to go through ProcessUtility so that hooks and command >> triggers can get called? > I personally think that's a good idea for most stuff. > > I don't see that for alter table subcommands and such though. By subcommand, I mean any operation that a main command does for you and that you could have done manually with another command, such as serial and sequences, primary key and its alter table form, embedded checks or not null and its alter table form, etc. I don't remember ALTER TABLE implementing facilities that in turn call another command for you? >> I think it's a good idea in the long run, and Alvaro seems to be >> thinking it is too. That entails changing the backend code to build a >> Statement then call ProcessUtility on it rather than calling the >> internal functions directly, and that also means some more DDL are >> subject to being logged on the server logs. Removing those >> “cross-modules” #include might be a good thing in the long run though. 
> Uhm. I don't think building strings is the way to go here. I think building > *Stmt nodes is better. Agreed, I meant that exactly. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
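To make the subcommand distinction being debated here concrete, consider the statement Dimitri mentions; the comments describe stock PostgreSQL behavior, not anything the patch adds:

```sql
-- A single top-level command that performs several operations you could
-- otherwise have run as separate commands:
CREATE TABLE foo (id serial PRIMARY KEY);
-- Behind the scenes this creates a sequence for the serial column and a
-- unique index for the primary key. With "nested" command triggers those
-- internal steps could fire CREATE SEQUENCE and CREATE INDEX triggers as
-- well, while a "top level" trigger would fire once, for CREATE TABLE.
```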
On Thu, Dec 15, 2011 at 05:46:05PM +0100, Dimitri Fontaine wrote: > Andres Freund <andres@anarazel.de> writes: > >> Yes, we need to make a decision about that now. Do we want any > >> “operation” to go through ProcessUtility so that hooks and command > >> triggers can get called? > > I personally think thats a good idea for most stuff. > > > > I don't see that for alter table subcommands and such though. > > By subcommand, I mean any operation that a main command does for you and > that you could have done manually with another command, such as > serial and sequences, primary key and its alter table form, embedded > checks or not null and its alter table form, etc. > > I don't remember that ALTER TABLE implement facilities that are in turn > calling another command for you? When ALTER TABLE ALTER TYPE changes an indexed column, it updates the index by extracting its SQL definition, dropping it, and running that SQL. (Not sure whether it passes through the code paths to hit your trigger.) We ought not to promise that ALTER TABLE will always do so. By comparison, we could implement ALTER TABLE SET NOT NULL on an inheritance tree as several ALTER TABLE ONLY, but we don't implement it that way. Triggering on top-level commands can be quite well-defined; triggers on subcommands (as you have defined the term) will risk firing at different times across major versions. That may just be something to document. I think we'll need a way to tell SPI whether a command should be considered a subcommand or not. A PL/pgSQL function that calls CREATE TABLE had better fire a trigger, but some SPI usage will qualify as subcommands. nm
On Thu, Dec 15, 2011 at 10:53 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Yes, we need to make a decision about that now. Do we want any > “operation” to go through ProcessUtility so that hooks and command > triggers can get called? No. That's 100% backwards. We should first decide what functionality we want, and then decide how we're going to implement it. If we arbitrarily decide to force everything that someone might want to write a trigger on through ProcessUtility_hook, then we're going to end up being unable to add triggers for some things because they can't be conveniently forced through ProcessUtility, or else we're going to end up contorting the code in bizarre ways because we've drawn some line in the sand that ProcessUtility is the place where triggers have to get called. In doing this project, I think we should pay a lot of attention to the lessons that have been learned developing sepgsql. I can certainly understand if your eyes roll back in your head when you hear that, because that project has been exceedingly long and difficult and shows no sign of reaching its full potential for at least another few release cycles. But I think it's really worth the effort to avoid pounding our heads against the brick wall twice. Two things that leap out at me in this regard are: (1) It's probably a mistake to assume that we only need one interface. sepgsql has several hooks, and will need more before the dust settles. We have one hook for checking permissions on table names that appear in DML queries, a second for applying security labels just after a new SQL object is created, and a third for adjusting the security context when an sepgsql trusted procedure is invoked. In a similar way, I think it's probably futile to think that we can come up with a one-size-fits-all interface where every command (or operation) trigger can accept the same arguments. 
CREATE TABLE is going to want to know different stuff than LOCK TABLE or ALTER OPERATOR FAMILY, and trigger writers are going to want a *convenient* API to that information, not just the raw query text. (2) It's almost certainly a mistake to assume that everything you want to trigger on is a command. For example, somebody might want to get control whenever a column gets added to a table, either at table create time or later. I don't think most of us would consider CREATE TABLE bob (a int, b int) to be a create-a-table-with-no-columns operation plus two add-a-column-to-a-table operations. But being able to enforce a column naming policy is no less useful than being able to enforce a table naming policy, and there are other things you might want to do there as well (logging, updating metadata, prohibiting use of certain types, etc.). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > No. That's 100% backwards. We should first decide what functionality > we want, and then decide how we're going to implement it. If we > arbitrarily decide to force everything that someone might want to > write a trigger on through ProcessUtility_hook, then we're going to > end up being unable to add triggers for some things because they can't > be conveniently forced through ProcessUtility, or else we're going to > end up contorting the code in bizarre ways because we've drawn some > line in the sand that ProcessUtility is the place where triggers have > to get called. In theory you're right. In practice, we're talking about utility command triggers, which fire on a top-level command. We're now broadening the discussion to what to do with sub-commands, that is things done by a command as part of its implementation but that you could have done separately with another finer grained dedicated top-level command. I'm not wanting to implement a general “event” trigger mechanism where anyone can come and help define the set of events, and I think what you're talking about now amounts to doing exactly that. Here again, trying to generalize before we have anything useful is a recipe for failure. I concur that “Process Utility Top-Level Only Command Triggers” is a pretty limited feature in scope, yet that's what I want to obtain here, and I think it's useful enough on its own. If you disagree, please propose a user level scheme where we can fit the work I'm doing so that I can adapt my patch and implement a part of your scheme in a future-proof way. I'm ready to do that even when I have no need for what you're talking about, if that's what it takes. > (1) It's probably a mistake to assume that we only need one interface. [... useful sepgsql history ...] > trigger can accept the same arguments. 
CREATE TABLE is going to want > to know different stuff than LOCK TABLE or ALTER OPERATOR FAMILY, and > trigger writers are going to want a *convenient* API to that > information, not just the raw query text. Are you familiar with the nodeToString() output? That's close to a full-blown XML, JSON or YAML document that you can easily use from a program. Command trigger functions are not given just raw text. When trying to define a more complex API along the lines of what you're referencing here, back at the Cluster Hackers Developer Meeting, the conclusion was that the nodeToString() output is all you need. Having written both code that produces it and code that consumes it, Jan has been very clear that you can hardly do better than that while still being generic enough. > to enforce a column naming policy is no less useful than being able to > enforce a table naming policy, and there are other things you might > want to do there as well (logging, updating metadata, prohibiting use > of certain types, etc.). Are you familiar with the nodeToString() output? What makes you think that the use cases you propose here are hard to implement in a command trigger when given this parse tree string? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
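For readers unfamiliar with the format under discussion, nodeToString() output for a simple statement looks roughly like the sketch below. This is an abridged illustration only; the actual node tags, fields and layout differ between major versions, which is exactly the stability question the thread turns to later:

```sql
-- Abridged, illustrative nodeToString() output for:
--   CREATE TABLE foo (id integer)
-- (real field names and contents vary across major versions)
--
-- {CREATESTMT :relation {RANGEVAR :schemaname <> :relname foo ...}
--  :tableElts ({COLUMNDEF :colname id :typeName {TYPENAME ...} ...})
--  :constraints <> :options <>}
```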
Dimitri Fontaine wrote: > Here again, trying to generalize before we have anything useful is a > recipe for failure. I concur that “Process Utility Top-Level Only > Command Triggers” is a pretty limited feature in scope, yet that's what > I want to obtain here, and I think it's useful enough on its own. > > If you disagree, please propose a user level scheme where we can fit the > work I'm doing so that I can adapt my patch and implement a part of your > scheme in a future proof way. I'm ready to do that even when I have no > need for what you're talking about, if that's what it takes. We have a big user community and what _you_ want for this feature is only a small part of our decision on what is needed. Robert's concern that this might not be useful enough for the general use-cases people want is a legitimate, if difficult to hear, analysis. The resistance we get when removing features is sobering. Imagine the worst case, e.g. xml2, where we rightly implement it later, but there is something that is better done with the old interface, and we end up having to support both. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > We have a big user community and what _you_ want for this feature is > only a small part of our decision on what is needed. Robert's concern > that this might not be useful enough for the general use-cases people > want is a legitimate, if difficult to hear, analysis. Agreed, his concern is legitimate. Now, I've never been trying to implement a generic event trigger system and I don't know what it would take to implement such a beast. Transaction BEGIN, COMMIT, and ROLLBACK triggers anyone? not me :) Exploring if my proposal would be a pain to maintain once we have a fully generic event trigger system someday is legitimate; asking me to design both the generic event system and the command triggers so that they fit in is just asking for too much. The main part of my answer, though, is that all the more complex use cases involving command triggers that Robert is offering are in fact possible to implement with what my patch is providing, as soon as you're ok with understanding the content and format of the nodeToString() output. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine wrote: > Bruce Momjian <bruce@momjian.us> writes: > > We have a big user community and what _you_ want for this feature is > > only a small part of our decision on what is needed. Robert's concern > > that this might not be useful enough for the general use-cases people > > want is a legitimate, if difficult to hear, analysis. > > Agreed, his concern is legitimate. Now, I've never been trying to > implement a generic event trigger system and I don't know what it would > take to implement such a beast. > > Transaction BEGIN, COMMIT, and ROLLBACK triggers anyone? not me :) > > Exploring if my proposal would be a pain to maintain once we have a fully > generic event trigger system someday is legitimate, asking me to design > both the generic event system and the command triggers so that they fit > in is just asking for too much. Agreed. I am not against this addition, just pointing out that we have to be careful. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > The main part of my answer, though, is that all the more complex use > cases involving command triggers that Robert is offering are in fact > possible to implement with what my patch is providing, as soon as you're > ok with understanding the content and format of the nodeToString() > output. Hmm ... I don't think that I *am* ok with that. ISTM that we'd then find ourselves with any changes in utility statement parse trees amounting to a user-visible API break, and that's not an acceptable situation. We already have this issue of course with respect to C-code add-ons, but (1) we've established an understanding that people should have to recompile those for every major release, and (2) changes such as adding a new field, or even changing an existing field that you don't care about, don't break C source code. I don't know exactly what you're imagining that user-written triggers would do with nodeToString strings, but I'd bet a good lunch that people will use ad-hoc interpretation methods that are not robust against changes at all. And then they'll blame us when their triggers break --- not unreasonably, because we failed to provide a sane API for them to use. We really need some higher-level API than the raw parse tree, and I have to admit that I have no idea what that would look like. But exposing parse trees to user-written triggers is a decision that we will come to regret, probably as soon as the next release. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Hmm ... I don't think that I *am* ok with that. ISTM that we'd then > find ourselves with any changes in utility statement parse trees > amounting to a user-visible API break, and that's not an acceptable > situation. Oh, you mean like exposing the parser for syntax coloring etc. I failed to see that it's the same case. Do we have an acceptable proposal on that front yet? > We already have this issue of course with respect to C-code add-ons, > but (1) we've established an understanding that people should have to > recompile those for every major release, and (2) changes such as adding > a new field, or even changing an existing field that you don't care > about, don't break C source code. I don't know exactly what you're > imagining that user-written triggers would do with nodeToString strings, > but I'd bet a good lunch that people will use ad-hoc interpretation > methods that are not robust against changes at all. And then they'll > blame us when their triggers break --- not unreasonably, because we > failed to provide a sane API for them to use. Could we offer people a sane API? Another way could be to bypass BEFORE triggers and let people look at the catalogs in the AFTER command trigger, and give them the object oid, name and schemaname for them to do their lookups. You can still RAISE EXCEPTION in an AFTER command trigger to cancel the command execution; what you can no longer do is cancel the command without killing the current transaction. > We really need some higher-level API than the raw parse tree, and > I have to admit that I have no idea what that would look like. > But exposing parse trees to user-written triggers is a decision > that we will come to regret, probably as soon as the next release. I was under the illusion that providing users with ready to tweak examples of robust-against-changes code would cut it. I'd like the command triggers patch not to depend on designing this API we need. 
What do you think of removing the parsetree and the BEFORE trigger support (so that the trigger function can query the catalogs)? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
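A sketch of what this reduced proposal would leave trigger authors with: an AFTER trigger that gets the object's OID, schema and name, and does its own catalog lookups. The function signature and policy here are assumptions for illustration, not the patch's actual API:

```sql
-- Hypothetical AFTER command trigger under the reduced proposal: no
-- parse tree, just enough identity to query the catalogs ourselves.
CREATE FUNCTION deny_unlogged_tables(objectid oid, schemaname text,
                                     objectname text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_class
                WHERE oid = objectid AND relpersistence = 'u') THEN
        -- with BEFORE triggers gone, cancelling the command means
        -- aborting the whole transaction, as noted above
        RAISE EXCEPTION 'unlogged tables are not allowed here';
    END IF;
END;
$$;
```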
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> We really need some higher-level API than the raw parse tree, and >> I have to admit that I have no idea what that would look like. >> But exposing parse trees to user-written triggers is a decision >> that we will come to regret, probably as soon as the next release. > I was under the illusion that providing users with ready to tweak > examples of robust-against-changes code would cut it. I'd like the > command triggers patch not to depend on designing this API we need. Well, we don't have any such examples, because frankly the nodeToString representation is pretty damn unfriendly. The only code we have that does anything with it at all is the readfuncs.c code that turns it back into trees of C structs, and that's no help for triggers not themselves written in C. Besides which, readfuncs.c is the poster child for code that does have to change every time we tweak the struct definitions. We can't tell people to copy that approach. > What do you think of removing the parsetree and the BEFORE trigger > support (so that trigger function can query the catalogs)? Well, it gets us out of the business of inventing a suitable API, but I think it also reduces the feature to a point of near uselessness. Essentially we'd be saying to trigger authors "something changed, feel free to inspect the catalogs and see if you can guess what". Just because the problem is hard doesn't mean you can get away with not solving it. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Well, we don't have any such examples, because frankly the nodeToString > representation is pretty damn unfriendly. The only code we have that I tend to agree here, but I know that Jan is convincing enough when he's saying that it is in fact very friendly. > does anything with it at all is the readfuncs.c code that turns it back > into trees of C structs, and that's no help for triggers not themselves > written in C. Besides which, readfuncs.c is the poster child for code > that does have to change every time we tweak the struct definitions. > We can't tell people to copy that approach. Providing the same nested C structs thingy in python or perl or tcl might be feasible and not too sketchy to maintain, but I'm failing to see how to even approach that for plpgsql. >> What do you think of removing the parsetree and the BEFORE trigger >> support (so that trigger function can query the catalogs)? > > Well, it gets us out of the business of inventing a suitable API, > but I think it also reduces the feature to a point of near uselessness. Not being generic and flexible is not the same thing as not being of any use at all. Extension whitelisting is still possible to implement because all you need to know is the extension's name, then you choose to let the command string you're given execute or not. Same with replication or simple auditing cases, you still have the plain command string to play with. Not useful enough to be what we ship in 9.2, I can follow you there; not useful at all, disagreed. > Essentially we'd be saying to trigger authors "something changed, feel > free to inspect the catalogs and see if you can guess what". No, we'd also be providing the main OID of the object that changed (a pg_class entry for a CREATE TABLE command, etc), the object name and its schema name too. And the command string too. ALTER TABLE is still difficult to handle, other more simple commands might be ok. 
> Just because the problem is hard doesn't mean you can get away with > not solving it. That is the single simplest way of handling it, though, so I had to try that first. Now, maybe we can find the right approach to publishing the parse tree this month still. Any ideas welcome! I guess XML would be ok but we don't embed powerful enough tools, and JSON might be perfect but we would need to have a full blown datatype and functions to work with that from plpgsql. What other tree-ish data type can we have? EXPLAIN is already able to spit out XML and JSON (and YAML) but the typical client consuming that output is not running as a backend stored procedure, so I guess that's not a precedent and we still need something with a good support (type, operators, walking functions…) to back it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
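The extension whitelisting case mentioned above really does need only the object name. A sketch using the four-argument calling convention from the original proposal (the function and the whitelist contents are made up for illustration):

```sql
CREATE FUNCTION whitelist_extensions(cmd_string text, cmd_nodestring text,
                                     schemaname text, objectname text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- only the object name is needed: no parse tree required
    IF objectname NOT IN ('hstore', 'pg_trgm', 'plpgsql') THEN
        RAISE EXCEPTION 'extension % is not whitelisted', objectname;
    END IF;
END;
$$;
CREATE TRIGGER whitelist_extensions BEFORE COMMAND CREATE EXTENSION
    EXECUTE PROCEDURE whitelist_extensions();
```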
On Sun, Dec 18, 2011 at 5:11 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > I guess XML would be ok but we don't embed powerful enough tools, and > JSON might be perfect but we would need to have a full blown datatype > and functions to work with that from plpgsql. What other tree-ish data > type can we have? > > EXPLAIN is already able to spit out XML and JSON (and YAML) but the > typical client consuming that output is not running as a backend stored > procedure, so I guess that's not a precedent and we still need something > with a good support (type, operators, walking functions…) to back it. Right. If we're actually going to expose the parse tree, I think JSON (or even XML) would be a far better way to expose that than the existing nodeToString() output. Sure, you could make do with the nodeToString() output for some things, especially in PL/perl or PL/python. But JSON would be far better, since it's a standard format rather than something we just made up, and could be used in PL/pgsql as well, given proper support functions. Another option would be to do something like this: CREATE TYPE pg_trigger_on_create_table AS ( catalog_name text, schema_name text, relation_name text, ... ); That's not materially different from exposing the parse tree, but it's more convenient for PL/pgsql and doesn't require adding a new datatype like JSON. It might require an awful lot of tuple-construction code and datatype definitions, though. Still another option would be to expose some of this information through "magical" variables or functions, sort of like the way that declaring a function to return trigger causes it to have NEW and OLD. It could have STUFF.catalog_name, STUFF.schema_name, STUFF.relation_name, or whatever we want. None of these approaches really get around the fact that if the command syntax changes, the trigger API has to change, too. 
You might be able to get around that for CREATE commands by having only AFTER triggers, and just passing the OID; and for DROP commands by having only BEFORE triggers, and just passing the OID. But I don't see any way to make it work very well for ALTER commands. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/18/2011 09:02 PM, Robert Haas wrote: > If we're actually going to expose the parse tree, I think JSON > (or even XML) would be a far better way to expose that than the > existing nodeToString() output. Sure, you could make do with the > nodeToString() output for some things, especially in PL/perl or > PL/python. But JSON would be far better, since it's a standard format > rather than something we just made up, and could be used in PL/pgsql > as well, given proper support functions. That seems pretty sensible; I wouldn't want to make it a hard requirement though. There are three major ways this could go for 9.2: 1) Nothing is changed in core, the best that can be done is whatever you can cram into an extension. 2) 9.2 is released with Command Triggers using nodeToString() output as the detailed description. That part is labeled "experimental and the API is expected to change completely in the next release" 3) 9.2 gets enough JSON support to also have an early nodeToJSON API instead. Now it's labeled "early release and some API changes may be needed in future releases". From the perspective of enabling a developer community to spring up around working in this area, I don't see a huge difference between (2) and (3). We're going in a direction I don't think is very well explored yet, by anyone. The idea that we're going to learn enough to accumulate a comprehensive, stable API before release is rather optimistic. There's something to be said for Dimitri's suggested approach from the perspective of "ship it knowing it works for features A, B, then let's see what else the users think to do with it". We can't determine what feature complete looks like from what other people are doing anymore; the only way to find out is to see what the world at large thinks to do with it after release. Think of it as being like the EXPLAIN plan output. That shipped as just text, programs popped up to parse it anyway, they suffered some breaks with each new release. That was still a major win. 
Then new, easier-to-parse formats appeared, making the bar to entry for writing such tools much lower. And the construction of a better API for output benefited from seeing what people had actually been struggling with before then. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Excerpts from Dimitri Fontaine's message of Sun Dec 18 16:54:11 -0300 2011: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > > Hmm ... I don't think that I *am* ok with that. ISTM that we'd then > > find ourselves with any changes in utility statement parse trees > > amounting to a user-visible API break, and that's not an acceptable > > situation. > > Oh, you mean like exposing the parser for syntax coloring etc. I failed > to see it's the same case. Do we have an acceptable proposal on that > front yet? The conclusion that was reached during developer's meeting was that those interested should use a technique similar to the one used by the ecpg parser, namely use some sort of tool to transform the gram.y source code into something else (different productions). That idea is not useful to you here, I'm afraid. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith <greg@2ndquadrant.com> wrote: > That seems pretty sensible; I wouldn't want to make it a hard requirement > though. There are three major ways this could go for 9.2: > > 1) Nothing is changed in core, the best that can be done is whatever you can > cram into an extension. > > 2) 9.2 is released with Command Triggers using nodeToString() output as the > detailed description. That part is labeled "experimental and the API is > expected to change completely in the next release" > > 3) 9.2 gets enough JSON support to also have an early nodeToJSON API > instead. Now it's labeled "early release and some API changes may be needed > in future releases". > > From the perspective of enabling a developer community to spring up around > working in this area, I don't see a huge difference between (2) and (3). I do. Anyone coding in PL/pgsql is going to find the nodeToString() output unusable, and we can easily provide a built-in JSON datatype with enough richness to make that problem go away in time for 9.2. People in PL/python and PL/perl may be a bit better off, but I see no reason to ship something for 9.2 and then break it for 9.3 when we could perfectly well make that compatibility break before the release. (And, in case it's not clear, yes, I am volunteering to do the work, if it comes down to that.) But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. 
On the flip side, I don't really see any other way to make this feature work at all. I think that AFTER CREATE triggers and BEFORE DROP triggers could potentially be implemented by just passing OIDs in, and that might be useful enough for many people. But what about ALTER? I don't see that you're going to be able to write any sort of meaningful triggers around ALTER without passing at least some of the parse tree information to the trigger function. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On Monday, December 19, 2011 05:12:13 PM Robert Haas wrote: > On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith <greg@2ndquadrant.com> wrote: > > That seems pretty sensible; I wouldn't want to make it a hard requirement > > though. There are three major ways this could go for 9.2: > > > > 1) Nothing is changed in core, the best that can be done is whatever you > > can cram into an extension. > > > > 2) 9.2 is released with Command Triggers using nodeToString() output as > > the detailed description. That part is labeled "experimental and the > > API is expected to change completely in the next release" > > > > 3) 9.2 gets enough JSON support to also have an early nodeToJSON API > > instead. Now it's labeled "early release and some API changes may be > > needed in future releases". > > > > From the perspective of enabling a developer community to spring up > > around working in this area, I don't see a huge difference between (2) > > and (3). > > I do. Anyone coding in PL/pgsql is going to find the nodeToString() > output unusable, and we can easily provide a built-in JSON datatype > with enough richness to make that problem go away in time for 9.2. > People in PL/python and PL/perl may be a bit better off, but I see no > reason to ship something for 9.2 and then break it for 9.3 when we > could perfectly well make that compatibility break before the release. > (And, in case it's not clear, yes, I am volunteering to do the work, > if it comes down to that.) I personally think you're underestimating the complexity of providing a sensible JSON compatibility shim on top of the nodestring representation. But I would like to be proven wrong ;) What's your idea? > But before we get bogged down in the data representation, I think we > need to make a more fundamental decision: to what extent are we OK > with exporting the parse tree at all, in any form? 
Tom is arguing > that we shouldn't, and I see his point: the recent DROP command rework > would have broken everybody's command triggers if we had adopted this > proposal, and that would be a real shame, because I don't particularly > like the idea that we can't continue to improve the code and refactor > things because someone out there might be depending on an older and > less well-considered behavior. I don't see any realistic way to present the data in a way that's abstracted from the internals for now. The infrastructure is completely new and we don't really know what it will be used for. I personally have no problem requiring that any nontrivial nodestring access has to be done via C functions for now. Building a completely new form of parse tree seems way too much effort/flamebait for me. Andres
2011/12/19 Robert Haas <robertmhaas@gmail.com>: > On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith <greg@2ndquadrant.com> wrote: >> That seems pretty sensible; I wouldn't want to make it a hard requirement >> though. There are three major ways this could go for 9.2: >> >> 1) Nothing is changed in core, the best that can be done is whatever you can >> cram into an extension. >> >> 2) 9.2 is released with Command Triggers using nodeToString() output as the >> detailed description. That part is labeled "experimental and the API is >> expected to change completely in the next release" >> >> 3) 9.2 gets enough JSON support to also have an early nodeToJSON API >> instead. Now it's labeled "early release and some API changes may be needed >> in future releases". >> >> From the perspective of enabling a developer community to spring up around >> working in this area, I don't see a huge difference between (2) and (3). > > I do. Anyone coding in PL/pgsql is going to find the nodeToString() > output unusable, and we can easily provide a built-in JSON datatype > with enough richness to make that problem go away in time for 9.2. > People in PL/python and PL/perl may be a bit better off, but I see no > reason to ship something for 9.2 and then break it for 9.3 when we > could perfectly well make that compatibility break before the release. > (And, in case it's not clear, yes, I am volunteering to do the work, > if it comes down to that.) Absolutely. Parsing a document is not a good way to do it from plpgsql. We could instead enhance a diagnostics statement with trigger values. There could be a lot of variables, and it would be cheap because we would only fetch the content when it is requested: STATEMENT_NAME, OBJECT_SCHEMA, OBJECT_NAME, OBJECT_TYPE, OBJECT_OID, ATTRIBUTE_NAME, ATTRIBUTE_OID, ATTRIBUTE_TYPE, STATEMENT_PARAMETERS_ARRAY. plpgsql is not good with recursive data; it would be nice if all the necessary data could be flat. 
Regards Pavel > > But before we get bogged down in the data representation, I think we > need to make a more fundamental decision: to what extent are we OK > with exporting the parse tree at all, in any form? Tom is arguing > that we shouldn't, and I see his point: the recent DROP command rework > would have broken everybody's command triggers if we had adopted this > proposal, and that would be a real shame, because I don't particularly > like the idea that we can't continue to improve the code and refactor > things because someone out there might be depending on an older and > less well-considered behavior. > > On the flip side, I don't really see any other way to make this > feature work at all. I think that AFTER CREATE triggers and BEFORE > DROP triggers could potentially be implemented by just passing OIDs > in, and that might be useful enough for many people. But what about > ALTER? I don't see that you're going to be able to write any sort of > meaningful triggers around ALTER without passing at least some of the > parse tree information to the trigger function. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
On Mon, Dec 19, 2011 at 11:20 AM, Andres Freund <andres@anarazel.de> wrote: >> I do. Anyone coding in PL/pgsql is going to find the nodeToString() >> output unusable, and we can easily provide a built-in JSON datatype >> with enough richness to make that problem go away in time for 9.2. >> People in PL/python and PL/perl may be a bit better off, but I see no >> reason to ship something for 9.2 and then break it for 9.3 when we >> could perfectly well make that compatibility break before the release. >> (And, in case it's not clear, yes, I am volunteering to do the work, >> if it comes down to that.) > I personally think you're underestimating the complexity of providing a > sensible JSON compatibility shim on top of the nodestring representation. But I > would like to be proven wrong ;) > What's your idea? I haven't gotten that far down into the minutiae yet. :-) But the reason I think it won't be too bad is because the current representation is darn close to JSON already: {VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 2 :varoattno 1 :location 9998} => {"_":"VAR","varno":2,"varattno":1,"vartype":25,"vartypmod":-1,"varcollid":100,"varlevelsup":0,"varnoold":2,"varoattno":1,"location":9998} If you've got something like: {OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ({VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 2 :varoattno 1 :location 9998} {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 10009}) :location 10007} ...then the value for the "args" label will just be a list of objects rather than an integer. >> But before we get bogged down in the data representation, I think we >> need to make a more fundamental decision: to what extent are we OK >> with exporting the parse tree at all, in any form? 
Tom is arguing >> that we shouldn't, and I see his point: the recent DROP command rework >> would have broken everybody's command triggers if we had adopted this >> proposal, and that would be a real shame, because I don't particularly >> like the idea that we can't continue to improve the code and refactor >> things because someone out there might be depending on an older and >> less well-considered behavior. > I don't see any realistic way to present the data in a way that's abstracted from > the internals for now. The infrastructure is completely new and we don't > really know what it will be used for. That's my feeling as well, but I'm hoping Tom or someone else has a better idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
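Robert's claim above, that the nodeToString() output is "darn close to JSON already", can be made concrete with a small converter sketch. This is not PostgreSQL code, just an illustration of how mechanical the mapping is; it handles only the integer, boolean, sub-node, and list cases that appear in his examples, and deliberately ignores quoted strings, floats, and the other special tokens the real format contains.

```python
import json
import re

def tokenize(s):
    # Braces and parens are structural; ":name" introduces a field;
    # anything else (node tags, numbers, booleans) is a bare atom.
    return re.findall(r'[{}()]|:[A-Za-z_]\w*|[^\s{}()]+', s)

def parse(tokens, i=0):
    tok = tokens[i]
    if tok == '{':
        node = {"_": tokens[i + 1]}          # node tag stored under "_"
        i += 2
        while tokens[i] != '}':
            field = tokens[i][1:]            # strip the leading ':'
            node[field], i = parse(tokens, i + 1)
        return node, i + 1
    if tok == '(':
        items, i = [], i + 1                 # parenthesized list -> JSON array
        while tokens[i] != ')':
            item, i = parse(tokens, i)
            items.append(item)
        return items, i + 1
    if tok in ('true', 'false'):             # booleans
        return tok == 'true', i + 1
    try:
        return int(tok), i + 1               # integers, including negatives
    except ValueError:
        return tok, i + 1                    # anything else stays a string

def node_to_json(s):
    value, _ = parse(tokenize(s))
    return json.dumps(value)
```

Fed the VAR example above, this produces the JSON object Robert shows (modulo whitespace), and for the OPEXPR example the args list comes out as an array of nested objects.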
Andres Freund <andres@anarazel.de> writes: > On Monday, December 19, 2011 05:12:13 PM Robert Haas wrote: >> I do. Anyone coding in PL/pgsql is going to find the nodeToString() >> output unusable, and we can easily provide a built-in JSON datatype >> with enough richness to make that problem go away in time for 9.2. I hear the sound of goalposts moving. > I personally think you're underestimating the complexity of providing a > sensible JSON compatibility shim on top of the nodestring representation. But I > would like to be proven wrong ;) If we were going to do this at all, the way to do it is to flush the existing nodestring representation and redefine it as being JSON. No? If this is as easy as people are claiming, it should not be hard to revise the lower-level bits of outfuncs/readfuncs to make the text representation compatible. And there's no reason we can't change what is stored in pg_rewrite. >> But before we get bogged down in the data representation, I think we >> need to make a more fundamental decision: to what extent are we OK >> with exporting the parse tree at all, in any form? Tom is arguing >> that we shouldn't, and I see his point: the recent DROP command rework >> would have broken everybody's command triggers if we had adopted this >> proposal, and that would be a real shame, because I don't particularly >> like the idea that we can't continue to improve the code and refactor >> things because someone out there might be depending on an older and >> less well-considered behavior. The problem that changing the nodestring representation could help with is making user-written triggers roughly as robust as C code is, to wit that you don't have to change it as long as the specific fields it touches aren't redefined. The question is whether that's good enough. The DROP command changes provide a pretty strong clue that it isn't. Admittedly, that's not the sort of change we make too often. 
But I will be seriously annoyed if we start getting the sort of pushback on parsetree changes that we've been hearing from certain quarters about configuration file changes. Those structures are *internal* and we have got to have the flexibility to whack them around. regards, tom lane
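Tom's robustness criterion, that a trigger keeps working as long as the specific fields it touches aren't redefined, is easy to picture with a generic walk over a hypothetical JSON rendering of a parse tree. The node tags and field names below are purely illustrative assumptions, not PostgreSQL's actual ones:

```python
import json

# A made-up JSON rendering of a CREATE TABLE parse-tree fragment;
# "CreateStmt", "RangeVar", "relname" etc. are illustrative, not
# guaranteed to match the real node/field names.
parse_tree = json.loads("""
{"_": "CreateStmt",
 "relation": {"_": "RangeVar", "schemaname": "public", "relname": "foo"},
 "if_not_exists": false}
""")

def find_nodes(node, tag):
    """Collect every sub-node whose "_" tag matches, at any depth."""
    found = []
    if isinstance(node, dict):
        if node.get("_") == tag:
            found.append(node)
        for value in node.values():
            found.extend(find_nodes(value, tag))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_nodes(item, tag))
    return found

# A trigger that only ever reads "relname" keeps working when other
# fields are added, dropped, or reordered in a later release; it only
# breaks when "relname" itself is redefined.
relname = find_nodes(parse_tree, "RangeVar")[0]["relname"]
```

The point of the generic walk is that the consumer never depends on field positions or on the complete set of fields, which is exactly the field-level compatibility contract being debated here.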
On 12/19/2011 11:12 AM, Robert Haas wrote: > But before we get bogged down in the data representation, I think we > need to make a more fundamental decision: to what extent are we OK > with exporting the parse tree at all, in any form? Tom is arguing > that we shouldn't, and I see his point: the recent DROP command rework > would have broken everybody's command triggers if we had adopted this > proposal, and that would be a real shame, because I don't particularly > like the idea that we can't continue to improve the code and refactor > things because someone out there might be depending on an older and > less well-considered behavior. Any interface here would need to be unstable in the same sense Linux uses the term: subject to change in every major version, and maybe even in a minor one if that's the best way to solve a higher priority issue. An example we've been consuming that comes to mind is the "API" for keeping processes from being killed by the OOM killer. It's far from stable: http://archives.postgresql.org/message-id/4CE5E437.7080902@2ndquadrant.com but it's still possible for users of it to keep up with new releases, and when feasible some work toward backward compatibility is done (but not always). As a tool author, I would expect that anything working at the level where the needed data is only available from the parse tree would need to be re-tested against each new version, and then have version specific changes as needed. Setting the expectations bar any higher for consumers of that interface would be unrealistic. The minority of people who'd like to use this feature shouldn't want to see PostgreSQL development hamstrung for the majority either, and the standards for breakage here should be clear from the beginning--unlike those for, say, GUC changes between releases. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Mon, Dec 19, 2011 at 12:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I personally think youre underestimating the complexity of providing a >> sensible json compatibility shim ontop the nodestring representation. But I >> would like to be proven wrong ;) > > If we were going to do this at all, the way to do it is to flush the > existing nodestring representation and redefine it as being JSON. No? > If this is as easy as people are claiming, it should not be hard to > revise the lower-level bits of outfuncs/readfuncs to make the text > representation compatible. And there's no reason we can't change > what is stored in pg_rewrite. I thought about that. A quick inspection suggests that this would slightly increase the size of the stored rules, because the node type would need to become a key, which would add at least a few more characters, and also because we'd need more quoting. That is: {THING :wump 1} becomes, at a minimum: {"_": "THING", "wump": 1} And that's using a somewhat-lame single character label for the node tag. Now, I suspect that in practice the cost of the stored rules becoming slightly larger is negligible, so maybe it's worth it. >>> But before we get bogged down in the data representation, I think we >>> need to make a more fundamental decision: to what extent are we OK >>> with exporting the parse tree at all, in any form? Tom is arguing >>> that we shouldn't, and I see his point: the recent DROP command rework >>> would have broken everybody's command triggers if we had adopted this >>> proposal, and that would be a real shame, because I don't particularly >>> like the idea that we can't continue to improve the code and refactor >>> things because someone out there might be depending on an older and >>> less well-considered behavior. 
> > The problem that changing the nodestring representation could help with > is making user-written triggers roughly as robust as C code is, to wit > that you don't have to change it as long as the specific fields it > touches aren't redefined. The question is whether that's good enough. Agreed. > The DROP command changes provide a pretty strong clue that it isn't. > Admittedly, that's not the sort of change we make too often. But I > will be seriously annoyed if we start getting the sort of pushback > on parsetree changes that we've been hearing from certain quarters about > configuration file changes. Those structures are *internal* and we have > got to have the flexibility to whack them around. Yes. Maybe we should try to split the baby here and defer the question of whether to expose any of the parse tree internals, and if so how much, to a future release. It seems to me that we could design a fairly useful set of functionality around AFTER-CREATE, BEFORE-DROP, and maybe even AFTER-ALTER triggers without exposing any parse tree details. For CREATE and ALTER, that would make it the client's responsibility to inspect the system catalogs and figure out what had happened and what to do about it, which is admittedly not ideal, but it would be more than we have now, and it would then give us the option to consider requests to expose more information in future releases on a case-by-case basis, rather than making a blanket decision about whether to expose the parse tree or not. I have a sneaking suspicion that, while we probably can't get by without exposing any parse tree information ever, the amount we truly need to expose might end up being only a small percentage of the total... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Robert Haas <robertmhaas@gmail.com> writes: > Maybe we should try to split the baby here and defer the question of > whether to expose any of the parse tree internals, and if so how much, > to a future release. It seems to me that we could design a fairly > useful set of functionality around AFTER-CREATE, BEFORE-DROP, and > maybe even AFTER-ALTER triggers without exposing any parse tree > details. +1 Also remember that you have a “normalized” command string to play with. Lots of use cases are already ok here. The other ones would need a tree representation that's easy to consume, which in the current state of affairs (I saw no progress on the JSON data type and facilities) is very hard to imagine when you consider PLpgSQL. So unless I hear about a show stopper, I'm going to work some more on the command trigger patch where I still had some rough edges to polish. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes: > Maybe we should try to split the baby here and defer the question of > whether to expose any of the parse tree internals, and if so how much, > to a future release. It seems to me that we could design a fairly > useful set of functionality around AFTER-CREATE, BEFORE-DROP, and > maybe even AFTER-ALTER triggers without exposing any parse tree > details. Please find attached v5 of the patch, with nodeToString() support removed. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Monday, December 19, 2011 07:37:46 PM Robert Haas wrote: > Maybe we should try to split the baby here and defer the question of > whether to expose any of the parse tree internals, and if so how much, > to a future release. I personally think this is an error and those details should at least be available at the C level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. Andres
Andres Freund <andres@anarazel.de> writes: > I personally think this is an error and those details should at least be > available on the c level (e.g. some pg_command_trigger_get_plan() function, > only available via C) to allow sensible playing around with that knowledge. I > don't really see making progress towards a nice interface unless we get > something to play around with out there. If you target C coded triggers then all you need to do is provide a pointer to the Node *parsetree, I would think. What else? The drawback though is still the same, the day you do that you've proposed a public API and changing the parsetree stops being internal refactoring. The way around this problem is that if you want a command trigger in C, just write an extension that implements the Process Utility hook. Bonus, you can have that working with already released versions of PostgreSQL. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Friday, January 13, 2012 11:53:32 PM Dimitri Fontaine wrote: > Andres Freund <andres@anarazel.de> writes: > > I personally think this is an error and those details should at least be > > available at the C level (e.g. some pg_command_trigger_get_plan() > > function, only available via C) to allow sensible playing around with > > that knowledge. I don't really see making progress towards a nice > > interface unless we get something to play around with out there. > > If you target C coded triggers then all you need to do is provide a > pointer to the Node *parsetree, I would think. What else? Yes. Being able to turn that into a statement again is still valuable imo. > The drawback though is still the same, the day you do that you've > proposed a public API and changing the parsetree stops being internal > refactoring. Yes, sure. I don't particularly care though actually. Changing some generic guts of trigger functions is not really that much of a problem compared to the other stuff involved in a version migration. > The way around this problem is that if you want a command > trigger in C, just write an extension that implements the Process > Utility hook. The point is that with CREATE COMMAND TRIGGER only the internal part of the triggers needs to change, not the external interface. Which is a big difference from my pov. Also hooks are relatively hard to stack, i.e. it's hard to use them sensibly from multiple independent projects. They also cannot be purely installed via extensions/sql. Andres
Andres Freund <andres@anarazel.de> writes: >> If you target C coded triggers then all you need to do is provide a >> pointer to the Node *parsetree, I would think. What else? > Yes. > > Being able to turn that into a statement again is still valuable imo. That part of the WIP code is still in the patch, yes. >> The drawback though is still the same, the day you do that you've >> proposed a public API and changing the parsetree stops being internal >> refactoring. > Yes, sure. I don't particularly care though actually. Changing some generic > guts of trigger functions is not really that much of a problem compared to the > other stuff involved in a version migration. Let's hear about it from Tom, who's mainly been against publishing that. > The point is that with CREATE COMMAND TRIGGER only the internal part of the > triggers needs to change, not the external interface. Which is a big difference > from my pov. I'm not sure. The way you get the arguments would stay rather stable, but the parsetree would change at each release: that's not a long term API here. I fail to see much difference between a hook and a command trigger as soon as you've chosen to implement the feature in C. > Also hooks are relatively hard to stack, i.e. it's hard to use them sensibly > from multiple independent projects. They also cannot be purely installed via > extensions/sql. That remains true, you can't easily know in which order your hooks will get fired, contrary to triggers, and you can't even list the hooks. I fear that we won't be able to answer your need here in 9.2 though. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Jan 13, 2012 at 5:53 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Andres Freund <andres@anarazel.de> writes: >> I personally think this is an error and those details should at least be >> available on the c level (e.g. some pg_command_trigger_get_plan() function, >> only available via C) to allow sensible playing around with that knowledge. I >> don't really see making progress towards a nice interface unless we get >> something to play around with out there. > > If you target C coded triggers then all you need to do is provide a > pointer to the Node *parsetree, I would think. What else? > > The drawback though is still the same, the day you do that you've > proposed a public API and changing the parsetree stops being internal > refactoring. The way around this problem is that if you want a command > trigger in C, just write an extension that implements the Process > Utility hook. Bonus, you can have that working with already released > versions of PostgreSQL. But on the flip side, I think we're generally a bit more flexible about exposing things via C than through the procedural languages. I think it's reasonable for people to complain about their PL/pgsql functions breaking between major releases, but I have less sympathy for someone who has the same problem when coding in C. When you program in C, you're interfacing with the guts of the system, and we can't improve the system without changing the guts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> But on the flip side, I think we're generally a bit more flexible
> about exposing things via C than through the procedural languages.

So we could still expose the parsetree of the current command. I wonder
if it's already possible to get that from a C coded trigger, but I'll
admit I've yet to code a trigger in C. Will look into that.

Then, as Andres proposed, a new function would be available to get the
value, so we're not changing the trigger procedure function API in case
the language is C…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Robert Haas <robertmhaas@gmail.com> writes:
>> But on the flip side, I think we're generally a bit more flexible
>> about exposing things via C than through the procedural languages.
>
> Then, as Andres proposed, a new function would be available to get the
> value, so we're not changing the trigger procedure function API in
> case the language is C…

I've been updating my github branch with a patch that provides the
parsetree to C coded command trigger functions only, as their 5th
argument, of type INTERNAL (so that only C coded procs apply).

  https://github.com/dimitri/postgres/compare/master...command_triggers

I still have some cleaning to do before preparing the next patch
version, such as updating the documentation and dealing with rewrites of
CHECK and DEFAULT column constraints in CREATE TABLE.

I had to add support for the T_A_Const parser node, and now I'm about to
see about adding support for the T_A_Expr one, but I can't help
wondering how the rewriter could work without them. What simple thing am
I missing here, if any?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> I still have some cleaning to do before preparing the next patch
> version, such as updating the documentation and dealing with rewrites
> of CHECK and DEFAULT column constraints in CREATE TABLE. I had to add
> support for the T_A_Const parser node, and now I'm about to see about
> adding support for the T_A_Expr one, but I can't help wondering how
> the rewriter could work without them.

It doesn't, and it shouldn't have to. If those nodes get to the rewriter
then somebody forgot to apply parse analysis. What's your test case?

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
>> I still have some cleaning to do before preparing the next patch
>> version [...] I can't help wondering how the rewriter could work
>> without them.
>
> It doesn't, and it shouldn't have to. If those nodes get to the
> rewriter then somebody forgot to apply parse analysis. What's your
> test case?

I'm trying to rewrite the command string from the parse tree, and the
simple example that I use to raise an ERROR is the following:

  create table foo (id serial, foo integer default 1, primary key(id));

I don't know whether the parsetree handed to ProcessUtility() has gone
through parse analysis, nor whether that's needed to execute the
commands, so maybe there is some high-level function to call before
walking the parse tree in my new rewriting support?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> It doesn't, and it shouldn't have to. If those nodes get to the
>> rewriter then somebody forgot to apply parse analysis. What's your
>> test case?
>
> I'm trying to rewrite the command string from the parse tree, and the
> simple example that I use to raise an ERROR is the following:
>
>   create table foo (id serial, foo integer default 1, primary key(id));

That needs to go through transformCreateStmt(). The comments at the head
of parse_utilcmd.c might be informative.

While I've not looked at your patch, I can't escape the suspicion that
this means you are trying to do the wrong things in the wrong places.
Calling transformCreateStmt() from some random place is not going to
make things better; it is going to break CREATE TABLE, which expects to
do that for itself.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
>> create table foo (id serial, foo integer default 1, primary key(id));
>
> That needs to go through transformCreateStmt(). The comments at the
> head of parse_utilcmd.c might be informative.

Indeed, thanks for the heads-up here.

> While I've not looked at your patch, I can't escape the suspicion that
> this means you are trying to do the wrong things in the wrong places.
> Calling transformCreateStmt() from some random place is not going to
> make things better; it is going to break CREATE TABLE, which expects
> to do that for itself.

From the comments in the file, it seems I could either call the function
where I need it on a copy of the parse tree (as made by copyObject()),
or reshuffle either when that call happens or when the command trigger
user functions get called.

At the moment the trigger functions are called from
standard_ProcessUtility() and are given the parse tree as handed over to
that function, before parse analysis. We can easily enough copy the
parse tree and do another round of parse analysis on it, only when some
command triggers are going to get called. Is the cost of doing so
acceptable?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
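The copy-then-reanalyze idea hinges on a deep copy such as copyObject() produces, so that a second transformation pass cannot scribble on the tree the command itself will execute. A toy sketch of that deep-copy behavior, with an invented node layout that is not PostgreSQL's:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for copyObject(): hand a deep copy of the raw parse
 * tree to a second analysis pass so the original statement is never
 * modified.  This node layout is invented for illustration only. */
typedef struct ToyNode
{
    int             tag;
    char           *name;
    struct ToyNode *child;
} ToyNode;

static char *toy_strdup(const char *s)
{
    char *d = malloc(strlen(s) + 1);

    strcpy(d, s);
    return d;
}

/* Recursive deep copy: every node and string gets fresh storage. */
static ToyNode *toy_copy(const ToyNode *n)
{
    ToyNode *copy;

    if (n == NULL)
        return NULL;
    copy = malloc(sizeof(ToyNode));
    copy->tag = n->tag;
    copy->name = n->name ? toy_strdup(n->name) : NULL;
    copy->child = toy_copy(n->child);
    return copy;
}
```

Mutating the copy leaves the original tree untouched, which is the property a trigger-only analysis pass would rely on; the cost question is that every node gets duplicated whether or not any trigger fires.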
Excerpts from Dimitri Fontaine's message of mié ene 18 16:03:29 -0300 2012:
> At the moment the trigger functions are called from
> standard_ProcessUtility() and are given the parse tree as handed over
> to that function, before parse analysis.
>
> We can easily enough copy the parse tree and do another round of parse
> analysis on it only when some command triggers are going to get
> called. Is the cost of doing so acceptable?

Huh, isn't it simpler to just pass the triggers the parse tree *after*
parse analysis? I don't understand what you're doing here.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wednesday, January 18, 2012 08:17:36 PM Alvaro Herrera wrote:
> Excerpts from Dimitri Fontaine's message of mié ene 18 16:03:29 -0300 2012:
>> At the moment the trigger functions are called from
>> standard_ProcessUtility() and are given the parse tree as handed over
>> to that function, before parse analysis.
>>
>> We can easily enough copy the parse tree and do another round of
>> parse analysis on it only when some command triggers are going to get
>> called. Is the cost of doing so acceptable?
>
> Huh, isn't it simpler to just pass the triggers the parse tree *after*
> parse analysis? I don't understand what you're doing here.

Parse analysis is not exactly nicely separated for utility statements.

Andres
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> We can easily enough copy the parse tree and do another round of parse
> analysis on it only when some command triggers are going to get
> called. Is the cost of doing so acceptable?

It's not the costs I'm worried about so much as the side effects ---
locks and so forth. Also, things like assignment of specific names for
indexes and sequences seem rather problematic. In the worst case the
trigger could run seeing "foo_bar_idx1" as the name of an index to be
created, and then when the action actually happens, the name turns out
to be "foo_bar_idx2" because someone else took the first name meanwhile.

As I said, I think this suggests that you're trying to do the triggers
in the wrong place.

			regards, tom lane
On Wednesday, January 18, 2012 08:31:49 PM Tom Lane wrote:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
>> We can easily enough copy the parse tree and do another round of
>> parse analysis on it only when some command triggers are going to get
>> called. Is the cost of doing so acceptable?
>
> It's not the costs I'm worried about so much as the side effects ---
> locks and so forth. Also, things like assignment of specific names
> for indexes and sequences seem rather problematic. In the worst case
> the trigger could run seeing "foo_bar_idx1" as the name of an index
> to be created, and then when the action actually happens, the name
> turns out to be "foo_bar_idx2" because someone else took the first
> name meanwhile.

You can't generally assume such a thing anyway. Remember there can be
BEFORE command triggers; it would be easy to create a conflict there.

The CREATE TABLE will fire command triggers on CREATE SEQUENCE and ALTER
SEQUENCE while creating the table. If one actually wants to do anything
about those, that would be the place.

> As I said, I think this suggests that you're trying to do the triggers
> in the wrong place.

In my opinion it mostly shows that parse analysis of utility statements
is too intermingled with other stuff... Not sure what to do about that.

Andres
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Huh, isn't it simpler to just pass the triggers the parse tree *after*
> parse analysis? I don't understand what you're doing here.

I didn't realize that the parse analysis is in fact done from within
standard_ProcessUtility() directly, which means your suggestion is
indeed workable.

Tom Lane <tgl@sss.pgh.pa.us> writes:
> It's not the costs I'm worried about so much as the side effects ---

Ok, so I'm now calling the command trigger procedures once the parse
analysis is done, and guess what, I'm back to the same problem as
before:

  https://github.com/dimitri/postgres/commit/4bfab6344a554c09f7322e861f9d09468f641bd9

  CREATE TABLE public."ab_foo-bar"
  (
    id serial NOT NULL,
    foo integer default 1,
    PRIMARY KEY(id)
  );
  NOTICE:  CREATE TABLE will create implicit sequence "ab_foo-bar_id_seq" for serial column "ab_foo-bar.id"
  NOTICE:  snitch: CREATE SEQUENCE
  ERROR:  unrecognized node type: 904

I'm not sure about the next step, and I'm quite sure I need to stop here
for tonight. Any advice welcome; I'll be working on that again as soon
as tomorrow.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
On Wed, Jan 18, 2012 at 4:23 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Huh, isn't it simpler to just pass the triggers the parse tree
>> *after* parse analysis? I don't understand what you're doing here.
>
> I didn't realize that the parse analysis is in fact done from within
> standard_ProcessUtility() directly, which means your suggestion is
> indeed workable.
>
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> It's not the costs I'm worried about so much as the side effects ---
>
> Ok, so I'm now calling the command trigger procedures once the parse
> analysis is done, and guess what, I'm back to the same problem as
> before:
>
> https://github.com/dimitri/postgres/commit/4bfab6344a554c09f7322e861f9d09468f641bd9
>
>   CREATE TABLE public."ab_foo-bar"
>   (
>     id serial NOT NULL,
>     foo integer default 1,
>     PRIMARY KEY(id)
>   );
>   NOTICE:  CREATE TABLE will create implicit sequence "ab_foo-bar_id_seq" for serial column "ab_foo-bar.id"
>   NOTICE:  snitch: CREATE SEQUENCE
>   ERROR:  unrecognized node type: 904
>
> I'm not sure about the next step, and I'm quite sure I need to stop
> here for tonight. Any advice welcome; I'll be working on that again as
> soon as tomorrow.

My advice is to forget about trying to provide the command string to the
user for the first version of this patch. As you're finding out, there's
no simple, easy, obvious way of doing it, and there are N>0 useful
things that can be done without that functionality. I continue to think
that for a first version of this, we should be satisfied to pass just
the OID. I know that's not really what you want, but there's much to be
said for picking a goal that is achievable in the limited time
available, and I fear that setting your sights too high will lead to
something that either (a) doesn't get committed, or (b) gets committed,
but turns out not to work very well, either of which would be less than
ideal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> My advice is to forget about trying to provide the command string to
> the user for the first version of this patch. As you're finding out,
> there's no simple, easy, obvious way of doing it, and there are N>0
> useful things that can be done without that functionality.

Actually, none of my current examples and tests are relying on it. For
trigger-based replication, Jan said he would need the full parse tree
rather than just the command string anyway, so we're not losing anything
more than what we were already ready to postpone.

> I continue to think that for a first version of this, we should be
> satisfied to pass just the OID. I know that's not really what you
> want, but there's much to be said for picking a goal that is
> achievable in the limited time available, and I fear that setting your
> sights too high will lead to something that either (a) doesn't get
> committed, or (b) gets committed, but turns out not to work very well,
> either of which would be less than ideal.

Agreed, mostly.

From the code I already have, I think we should still pass in the
command tag, the schema name (or NULL) and the object name (or NULL) as
input parameters to the trigger's procedure. I'm now working on that,
and then on the concurrency angle of the command triggers.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
On Fri, Jan 20, 2012 at 9:28 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> My advice is to forget about trying to provide the command string to
>> the user for the first version of this patch. [...]
>
> Actually, none of my current examples and tests are relying on it. For
> trigger-based replication, Jan said he would need the full parse tree
> rather than just the command string anyway, so we're not losing
> anything more than what we were already ready to postpone.

Cool.

>> I continue to think that for a first version of this, we should be
>> satisfied to pass just the OID. [...]
>
> Agreed, mostly.
>
> From the code I already have, I think we should still pass in the
> command tag, the schema name (or NULL) and the object name (or NULL)
> as input parameters to the trigger's procedure.

I think the OID is better than the name, but if it's easy to pass the
name and schema, then I'm fine with it. But I do think this is one of
those problems that is complex enough that we should be happy to get the
core infrastructure in with the first commit, even with an extremely
limited amount of functionality, and then build on it.
Enhance-and-extend is so much easier than a monolithic code drop.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I think the OID is better than the name, but if it's easy to pass the
> name and schema, then I'm fine with it. But I do think this is one of

It's quite easy to get the name and schema from the command, yes; here's
an example of how I'm doing it for some commands:

    case T_CreateStmt:
    {
        CreateStmt *node = (CreateStmt *) parsetree;

        cmd->schemaname = RangeVarGetNamespace(node->relation);
        cmd->objectname = node->relation->relname;
        break;
    }

    case T_AlterTableStmt:
    {
        AlterTableStmt *node = (AlterTableStmt *) parsetree;

        cmd->schemaname = RangeVarGetNamespace(node->relation);
        cmd->objectname = node->relation->relname;
        break;
    }

    case T_CreateExtensionStmt:
    {
        cmd->schemaname = NULL;
        cmd->objectname = ((CreateExtensionStmt *) parsetree)->extname;
        break;
    }

Getting the OID, on the other hand, is much harder, because each command
implements that part as it wants to, and the DefineWhatever() functions
return void. We could maybe have them return the main Oid of the object
created, or we could have the CommandContext structure I'm using be a
backend global variable that any command would stuff.

In any case, adding support for the OID only works for AFTER triggers
when talking about CREATE commands, and for BEFORE triggers when talking
about DROP commands, assuming that the trigger procedure is run after
we've located said Oid.

It seems to me to be a couple of orders of magnitude more work to get
the Oid here compared to just getting the schemaname and objectname. And
getting those works in all cases, as we take them from the command
itself (we fill in the schema with the first search_path entry when it's
not given explicitly in the command):

  CREATE TABLE foo();
  NOTICE:  tag: CREATE TABLE
  NOTICE:  enforce_local_style{public.foo}: foo

> those problems that is complex enough that we should be happy to get
> the core infrastructure in in the first commit, even with an extremely
> limited amount of functionality, and then build on it.
> Enhance-and-extend is so much easier than a monolithic code drop.

I'm happy to read that, and I'm preparing the next patch version (v6)
with that goal in mind.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
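The timing constraint described here — for a CREATE command, the object's OID exists only once the command has run — can be sketched with a stack-allocated context handed to BEFORE and AFTER entry points. Every name below is invented for illustration; this is not the patch's actual API:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a per-command context passed by argument to the trigger
 * entry points.  For a CREATE command the object's OID can be filled in
 * solely for the AFTER call, since it does not exist when the BEFORE
 * trigger fires. */
typedef struct CmdContext
{
    const char *tag;          /* command tag, never NULL      */
    unsigned    objectId;     /* 0 ("InvalidOid") until created */
    const char *schemaname;   /* may be NULL                  */
    const char *objectname;   /* may be NULL                  */
} CmdContext;

typedef void (*cmd_trigger_fn)(const CmdContext *cmd);

static cmd_trigger_fn before_trigger;   /* NULL when none installed */
static cmd_trigger_fn after_trigger;

static int executed;

/* A command's implementation fills the context on its own stack. */
static void exec_create_thing(const char *name)
{
    CmdContext cmd = { "CREATE THING", 0, "public", name };

    if (before_trigger)
        before_trigger(&cmd);   /* OID not known yet */
    executed++;                 /* ... the command's real work ... */
    cmd.objectId = 42;          /* OID assigned during execution */
    if (after_trigger)
        after_trigger(&cmd);    /* AFTER trigger sees the OID */
}

/* Recording triggers for demonstration. */
static unsigned oid_seen_before = 1234;
static unsigned oid_seen_after;
static void note_before(const CmdContext *cmd) { oid_seen_before = cmd->objectId; }
static void note_after(const CmdContext *cmd)  { oid_seen_after = cmd->objectId; }
```

The BEFORE trigger observes an invalid OID while the AFTER trigger sees the real one, which is exactly why OID support is asymmetric between CREATE and DROP.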
On Fri, Jan 20, 2012 at 12:14 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> It's quite easy to get name and schema from the command yes, here's an
> example of how I'm doing it for some commands:
>
> [...]
>
> It seems to me to be a couple orders of magnitude more work to get the
> Oid here compared to just get the schemaname and objectname. And
> getting those works in all cases as we take them from the command
> itself (we fill in the schema with the first search_path entry when
> it's not given explicitly in the command)
>
>   CREATE TABLE foo();
>   NOTICE:  tag: CREATE TABLE
>   NOTICE:  enforce_local_style{public.foo}: foo

Hmm, OK. But what happens if the user doesn't specify a schema name
explicitly?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm, OK. But what happens if the user doesn't specify a schema name
> explicitly?

For the creation case, RangeVarGetCreationNamespace should handle that.
The code Dimitri quoted is wrong, but not that hard to fix.

Unfortunately, the code he quoted for the ALTER case is also wrong, and
harder to fix. Until you've done the lookup you don't know which schema
the referenced object is in. And I don't care for the idea of doing the
lookup twice, as (a) it'll be slower and (b) there are race cases where
it will give the wrong answer, ie return an object other than the one
the ALTER eventually acts on.

Really I think there is not any single point where you can put the
command-trigger hook and be done. In almost every case, the right place
is going to be buried somewhere within the execution of the command.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> For the creation case, RangeVarGetCreationNamespace should handle
> that. The code Dimitri quoted is wrong, but not that hard to fix.

Ok.

> Unfortunately, the code he quoted for the ALTER case is also wrong,
> and harder to fix. Until you've done the lookup you don't know which
> schema the referenced object is in. And I don't care for the idea of
> doing the lookup twice, as (a) it'll be slower and (b) there are race
> cases where it will give the wrong answer, ie return an object other
> than the one the ALTER eventually acts on.

Oh. Yes.

> Really I think there is not any single point where you can put the
> command-trigger hook and be done. In almost every case, the right
> place is going to be buried somewhere within the execution of the
> command.

I'm finally realizing it. I already introduced a structure called
CommandContextData to keep track of the current command elements we want
to pass down to command triggers; I think we're saying that this should
be a static variable that commands will need to be editing.

The main problem with that approach is that we will need to spread calls
to the command trigger entry points into every supported command, which
I wanted to avoid at first. It's no big deal though, as the API is
simple enough. Expect a new patch made this way early next week.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
>> Really I think there is not any single point where you can put the
>> command-trigger hook and be done. In almost every case, the right
>> place is going to be buried somewhere within the execution of the
>> command.
>
> I'm finally realizing it. I already introduced a structure called
> CommandContextData to keep track of the current command elements we
> want to pass down to command triggers; I think we're saying that this
> should be a static variable that commands will need to be editing.

In fact, passing it as an argument to the command trigger API is much
simpler, and that's done in the attached. I'm having problems here with
my install and not enough time this week (you don't speak English if you
don't use understatements here and there, right?), so please expect a
one-step-further patch that shows the integration concept, not a
ready-for-commit one just yet.

Next steps are about adding support for more commands, and now that we
have settled on a simpler integration that will be easy. The parameters
sent to the trigger procedure are now the command tag, the main object
Oid, its schema name and its object name. Only the command tag will
never be NULL; all the other columns could be left out when calling an
ANY COMMAND trigger, or a command on a schemaless object.

Note: the per-command integration means the Oid is generally available,
so let's just export it.

An ack about the way it's now implemented would be awesome, and we could
begin discussing exactly which set of commands we want supported from
the get-go (default to all of them of course, but I don't think that's
necessarily the best use of our time given our command list).

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
On Thu, Jan 26, 2012 at 10:00 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> In fact, passing it as an argument to the command trigger API is much
> simpler, and that's done in the attached.
>
> [...]
>
> An ack about the way it's now implemented would be awesome, and we
> could begin discussing exactly which set of commands we want supported
> from the get-go (default to all of them of course, but I don't think
> that's necessarily the best use of our time given our command list).

Looks good so far. A user-centric review of this patch...

Would like a way to say ALL COMMANDS rather than write out the list of
supported commands. That list is presumably subject to change, so not
having this could result in bugs in later code.

The example given in the docs has no explanation. More examples and
explanation please. Would specifically be interested in a command
trigger which simply prevents all commands, which has many uses and is
simple enough to be in the docs.

Why do we exclude SEQUENCEs? That could well be a problem for intended
users.

Please explain/add to docs/comments why ALTER VIEW is not supported?
Why not other commands? Not a problem, but users will ask, so we should
write down the answer.

Please document why pg_cmdtriggers is a separate table and not an
enhancement of pg_triggers?

Thanks

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

Simon Riggs <simon@2ndQuadrant.com> writes:
>> An ack about the way it's now implemented would be awesome

I'm still missing that, which is only fair, just a ping from me here.
This is about where to hook in the command trigger calls, and which
parameters we call the procedures with. I've now done it the way Tom
hinted me to, so I guess it's ok.

> Would like a way to say ALL COMMANDS rather than write out the list of
> supported commands. That list is presumably subject to change, so not
> having this could result in bugs in later code.

That was included in the patch, but the docs were not updated; they are
now.

> The example given in the docs has no explanation. More examples and
> explanation please. Would specifically be interested in a command
> trigger which simply prevents all commands, which has many uses and is
> simple enough to be in the docs.

Done. Docs still need some more love before commit; I'm planning to do
that tomorrow and/or the next days.

> Why do we exclude SEQUENCEs? That could well be a problem for intended
> users.
>
> Please explain/add to docs/comments why ALTER VIEW is not supported?
> Why not other commands? Not a problem, but users will ask, so we
> should write down the answer.

I've added support for many more commands (49 of them or something,
including SEQUENCEs), and will continue doing so tomorrow. I think we
should be ok to support any and all commands that make sense for 9.2. I
still have some basic CREATE commands to add support for, then CLUSTER
and VACUUM and REINDEX; then it's down to ALTER commands, where only
ALTER TABLE is currently finished. That's almost mechanical editing
here; the brain only gets used to pick the right place to hook in.

The other change I need to make is support for regression tests. I guess
we will want one test per command included. The tests will need to be in
their own file and excluded from the parallel schedule too, or we would
be changing the behavior of the other tests under them and couldn't
compare the outputs anymore.

> Please document why pg_cmdtriggers is a separate table and not an
> enhancement of pg_triggers?

Because pg_trigger is made for triggers attached to a relation, as we
can see from its unique constraint:

  "pg_trigger_tgrelid_tgname_index" UNIQUE, btree (tgrelid, tgname)

A command trigger is attached to a command rather than a relation, so it
needs to live in its own place:

  "pg_cmdtrigger_ctgcommand_ctgname_index" UNIQUE, btree (ctgcommand, ctgname)

I'm not sure how and where to include that in the docs; any advice would
be appreciated.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
On Tue, Feb 14, 2012 at 4:29 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >>> An ack about the way it's now implemented would be awesome > I'm still missing that, which is only fair, just a ping from me here. I took a brief look at this just now, and in general I'm pleasantly surprised. But, as you might imagine, I have some reservations: 1. I fear that the collection of commands to which this applies is going to end up being kind of a random selection. I suggest that for the first version of the patch, just pick a couple of simple commands like VACUUM and ANALYZE, and do just those. We can add on more later easily enough once the basic patch is committed. Also, that way, if you don't get to absolutely everything, we'll have a coherent subset of the functionality, rather than a subset defined by "what Dimitri got to". 2. Currently, we have some objects (views) that support INSTEAD OF triggers and others (tables) that support BEFORE and AFTER triggers. I don't see any advantage in supporting both. 3. I am not at all convinced that it's safe to allow command triggers to silently (i.e. without throwing an error) cancel the operation. I don't have very much confidence that that is in general a safe thing to do; there may be code calling this code that expects that such things will not happen. This diff hunk, for example, scares the crap out of me: - /* check that the locales can be loaded */ - CommandCounterIncrement(); - (void) pg_newlocale_from_collation(newoid); + /* before or instead of command trigger might have cancelled the comman + if (OidIsValid(newoid)) + { + /* check that the locales can be loaded */ + CommandCounterIncrement(); + (void) pg_newlocale_from_collation(newoid); + } I don't immediately understand why that's necessary, but I'm sure there's a good reason, and I bet a nickel that there are other places where similar adjustments are necessary but you haven't found them. I think we should rip out the option to silently cancel the command. 
We can always add that later, but if we add it here and in the first commit I think we are going to be chasing bugs for months. 4. This API forces all the work of setting up the CommandContext to be done regardless of whether any command triggers are present: + cmd->objectId = InvalidOid; + cmd->objectname = (char *)aggName; + cmd->schemaname = get_namespace_name(aggNamespace); I'll grant you that most people probably do not execute enough DDL for the cost of those extra get_namespace_name() calls to add up, but I'm not completely sure it's a negligible overhead in general, and in any case I think it's a safe bet that there will be continuing demand to add more information to the set of things that are supplied to the trigger. So I think that should be rethought. It'd be nice to have a cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do stuff ... }, emphasis on cheap. There's a definitional problem here, too, which is that you're supplying to the trigger the aggregate name *as supplied by the user* and the schema name as it exists at the time we do the reverse lookup. That reverse lookup isn't guaranteed to work at all, and it's definitely not guaranteed to produce an answer that's consistent with the aggName field. Maybe there's no better way, but it would at least be better to pass the namespace OID rather than the name. That way, the name lookup can be deferred until we are sure that we actually need to call something. 5. I'm not entirely convinced that it makes sense to support command triggers on commands that affect shared objects. It seems odd that such triggers will fire or not fire depending on which database is currently selected. I think that could lead to user confusion, too. 6. Why do we need all this extra copyfuncs/equalfuncs/outfuncs/readfuncs support? I thought we were dropping passing node trees for 9.2. 7. I don't have a strong feeling on what the psql command should be called, but \dcT seems odd. 
Why one lower-case letter and one upper-case letter?

In general, I like the direction that this is going. But I think it will greatly improve its chances of being successful and relatively non-buggy if we strip it down to something very simple for an initial commit, and then add more stuff piece by piece.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

Thanks for your time reviewing that patch!

Robert Haas <robertmhaas@gmail.com> writes:
> I took a brief look at this just now, and in general I'm pleasantly
> surprised. But, as you might imagine, I have some reservations:

Good starting point, let's see about the details :)

> 1. I fear that the collection of commands to which this applies is
> going to end up being kind of a random selection. I suggest that for
> the first version of the patch, just pick a couple of simple commands
> like VACUUM and ANALYZE, and do just those. We can add on more later
> easily enough once the basic patch is committed. Also, that way, if
> you don't get to absolutely everything, we'll have a coherent subset
> of the functionality, rather than a subset defined by "what Dimitri
> got to".

I share that feeling here. Now, given the current scope of the patch, I think we can add in 9.2 all the commands that make sense to support (yes, including alter operator family). FWIW, I've been adding support for 16 forms of ALTER commands today, in triplicate (the implementations of alter rename, alter owner and alter namespace are separate).

So while your reaction is perfectly understandable, I don't think that's the main thing here; you've just happened to see an intermediate state of things.

> 2. Currently, we have some objects (views) that support INSTEAD OF
> triggers and others (tables) that support BEFORE and AFTER triggers.
> I don't see any advantage in supporting both.

That's because you can cancel an INSERT from a BEFORE trigger; that's how you can write an INSTEAD OF trigger on a TABLE. As a VIEW does not support INSERT (directly), an INSTEAD OF trigger is what makes sense there. I don't think it's possible to transpose that thinking to command triggers, so I've been providing for either INSTEAD OF triggers or BEFORE/AFTER triggers. Note that you can't have both with my current patch.
It's a convenience-only thing, but I think it's worth having: you don't need to read the trigger procedure to decide whether that BEFORE command trigger is in fact an INSTEAD OF command trigger. Also, the code footprint for this feature is very small.

> 3. I am not at all convinced that it's safe to allow command triggers
> to silently (i.e. without throwing an error) cancel the operation. I
> don't have very much confidence that that is in general a safe thing

I was trying to offer a parallel with returning NULL from a BEFORE INSERT trigger, which allows you to cancel it. If that turns out to be so bad an idea that we need to withdraw it, so be it. It's still possible to cancel a command by means of RAISE EXCEPTION in a BEFORE command trigger, or by means of an INSTEAD OF trigger that does nothing.

> to do; there may be code calling this code that expects that such
> things will not happen. This diff hunk, for example, scares the crap
> out of me:

[...]

> I don't immediately understand why that's necessary, but I'm sure
> there's a good reason, and I bet a nickel that there are other places
> where similar adjustments are necessary but you haven't found them. I

Might be. That idea seemed sound to me when I was under the illusion that I wouldn't have to edit each and every command implementation in order to implement command triggers. I'm ok to step back now, but read on.

> think we should rip out the option to silently cancel the command. We
> can always add that later, but if we add it here and in the first
> commit I think we are going to be chasing bugs for months.

Ok, I'm going to remove that from the BEFORE command implementation. What about having both INSTEAD OF triggers (the command is not executed) and BEFORE triggers (you can cancel execution only by raising an exception, which cancels the whole transaction)? I'd like us to be able to keep that feature.

> 4.
> This API forces all the work of setting up the CommandContext to be
> done regardless of whether any command triggers are present:
>
> +   cmd->objectId = InvalidOid;
> +   cmd->objectname = (char *)aggName;
> +   cmd->schemaname = get_namespace_name(aggNamespace);

At some point while developing the patch I had something way smarter than that, and centralized. I've not revised the API to get back the smartness when breaking out of the centralized implementation, because those elements can only be set from the innards of each command. The API to list which triggers to run already exists and only needs the command tag, which we might have earlier in the processing.

Note that we have to deal with commands acting on more than one object, as in RemoveObjects() where we loop over a list and call triggers each time:

https://github.com/dimitri/postgres/compare/master...command_triggers#diff-19

The best I can think of would be to build the list of triggers to call as soon as we have the command tag, and skip preparing the rest of the CommandContextData structure when that list comes up empty. Still not a general stopgap, but I think that would address your point.

Now, about INSTEAD OF command triggers: we still need to implement them in exactly the same spot as BEFORE command triggers, and they're not making things any more complex in the per-command support code.

> I'll grant you that most people probably do not execute enough DDL for
> the cost of those extra get_namespace_name() calls to add up, but I'm
> not completely sure it's a negligible overhead in general, and in any
> case I think it's a safe bet that there will be continuing demand to
> add more information to the set of things that are supplied to the
> trigger. So I think that should be rethought. It'd be nice to have a
> cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do
> stuff ... }, emphasis on cheap. There's a definitional problem here,

It's as cheap as scanning a catalog, as of now.
I didn't install a new catalog cache for command triggers, as I don't think this code path is performance critical enough to pay back for the maintenance cost. Also consider that a big user of command triggers might have as much as a couple of triggers (BEFORE/AFTER) per command; that's about 300 of them. If they go crazy and have more than that with special conditions each time, let's say that's 1000 rows in the catalog. I don't think that kind of size and those use cases (DDL) call for a catalog cache here, nor for micro-optimizing things.

> too, which is that you're supplying to the trigger the aggregate name
> *as supplied by the user* and the schema name as it exists at the time
> we do the reverse lookup. That reverse lookup isn't guaranteed to
> work at all, and it's definitely not guaranteed to produce an answer
> that's consistent with the aggName field. Maybe there's no better
> way, but it would at least be better to pass the namespace OID rather
> than the name. That way, the name lookup can be deferred until we are
> sure that we actually need to call something.

I'm confused here, because all error messages that need to contain the namespace are doing exactly the same thing as I'm doing in my patch.

> 5. I'm not entirely convinced that it makes sense to support command
> triggers on commands that affect shared objects. It seems odd that
> such triggers will fire or not fire depending on which database is
> currently selected. I think that could lead to user confusion, too.

You mean tablespace here, I guess; what else? I don't think I've added other shared objects in there yet. I share your analysis btw, will remove support.

> 6. Why do we need all this extra
> copyfuncs/equalfuncs/outfuncs/readfuncs support? I thought we were
> dropping passing node trees for 9.2.

I didn't clean that out yet, that's the only reason why it's still there. I would like that this work was not useless :)

> 7.
> I don't have a strong feeling on what the psql command should be
> called, but \dcT seems odd. Why one lower-case letter and one
> upper-case letter?

Well, \dt and \dT are already taken, so I got there. Will change it to something else; would \dct be your choice here?

> In general, I like the direction that this is going. But I think it
> will greatly improve its chances of being successful and relatively
> non-buggy if we strip it down to something very simple for an initial
> commit, and then add more stuff piece by piece.

I had the feeling we did exactly that when reducing the API not to provide the parse tree nor the (rewritten) command string. I can see more features to drop, like the ability to silently cancel a statement in a BEFORE command trigger. I'd like to keep the INSTEAD OF command trigger feature, and I think we can support lots of commands as soon as the integration becomes simple enough, which is where we stand as soon as we remove the statement cancelling.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Feb 15, 2012 at 3:25 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>> 1. I fear that the collection of commands to which this applies is
>> going to end up being kind of a random selection. I suggest that for
>> the first version of the patch, just pick a couple of simple commands
>> like VACUUM and ANALYZE, and do just those. We can add on more later
>> easily enough once the basic patch is committed. Also, that way, if
>> you don't get to absolutely everything, we'll have a coherent subset
>> of the functionality, rather than a subset defined by "what Dimitri
>> got to".
>
> I share that feeling here. Now, given the current scope of the patch, I
> think we can add in 9.2 all the commands that make sense to support
> (yes, including alter operator family). FWIW, I've been adding support
> for 16 forms of ALTER commands today, in triplicate (the implementations
> of alter rename, alter owner and alter namespace are separate).
>
> So while your reaction is perfectly understandable, I don't think that's
> the main thing here; you've just happened to see an intermediate state
> of things.

I'm just saying that nobody's realistically going to be able to verify a patch of this size. It's either going to get committed warts and all, or it's going to not get committed. Decomposing it into a series of patches would make it possible to actually verify the logic. I guess on reflection I don't really care whether you decompose it at this point; the parts are pretty independent and it's easy enough to revert pieces of it. But if I commit any of this it's certainly not going to be the whole thing in one go.

> It's still possible to cancel a command by means of RAISE EXCEPTION in a
> BEFORE command trigger, or by means of an INSTEAD OF trigger that does
> nothing.

I think that there is no problem with cancelling a command via RAISE EXCEPTION. It's an established precedent that errors can be thrown anywhere, and any code that doesn't deal with that is flat broken.
But I think letting either a BEFORE or INSTEAD trigger cancel the command is going to break things, and shouldn't be allowed without a lot of careful study. So -1 from me on supporting INSTEAD triggers in the first version of this.

>> I'll grant you that most people probably do not execute enough DDL for
>> the cost of those extra get_namespace_name() calls to add up, but I'm
>> not completely sure it's a negligible overhead in general, and in any
>> case I think it's a safe bet that there will be continuing demand to
>> add more information to the set of things that are supplied to the
>> trigger. So I think that should be rethought. It'd be nice to have a
>> cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do
>> stuff ... }, emphasis on cheap. There's a definitional problem here,
>
> It's as cheap as scanning a catalog, as of now. I didn't install a new
> catalog cache for command triggers, as I don't think this code path is
> performance critical enough to pay back for the maintenance cost. Also
> consider that a big user of command triggers might have as much as a
> couple of triggers (BEFORE/AFTER) per command, that's about 300 of them.

Yowza. A catalog scan is WAY more expensive than a syscache lookup. I definitely don't think you can afford to have every command result in an extra index probe into pg_cmdtrigger. You definitely need some kind of caching there.

Or at least, I think you do. You could try pgbench -f foo.sql, where foo.sql repeatedly creates and drops a function. See if there's a significant slowdown with your patch vs. HEAD. If there is, you need some caching. You might actually need some whole new type of sinval message to make this work efficiently.

>> too, which is that you're supplying to the trigger the aggregate name
>> *as supplied by the user* and the schema name as it exists at the time
>> we do the reverse lookup.
>> That reverse lookup isn't guaranteed to
>> work at all, and it's definitely not guaranteed to produce an answer
>> that's consistent with the aggName field. Maybe there's no better
>> way, but it would at least be better to pass the namespace OID rather
>> than the name. That way, the name lookup can be deferred until we are
>> sure that we actually need to call something.
>
> I'm confused here, because all error messages that need to contain the
> namespace are doing exactly the same thing as I'm doing in my patch.

Hmm. I wonder what happens if those errors fire after the schema has been dropped? I suppose the real answer here is probably to add enough locking that that can't happen in the first place... so maybe this isn't an issue for your patch to worry about.

>> 5. I'm not entirely convinced that it makes sense to support command
>> triggers on commands that affect shared objects. It seems odd that
>> such triggers will fire or not fire depending on which database is
>> currently selected. I think that could lead to user confusion, too.
>
> You mean tablespace here, I guess, what else? I don't think I've added
> other shared objects in there yet. I share your analysis btw, will
> remove support.

And databases.

>> 7. I don't have a strong feeling on what the psql command should be
>> called, but \dcT seems odd. Why one lower-case letter and one
>> upper-case letter?
>
> Well \dt and \dT are already taken, so I got there. Will change to
> something else, e.g. \dct would be your choice here?

Yeah, probably.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I'm just saying that nobody's realistically going to be able to verify
> a patch of this size. It's either going to get committed warts and
> all, or it's going to not get committed. Decomposing it into a series
> of patches would make it possible to actually verify the logic. I
> guess on reflection I don't really care whether you decompose it at
> this point; the parts are pretty independent and it's easy enough to
> revert pieces of it. But if I commit any of this it's certainly not
> going to be the whole thing in one go.

Ok, I can perfectly understand that. The principled implementation is not saving us here; we still need to review each call site. The way I read your comment, I continue working on my big patch and we'll see what pieces get in, right?

> I think that there is no problem with cancelling a command via RAISE
> EXCEPTION. It's an established precedent that errors can be thrown
> anywhere, and any code that doesn't deal with that is flat broken.

Sure.

> But I think letting either a BEFORE or INSTEAD trigger cancel the
> command is going to break things, and shouldn't be allowed without a
> lot of careful study. So -1 from me on supporting INSTEAD triggers in
> the first version of this.

I'm sad about that, but I hear you. Removing.

> Yowza. A catalog scan is WAY more expensive than a syscache lookup.
> I definitely don't think you can afford to have every command result
> in an extra index probe into pg_cmdtrigger. You definitely need some
> kind of caching there.
>
> Or at least, I think you do. You could try pgbench -f foo.sql, where
> foo.sql repeatedly creates and drops a function. See if there's a
> significant slowdown with your patch vs. HEAD. If there is, you need
> some caching. You might actually need some whole new type of sinval
> message to make this work efficiently.

Ok, I will test that down the road (before the end of this week).
>> I'm confused here, because all error messages that need to contain the
>> namespace are doing exactly the same thing as I'm doing in my patch.
>
> Hmm. I wonder what happens if those errors fire after the schema has
> been dropped? I suppose the real answer here is probably to add
> enough locking that that can't happen in the first place... so maybe
> this isn't an issue for your patch to worry about.

I take it that I can continue doing things this way, then; get_namespace_name() and friends are trustworthy as far as I'm concerned.

>> You mean tablespace here, I guess, what else? I don't think I've added
>> other shared objects in there yet. I share your analysis btw, will
>> remove support.
>
> And databases.

Right, on their way.

>> something else, e.g. \dct would be your choice here?
>
> Yeah, probably.

The next version will sport that.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Feb 15, 2012 at 4:32 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Ok, I can perfectly understand that. The principled implementation is
> not saving us here, we still need to review each call site. The way I
> read your comment, I continue working on my big patch and we'll see what
> pieces get in, right?

Yeah, I think that makes sense. I'll have a look at your next version when it shows up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

Please find attached version 8 of the patch, which fixes most of your complaints. Remaining tasks for me here: completing support for all interesting commands, and improving the docs with a section about command triggers in the general trigger discussion in doc/src/sgml/trigger.sgml.

Robert Haas <robertmhaas@gmail.com> writes:
> 1. I fear that the collection of commands to which this applies is
> going to end up being kind of a random selection. I suggest that for

I didn't change that (except for the shared-objects cleanup), and I still think it's not too much work to add all the objects we care about offering command triggers for in this release.

> 2. Currently, we have some objects (views) that support INSTEAD OF
> triggers and others (tables) that support BEFORE and AFTER triggers.
> I don't see any advantage in supporting both.

INSTEAD OF command triggers are now gone.

> 3. I am not at all convinced that it's safe to allow command triggers
> to silently (i.e. without throwing an error) cancel the operation.

That's gone too.

> 4. This API forces all the work of setting up the CommandContext to be
> done regardless of whether any command triggers are present:

Fixed, I've spent quite some time refining the API.

> trigger. So I think that should be rethought. It'd be nice to have a
> cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do
> stuff ... }, emphasis on cheap. There's a definitional problem here,

The CommandContext initialization now does the index scan, and the command context properties are only filled in when we actually want to run some user functions. Context filling for both BEFORE and AFTER command triggers is now guarded this way.

> 5. I'm not entirely convinced that it makes sense to support command
> triggers on commands that affect shared objects. It seems odd that
> such triggers will fire or not fire depending on which database is
> currently selected. I think that could lead to user confusion, too.

Agreed, they're now gone.

> 6.
> Why do we need all this extra
> copyfuncs/equalfuncs/outfuncs/readfuncs support? I thought we were
> dropping passing node trees for 9.2.

I still have to clean that up, and as it's not as simple as reverting every change I've made in those files, allow me to defer it to another version of the patch.

> 7. I don't have a strong feeling on what the psql command should be
> called, but \dcT seems odd. Why one lower-case letter and one
> upper-case letter?

The psql command is now \dct.

> In general, I like the direction that this is going. But I think it
> will greatly improve its chances of being successful and relatively
> non-buggy if we strip it down to something very simple for an initial
> commit, and then add more stuff piece by piece.

Having now removed support for the ability to cancel a command without raising an error, and after some API cleaning, I think it's much easier to review the patch and gain confidence that it is sane.

Also, I've added regression tests. They are only exercised by the serial schedule (make installcheck), because they would randomly ruin the output of the parallel schedule and make it impossible for pg_regress to do its job here.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Feb 16, 2012 at 12:42 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Fixed, I've spent quite some time refining the API.

That is better. What could still be done here is to create a cache (along the lines of attoptcache.c) that stores a big array with one slot for each supported command type. The first time a command is executed in a particular session, we construct a cache entry for that command type, storing the OIDs of the before and after triggers. Then, instead of having to do a catalog scan to notice that there are no triggers for a given command type, we can just reference into the array and see, oh, we probed before and found no triggers. CacheRegisterSyscacheCallback or some other mechanism can be used to flush all the cached information whenever we notice that pg_cmdtrigger has been updated.

Now, I don't know for sure that this is necessary without performance testing, but I think we ought to try to figure that out. This version doesn't apply cleanly for me; there is a conflict in functioncmds.c, which precludes my easily benchmarking it.

It strikes me that this patch introduces a possible security hole: it allows command triggers to be installed by the database owner, but that seems like it might allow the database owner to usurp the privileges of any user who runs one of these commands in his or her database, including the superuser. Perhaps that could be fixed by running command triggers as the person who created them rather than as the superuser, but that seems like it might lead to other kinds of privilege-escalation bugs.

If I install a command trigger that prevents all DDL, how do I uninstall it? Or say I'm the superuser and I want to undo something my disgruntled DBA did before he quit.

I would much prefer to have DropCmdTrigStmt wedge itself into the existing DropStmt infrastructure which I just recently worked so hard on.
If you do that, you should find that you can then easily also support comments on command triggers, security labels on command triggers (though I don't know if that's useful), and the ability to include command triggers in extensions.

I am a bit worried about supporting command triggers on statements that do internal transaction control, such as VACUUM and CREATE INDEX CONCURRENTLY. Obviously that's very useful and I'd like to have it, but there's a problem: if the AFTER trigger errors out, it won't undo the whole command. That might be very surprising. BEFORE triggers seem OK, and AFTER triggers might be OK too, but we at least need to think hard about how to document that.

I think it would be better to bail on trying to use CREATE TRIGGER and DROP TRIGGER as a basis for this functionality, and instead create completely new toplevel statements CREATE COMMAND TRIGGER and DROP COMMAND TRIGGER. Then, you could decide that all command triggers live in the same namespace, and therefore to get rid of the command trigger called bob you can just say "DROP COMMAND TRIGGER bob", without having to specify the type of command it applies to. It's still clear that you're dropping a *command* trigger because that's right in the statement name, whereas it would make me a bit uneasy to decide that "DROP TRIGGER bob" means a command trigger just by virtue of the fact that no table name was specified. That would probably also make it easier to accomplish the above-described goal of integrating this into the DropStmt infrastructure.

+   if (!superuser())
+       if (!pg_database_ownercheck(MyDatabaseId, GetUserId()))
+           aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,
+                          get_database_name(MyDatabaseId));

The separate superuser check is superfluous; pg_database_ownercheck() already handles that.

Can we call InitCommandContext in some centralized place, rather than separately at lots of different call sites?
I am confused why this is adding a new file called dumpcatalog.c which looks suspiciously similar to some existing pg_dump code I've been staring at all day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Dimitri Fontaine's message of jue feb 16 14:42:26 -0300 2012:
> Hi,
>
> Please find attached version 8 of the patch, which fixes most of your
> complaints.

Hi,

I didn't like the new cmdtrigger.h file. It's included by a lot of other headers, and it's also itself including execnodes.h and parsenodes.h, which means practically the whole lot of the source tree is included in turn. If you could split it, so that the struct definition is in a different file that's only included by the few .c files that actually use that struct, I'd be much happier.

... after looking at it more closely, I think only this line needs to be in a separate file:

typedef struct CommandContextData *CommandContext;

and that file is included by other headers; they don't need the function declarations or the struct definition.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi,

First answer now, new patch version tomorrow.

Robert Haas <robertmhaas@gmail.com> writes:
> That is better.

Cool :)

> What could still be done here is to create a cache (along the lines of
> attoptcache.c) that stores a big array with one slot for each
> supported command type. The first time a command is executed in a
> particular session, we construct a cache entry for that command type,
> storing the OIDs of the before and after triggers. Then, instead of
> having to do a catalog scan to notice that there are no triggers for a
> given command type, we can just reference into the array and see, oh,
> we probed before and found no triggers. CacheRegisterSyscacheCallback
> or some other mechanism can be used to flush all the cached
> information whenever we notice that pg_cmdtrigger has been updated.

I guess it's another spelling for a catalog cache, so I'll look at what it takes to build either a full catcache or this array cache you're talking about tomorrow.

> Now, I don't know for sure that this is necessary without performance
> testing, but I think we ought to try to figure that out. This version
> doesn't apply cleanly for me; there is a conflict in functioncmds.c,
> which precludes my easily benchmarking it.

Means I need to update my master branch and resolve the merge conflicts. You could also test right from my github branch, I guess.

https://github.com/dimitri/postgres
https://github.com/dimitri/postgres/tree/command_triggers

> It strikes me that this patch introduces a possible security hole: it
> allows command triggers to be installed by the database owner, but
> that seems like it might allow the database owner to usurp the
> privileges of any user who runs one of these commands in his or her
> database, including the superuser. Perhaps that could be fixed by
> running command triggers as the person who created them rather than as
> the superuser, but that seems like it might lead to other kinds of
> privilege-escalation bugs.
We could decide command triggers are superuser-only. Security is not something I'm very strong at, so I'll leave it up to you to decide.

> If I install a command trigger that prevents all DDL, how do I
> uninstall it? Or say I'm the superuser and I want to undo something
> my disgruntled DBA did before he quit.

Good catch. I guess we need to remove creating and dropping a command trigger from the list of commands covered by the ANY COMMAND support.

> I would much prefer to have DropCmdTrigStmt wedge itself into the
> existing DropStmt infrastructure which I just recently worked so hard
> on. If you do that, you should find that you can then easily also
> support comments on command triggers, security labels on command
> triggers (though I don't know if that's useful), and the ability to
> include command triggers in extensions.

Ah yes, that too was on the TODO list, I just forgot about it. I still remember the merge conflicts when that patch went in, you know… :)

> I am a bit worried about supporting command triggers on statements
> that do internal transaction control, such as VACUUM and CREATE INDEX
> CONCURRENTLY. Obviously that's very useful and I'd like to have it,
> but there's a problem: if the AFTER trigger errors out, it won't undo
> the whole command. That might be very surprising. BEFORE triggers
> seem OK, and AFTER triggers might be OK too but we at least need to
> think hard about how to document that.

I think we should limit the support to BEFORE command triggers only. It was unclear to me how to solve the problem for the AFTER command case, and if you're unclear too, then there aren't that many open questions.

> I think it would be better to bail on trying to use CREATE TRIGGER and
> DROP TRIGGER as a basis for this functionality, and instead create
> completely new toplevel statements CREATE COMMAND TRIGGER and DROP
> COMMAND TRIGGER.
> Then, you could decide that all command triggers
> live in the same namespace, and therefore to get rid of the command
> trigger called bob you can just say "DROP COMMAND TRIGGER bob",
> without having to specify the type of command it applies to. It's

I have no strong feeling about that. Would that require that COMMAND be more reserved than it currently is (UNRESERVED_KEYWORD)?

> still clear that you're dropping a *command* trigger because that's
> right in the statement name, whereas it would make me a bit uneasy to
> decide that "DROP TRIGGER bob" means a command trigger just by virtue
> of the fact that no table name was specified. That would probably
> also make it easier to accomplish the above-described goal of
> integrating this into the DropStmt infrastructure.

The other thing is that you might want to drop the trigger from only one command; of course we could support both syntaxes. We could also add support for the following one:

DROP TRIGGER bob ON ALL COMMANDS;

As ALL is already a reserved word, that will work.

> +   if (!superuser())
> +       if (!pg_database_ownercheck(MyDatabaseId, GetUserId()))
> +           aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,
> +                          get_database_name(MyDatabaseId));
>
> The separate superuser check is superfluous; pg_database_ownercheck()
> already handles that.

Ok, let's see if we keep the feature open to database owners then.

> Can we call InitCommandContext in some centralized place, rather than
> separately at lots of different call sites?

Then you have to either make the current command context a backend-private global variable, or amend even more call sites to pass it down. The global variable idea does not work; see RemoveRelations() and RemoveObjects(), which are handling an array of command contexts. So do you prefer lots of InitCommandContext() calls, or adding another parameter to almost all functions called by standard_ProcessUtility()?
> I am confused why this is adding a new file called dumpcatalog.c which > looks suspiciously similar to some existing pg_dump code I've been > staring at all day. Probably a merge or diff issue; I will look into it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
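For concreteness, the two drop spellings being debated in this subthread would look something like the following. This is a hypothetical sketch; the grammar was still unsettled at this point, so the exact keywords are assumptions:

```sql
-- Robert's proposal: all command triggers share one namespace,
-- so no command has to be named at drop time.
DROP COMMAND TRIGGER bob;

-- The patch's current spelling, plus the ALL COMMANDS variant
-- floated above (neither is committed syntax).
DROP TRIGGER bob ON COMMAND CREATE TABLE;
DROP TRIGGER bob ON ALL COMMANDS;
```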
Alvaro Herrera <alvherre@commandprompt.com> writes: > I didn't like the new cmdtrigger.h file. It's included by a lot of > other headers, and it's also itself including execnodes.h and > parsenodes.h which means practically the whole lot of the source tree > is included in turn. If you could split it, so that the struct > definition is in a different file that's only included by the few .c > files that actually use that struct, I'd be much happier. I didn't realize that, thanks for reviewing! > ... after looking at it more closely, I think only this line needs to be > in a separate file: > > typedef struct CommandContextData *CommandContext; > > and that file is included by other headers; they don't need the function > declarations or the struct definition. I'll look into that tomorrow then. The same trick is already applied to Relation and RelationData (resp. in src/include/utils/relcache.h and src/include/utils/rel.h), and only now I understand why :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Feb 16, 2012 at 4:21 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > We could decide command triggers are superuser only. Security is not > something I'm very strong at, so I'll leave it up to you to decide. That's certainly the easiest option. If you don't feel passionate about spending a lot of energy figuring out how to make it secure, then I suggest we just restrict it to superusers until someone does. >> If I install a command trigger that prevents all DDL, how do I >> uninstall it? Or say I'm the superuser and I want to undo something >> my disgruntled DBA did before he quit. > > Good catch, I guess we need to remove creating and dropping a command > trigger to the list of supported commands in the ANY COMMAND list. Another option would be to add a PGC_SUSET boolean GUC that can be used to disable command triggers. I think that might be more flexible, not to mention useful for recursion prevention. >> I am a bit worried about supporting command triggers on statements >> that do internal transaction control, such as VACUUM and CREATE INDEX >> CONCURRENTLY. Obviously that's very useful and I'd like to have it, >> but there's a problem: if the AFTER trigger errors out, it won't undo >> the whole command. That might be very surprising. BEFORE triggers >> seem OK, and AFTER triggers might be OK too but we at least need to >> think hard about how to document that. > > I think we should limit the support to BEFORE command trigger only. It > was unclear to me how to solve the problem for the AFTER command case, > and if you're unclear to, then there's not that many open questions. Works for me. >> I think it would be better to bail on trying to use CREATE TRIGGER and >> DROP TRIGGER as a basis for this functionality, and instead create >> completely new toplevel statements CREATE COMMAND TRIGGER and DROP >> COMMAND TRIGGER. 
Then, you could decide that all command triggers >> live in the same namespace, and therefore to get rid of the command >> trigger called bob you can just say "DROP COMMAND TRIGGER bob", >> without having to specify the type of command it applies to. It's > > I have no strong feeling about that. Would that require that COMMAND be > more reserved than it currently is (UNRESERVED_KEYWORD)? It shouldn't. >> still clear that you're dropping a *command* trigger because that's >> right in the statement name, whereas it would make me a bit uneasy to >> decide that "DROP TRIGGER bob" means a command trigger just by virtue >> of the fact that no table name was specified. That would probably >> also make it easier to accomplish the above-described goal of >> integrating this into the DropStmt infrastructure. > > The other thing is that you might want to drop the trigger from only one > command, of course we could support both syntax. We could also add > support for the following one: > > DROP TRIGGER bob ON ALL COMMANDS; Uh, hold on. Creating a trigger on multiple commands ought to only create one entry in pg_cmdtrigger. If you drop it, you drop the whole thing. Changing which commands the trigger applies to would be the job for ALTER, not DROP. But note that we have no similar functionality for regular triggers, so I can't think we really need it here either. >> Can we call InitCommandContext in some centralized place, rather than >> separately at lots of different call sites? > > Then you have to either make the current command context a backend > private global variable, or amend even more call sites to pass it down. > The global variable idea does not work, see RemoveRelations() and > RemoveObjects() which are handling an array of command contexts. > > So do you prefer lots of InitCommandContext() or adding another parameter > to almost all functions called by standard_ProcessUtility()? Blech. 
Maybe we should just have CommandFiresTriggers initialize it if not already done? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
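The lazy-initialization idea suggested here can be sketched outside the backend like this. The names CommandContextData, InitCommandContext and CommandFiresTriggers mirror the patch, but the bodies below are a standalone model and an assumption, not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the patch's command context struct. */
typedef struct CommandContextData
{
    const char *tag;            /* command tag, e.g. "CREATE TABLE" */
    bool        initialized;
} CommandContextData;

static int  init_calls = 0;     /* counts real initializations */

static void
InitCommandContext(CommandContextData *cmd, const char *tag)
{
    cmd->tag = tag;
    cmd->initialized = true;
    init_calls++;
}

/* Initialize on first use, so the many call sites reached from
 * standard_ProcessUtility() need no explicit setup call. */
static bool
CommandFiresTriggers(CommandContextData *cmd, const char *tag)
{
    if (!cmd->initialized)
        InitCommandContext(cmd, tag);
    /* the real code would look up pg_cmdtrigger for this tag here */
    return true;
}
```

The point of the pattern is that repeated checks on the same context pay the initialization cost only once.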
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 16, 2012 at 4:21 PM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: > That's certainly the easiest option. If you don't feel passionate > about spending a lot of energy figuring out how to make it secure, > then I suggest we just restrict it to superusers until someone does. Works for me. >>> If I install a command trigger that prevents all DDL, how do I >>> uninstall it? Or say I'm the superuser and I want to undo something >>> my disgruntled DBA did before he quit. >> >> Good catch, I guess we need to remove creating and dropping a command >> trigger to the list of supported commands in the ANY COMMAND list. > > Another option would be to add a PGC_SUSET boolean GUC that can be > used to disable command triggers. I think that might be more > flexible, not to mention useful for recursion prevention. Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; which I had forgotten about in the previous answer, so I think we're good as it is. That's how I prevented recursion in some of my tests (not included in the regress tests, was using INSTEAD OF). >> DROP TRIGGER bob ON ALL COMMANDS; > > Uh, hold on. Creating a trigger on multiple commands ought to only > create one entry in pg_cmdtrigger. If you drop it, you drop the whole > thing. Changing which commands the trigger applies to would be the > job for ALTER, not DROP. But note that we have no similar > functionality for regular triggers, so I can't think we really need it > here either. Why would we do it that way (a single entry for multiple commands)? The way it is now, it's only syntactic sugar, so I think it's easier to implement, document and use. >> So do you prefer lots of InitCommandContext() or adding another parameter >> to almost all functions called by standard_ProcessUtility()? > > Blech. Maybe we should just have CommandFiresTriggers initialize it > if not already done? 
Thing is, in a vast number of cases (almost of ALTER OBJECT name, namespace and owner) you don't have the Node * parse tree any more from the place where you check for CommandFiresTriggers(), so you can't initialize there. That's part of the fun. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Feb 16, 2012 at 5:38 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: >> Another option would be to add a PGC_SUSET boolean GUC that can be >> used to disable command triggers. I think that might be more >> flexible, not to mention useful for recursion prevention. > > Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; > which I had forgotten about in the previous answer, so I think we're > good as it is. That's how I prevented recursion in some of my tests > (not included in the regress tests, was using INSTEAD OF). Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? >>> DROP TRIGGER bob ON ALL COMMANDS; >> >> Uh, hold on. Creating a trigger on multiple commands ought to only >> create one entry in pg_cmdtrigger. If you drop it, you drop the whole >> thing. Changing which commands the trigger applies to would be the >> job for ALTER, not DROP. But note that we have no similar >> functionality for regular triggers, so I can't think we really need it >> here either. > > Why would we do it that way (a single entry for multiple commands)? The > way it is now, it's only syntactic sugar, so I think it's easier to > implement, document and use. Well, for one thing, it's consistent with how we handle it for regular triggers. For two things, if you create an object named bob, you expect to end up with an object named bob - not 47 objects (or whatever) that are all named bob. Also, suppose you create a trigger on ALL COMMANDS, and then a new version of PG adds a new command. When you dump and reload, do you expect to end up with a trigger on all commands that existed in the old version, or all the commands that exist in the new version? Or conversely, suppose we get rid of a command in a future release. How will we handle that? I can't think of another example of where a CREATE command creates multiple objects like that. 
>>> So do you prefer lots of InitCommandContext() or adding another parameter >>> to almost all functions called by standard_ProcessUtility()? >> >> Blech. Maybe we should just have CommandFiresTriggers initialize it >> if not already done? > > Thing is, in a vast number of cases (almost of ALTER OBJECT name, > namespace and owner) you don't have the Node * parse tree any more from > the place where you check for CommandFiresTriggers(), so you can't > initialize there. That's part of the fun. Hmm, I'll look at this in more detail next time through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; > > Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? We should remove support for command triggers on alter command triggers. Well I could also go with the GUC idea, it's only that I'm not entirely sold it's the best we can do yet and I'd like to avoid yet another GUC. >> Why would we do it that way (a single entry for multiple commands)? The >> way it is now, it's only syntactic sugar, so I think it's easier to >> implement, document and use. > > Well, for one thing, it's consistent with how we handle it for regular > triggers. For two things, if you create an object named bob, you I don't think so, if you attach the same procedure to more than one table each time with the same name, you get multiple entries in pg_trigger: "pg_trigger_tgrelid_tgname_index" UNIQUE, btree (tgrelid, tgname)

create trigger footg after insert on tg.foo for each row execute procedure tg.trigfunc();
create trigger footg after insert on tg.bar for each row execute procedure tg.trigfunc();
create trigger footg after insert on tg.baz for each row execute procedure tg.trigfunc();

select oid, tgrelid::regclass, tgname, tgfoid, tgtype, tgenabled from pg_trigger;

  oid   | tgrelid | tgname | tgfoid | tgtype | tgenabled
--------+---------+--------+--------+--------+-----------
 533210 | tg.foo  | footg  | 533209 |      5 | O
 533211 | tg.bar  | footg  | 533209 |      5 | O
 533212 | tg.baz  | footg  | 533209 |      5 | O
(3 rows)

The difference I see is that in the table trigger case you don't have a syntax that allows you to do the 3 operations I did above in 1 command, and it's easy to provide for this capability with command triggers (and the use case is much bigger too, as all command triggers are given the same arguments and all expected to return void). > expect to end up with an object named bob - not 47 objects (or > whatever) that are all named bob. 
Also, suppose you create a trigger > on ALL COMMANDS, and then a new version of PG adds a new command. You create a trigger on ANY command :) > When you dump and reload, do you expect to end up with a trigger on > all commands that existed in the old version, or all the commands that > exist in the new version? Or conversely, suppose we get rid of a > command in a future release. How will we handle that? I can't think > of another example of where a CREATE command creates multiple objects > like that. ANY COMMAND triggers are just one entry in pg_cmdtrigger, with the command name registered as "ANY", which is only safe as long as we don't provide a new SQL command whose command tag is ANY. We could decide that we want to name this magic ANY command "__ANY__", but it does not look like it fits the project usual naming style. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
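To illustrate the catalog behavior described above: an ANY COMMAND trigger is a single pg_cmdtrigger row carrying the magic command tag "ANY", while listing commands explicitly creates one row per command. The syntax follows the patch's CREATE TRIGGER ... COMMAND form, but this is a sketch against an uncommitted grammar and the function name is made up:

```sql
-- One pg_cmdtrigger row, stored with command tag 'ANY':
CREATE TRIGGER log_ddl BEFORE ANY COMMAND
    EXECUTE PROCEDURE log_ddl_func();

-- One pg_cmdtrigger row per named command ("syntactic sugar"
-- for issuing the statement once per command):
CREATE TRIGGER log_tables BEFORE COMMAND CREATE TABLE
    EXECUTE PROCEDURE log_ddl_func();
CREATE TRIGGER log_tables BEFORE COMMAND ALTER TABLE
    EXECUTE PROCEDURE log_ddl_func();
```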
Alvaro Herrera <alvherre@commandprompt.com> writes: > ... after looking at it more closely, I think only this line needs to be > in a separate file: > > typedef struct CommandContextData *CommandContext; Files like src/backend/commands/tablecmds.c and others need both the structure and the pointer, so we need both. What about putting those definitions into src/include/catalog/pg_cmdtrigger.h? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Alvaro Herrera <alvherre@commandprompt.com> writes: > I didn't like the new cmdtrigger.h file. It's included by a lot of > other headers, and it's also itself including execnodes.h and It turns out that this file does not need to include execnodes.h; I've cleaned that up now (compile ok, make installcheck ok). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Feb 17, 2012 at 10:54 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; >> >> Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? > > We should remove support for command triggers on alter command triggers. > Well I could also go with the GUC idea, it's only that I'm not entirely > sold it's the best we can do yet and I'd like to avoid yet another GUC. Btw, we already have a GUC for triggers: session_replication_role, how will the command triggers follow that? -- marko
Marko Kreen <markokr@gmail.com> writes: > Btw, we already have a GUC for triggers: session_replication_role, > how will the command triggers follow that? Note that the replica here in my mind would have been a Hot Standby node, and having the standby run the replica/always command triggers is not implemented yet, because you can't run DDL on the standby. Now that you mention it, we should also support the GUC here and only fire the triggers that match it. I'm working on that now. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
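Assuming command triggers end up honoring session_replication_role the same way row-level triggers do (an assumption at this point in the thread; the table name is invented), a replication session would suppress default-mode command triggers like this:

```sql
-- Sketch only: relies on command triggers following the usual
-- origin / replica / always trigger semantics.
SET session_replication_role TO replica;
CREATE TABLE replicated_copy (id int);  -- triggers left at their
                                        -- default setting don't fire
SET session_replication_role TO DEFAULT;
```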
On Fri, Feb 17, 2012 at 3:54 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; >> >> Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? > > We should remove support for command triggers on alter command triggers. > Well I could also go with the GUC idea, it's only that I'm not entirely > sold it's the best we can do yet and I'd like to avoid yet another GUC. I'm OK with not supporting command triggers on command triggers, but I still think the GUC is useful. Keep in mind that flipping a GUC is really cheap compared to a catalog change, and can affect just one session. Those are significant advantages. However, if you want to just not support triggers on statements that modify command triggers, I'm OK with that, too. >>> Why would we do it that way (a single entry for multiple commands)? The >>> way it is now, it's only syntactic sugar, so I think it's easier to >>> implement, document and use. >> >> Well, for one thing, it's consistent with how we handle it for regular >> triggers. For two things, if you create an object named bob, you > > I don't think so, if you attach the same procedure to more than one > table each time with the same name, you get multiple entries in > pg_trigger: > > "pg_trigger_tgrelid_tgname_index" UNIQUE, btree (tgrelid, tgname) > > create trigger footg after insert on tg.foo for each row execute procedure tg.trigfunc(); > create trigger footg after insert on tg.bar for each row execute procedure tg.trigfunc(); > create trigger footg after insert on tg.baz for each row execute procedure tg.trigfunc(); Sure, but if you run the same trigger on multiple operations - INSERT OR UPDATE OR DELETE. >> expect to end up with an object named bob - not 47 objects (or >> whatever) that are all named bob. Also, suppose you create a trigger >> on ALL COMMANDS, and then a new version of PG adds a new command. 
> > You create a trigger on ANY command :) Oh. Well, then +1 for me on the ANY COMMAND thing, but -1 on ALL COMMANDS. I can't see that there's enough utility to having a bulk-create functionality to justify its existence. The ANY COMMAND thing I think is what people will want. > ANY COMMAND triggers are just one entry in pg_cmdtrigger, with the > command name registered as "ANY", which is only safe as long as we don't > provide a new SQL command whose command tag is ANY. We could decide that > we want to name this magic ANY command "__ANY__", but it does not look > like it fits the project usual naming style. I am thinking that we should ditch the idea of keeping track of commands using strings and instead assign a bunch of integer constants using a big enum. The parser can translate from what the user enters to these constants and then use those throughout, including in the system catalogs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
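The integer-constants scheme Robert suggests could look like the following sketch. The enum name, its members, and the lookup function are all hypothetical, not taken from the patch or from PostgreSQL:

```c
#include <assert.h>
#include <string.h>

/* One constant per supported command; the parser would translate
 * what the user typed into one of these. */
typedef enum CmdTrigCommand
{
    CMDTRIG_ANY = 0,
    CMDTRIG_CREATE_TABLE,
    CMDTRIG_ALTER_TABLE,
    CMDTRIG_DROP_TABLE,
    CMDTRIG_NUM_COMMANDS        /* sizes per-command arrays */
} CmdTrigCommand;

/* Parser-side translation from a command tag string to the
 * integer constant that would be stored in the catalogs. */
static int
cmdtrig_from_tag(const char *tag)
{
    static const char *const tags[CMDTRIG_NUM_COMMANDS] = {
        "ANY", "CREATE TABLE", "ALTER TABLE", "DROP TABLE"
    };

    for (int i = 0; i < CMDTRIG_NUM_COMMANDS; i++)
        if (strcmp(tag, tags[i]) == 0)
            return i;
    return -1;                  /* unknown command tag */
}
```

This is also where Dimitri's maintenance objection bites: the tags[] table (or the big switch it stands in for) has to grow every time a command is added.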
On Fri, Feb 17, 2012 at 4:04 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Marko Kreen <markokr@gmail.com> writes: >> Btw, we already have a GUC for triggers: session_replication_role, >> how will the command triggers follow that? > > Note that the replica here in my mind would have been an Hot Standby > node, and having the standby run the replica/always command triggers is > not implemented yet, because you can't run DDL on the standby. But we will be able? That's news to me. I'm more interested in whether it follows ordinary trigger behaviour on a Slony/Londiste slave node. -- marko
Hi, Please find attached version 9 of this patch, addressing most of the open comments about it. Robert Haas <robertmhaas@gmail.com> writes: > That's certainly the easiest option. If you don't feel passionate > about spending a lot of energy figuring out how to make it secure, > then I suggest we just restrict it to superusers until someone does. Done. >> Good catch, I guess we need to remove creating and dropping a command >> trigger to the list of supported commands in the ANY COMMAND list. Done. > Another option would be to add a PGC_SUSET boolean GUC that can be > used to disable command triggers. I think that might be more > flexible, not to mention useful for recursion prevention. I implemented support for the session_replication_role GUC so that you can SET session_replication_role TO replica; and avoid having any command triggers run (if you didn't ALTER them from their default setting). Of course you can also use the GUC for its intended purpose, as Marko said. >>> [CREATE INDEX CONCURRENTLY and VACUUM] >> I think we should limit the support to BEFORE command trigger only. It >> was unclear to me how to solve the problem for the AFTER command case, >> and if you're unclear to, then there's not that many open questions. > > Works for me. Done. Of course at the time the command trigger is created you can't distinguish if the CREATE INDEX command will be run CONCURRENTLY or not, so I've decided to issue a WARNING about it. >>> I think it would be better to bail on trying to use CREATE TRIGGER and >>> DROP TRIGGER as a basis for this functionality, and instead create >>> completely new toplevel statements CREATE COMMAND TRIGGER and DROP >>> COMMAND TRIGGER. Then, you could decide that all command triggers >>> live in the same namespace, and therefore to get rid of the command >>> trigger called bob you can just say "DROP COMMAND TRIGGER bob", >>> without having to specify the type of command it applies to. It's Not done yet, I'm not sure we reached a decision on this one. 
Also note that I chose to give DDL on command triggers their own command tag so that it's easy to separate them out in the command trigger context. Robert Haas <robertmhaas@gmail.com> writes: >> Wait, we already have ALTER TRIGGER bob ON ANY COMMAND SET DISABLED; > Eh, so what happens then if someone sets a command trigger on ALTER TRIGGER? You would need to set a command trigger on ALTER COMMAND TRIGGER and that's not supported. Triggers on command "ALTER TRIGGER" in fact will not get fired on ALTER TRIGGER ... ON COMMAND ... I guess that's a point in favor of changing the grammar the way you're hinting:

CREATE COMMAND TRIGGER
DROP COMMAND TRIGGER
ALTER COMMAND TRIGGER

Each of those also needs its own reference page. It will be easier on the users I guess. Will work on that. Robert Haas <robertmhaas@gmail.com> writes: > I'm OK with not supporting command triggers on command triggers, but I > still think the GUC is useful. Keep in mind that flipping a GUC is > really cheap compared to a catalog change, and can affect just one > session. Those are significant advantages. However, if you want to > just not support triggers on statements that modify command triggers, > I'm OK with that, too. Both done, if you agree with using session_replication_role here. > Sure, but if you run the same trigger on multiple operations - INSERT > OR UPDATE OR DELETE. I fail to see the analogy. The other problem with the current way of doing things is that I can't integrate with RemoveObjects(), and I think you won't like that :) >> You create a trigger on ANY command :) > > Oh. Well, then +1 for me on the ANY COMMAND thing, but -1 on ALL > COMMANDS. I can't see that there's enough utility to having a > bulk-create functionality to justify its existence. The ANY COMMAND > thing I think is what people will want. 
There's no such thing as ALL COMMANDS in the patch; there's just syntactic sugar allowing you to create and drop more than one command trigger in a single command, much as we have DROP TABLE foo, bar, baz; > I am thinking that we should ditch the idea of keeping track of > commands using strings and instead assign a bunch of integer constants > using a big enum. The parser can translate from what the user enters > to these constants and then use those throughout, including in the > system catalogs. It's not really command strings but the Command Tag we've historically been using up until now. You're saying that it should remain the same for users but change internally. No strong opinion from me here, apart from it being more code for doing the same thing. I mean we will need to get the command tag from the parse tree, then change that into an integer thanks to yet another huge switch that will have to be maintained every time we add a new command. We have only too many of them already (not that I'm proposing to remove any). > Alvaro Herrera <alvherre@commandprompt.com> writes: >> I didn't like the new cmdtrigger.h file. It's included by a lot of >> other headers, and it's also itself including execnodes.h and Fixed in the attached patch. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: >>>> I think it would be better to bail on trying to use CREATE TRIGGER and >>>> DROP TRIGGER as a basis for this functionality, and instead create >>>> completely new toplevel statements CREATE COMMAND TRIGGER and DROP >>>> COMMAND TRIGGER. Then, you could decide that all command triggers >>>> live in the same namespace, and therefore to get rid of the command >>>> trigger called bob you can just say "DROP COMMAND TRIGGER bob", >>>> without having to specify the type of command it applies to. It's > > I guess that's a point to change the grammar the way you're hinting: > > CREATE COMMAND TRIGGER > DROP COMMAND TRIGGER > ALTER COMMAND TRIGGER > > That also needs each their own reference page. It will be easier on the > users I guess. Will work on that. FWIW I've pushed such a change to my github repository, I'm not spamming the list with v10 already though, unless someone wants to see it. https://github.com/dimitri/postgres/commit/82996b45aae10f12818f1e3097ba805fff22a97b Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Feb 17, 2012 at 10:42 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Done. Of course at the time the command trigger is created you can't > distinguish if the CREATE INDEX command will be run CONCURRENTLY or not, > so I've decided to issue a WARNING about it. That seems icky. Whatever warnings need to be given should be in the documentation, not at runtime. Another idea here would be to treat CREATE INDEX CONCURRENTLY as if it were a separate toplevel command, for command-trigger purposes only. But I'm not sure that's any better. > You would need to set a command trigger on ALTER COMMAND TRIGGER and > that's not supported. Triggers on command "ALTER TRIGGER" in fact will > not get fired on ALTER TRIGGER ... ON COMMAND ... > > I guess that's a point to change the grammar the way you're hinting: Indeed it is. :-) > CREATE COMMAND TRIGGER > DROP COMMAND TRIGGER > ALTER COMMAND TRIGGER > > That also needs each their own reference page. It will be easier on the > users I guess. Will work on that. Yeah, I think that will be much more clear, and not really that much work for you. It will also make the reference pages simpler, I think, since there are significant behavioral differences between ordinary triggers and command triggers. > Both done, if you agree with using session_replication_role here. It's better than a sharp stick in the eye. I'm not convinced it's ideal, but I don't feel strongly enough about the issue to push on it for now, as long as we disallow command triggers on CREATE/ALTER/DROP COMMAND TRIGGER. >> Sure, but if you run the same trigger on multiple operations - INSERT >> OR UPDATE OR DELETE. > > I failed to see that analogy. The other problem with the current way of > doing things is that I can't integrate with RemoveObjects(), and I think > you won't like that :) I sure won't. I think ultimately you won't like it either, since the objectaddress infrastructure is also needed to make this work with extensions. 
And I assume you would agree with me that extensions are an important feature. :-) >>> You create a trigger on ANY command :) >> >> Oh. Well, then +1 for me on the ANY COMMAND thing, but -1 on ALL >> COMMANDS. I can't see that there's enough utility to having a >> bulk-create functionality to justify its existence. The ANY COMMAND >> thing I think is what people will want. > > There's no such thing as ALL COMMANDS in the patch, there's a syntactic > sugar allowing you to create and drop more than one command trigger in a > single command, much as we have DROP TABLE foo, bar, baz; OK, I'll look more carefully. >> I am thinking that we should ditch the idea of keeping track of >> commands using strings and instead assign a bunch of integer constants >> using a big enum. The parser can translate from what the user enters >> to these constants and then use those throughout, including in the >> system catalogs. > > It's not really command strings but the Command Tag we've historically > been using up until now. You're saying that it should remain the same > for users but change internally. No strong opinion from me here, apart > from it being more code for doing the same thing. Well, the reason I thought it might be better is for caching purposes. If you have a cache of which triggers need to be run for which commands, an integer index into an array will be a lot faster than a hash table lookup. But it may bear more examination, so I don't feel this is a must-do at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
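The caching argument can be made concrete with a toy model. The array name, function name, and command count below are invented for illustration; the real cache would live in the backend and be invalidated on catalog changes:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative command count; the real value would come from the
 * command-ID enum discussed upthread. */
#define CMDTRIG_NUM_COMMANDS 4

/* "Does this command have any triggers?" per command ID.
 * Rebuilt whenever pg_cmdtrigger changes. */
static bool cmd_has_triggers[CMDTRIG_NUM_COMMANDS];

/* With integer command IDs this check is a bounds test plus one
 * array index, versus a hash probe on the command-tag string. */
static bool
CommandFiresTriggersCached(int cmd_id)
{
    return cmd_id >= 0
        && cmd_id < CMDTRIG_NUM_COMMANDS
        && cmd_has_triggers[cmd_id];
}
```

Whether that indexing win is measurable next to the cost of actually running a DDL command is exactly the open question in this exchange.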
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Feb 17, 2012 at 10:42 AM, Dimitri Fontaine > <dimitri@2ndquadrant.fr> wrote: >> Done. Of course at the time the command trigger is created you can't >> distinguish if the CREATE INDEX command will be run CONCURRENTLY or not, >> so I've decided to issue a WARNING about it. > > That seems icky. Whatever warnings need to be given should be in the > documentation, not at runtime. Agreed. We're still making user-visible changes though, so I wanted to defer docs editing some more. Even documented, a WARNING seems like a good idea to me, but maybe you would prefer a NOTICE? > Another idea here would be to treat CREATE INDEX CONCURRENTLY as if it > were a separate toplevel command, for command-trigger purposes only. > But I'm not sure that's any better. The patch as it stands will fire the AFTER command trigger only when not using the CONCURRENTLY variant, which I think is enough, once documented. > Yeah, I think that will be much more clear, and not really that much > work for you. It will also make the reference pages simpler, I think, > since there are significant behavioral differences between ordinary > triggers and command triggers. Yeah, done this way; it still needs an overview section in triggers.sgml I think. >> Both done, if you agree with using session_replication_role here. > > It's better than a sharp stick in the eye. I'm not convinced it's > ideal, but I don't feel strongly enough about the issue to push on it > for now, as long as we disallow command triggers on CREATE/ALTER/DROP > COMMAND TRIGGER. We simply don't support those commands as far as command triggers are concerned, which seems like a sane limitation. > I sure won't. I think ultimately you won't like it either, since the > objectaddress infrastructure is also needed to make this work with > extensions. 
:-) How did you guess :) Will see about it later tonight, I'd like to keep the multiple command drop command trigger spelling. >> It's not really command strings but the Command Tag we've historically >> been using up until now. You're saying that it should remain the same >> for users but change internally. No strong opinion from me here, apart >> from it being more code for doing the same thing. > > Well, the reason I thought it might be better is for caching purposes. > If you have a cache of which triggers need to be run for which > commands, an integer index into an array will be a lot faster than a > hash table lookup. But it may bear more examination, so I don't feel > this is a must-do at this point. I've been trying to get a feeling for the runtime performance with command triggers along the lines you suggested, even if I'd be very surprised if a couple of index scans were anything but noise when completing a DDL command. I'm getting these results on my development machine:

duration: 30 s
number of transactions actually processed: 42390
tps = 1413.004051 (including connections establishing)
tps = 1413.505517 (excluding connections establishing)
statement latencies in milliseconds:
        0.705843  create or replace function plus1(int) returns bigint language sql as $$ select $1::bigint + 1; $$;

I don't have the setup to compare that easily to the current master branch, I was hoping you would run tests on your side (btw the previous patch version is rebased against master and cleaned up, should be fine now, oh and in context format). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi,

Robert Haas <robertmhaas@gmail.com> writes:
> That seems icky. Whatever warnings need to be given should be in the
> documentation, not at runtime.

So, a new patch is attached, with some more docs. It's still shy on that front though, mainly because I don't have a setup to build the docs easily here. I intend to complete the docs on Monday.

If you agree with the way it's implemented now, I'll integrate some more commands (all our commands except for those working with shared objects, and restricting away AFTER command trigger support for commands that handle transactions themselves). Once those two items are done, I believe we will reach the Grail^W “ready for commit” status, and that should happen early next week.

>> CREATE COMMAND TRIGGER
>> DROP COMMAND TRIGGER
>> ALTER COMMAND TRIGGER
>
> Yeah, I think that will be much more clear, and not really that much
> work for you. It will also make the reference pages simpler, I think,
> since there are significant behavioral differences between ordinary
> triggers and command triggers.

This is implemented in the attached patch, v10.

>> I failed to see that analogy. The other problem with the current way of
>> doing things is that I can't integrate with RemoveObjects(), and I think
>> you won't like that :)
>
> I sure won't. I think ultimately you won't like it either, since the
> objectaddress infrastructure is also needed to make this work with
> extensions. And I assume you would agree with me that extensions are
> an important feature.

:-) So now I've integrated with RemoveObjects(), which means it's no longer possible to drop a command trigger attached to more than one command in one go. Not a show stopper, but maybe worth noting here.

>> It's not really command strings but the Command Tag we've historically
>> been using up until now. You're saying that it should remain the same
>> for users but change internally. No strong opinion from me here, apart
>> from it being more code for doing the same thing.
>
> Well, the reason I thought it might be better is for caching purposes.
> If you have a cache of which triggers need to be run for which
> commands, an integer index into an array will be a lot faster than a
> hash table lookup. But it may bear more examination, so I don't feel
> this is a must-do at this point.

I'm still not convinced that two index scans will be noticeable when executing a DDL command. So it all feels like premature optimization to me, even if I completely understand where you are coming from (most of the patches you worked on for this release share an easy-to-spot common theme, after all). So I didn't implement a cache here in patch v10; I would hate to spend time on that to chase a 1% DDL-only performance regression.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sunday, December 04, 2011 02:09:08 AM Andres Freund wrote:
> First, does anybody think it would be worth getting rid of the duplication
> from OpenIntoRel (formerly from execMain.c) in regard to DefineRelation()?
> I noticed that there already is some divergence between both. E.g. CREATE
> TABLE frak TABLESPACE pg_global AS SELECT 1; is possible while it would
> be forbidden via a plain CREATE TABLE. (I will send a fix for this
> separately).

Sorry for letting this slide. Is it worth adding this bit to OpenIntoRel? Not sure if there is danger in allowing anyone to create shared tables:

/* In all cases disallow placing user relations in pg_global */
if (tablespaceId == GLOBALTABLESPACE_OID)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("only shared relations can be placed in pg_global tablespace")));

Andres
Andres Freund <andres@anarazel.de> writes:
> Sorry for letting this slide.
> Is it worth adding this bit to OpenIntoRel? Not sure if there is danger in
> allowing anyone to create shared tables
>
> /* In all cases disallow placing user relations in pg_global */
> if (tablespaceId == GLOBALTABLESPACE_OID)
>     ereport(ERROR,
>             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>              errmsg("only shared relations can be placed in pg_global tablespace")));

Ugh ... if that's currently allowed, we definitely need to fix it. But I'm not sure OpenIntoRel is the right place. I'd have expected the test to be at some lower level, like heap_create_with_catalog or so.

regards, tom lane
On Tuesday, February 28, 2012 12:30:36 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Sorry for letting this slide.
> >
> > Is it worth adding this bit to OpenIntoRel? Not sure if there is danger
> > in allowing anyone to create shared tables
> >
> > /* In all cases disallow placing user relations in pg_global */
> > if (tablespaceId == GLOBALTABLESPACE_OID)
> >     ereport(ERROR,
> >             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >              errmsg("only shared relations can be placed in pg_global tablespace")));
>
> Ugh ... if that's currently allowed, we definitely need to fix it.
> But I'm not sure OpenIntoRel is the right place. I'd have expected
> the test to be at some lower level, like heap_create_with_catalog
> or so.

It's definitely allowed right now:

test-upgrade=# CREATE TABLE foo TABLESPACE pg_global AS SELECT 1;
SELECT 1
Time: 354.097 ms

The check analogous to the one missing from OpenIntoRel for plain relations is in DefineRelation. heap_create_with_catalog only contains the inverse check:

/*
 * Shared relations must be in pg_global (last-ditch check)
 */
if (shared_relation && reltablespace != GLOBALTABLESPACE_OID)
    elog(ERROR, "shared relations must be placed in pg_global tablespace");

Moving it there sounds like a good idea, without any problem I can see right now. Want me to prepare a patch, or is it just the same for you if you do it yourself?

Andres
On Tuesday, February 28, 2012 12:30:36 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Is it worth adding this bit to OpenIntoRel? Not sure if there is danger
> > in allowing anyone to create shared tables
> >
> > /* In all cases disallow placing user relations in pg_global */
> > if (tablespaceId == GLOBALTABLESPACE_OID)
> >     ereport(ERROR,
> >             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >              errmsg("only shared relations can be placed in pg_global tablespace")));
>
> Ugh ... if that's currently allowed, we definitely need to fix it.

By the way, what's the danger you're seeing?

Andres
Andres Freund <andres@anarazel.de> writes:
> On Tuesday, February 28, 2012 12:30:36 AM Tom Lane wrote:
>> Ugh ... if that's currently allowed, we definitely need to fix it.
>
> By the way, what's the danger you're seeing?

Well, I'm not sure that it would actively break anything, but we definitely meant to disallow the case. Also, I seem to recall some places that intuit a relation's shared marker in the opposite direction (if it's in pg_global it must be shared), and that could definitely cause issues if we treat a rel as shared when it isn't.

regards, tom lane
Excerpts from Tom Lane's message of lun feb 27 20:54:41 -0300 2012:
> Andres Freund <andres@anarazel.de> writes:
> > By the way, what's the danger you're seeing?
>
> Well, I'm not sure that it would actively break anything, but we
> definitely meant to disallow the case. Also, I seem to recall some
> places that intuit a relation's shared marker in the opposite direction
> (if it's in pg_global it must be shared), and that could definitely
> cause issues if we treat a rel as shared when it isn't.

The list of shared rels is hardcoded -- see IsSharedRelation.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tuesday, February 28, 2012 12:43:02 AM Andres Freund wrote:
> On Tuesday, February 28, 2012 12:30:36 AM Tom Lane wrote:
> > Ugh ... if that's currently allowed, we definitely need to fix it.
> > But I'm not sure OpenIntoRel is the right place. I'd have expected
> > the test to be at some lower level, like heap_create_with_catalog
> > or so.
>
> Moving it there sounds like a good idea without any problem I can see right
> now. Want me to prepare a patch or is it just the same for you if you do it
> yourself?

Sorry to bother you with that dreary topic further, but this should probably be fixed before the next set of stable releases.

The check cannot easily be moved to heap_create_with_catalog because e.g. cluster.c's make_new_heap does heap_create_with_catalog for a temporary copy of shared relations without being able to mark them as such (because they don't have the right OID, and thus IsSharedRelation would cry).
So I think just adding to the CTAS code the same check that the normal DefineRelation path contains is the best way forward. The attached patch applies from 8.3 to 9.1 (8.2 has conflicts but thankfully...).

Andres
On Tue, Mar 20, 2012 at 12:02 PM, Andres Freund <andres@anarazel.de> wrote:
> The attached patch applies from 8.3 to 9.1 (8.2 has conflicts but
> thankfully...).

It seems it doesn't apply to master (any more?).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday, March 21, 2012 04:54:00 PM Robert Haas wrote:
> On Tue, Mar 20, 2012 at 12:02 PM, Andres Freund <andres@anarazel.de> wrote:
> > The attached patch applies from 8.3 to 9.1 (8.2 has conflicts but
> > thankfully...).
>
> It seems it doesn't apply to master (any more?).

It's not required there because of the unification of CTAS with the normal code path. Sorry, that was only clear from a few mails away; I should have made that explicit.

Andres
On Wed, Mar 21, 2012 at 11:58 AM, Andres Freund <andres@anarazel.de> wrote:
> On Wednesday, March 21, 2012 04:54:00 PM Robert Haas wrote:
>> It seems it doesn't apply to master (any more?).
> It's not required there because of the unification of CTAS with the normal
> code path. Sorry, that was only clear from a few mails away; I should have
> made that explicit.

Or maybe I should have tested it before jumping to conclusions. Anyway, I've committed the patch to 9.1, 9.0, 8.4, and 8.3.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company