Re: Event Triggers: adding information - Mailing list pgsql-hackers

From Dimitri Fontaine
Subject Re: Event Triggers: adding information
Date
Msg-id m2k3r27tej.fsf@2ndQuadrant.fr
Whole thread Raw
In response to Re: Event Triggers: adding information  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Event Triggers: adding information
Re: Event Triggers: adding information
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
> <dimitri@2ndquadrant.fr> wrote:
>>   - Context exposing and filtering
>
> I'm not feeling very sanguine about any of this.  I feel strongly that
> we should try to do something that's more principled than just
> throwing random junk into ProcessUtility.

The goal is to allow advanced users of that feature to get at the
sequence, constraint and index names that PostgreSQL has picked
automatically when running a CREATE TABLE statement that involves
them:
  CREATE TABLE foo (id serial PRIMARY KEY, f1 text);
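
For the record, that single statement implicitly creates a sequence and
a unique index whose names PostgreSQL chooses on its own (with the
default naming rules, foo_id_seq and foo_pkey), which you can see in
the catalogs:

```sql
-- The statement above implicitly creates two more objects whose
-- names are picked by PostgreSQL, not by the user:
--   foo_id_seq  (sequence backing the serial column)
--   foo_pkey    (unique index backing the primary key)
SELECT relname, relkind
  FROM pg_class
 WHERE relname IN ('foo', 'foo_id_seq', 'foo_pkey');
```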

Andres and Steve, as you're the next ones to benefit directly from that
whole feature and at different levels, do you have a strong opinion on
whether you really need that to operate at the logical level?

The trick is not implementing the feature (that's quite easy, really),
it's being able to provide it without any API promises, because it's an
opening onto the internals.

Maybe the answer is to provide the same level of information as you get
in an Event Trigger to new hooks.

>>   - Additional Information to PL/pgSQL
>>
>>     We're talking about :
>>
>>       - operation   (CREATE, ALTER, DROP)
>>       - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
>>       - object OID        (when there's a single object target)
>>       - object schema name (when there's a single object target)
>>       - object name       (when there's a single object target)

I realise I omitted something important here. The current patch effort
only adds that information if the target object exists in the catalogs
at the time of the information lookup.
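
A sketch of what a PL/pgSQL event trigger consuming that information
might look like under the patch's proposal. Note that tg_objectname and
tg_schemaname are the names from the proposal; tg_operation,
tg_objecttype and tg_objectid are stand-ins here for the other list
items, not a committed API:

```sql
CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
  -- These variables are only filled in when the target object
  -- exists in the catalogs at lookup time.
  RAISE NOTICE 'op=% type=% oid=% schema=% name=%',
    tg_operation, tg_objecttype, tg_objectid,
    tg_schemaname, tg_objectname;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_ddl_info ON ddl_command_end
  EXECUTE PROCEDURE log_ddl();
```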

> In ddl_command_start, I believe you can only expose the information
> that is available in the parse tree.  That is, if the user typed
> CREATE TABLE foo, you can tell the event trigger that the user didn't
> specify a schema and that they mentioned the name foo.  You cannot

I used to do that, and that's the reason why I had the Command String
Normalisation in the same patch: both walk the parse tree to some
extent, and I didn't want to have different routines doing different
walks to get at the information.

> predict what schema foo will actually end up in at that point, and we
> shouldn't try.  I don't have a problem with exposing the information
> we have at this point with the documented caveat that if you rely on
> it to mean something it doesn't you get to keep both pieces.

Fair enough. I backed away from doing that in order to separate out the
parse-tree-walking code for later review, and I'm now wondering whether
that's really what we want to do.

For auditing, you can log the information after the command has been
run (in the case of a CREATE or ALTER command) or before the object has
been deleted (in the case of a DROP command).

If you want to cancel the whole operation depending on the name of the
object, or the schema it ends up in, then you can do it at the end of
the command's execution with a RAISE EXCEPTION.
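
For instance, a policy trigger along these lines could veto a command
after it has run (again, tg_objectname is the patch's proposed
variable, not a committed API):

```sql
CREATE OR REPLACE FUNCTION forbid_tmp_tables() RETURNS event_trigger AS $$
BEGIN
  -- Raising an exception here aborts the transaction, and with it
  -- the DDL command that just completed.
  IF tg_objectname LIKE 'tmp_%' THEN
    RAISE EXCEPTION 'table names starting with tmp_ are not allowed';
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER no_tmp_tables ON ddl_command_end
  WHEN TAG IN ('CREATE TABLE')
  EXECUTE PROCEDURE forbid_tmp_tables();
```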

> But I wonder: wouldn't it be better to just expose the raw string the
> user typed?  I mean, in all the cases we really care about, the
> information we can *reliably* expose at this point is going to be
> pretty nearly the same as extracting the third space-delimited word
> out of the command text and splitting it on a period.  And you can do
> that much easily even in PL/pgsql.  You could also extract a whole lot

Totally agreed. That's another reason why I want to provide users with
the Normalized Command String: it will then be even easier for them to
do what you just said.
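
As an illustration of how cheap that extraction is in plain SQL,
assuming a normalized command string where the qualified name is the
third space-delimited word:

```sql
-- Third space-delimited word of the command text, then split on the
-- period to separate the schema from the object name.
SELECT split_part(cmd, ' ', 3)                     AS qualified_name,
       split_part(split_part(cmd, ' ', 3), '.', 1) AS schema,
       split_part(split_part(cmd, ' ', 3), '.', 2) AS name
  FROM (SELECT 'CREATE TABLE public.foo (a text)'::text AS cmd) s;
```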

After having worked on that for some time, though, I'm leaning toward
Tom's view that you shouldn't try to expose information about objects
that don't exist in the catalogs, because "that's not code, that's
magic".

> of OTHER potentially useful information from the command text that you
> couldn't get if we just exposed the given schema name and object name.
>  For example, this would allow you to write an event trigger that lets
> you enforce a policy that all column names must use Hungarian
> notation.  Note that this is a terrible idea and you shouldn't do it,
> but you could if you wanted to.  And, the trigger would be relatively
> long and complex and you might have bugs, but that's still better than
> the status quo where you are simply out of luck.

You seem to be talking about System Hungarian, not Apps Hungarian, but I
don't think that's the point here.
 http://www.joelonsoftware.com/articles/Wrong.html

> Now, in a ddl_command_end trigger, there's a lot more information that
> you can usefully expose.  In theory, if the command records "what it
> did" somewhere, you can extract that information with as much
> precision as it cared to record.  However, I'm pretty skeptical of the
> idea of exposing a single OID.  I mean, if the "main" use case here is

There's precedent in PostgreSQL: how do you get information about each
row that was in the target from a FOR EACH STATEMENT trigger?

> array of OIDs.  Really, you're going to need something quite a bit
> more complicated than that to describe something like ALTER TABLE, but
> even for pretty simple cases a single OID doesn't cut it.

If you want to solve that problem, let's talk about pseudo relations
that you can SELECT FROM, please. We can already expose a tuplestore to
PL code by means of cursors and SRFs, but I'm not sure that's how we
want to expose the "statement level information" here.

The only commands that will target more than one object are:

 - DROP, either in the command itself or by means of CASCADE;
 - CREATE SCHEMA with its PROCESS_UTILITY_SUBCOMMAND usage.

That said, not all of the DropStmt variants allow more than one object;
only these are concerned: TABLE, SEQUENCE, VIEW, INDEX, FOREIGN TABLE,
EVENT TRIGGER, TYPE, DOMAIN, COLLATION, CONVERSION, SCHEMA, EXTENSION,
TEXT SEARCH PARSER, TEXT SEARCH DICTIONARY, TEXT SEARCH TEMPLATE,
TEXT SEARCH CONFIGURATION.

So my proposal was to address that by exposing non-TOPLEVEL commands to
Event Trigger users, so that the CREATE SCHEMA targets are to be found
in their respective subcommands. As far as DROP is concerned, my
thinking was to process a DROP targeting several objects at once as if
it were as many separate DROP commands as there are objects in the
command line, producing parse tree nodes to feed back in as SUBCOMMANDs
again.

The CASCADE case is something else entirely, and we need to be very
careful about its locking behavior. Also, in the CASCADE case I can
perfectly well agree that we're not talking about a ddl_something event
any more, and that in 9.4 we will be able to provide a generic sql_drop
event that will know about that.
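
Such a sql_drop event, paired with a set-returning information
function, might look something like the sketch below. The function name
pg_event_trigger_dropped_objects() and its columns are part of the idea
being sketched here, not something the current patch provides:

```sql
CREATE OR REPLACE FUNCTION report_drops() RETURNS event_trigger AS $$
DECLARE
  obj record;
BEGIN
  -- One row per dropped object, CASCADE targets included.
  FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
  LOOP
    RAISE NOTICE 'dropped % %.%',
      obj.object_type, obj.schema_name, obj.object_name;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER report_drops ON sql_drop
  EXECUTE PROCEDURE report_drops();
```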

>>   - Additional Information to PL/*
>
> I'm not sure what you mean that you shouldn't be on the hook to
> provide that.  I don't think that you're obliged to submit patches to
> make this work for other PLs - indeed, I think it's too late for that
> for 9.3 - but I don't think anyone else is required to do it, either.

Mostly agreed. Do we want to ship with only PL/pgSQL support? I've not
been following how those PL features usually reach the other PLs…

> It doesn't seem like it would be too hard to get the other PLs up to
> the same level that PL/pgsql is at; I just didn't want to try to
> familiarize myself with PL/perl, PL/python, and PL/tcl, none of which
> I know the guts of and most of which I don't compile regularly, at the
> same time I was trying to grok everything else in the patch.

Same here. But I felt I had to do it anyway before submitting the
patch, so the time-consuming part of the job is already done.

>>   - Command String Normalisation
>
> I'm not sure we've reached a consensus here (what is it?), but let me

The consensus I feel we're reaching now is that it's an important
feature to have and there's no other practical way to have it.

> run an idea or two up the flagpole and see if anyone salutes.  It
> seems to me that there's some consensus that reverse-parsing may be a
> reasonable thing to include in some form, but there are a lot of
> unsolved problems there, including (1) that the parse tree might not
> contain enough information for the reverse-parser to describe what the
> command actually did, as opposed to what the user asked for, and (2)

Having actually implemented the most complex of our commands, namely
CREATE TABLE and ALTER TABLE, I can say the trick is to complement the
parse tree with some catalog (cache) lookups to get the exact
information about what the command did, or to tweak the command's
implementation to expose it, as I did for ALTER TABLE.

> there may be important decisions which were made at execution time and
> which are needed for correct replication but which actually can't be
> represented in the syntax at all (I am not sure what these would be:
> but Tom seemed to think that such cases exist, so they need to be
> understood and thought about).

Well, try to get the name of the exclusion constraint created when
running that command, then rewrite the command with the name injected:
 CREATE TABLE xyz(i int, exclude (i WITH =) where (i > 10) deferrable);

Another case is column DEFAULT values, but I don't think we care about
the constraint name in that case. It's possible to inject the column-
and table-level CHECK constraint names into the CREATE TABLE syntax,
though.

I don't remember having hit another blocker here.
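
Once the automatically picked name is known, the same command can
indeed be rewritten with the name injected, along these lines (the
constraint name shown is what the default naming rules would produce,
given as an illustration):

```sql
-- Same table, with the otherwise automatically chosen exclusion
-- constraint name injected back into the command:
CREATE TABLE xyz (
  i int,
  CONSTRAINT xyz_i_excl EXCLUDE (i WITH =) WHERE (i > 10) DEFERRABLE
);
```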

> If we're going to do reverse parsing, I would argue that we need to
> put some effort into making that more flexible than the designs I've
> seen so far.  For example, I believe your concept of command string

While that seems fair enough, I have to stress that it's also known as
feature creep and a sure way to kill a patch idea. Sometimes it's
exactly what needs to be done. Sometimes not.

> normalization includes the idea that we'd fully schema-qualify the
> object name.  Like... instead of saying CREATE TABLE foo (a text),
> we'd say CREATE TABLE public.foo (a text).  That would guarantee that
> when we replay the command on another machine, it ends up in the same
> schema.  But who is to say that's what the user wants?  You can
> imagine, for example, that a user might want to replicate schema
> public on node A into schema node_a on node B.  If we're back to

You have to remember at this point that the patch we're talking about
is not exposing *ONLY* the command string, but also the object name and
the schema name as nicely separated variables.

> parsing the SQL at that point we are still ahead - but not very far
> ahead.  And, of course, the proper canonicalization is really CREATE
> TABLE public.foo (a pg_catalog.text), because there could be another
> type called text in some other schema on node B.  That may seem like a
> borderline case when the name of the type is "text" but if you
> substitute "bar" for "text" it suddenly starts to seem totally
> plausible that there could be more than one bar.  Of course if you

That's taken care of in my DDL rewriting branch, yes. Even int and
bigint can be rewritten with their real qualified names. See a
previously posted example in:
 http://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr
[… cut out description of …]
> get_array_of_object_names_and_schemas(pg_parse_tree), which would

> I certainly don't think this solves every problem.  In particular, we
> run the risk of exposing every internal detail of the parse tree,
> something Tom and I have both vigorously resisted in every discussion
> of event triggers since time immemorial.  On the other hand, the

And that is why we expose a very limited set of preprocessed variables
in the event triggers instead, including the normalized command string,
rather than the parse tree itself (which is available only in C).

> way.  For example, if we add a function that tells you what object
> name the user typed, it doesn't need to expose the fact that the parse
> tree names that field differently in different DDL constructs, which
> certainly increases our odds of not going insane after a release or
> two of maintaining this code.

That's called tg_objectname and tg_schemaname in my proposal. Can you
come up with either a design proposal of an "external" stable
representation of the parse tree, or a more complete list of information
you want exposed?

My position is that an external stable representation format is another
couple of years of work just to reach some traction on its specifics,
and that adding more preprocessed information is better done in 9.4.

Meanwhile, you can always get at it from an extension coded in C.

> some kind of "container" data structure to which the DDL command can
> glue in whatever bits of information it wants to include, and then
> dump the result out as a 2-D text array or a JSON object or something.

Either you want to reach agreement on the container specs, which is a
two-year project at best, or you want to weigh in so much that
something gets into 9.3, and then you will make incompatible changes in
9.4, so that all this work is wasted: the original goal is to prevent
changing that external format in incompatible ways between major
versions. Barring that goal, we would just be exposing the parse tree
as it is.

This line of thought comes from the fact that I've been sitting in
different meetings with people wanting to solve that very problem for
external tools (SQL syntax highlighters, indenters, you name it) for
several years in a row now, and I have yet to see such a data structure
spec emerge.

>>   - Non DDL related Events
>>
>>     Extra bonus items, a table_rewrite event. Unlikely for 9.3 at best,
>>     unless someone else wants to have at it, maybe?
>
> I'd like to take a crack at adding support for some kind of really
> cool event for 9.4 - I suspect this one is not what I'd pick as a
> first target, just because it seems hard.  But I'll mull it over.

I'll hand you another one then: have INSERT INTO create the destination
TABLE when it does not exist, for the RAD users out there. You know,
those schema-less NoSQL guys…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


