Thread: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

[PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
"Joe Wildish"
Date:
Hi hackers,

Attached is a patch for supporting queries in the WHEN expression of 
statement triggers. It is restricted so that the expression can 
reference only the transition tables and the table to which the trigger 
is attached. This seemed to make the most sense in that it follows what 
you can do in the per row triggers. I did have a look in the standards 
document about triggers, and couldn't see any restrictions mentioned, 
but nevertheless thought it made most sense.

One possibility controversial aspect is that the patch doesn't use SPI 
to evaluate the expression; it constructs a Query instead and passes it 
to the executor. Don't know what people's thoughts are on doing that?

-Joe
Attachment

Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
Daniel Gustafsson
Date:
> On 17 Jul 2020, at 00:22, Joe Wildish <joe@lateraljoin.com> wrote:

> Attached is a patch for supporting queries in the WHEN expression of statement triggers.at?

Hi!,

Please create an entry for this patch in the 2020-09 commitfest to make sure
it's properly tracked:

    https://commitfest.postgresql.org/29/

cheers ./daniel


Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
Surafel Temesgen
Date:

Hi Joe,

This is my review of your patch

On Fri, Jul 17, 2020 at 1:22 AM Joe Wildish <joe@lateraljoin.com> wrote:
Hi hackers,

Attached is a patch for supporting queries in the WHEN expression of
statement triggers.

 

- Currently, <literal>WHEN</literal> expressions cannot contain

- subqueries.

subqueries in row trigger's is not supported in your patch so the the documentation have to reflect it


+ </literal>UPDATE</literal> triggers are able to refer to both </literal>OLD</literal>

+ and <literal>NEW</literal>

Opening and ending tag mismatch on UPDATE and OLD literal so documentation build fails and please update the documentation on server programming section too


+ /*

+ * Plan the statement. No need to rewrite as it can only refer to the

+ * transition tables OLD and NEW, and the relation which is being

+ * triggered upon.

+ */

+ stmt = pg_plan_query(query, trigger->tgqual, 0, NULL);

+ dest = CreateDestReceiver(DestTuplestore);

+ store = tuplestore_begin_heap(false, false, work_mem);

+ tupdesc = CreateTemplateTupleDesc(1);

+ whenslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsMinimalTuple);

Instead of planning every time the trigger fire I suggest to store plan or prepared statement node so planning time can be saved


There are server crash on the following sequence of command

CREATE TABLE main_table (a int unique, b int);


CREATE FUNCTION trigger_func() RETURNS trigger LANGUAGE plpgsql AS '

BEGIN

RAISE NOTICE ''trigger_func(%) called: action = %, when = %, level = %'', TG_ARGV[0], TG_OP, TG_WHEN, TG_LEVEL;

RETURN NULL;

END;';


INSERT INTO main_table DEFAULT VALUES;


CREATE TRIGGER after_insert AFTER INSERT ON main_table

REFERENCING NEW TABLE AS NEW FOR EACH STATEMENT

WHEN (500 <= ANY(SELECT b FROM NEW union SELECT a FROM main_table))

EXECUTE PROCEDURE trigger_func('after_insert');


INSERT INTO main_table (a, b) VALUES

(101, 498),

(102, 499);

server crashed


regards

Surafel

Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
Michael Paquier
Date:
On Thu, Sep 03, 2020 at 09:22:31PM +0300, Surafel Temesgen wrote:
> server crashed

That's a problem.  As this feedback has not been answered after two
weeks, I am marking the patch as returned with feedback.
--
Michael

Attachment

Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
"Joe Wildish"
Date:
Hi Surafel,

On 3 Sep 2020, at 19:22, Surafel Temesgen wrote:

> This is my review of your patch

Thanks for the review.

> subqueries in row trigger's is not supported in your patch so the the
> documentation have to reflect it

It is still the case that the documentation says this. But, that may 
have been unclear as the documentation wouldn't compile (as you noted), 
so it wasn't possible to read it in the rendered form.

>
> + </literal>UPDATE</literal> triggers are able to refer to both
> </literal>OLD</literal>
>
> + and <literal>NEW</literal>
>
> Opening and ending tag mismatch on UPDATE and OLD literal so 
> documentation
> build fails and please update the documentation on server programming
> section too

Fixed.

I've also amended the server programming section to accurately reflect 
how WHEN conditions can be used.

> Instead of planning every time the trigger fire I suggest to store 
> plan or
> prepared statement node so planning time can be saved

Yes, that would make sense. I'll look in to what needs to be done.

Do you know if there are other areas of the code that cache plans that 
could act as a guide as to how best to achieve it?

> There are server crash on the following sequence of command
>
> ...
>
> INSERT INTO main_table (a, b) VALUES
>
> (101, 498),
>
> (102, 499);
>
> server crashed

Thanks. It was an incorrect Assert about NULL returns. Fixed.

-Joe


Attachment

Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
"Joe Wildish"
Date:
Hi Hackers,

Attached is a new version of this patch. I resurrected it after removing it from the commitfest last year; I'll add it back in to the next CF.

The main change is a switch to using SPI for expression evaluation.  The plans are also cached along the same lines as the RI trigger plans.

Some random thoughts on the allowable expressions:

a. I originally disallowed functions and table-valued functions from appearing in the expression as they could potentially do anything and everything.  However, I noticed that we allow functions in FOR EACH ROW triggers so we are already in that position.  Do we want to continue allowing that in FOR EACH STATEMENT triggers?  If so, then the choice to restrict the expression to just OLD, NEW and the table being triggered against might be wrong.

b. If a WHEN expression is defined as "n = (SELECT ...)", there is the possibility that a user gets the error "more than one row returned by a subquery used as an expression" when performing DML, which would be rather cryptic if they didn't know there was a trigger involved.  To avoid this, we could disallow scalar expressions, with a hint to use the ANY/ALL quantifiers.

-Joe


Attachment
"Joe Wildish" <joe@lateraljoin.com> writes:
> The main change is a switch to using SPI for expression evaluation.  The plans are also cached along the same lines
asthe RI trigger plans. 

I really dislike this implementation technique.  Aside from the likely
performance hit for existing triggers, I think it opens serious security
holes, because we can't fully guarantee that deparse-and-reparse doesn't
change the semantics.  For comparison, the RI trigger code goes to
ridiculous lengths to force exact parsing of the queries it makes,
and succeeds only because it needs just a very stylized subset of SQL.
With a generic user-written expression, we'd be at risk for several
inherently-ambiguous SQL constructs such as IS DISTINCT FROM (see
relevant reading at [1]).

You could argue that the same hazards apply if the user writes the same
query within the body of the trigger, and you'd have a point.  But
we've made a policy decision that users are on the hook to write their
functions securely.  No such decision has ever been taken with respect
to pre-parsed expression trees.  In general, users may expect that
once those are parsed by the accepting DDL command, they'll hold still,
not get re-interpreted at runtime.

> a. I originally disallowed functions and table-valued functions from appearing in the expression as they could
potentiallydo anything and everything.  However, I noticed that we allow functions in FOR EACH ROW triggers so we are
alreadyin that position.  Do we want to continue allowing that in FOR EACH STATEMENT triggers?  If so, then the choice
torestrict the expression to just OLD, NEW and the table being triggered against might be wrong. 

Meh ... users have always been able to write CHECK constraints, WHEN
clauses, etc, that have side-effects --- they just have to bury that
inside a function.  It's only their own good taste and the lack of
predictability of when the side-effects will happen that stop them.
I don't see much point in enforcing restrictions that are easily
evaded by making a function.

(Relevant to that, I wonder why this patch is only concerned with
WHEN clauses and not all the other places where we forbid subqueries
for implementation simplicity.)

> b. If a WHEN expression is defined as "n = (SELECT ...)", there is the possibility that a user gets the error "more
thanone row returned by a subquery used as an expression" when performing DML, which would be rather cryptic if they
didn'tknow there was a trigger involved.  To avoid this, we could disallow scalar expressions, with a hint to use the
ANY/ALLquantifiers. 

How is that any more cryptic than any other error that can occur
in a WHEN expression?

            regards, tom lane

[1] https://www.postgresql.org/message-id/10492.1531515255%40sss.pgh.pa.us



Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
"Joe Wildish"
Date:
Hi Tom,

On Wed, 22 Sep 2021, at 17:09, Tom Lane wrote:
> The main change is a switch to using SPI for expression evaluation.  The plans are also cached along the same lines as the RI trigger plans.

I really dislike this implementation technique.  Aside from the likely
performance hit for existing triggers, I think it opens serious security
holes, because we can't fully guarantee that deparse-and-reparse doesn't
change the semantics.  For comparison, the RI trigger code goes to
ridiculous lengths to force exact parsing of the queries it makes,
and succeeds only because it needs just a very stylized subset of SQL.
With a generic user-written expression, we'd be at risk for several
inherently-ambiguous SQL constructs such as IS DISTINCT FROM (see
relevant reading at [1]).

Where do you consider the performance hit to be? I just read the code again. The only time the new code paths are hit are when a FOR EACH STATEMENT trigger fires that has a WHEN condition. Given the existing restrictions for such a scenario, that can only mean a WHEN condition that is a simple function call; so, "SELECT foo()" vs "foo()"? Or have I misunderstood?

Regarding the deparse-and-reparse --- if I understand correctly, the core problem is that we have no way of going from a node tree to a string, such that the string is guaranteed to have the same meaning as the node tree? (I did try just now to produce such a scenario with the patch but I couldn't get ruleutils to emit the wrong thing).  Moreover, we couldn't store the string for use with SPI, as the string would be subject to trigger-time search path lookups.  That pretty much rules out SPI for this then.  Do you have a suggestion for an alternative? I guess it would be go to the planner/executor directly with the node tree?

 In general, users may expect that
once those are parsed by the accepting DDL command, they'll hold still,
not get re-interpreted at runtime.

I agree entirely. I wasn't aware of the deparsing/reparsing problem.

...
(Relevant to that, I wonder why this patch is only concerned with
WHEN clauses and not all the other places where we forbid subqueries
for implementation simplicity.)

I don't know how many other places WHEN clauses are used. Rules, perhaps? The short answer is this patch was written to solve a specific problem I had rather than it being a more general attempt to remove all "subquery forbidden" restrictions.


> b. If a WHEN expression is defined as "n = (SELECT ...)", there is the possibility that a user gets the error "more than one row returned by a subquery used as an expression" when performing DML, which would be rather cryptic if they didn't know there was a trigger involved.  To avoid this, we could disallow scalar expressions, with a hint to use the ANY/ALL quantifiers.

How is that any more cryptic than any other error that can occur
in a WHEN expression?

It isn't.  What *is* different about it, is that -- AFAIK -- it is the only cryptic message that can come about due to the syntactic structure of an expression.  Yes, someone could have a function that does "RAISE ERROR 'foo'".  There's not a lot that can be done about that.  But scalar subqueries are detectable and they have an obvious rewrite using the quantifiers, hence the suggestion. However, it was just that; a suggestion.

-Joe

Joe Wildish" <joe@lateraljoin.com> writes:
> On Wed, 22 Sep 2021, at 17:09, Tom Lane wrote:
> The main change is a switch to using SPI for expression evaluation.  The plans are also cached along the same lines
asthe RI trigger plans. 
>>
>> I really dislike this implementation technique.  Aside from the likely
>> performance hit for existing triggers, I think it opens serious security
>> holes, because we can't fully guarantee that deparse-and-reparse doesn't
>> change the semantics.

> Where do you consider the performance hit to be?

The deparse/reparse cost might not be negligible, and putting SPI into
the equation where it was not before is certainly going to add overhead.
Now maybe those things are negligible in context, but I wouldn't believe
it without seeing some performance numbers.

> Do you have a suggestion for an alternative? I guess it would be go to the planner/executor directly with the node
tree?

What I'd be thinking about is what it'd take to extend expression_planner
and related infrastructure to allow expressions containing SubLinks.
I fear there are a lot of moving parts there though, since the restriction
has been in place so long.

>> (Relevant to that, I wonder why this patch is only concerned with
>> WHEN clauses and not all the other places where we forbid subqueries
>> for implementation simplicity.)

> I don't know how many other places WHEN clauses are used. Rules, perhaps?

I'm thinking of things like CHECK constraints.  Grepping for calls to
expression_planner would give you a clearer idea of the scope.

> The short answer is this patch was written to solve a specific problem I had rather than it being a more general
attemptto remove all "subquery forbidden" restrictions. 

I won't carp too much if the initial patch only removes the restriction
for WHEN; but I'd like to see it lay the groundwork to remove the
restriction elsewhere as well.

            regards, tom lane



Re: [PATCH] Allow queries in WHEN expression of FOR EACH STATEMENT triggers

From
Robert Haas
Date:
On Thu, Sep 23, 2021 at 5:34 AM Joe Wildish <joe@lateraljoin.com> wrote:
> Regarding the deparse-and-reparse --- if I understand correctly, the core problem is that we have no way of going
froma node tree to a string, such that the string is guaranteed to have the same meaning as the node tree? (I did try
justnow to produce such a scenario with the patch but I couldn't get ruleutils to emit the wrong thing).  Moreover, we
couldn'tstore the string for use with SPI, as the string would be subject to trigger-time search path lookups.  That
prettymuch rules out SPI for this then.  Do you have a suggestion for an alternative? I guess it would be go to the
planner/executordirectly with the node tree? 

I think hoping that you can ever make deparse and reparse reliably
produce the same result is a hopeless endeavor. Tom mentioned hazards
related to ambiguous constructs, but there's also often the risk of
concurrent DDL. Commit 5f173040e324f6c2eebb90d86cf1b0cdb5890f0a is a
cautionary tale, demonstrating that you can't even count on
schema_name.table_name to resolve to the same OID for the entire
duration of a single DDL command. The same hazard exists for
functions, operators, and anything else that gets looked up in a
system catalog.

I don't know what all of that means for your patch, but just wanted to
get my $0.02 in on the general topic.

--
Robert Haas
EDB: http://www.enterprisedb.com