Re: delta relations in AFTER triggers - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: delta relations in AFTER triggers
Date
Msg-id 5404A7DE.5040204@vmware.com
In response to Re: delta relations in AFTER triggers  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: delta relations in AFTER triggers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 08/30/2014 12:15 AM, Kevin Grittner wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 08/28/2014 12:03 AM, Kevin Grittner wrote:
>>> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>>>> I suggest adding a new hook to the ParseState struct (p_rangevar_hook?).
>>>> The parser calls it whenever it sees a reference to a table, and the
>>>> hook function returns some sort of placeholder reference to the
>>>> tuplestore. With variables, the hook returns a Param node, and at
>>>> execution time, the executor calls the paramFetch hook to fetch the
>>>> value of the param. For relations/tuplestores, I guess we'll need to
>>>> invent something like a Param node, but for holding information about
>>>> the relation. Like your TsrData struct, but without the pointer to the
>>>> tuplestore. At execution time, in the SPI_execute call, you pass the
>>>> pointer to the tuplestore in the ParamListInfo struct, like you pass
>>>> parameter values.
>>>>
>>>> Does this make sense?
>>>
>>> I see your point, but SPI first has to be made aware of the
>>> tuplestores and their corresponding names and TupleDesc structures.
>>> Does it make sense to keep the SPI_register_tuplestore() and
>>> SPI_unregister_tuplestore() functions for the client side of the
>>> API, and pass things along to the parse analysis through execution
>>> phases using the techniques you describe?
>>
>> Sorry, I didn't understand that. What do you mean by "first", and the
>> "client side of the API"? I don't see any need for the
>> SPI_register_tuplestore() and SPI_unregister_tuplestore() functions
>> if you use the hooks.
>
> If we were to go with the hooks as you propose, we would still need
> to take the information from TriggerData and put it somewhere else
> for the hook to reference.

Sure.

> The hooks are generalized for plpgsql,
> not just for triggers, and it doesn't seem appropriate for them to
> be fishing around in the TriggerData structure.

PLpgSQL_execstate seems like the appropriate place.
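Something like this, perhaps (an untested sketch; the field names here are
made up, and plpgsql_exec_trigger() would copy them out of TriggerData
before running the function body, so the hooks never touch TriggerData
themselves):

    /* Hypothetical new fields in PLpgSQL_execstate -- names illustrative only */
    Tuplestorestate *trig_newtable;       /* NEW transition table, or NULL */
    Tuplestorestate *trig_oldtable;       /* OLD transition table, or NULL */
    const char      *trig_newtable_name;  /* from tg_trigger->tgnewtable */
    const char      *trig_oldtable_name;  /* from tg_trigger->tgoldtable */
    TupleDesc        trig_tupdesc;        /* row type of the triggering relation */

    /* ... and in plpgsql_exec_trigger(), before executing the function body: */
    estate.trig_newtable      = trigdata->tg_newtable;
    estate.trig_oldtable      = trigdata->tg_oldtable;
    estate.trig_newtable_name = trigdata->tg_trigger->tgnewtable;
    estate.trig_oldtable_name = trigdata->tg_trigger->tgoldtable;
    estate.trig_tupdesc       = trigdata->tg_relation->rd_att;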

> And what if we add other sources for tuplestores?

What about it?

> The lookup during parse analysis
> each time an apparent relation name is encountered must be simple
> and fast.

We already use hooks for ColumnRefs, which are called even more often, 
and we haven't had a problem making that fast enough.
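For reference, this is roughly what plpgsql's parser-setup callback looks
like today (quoting from memory, so details may be off); a table-ref hook
would be one more assignment of the same shape, shown here as a purely
hypothetical p_rangevar_hook:

    void
    plpgsql_parser_setup(struct ParseState *pstate, PLpgSQL_expr *expr)
    {
        pstate->p_pre_columnref_hook = plpgsql_pre_column_ref;
        pstate->p_post_columnref_hook = plpgsql_post_column_ref;
        pstate->p_paramref_hook = plpgsql_param_ref;
        /* hypothetical addition, not in the tree today: */
        pstate->p_rangevar_hook = plpgsql_rangevar_ref;
        pstate->p_ref_hook_state = (void *) expr;
    }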

> I want named tuplestores to be easy to use from *all* PLs (for
> trigger usage) as well as useful for other purposes people may want
> to develop.

I'm not sure other PLs would even want to resolve the old/new relations 
like PL/pgSQL does. It might be more natural to access the new/old 
tuplestores as Perl or Python hashes or arrays, for example. But if they 
do, it's not that difficult to write the hooks.

> I had to change the hashkey for plpgsql's plan
> caching, but that needs to be done regardless of the API (to
> prevent problems in the obscure case that someone attaches the same
> trigger function to the same table for the same events more than
> once with different trigger names and different transition table
> names).  If you ignore that, the *entire* change to use this in
> plpgsql is to add these lines to plpgsql_exec_trigger():
>
>      /*
>       * Capture the NEW and OLD transition TABLE tuplestores (if specified for
>       * this trigger).
>       */
>      if (trigdata->tg_newtable)
>      {
>          Tsr tsr = palloc(sizeof(TsrData));
>
>          tsr->name = trigdata->tg_trigger->tgnewtable;
>          tsr->tstate = trigdata->tg_newtable;
>          tsr->tupdesc = trigdata->tg_relation->rd_att;
>          tsr->relid = trigdata->tg_relation->rd_id;
>          SPI_register_tuplestore(tsr);
>      }
>      if (trigdata->tg_oldtable)
>      {
>          Tsr tsr = palloc(sizeof(TsrData));
>
>          tsr->name = trigdata->tg_trigger->tgoldtable;
>          tsr->tstate = trigdata->tg_oldtable;
>          tsr->tupdesc = trigdata->tg_relation->rd_att;
>          tsr->relid = trigdata->tg_relation->rd_id;
>          SPI_register_tuplestore(tsr);
>      }
>
> With the new SPI functions, the code to implement this in each
> other PL should be about the same (possibly identical), and areas
> using SPI only need similar code to make tuplestores visible to the
> planner and usable in the executor if someone has another use for
> this.  You just do the above once you have run SPI_connect() and
> before preparing or executing any query that references the named
> tuplestore.

With hooks, the code to implement them in other PLs would be about the 
same too, if they want the same behavior.

>  It remains available on that SPI connection until
> SPI_finish() is called or you explicitly unregister it (by name).

Yeah, I don't like that. The SPI interface is currently stateless. Well, 
except for cursors and plans explicitly saved with SPI_keepplan. But the 
way queries are parsed is stateless - you pass all the necessary 
information as part of the SPI_execute call (or similar), using direct 
arguments and the ParamListInfo struct.
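That is, something along these lines (just an illustration; "foo" and the
parameter are made up):

    /* everything the parser and executor need travels with the call itself */
    Oid   argtypes[1] = { INT4OID };
    Datum values[1]   = { Int32GetDatum(42) };

    SPI_execute_with_args("SELECT * FROM foo WHERE id = $1",
                          1, argtypes, values, NULL /* no nulls */,
                          true /* read_only */, 0 /* no row limit */);

or SPI_execute_plan_with_paramlist() if you already have a ParamListInfo.
Nothing about the statement sticks to the SPI connection afterwards.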


If you don't want to use hooks, I nevertheless feel that the old/new 
relations should be passed as part of the ParamListInfo struct, one way 
or another. With hooks, you would set the parserSetup hook, which in 
turn would set up the table-ref hook similar to the column-ref hooks, 
but if you don't want to do that, you could also add new fields directly 
to ParamListInfo for the relations, like the "ParamExternData 
params[1]" array that's there currently.

> If we use the hooks, I think it will be several times as much code,
> more invasive, and probably more fragile.  More importantly, these
> hooks are not used by the other PLs included with core, and are not
> used in any of the other core code, anywhere.  They all use SPI, so
> they can do the above very easily, but adding infrastructure for
> them to use the hooks would be a lot more work, and I'm not seeing
> a corresponding benefit.

If they use SPI, they can use the hooks just as well.

> I think there is some room to change the API as used by the parser,
> planner, and executor so that no changes are needed there if we add
> some other registration mechanism besides SPI, but I think having a
> registry for tuplestores that sort of takes the place of the
> catalogs and related caches (but is far simpler, being process
> local) is a better model than what you propose.
>
> In summary, thinking of the definition of a named tuplestore as a
> variable is the wrong parallel; it is more like a lightweight
> relation, and the comparison should be to that, not to function
> parameters or local variables.

I'm not too convinced. Marti Raudsepp mentioned refcursor variables, and 
I think he's on to something. Refcursors are a bit difficult to 
understand, so I wouldn't use those directly, but it would make a lot of 
sense if you could, for example, loop over the rows directly with a FOR 
loop, "FOR recordvar IN new LOOP ... END LOOP", without having to do 
"SELECT * FROM new". Introducing a new "relation variable" datatype for 
this would be nice. I won't insist that you have to implement that right 
now, but let's not foreclose the option to add it later.

BTW, do we expect the old/new relations to be visible to dynamic SQL, 
i.e. EXECUTE? I think most users would say yes, even though the old/new 
row variables are not - you have to pass them with EXECUTE USING. 
Admittedly that supports thinking of them more as lightweight relations 
than as variables.

Another question is whether you would expect the NEW/OLD table to be 
visible to functions that you call from the trigger. For example, if the 
trigger does:

...
PERFORM myfunction();
...

Can myfunction do "SELECT * FROM new"? If not, is there a work-around? 
I guess that's not this patch's problem, because we don't have a way to 
pass sets as arguments in general. A refcursor is the closest thing, but 
it's cumbersome.

In any case, I think having the parser call back into SPI, in 
parse_tuplestore.c, is a modularity violation. The tuplestores need to
somehow find their way into ParseState. If you go with the 
SPI_register_tuplestore API, then you should still have the tuplestores 
in the ParseState struct, and have SPI fill in that information. Now 
that I look back, I think you also referred to that earlier in the 
paragraph that I didn't understand. Let me try again:

> I see your point, but SPI first has to be made aware of the
> tuplestores and their corresponding names and TupleDesc structures.
> Does it make sense to keep the SPI_register_tuplestore() and
> SPI_unregister_tuplestore() functions for the client side of the
> API, and pass things along to the parse analysis through execution
> phases using the techniques you describe?  That would eliminate the
> need for the SPI_get_caller_tuplestore() function and the
> parse_tuplestore.[ch] files, and change how the data is fetched in
> parse analysis and execution phases, but that seems fairly minimal
> -- there are exactly three places that would need to call the new
> hooks where the patch is now getting the information from the
> registry.

Yes, if you decide to keep the SPI_register_tuplestore() function, then 
IMHO you should still use hooks to pass that information to the parser, 
planner and executor.
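In other words, SPI_register_tuplestore() would stash the registrations on
the current SPI connection, and SPI's parser-setup callback would expose
them to parse analysis through something like this (hypothetical, just to
show the shape I have in mind):

    /* Hypothetical hook: called for a RangeVar that doesn't resolve to a
     * regular relation or CTE; returns an RTE for the named tuplestore,
     * or NULL if the name isn't registered. */
    typedef RangeTblEntry *(*ParseRangeVarHook) (ParseState *pstate,
                                                 RangeVar *relation);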

- Heikki



