Thread: SQL functions, INSERT/UPDATE/DELETE RETURNING, and triggers
While investigating Merlin's bug report here: http://archives.postgresql.org/pgsql-general/2006-10/msg00447.php I realized that we've completely failed to consider the interactions of $subject. In particular, functions.c still thinks that SELECT is the only type of query that can return rows. ISTM that ideally, a query with RETURNING ought to act like a SELECT for the purposes of a SQL function --- to wit, that the result rows are discarded if it's not the last query in the function, and are returned as the function result if it is. The difficulty with this is that unlike SELECT, a query with RETURNING might be queueing up AFTER triggers, which we shouldn't fire until the query is fully executed. Merlin's report shows that we've already got a problem in the back branches with mishandling of after-trigger state, because we push an AfterTrigger stack level at start of an SQL function command, and then are willing to return from the function with that stack level still active if it's a set-returning function. I think we can fix this in the back branches by the expedient of not pushing a stack level (ie, not calling AfterTriggerBegin/EndQuery) unless it's a non-SELECT command --- SELECT will never queue triggers, and we never return partway through a non-SELECT command. But this falls down for RETURNING queries. I thought about fixing this by extending the AfterTrigger state structure to let it be a tree rather than just a stack, ie, we could temporarily pop the function AfterTrigger status entry without executing any queued triggers, and then push it back on when re-entering the function. This seems horribly messy however, and I'm not sure we could still promise unsurprising order of trigger execution in complicated cases. I think the most promising answer may be to push RETURNING rows into a TupleStore and then read them out from there, which is pretty much the same approach we adopted for RETURNING queries inside portals. This'd allow the query to be executed completely, and its triggers fired, before we return from the SQL function. Comments? regards, tom lane
On Thu, Oct 12, 2006 at 12:19:24PM -0400, Tom Lane wrote: > I think the most promising answer may be to push RETURNING rows into a > TupleStore and then read them out from there, which is pretty much the > same approach we adopted for RETURNING queries inside portals. This'd > allow the query to be executed completely, and its triggers fired, > before we return from the SQL function. Would this only affect RETURNING queries that are returning data via a SRF? ISTM that pushing to a tuplestore is a lot of extra work for simpler cases like INSERT INTO table DELETE FROM queue_table WHERE ... RETURNING *; Even for the case of just going back to the client, it seems like a fair amount of overhead. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On 10/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think the most promising answer may be to push RETURNING rows into a > TupleStore and then read them out from there, which is pretty much the > same approach we adopted for RETURNING queries inside portals. It certainly sounds like the safest implementation and I can't think of any simple way around using a tuplestore in this type of case. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
On Thu, Oct 12, 2006 at 01:28:18PM -0500, Jim C. Nasby wrote: > On Thu, Oct 12, 2006 at 12:19:24PM -0400, Tom Lane wrote: > > I think the most promising answer may be to push RETURNING rows > > into a TupleStore and then read them out from there, which is > > pretty much the same approach we adopted for RETURNING queries > > inside portals. This'd allow the query to be executed completely, > > and its triggers fired, before we return from the SQL function. > > Would this only affect RETURNING queries that are returning data via > a SRF? ISTM that pushing to a tuplestore is a lot of extra work for > simpler cases like INSERT INTO table DELETE FROM queue_table WHERE > ... RETURNING *; Even for the case of just going back to the > client, it seems like a fair amount of overhead. More generally, would this change impede promoting RETURNING to work just as VALUES does in 8.2 (i.e. being able to join RETURNING results, etc.)? Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
"Jim C. Nasby" <jim@nasby.net> writes: > On Thu, Oct 12, 2006 at 12:19:24PM -0400, Tom Lane wrote: >> I think the most promising answer may be to push RETURNING rows into a >> TupleStore and then read them out from there, which is pretty much the >> same approach we adopted for RETURNING queries inside portals. > Would this only affect RETURNING queries that are returning data via a > SRF? Right, and specifically an SQL-language function. > ISTM that pushing to a tuplestore is a lot of extra work for I'm not entirely convinced of that --- the overhead of getting down through functions.c and ExecutorRun into the per-tuple loop isn't trivial either. It wouldn't work at all without significant restructuring, in fact, because as execMain stands we'd be firing "per statement" triggers once per row if we tried to handle RETURNING queries the same way that SQL functions currently handle SELECTs. When you look at the big picture there's a whole lot of call overhead that would go away if SQL functions returned a tuplestore instead of a row at a time. I was toying with the idea that we should make SELECTs return via a tuplestore too, which would allow eliminating the special case of having ExecutorRun return an individual tuple at all. That's not a bugfix though so I'll wait for 8.3 before thinking more about it ... regards, tom lane
On Thu, Oct 12, 2006 at 02:50:34PM -0400, Tom Lane wrote: > > ISTM that pushing to a tuplestore is a lot of extra work for > > I'm not entirely convinced of that --- the overhead of getting down > through functions.c and ExecutorRun into the per-tuple loop isn't > trivial either. It wouldn't work at all without significant > restructuring, in fact, because as execMain stands we'd be firing "per > statement" triggers once per row if we tried to handle RETURNING queries > the same way that SQL functions currently handle SELECTs. When you look > at the big picture there's a whole lot of call overhead that would go > away if SQL functions returned a tuplestore instead of a row at a time. > I was toying with the idea that we should make SELECTs return via a > tuplestore too, which would allow eliminating the special case of having > ExecutorRun return an individual tuple at all. That's not a bugfix > though so I'll wait for 8.3 before thinking more about it ... The specific concern I have is large result sets, like 10s or 100s of MB (or more). We just added support for not buffering those in psql, so it seems like a step backwards to have the backend now buffering it (unless I'm confused on how a tuplestore works...) -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
David Fetter <david@fetter.org> writes: > More generally, would this change impede promoting RETURNING to work > just as VALUES does in 8.2 (i.e. being able to join RETURNING results, > etc.)? Making that happen would imply a whole lot of other changes; this issue isn't the principal gating factor. One of the main things I'd point to right now, in view of this having all arisen from the question of when triggers should fire, is where and when we'd fire BEFORE/AFTER STATEMENT triggers for a RETURNING command embedded in a larger query. For that matter, the system has several not-easily-removed assumptions that a SELECT command won't fire any triggers at all --- which would break down if we allowed constructs like SELECT ... FROM (INSERT ... RETURNING ...) ... We do currently have the ability to make plpgsql functions send RETURNING results back to a calling query, and with this change we could say the same of plain SQL functions --- and in both cases we'll be depending on a tuplestore buffer to keep things sane in terms of when triggers fire. regards, tom lane
"Jim C. Nasby" <jim@nasby.net> writes: > The specific concern I have is large result sets, like 10s or 100s of MB > (or more). We just added support for not buffering those in psql, so it > seems like a step backwards to have the backend now buffering it (unless > I'm confused on how a tuplestore works...) Well, a tuplestore can dump to disk, so at least you don't need to worry about out-of-memory considerations. regards, tom lane
On Thu, Oct 12, 2006 at 03:03:43PM -0400, Tom Lane wrote: > "Jim C. Nasby" <jim@nasby.net> writes: > > The specific concern I have is large result sets, like 10s or 100s of MB > > (or more). We just added support for not buffering those in psql, so it > > seems like a step backwards to have the backend now buffering it (unless > > I'm confused on how a tuplestore works...) > > Well, a tuplestore can dump to disk, so at least you don't need to worry > about out-of-memory considerations. Sure, it's just a lot of data to be shuffling around if we can avoid it. Perhaps we could only do this if there's triggers on the table involved? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes: > Sure, it's just a lot of data to be shuffling around if we can avoid it. > Perhaps we could only do this if there's triggers on the table involved? Maybe, but it's awfully late in the 8.2 cycle to be worrying about performance improvements for something that currently doesn't work at all. I'm inclined to keep it simple for now; we can revisit the issue later if anyone has problems in practice. regards, tom lane
I wrote: > ISTM that ideally, a query with RETURNING ought to act like a SELECT > for the purposes of a SQL function --- to wit, that the result rows are > discarded if it's not the last query in the function, and are returned > as the function result if it is. The current state of affairs is that the first part of that works as expected, and the second part fails like so: regression=# create function foo8(bigint,bigint) returns setof int8_tbl as $$ insert into int8_tbl values($1,$2) returning * $$ language sql; ERROR: return type mismatch in function declared to return int8_tbl DETAIL: Function's final statement must be a SELECT. CONTEXT: SQL function "foo8" regression=# While this is certainly undesirable, it looks more like a missing feature than a bug, especially since the documentation says exactly that: ... the final command must be a SELECT that returns whatever isspecified as the function's return type. I spent some time looking at what it would take to fix it, and I find that the changes are a bit bigger than I want to be making in mid-beta. So my recommendation is that for now we just add a TODO item: * Allow SQL-language functions to return results from RETURNING queries regards, tom lane
Hi, Tom, Tom Lane wrote: > "Jim C. Nasby" <jim@nasby.net> writes: >> The specific concern I have is large result sets, like 10s or 100s of MB >> (or more). We just added support for not buffering those in psql, so it >> seems like a step backwards to have the backend now buffering it (unless >> I'm confused on how a tuplestore works...) > > Well, a tuplestore can dump to disk, so at least you don't need to worry > about out-of-memory considerations. Would it be possible to kinda "wrap" the query execution into the tuplestore interface, so that the tuples are generated "on the fly" when fetched from the tuplestore, for large resultsets? Thanks, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Added to TODO: * Allow SQL-language functions to return results from RETURNING queries --------------------------------------------------------------------------- Tom Lane wrote: > I wrote: > > ISTM that ideally, a query with RETURNING ought to act like a SELECT > > for the purposes of a SQL function --- to wit, that the result rows are > > discarded if it's not the last query in the function, and are returned > > as the function result if it is. > > The current state of affairs is that the first part of that works as > expected, and the second part fails like so: > > regression=# create function foo8(bigint,bigint) returns setof int8_tbl as > $$ insert into int8_tbl values($1,$2) returning * $$ > language sql; > ERROR: return type mismatch in function declared to return int8_tbl > DETAIL: Function's final statement must be a SELECT. > CONTEXT: SQL function "foo8" > regression=# > > While this is certainly undesirable, it looks more like a missing > feature than a bug, especially since the documentation says exactly > that: > > ... the final command must be a SELECT that returns whatever is > specified as the function's return type. > > I spent some time looking at what it would take to fix it, and I find > that the changes are a bit bigger than I want to be making in mid-beta. > So my recommendation is that for now we just add a TODO item: > > * Allow SQL-language functions to return results from RETURNING queries > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +