Thread: Proposal - asynchronous functions
Asynchronous functions

*Problem*
Postgresql does not have support for asynchronous function calls.

*Solution*
An asynchronous function would allow a user to call a function and have it return immediately, while an internal session manages the actual processing. Any return value(s) of the function would be discarded.

*Value Added*
There are two primary uses for an asynchronous function:
* Building summary tables or materialized views.
* Running business logic functionality through untrusted languages.
These functions can either be run stand-alone or be called by a trigger.

The proposed rules of an asynchronous function are:
* the result should not affect the statement that invoked it, meaning that if there is an error it should not cancel the transaction
* the user should not have to wait until the function is finished to get control of the session back
* the long-running function should not depend on the user keeping the session alive

*Current workaround*
Currently, the ways to implement these types of functions are:
* Using the Listen/Notify calls.
* Adding a row to a queuing table and processing it as a cron job.

The problems with these workarounds are:
* In principle, going outside the database just to come back in, so that you have an external session controller, is awkward. Listen/Notify is a great method to run a server function that is not related to the database.
* Sometimes the connection in your daemon drops (from experience), and then there is no notification.
* Some functions should be run immediately and not queued.
* They add complexity for the end user.

*Proposal*
Add an ASYNC command for functions ( ASYNC my_func(var1,var2) ) and an optional ASYNC keyword in trigger statements ( CREATE TRIGGER ... EXECUTE ASYNC trig_func() ). This should start an internal session in which the function or trigger function runs, disconnected from the session that started it.

Sim Zacks
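A minimal sketch of the three proposed rules, written in Python with a thread standing in for the proposed internal session. `async_call` and `log_failure` are hypothetical names; this only models the intended semantics and says nothing about how a backend would actually implement them.

```python
import threading
from typing import Any, Callable


def log_failure(name: str, exc: BaseException) -> None:
    """Hypothetical error sink. The proposal leaves reporting (email, an
    error table, a flag) to the function itself."""
    print(f"async {name} failed: {exc!r}")


def async_call(func: Callable[..., Any], *args: Any) -> threading.Thread:
    """Run func(*args) in the background and return immediately.

    Rule 1: an exception raised by func is caught inside the worker, so it
            never reaches (or cancels) the caller.
    Rule 2: the caller regains control as soon as the worker has started.
    Rule 3: the thread is non-daemon, so the interpreter keeps it running
            even after the caller's own code has finished.
    Whatever func returns is discarded.
    """
    def runner() -> None:
        try:
            func(*args)  # return value intentionally ignored
        except Exception as exc:
            log_failure(getattr(func, "__name__", repr(func)), exc)

    worker = threading.Thread(target=runner, daemon=False)
    worker.start()
    return worker  # returned only so a test can join it; callers may ignore it
```

Catching the exception inside the worker is what keeps rule 1 true: the caller has already moved on, so there is nothing to propagate to, and any reporting has to happen inside the background work itself.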
On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks <sim@compulab.co.il> wrote: > Asynchronous functions > > *Problem* > Postgresql does not have support for asynchronous function calls. Well, there is asynchronous support from the client, of course. Thus you can set up an asynchronous call back to the database with dblink. There is some discussion about formalizing this feature -- you might want to read up on autonomous transactions and how they might be used to do what you are proposing. merlin
On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks <sim@compulab.co.il> wrote: > Add an Async command for functions ( ASYNC my_func(var1,var2) ) and add an > async optional keyword in trigger statements ( CREATE TRIGGER ... EXECUTE > ASYNC trig_func() ). This should cause an internal session to be started > that the function or trigger function will run in, disconnected from the > session it started in. We've talked about a number of features that could benefit from some kind of "worker process" facility (e.g. logical replication, parallel query). So far no one has stepped forward to build such a facility, and I think without that this can't even get off the ground. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > We've talked about a number of features that could benefit from some > kind of "worker process" facility (e.g. logical replication, parallel > query). So far no one has stepped forward to build such a facility, > and I think without that this can't even get off the ground. Well, this specific thing could be done by just having PG close the client connection, not care that it's gone, and have an implied 'commit;' at the end. I'm not saying that I like this approach, but I don't think it'd be hard to implement. What I don't think we saw was any information about how, exactly, the OP was planning to implement this in the backend. Thanks, Stephen
On Tue, Apr 26, 2011 at 8:32 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> We've talked about a number of features that could benefit from some >> kind of "worker process" facility (e.g. logical replication, parallel >> query). So far no one has stepped forward to build such a facility, >> and I think without that this can't even get off the ground. > > Well, this specific thing could be done by just having PG close the > client connection, not care that it's gone, and have an implied > 'commit;' at the end. I'm not saying that I like this approach, but I > don't think it'd be hard to implement. Maybe, but that introduces a lot of complications with regards to things like authentication. We probably want some API for a backend to say - hey, please spawn a session with the same user ID and database association as me, and also provide some mechanism for data transfer between the two processes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
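One way to picture the API Robert describes, as a rough sketch assuming threads and in-memory queues stand in for processes and inter-process messaging. `SessionIdentity`, `Channel`, `spawn_session` and `echo_worker` are all hypothetical names, not anything PostgreSQL provides.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SessionIdentity:
    """What the child session inherits from the requesting backend."""
    user_id: int
    database: str


@dataclass
class Channel:
    """Two one-way queues standing in for backend <-> worker data transfer."""
    to_worker: queue.Queue
    to_parent: queue.Queue


def spawn_session(identity: SessionIdentity,
                  body: Callable[[SessionIdentity, Channel], None]) -> Channel:
    """Start `body` as a worker that runs with the caller's identity.

    The worker is handed the parent's user ID and database directly rather
    than logging in again: with no client connection involved, there is no
    authentication step to repeat.
    """
    chan = Channel(to_worker=queue.Queue(), to_parent=queue.Queue())
    threading.Thread(target=body, args=(identity, chan), daemon=True).start()
    return chan


def echo_worker(identity: SessionIdentity, chan: Channel) -> None:
    """Example body: take one request and answer with who processed it."""
    request = chan.to_worker.get()
    chan.to_parent.put((identity.user_id, identity.database, request.upper()))
```

For example, `chan = spawn_session(SessionIdentity(10, "sales"), echo_worker)` followed by `chan.to_worker.put("refresh")` makes `chan.to_parent.get()` return `(10, 'sales', 'REFRESH')`.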
On 04/26/2011 03:15 PM, Merlin Moncure wrote: > On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il> wrote: >> Asynchronous functions >> >> *Problem* >> Postgresql does not have support for asynchronous function calls. > Well, there is asynchronous support from the client of course. Thus > you can set up a asynchronous call back to the database with dblink. > There is some discussion about formalizing this feature -- you might > want to read up on autonomous transactions and how they might be used > to do what you are proposing. > > merlin I am looking specifically for server support, not client support. Part of the proposal is that if the client goes away, the function will still run to completion. Sim
* Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Apr 26, 2011 at 8:32 AM, Stephen Frost <sfrost@snowman.net> wrote: > > Well, this specific thing could be done by just having PG close the > > client connection, not care that it's gone, and have an implied > > 'commit;' at the end. I'm not saying that I like this approach, but I > > don't think it'd be hard to implement. > > Maybe, but that introduces a lot of complications with regards to > things like authentication. We probably want some API for a backend > to say - hey, please spawn a session with the same user ID and > database association as me, and also provide some mechanism for data > transfer between the two processes. The impression I got from the OP is that this function call could be the last (and possibly only) thing done with this connection. I wasn't suggesting that we spawn a new backend to run it (that introduces all kinds of complexities). The approach I was suggesting was to just have the backend close its client connection and then process the function and then 'commit;' and exit. Might be interesting as a way to prefix anything, à la: LAST delete from big_table; poof, client is disconnected, backend keeps running, etc. I don't know if that would really be useful to very many people or that it's something we'd really want to do, but it's an interesting idea to be able to 'background' a process. I'm certainly all for the bigger projects of having a cron-like capability and/or being able to spawn off multiple backgrounded queries from a single connection. Thanks, Stephen
On 04/26/2011 03:32 PM, Stephen Frost wrote: > What I don't think we saw was any information about how, exactly, the OP > was planning to implement this in the backend. > > Thanks, > > Stephen I'm at stage 1 of this proposal, meaning I know exactly what I want. I am checking with the hackers list to see if this is a desirable feature before going to a postgres developer to talk about actually building the feature.
On 04/26/2011 04:22 PM, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Tue, Apr 26, 2011 at 8:32 AM, Stephen Frost<sfrost@snowman.net> wrote: >>> Well, this specific thing could be done by just having PG close the >>> client connection, not care that it's gone, and have an implied >>> 'commit;' at the end. I'm not saying that I like this approach, but I >>> don't think it'd be hard to implement. >> Maybe, but that introduces a lot of complications with regards to >> things like authentication. We probably want some API for a backend >> to say - hey, please spawn a session with the same user ID and >> database association as me, and also provide some mechanism for data >> transfer between the two processes. > The impression I got from the OP is that this function call could be the > last (and possibly only) thing done with this connection. I wasn't > suggesting that we spawn a new backend to run it (that introduces all > kinds of complexities). The approach I was suggesting was to just have > the backend close its client connection and then process the function > and then 'commit;' and exit. > My thought was that it would actually require its own process. One use case is a function that is called from within another function which does not want to wait for it to return. The calling function would then finish processing and return, while the second function would run with the security context of the user who called it, but would be "managed" as a separate connection without a client (or, more precisely, as a client on the server). Sim
On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote: > On 04/26/2011 03:15 PM, Merlin Moncure wrote: > > >On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il> wrote: > >>Asynchronous functions > >> > >>*Problem* > >>Postgresql does not have support for asynchronous function calls. > >Well, there is asynchronous support from the client of course. Thus > >you can set up a asynchronous call back to the database with dblink. > >There is some discussion about formalizing this feature -- you might > >want to read up on autonomous transactions and how they might be used > >to do what you are proposing. > > > >merlin > I am looking for specifically server support and not client support. > Part of the proposal is that if the client goes away, it will still > continue to finish. This is exactly autonomous transactions. Please read this thread to see how. http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Apr 26, 2011 at 10:02 AM, David Fetter <david@fetter.org> wrote: > On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote: >> On 04/26/2011 03:15 PM, Merlin Moncure wrote: >> >> >On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il> wrote: >> >>Asynchronous functions >> >> >> >>*Problem* >> >>Postgresql does not have support for asynchronous function calls. >> >Well, there is asynchronous support from the client of course. Thus >> >you can set up a asynchronous call back to the database with dblink. >> >There is some discussion about formalizing this feature -- you might >> >want to read up on autonomous transactions and how they might be used >> >to do what you are proposing. >> > >> >merlin >> I am looking for specifically server support and not client support. >> Part of the proposal is that if the client goes away, it will still >> continue to finish. > > This is exactly autonomous transactions. Please read this thread to > see how. > > http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php It's not the same thing at all. An autonomous function is (or appears to be) two simultaneous toplevel transactions within the same backend. This is a request for an *asynchronous* function, which would run concurrently with foreground processing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Apr 26, 2011 at 9:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Apr 26, 2011 at 10:02 AM, David Fetter <david@fetter.org> wrote: >> On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote: >>> On 04/26/2011 03:15 PM, Merlin Moncure wrote: >>> >>> >On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il> wrote: >>> >>Asynchronous functions >>> >> >>> >>*Problem* >>> >>Postgresql does not have support for asynchronous function calls. >>> >Well, there is asynchronous support from the client of course. Thus >>> >you can set up a asynchronous call back to the database with dblink. >>> >There is some discussion about formalizing this feature -- you might >>> >want to read up on autonomous transactions and how they might be used >>> >to do what you are proposing. >>> > >>> >merlin >>> I am looking for specifically server support and not client support. >>> Part of the proposal is that if the client goes away, it will still >>> continue to finish. >> >> This is exactly autonomous transactions. Please read this thread to >> see how. >> >> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php > > It's not the same thing at all. An autonomous function is (or appears > to be) two simultaneous toplevel transactions within the same backend. > This is a request for an *asynchronous* function, which would run > concurrently with foreground processing. It's not exactly the same, but in the greater spirit of things I think David is correct. If you make an async dblink call, you get parallel processing from a single function entry point. The autonomous transaction implementations I've heard of basically take this approach and de-kludge it, and they give you a lot of the same things, such as being able to do work in parallel. I'm curious whether the feature meets the OP's requirements. merlin
On 04/26/2011 06:32 PM, Merlin Moncure wrote: > On Tue, Apr 26, 2011 at 9:24 AM, Robert Haas<robertmhaas@gmail.com> wrote: >> On Tue, Apr 26, 2011 at 10:02 AM, David Fetter<david@fetter.org> wrote: >>> On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote: >>>> On 04/26/2011 03:15 PM, Merlin Moncure wrote: >>>> >>>>> On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il> wrote: >>>>>> Asynchronous functions >>>>>> >>>>>> *Problem* >>>>>> Postgresql does not have support for asynchronous function calls. >>>>> Well, there is asynchronous support from the client of course. Thus >>>>> you can set up a asynchronous call back to the database with dblink. >>>>> There is some discussion about formalizing this feature -- you might >>>>> want to read up on autonomous transactions and how they might be used >>>>> to do what you are proposing. >>>>> >>>>> merlin >>>> I am looking for specifically server support and not client support. >>>> Part of the proposal is that if the client goes away, it will still >>>> continue to finish. >>> This is exactly autonomous transactions. Please read this thread to >>> see how. >>> >>> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php >> It's not the same thing at all. An autonomous function is (or appears >> to be) two simultaneous toplevel transactions within the same backend. >> This is a request for an *asynchronous* function, which would run >> concurrently with foreground processing. > It's not exactly the same, but in the greater spirit of things I think > David is correct. If you make async dblink call, you get parallel > processing from a single function entry point. Autonomous > transaction implementations I've heard are basically taking this > approach and de-kludging it, and give you a lot of the same stuff, > like being able to do work in parallel. I'm curious if the feature > meets the OP's requirements. 
We have tried a similar approach, using plpythonu, by importing pg and then creating a new connection to the database. This does give you an autonomous transaction, but not an asynchronous function. My use cases are mostly ones where the function takes longer than the user wants to wait and the result is not as important to the user as it is to the system. One example is building a summary table (a materialized view, if you will). Let's say building the table takes 10 seconds and is run on a trigger for every update to a specific table. When the user updates the table, he doesn't want to wait 10 seconds before control returns. Another example is a plpythonu function that FTPs a file. The file can take an arbitrary amount of time to send, and the user just needs to know that it has been sent. If there is a problem, the user will not be informed about it directly. There are ways for the function to tell the system (email, an error table, setting a boolean flag, etc.), and by using this type of function the user declares that he understands that something might go wrong and he won't get a message about it. The user may also turn off his computer before the file has finished sending. Sim
On Tue, Apr 26, 2011 at 1:15 PM, Sim Zacks <sim@compulab.co.il> wrote: > We have tried a similar approach, using plpythonu, by calling import pg and > then creating a new connection to the database. This does give you an > autonomous transaction, but not an asynchronous function. > My use cases are mostly where the function takes longer then the user wants > to wait and the result is not as important to the user as it is to the > system. > One example is building a summary table (materialized view if you will). > Lets say building the table takes 10 seconds and is run on a trigger for > every update to a specific table. When the user updates the table he doesn't > want to wait 10 seconds before the control returns. > Another example, is a plpythonu function that FTPs a file. The file can take > X amount of time to send and the user just needs to know that it has been > sent. If there is a problem the user will not be informed about it directly. > There are ways of having the function tell the system (either email or error > table or marking a bool flag, etc) and by using this type of function the > user declares that he understands that something might go wrong and he won't > get a message about it. The user may also turn off his computer before the > file is finished sending. There's a pretty big "foot gun" there in that there's the potential for each connection coming in from a client to spawn a further connection that *doesn't* go away when the client does. There's a not-inconsiderable risk of having a ballooning set of "post-processing" connections lurking around. That doesn't have to be problematic, within the context of a reasonable design. For such cases, the "thing that lurks afterwards" shouldn't be the process that does "the postprocessing for MY connection", but rather a singleton process (e.g. - it does something to ensure that There Can Only Be One) that does postprocessing of that kind of activity.
The "asynchronous bit" would consist of something like:
- queueing up My Connection's Object IDs for processing
- trying to start the singleton asynchronous process, failing gracefully (e.g. - without terminating any of the client's work) if that fails.
An extra use case for this leaps out at me immediately. It would be a plenty fine idea for a NOTIFY request to cause asynchronous invocation of a specified stored procedure. That would definitely "spiff up" the usefulness of NOTIFY/LISTEN, by adding a way of having a listener process already available on the server.
-- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Robert, On 04/26/2011 02:25 PM, Robert Haas wrote: > We've talked about a number of features that could benefit from some > kind of "worker process" facility (e.g. logical replication, parallel > query). So far no one has stepped forward to build such a facility, > and I think without that this can't even get off the ground. Remember the bgworker patches extracted from Postgres-R? [ Interestingly enough, one of the complaints I heard back then (not necessarily from you) was that there's no user for bgworkers, yet. Smells a lot like a chicken and egg problem to me. ] Regards Markus
Markus Wanner <markus@bluegap.ch> wrote: > On 04/26/2011 02:25 PM, Robert Haas wrote: >> We've talked about a number of features that could benefit from >> some kind of "worker process" facility (e.g. logical replication, >> parallel query). So far no one has stepped forward to build such >> a facility, and I think without that this can't even get off the >> ground. > > Remember the bgworker patches extracted from Postgres-R? Yeah, that crossed my mind. > [ Interestingly enough, one of the complaints I heard back then > (not necessarily from you) was that there's no user for bgworkers, > yet. Smells a lot like a chicken and egg problem to me. ] My recollection is that people wanted two or three solid use cases so that what was implemented could be shown to be generalized. Perhaps this brings us to critical mass to re-introduce the idea. -Kevin
On Apr 26, 2011, at 3:32 PM, Markus Wanner <markus@bluegap.ch> wrote: > Remember the bgworker patches extracted from Postgres-R? Oh, right. I should have remembered that. > [ Interestingly enough, one of the complaints I heard back then (not > necessarily from you) was that there's no user for bgworkers, yet. > Smells a lot like a chicken and egg problem to me. ] IIRC, we kind of got stuck on the prerequisite wamalloc patch, and that sunk the whole thing. :-( ...Robert
On 04/26/2011 11:17 PM, Robert Haas wrote: > IIRC, we kind of got stuck on the prerequisite wamalloc patch, and that sunk the whole thing. :-( Right, that prerequisite was the largest stumbling block. As I certainly mentioned back then, it should be possible to get rid of the imessages dependency (and thus wamalloc). So whoever really wants to implement asynchronous functions (or autonomous transactions) is more than welcome to try that. Please keep in mind that you'd need an alternative communication path. Not only for the bgworker infrastructure itself, but for communication between the requesting backend and the bgworker (except for fire-and-forget jobs like autovacuum, of course. OTOH even those could benefit from communicating back their state to the coordinator.. eh.. autovacuum launcher). Regards Markus
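To illustrate the point that even fire-and-forget jobs benefit from reporting back, here is a small sketch in which several workers share one status channel to a coordinator. It assumes threads and a `queue.Queue` in place of bgworkers and whatever communication path replaces imessages; `worker`, `collect` and `JobState` are hypothetical names.

```python
import queue
import threading
from enum import Enum
from typing import Callable, Dict


class JobState(Enum):
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# One channel shared by every worker, read only by the coordinator.
status_queue = queue.Queue()


def worker(job_id: int, func: Callable[[], None]) -> None:
    """A fire-and-forget job that still reports its state to the coordinator."""
    status_queue.put((job_id, JobState.RUNNING))
    try:
        func()
    except Exception:
        status_queue.put((job_id, JobState.FAILED))
    else:
        status_queue.put((job_id, JobState.DONE))


def collect(n_jobs: int, timeout: float = 5.0) -> Dict[int, JobState]:
    """Coordinator side: apply status messages until n_jobs have finished.

    Each worker's messages arrive in the order it sent them, so the last
    state recorded for a job is its final one.
    """
    states: Dict[int, JobState] = {}
    finished = 0
    while finished < n_jobs:
        job_id, state = status_queue.get(timeout=timeout)  # raises queue.Empty on timeout
        states[job_id] = state
        if state is not JobState.RUNNING:
            finished += 1
    return states
```

The requester never waits on an individual job here; it only learns, when it chooses to look, which jobs finished and which failed.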
It sounds like there is interest in this feature; can it be added to the TODO list?