Thread: Proposal - asynchronous functions

Proposal - asynchronous functions

From
Sim Zacks
Date:
Asynchronous functions

*Problem*
PostgreSQL does not have support for asynchronous function calls.

*Solution*
An asynchronous function would allow a user to call a function and have 
it return immediately, while an internal session manages the actual 
processing. Any return value(s) of the function would be discarded.

*Value Added*
There are two types of primary usage for an asynchronous function:
   * Building summary tables or materialized views.
   * Running business logic functionality through untrusted languages

These functions can be run either stand-alone or from a trigger. The
proposed rules of an asynchronous function are:
   * the result should not impact the statement run, meaning that if there
     is an error it should not cancel the transaction
   * the user should not have to wait until the function is finished to
     get control of the session back
   * the long-running function should not be dependent on the user
     keeping the session alive
*Current workaround*

Currently, the ways to implement these types of functions are:
   * Using the Listen/Notify calls.
   * Adding a row to a queuing table and processing it as a cron job.

The problems with these workarounds are:
   * In principle, going from the database outside, just to go back in
     so that you have an external session controller, is awkward.
     Listen/Notify is a great method to run a server function that is
     not related to the database.
   * Sometimes the connection in your daemon stops (from experience),
     and there is no notification
   * Some functions should be run immediately and not queued.
   * They add complexity for the end user

*Proposal*

Add an Async command for functions ( ASYNC my_func(var1,var2) ) and add 
an async optional keyword in trigger statements ( CREATE TRIGGER ... 
EXECUTE ASYNC trig_func() ). This should cause an internal session to be 
started that the function or trigger function will run in, disconnected 
from the session it started in.

Sim Zacks


Re: Proposal - asynchronous functions

From
Merlin Moncure
Date:
On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks <sim@compulab.co.il> wrote:
> Asynchronous functions
>
> *Problem*
> Postgresql does not have support for asynchronous function calls.

Well, there is asynchronous support from the client of course.  Thus
you can set up an asynchronous call back to the database with dblink.
There is some discussion about formalizing this feature -- you might
want to read up on autonomous transactions and how they might be used
to do what you are proposing.

merlin


Re: Proposal - asynchronous functions

From
Robert Haas
Date:
On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks <sim@compulab.co.il> wrote:
> Add an Async command for functions ( ASYNC my_func(var1,var2) ) and add an
> async optional keyword in trigger statements ( CREATE TRIGGER ... EXECUTE
> ASYNC trig_func() ). This should cause an internal session to be started
> that the function or trigger function will run in, disconnected from the
> session it started in.

We've talked about a number of features that could benefit from some
kind of "worker process" facility (e.g. logical replication, parallel
query).  So far no one has stepped forward to build such a facility,
and I think without that this can't even get off the ground.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Proposal - asynchronous functions

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> We've talked about a number of features that could benefit from some
> kind of "worker process" facility (e.g. logical replication, parallel
> query).  So far no one has stepped forward to build such a facility,
> and I think without that this can't even get off the ground.

Well, this specific thing could be done by just having PG close the
client connection, not care that it's gone, and have an implied
'commit;' at the end.  I'm not saying that I like this approach, but I
don't think it'd be hard to implement.

What I don't think we saw was any information about how, exactly, the OP
was planning to implement this in the backend.
Thanks,
    Stephen

Re: Proposal - asynchronous functions

From
Robert Haas
Date:
On Tue, Apr 26, 2011 at 8:32 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> We've talked about a number of features that could benefit from some
>> kind of "worker process" facility (e.g. logical replication, parallel
>> query).  So far no one has stepped forward to build such a facility,
>> and I think without that this can't even get off the ground.
>
> Well, this specific thing could be done by just having PG close the
> client connection, not care that it's gone, and have an implied
> 'commit;' at the end.  I'm not saying that I like this approach, but I
> don't think it'd be hard to implement.

Maybe, but that introduces a lot of complications with regards to
things like authentication.  We probably want some API for a backend
to say - hey, please spawn a session with the same user ID and
database association as me, and also provide some mechanism for data
transfer between the two processes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Proposal - asynchronous functions

From
Sim Zacks
Date:
On 04/26/2011 03:15 PM, Merlin Moncure wrote:

> On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il>  wrote:
>> Asynchronous functions
>>
>> *Problem*
>> Postgresql does not have support for asynchronous function calls.
> Well, there is asynchronous support from the client of course.  Thus
> you can set up a asynchronous call back to the database with dblink.
> There is some discussion about formalizing this feature -- you might
> want to read up on autonomous transactions and how they might be used
> to do what you are proposing.
>
> merlin
I am looking specifically for server support and not client support.
Part of the proposal is that if the client goes away, the function will
still run to completion.

Sim


Re: Proposal - asynchronous functions

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Apr 26, 2011 at 8:32 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > Well, this specific thing could be done by just having PG close the
> > client connection, not care that it's gone, and have an implied
> > 'commit;' at the end.  I'm not saying that I like this approach, but I
> > don't think it'd be hard to implement.
>
> Maybe, but that introduces a lot of complications with regards to
> things like authentication.  We probably want some API for a backend
> to say - hey, please spawn a session with the same user ID and
> database association as me, and also provide some mechanism for data
> transfer between the two processes.

The impression I got from the OP is that this function call could be the
last (and possibly only) thing done with this connection.  I wasn't
suggesting that we spawn a new backend to run it (that introduces all
kinds of complexities).  The approach I was suggesting was to just have
the backend close its client connection and then process the function
and then 'commit;' and exit.

Might be interesting as a way to prefix anything, ala:

LAST delete from big_table;

poof, client is disconnected, backend keeps running, etc.

I don't know if that would really be useful to very many people or that
it's something we'd really want to do but it's an interesting idea to be
able to 'background' a process.

I'm certainly all for the bigger projects of having a cron-like
capability and/or being able to spawn off multiple backgrounded queries
from a single connection.
Thanks,
    Stephen

Re: Proposal - asynchronous functions

From
Sim Zacks
Date:
On 04/26/2011 03:32 PM, Stephen Frost wrote:

> What I don't think we saw was any information about how, exactly, the OP
> was planning to implement this in the backend.
>
>     Thanks,
>
>         Stephen
I'm at stage 1 of this proposal, meaning I know exactly what I want. I 
am checking with the hackers list to see if this is a desirable feature 
before going to a postgres developer to talk about actually building the 
feature.


Re: Proposal - asynchronous functions

From
Sim Zacks
Date:
On 04/26/2011 04:22 PM, Stephen Frost wrote:

> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Tue, Apr 26, 2011 at 8:32 AM, Stephen Frost<sfrost@snowman.net>  wrote:
>>> Well, this specific thing could be done by just having PG close the
>>> client connection, not care that it's gone, and have an implied
>>> 'commit;' at the end.  I'm not saying that I like this approach, but I
>>> don't think it'd be hard to implement.
>> Maybe, but that introduces a lot of complications with regards to
>> things like authentication.  We probably want some API for a backend
>> to say - hey, please spawn a session with the same user ID and
>> database association as me, and also provide some mechanism for data
>> transfer between the two processes.
> The impression I got from the OP is that this function call could be the
> last (and possibly only) thing done with this connection.  I wasn't
> suggesting that we spawn a new backend to run it (that introduces all
> kinds of complexities).  The approach I was suggesting was to just have
> the backend close its client connection and then process the function
> and then 'commit;' and exit.
>
My thought was that it actually would require its own process. One use 
case is a function might be called from within another function, but it 
does not want to wait for a return. Then the original function would 
finish processing and return. The second function would be run with the 
security of the user who called the function, but would be "managed" as 
a separate connection without a client (or as a client on the server to 
be more precise)

Sim


Re: Proposal - asynchronous functions

From
David Fetter
Date:
On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote:
> On 04/26/2011 03:15 PM, Merlin Moncure wrote:
> 
> >On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il>  wrote:
> >>Asynchronous functions
> >>
> >>*Problem*
> >>Postgresql does not have support for asynchronous function calls.
> >Well, there is asynchronous support from the client of course.  Thus
> >you can set up a asynchronous call back to the database with dblink.
> >There is some discussion about formalizing this feature -- you might
> >want to read up on autonomous transactions and how they might be used
> >to do what you are proposing.
> >
> >merlin
> I am looking for specifically server support and not client support.
> Part of the proposal is that if the client goes away, it will still
> continue to finish.

This is exactly autonomous transactions.  Please read this thread to
see how.

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Proposal - asynchronous functions

From
Robert Haas
Date:
On Tue, Apr 26, 2011 at 10:02 AM, David Fetter <david@fetter.org> wrote:
> On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote:
>> On 04/26/2011 03:15 PM, Merlin Moncure wrote:
>>
>> >On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il>  wrote:
>> >>Asynchronous functions
>> >>
>> >>*Problem*
>> >>Postgresql does not have support for asynchronous function calls.
>> >Well, there is asynchronous support from the client of course.  Thus
>> >you can set up a asynchronous call back to the database with dblink.
>> >There is some discussion about formalizing this feature -- you might
>> >want to read up on autonomous transactions and how they might be used
>> >to do what you are proposing.
>> >
>> >merlin
>> I am looking for specifically server support and not client support.
>> Part of the proposal is that if the client goes away, it will still
>> continue to finish.
>
> This is exactly autonomous transactions.  Please read this thread to
> see how.
>
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php

It's not the same thing at all.  An autonomous function is (or appears
to be) two simultaneous toplevel transactions within the same backend.
This is a request for an *asynchronous* function, which would run
concurrently with foreground processing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Proposal - asynchronous functions

From
Merlin Moncure
Date:
On Tue, Apr 26, 2011 at 9:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 10:02 AM, David Fetter <david@fetter.org> wrote:
>> On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote:
>>> On 04/26/2011 03:15 PM, Merlin Moncure wrote:
>>>
>>> >On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il>  wrote:
>>> >>Asynchronous functions
>>> >>
>>> >>*Problem*
>>> >>Postgresql does not have support for asynchronous function calls.
>>> >Well, there is asynchronous support from the client of course.  Thus
>>> >you can set up a asynchronous call back to the database with dblink.
>>> >There is some discussion about formalizing this feature -- you might
>>> >want to read up on autonomous transactions and how they might be used
>>> >to do what you are proposing.
>>> >
>>> >merlin
>>> I am looking for specifically server support and not client support.
>>> Part of the proposal is that if the client goes away, it will still
>>> continue to finish.
>>
>> This is exactly autonomous transactions.  Please read this thread to
>> see how.
>>
>> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php
>
> It's not the same thing at all.  An autonomous function is (or appears
> to be) two simultaneous toplevel transactions within the same backend.
>  This is a request for an *asynchronous* function, which would run
> concurrently with foreground processing.

It's not exactly the same, but in the greater spirit of things I think
David is correct.  If you make an async dblink call, you get parallel
processing from a single function entry point.   The autonomous
transaction implementations I've heard about basically take this
approach and de-kludge it, and give you a lot of the same stuff,
like being able to do work in parallel.  I'm curious if the feature
meets the OP's requirements.

merlin


Re: Proposal - asynchronous functions

From
Sim Zacks
Date:
On 04/26/2011 06:32 PM, Merlin Moncure wrote:

> On Tue, Apr 26, 2011 at 9:24 AM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> On Tue, Apr 26, 2011 at 10:02 AM, David Fetter<david@fetter.org>  wrote:
>>> On Tue, Apr 26, 2011 at 04:17:48PM +0300, Sim Zacks wrote:
>>>> On 04/26/2011 03:15 PM, Merlin Moncure wrote:
>>>>
>>>>> On Tue, Apr 26, 2011 at 3:28 AM, Sim Zacks<sim@compulab.co.il>    wrote:
>>>>>> Asynchronous functions
>>>>>>
>>>>>> *Problem*
>>>>>> Postgresql does not have support for asynchronous function calls.
>>>>> Well, there is asynchronous support from the client of course.  Thus
>>>>> you can set up a asynchronous call back to the database with dblink.
>>>>> There is some discussion about formalizing this feature -- you might
>>>>> want to read up on autonomous transactions and how they might be used
>>>>> to do what you are proposing.
>>>>>
>>>>> merlin
>>>> I am looking for specifically server support and not client support.
>>>> Part of the proposal is that if the client goes away, it will still
>>>> continue to finish.
>>> This is exactly autonomous transactions.  Please read this thread to
>>> see how.
>>>
>>> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php
>> It's not the same thing at all.  An autonomous function is (or appears
>> to be) two simultaneous toplevel transactions within the same backend.
>>   This is a request for an *asynchronous* function, which would run
>> concurrently with foreground processing.
> It's not exactly the same, but in the greater spirit of things I think
> David is correct.  If you make async dblink call, you get parallel
> processing from a single function entry point.   Autonomous
> transaction implementations I've heard are basically taking this
> approach and de-kludging it, and give you a lot of the same stuff,
> like being able to do work in parallel.  I'm curious if the feature
> meets the OP's requirements.
We have tried a similar approach, using plpythonu, by calling import pg 
and then creating a new connection to the database. This does give you 
an autonomous transaction, but not an asynchronous function.
My use cases are mostly where the function takes longer than the user
wants to wait and the result is not as important to the user as it is to
the system.
One example is building a summary table (materialized view if you will).
Let's say building the table takes 10 seconds and is run on a trigger for
every update to a specific table. When the user updates the table he
doesn't want to wait 10 seconds before the control returns.
Another example is a plpythonu function that FTPs a file. The file can
take X amount of time to send and the user just needs to know that it 
has been sent. If there is a problem the user will not be informed about 
it directly. There are ways of having the function tell the system 
(either email or error table or marking a bool flag, etc) and by using 
this type of function the user declares that he understands that 
something might go wrong and he won't get a message about it. The user 
may also turn off his computer before the file is finished sending.

Sim


Re: Proposal - asynchronous functions

From
Christopher Browne
Date:
On Tue, Apr 26, 2011 at 1:15 PM, Sim Zacks <sim@compulab.co.il> wrote:
> We have tried a similar approach, using plpythonu, by calling import pg and
> then creating a new connection to the database. This does give you an
> autonomous transaction, but not an asynchronous function.
> My use cases are mostly where the function takes longer then the user wants
> to wait and the result is not as important to the user as it is to the
> system.
> One example is building a summary table (materialized view if you will).
> Lets say building the table takes 10 seconds and is run on a trigger for
> every update to a specific table. When the user updates the table he doesn't
> want to wait 10 seconds before the control returns.
> Another example, is a plpythonu function that FTPs a file. The file can take
> X amount of time to send and the user just needs to know that it has been
> sent. If there is a problem the user will not be informed about it directly.
> There are ways of having the function tell the system (either email or error
> table or marking a bool flag, etc) and by using this type of function the
> user declares that he understands that something might go wrong and he won't
> get a message about it. The user may also turn off his computer before the
> file is finished sending.

There's a pretty big "foot gun" there in that there's the potential
for each connection coming in from a client to spawn a further
connection that *doesn't* go away when the client does.  There's a
not-inconsiderable risk of having a ballooning set of
"post-processing" connections lurking around.

That doesn't have to be problematic, within the context of a
reasonable design.  For such cases, the "thing that lurks afterwards"
shouldn't be the process that does "the postprocessing for MY
connection", but rather a singleton process (e.g. - it does something
to ensure that There Can Only Be One) that does postprocessing of that
kind of activity.

The "asynchronous bit" would consist of something like:
- queueing up My Connection's Object IDs for processing
- trying to start the singleton asynchronous process, failing
gracefully (e.g. - without terminating any of the client's work) if
that fails.
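
The two steps above might look roughly like this in Python. It is a sketch under assumptions, not a backend design: a thread stands in for the singleton process, a non-blocking lock stands in for whatever ensures There Can Only Be One, and SingletonPostProcessor, submit and process are hypothetical names.

```python
import queue
import threading


class SingletonPostProcessor:
    """Queue object IDs and keep at most one worker draining the queue."""

    def __init__(self, process):
        self._process = process          # callable applied to each object ID
        self._pending = queue.Queue()    # shared queue of object IDs
        self._lock = threading.Lock()    # held while the single worker runs

    def submit(self, obj_id):
        """Queue obj_id, then try to start the worker; never raises."""
        self._pending.put(obj_id)
        if not self._lock.acquire(blocking=False):
            return None                  # a worker is already running
        try:
            worker = threading.Thread(target=self._drain)
            worker.start()
        except RuntimeError:
            # Could not start the worker: fail gracefully. The ID stays
            # queued for the next successful start.
            self._lock.release()
            return None
        return worker                    # returned only so a demo can join it

    def _drain(self):
        while True:
            try:
                while True:
                    try:
                        obj_id = self._pending.get_nowait()
                    except queue.Empty:
                        break
                    try:
                        self._process(obj_id)
                    except Exception:
                        pass             # one bad item must not stop the worker
            finally:
                self._lock.release()
            # An ID may have been queued between the last get and the release;
            # re-check so that it is not stranded.
            if self._pending.empty() or not self._lock.acquire(blocking=False):
                return
```

However many clients call submit, only one worker exists at a time, which is what avoids the ballooning set of post-processing connections.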

An extra use case for this leaps out at me immediately.

It would be a plenty fine idea for a NOTIFY request to cause
asynchronous invocation of a specified stored procedure.  That would
definitely "spiff up" the usefulness of NOTIFY/LISTEN, by adding a way
of having a listener process already available on the server.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: Proposal - asynchronous functions

From
Markus Wanner
Date:
Robert,

On 04/26/2011 02:25 PM, Robert Haas wrote:
> We've talked about a number of features that could benefit from some
> kind of "worker process" facility (e.g. logical replication, parallel
> query).  So far no one has stepped forward to build such a facility,
> and I think without that this can't even get off the ground.

Remember the bgworker patches extracted from Postgres-R?

[ Interestingly enough, one of the complaints I heard back then (not
necessarily from you) was that there's no user for bgworkers, yet.
Smells a lot like a chicken and egg problem to me. ]

Regards

Markus


Re: Proposal - asynchronous functions

From
"Kevin Grittner"
Date:
Markus Wanner <markus@bluegap.ch> wrote:
> On 04/26/2011 02:25 PM, Robert Haas wrote:
>> We've talked about a number of features that could benefit from
>> some kind of "worker process" facility (e.g. logical replication,
>> parallel query).  So far no one has stepped forward to build such
>> a facility, and I think without that this can't even get off the
>> ground.
> 
> Remember the bgworker patches extracted from Postgres-R?

Yeah, that crossed my mind.

> [ Interestingly enough, one of the complaints I heard back then
> (not necessarily from you) was that there's no user for bgworkers,
> yet. Smells a lot like a chicken and egg problem to me. ]

My recollection is that people wanted two or three solid use cases
so that what was implemented could be shown to be generalized.
Perhaps this brings us to critical mass to re-introduce the idea.

-Kevin


Re: Proposal - asynchronous functions

From
Robert Haas
Date:
On Apr 26, 2011, at 3:32 PM, Markus Wanner <markus@bluegap.ch> wrote:
> Remember the bgworker patches extracted from Postgres-R?

Oh, right.  I should have remembered that.

> [ Interestingly enough, one of the complaints I heard back then (not
> necessarily from you) was that there's no user for bgworkers, yet.
> Smells a lot like a chicken and egg problem to me. ]

IIRC, we kind of got stuck on the prerequisite wamalloc patch, and that sunk the whole thing.  :-(

...Robert

Re: Proposal - asynchronous functions

From
Markus Wanner
Date:
On 04/26/2011 11:17 PM, Robert Haas wrote:
> IIRC, we kind of got stuck on the prerequisite wamalloc patch, and that sunk the whole thing.  :-(

Right, that prerequisite was the largest stumbling block.  As I
certainly mentioned back then, it should be possible to get rid of the
imessages dependency (and thus wamalloc).  So whoever really wants to
implement asynchronous functions (or autonomous transactions) is more
than welcome to try that.

Please keep in mind that you'd need an alternative communication path.
Not only for the bgworker infrastructure itself, but for communication
between the requesting backend and the bgworker (except for
fire-and-forget jobs like autovacuum, of course.  OTOH even those could
benefit from communicating back their state to the coordinator.. eh..
autovacuum launcher).

Regards

Markus


Re: Proposal - asynchronous functions

From
Sim Zacks
Date:
It sounds like there is interest in this feature. Can it get added to
the TODO list?