Thread: Trigger that spawns forked process

Trigger that spawns forked process

From
Christopher Murtagh
Date:
 I would like to write a trigger or function that spawns a forked
process so that the transaction is considered 'complete' to the client,
but continues to perform more work. I've been looking for examples to
steal^H^H^H^H^H learn from but have only found someone asking pretty
much the same question:

  http://archives.postgresql.org/pgsql-general/2002-12/msg01187.php

 Has anyone done anything like this?

Cheers,

Chris


--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


Re: Trigger that spawns forked process

From
Tom Lane
Date:
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
>  I would like to write a trigger or function that spawns a forked
> process so that the transaction is considered 'complete' to the client,
> but continues to perform more work.

It's not very clear what you are hoping to have the forked process do,
but if it's supposed to be another backend, forget it --- won't work.
See this thread:
http://archives.postgresql.org/pgsql-sql/2005-04/msg00329.php

            regards, tom lane

Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Mon, 2005-05-09 at 15:38 -0400, Tom Lane wrote:
> Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> >  I would like to write a trigger or function that spawns a forked
> > process so that the transaction is considered 'complete' to the client,
> > but continues to perform more work.
>
> It's not very clear what you are hoping to have the forked process do,
> but if it's supposed to be another backend, forget it --- won't work.

 No, I don't want the trigger to do any db stuff at all. Basically, I've
got a content management system that is going to be split across a
cluster. Upon an update submission from one of them, I want to replicate
across the others (which can happen in pseudo real time). So, basically
the DB is the master, once it makes an update, it will spawn a process
to the syncronization (all of this code is written). I just don't want
the initial update process to wait for replication to finish (which is
only a second or two under normal load). I could write a daemon that
would sit an listen to these replication requests, but that just seems
to be more complex than I need.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


Re: Trigger that spawns forked process

From
Douglas McNaught
Date:
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

>  No, I don't want the trigger to do any db stuff at all. Basically, I've
> got a content management system that is going to be split across a
> cluster. Upon an update submission from one of them, I want to replicate
> across the others (which can happen in pseudo real time). So, basically
> the DB is the master, once it makes an update, it will spawn a process
> to the syncronization (all of this code is written). I just don't want
> the initial update process to wait for replication to finish (which is
> only a second or two under normal load). I could write a daemon that
> would sit an listen to these replication requests, but that just seems
> to be more complex than I need.

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?

Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.

Either one of those would be simpler and more robust than fork()ing
inside the backend.

-Doug

Re: Trigger that spawns forked process

From
Tom Lane
Date:
Douglas McNaught <doug@mcnaught.org> writes:
> Why not have a client connection LISTENing and doing the
> synchronization, and have the trigger use NOTIFY?
> Or, you could have the trigger write to a table, and have another
> client periodically scanning the table for new sync events.
> Either one of those would be simpler and more robust than fork()ing
> inside the backend.

... not to mention it would avoid the risk of propagating
not-yet-committed changes.

            regards, tom lane

Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:
> Why not have a client connection LISTENing and doing the
> synchronization, and have the trigger use NOTIFY?
>
> Or, you could have the trigger write to a table, and have another
> client periodically scanning the table for new sync events.
>
> Either one of those would be simpler and more robust than fork()ing
> inside the backend.

 How is writing a daemon simpler than using something that could be done
within Postgres? Forking is something that should be natural to Unix
systems, I shouldn't need to write another application to do this. I
don't see how a daemon would necessarily be more robust either.

Cheers,

Chris


--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Mon, 2005-05-09 at 17:07 -0400, Tom Lane wrote:
> Douglas McNaught <doug@mcnaught.org> writes:
> > Why not have a client connection LISTENing and doing the
> > synchronization, and have the trigger use NOTIFY?
> > Or, you could have the trigger write to a table, and have another
> > client periodically scanning the table for new sync events.
> > Either one of those would be simpler and more robust than fork()ing
> > inside the backend.
>
> ... not to mention it would avoid the risk of propagating
> not-yet-committed changes.

 How's that? If I can notify a daemon that the change is committed, then
why couldn't I write a forking plperl function that executes when the
transaction is done? How is one riskier than the other? Is there
something obvious I'm missing here?

Cheers,

Chris


--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


Re: Trigger that spawns forked process

From
Douglas McNaught
Date:
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:

> On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:
>> Why not have a client connection LISTENing and doing the
>> synchronization, and have the trigger use NOTIFY?
>>
>> Or, you could have the trigger write to a table, and have another
>> client periodically scanning the table for new sync events.
>>
>> Either one of those would be simpler and more robust than fork()ing
>> inside the backend.
>
>  How is writing a daemon simpler than using something that could be done
> within Postgres? Forking is something that should be natural to Unix
> systems, I shouldn't need to write another application to do this. I
> don't see how a daemon would necessarily be more robust either.

Why do random code surgery on the backend, which needs to be utterly
robust and stable, when client-level solutions are just as easy and
much less dangerous?

It's kind of akin to the Linux principle of "don't do in the kernel
what you can do in userspace."

That's my philosophy, anyway.  :)

-Doug

Re: Trigger that spawns forked process

From
Tom Lane
Date:
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> On Mon, 2005-05-09 at 17:07 -0400, Tom Lane wrote:
>> ... not to mention it would avoid the risk of propagating
>> not-yet-committed changes.

>  How's that? If I can notify a daemon that the change is committed, then
> why couldn't I write a forking plperl function that executes when the
> transaction is done? How is one riskier than the other? Is there
> something obvious I'm missing here?

Yes: the mechanisms that are being suggested to you already exist.
There is not, AND NEVER WILL BE, any mechanism to invoke random
user-defined functions during the post-commit sequence.  That code
sequence cannot afford to do anything that will potentially incur
errors.

            regards, tom lane

Re: Trigger that spawns forked process

From
"Jim C. Nasby"
Date:
On Mon, May 09, 2005 at 09:07:40PM -0400, Christopher Murtagh wrote:
> On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:
> > Why not have a client connection LISTENing and doing the
> > synchronization, and have the trigger use NOTIFY?
> >
> > Or, you could have the trigger write to a table, and have another
> > client periodically scanning the table for new sync events.
> >
> > Either one of those would be simpler and more robust than fork()ing
> > inside the backend.
>
>  How is writing a daemon simpler than using something that could be done
> within Postgres? Forking is something that should be natural to Unix
> systems, I shouldn't need to write another application to do this. I
> don't see how a daemon would necessarily be more robust either.

Well, LISTEN and NOTIFY are built into PostgreSQL
(http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If the
processes that you're trying to notify of the changes are connected to
the database then this might be the easiest way to do what you're
looking for. Setting up some form of replication, such as Slony, also
comes to mind. But it's impossible to really make a recommendation
without having a better idea of what you're doing.

BTW, my understanding is that it's pretty easy to write a daemon in
perl, and there are examples of how to do this floating around.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:
> Well, LISTEN and NOTIFY are built into PostgreSQL
> (http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If the
> processes that you're trying to notify of the changes are connected to
> the database then this might be the easiest way to do what you're
> looking for. Setting up some form of replication, such as Slony, also
> comes to mind. But it's impossible to really make a recommendation
> without having a better idea of what you're doing.
>
> BTW, my understanding is that it's pretty easy to write a daemon in
> perl, and there are examples of how to do this floating around.

Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As to
the replication, Slony won't do it for me, as it isn't the database I
want to replicate. Here's a basic description:

 I have 4 cluster nodes all running the same content management software
(home grown). When a change request comes in to one of them (update to
an XML document), it submits the new XML doc to the database (which is
the master repository of all content), then performs an XSLT. Upon the
new change, I want the database to propagate the new result of the XSLT
to the other nodes so that they can pre-cache it (to avoid page loading
latency).

 I was given an example of how to spawn a forked process with plperlu,
and it looks pretty simple and straightforward and exactly what I want:

CREATE or REPLACE function somefunc() returns void as $$
$SIG{CHLD}='IGNORE';
# the preceding line removes any zombies created.
# Assumes you do not want to handle the return value
#from the child process

unless (defined ($pid=fork)) {
     die "cannot fork: $!";
}
unless ($pid) {
     $cmd="your command here";
     system "$cmd";
     if ($? != 0) {
         # handle errors here
     }
     exit;
}
RETURN;
$$ language plperlu;

This seems to be pretty trivial, and near fail-proof to me. my '$cmd'
would then be a script that handles the replication of the cached file
to the nodes (already written and tested). Why is a daemon more robust
than this? (BTW, I ask out of ignorance, not out of arrogance).

Cheers,

Chris
--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


Re: Trigger that spawns forked process

From
Tom Lane
Date:
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
>  I was given an example of how to spawn a forked process with plperlu,
> and it looks pretty simple and straightforward and exactly what I want:

> CREATE or REPLACE function somefunc() returns void as $$
> $SIG{CHLD}='IGNORE';

... let's see, you already broke the backend there --- unless its normal
setting of SIGCHLD is IGNORE, in which case munging it is unnecessary
anyway ...

> unless ($pid) {
>      $cmd="your command here";
>      system "$cmd";
>      if ($? != 0) {
>          # handle errors here
>      }
>      exit;
> }

I'm not sure what happens when you do "exit" here, but I'll lay odds
against it being exactly the right things.  (An atexit hook in a backend
process is entitled to suppose it is cleaning up a backend.)  Also,
exactly what is your "handle errors" step going to be?  You don't get
to reflect anything back into the database at that point.

> This seems to be pretty trivial, and near fail-proof to me. my '$cmd'
> would then be a script that handles the replication of the cached file
> to the nodes (already written and tested). Why is a daemon more robust
> than this? (BTW, I ask out of ignorance, not out of arrogance).

The main reason why this is probably a bad idea is that your
transaction is causing side-effects outside the database that cannot
be undone if the transaction later rolls back.  The general consensus
of people who have looked at this is that it's safer to fire those
operations after the transaction actually commits.  (As an example,
this gives you an opportunity to retry the outside operations if *they*
fail, and in any case to keep some state information about whether the
outside-the-DB state is actually synced with inside-the-DB or not.)

            regards, tom lane

Re: Trigger that spawns forked process

From
Scott Marlowe
Date:
On Tue, 2005-05-10 at 15:02, Christopher Murtagh wrote:
> On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:
> > Well, LISTEN and NOTIFY are built into PostgreSQL
> > (http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If the
> > processes that you're trying to notify of the changes are connected to
> > the database then this might be the easiest way to do what you're
> > looking for. Setting up some form of replication, such as Slony, also
> > comes to mind. But it's impossible to really make a recommendation
> > without having a better idea of what you're doing.
> >
> > BTW, my understanding is that it's pretty easy to write a daemon in
> > perl, and there are examples of how to do this floating around.
>
> Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As to
> the replication, Slony won't do it for me, as it isn't the database I
> want to replicate. Here's a basic description:
>
>  I have 4 cluster nodes all running the same content management software
> (home grown). When a change request comes in to one of them (update to
> an XML document), it submits the new XML doc to the database (which is
> the master repository of all content), then performs an XSLT. Upon the
> new change, I want the database to propagate the new result of the XSLT
> to the other nodes so that they can pre-cache it (to avoid page loading
> latency).

Seeing as how Slony replicates tables you choose to have it replicate,
it seems to me you could just have it replicate the post-xslt table and
it would do what you want.

Re: Trigger that spawns forked process

From
David Fetter
Date:
On Tue, May 10, 2005 at 04:02:59PM -0400, Christopher Murtagh wrote:
> On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:
> > Well, LISTEN and NOTIFY are built into PostgreSQL
> > (http://www.postgresql.org/docs/8.0/interactive/sql-notify.html).
> > If the processes that you're trying to notify of the changes are
> > connected to the database then this might be the easiest way to do
> > what you're looking for. Setting up some form of replication, such
> > as Slony, also comes to mind. But it's impossible to really make a
> > recommendation without having a better idea of what you're doing.
> >
> > BTW, my understanding is that it's pretty easy to write a daemon
> > in perl, and there are examples of how to do this floating around.
>
> Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As
> to the replication, Slony won't do it for me, as it isn't the
> database I want to replicate. Here's a basic description:
>
> I have 4 cluster nodes all running the same content management
> software (home grown). When a change request comes in to one of them
> (update to an XML document), it submits the new XML doc to the
> database (which is the master repository of all content), then
> performs an XSLT. Upon the new change, I want the database to
> propagate the new result of the XSLT to the other nodes so that they
> can pre-cache it (to avoid page loading latency).

Why do you think Slony won't work for this?  One way it could do it is
to have an ON INSERT trigger that populates one or more tables with
the result of the XSLT, which table(s) Slony replicates to the other
servers.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Tue, 2005-05-10 at 13:50 -0700, David Fetter wrote:
> Why do you think Slony won't work for this?  One way it could do it is
> to have an ON INSERT trigger that populates one or more tables with
> the result of the XSLT, which table(s) Slony replicates to the other
> servers.

 Because the nodes are not databases, they are Apache/PHP web servers
which have file system caching where the URL = directory/file. The XSLT
also converts XML objects to PHP code. So basically a content editor can
do something like:

<br /><br />
This is my course description: <coursedesc courseid="AAA 123" />
...
etc.


and the XSLT converts the <coursedesc /> tag into a PHP function (that
does a db lookup).

Cheers,

Chris


Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Tue, 2005-05-10 at 16:17 -0400, Tom Lane wrote:
> ... let's see, you already broke the backend there --- unless its normal
> setting of SIGCHLD is IGNORE, in which case munging it is unnecessary
> anyway ...

Here's my (probably all garbled) explanation: Essentially what that code
is a self-daemonizing perl wrapper. Setting SIGCHLD to IGNORE will
prevent zombie processes from hanging around, essentially
daemonizing/orphaning/forking the perl script.

> > unless ($pid) {
> >      $cmd="your command here";
> >      system "$cmd";
> >      if ($? != 0) {
> >          # handle errors here
> >      }
> >      exit;
> > }
>
> I'm not sure what happens when you do "exit" here, but I'll lay odds
> against it being exactly the right things.

It ends the daemonized process, kinda like a wrapper suicide. :-)

> (An atexit hook in a backend
> process is entitled to suppose it is cleaning up a backend.)  Also,
> exactly what is your "handle errors" step going to be?

Well, if my command fails for some reason, I can replace '#handle
errors' with something that notifies me (email, or by populating the
database, etc.).

> You don't get to reflect anything back into the database at that point.

 That's ok, my $cmd might or might not have db connections in it, same
for the error checking script (which could be written in a totally
different language).

> The main reason why this is probably a bad idea is that your
> transaction is causing side-effects outside the database that cannot
> be undone if the transaction later rolls back. The general consensus
> of people who have looked at this is that it's safer to fire those
> operations after the transaction actually commits.

 I should have stated that this will get used only by single auto-commit
transactions. Any rollbacks are essentially changes to the past and
aren't permitted. Instead if someone wanted to 'undo' a change, they
would re-submit a previous version. This way, I can keep my replication
code to very atomic things which makes it very simple to write and
maintain.

 From my (somewhat limited experience) point of view, I think that this
plperlu script isn't much different from writing a daemon to receive
signals via NOTIFY. Instead the script is self daemonizing, and it will
always run (instead of a couple of NOTIFY's building up and only one
being sent), which is more in line with what I want.

 Sorry, my explanation probably isn't very clear at all, I've been
writing talk material and my brain is in a totally different space. Feel
free to deliver any LARTs. :-)

Cheers,

Chris



Re: Trigger that spawns forked process

From
Martijn van Oosterhout
Date:
On Tue, May 10, 2005 at 05:31:56PM -0400, Christopher Murtagh wrote:
> > I'm not sure what happens when you do "exit" here, but I'll lay odds
> > against it being exactly the right things.
>
> It ends the daemonized process, kinda like a wrapper suicide. :-)

I think you have a problem here. PostgreSQL is a complete program,
which use signal, atexit handlers, callback all or which fork()
preserves. When your "little daemon" exits it may trigger all the code
normally run on backend exit, you know, closing WAL files, etc...

The rest of the program has no idea it's a forked process rather than a
real one. Say the backend had a alarm() set and it goes off in your
forked process. Havoc ensues...

>  From my (somewhat limited experience) point of view, I think that this
> plperlu script isn't much different from writing a daemon to receive
> signals via NOTIFY. Instead the script is self daemonizing, and it will
> always run (instead of a couple of NOTIFY's building up and only one
> being sent), which is more in line with what I want.

Except that the daemon will be a client process that uses the database
to do work whereas with the other you're duplicating the server without
telling it and putting your data at risk...

>  Sorry, my explanation probably isn't very clear at all, I've been
> writing talk material and my brain is in a totally different space. Feel
> free to deliver any LARTs. :-)

You should never fork() and not exec() a large program unless it knows
you're doing it. Note that exec() doesn't run atexit handlers but
exit() does. Big difference...

Hope this helps,

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Trigger that spawns forked process

From
Christopher Murtagh
Date:
On Wed, 2005-05-11 at 00:08 +0200, Martijn van Oosterhout wrote:
> On Tue, May 10, 2005 at 05:31:56PM -0400, Christopher Murtagh wrote:
> > > I'm not sure what happens when you do "exit" here, but I'll lay odds
> > > against it being exactly the right things.
> >
> > It ends the daemonized process, kinda like a wrapper suicide. :-)
>
> I think you have a problem here. PostgreSQL is a complete program,
> which use signal, atexit handlers, callback all or which fork()
> preserves. When your "little daemon" exits it may trigger all the code
> normally run on backend exit, you know, closing WAL files, etc...
>
> The rest of the program has no idea it's a forked process rather than a
> real one. Say the backend had a alarm() set and it goes off in your
> forked process. Havoc ensues...

 Ok, I think I'm seeing the light here. Thanks for your input!

So, if I made a slight modification to my script to this instead:

CREATE or REPLACE function exec_test() returns void as '
unless (defined ($pid=fork)) {
     die "cannot fork: $!";
}
unless ($pid) {
     $cmd="/path/to/some/script.pl";
     exec "$cmd";
}
RETURN;
' language plperlu;

Then the exec'd $cmd shouldn't inherit any of the signals or atexit
handlers. My script.pl can handle any errors it encounters (other than
not being executed of course, but I can live with that), and this way I
don't have to write a daemon that polls listening for a NOTIFY.

 Is this less objectionable? Again, thanks to all for your input and
feedback, I really do appreciate it.

Cheers,

Chris


--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


Re: Trigger that spawns forked process

From
Tom Lane
Date:
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> So, if I made a slight modification to my script to this instead:

> CREATE or REPLACE function exec_test() returns void as '
> unless (defined ($pid=fork)) {
>      die "cannot fork: $!";
> }
> unless ($pid) {
>      $cmd="/path/to/some/script.pl";
>      exec "$cmd";
> }
> RETURN;
> ' language plperlu;

>  Is this less objectionable?

Well, it's better, but you need to think about what happens if the exec
fails (eg, script is not where you thought).  Does plperl let you get at
abort(), or some other way of terminating the process other than exit()?

There still remains the point that this violates transaction semantics,
in that an aborted transaction may still have caused things to happen
outside the database.

            regards, tom lane