Thread: Trigger that spawns forked process
I would like to write a trigger or function that spawns a forked process
so that the transaction is considered 'complete' to the client, but
continues to perform more work. I've been looking for examples to
steal^H^H^H^H^H learn from, but have only found someone asking pretty
much the same question:

http://archives.postgresql.org/pgsql-general/2002-12/msg01187.php

Has anyone done anything like this?

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec, Canada
Tel.: (514) 398-3122
Fax:  (514) 398-2017
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> I would like to write a trigger or function that spawns a forked
> process so that the transaction is considered 'complete' to the client,
> but continues to perform more work.

It's not very clear what you are hoping to have the forked process do,
but if it's supposed to be another backend, forget it --- won't work.
See this thread:
http://archives.postgresql.org/pgsql-sql/2005-04/msg00329.php

			regards, tom lane
On Mon, 2005-05-09 at 15:38 -0400, Tom Lane wrote:
> Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> > I would like to write a trigger or function that spawns a forked
> > process so that the transaction is considered 'complete' to the client,
> > but continues to perform more work.
>
> It's not very clear what you are hoping to have the forked process do,
> but if it's supposed to be another backend, forget it --- won't work.

No, I don't want the trigger to do any db stuff at all. Basically, I've
got a content management system that is going to be split across a
cluster. Upon an update submission from one of the nodes, I want to
replicate across the others (which can happen in pseudo real time). So,
basically, the DB is the master: once it makes an update, it will spawn
a process to do the synchronization (all of this code is written). I
just don't want the initial update process to wait for replication to
finish (which is only a second or two under normal load). I could write
a daemon that would sit and listen to these replication requests, but
that just seems to be more complex than I need.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec, Canada
Tel.: (514) 398-3122
Fax:  (514) 398-2017
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> No, I don't want the trigger to do any db stuff at all. Basically, I've
> got a content management system that is going to be split across a
> cluster. Upon an update submission from one of the nodes, I want to
> replicate across the others (which can happen in pseudo real time). So,
> basically, the DB is the master: once it makes an update, it will spawn
> a process to do the synchronization (all of this code is written). I
> just don't want the initial update process to wait for replication to
> finish (which is only a second or two under normal load). I could write
> a daemon that would sit and listen to these replication requests, but
> that just seems to be more complex than I need.

Why not have a client connection LISTENing and doing the
synchronization, and have the trigger use NOTIFY?

Or, you could have the trigger write to a table, and have another
client periodically scanning the table for new sync events.

Either one of those would be simpler and more robust than fork()ing
inside the backend.

-Doug
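For concreteness, here is a minimal sketch of the NOTIFY half of Doug's
suggestion, assuming a hypothetical content table named "documents" and
a channel named "sync_event" (both names invented for illustration;
NOTIFY in this era carries no payload, so the listener simply rescans
for pending work when woken):

    CREATE OR REPLACE FUNCTION notify_sync() RETURNS trigger AS $$
    BEGIN
        -- The notification is queued inside the transaction and is
        -- delivered to listeners only if and when the transaction
        -- commits.
        NOTIFY sync_event;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER documents_sync
        AFTER INSERT OR UPDATE ON documents
        FOR EACH ROW EXECUTE PROCEDURE notify_sync();

The client side is then an ordinary connection that has issued
LISTEN sync_event and does the synchronization work whenever a
notification arrives.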
Douglas McNaught <doug@mcnaught.org> writes:
> Why not have a client connection LISTENing and doing the
> synchronization, and have the trigger use NOTIFY?

> Or, you could have the trigger write to a table, and have another
> client periodically scanning the table for new sync events.

> Either one of those would be simpler and more robust than fork()ing
> inside the backend.

... not to mention it would avoid the risk of propagating
not-yet-committed changes.

			regards, tom lane
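Tom's point follows from a documented property of LISTEN/NOTIFY: a
notification is delivered only when the sending transaction commits, so
a listener can never be woken for work that later rolls back. A quick
demonstration from psql, reusing the hypothetical channel name from the
sketch above:

    LISTEN sync_event;

    BEGIN;
    NOTIFY sync_event;
    ROLLBACK;
    -- Nothing is delivered: the NOTIFY was discarded along with the
    -- aborted transaction.

    BEGIN;
    NOTIFY sync_event;
    COMMIT;
    -- The notification arrives only now, after the commit succeeds.

A fork()ed process launched from inside a trigger, by contrast, starts
running while the transaction is still open, and so can act on changes
that may yet be rolled back.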
On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:
> Why not have a client connection LISTENing and doing the
> synchronization, and have the trigger use NOTIFY?
>
> Or, you could have the trigger write to a table, and have another
> client periodically scanning the table for new sync events.
>
> Either one of those would be simpler and more robust than fork()ing
> inside the backend.

How is writing a daemon simpler than using something that could be done
within Postgres? Forking is something that should be natural on Unix
systems; I shouldn't need to write another application to do this. I
don't see how a daemon would necessarily be more robust either.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec, Canada
Tel.: (514) 398-3122
Fax:  (514) 398-2017
On Mon, 2005-05-09 at 17:07 -0400, Tom Lane wrote:
> Douglas McNaught <doug@mcnaught.org> writes:
> > Why not have a client connection LISTENing and doing the
> > synchronization, and have the trigger use NOTIFY?
> > Or, you could have the trigger write to a table, and have another
> > client periodically scanning the table for new sync events.
> > Either one of those would be simpler and more robust than fork()ing
> > inside the backend.
>
> ... not to mention it would avoid the risk of propagating
> not-yet-committed changes.

How's that? If I can notify a daemon that the change is committed, then
why couldn't I write a forking plperl function that executes when the
transaction is done? How is one riskier than the other? Is there
something obvious I'm missing here?

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec, Canada
Tel.: (514) 398-3122
Fax:  (514) 398-2017
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:
>> Why not have a client connection LISTENing and doing the
>> synchronization, and have the trigger use NOTIFY?
>>
>> Or, you could have the trigger write to a table, and have another
>> client periodically scanning the table for new sync events.
>>
>> Either one of those would be simpler and more robust than fork()ing
>> inside the backend.
>
> How is writing a daemon simpler than using something that could be done
> within Postgres? Forking is something that should be natural on Unix
> systems; I shouldn't need to write another application to do this. I
> don't see how a daemon would necessarily be more robust either.

Why do random code surgery on the backend, which needs to be utterly
robust and stable, when client-level solutions are just as easy and
much less dangerous? It's kind of akin to the Linux principle of
"don't do in the kernel what you can do in userspace."

That's my philosophy, anyway. :)

-Doug
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> On Mon, 2005-05-09 at 17:07 -0400, Tom Lane wrote:
>> ... not to mention it would avoid the risk of propagating
>> not-yet-committed changes.

> How's that? If I can notify a daemon that the change is committed, then
> why couldn't I write a forking plperl function that executes when the
> transaction is done? How is one riskier than the other? Is there
> something obvious I'm missing here?

Yes: the mechanisms that are being suggested to you already exist.
There is not, AND NEVER WILL BE, any mechanism to invoke random
user-defined functions during the post-commit sequence. That code
sequence cannot afford to do anything that will potentially incur
errors.

			regards, tom lane
On Mon, May 09, 2005 at 09:07:40PM -0400, Christopher Murtagh wrote:
> On Mon, 2005-05-09 at 17:01 -0400, Douglas McNaught wrote:
> > Why not have a client connection LISTENing and doing the
> > synchronization, and have the trigger use NOTIFY?
> >
> > Or, you could have the trigger write to a table, and have another
> > client periodically scanning the table for new sync events.
> >
> > Either one of those would be simpler and more robust than fork()ing
> > inside the backend.
>
> How is writing a daemon simpler than using something that could be done
> within Postgres? Forking is something that should be natural on Unix
> systems; I shouldn't need to write another application to do this. I
> don't see how a daemon would necessarily be more robust either.

Well, LISTEN and NOTIFY are built into PostgreSQL
(http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If
the processes that you're trying to notify of the changes are connected
to the database, then this might be the easiest way to do what you're
looking for. Setting up some form of replication, such as Slony, also
comes to mind. But it's impossible to really make a recommendation
without having a better idea of what you're doing.

BTW, my understanding is that it's pretty easy to write a daemon in
perl, and there are examples of how to do this floating around.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
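For what it's worth, such a daemon can indeed be a short script. Here
is a minimal sketch using DBD::Pg's older func() interface to
pg_notifies, again assuming the hypothetical "sync_event" channel; the
connection parameters and replicate.sh path are placeholders, and
run_sync() stands in for the replication code already written:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=cms', 'cmsuser', 'secret',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do('LISTEN sync_event');

    while (1) {
        # pg_notifies returns [channel, sender_pid] for a pending
        # notification, or undef if there is none.
        while (my $notify = $dbh->func('pg_notifies')) {
            run_sync();
        }
        sleep 1;   # crude polling; select()ing on the connection's
                   # socket would avoid the one-second latency
    }

    # Stand-in for the existing replication script.
    sub run_sync {
        system('/path/to/replicate.sh') == 0
            or warn "replication failed: exit status $?";
    }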
On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:
> Well, LISTEN and NOTIFY are built into PostgreSQL
> (http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If
> the processes that you're trying to notify of the changes are connected
> to the database, then this might be the easiest way to do what you're
> looking for. Setting up some form of replication, such as Slony, also
> comes to mind. But it's impossible to really make a recommendation
> without having a better idea of what you're doing.
>
> BTW, my understanding is that it's pretty easy to write a daemon in
> perl, and there are examples of how to do this floating around.

Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As to
the replication, Slony won't do it for me, as it isn't the database I
want to replicate. Here's a basic description:

I have 4 cluster nodes all running the same content management software
(home grown). When a change request comes in to one of them (an update
to an XML document), it submits the new XML doc to the database (which
is the master repository of all content), then performs an XSLT. Upon
the new change, I want the database to propagate the result of the XSLT
to the other nodes so that they can pre-cache it (to avoid page-loading
latency).

I was given an example of how to spawn a forked process with plperlu,
and it looks pretty simple and straightforward and exactly what I want:

    CREATE or REPLACE function somefunc() returns void as $$

        $SIG{CHLD} = 'IGNORE';
        # The preceding line removes any zombies created; it assumes
        # you do not want to handle the return value from the child
        # process.

        unless (defined ($pid = fork)) {
            die "cannot fork: $!";
        }
        unless ($pid) {
            $cmd = "your command here";
            system "$cmd";
            if ($? != 0) {
                # handle errors here
            }
            exit;
        }
        return;

    $$ language plperlu;

This seems to be pretty trivial, and near fail-proof to me. My '$cmd'
would then be a script that handles the replication of the cached file
to the nodes (already written and tested). Why is a daemon more robust
than this? (BTW, I ask out of ignorance, not out of arrogance.)

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec, Canada
Tel.: (514) 398-3122
Fax:  (514) 398-2017
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> I was given an example of how to spawn a forked process with plperlu,
> and it looks pretty simple and straightforward and exactly what I want:

> CREATE or REPLACE function somefunc() returns void as $$
>     $SIG{CHLD} = 'IGNORE';

... let's see, you already broke the backend there --- unless its normal
setting of SIGCHLD is IGNORE, in which case munging it is unnecessary
anyway ...

>     unless ($pid) {
>         $cmd = "your command here";
>         system "$cmd";
>         if ($? != 0) {
>             # handle errors here
>         }
>         exit;
>     }

I'm not sure what happens when you do "exit" here, but I'll lay odds
against it being exactly the right things. (An atexit hook in a backend
process is entitled to suppose it is cleaning up a backend.) Also,
exactly what is your "handle errors" step going to be? You don't get to
reflect anything back into the database at that point.

> This seems to be pretty trivial, and near fail-proof to me. My '$cmd'
> would then be a script that handles the replication of the cached file
> to the nodes (already written and tested). Why is a daemon more robust
> than this? (BTW, I ask out of ignorance, not out of arrogance.)

The main reason why this is probably a bad idea is that your
transaction is causing side-effects outside the database that cannot
be undone if the transaction later rolls back. The general consensus
of people who have looked at this is that it's safer to fire those
operations after the transaction actually commits. (As an example,
this gives you an opportunity to retry the outside operations if
*they* fail, and in any case to keep some state information about
whether the outside-the-DB state is actually synced with
inside-the-DB or not.)

			regards, tom lane
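Tom's "fire after commit, keep state" pattern might look like the
following sketch, reusing the hypothetical "documents" table from
earlier (all other names invented for illustration). Because the queue
row is inserted in the same transaction as the content change, a
rollback discards the sync request along with the change, and the
external worker can retry failures and record what has actually been
pushed:

    CREATE TABLE sync_queue (
        id        serial PRIMARY KEY,
        doc_id    integer NOT NULL,
        queued_at timestamp NOT NULL DEFAULT now(),
        synced_at timestamp           -- NULL until the push succeeds
    );

    CREATE OR REPLACE FUNCTION queue_sync() RETURNS trigger AS $$
    BEGIN
        INSERT INTO sync_queue (doc_id) VALUES (NEW.id);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER documents_queue_sync
        AFTER INSERT OR UPDATE ON documents
        FOR EACH ROW EXECUTE PROCEDURE queue_sync();

    -- The external worker loops on something like:
    --   SELECT id, doc_id FROM sync_queue WHERE synced_at IS NULL;
    --   ... push the cached file to the nodes ...
    --   UPDATE sync_queue SET synced_at = now() WHERE id = <that id>;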
On Tue, 2005-05-10 at 15:02, Christopher Murtagh wrote:
> On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:
> > Well, LISTEN and NOTIFY are built into PostgreSQL
> > (http://www.postgresql.org/docs/8.0/interactive/sql-notify.html). If
> > the processes that you're trying to notify of the changes are connected
> > to the database, then this might be the easiest way to do what you're
> > looking for. Setting up some form of replication, such as Slony, also
> > comes to mind. But it's impossible to really make a recommendation
> > without having a better idea of what you're doing.
> >
> > BTW, my understanding is that it's pretty easy to write a daemon in
> > perl, and there are examples of how to do this floating around.
>
> Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As to
> the replication, Slony won't do it for me, as it isn't the database I
> want to replicate. Here's a basic description:
>
> I have 4 cluster nodes all running the same content management software
> (home grown). When a change request comes in to one of them (an update
> to an XML document), it submits the new XML doc to the database (which
> is the master repository of all content), then performs an XSLT. Upon
> the new change, I want the database to propagate the result of the XSLT
> to the other nodes so that they can pre-cache it (to avoid page-loading
> latency).

Seeing as how Slony replicates tables you choose to have it replicate,
it seems to me you could just have it replicate the post-XSLT table and
it would do what you want.
On Tue, May 10, 2005 at 04:02:59PM -0400, Christopher Murtagh wrote:
> On Tue, 2005-05-10 at 11:11 -0500, Jim C. Nasby wrote:
> > Well, LISTEN and NOTIFY are built into PostgreSQL
> > (http://www.postgresql.org/docs/8.0/interactive/sql-notify.html).
> > If the processes that you're trying to notify of the changes are
> > connected to the database, then this might be the easiest way to do
> > what you're looking for. Setting up some form of replication, such
> > as Slony, also comes to mind. But it's impossible to really make a
> > recommendation without having a better idea of what you're doing.
> >
> > BTW, my understanding is that it's pretty easy to write a daemon
> > in perl, and there are examples of how to do this floating around.
>
> Yes, I saw the LISTEN/NOTIFY stuff, and it could be interesting. As
> to the replication, Slony won't do it for me, as it isn't the
> database I want to replicate. Here's a basic description:
>
> I have 4 cluster nodes all running the same content management
> software (home grown). When a change request comes in to one of them
> (an update to an XML document), it submits the new XML doc to the
> database (which is the master repository of all content), then
> performs an XSLT. Upon the new change, I want the database to
> propagate the result of the XSLT to the other nodes so that they can
> pre-cache it (to avoid page-loading latency).

Why do you think Slony won't work for this? One way it could do it is
to have an ON INSERT trigger that populates one or more tables with
the result of the XSLT, which table(s) Slony replicates to the other
servers.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!
On Tue, 2005-05-10 at 13:50 -0700, David Fetter wrote:
> Why do you think Slony won't work for this? One way it could do it is
> to have an ON INSERT trigger that populates one or more tables with
> the result of the XSLT, which table(s) Slony replicates to the other
> servers.

Because the nodes are not databases: they are Apache/PHP web servers
with file-system caching, where the URL = directory/file. The XSLT also
converts XML objects to PHP code. So basically a content editor can do
something like:

    <br /><br />
    This is my course description:
    <coursedesc courseid="AAA 123" />

... etc., and the XSLT converts the <coursedesc /> tag into a PHP
function (that does a db lookup).

Cheers,

Chris
On Tue, 2005-05-10 at 16:17 -0400, Tom Lane wrote:
> ... let's see, you already broke the backend there --- unless its normal
> setting of SIGCHLD is IGNORE, in which case munging it is unnecessary
> anyway ...

Here's my (probably all garbled) explanation: essentially, that code is
a self-daemonizing Perl wrapper. Setting SIGCHLD to IGNORE will prevent
zombie processes from hanging around, essentially
daemonizing/orphaning/forking the Perl script.

> >     unless ($pid) {
> >         $cmd = "your command here";
> >         system "$cmd";
> >         if ($? != 0) {
> >             # handle errors here
> >         }
> >         exit;
> >     }
>
> I'm not sure what happens when you do "exit" here, but I'll lay odds
> against it being exactly the right things.

It ends the daemonized process, kinda like a wrapper suicide. :-)

> (An atexit hook in a backend
> process is entitled to suppose it is cleaning up a backend.) Also,
> exactly what is your "handle errors" step going to be?

Well, if my command fails for some reason, I can replace '# handle
errors' with something that notifies me (by email, or by populating the
database, etc.).

> You don't get to reflect anything back into the database at that point.

That's ok; my $cmd might or might not have db connections in it, and
the same goes for the error-checking script (which could be written in
a totally different language).

> The main reason why this is probably a bad idea is that your
> transaction is causing side-effects outside the database that cannot
> be undone if the transaction later rolls back. The general consensus
> of people who have looked at this is that it's safer to fire those
> operations after the transaction actually commits.

I should have stated that this will get used only by single
auto-commit transactions. Any rollbacks are essentially changes to the
past and aren't permitted. Instead, if someone wanted to 'undo' a
change, they would re-submit a previous version. This way, I can keep
my replication code to very atomic operations, which makes it very
simple to write and maintain.

From my (somewhat limited experience) point of view, I think this
plperlu script isn't much different from writing a daemon to receive
signals via NOTIFY. Instead, the script is self-daemonizing, and it
will always run (instead of a couple of NOTIFYs building up and only
one being sent), which is more in line with what I want.

Sorry, my explanation probably isn't very clear at all; I've been
writing talk material and my brain is in a totally different space.
Feel free to deliver any LARTs. :-)

Cheers,

Chris
On Tue, May 10, 2005 at 05:31:56PM -0400, Christopher Murtagh wrote:
> > I'm not sure what happens when you do "exit" here, but I'll lay odds
> > against it being exactly the right things.
>
> It ends the daemonized process, kinda like a wrapper suicide. :-)

I think you have a problem here. PostgreSQL is a complete program,
which uses signals, atexit handlers, and callbacks, all of which
fork() preserves. When your "little daemon" exits, it may trigger all
the code normally run on backend exit -- you know, closing WAL files,
etc.

The rest of the program has no idea it's a forked process rather than
a real one. Say the backend had an alarm() set and it goes off in your
forked process. Havoc ensues...

> From my (somewhat limited experience) point of view, I think this
> plperlu script isn't much different from writing a daemon to receive
> signals via NOTIFY. Instead, the script is self-daemonizing, and it
> will always run (instead of a couple of NOTIFYs building up and only
> one being sent), which is more in line with what I want.

Except that the daemon would be a client process that uses the
database to do work, whereas with the other you're duplicating the
server without telling it and putting your data at risk...

> Sorry, my explanation probably isn't very clear at all; I've been
> writing talk material and my brain is in a totally different space.
> Feel free to deliver any LARTs. :-)

You should never fork() without exec() in a large program unless it
knows you're doing it. Note that exec() doesn't run atexit handlers
but exit() does. Big difference...

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
On Wed, 2005-05-11 at 00:08 +0200, Martijn van Oosterhout wrote:
> I think you have a problem here. PostgreSQL is a complete program,
> which uses signals, atexit handlers, and callbacks, all of which
> fork() preserves. When your "little daemon" exits, it may trigger all
> the code normally run on backend exit -- you know, closing WAL files,
> etc.
>
> The rest of the program has no idea it's a forked process rather than
> a real one. Say the backend had an alarm() set and it goes off in your
> forked process. Havoc ensues...

Ok, I think I'm seeing the light here. Thanks for your input!

So, if I made a slight modification to my script, to this instead:

    CREATE or REPLACE function exec_test() returns void as '

        unless (defined ($pid = fork)) {
            die "cannot fork: $!";
        }
        unless ($pid) {
            $cmd = "/path/to/some/script.pl";
            exec "$cmd";
        }
        return;

    ' language plperlu;

then the exec'd $cmd shouldn't inherit any of the signal or atexit
handlers. My script.pl can handle any errors it encounters (other than
not being executed, of course, but I can live with that), and this way
I don't have to write a daemon that polls listening for a NOTIFY.

Is this less objectionable?

Again, thanks to all for your input and feedback, I really do
appreciate it.

Cheers,

Chris

--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Service Group
McGill University
Montreal, Quebec, Canada
Tel.: (514) 398-3122
Fax:  (514) 398-2017
Christopher Murtagh <christopher.murtagh@mcgill.ca> writes:
> So, if I made a slight modification to my script, to this instead:

> CREATE or REPLACE function exec_test() returns void as '
>     unless (defined ($pid = fork)) {
>         die "cannot fork: $!";
>     }
>     unless ($pid) {
>         $cmd = "/path/to/some/script.pl";
>         exec "$cmd";
>     }
>     return;
> ' language plperlu;

> Is this less objectionable?

Well, it's better, but you need to think about what happens if the
exec fails (eg, script is not where you thought). Does plperl let you
get at abort(), or some other way of terminating the process other
than exit()?

There still remains the point that this violates transaction
semantics, in that an aborted transaction may still have caused things
to happen outside the database.

			regards, tom lane
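One possible answer to Tom's question, offered as a sketch under the
assumption that plperlu will load the POSIX module: Perl's
POSIX::_exit() terminates the process immediately without running
atexit handlers or Perl END blocks, so a failed exec can bail out
without tripping the backend's exit-time cleanup. This addresses only
the failed-exec half of the objection; the transaction-semantics point
still stands. The function and script names are carried over from the
example above:

    CREATE or REPLACE function exec_test() returns void as '
        use POSIX qw(_exit);

        unless (defined ($pid = fork)) {
            die "cannot fork: $!";
        }
        unless ($pid) {
            exec "/path/to/some/script.pl";
            # Reached only if exec failed. _exit() skips atexit
            # handlers and END blocks, so the child does not run the
            # backend''s exit-time cleanup (closing WAL files, etc.).
            _exit(1);
        }
        return;
    ' language plperlu;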