Thread: how to ensure a client waits for a previous transaction to finish?
Hi, this is a bit of a noob question. I am using PostgreSQL to perform some large analyses, with the clients being a sequentially run set of perl scripts (one to set up and populate tables, and then downstream scripts to query the database for the results). During manual testing everything works, but automating it results in the query scripts starting before the population transactions have completed, so they give erroneous results. How can I ensure the query scripts wait until the population transaction processes have finished? I've looked at 'SET TRANSACTION ISOLATION LEVEL SERIALIZABLE' (to use in the query scripts, I thought), but I'm not sure that is what I want. Can anyone suggest where I should be looking, or if SET TRANSACTION is the right way to go? thanks
Dan Kortschak <dan.kortschak@adelaide.edu.au> writes: > How can I ensure the query scripts wait until the population transaction > processes have finished? I've looked at 'SET TRANSACTION ISOLATION LEVEL > SERIALIZABLE' (to use in the query scripts, I thought), but I'm not sure > that is what I want. It's not. What you want is to COMMIT and make sure you've gotten the command completion response for that. Possibly perl is complicating matters with some sort of autocommit logic behind your back. regards, tom lane
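PS: in DBI terms, that means something like the following (a minimal sketch only; the connection details and the table are invented for illustration):

    use strict;
    use warnings;
    use DBI;

    # AutoCommit off, so nothing is committed behind our back;
    # RaiseError makes DBI die on failure instead of failing silently.
    my $dbh = DBI->connect('dbi:Pg:dbname=analysis', 'user', 'pass',
                           { AutoCommit => 0, RaiseError => 1 });

    # hypothetical population work
    $dbh->do('CREATE TABLE results (id integer, score real)');
    $dbh->do('INSERT INTO results VALUES (1, 0.5)');

    # commit() does not return until the server has acknowledged the
    # COMMIT, so once it succeeds the rows are visible to any session
    # that starts afterwards.
    $dbh->commit;
    $dbh->disconnect;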
Dan Kortschak wrote: > Hi, this is a bit of a noob question. > > I am using PostgreSQL to perform some large analyses, with the clients being > a sequentially run set of perl scripts (one to set up and populate > tables, and then downstream scripts to query the database for the > results). > .... when you say 'a series of perl scripts', do you mean separate perl files, each run as its own process? how are you invoking them?
Yes, they are separate perl files (I'm thinking that perhaps this wasn't the best way to do it now, but for the moment I'm going to have to stick with it). In the case of the manual testing it's just a matter of command line calls. The automated runs call each script as part of a PBS torque script (essentially a sh script that calls each in turn). cheers On Mon, 2009-12-07 at 13:16 -0800, John R Pierce wrote: > when you say 'a series of perl scripts', do you mean separate perl > files, each run as its own process? how are you invoking them?
Thanks for that, that should help me sort it out. I haven't used the autocommit option in the Perl DBI. I'll have a look to see whether DBI's do() has an option to wait for command completion. cheers On Mon, 2009-12-07 at 16:12 -0500, Tom Lane wrote: > It's not. What you want is to COMMIT and make sure you've gotten the > command completion response for that. Possibly perl is complicating > matters with some sort of autocommit logic behind your back.
On Mon, Dec 7, 2009 at 4:00 PM, Dan Kortschak <dan.kortschak@adelaide.edu.au> wrote: > During manual testing everything works, but automating it results in the > query scripts starting before the population transactions have > completed, so they give erroneous results. > Update your process to have the query scripts wait until the population scripts are done. Either that or have the populate script LOCK TABLE the first table that the query is going to use, and it will automatically block until your populating transaction is complete. But really, it sounds like you have a process coordination problem, not a database problem. Hmmm.... you could use the DB to synchronize by having the query program LISTEN for an event and wait until that arrives, then have the populating program NOTIFY that event once it is done. Just select() on the Pg socket in the query program and, once it is ready to read, check for any NOTIFY events that may have arrived.
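To make that concrete, the listening side might look roughly like this with DBD::Pg (a sketch only; the channel name and connection details are invented, and it's worth checking the DBD::Pg docs for pg_socket and pg_notifies on your version):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=analysis', 'user', 'pass',
                           { AutoCommit => 1, RaiseError => 1 });
    $dbh->do('LISTEN population_done');

    # Build a select() read mask over the raw backend socket.
    my $rin = '';
    vec($rin, $dbh->{pg_socket}, 1) = 1;

    my $seen = 0;
    until ($seen) {
        my $rout = $rin;
        select($rout, undef, undef, undef);            # block until traffic
        while (my $note = $dbh->func('pg_notifies')) { # drain queued notices
            $seen = 1 if $note->[0] eq 'population_done';
        }
    }
    # ... safe to run the queries now ...

The populating script just issues $dbh->do('NOTIFY population_done') once its load has committed; a NOTIFY sent inside a transaction is only delivered at commit, which is exactly the ordering you want here.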
Dan Kortschak wrote: > Yes, they are separate perl files (I'm thinking that perhaps this wasn't > the best way to do it now, but for the moment I'm going to have to stick > with it). > > In the case of the manual testing it's just a matter of command line > calls. The automated runs call each script as part of a PBS torque > script (essentially a sh script that calls each in turn). > I'm totally unfamiliar with torque, but you probably need to tell torque to run the first script and wait for it to return before running the rest; it's probably launching a bunch concurrently.
On Mon, Dec 7, 2009 at 4:21 PM, Dan Kortschak <dan.kortschak@adelaide.edu.au> wrote: > The automated runs call each script as part of a PBS torque > script (essentially a sh script that calls each in turn). > Sounds more like it calls them in parallel, else your populate script would have finished before your query script ran, assuming they run in the right order.
Thanks to everyone who has answered this. The short answer is that torque is not behaving the way I expected and not the way I have ever seen it behave in the past. The I/O binding of these jobs may have something to do with this, but I will look into it further. cheers On Mon, 2009-12-07 at 13:26 -0800, John R Pierce wrote: > I'm totally unfamiliar with torque, but you probably need to tell > torque to run the first script and wait for it to return before > running > the rest; it's probably launching a bunch concurrently. > That *shouldn't* be the case, as the contents of a torque script should be run sequentially (many jobs depend on this and I've never seen job parts run out of order), just as in a sh script (they are actually just csh scripts in my case). My understanding is that the parallelisation occurs either through using MPI or other parallel compilers or through running a number of torque jobs, BUT I've just tested the hypothesis by running it as a straight csh script - and it works perfectly, so there must be something like that going on. I'll ask some of our more experienced torque admins about it. Thanks.
On Mon, Dec 7, 2009 at 5:32 PM, Dan Kortschak <dan.kortschak@adelaide.edu.au> wrote: > I've just tested the hypothesis by running it as a straight csh > script - and it works perfectly, so there must be something like that > going on. I'll ask some of our more experienced torque admins about > it. Thanks. If it turns out you need to have a lock with a 'longer than transaction' duration, maybe advisory locks are a good fit. merlin
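PS: in case it helps, the advisory-lock version would look roughly like this (a sketch; the lock key 42 is an arbitrary number all your scripts agree on, pg_advisory_lock needs 8.2 or later, and the populating session must take the lock before any query script starts and stay connected while it holds it):

    # populating script, before it starts loading:
    $dbh->do('SELECT pg_advisory_lock(42)');
    # ... populate, index, commit ...
    $dbh->do('SELECT pg_advisory_unlock(42)');

    # each query script, before it starts querying:
    $dbh->do('SELECT pg_advisory_lock(42)');   # blocks until released
    $dbh->do('SELECT pg_advisory_unlock(42)');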
I've spoken to people on the torque user mailing list and tried Merlin's suggestion below (which looked like it should work - but unfortunately did not prevent the problem). From working through things with the torque list, it seems to be the case that PostgreSQL is behaving differently because it is not attached to a terminal (this has caused problems for others on that list with sqlite and mysql). Can anyone confirm/refute this? And if it is the case, is there something that I can do about it? thanks again. Dan On Mon, 2009-12-07 at 18:33 -0500, Merlin Moncure wrote: > If it turns out you need to have a lock with a 'longer than > transaction' duration, maybe advisory locks are a good fit. > > merlin >
Dan Kortschak <dan.kortschak@adelaide.edu.au> writes: > From working through things with the torque list, it seems to be the > case that PostgreSQL is behaving differently because it is not attached > to a terminal (this has caused problems for others on that list with > sqlite and mysql). That seems unlikely. My best guess at this point is that for some reason the torque scripts are failing to wait for psql to exit. You might possibly be able to confirm or deny this by watching the whole process tree with "strace -f" or local equivalent. regards, tom lane
On Tue, Dec 8, 2009 at 10:13 PM, Dan Kortschak <dan.kortschak@adelaide.edu.au> wrote: > I've spoken to people on the torque user mailing list and tried Merlin's > suggestion below (which looked like it should work - but unfortunately > did not prevent the problem). > > From working through things with the torque list, it seems to be the > case that PostgreSQL is behaving differently because it is not attached > to a terminal (this has caused problems for others on that list with > sqlite and mysql). > > Can anyone confirm/refute this? And if it is the case, is there > something that I can do about it? Advisory locks are basically only useful if the locker of the resource maintains a database session (that is, stays connected and enjoys private use of that connection) for the duration of the lock. Aside: there is a way to hold locks from unconnected sessions...2PC, but the feature is dangerous and probably not useful in your case. Can you give a clearer explanation of the problem? You can monitor the output from: select * from pg_stat_activity; in psql. Take special note of 'idle in transaction' backends and if the connection is being regenerated behind your back by watching for the pid changing. merlin
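PS: a slightly more targeted version of that query, run in psql (column names as of the 8.x series; check \d pg_stat_activity on your server):

    -- watch for sessions stuck mid-transaction, and note whether the
    -- backend pid changes between runs
    select procpid, usename, current_query
      from pg_stat_activity
     where current_query like '<IDLE>%';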
Thanks Tom, That was my initial view and I'm still not convinced that I'm wrong - I can see no way that the server can tell that the query came from a process without a terminal (though I may be missing something here). Unfortunately I'm working at the boundary of my knowledge for both torque and pg, so it's difficult to be certain. I should point out that the pg queries are not submitted by psql, but from perl using the perl DBI. This does not fundamentally change the argument. The thing that worries me is that I can't replicate the fault any way but by calling with a qsub. Unfortunately I need the machine to do this work over the next few days, so I can't look into the details (I can run the jobs without using torque, but it makes life easier). The strace -f is a good suggestion and I'll try it once this set of jobs has been processed (each takes ~40-60 hours, and unfortunately the test case takes about 20-30 minutes - I suspect that this problem would not arise with more convenient tests). cheers Dan On Wed, 2009-12-09 at 00:22 -0500, Tom Lane wrote: > That seems unlikely. My best guess at this point is that for some > reason the torque scripts are failing to wait for psql to exit. > You might possibly be able to confirm or deny this by watching the > whole process tree with "strace -f" or local equivalent.
Thanks for that clarification Merlin, The server/client is on a workstation that is essentially private (I share some time with students, but they don't have pg access). The locks are across sessions. There are three perl scripts that connect to a pg db; one loads the database and creates some computed tables (a legacy of what the database was designed for that makes sense to retain) that are used for the subsequent analysis. The following two scripts query the db with a set of defined queries and provide the glue to R to generate a statistical analysis and produce graphs etc. The two analysis scripts seem to start/finish before the population and indexing has been completed. As I've mentioned in the previous post to Tom, I won't have the machine time to look at the suggestion you have provided for the next week or so, but I will follow it up once the analyses that are pending have been completed. The complexity of the system suggests to me that regeneration of connections is unlikely to be a problem (there are only three sessions in the whole system). I was initially considering making a process state table for communication between the three scripts, sketched in the PS below. (This is what I initially thought you were referring to in your first post.) This would essentially be a soft lock. thanks again Dan On Wed, 2009-12-09 at 00:28 -0500, Merlin Moncure wrote: > Advisory locks are basically only useful if the locker of the resource > maintains a database session (that is, stays connected and enjoys > private use of that connection) for the duration of the lock. Aside: > there is a way to hold locks from unconnected sessions...2PC, but the > feature is dangerous and probably not useful in your case. > > Can you give a clearer explanation of the problem? You can monitor > the output from: > select * from pg_stat_activity; > in psql. Take special note of 'idle in transaction' backends and if > the connection is being regenerated behind your back by watching for > the pid changing.
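PS: the process state table I have in mind would be something like this (names invented, and only a sketch - the table and its row need to be created by a one-off setup step before the run, so the query scripts have something to poll):

    # one-off setup:
    $dbh->do(q{CREATE TABLE run_status (
                   stage text PRIMARY KEY,
                   done  boolean NOT NULL DEFAULT false)});
    $dbh->do(q{INSERT INTO run_status VALUES ('populate', false)});

    # population script: last statement before its final commit
    $dbh->do(q{UPDATE run_status SET done = true WHERE stage = 'populate'});
    $dbh->commit;

    # query scripts: poll with AutoCommit on, so each SELECT sees the
    # latest committed state rather than a single old snapshot
    my $done = 0;
    until ($done) {
        ($done) = $dbh->selectrow_array(
            q{SELECT done FROM run_status WHERE stage = 'populate'});
        sleep 10 unless $done;
    }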