Thread: using SQL for multi-machine job management?
I'm considering using PostgreSQL as part of the implementation of a multi-machine job management system. Here is an overview of the system:

- Jobs are submitted through an API and stored in a SQL database. A job contains a list of source filenames and a description of the operations to perform on the files (compress a file, add it to an archive, encrypt it, compare it with another file, etc.).
- Multiple machines (up to 50?) poll the database and each grabs a job. A machine updates the database to indicate that it is the one running the job, and it also updates the database with the job's current completion progress (%).

1) Is this something often done with SQL databases?
2) The jobs will be quite CPU intensive: will I run into trouble if the database is located on one of the machines that will be executing the jobs?
3) I would like to have a backup ready to take over if the machine hosting the database fails. Only some of the information I store (data about a job, the machine executing it) is important to back up; things like job progress don't need to be backed up. Any tips on how I should set up the database to accomplish this?

Thanks!

--
View this message in context: http://www.nabble.com/using-SQL-for-multi-machine-job-management--tp25418254p25418254.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.
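The "each machine grabs a job" step above is the part that needs care: the claim must be atomic so two workers never take the same job. Below is a minimal sketch of that pattern. The table and column names (`jobs`, `status`, `worker`, `progress`) and the worker names are assumptions invented for illustration, and Python's stdlib SQLite is used here only so the sketch is self-contained; the same statements map directly onto PostgreSQL.

```python
import sqlite3

# In-memory database standing in for the shared PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id       INTEGER PRIMARY KEY,
        payload  TEXT,                     -- description of the operation
        status   TEXT DEFAULT 'pending',   -- pending -> running -> done
        worker   TEXT,                     -- machine that claimed the job
        progress INTEGER DEFAULT 0         -- completion percentage
    )
""")
conn.executemany("INSERT INTO jobs (payload) VALUES (?)",
                 [("compress a.dat",), ("encrypt b.dat",)])
conn.commit()

def claim_job(conn, worker):
    """Atomically claim one pending job for this worker.

    The single UPDATE both picks a pending job and marks it as ours;
    rowcount tells us whether we actually won the claim.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'running', worker = ? "
        "WHERE id = (SELECT id FROM jobs WHERE status = 'pending' "
        "            ORDER BY id LIMIT 1) "
        "AND status = 'pending'",
        (worker,))
    conn.commit()
    if cur.rowcount == 0:
        return None  # no pending jobs left
    return conn.execute(
        "SELECT id, payload FROM jobs "
        "WHERE worker = ? AND status = 'running'",
        (worker,)).fetchone()

def report_progress(conn, job_id, pct):
    """Periodic progress update written by the executing machine."""
    conn.execute("UPDATE jobs SET progress = ? WHERE id = ?", (pct, job_id))
    conn.commit()
```

In PostgreSQL itself you would run the claim inside a transaction, typically with `SELECT ... FOR UPDATE` (or `UPDATE ... RETURNING` to fetch the claimed row in one round trip), so that concurrent workers block or skip rather than double-claim. This also bears on question 3: only `payload`, `status`, and `worker` are worth replicating to a standby; `progress` is ephemeral and can be rebuilt.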
On Sat, Sep 12, 2009 at 5:05 PM, jebjeb <martin.belleau@yahoo.com> wrote:
You might want to look into using something like SLURM or SGE or Condor for doing this type of thing.
Sean