I'm considering using PostgreSQL as part of the implementation of a multi-machine job management system. Here is an overview of the system:
- Jobs are submitted through an API and stored in a SQL database. Each job contains a list of source filenames and a description of the operations to perform on those files (compress a file, add it to an archive, encrypt it, compare it with another file, etc.).
- Multiple machines (up to 50?) poll the database and each grabs a job. A machine updates the database to indicate that it is the one running that job, and it also updates the database with the job's current completion progress (%). (A sketch of the claim query I have in mind is below.)
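For reference, here is a rough sketch of what I mean by "grabbing" a job (the jobs table and its column names are placeholders, not a settled schema). The idea is to rely on PostgreSQL's FOR UPDATE SKIP LOCKED (available since 9.5) so that workers polling concurrently don't block each other:

    -- Hypothetical schema; column names are illustrative only.
    CREATE TABLE jobs (
        id           bigserial   PRIMARY KEY,
        payload      jsonb       NOT NULL,   -- source files + operations to perform
        status       text        NOT NULL DEFAULT 'pending',
        worker       text,                   -- machine that claimed the job
        progress     integer     NOT NULL DEFAULT 0,
        submitted_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each worker atomically claims the oldest pending job.
    -- SKIP LOCKED lets ~50 workers poll concurrently without
    -- blocking on a row another worker is in the middle of claiming.
    UPDATE jobs
       SET status = 'running', worker = 'machine-07'
     WHERE id = (
           SELECT id
             FROM jobs
            WHERE status = 'pending'
            ORDER BY submitted_at
            LIMIT 1
              FOR UPDATE SKIP LOCKED
       )
    RETURNING id, payload;

    -- Progress updates would then just be plain writes against the claimed row.
    UPDATE jobs SET progress = 42 WHERE id = 1;

Progress writes like the last statement would happen frequently from all 50 machines, which is part of why I'm asking about load in question 2 below.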
1) Is this something that is often done with SQL databases?
2) The jobs will be quite CPU intensive: will I run into trouble if the database is located on one of the machines that will be executing the jobs?
3) I would like to have a backup ready to take over if the machine hosting the database fails. Only some of the information I store (data about a job, which machine is executing it) is important to back up; things like job progress don't need to be backed up. Any tips on how I should set up the database to accomplish this? (The kind of setup I'm imagining is sketched below; am I on the right track?)
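For concreteness, what I'm picturing for question 3 is PostgreSQL's built-in streaming replication with a hot standby; the host names, IP, user name, and data directory below are placeholders, and wal_level = replica assumes 9.6 or later (older releases use hot_standby):

    # On the primary (postgresql.conf) -- illustrative values
    wal_level = replica
    max_wal_senders = 5

    # On the primary (pg_hba.conf) -- allow the standby to connect for replication
    host  replication  replicator  192.168.1.20/32  md5

    # On the standby: clone the primary and start it as a streaming standby
    pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data -R

Is full replication like this overkill given that the progress data is disposable, or is there a lighter-weight arrangement that would cover just the job and machine-assignment data?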
You might want to look into a dedicated job scheduler such as SLURM, SGE, or Condor for this type of thing.