Re: elegant and effective way for running jobs inside a database - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: elegant and effective way for running jobs inside a database
Date
Msg-id CAHyXU0xpbrkoEDSchnGUJ1SDEcCs1mm+hZrGxKXgtKicee1+Wg@mail.gmail.com
In response to Re: elegant and effective way for running jobs inside a database  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Mar 6, 2012 at 9:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But having said that, it's not apparent to me why such a thing would
>> need to live "inside the database" at all.  It's very easy to visualize
>> a task scheduler that runs as a client and requires nothing new from the
>> core code.  Approaching the problem that way would let the scheduler
>> be an independent project that stands or falls on its own merits.
>
> I was trying to make a general comment about PostgreSQL development,
> without diving too far into the merits or demerits of this particular
> feature.  I suspect you'd agree with me that, in general, a lot of
> valuable things don't get done because there aren't enough people or
> enough hours in the day, and we can always use more contributors.
>
> But since you brought it up, I think there is a lot of value to having
> a scheduler that's integrated with the database.  There are many
> things that the database does which could also be done outside the
> database, but people want them in the database because it's easier
> that way.  If you have a web application that talks to the database,
> and which sometimes needs to schedule tasks to run at a future time,
> it is much nicer to do that by inserting a row into an SQL table
> somewhere, or executing some bit of DDL, than it is to do it by making
> your web application know how to connect to a PostgreSQL database and
> also how to rewrite crontab (in a concurrency-safe manner, no less).

The counterargument to this is that there's nothing keeping you from
layering your own scheduling system on top of cron.  Cron provides the
heartbeat -- everything else you build out with tables implementing a
work queue or whatever else comes to mind.

The counter-counter argument is that cron has a couple of annoying
limitations -- sub-minute scheduling is not possible, lousy Windows
support, etc.  It's also pretty appealing that you would be able to back
up your database and get all your scheduling configuration backed up
along with it.  Dealing with cron is a headache for database
administrators.

Personally I find the C-unixy way of solving this problem inside
postgres not worth chasing -- that really does belong outside and you
really are rewriting cron.  A (mostly) sql driven scheduler would be
pretty neat though.

I agree with Chris B upthread: I find that what people really need
here is stored procedures, or some way of being able to embed code in
the database that can manage its own transactions.  That way your
server-side entry, dostuff(), called every minute, doesn't have to exit
to avoid accumulating locks for everything it needs to do or be broken
up into multiple independent entry points in scripts outside the
database.  Without stored procedures, though, you can still do it in
100% sql/plpgsql, using listen/notify, dblink as the autonomous
transaction workaround, and at least one dedicated task runner.  By
'it' I mean a server-side scheduling system that relies on a heartbeat
from code outside the database.

merlin

