Thread: Re: Scheduler in Postgres

Re: Scheduler in Postgres

From
"Guy Rouillier"
Date:
Here is a real world example where a scheduler in PostgreSQL would be
helpful.  We collect usage statistics from our network throughout the
day (raw stats.)  After midnight, we roll up those raw stats into daily
statistics.
We have a very large amount of data, about 2 million rows a day a
growing, so I want this whole operation done on the database server.
It's all database work, just summing up data from one table and putting
the result in another table.  We have all that logic in a stored
procedure.  So why do I need to set up a cron job and a shell script
whose only task is to connect to the database and start up the stored
procedure?  Wouldn't it be much simpler just to have a schedule in
PostgreSQL that says "at 12:01, run this stored procedure"?

Another advantage to having a scheduler in the database is to ease your
DBA's job in maintenance, and to coordinate work by multiple systems.

--
Guy Rouillier


Re: Scheduler in Postgres

From
Ben
Date:
I have many such tasks. Depending on implementation, it has the potential
to be a TINY amount of less work to schedule such tasks from inside
the database, but it takes all of about a minute to schedule it through
cron. Including the amount of time it takes to refer to the man page.

Additionally, cron has already been written and is already maintained,
neither of which is true about an internal postgres scheduler.

I suppose I can see the point of needing a scheduler in postgres if you
don't give your DBAs access to cron, but from my point of view (for
whatever that's worth), giving your DBAs access to cron seems like a small
price to pay for an elegant database design.

On Fri, 17 Dec 2004, Guy Rouillier wrote:

> Here is a real world example where a scheduler in PostgreSQL would be
> helpful.  We collect usage statistics from our network throughout the
> day (raw stats.)  After midnight, we roll up those raw stats into daily
> statistics.
> We have a very large amount of data, about 2 million rows a day a
> growing, so I want this whole operation done on the database server.
> It's all database work, just summing up data from one table and putting
> the result in another table.  We have all that logic in a stored
> procedure.  So why do I need to set up a cron job and a shell script
> whose only task is to connect to the database and start up the stored
> procedure?  Wouldn't it be much simpler just to have a schedule in
> PostgreSQL that says "at 12:01, run this stored procedure"?
>
> Another advantage to having a scheduler in the database is to ease your
> DBA's job in maintenance, and to coordinate work by multiple systems.
>
> --
> Guy Rouillier
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>



Re: Scheduler in Postgres

From
Christopher Browne
Date:
In the last exciting episode, guyr@masergy.com ("Guy Rouillier") wrote:
> Here is a real world example where a scheduler in PostgreSQL would be
> helpful.  We collect usage statistics from our network throughout the
> day (raw stats.)  After midnight, we roll up those raw stats into daily
> statistics.

> We have a very large amount of data, about 2 million rows a day a
> growing, so I want this whole operation done on the database server.
> It's all database work, just summing up data from one table and
> putting the result in another table.  We have all that logic in a
> stored procedure.  So why do I need to set up a cron job and a shell
> script whose only task is to connect to the database and start up
> the stored procedure?  Wouldn't it be much simpler just to have a
> schedule in PostgreSQL that says "at 12:01, run this stored
> procedure"?

But that means Yet Another Scheduling User Interface to learn.  I'd
really rather prefer to improve cron.

It's not as if DBMS connections are all that daunting to establish,
after all.  Even on a "paranoid" system where we use MD5 passwords
everywhere, I can make the challenges go away using .pgpass

If you have a lot of jobs like this, it makes sense to me to use some
common "scheduling" layer that doesn't tell you "Oh, that's not
database stuff, so you'll have to learn some other scheduler!"

> Another advantage to having a scheduler in the database is to ease
> your DBA's job in maintenance, and to coordinate work by multiple
> systems.

I'd consider that a disadvantage, personally, as forcing it to be
usable solely for DBMS tasks means that you have to learn (and pick
between) multiple schedulers in order to manage things outside the
DBMS.

We run Slony-I, for instance.  We might want to schedule activity for
it using a scheduler, but your "captive interface inside the DBMS"
won't work for that because slon daemons and slonik control scripts
run as processes that reside _outside_ the DBMS.

Similarly, I hate the idea of having multiple PIM "calendar" systems;
if tasks are getting scheduled in 4 places for me, personally, that
means I need to do a barrel of in-my-head context switching in order
to monitor each of those four places, and to try to keep them
consistent.  I seem to recall someone doing a talk making that very
point at OSCON.
--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/sap.html
"...you  might  as well  skip  the  Xmas  celebration completely,  and
instead  sit  in  front  of  your  linux  computer  playing  with  the
all-new-and-improved linux kernel version." -- Linus Torvalds

Re: Scheduler in Postgres

From
Christopher Browne
Date:
> We have a very large amount of data, about 2 million rows a day a
> growing, so I want this whole operation done on the database server.
> It's all database work, just summing up data from one table and
> putting the result in another table.  We have all that logic in a
> stored procedure.  So why do I need to set up a cron job and a shell
> script whose only task is to connect to the database and start up
> the stored procedure?  Wouldn't it be much simpler just to have a
> schedule in PostgreSQL that says "at 12:01, run this stored
> procedure"?

But that means Yet Another Scheduling User Interface to learn.  I'd
really rather prefer to improve cron.

It's not as if DBMS connections are all that daunting to establish,
after all.  Even on a "paranoid" system where we use MD5 passwords
everywhere, I can make the challenges go away using .pgpass

If you have a lot of jobs like this, it makes sense to me to use some
common "scheduling" layer that doesn't tell you "Oh, that's not
database stuff, so you'll have to learn some other scheduler!"

> Another advantage to having a scheduler in the database is to ease
> your DBA's job in maintenance, and to coordinate work by multiple
> systems.

I'd consider that a disadvantage, personally, as forcing it to be
usable solely for DBMS tasks means that you have to learn (and pick
between) multiple schedulers in order to manage things outside the
DBMS.

We run Slony-I, for instance.  We might want to schedule activity for
it using a scheduler, but your "captive interface inside the DBMS"
won't work for that because slon daemons and slonik control scripts
run as processes that reside _outside_ the DBMS.

Similarly, I hate the idea of having multiple PIM "calendar" systems;
if tasks are getting scheduled in 4 places for me, personally, that
means I need to do a barrel of in-my-head context switching in order
to monitor each of those four places, and to try to keep them
consistent.  I seem to recall someone doing a talk making that very
point at OSCON.
--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/sap.html
"...you  might  as well  skip  the  Xmas  celebration completely,  and
instead  sit  in  front  of  your  linux  computer  playing  with  the
all-new-and-improved linux kernel version." -- Linus Torvalds

Re: Scheduler in Postgres

From
Greg Stark
Date:
"Guy Rouillier" <guyr@masergy.com> writes:

> So why do I need to set up a cron job and a shell script whose only task is
> to connect to the database and start up the stored procedure? Wouldn't it be
> much simpler just to have a schedule in PostgreSQL that says "at 12:01, run
> this stored procedure"?

Why is that any simpler? You need a shell script or web server or other client
to do everything else you want to do in the database, would it be simpler if
we just put the web server and the command line in the database?

Why do you think it would be simpler to have to maintain a whole other
scheduler just for database jobs instead of just using the one you're already
using? That seems like a whole new pile of moving parts to do something you
already have working machinery for.

--
greg

Re: Scheduler in Postgres

From
"Jim C. Nasby"
Date:
On Fri, Dec 17, 2004 at 03:25:20PM -0800, Ben wrote:
> I have many such tasks. Depending on implementation, it has the potential
> to be a TINY amount of less work to schedule such tasks from inside
> the database, but it takes all of about a minute to schedule it through
> cron. Including the amount of time it takes to refer to the man page.

Sure, it might take no amount of time to schedule it, but that's not the
point. Now you have to have code to:

- deal with what you do if the database is down when cron fires off
- provide some kind of semaphore so you don't have multiple copies of
  one job running at once
- connect to the database as the proper user using the proper
  credentials and setup the proper environment

What this all boils down to is there are 2 different sets of needs. For
some things cron works great, and there's no need to worry about any of
this. For other things it would be much easier not to worry about cron
at all and just have the database handle your periodic tasks (which
presumably are stored procedures or something similar).

I think pgcron is an interesting idea, though not directly PostgreSQL
related (I'd argue that if you're going to make a fancier replacement
for cron you should make it back-end agnostic). I also think a lot of
people would benefit from having something similar to dbms_job (Oracle's
scheduling system) available for PostgreSQL. As I mentioned in another
email, I think it's possible to get 90-95% of dbms_job's functionality
without having to touch the backend at all, which eliminates the
argument that development time is better spent elsewhere (since the core
PostgreSQL folks wouldn't have to worry about it at all). BTW, I mention
dbms_job as an example not because it's necessarily the right way to
handle scheduling, but because it's a convenient example. Oracle's newer
job system is even better.
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Scheduler in Postgres

From
Marco Colombo
Date:
On Fri, 17 Dec 2004, Ben wrote:

> I have many such tasks. Depending on implementation, it has the potential
> to be a TINY amount of less work to schedule such tasks from inside
> the database, but it takes all of about a minute to schedule it through
> cron. Including the amount of time it takes to refer to the man page.
>
> Additionally, cron has already been written and is already maintained,
> neither of which is true about an internal postgres scheduler.
>
> I suppose I can see the point of needing a scheduler in postgres if you
> don't give your DBAs access to cron, but from my point of view (for
> whatever that's worth), giving your DBAs access to cron seems like a small
> price to pay for an elegant database design.

You're making too many assumptions about cron:

- cron is available: true only for UNIX-like systems. Nowadays PostgreSQL
   runs natively under other OSes. You may need to install an external
   tool in order to get this functionality, and there are good chances
   it won't be cron-alike.

- you have access to cron: most of DBA tasks on PostgreSQL can be
   performed remotely, all you need is SQL access. Right now, there's
   no way to schedule simple db jobs from SQL. Think of an application
   management web interface: how are you expected to set a simple job
   up? Consider that the web server might (perhaps should) run on a
   different server than than PostgreSQL, and that using the local
   cron service might be unfeasable (again, the web server might run
   on some non-UNIX-like system).

- it is feasable to give cron automated DBA access: unauthorized
   access to either cron or the user used to run DB-relates jobs
   immediately scales up to DBA-level access. Note that I use 'DBA'
   in a broader sense here: any credendial that grants you some sort
   of dangerous write access to a certain database (might be the mere
   owner of tables, w/o other special permission as far as the DB is
   concerned). Again, this cron daemon might run on a different server.

I think that a cron replacement with a DB backend would be nice, but
that's _not_ what I'm referring to here.

I think the need to run administrative tasks on a database is _very_
common, and it's a natural part of an database system. It'd be nice
if the server provided a simple way to schedule db-related jobs.
It'd solve many issues in a very _elegant_ way, IMHO.

> On Fri, 17 Dec 2004, Guy Rouillier wrote:
>
>> Here is a real world example where a scheduler in PostgreSQL would be
>> helpful.  We collect usage statistics from our network throughout the
>> day (raw stats.)  After midnight, we roll up those raw stats into daily
>> statistics.
>> We have a very large amount of data, about 2 million rows a day a
>> growing, so I want this whole operation done on the database server.
>> It's all database work, just summing up data from one table and putting
>> the result in another table.  We have all that logic in a stored
>> procedure.  So why do I need to set up a cron job and a shell script
>> whose only task is to connect to the database and start up the stored
>> procedure?  Wouldn't it be much simpler just to have a schedule in
>> PostgreSQL that says "at 12:01, run this stored procedure"?
>>
>> Another advantage to having a scheduler in the database is to ease your
>> DBA's job in maintenance, and to coordinate work by multiple systems.
>>
>> --
>> Guy Rouillier

Thanks Guy for your contribution: that's _exactly_ the kind of use case
I was thinking of. With an in-core scheduler, you are able to set the
jobs up via SQL commands (no need for shell access, no need for cron/at
access on the server, no need to store passwords anywhere), and they'll
run at the right time, with the right permissions.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Scheduler in Postgres

From
"Jim C. Nasby"
Date:
Sorry to duplicate this but I thought some of you might not have noticed
my other email about this. I have registered pgJob at pgfoundary
(http://pgfoundry.org/projects/pgjob/). The intention of pgJob is to
provide a means to run SQL commands in a database at specified
intervals. There is a mailing list available
(http://pgfoundry.org/mailman/listinfo/pgjob-devel), and I suggest
anyone interested in this subscribe to that list.

Note that I don't intend for this to be a generic cron replacement as
some people were discussing. For anyone interested in that, pgcron seems
to be available on pgfoundary. :)
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"