Thread: RFC: pgAgent Scheduler Design

RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

02 March 2005, 23:25:10

Hi,

I'm working on the scheduler design for pgAgent - this is basically a
service/daemon similar to SQL Server's 'SQL Server Agent' that runs
multi step jobs based on specified schedules, in which each step may be
an SQL script to run against any database in the cluster, or a
batch/shell script.

The basic design that I'm leaning towards is as follows, with each
schedule being represented in one row in a table.

Start/End/Run
-------------

The start date of a given schedule will be a date value (StartDate) -
the schedule will not be active until this date has been reached.

End date - a schedule may have a date value (EndDate), after which no
more instances of the schedule will run. If null, the job will continue
indefinitely, or until the end count is reached (see next para).

End count - each schedule will include a run count value (RunCount),
which will be incremented with each run. When this value reaches the end
count value (EndCount), no more instances of the schedule will run. If
set to zero or null, the schedule will run indefinitely, or until
EndDate is reached.

If both EndDate and EndCount are set, the schedule will end after the
first has been reached.

Run time - each schedule will run at the time specified in the time
field RunTime.
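
A minimal sketch of the start/end rules above, in Python. The function
name and argument names are hypothetical; only StartDate, EndDate,
RunCount and EndCount come from the proposal, and a zero or null
EndCount is treated as "no limit" as described.

```python
from datetime import date
from typing import Optional


def schedule_is_active(today: date, start_date: date,
                       end_date: Optional[date],
                       run_count: int,
                       end_count: Optional[int]) -> bool:
    """Return True if the schedule may still fire on `today`."""
    if today < start_date:
        return False  # not active until StartDate has been reached
    if end_date is not None and today > end_date:
        return False  # no more runs after EndDate
    if end_count and run_count >= end_count:
        return False  # EndCount reached (zero or null means no limit)
    return True
```

Whichever limit is hit first ends the schedule, which matches the rule
for schedules that set both EndDate and EndCount.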

Schedule types
--------------

A variety of schedule types will allow most requirements to be met. The
proposed types, and their representations are:

One shot - This type of schedule will execute once only at the date and
time specified in the StartDate and RunTime values.

Hourly - This schedule will execute repeatedly at the time specified in
RunTime, on or after the date specified in StartDate. A bool[24] value
(Hours) will specify which hours of the day the schedule will run at.

Daily - This schedule will execute repeatedly at the time specified in
RunTime, on or after the date specified in StartDate. An integer value
will specify the number of days between each run.

Weekly - This schedule will execute repeatedly at the time specified in
RunTime, on or after the date specified in StartDate. A bool[7] value
(WeekDays) will specify which days of the week to run on.

Monthly - This schedule will execute repeatedly at the time specified in
RunTime, on or after the date specified in StartDate. A bool[12] value
(Months) will specify which months to run in. A bool[31] value
(MonthDays) will specify which days of the month to run on. Jobs set to
run on non-existent days (such as 31/02) will be skipped.

Exception Schedules
-------------------

A negative schedule will be identical to a normal schedule, except that
a bool value (Exception) will be set to true. When a job is due to run
at a given schedule, if an exception schedule occurs at the same time,
the job will not run. For example, if a daily schedule is defined to run
a job every second day, a weekly exception schedule may be created with
a matching run time on Fridays only. This would mean that the job would
actually run on every 2nd day unless it was a Friday.
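
The every-second-day example can be sketched as follows. This is only
an illustration in Python: the helper names are made up, the start date
is arbitrary (1 March 2005, a Tuesday), and the weekday array is assumed
to start on Monday.

```python
from datetime import date, timedelta


def daily_due(day, start, interval):
    """Daily schedule: fires every `interval` days counted from `start`."""
    return day >= start and (day - start).days % interval == 0


def weekly_due(day, weekdays):
    """Weekly schedule: weekdays[0] is Monday, matching date.weekday()."""
    return weekdays[day.weekday()]


start = date(2005, 3, 1)                              # a Tuesday
fridays_only = [False] * 4 + [True] + [False] * 2     # exception schedule


def job_runs(day):
    # The job runs when the normal schedule is due and no exception matches.
    return daily_due(day, start, 2) and not weekly_due(day, fridays_only)


days = [start + timedelta(days=n) for n in range(12)]
run_days = [d for d in days if job_runs(d)]
```

Over the first twelve days the job runs on the 1st, 3rd, 5th, 7th and
9th, and the run that would have fallen on Friday the 11th is suppressed.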

Any thoughts or comments?

Regards, Dave.

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

03 March 2005, 01:21:52

Dave Page wrote:
> Hi,
>
> I'm working on the scheduler design for pgAgent - this is basically a
> service/daemon similar to SQL Server's 'SQL Server Agent' that runs
> multi step jobs based on specified schedules, in which each step may be
> an SQL script to run against any database in the cluster, or a
> batch/shell script.
>
> The basic design that I'm leaning towards is as follows, with each
> schedule being represented in one row in a table.
>
> Start/End/Run
> -------------
>
> The start date of a given schedule will be a date value (StartDate) -
> the schedule will not be active until this date has been reached.
>
> End date - a schedule may have a date value (EndDate), after which no
> more instances of the schedule will run. If null, the job will continue
> indefinitely, or until the end count is reached (see next para).
>
> End count - each schedule will include a run count value (RunCount),
> which will be incremented with each run. When this value reaches the end
> count value (EndCount), no more instances of the schedule will run. If
> set to zero or null, the schedule will run indefinitely, or until
> EndDate is reached.
>
> If both EndDate and EndCount are set, the schedule will end after the
> first has been reached.
>
> Run time - each schedule will run at the time specified in the time
> field RunTime.
>
> Schedule types
> --------------
>
> A variety of schedule types will allow most requirements to be met. The
> proposed types, and their representations are:
>
> One shot - This type of schedule will execute once only at the date and
> time specified in the StartDate and RunTime values.
>
> Hourly - This schedule will execute repeatedly at the time specified in
> RunTime, on or after the date specified in StartDate. A bool[24] value
> (Hours) will specify which hours of the day the schedule will run at.
>
> Daily - This schedule will execute repeatedly at the time specified in
> RunTime, on or after the date specified in StartDate. An integer value
> will specify the number of days between each run.
>
> Weekly - This schedule will execute repeatedly at the time specified in
> RunTime, on or after the date specified in StartDate. A bool[7] value
> (WeekDays) will specify which days of the week to run on.
>
> Monthly - This schedule will execute repeatedly at the time specified in
> RunTime, on or after the date specified in StartDate. A bool[12] value
> (Months) will specify which months to run in. A bool[31] value
> (MonthDays) will specify which days of the month to run on. Jobs set to
> run on non-existent days (such as 31/02) will be skipped.

Better is a bool[32], where the 32nd day is the last day of the month.
Silently skipping is not good, should at least be logged.
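
A rough sketch of what I mean, in Python. The function names and the
0-based layout (indexes 0..30 for days 1..31, index 31 for "last day")
are assumptions made for illustration only.

```python
import calendar

LAST_DAY = 31  # index of the extra "last day of the month" element


def monthday_matches(year, month, day, month_days):
    """month_days holds 32 flags: days 1..31, then 'last day of month'."""
    last = calendar.monthrange(year, month)[1]
    return month_days[day - 1] or (day == last and month_days[LAST_DAY])


def skipped_days(year, month, month_days):
    """Selected days that do not exist in this month, so they can be logged."""
    last = calendar.monthrange(year, month)[1]
    return [d for d in range(last + 1, 32) if month_days[d - 1]]
```

With only day 31 selected, skipped_days(2005, 2, ...) reports [31], which
is what should end up in the log instead of being dropped silently.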


> Exception Schedules
> -------------------
>
> A negative schedule will be identical to a normal schedule, except that
> a bool value (Exception) will be set to true. When a job is due to run
> at a given schedule, if an exception schedule occurs at the same time,
> the job will not run. For example, if a daily schedule is defined to run
> a job every second day, a weekly exception schedule may be created with
> a matching run time on Fridays only. This would mean that the job would
> actually run on every 2nd day unless it was a Friday.

Hm, negative schedule...
I doubt the user will understand this.
Exceptions are probably only needed for days, e.g. "I want to do backups
every weekday, but on Jan 1st nobody will change the tape so I don't
want to have it run then."

So I'd propose an additional exception table:

CREATE TABLE pgagent.pga_exception
(
   jexscid int4 NOT NULL,
   jexdate date NOT NULL,
   jexdorun bool,   -- run in addition to schedule if true
   CONSTRAINT pga_exception_pkey PRIMARY KEY (jexscid, jexdate),
   CONSTRAINT pga_exception_jexscid_fkey FOREIGN KEY (jexscid)
    REFERENCES pgagent.pga_schedule (jscid)
    ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITHOUT OIDS;


Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

03 March 2005, 11:23:08


> -----Original Message-----
> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> Sent: 02 March 2005 22:22
> To: Dave Page
> Cc: pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> > Monthly - This schedule will execute repeatedly at the time
> specified in
> > RunTime, on or after the date specified in StartDate. A
> bool[12] value
> > (Months) will specify which months to run in. A bool[31] value
> > (MonthDays) will specify which days of the month to run on.
> Jobs set to
> > run on non-existent days (such as 31/02) will be skipped.
>
> Better is a bool[32], where the 32nd day is the last day of the month.

OK.

> Silently skipping is not good, should at least be logged.

Hmm - not sure how that will fit in with my thinking of how it will
work, but I'll bear it in mind.


> Hm, negative schedule...
> I doubt the user will understand this.

Yes, this part is my main uncertainty - though mainly because it will
complicate the code significantly!

> Exceptions are probably only needed for days, e.g. "I want to
> do backups
> every weekday, but on Jan 1st nobody will change the tape so I don't
> want to have it run then."
>
> So I'd propose an additional exception table:
>
> CREATE TABLE pgagent.pga_exception
> (
>    jexscid int4 NOT NULL,
>    jexdate date NOT NULL,
>    jexdorun bool,   -- run in addition to schedule if true

Run in addition is easy anyway - a new schedule may be added to the
existing job.

How about adding a simple date[] column to the schedule in which the
user can add arbitrary 'don't run' dates?

Regards, Dave.

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

03 March 2005, 13:49:56

Dave Page wrote:
>
>>Exceptions are probably only needed for days, e.g. "I want to
>>do backups
>>every weekday, but on Jan 1st nobody will change the tape so I don't
>>want to have it run then."
>>
>>So I'd propose an additional exception table:
>>
>>CREATE TABLE pgagent.pga_exception
>>(
>>   jexscid int4 NOT NULL,
>>   jexdate date NOT NULL,
>>   jexdorun bool,   -- run in addition to schedule if true
>
>
> Run in addition is easy anyway - a new schedule may be added to the
> existing job.
>

Ok. I was thinking of just extending an existing schedule with a simple
click for a "run additionally" exception, but this is maybe too much
effort for very rare cases.

> How about adding a simple date[] column to the schedule in which the
> user can add arbitrary 'don't run' dates?

Ok. I usually don't design data models using vector/array datatypes,
because inter-db portability issues are always implicitly considered by
some cells deep in my brain.
In the case of exceptions, IMHO it's a bit unfortunate to use them,
because adding/deleting a single value would always mean modifying the
whole column, so I'd still prefer an additional table (jexdorun omitted).

Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

06 March 2005, 00:15:11


> -----Original Message-----
> From: pgadmin-hackers-owner@postgresql.org
> [mailto:pgadmin-hackers-owner@postgresql.org] On Behalf Of Dave Page
> Sent: 02 March 2005 20:25
> To: pgadmin-hackers@postgresql.org
> Subject: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>

<snip>

> The basic design that I'm leaning towards is as follows, with each
> schedule being represented in one row in a table.

Y'know, having thought about all that for a few days, I'm simply not
happy with it - it's all too messy and inconsistent :-(.

So, thought #2 - follow a modified cron-style design:


Control
-------

jscstart      timestamptz   -- date/time the schedule may start at.
jscend        timestamptz   -- date/time the schedule will end at.
jscenabled    bool          -- is the schedule active?

Note the lack of run counting in this design. This is primarily because
missed runs (caused by system downtime for example) would be extremely
difficult to count, potentially leading to errors calculating the
schedule end. In addition, an end date would almost certainly give most
people the flexibility they require.


Schedule
--------

jscminutes    bool[60]   -- 0,1,2,3...59
jschours      bool[24]   -- 0,1,2,3...23
jscweekdays   bool[7]    -- mon,tue,wed...sun
jscmonthdays  bool[32]   -- 1,2,3...31,last
jscmonths     bool[12]   -- jan,feb,mar...dec

In this scheme, the elements of the arrays represent the possible values
for each part of the schedule - for example, jscweekdays[] represents
mon, tue, wed, thur, fri, sat, sun. If an array contains 'f' for all
values, it is considered to be the cron * equivalent. jscmonthdays also
includes an additional element to represent the last day of the month,
regardless of its actual number, per Andreas' suggestion.

As per cron, a simple algorithm would determine if a schedule should
fire:

If ((jscminutes[datetime.minute] || jscminutes.IsAllFalse()) &&
    (jschours[datetime.hour] || jschours.IsAllFalse()) &&
    (jscweekdays[datetime.weekday] || jscweekdays.IsAllFalse()) &&
    (jscmonthdays[datetime.monthday] || jscmonthdays.IsAllFalse() ||
     (datetime.lastdayofmonth && jscmonthdays[32])) &&
    (jscmonths[datetime.month] || jscmonths.IsAllFalse()))
{
    FireSchedule();
}

(I think that's about right - it's been a long day :-) )
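
For anyone who wants to play with it, here is the same test as a small
Python sketch. The function names are invented, and the indexing is an
assumption: all arrays are 0-based, weekdays start on Monday, and
monthdays uses indexes 0..30 for days 1..31 with index 31 as "last day".

```python
import calendar
from datetime import datetime


def field_matches(flags, index):
    """An all-false array is treated as the cron '*' wildcard."""
    return flags[index] or not any(flags)


def should_fire(now, minutes, hours, weekdays, monthdays, months):
    """True if `now` matches every part of the schedule."""
    last_day = calendar.monthrange(now.year, now.month)[1]
    monthday_ok = (field_matches(monthdays, now.day - 1)
                   or (now.day == last_day and monthdays[31]))
    return (field_matches(minutes, now.minute)
            and field_matches(hours, now.hour)
            and field_matches(weekdays, now.weekday())
            and monthday_ok
            and field_matches(months, now.month - 1))
```

A schedule with minutes[30], hours[2] and monthdays[31] set, and
everything else false, fires at 02:30 on the last day of every month.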


Exceptions
----------

In addition, an exceptions table will be maintained:

jsedate        date
jsetime        time

If a schedule matches the specified date and/or time (either may be
null), it will be skipped.
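
One possible reading of the "and/or" rule, sketched in Python: a null
date or time acts as a wildcard, and a row with both null excludes
nothing. That interpretation and the function name are assumptions, not
a settled design.

```python
from datetime import date, datetime, time
from typing import Optional


def is_excepted(run_at: datetime,
                jsedate: Optional[date],
                jsetime: Optional[time]) -> bool:
    """True if an exception row (jsedate, jsetime) suppresses `run_at`."""
    if jsedate is None and jsetime is None:
        return False  # assumption: an empty row excludes nothing
    date_ok = jsedate is None or jsedate == run_at.date()
    # Compare to the minute, since schedules have minute resolution.
    run_time = run_at.time().replace(second=0, microsecond=0)
    time_ok = jsetime is None or jsetime == run_time
    return date_ok and time_ok
```

So a row with only jsedate = 2005-01-01 blocks every run on that day,
while a row with only jsetime = 22:00 blocks the 22:00 run on every day.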

Thoughts, comments? Anyone else as well as Andreas? :-)

Regards, Dave.

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

06 March 2005, 02:16:20

Dave Page wrote:
>
>
>
>>-----Original Message-----
>>From: pgadmin-hackers-owner@postgresql.org
>>[mailto:pgadmin-hackers-owner@postgresql.org] On Behalf Of Dave Page
>>Sent: 02 March 2005 20:25
>>To: pgadmin-hackers@postgresql.org
>>Subject: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>>
>
>
> <snip>
>
>>The basic design that I'm leaning towards is as follows, with each
>>schedule being represented in one row in a table.
>
>
> Y'know, having thought about all that for a few days, I'm simply not
> happy with it - it's all too messy and inconsistent :-(.
>
> So, thought #2 - follow a modified cron-style design:
>
>
> Control
> -------
>
> jscstart      timestamptz   -- date/time the schedule may start at.
> jscend        timestamptz   -- date/time the schedule will end at.
> jscenabled    bool          -- is the schedule active?
>
> Note the lack of run counting in this design. This is primarily because
> missed runs (caused by system downtime for example) would be extremely
> difficult to count, potentially leading to errors calculating the
> schedule end. In addition, an end date would almost certainly give most
> people the flexibility they require.
>
>
> Schedule
> --------
>
> jscminutes    bool[60]   -- 0,1,2,3...59
> jschours      bool[24]   -- 0,1,2,3...23
> jscweekdays   bool[7]    -- mon,tue,wed...sun
> jscmonthdays  bool[32]   -- 1,2,3...31,last
> jscmonths     bool[12]   -- jan,feb,mar...dec
>
> In this scheme, the elements of the arrays represent the possible values
> for each part of the schedule - for example, jscweekdays[] represents
> mon, tue, wed, thur, fri, sat, sun. If an array contains 'f' for all
> values, it is considered to be the cron * equivalent. jscmonthdays also
> includes an additional element to represent the last day of the month,
> regardless of its actual number, per Andreas' suggestion.
>
> As per cron, a simple algorithm would determine if a schedule should
> fire:
>
> If ((jscminutes[datetime.minute] || jscminutes.IsAllFalse()) &&
>     (jschours[datetime.hour] || jschours.IsAllFalse()) &&
>     (jscweekdays[datetime.weekday] || jscweekdays.IsAllFalse()) &&
>     (jscmonthdays[datetime.monthday] || jscmonthdays.IsAllFalse() ||
>      (datetime.lastdayofmonth && jscmonthdays[32])) &&
>     (jscmonths[datetime.month] || jscmonths.IsAllFalse()))
> {
>     FireSchedule();
> }
>
> (I think that's about right - it's been a long day :-) )

Sorry, this won't work.

It is mandatory that a "next run" schedule is calculated after a job has
run, and FireSchedule will run when nextRun is <= current_timestamp.

Imagine two jobs scheduled for the very same minute, and only one
pgAgent running. It will run the first job, which will run for let's say
2 minutes. After that, the fire time for the second job is not due any
more, so it will run somewhat later, if ever.

That's what pga_next_schedule is good for. You'll have quite a hard time
calculating it from your way of storing schedules, I'm afraid... It's
somewhat the difference between cron and anacron.
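
To illustrate the two-jobs case in Python (the dictionary of stored
next-run times and the function name are made up for the example): a job
is due as soon as its stored next-run time has passed, so a job that had
to wait is still picked up later.

```python
from datetime import datetime


def due_jobs(next_runs, now):
    """Jobs whose stored next-run time has passed, oldest first."""
    due = [job for job, when in next_runs.items() if when <= now]
    return sorted(due, key=lambda job: next_runs[job])


next_runs = {
    "A": datetime(2005, 3, 6, 12, 0),
    "B": datetime(2005, 3, 6, 12, 0),
    "C": datetime(2005, 3, 6, 13, 0),
}
# Job A took two minutes; at 12:02 job B is still found as due.
still_due = due_jobs(next_runs, datetime(2005, 3, 6, 12, 2))
```

A test for "does the current minute match the schedule" would have
missed job B at 12:02, which is the problem described above.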

Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

06 March 2005, 16:07:13


> -----Original Message-----
> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> Sent: 05 March 2005 23:16
> To: Dave Page
> Cc: pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> Sorry, this won't work.
>
> It is mandatory that a "next run" schedule is calculated
> after a job has
> run, and FireSchedule will run when nextRun is <= current_timestamp.

Yes, and that is how it will run - the code snippet I gave was simply to
illustrate how the schedule storage will work, not how the scheduler
will actually run. I fully intend to retain the current method of
pre-calculating the next run time.

> Imagine two jobs scheduled for the very same minute, and only one
> pgAgent running. It will run the first job, which will run
> for let's say
> 2 minutes. After that, the fire time for the second job is
> not due any
> more, so it will run somewhat later, if ever.

Yes, I'm aware of that issue - and even the current design will have
problems in that jobs may still run late. I think the correct way to
address this is to grab all due jobs, and spawn a separate thread to
handle each. This should allow jobs to actually start when they are
supposed to.

> That's what pga_next_schedule is good for. You'll have quite
> a hard time
> to calculate it from your way of storing schedules, I'm afraid... It's
> somewhat the difference between cron and anacron.

Yes, I realise it's not the easiest thing to do, but I do not believe
it's a major problem that cannot be overcome with a little thought. I
think the trick is to treat each element of the schedule separately, and
find the next month, then the next day, and lastly the hour and minute
in turn.
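
A simplified sketch of that idea in Python, not the real implementation.
It walks forward a day at a time (checking month, weekday and day of
month), then picks the first allowed hour and minute. The names, the
0-based array layout (monthdays index 31 meaning "last day") and the
search horizon are all assumptions.

```python
import calendar
from datetime import datetime, time, timedelta


def allowed(flags, index):
    """An all-false array means 'any value', like cron's '*'."""
    return flags[index] or not any(flags)


def next_run(after, minutes, hours, weekdays, monthdays, months,
             horizon_days=366 * 4):
    """First matching minute strictly after `after`, or None within the horizon."""
    start = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    day = start.date()
    for _ in range(horizon_days):
        last = calendar.monthrange(day.year, day.month)[1]
        day_ok = (allowed(months, day.month - 1)
                  and allowed(weekdays, day.weekday())
                  and (allowed(monthdays, day.day - 1)
                       or (day.day == last and monthdays[31])))
        if day_ok:
            for h in (h for h in range(24) if allowed(hours, h)):
                for m in (m for m in range(60) if allowed(minutes, m)):
                    candidate = datetime.combine(day, time(h, m))
                    if candidate >= start:
                        return candidate
        day += timedelta(days=1)
    return None
```

A smarter version would jump straight to the next allowed month rather
than stepping through every day, but the order of the checks is the same.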

Regards, Dave.

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

06 March 2005, 16:41:16

Dave Page wrote:

>Yes, I'm aware of that issue - and even the current design will have
>problems in that jobs may still run late.
>
Better late than never.

> I think the correct way to
>address this is to grab all due jobs, and spawn a separate thread to
>handle each. This should allow jobs to actually start when they are
>supposed to.
>
>
Well, yes, hm...
Threading is nice, but not a guarantee to be exactly on-time for job starts.
The design is basically able to be run by multiple process instances of
pgAgent (on different machines), this would get quite impossible without
further control. A combination seems the best:
- a job that is due to run will be running as soon as possible
- any instance of pgAgent might be configured to run a job threaded
- multiple instances may share the pool of due tasks
- all together, instances and threads try to execute the schedule asap.
IIRC, that was my original intention, those days.

Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

06 March 2005, 21:41:09

-----Original Message-----
From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
Sent: Sun 3/6/2005 1:41 PM
To: Dave Page
Cc: pgadmin-hackers@postgresql.org
Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design

> >Yes, I'm aware of that issue - and even the current design will have
> >problems in that jobs may still run late.
>
> Better late than never.

Yup - but, an exception... say I have a daily job that sends me a simple
report via email. If the system is down for a few days, then I don't want
it to run all the old instances of the job on restart (cron won't, the MS
Task Scheduler won't - in fact, I can't think of any I've used that
would). Anyway, I was thinking that when the agent first starts, it
should do something like 'update pga_schedule set nextrun = nextrun where
jscactive = true and jscrunning = false' to nudge the update trigger to
recalculate the next run dates from that point. What I'm not so sure
about is how to log the failed jobs in that instance. This should be
multi-agent safe.

> Threading is nice, but not a guarantee to be exactly on-time for job starts.

Nope, but except on the most overloaded of systems each job should start
within a minute of its schedule. With the current design, one 6 hour job
could completely screw things up for other jobs.

> -a job that is due to run will be running as soon as possible

Yup.

> - any instance of pgAgent might be configured to run a job threaded
> - multiple instances may share the pool of due tasks

Eh? No, I'm advocating 1 thread per job. The main thread queries the db,
finds 5 jobs due and spawns 5 threads to run them. 1 minute later,
regardless of the state of the 5 threads, the main thread checks for new
tasks to run and spawns more threads if required. As jobs are finished,
their threads simply die.
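
Roughly what I have in mind, as a Python sketch. The function names and
the stand-in job body are hypothetical, and the real agent would poll
the database again a minute later rather than wait for the threads.

```python
import threading


def spawn_due_jobs(due_jobs, run_job):
    """Start one thread per due job and return the started threads."""
    threads = []
    for job in due_jobs:
        t = threading.Thread(target=run_job, args=(job,), name=f"job-{job}")
        t.start()
        threads.append(t)
    return threads


finished = []
lock = threading.Lock()


def run_job(job):
    # Stand-in for executing the job's steps; the thread ends when it returns.
    with lock:
        finished.append(job)


# The main thread found 5 due jobs; join() is only here to end the demo.
for t in spawn_due_jobs([1, 2, 3, 4, 5], run_job):
    t.join()
```

Because each job has its own thread, a 6 hour job no longer holds up the
jobs that become due while it is running.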

The use of multiple agents by the vast majority of people seems unlikely
to me - especially given the lack of control over what runs on what
agent. In particular, the majority of jobs are likely to be SQL jobs, the
distribution of which is pretty much irrelevant anyway as all the hard
work is done by the server. I'm not convinced that many people will want
to run resource hungry batch jobs that may run on random agent machines.

Can we get some third party opinions on what usage models will be useful please?

Regards, Dave

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

07 March 2005, 00:44:07

Dave Page wrote:

>
>-----Original Message-----
>From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
>Sent: Sun 3/6/2005 1:41 PM
>To: Dave Page
>Cc: pgadmin-hackers@postgresql.org
>Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
>
>
>>>Yes, I'm aware of that issue - and even the current design will have
>>>problems in that jobs may still run late.
>>>
>>>
>>Better late than never.
>>
>>
>
>Yup - but, an exception... say I have a daily job that sends me a simple
>report via email. If the system is down for a few days, then I don't want
>it to run all the old instances of the job on restart (cron won't, the MS
>Task Scheduler won't - in fact, I can't think of any I've used that
>would).
>
I don't completely agree. If a job was stuck unexecuted in the queue for
a while, it should run asap, and the next schedule should be calculated
in the future, i.e. a daily job not executed for 5 days shouldn't be
executed 5x, but only once after pgAgent is up again.

> Anyway, I was thinking that when the agent first starts, it should do
> something like 'update pga_schedule set nextrun = nextrun where
> jscactive = true and jscrunning = false' to nudge the update trigger to
> recalculate the next run dates from that point.
>
No, it should run *once* to get in sync again, if it has become due in
the meantime.

>What I'm not so sure about is how to log the failed jobs in that instance. This should be multi-agent safe.
>
>
Actually they don't log. Think of a weekday job, pgAgent down since
Sunday, now up again on Friday morning. The job was due on Monday
evening, and will run on Friday immediately, new schedule Friday night.
After an agent failure, you'll have to check anyway in which state your
jobs are. I don't think something like "monday job ran on friday,
tue/wed/thur was skipped" would help here. The logging "Job due on
Monday at 2200 was run on Friday 830" should be enough.
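
The run-once-then-resync behaviour could look like this in Python. It
is only a sketch: it assumes a fixed interval between runs (simpler than
the real schedule), and the function name and return shape are invented.

```python
from datetime import datetime, timedelta


def catch_up(next_run, now, interval):
    """Return (run_now, new_next_run); missed runs collapse into one run."""
    if next_run > now:
        return False, next_run               # nothing is due yet
    missed = (now - next_run) // interval + 1
    return True, next_run + missed * interval


due = datetime(2005, 3, 7, 22, 0)            # Monday 22:00
now = datetime(2005, 3, 11, 8, 30)           # Friday 08:30, agent is back
run_now, new_next = catch_up(due, now, timedelta(days=1))
if run_now:
    message = f"Job due on {due:%A at %H%M} was run on {now:%A %H%M}"
```

Here the job runs once on Friday morning and the next run lands on
Friday 22:00, with a single log line describing the late start.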

>
>
>>Threading is nice, but not a guarantee to be exactly on-time for job starts.
>>
>>
>
>Nope, but except on the most overloaded of systems each job should start
>within a minute of its schedule. With the current design, one 6 hour job
>could completely screw things up for other jobs.
>
>
>
>>-a job that is due to run will be running as soon as possible
>>
>>
>
>Yup.
>
>
>
>>- any instance of pgAgent might be configured to run a job threaded
>>- multiple instance may share the pool of due tasks
>>
>>
>
>Eh? No, I'm advocating 1 thread per job. The main thread queries the db,
>finds 5 jobs due and spawns 5 threads to run them. 1 minute later,
>regardless of the state of the 5 threads, the main thread checks for new
>tasks to run and spawns more threads if required. As jobs are finished,
>their threads simply die.
>
>
This doesn't enable multi agents. There should be a limit on threads per
agent to give other instances a chance.

>The use of multiple agents by the vast majority of people seems unlikely
>to me - especially given the lack of control over what runs on what
>agent. In particular, the majority of jobs are likely to be SQL jobs, the
>distribution of which is pretty much irrelevant anyway as all the hard
>work is done by the server. I'm not convinced that many people will want
>to run resource hungry batch jobs that may run on random agent machines.
>
>
Binary jobs (shell jobs) need a system qualifier.

>Can we get some third party opinions on what usage models will be useful please?
>
>

Yes please.

Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

08 March 2005, 02:02:56


> -----Original Message-----
> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> Sent: 06 March 2005 21:44
> To: Dave Page
> Cc: pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> Dave Page wrote:
>
> >
> >-----Original Message-----
> >From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> >Sent: Sun 3/6/2005 1:41 PM
> >To: Dave Page
> >Cc: pgadmin-hackers@postgresql.org
> >Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
> >
> >
> I don't completely agree. If a job was stuck unexecuted in
> the queue for
> a while, it should run asap, and the next schedule should be
> calculated
> in the future, i.e. a daily job not executed for 5 days shouldn't be
> executed 5x, but only once after pgAgent is up again.

Hmm, not sure about that. I definitely want to hear some other opinions
on this please!!

> This doesn't enable multi agents. There should be a limit on
> threads per
> agent to give other instances a chance.
>
> >The use of multiple agents by the vast majority of people
> seems unlikely to me - especially given the lack of control
> over what runs on what agent. In particular, the majority of
> jobs are likely to be SQL jobs, the distribution of which is
> pretty much irrelevant anyway as all the hard work is done by
> the server. I'm not convinced that many people will want to
> run resource hungry batch jobs that may run on random agent machines.
> >
> >
> Binary jobs (shell jobs) need a system qualifier.

Well that would tie in with the multi-thread model, and the current code
which is broken with multiple agents anyway (because there is nothing to
stop 2 agents grabbing and running the same job at the same time).
Allowing unlimited threads per agent, and allowing a specific system to
be named, we can ensure things start when they should, and can be
properly distributed.


Regards, Dave.

Re: RFC: pgAgent Scheduler Design

From

Miha Radej

Date:

08 March 2005, 09:18:10

hi!

Dave Page wrote:
>>
>>I don't completely agree. If a job was stuck unexecuted in
>>the queue for
>>a while, it should run asap, and the next schedule should be
>>calculated
>>in the future, i.e. a daily job not executed for 5 days shouldn't be
>>executed 5x, but only once after pgAgent is up again.
>
> Hmm, not sure about that. I definitely want to hear some other opinions
> on this please!!
>

if i may, i'd very much like to see something like this:

- if a job does not execute, i'd like to set an option whether to be
executed as soon as possible, delayed or not, or if execution should be
held until the next planned time.
- should this be the last planned execution of a job, ie. a job
executing monday to friday and then never again, i'd like to be able to
set an option in such special case: if the execution was held and the
last execution was held also, it'd be great to be able to set an option
like the above paragraph for such exceptional cases.
- an option whether to execute jobs several times or only once, would be
handy. for example, if i have a daily job and have an incrementing value
set to n, i'd expect it to be n+5 5 days later. sometimes, when, say,
refreshing data in a table, multiple executions aren't necessary, so a
checkbox or something would be nice imo.
- if it should happen that something is set to execute periodically
every 10 minutes but takes more than that to execute, is it possible to
set it either to wait until the first job is done and only then launch
the next run, or to allow overlapping jobs, ie. execute a job at the
next specified time, regardless if the previous run hasn't finished yet.
- and another feature request: is it possible to set an execution time
frame, from and to a certain date and have a job executed at random
times (with a possibility of specifying the number or range (ie. between
5 and 9) of executions) within that specified time frame? right now i
have a script that does sth. like that. err, this is a rather exotic
request, i know, but it is something i could use so i just had to write
this :)

i hope i stayed in context and my opinion helps. now i'll just crawl
back into my cave and just read the postings here again :)


regards,
Miha

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

08 March 2005, 21:28:39

Dave Page wrote:

>
>>Binary jobs (shell jobs) need a system qualifier.
>>
>>
>
>Well that would tie in with the multi-thread model, and the current code
>which is broken with multiple agents anyway (because there is nothing to
>stop 2 agents grabbing and running the same job at the same time).
>
>

AFAIR there *is* a locking, by updating the column "current agent pid".

Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

09 March 2005, 12:46:05

> -----Original Message-----
> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> Sent: 08 March 2005 18:28
> To: Dave Page
> Cc: pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> Dave Page wrote:
>
> >
> >>Binary jobs (shell jobs) need a system qualifier.
> >>
> >>
> >
> >Well that would tie in with the multi-thread model, and the
> current code
> >which is broken with multiple agents anyway (because there
> is nothing to
> >stop 2 agents grabbing and running the same job at the same time).
> >
> >
>
> AFAIR there *is* a locking, by updating the column "current
> agent pid".

Which isn't done in the same transaction as the original select, thus
allowing a window in which another agent might grab the same job and
execute it (though from my reading of the code, the second instance
won't bother to log its start time, and will probably mess up the
logging of step results because prtid will be an empty string). Unless
I'm missing something of course...
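
To show the kind of fix I mean, here is a sketch in Python where the
"is it free?" test and the claim happen in one UPDATE statement. It uses
an in-memory SQLite table purely so the example is self-contained; the
table name, column names and function are hypothetical, not the pgAgent
schema.

```python
import sqlite3


def claim_job(conn, job_id, agent_pid):
    """Claim a job only if no agent holds it; check and update are one statement."""
    cur = conn.execute(
        "UPDATE job SET agent_pid = ? WHERE id = ? AND agent_pid IS NULL",
        (agent_pid, job_id),
    )
    conn.commit()
    return cur.rowcount == 1  # exactly one row changed means we own the job


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, agent_pid INTEGER)")
conn.execute("INSERT INTO job (id) VALUES (1)")

first = claim_job(conn, 1, 101)    # first agent gets the job
second = claim_job(conn, 1, 202)   # second agent finds it already claimed
```

An agent that gets False back simply moves on to the next due job, so
there is no window between selecting a job and marking it as taken.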

/D

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

09 March 2005, 13:39:03


> -----Original Message-----
> From: Miha Radej [mailto:miha.radej@siix.com]
> Sent: 08 March 2005 06:17
> To: Dave Page; pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> - if a job does not execute, i'd like to set an option whether to be
> executed as soon as possible, delayed or not, or if execution
> should be
> held until the next planned time.

You mean run one missed job and then go back to normal, as Andreas
suggested, or run *all* missed instances?

> - should this be the last planned execution of a job, ie. a job
> executing monday to friday and then never again, i'd like to
> be able to
> set an option in such special case: if the execution was held and the
> last execution was held also, it'd be great to be able to set
> an option
> like the above paragraph for such exceptional cases.

OK.

> - an option whether to execute jobs several times or only once, would be
> handy. for example, if i have a daily job and have an incrementing value
> set to n, i'd expect it to be n+5 5 days later. sometimes, when, say,
> refreshing data in a table, multiple executions aren't necessary, so a
> checkbox or something would be nice imo.

There will be a job end date/time that will allow you to do this -
specify the schedule, and then optionally specify the date/time that the
job will stop running after. My previous proposal included a counter,
but that's surprisingly difficult to handle when jobs get missed.

> - if it should happen that something is set to execute periodically
> every 10 minutes but takes more than that to execute, is it possible to
> set it either to wait until the first job is done and only then launch
> the next run, or to allow overlapping jobs, ie. execute a job at the
> next specified time, regardless if the previous run hasn't finished yet.

Only one instance of a job can run at a time in the current scheme.
Currently, when the job finishes, it triggers the calculation of the
next run time - as it stands, that will always be after the current
time.
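
As a rough illustration of "always after the current time", here is a
minimal sketch for the every-10-minutes case. It assumes a plain
fixed-interval schedule; nextRunAfter and its parameters are hypothetical
names for illustration, not the actual pgAgent calculation.

```cpp
#include <chrono>

using Clock = std::chrono::system_clock;
using TimePoint = Clock::time_point;

// Hypothetical helper: slots fall at first, first + every, first + 2 * every, ...
// Returns the first slot strictly after `now`. `every` must be positive.
TimePoint nextRunAfter(TimePoint first, std::chrono::minutes every, TimePoint now)
{
    if (now < first)
        return first;  // schedule has not started yet

    // Whole minutes elapsed since the first slot (truncated, never negative here).
    auto elapsed = std::chrono::duration_cast<std::chrono::minutes>(now - first);

    // Number of slots already reached, counting one that falls exactly on `now`.
    auto slotsReached = elapsed / every + 1;

    return first + every * slotsReached;
}
```

So a job that finishes 25 minutes after the first slot is next due at the
30 minute slot; the slots at 10 and 20 minutes are simply skipped rather
than run late.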

> - and another feature request: is it possible to set an execution time
> frame, from and to a certain date and have a job executed at random
> times (with a possibility of specifying the number or range (ie. between
> 5 and 9) of executions) within that specified time frame? right now i
> have a script that does sth. like that. err, this is a rather exotic
> request, i know, but it is something i could use so i just had to write
> this :)

Actually that's a pretty neat idea, and one that I would probably use as
well. I don't think I'll include anything like it in v1 (I don't want
the feature list getting too big at this point), but v2....

> i hope i stayed in context and my opinion helps.

Absolutely - the more opinions we get the better (within reason of
course!!)

> now i'll just crawl
> back into my cave and just read the postings here again :)

:-)

Thanks, Dave.

Re: RFC: pgAgent Scheduler Design

From

Miha Radej

Date:

10 March 2005, 01:03:35

hi!

Dave Page wrote:
>>-----Original Message-----
>>From: Miha Radej
>>
>>- if a job does not execute, i'd like to set an option whether to be
>>executed as soon as possible, delayed or not, or if execution should be
>>held until the next planned time.
>
> You mean the run one missed job and then go back to normal as Andreas
> suggested, or run *all* missed instances?
>

hmm... i have the innate ability to not properly express what i am
thinking about :)

what i meant with the above was a setting for this: you set a job to be
executed, say, at a certain time. if it doesn't execute at that
specified time, then sometimes it might be possible that someone doesn't
want the job to be executed at all (eg. heavy duty jobs where it would
take up too much resources, time, etc) and rather wait until the next
execution time, while other people might like the job to get executed as
soon as possible, even if the set time is way in the past.

for example, if i have a large database of, say a forum, and have a few
jobs set to be executed. one at 0100 to do some magic with it, next one
at 0400 to do sth else, last one at 0500. so that when people would come
to work at 0600 the jobs would be finished and people would be able to
post as usual while at work :) if it should happen that the last job
couldn't be executed at 0500, i don't want it to be executed at all,
because it might take a lot of time to finish so i risk that the
database will be overloaded with work to do and users might experience
large load times. which might turn some users away, which i don't want.
but if i have some simple tasks planned or have a private database where
i don't care what gets executed and when, then i could set the option to
simply execute the job when it gets its turn, regardless whether the
start time has already passed.

the above option, along with this one:
"- an option whether to execute jobs several times or only once, would
be handy. for example, if i have a daily job and have an incrementing
value set to n, i'd expect it to be n+5 5 days later. sometimes, when,
say, refreshing data in a table, multiple executions aren't necessary,
so a checkbox or something would be nice imo. "

are imho somewhat connected. so some jobs i would like to be executed
all at once (those that were delayed), while others i don't need
executed more than once. i can't help myself, i like lots of options and
checkboxes and so forth :)

this was what i was trying to say and i hope i didn't complicate matters
too much as i usually do :)

regards,
M

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

11 March 2005, 14:26:30


> -----Original Message-----
> From: Miha Radej [mailto:miha.radej@siix.com]
> Sent: 09 March 2005 22:03
> To: Dave Page
> Cc: pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> hmm... i have the innate ability to not properly express what i am
> thinking about :)

<grin>

> what i meant with the above was a setting for this: you set a job to be
> executed, say, at a certain time. if it doesn't execute at that
> specified time, then sometimes it might be possible that someone doesn't
> want the job to be executed at all (eg. heavy duty jobs where it would
> take up too much resources, time, etc) and rather wait until the next
> execution time, while other people might like the job to get executed as
> soon as possible, even if the set time is way in the past.

OK. I guess that we'd need some tolerance to do this, so combined with
previous thinking:

jsccatchup  bool      -- If set, run all missed occurrences when
                         catching up after missing runs.
jscmaxdelay interval  -- If the job doesn't start within this time of
                         its scheduled start, don't run at all.

I'm not sure that is entirely logical with the existing 'play first run
after missing one' design though (which isn't overly easy to change too
much).
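
One possible reading of how the two columns could combine, as a minimal
sketch only. SchedulePolicy and occurrencesToRun are hypothetical names,
and the rule "without catchup, keep just the most recent eligible run" is
an assumption about the intended behaviour, not settled design.

```cpp
#include <chrono>
#include <vector>

using Clock = std::chrono::system_clock;
using TimePoint = Clock::time_point;

// Hypothetical in-memory mirror of the two proposed columns.
struct SchedulePolicy {
    bool catchup;                   // jsccatchup
    std::chrono::seconds maxDelay;  // jscmaxdelay
};

// missed: scheduled start times that passed without a run, oldest first.
// Returns the occurrences that should still be run at `now`.
std::vector<TimePoint> occurrencesToRun(const std::vector<TimePoint>& missed,
                                        const SchedulePolicy& policy,
                                        TimePoint now)
{
    std::vector<TimePoint> eligible;
    for (TimePoint due : missed) {
        // Drop anything that is already later than the allowed delay.
        if (due <= now && now - due <= policy.maxDelay)
            eligible.push_back(due);
    }

    // Assumed rule: without catchup, only the most recent eligible run survives.
    if (!policy.catchup && eligible.size() > 1)
        eligible.erase(eligible.begin(), eligible.end() - 1);

    return eligible;
}
```

With runs missed 5, 2 and 1 hours ago and a 3 hour maxDelay, catchup
would run the last two, and without catchup only the most recent one.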

Regards, Dave.

Re: RFC: pgAgent Scheduler Design

From

Andreas Pflug

Date:

11 March 2005, 22:30:03

Dave Page wrote:
>
>
>
>>-----Original Message-----
>>From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
>>Sent: 08 March 2005 18:28
>>To: Dave Page
>>Cc: pgadmin-hackers@postgresql.org
>>Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>>
>>Dave Page wrote:
>>
>>
>>>>obs (shell jobs) need a system qualifier.
>>>>
>>>>
>>>
>>>Well that would tie in with the multi-thread model, and the
>>
>>current code
>>

>
>
> Which isn't done in the same transaction as the original select, thus
> allowing a window in which another agent might grab the same job and
> execute it (though from my reading of the code, the second instance
> won't bother to log its start time, and will probably mess up the
> logging of step results because prtid will be an empty string). Unless
> I'm missing something of course...

You do.

int rc=serviceConn->ExecuteVoid(
      "UPDATE ... SET jobagentpid=pg_backend_pid() ...
          WHERE jobagentid IS NULL ...");

if (rc == 1) // i.e. if exactly one row was affected, it's our job now.
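
To illustrate why the affected-row count is enough, here is a minimal
in-memory sketch of the same claim pattern. JobTable and its members are
hypothetical stand-ins; in the agent the check and the set happen inside
the single UPDATE on the server, for which the mutex below is only an
analogy.

```cpp
#include <map>
#include <mutex>
#include <optional>

// Hypothetical in-memory stand-in for the job table.
class JobTable {
public:
    // Register a job with no owning agent (owner "IS NULL").
    void addJob(int jobId)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        owner_[jobId] = std::nullopt;
    }

    // Mimics a conditional UPDATE: set the owner only if it is still unset.
    // Returns the number of "rows affected": 1 if we claimed it, else 0.
    int claim(int jobId, int agentPid)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = owner_.find(jobId);
        if (it == owner_.end() || it->second.has_value())
            return 0;  // unknown job, or another agent got there first
        it->second = agentPid;
        return 1;
    }

private:
    std::mutex mutex_;
    std::map<int, std::optional<int>> owner_;
};
```

Two agents calling claim() for the same job cannot both see 1, because
the test and the assignment are one indivisible step.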

Regards,
Andreas

Re: RFC: pgAgent Scheduler Design

From

"Dave Page"

Date:

11 March 2005, 23:34:53


> -----Original Message-----
> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> Sent: 11 March 2005 19:30
> To: Dave Page
> Cc: pgadmin-hackers@postgresql.org
> Subject: Re: [pgadmin-hackers] RFC: pgAgent Scheduler Design
>
> You do.
>
> int rc=serviceConn->ExecuteVoid(
>       "UPDATE ... SET jobagentpid=pg_backend_pid() ...
>           WHERE jobagentid IS NULL ...");
>
> if (rc == 1) // i.e. if exactly one row was affected, it's our job now.

Yeah, I saw that, but looking again I missed job::Runnable() in the
header.

So that's alright then! :-)

/D