Re: large crontab database design - Mailing list pgsql-general

From Csaba Nagy
Subject Re: large crontab database design
Date
Msg-id 1075822461.687.15.camel@coppola.ecircle.de
Whole thread Raw
In response to large crontab database design  (David Garamond <lists@zara.6.isreserved.com>)
Responses Re: large crontab database design
List pgsql-general
David,

I have another solution working (not using data base persistence, but
the idea is the same).
We have the next run time saved along with the original cron-tab string,
which is left as it is (not broken down to any details). The system
polls for the earliest "next runtime", and executes the associated taks
if the time is not in the future, otherwise does nothing.
Then find the next run time for the just executed task based on it's
time table, and update it in the schedule table.
Of course in a database version this would involve the locking of the
executed task's row, so no other machine/thread will execute it at the
same time (I assume it is about some high concurrency system). I'm not
sure what kind of race conditions can this cause...

Just my 2c,
Csaba.


On Tue, 2004-02-03 at 15:52, David Garamond wrote:
> I was thinking on how one would design an optimal (performance-wise)
> database of large number of schedules with crontab-like semantic. There
> will potentially be hundreds of thousands or even millions of "crontab"
> entries, and there will be a query run once every minute to determine
> which entries are to be run for that minute. The primary goal is, of
> course, for the query to run as fast as possible.
>
> My first stab is, straightforward enough, like below:
>
>   create table schedule (
>     id int not null primary key,
>     command text
>   );
>
>   create table schedule_detail(
>     schedule_id int not null references schedule(id),
>     minute smallint not null,
>     hour smallint not null,
>     day smallint not null,
>     month smallint not null,
>     year smallint not null,
>     dayofweek smallint not null
>   );
>
>   create index idx_schedule_detail_schedule_id on
>     schedule_detail(schedule_id);
>   create index idx_schedule_detail_minute on
>     schedule_detail(minute);
>   create index idx_schedule_detail_hour on
>     schedule_detail(hour);
>   create index idx_schedule_detail_day on
>     schedule_detail(day);
>   create index idx_schedule_detail_month on
>     schedule_detail(month);
>   create index idx_schedule_detail_year on
>     schedule_detail(year);
>   create index idx_schedule_detail_dayofweek on
>     schedule_detail(dayofweek);
>
> A "multiple items" syntax like '1,2,3' in a crontab time field will be
> presented with several records in the schedule_detail table. An "every
> N" syntax like '*/5' will be represented by a negative number '-5' in
> the database field.
>
> For example, this crontab entry:
>
>   # every three hours except on weekends (sat/sun)
>   0 */3 * * 1,2,3,4,5 CHECK-THE-COPIER-MACHINE
>
> will translate to these records:
>
>   insert into schedule values (1, 'CHECK-THE-COPIER-MACHINE');
>   insert into schedule_detail values (1, 0, -3, -1, -1, -1, 1);
>   insert into schedule_detail values (1, 0, -3, -1, -1, -1, 2);
>   insert into schedule_detail values (1, 0, -3, -1, -1, -1, 3);
>   insert into schedule_detail values (1, 0, -3, -1, -1, -1, 4);
>   insert into schedule_detail values (1, 0, -3, -1, -1, -1, 5);
>
> The query will then be:
>
>   select distinct schedule_id,command from schedule_detail
>   left join schedule on schedule_id=schedule.id
>   where
>   (year=extract(year from current_date) or
>    (year<0 and cast(extract(year from current_date) as smallint)
>                % year = 0)) and
>   (month=extract(month from current_date) or
>    (month<0 and cast(extract(month from current_date) as smallint)
>                % month = 0)) and
>   (day=extract(month from current_date) or
>    (day<0 and cast(extract(day from current_date) as smallint)
>                % day = 0)) and
>   (hour=extract(hour from current_time) or
>    (hour<0 and cast(extract(hour from current_time) as smallint)
>                % hour = 0)) and
>   (minute=extract(minute from current_time) or
>    (minute<0 and cast(extract(minute from current_time) as smallint)
>                % month = 0));
>
> Several questions:
>
> 1. Any idea for a better design? The selectivity of each field is not so
> great (only 60 different values for minute, 24 for day, etc).
>
> 2. What if we want to add support for things like "every 3 hours after
> 1:00AM" (1:00, 4:00, 7:00, and so on) or "beginning from 24 Jan 2004 and
> every 12 days after that." We won't be able to use the negative number
> and modulo trick, or even the crontab-like database fields for this.
>
> 3. I think I have a race condition in my query (because I call the
> current_date & current_time function several times), but I don't know
> how to fix this without resorting to procedure/function.


pgsql-general by date:

Previous
From: David Garamond
Date:
Subject: large crontab database design
Next
From: Phil Campaigne
Date:
Subject: problem with jdbc connection to postgesql