Thread: proposal: contrib module - generic command scheduler

proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

12 May 2015, 07:26:36

Generic simple scheduler to contrib
===================================
Job schedulers are an important, and sometimes very complex, part of any software. PostgreSQL lacks one. I propose a new contrib module that can be used directly for simple tasks and as a base for richer schedulers. I prefer a minimalist design, but one strong enough to be extended when necessary. Some complex logic can be implemented better in PL than in C. Motto: simple to learn, simple to use, simple to customize.

Motivation
----------
It simplifies the administration of repeated tasks, and it makes it possible to write complex schedulers in PL/pgSQL or another PL.

Design
------
Every scheduled command will be executed in an independent worker. The number of workers for one command can be limited, and the total number of workers will be limited. Every command will be executed as a specified user, with a known timeout, in the current database. A next step could be a global scheduler, but we do not have an environment for running server-side global scripts, so I am not considering it at this moment.

This scheduler does not guarantee the number of executions. Without available workers, execution is suspended; after a crash, execution can be repeated. This can be solved in an upper layer if necessary. It is not designed as a real-time system. A scheduled task is executed as soon as a related worker is free, but the execution window is limited by the next start.

This design does not try to provide a mechanism for repeating tasks when a task crashes. That can be solved better in PL, in a custom layer, when necessary.

Scheduled time is stored in the type scheduled_time:

create type scheduled_time as (second int[], minute int[], hour int[], dow int[], month int[]);

(,"{0,10,20,30,40,50}",,,) .. run every 10 minutes
(,"{5}",,,) .. run once per hour
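A minimal sketch of how such a value might be matched against a point in time, written in Python for brevity. It assumes that an empty (NULL) field means "any value" and that dow is numbered like PostgreSQL's extract(dow), with 0 = Sunday; the proposal states neither. ScheduledTime and matches are illustrative names, not part of the proposed module.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScheduledTime:
    """Mirror of the proposed composite type; None stands for a NULL field."""
    second: Optional[Sequence[int]] = None
    minute: Optional[Sequence[int]] = None
    hour: Optional[Sequence[int]] = None
    dow: Optional[Sequence[int]] = None
    month: Optional[Sequence[int]] = None


def _field_ok(allowed: Optional[Sequence[int]], value: int) -> bool:
    # Assumption: a NULL field places no restriction on that unit.
    return allowed is None or value in allowed


def matches(st: ScheduledTime, ts: datetime) -> bool:
    """True when every non-NULL field of st contains the matching part of ts."""
    dow = ts.isoweekday() % 7  # Sunday -> 0, Monday -> 1, ..., Saturday -> 6
    return (
        _field_ok(st.second, ts.second)
        and _field_ok(st.minute, ts.minute)
        and _field_ok(st.hour, ts.hour)
        and _field_ok(st.dow, dow)
        and _field_ok(st.month, ts.month)
    )
```

Under this reading, (,"{5}",,,) matches every second of minute 5, so a rule that a task runs only once per window would be what keeps it to one run per hour.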

The core is the table pg_scheduled_commands:

Oid:         1
name:        xxxx
user:        pavel
stime:       (,"{5}",,,)
max_workers: 1
timeout:     10s
command:     SELECT plpgsql_entry(scheduled_time(), scheduled_command_oid())

Set timeout to 0 for unlimited, or to -1 for the default statement_timeout.
Set max_workers to 0 to disable the task.
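The two special values can be read as the following small sketch. The function names are hypothetical, and timeouts are taken as whole seconds for simplicity; the only outside fact used is that PostgreSQL treats statement_timeout = 0 as "disabled".

```python
from typing import Optional


def effective_timeout(timeout_s: int, statement_timeout_s: int) -> Optional[int]:
    """Return the limit to enforce in seconds, or None for no limit."""
    if timeout_s == 0:
        return None  # 0 ~ unlimited
    if timeout_s == -1:
        # -1 ~ fall back to statement_timeout, where 0 means "disabled".
        return statement_timeout_s or None
    if timeout_s < -1:
        raise ValueError("timeout must be -1, 0, or a positive number")
    return timeout_s


def is_enabled(max_workers: int) -> bool:
    """A task with max_workers = 0 is never started."""
    return max_workers > 0
```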

API
---
pg_create_scheduled_command(name,
                            stime,
                            command,
                            user default current_user,
                            max_workers default 1,
                            timeout default -1);

pg_drop_scheduled_command(oid);
pg_drop_scheduled_command(name);

pg_update_scheduled_command(oid | name, ...

Usage:
------

pg_create_scheduled_command('delete obsolete data', '(,,"{1}",,)', $$DELETE FROM data WHERE inserted < current_timestamp - interval '1month'$$);

pg_update_scheduled_command('delete obsolete data', max_workers => 2, timeout => '1h');

pg_drop_scheduled_command('delete obsolete data');

select * from pg_scheduled_commands;

Comments, notices?

Regards

Pavel

Re: proposal: contrib module - generic command scheduler

From

Dave Page

Date:

12 May 2015, 08:45:26

On Tue, May 12, 2015 at 10:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Comments, notices?

It's not integrated with the server (though it is integrated with
pgAdmin), but pgAgent provides scheduling services for PostgreSQL
already, offering multi-schedule, multi-step job execution.

http://www.pgadmin.org/docs/1.20/pgagent.html

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

12 May 2015, 09:03:35

2015-05-12 10:45 GMT+02:00 Dave Page <dpage@pgadmin.org>:

> It's not integrated with the server (though it is integrated with
> pgAdmin), but pgAgent provides scheduling services for PostgreSQL
> already, offering multi-schedule, multi-step job execution.
>
> http://www.pgadmin.org/docs/1.20/pgagent.html

I know pgAgent. The proposal is about deeper integration with core: it is based on background workers and has no other dependency.

Regards

Pavel


Re: proposal: contrib module - generic command scheduler

From

hubert depesz lubaczewski

Date:

12 May 2015, 09:27:30

On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
> create type scheduled_time as (second int[], minute int[], hour int[], dow
> int[], month int[]);
>  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>  (,"{5}",,,) .. run once per hour
> Comments, notices?

First, please note that I'm definitely not a hacker, just a user.

One comment that I'd like to make is that, since we're at the planning
phase, I think it would be great to add the capability to limit the number
of executions of a given command.
This would allow running things like "at" in Unix: run once, at a given
time, and that's it.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
                  http://depesz.com/

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

12 May 2015, 16:32:34

2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski <depesz@depesz.com>:

> On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
> > create type scheduled_time as (second int[], minute int[], hour int[], dow
> > int[], month int[]);
> > (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
> > (,"{5}",,,) .. run once per hour
> > Comments, notices?
>
> First, please note that I'm definitely not a hacker, just a user.
>
> One comment that I'd like to make, is that since we're at planning
> phase, I think it would be great to add capability to limit number of
> executions of given command.
> This would allow running things like "at" in unix - run once, at given
> time, and that's it.

I would prefer not to store state at this level, so "at" should be implemented at a higher level. There is a very large number of possible strategies for handling failed tasks, and I would rather not open that topic. I believe that with the proposed scheduler anybody can easily implement what they need in PL/pgSQL with dynamic SQL. On the other hand, "run once" can be implemented with the proposed API too:

pg_create_scheduled_command('delete obsolete data', '(,,"{1}","{1}",)',
$$DO $_$
BEGIN
  DELETE FROM data WHERE inserted < current_timestamp - interval '1month';

  -- disable this task after its first run
  PERFORM pg_update_scheduled_command(scheduled_command_oid(), max_workers => 0);
END $_$
$$);

Regards

Pavel


Re: proposal: contrib module - generic command scheduler

From

Craig Ringer

Date:

13 May 2015, 02:08:55

On 13 May 2015 at 00:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> I would not to store state on this level - so "at" should be implemented
> on higher level. There is very high number of possible strategies, what
> can be done with failed tasks - and I would not to open this topic. I
> believe with proposed scheduler, anybody can simply implement what need in
> PLpgSQL with dynamic SQL. But on second hand "run once" can be implemented
> with proposed API too.

That seems reasonable in a v1, so long as there's room to easily extend it without pain to add "at"-like one-shot commands, at-startup commands, etc.

I'd prefer to see a scheduling interface that's a close match for cron's or that leaves room for it - so things like "*/5" for every five minutes, ranges like "Mon-Fri", etc. If there's a way to express similar capabilities more cleanly using PostgreSQL's types and conventions that makes sense, but I'm not sure a composite type of arrays fits that.

How do you plan to manage the bgworkers?

In BDR, where we have a similar need to have workers across multiple databases, and where each database contains a list of workers to launch, we have:

* A single static "supervisor" bgworker. In 9.5 this will connect with InvalidOid as the target database so it can only access shared catalogs. In 9.4 this isn't possible in the bgworker API so we have to connect to a dummy database.

* A dynamic background worker for each database in which BDR is enabled, which is launched from the supervisor. We check which DBs are BDR-enabled by (ab)using database security labels and checking pg_shseclabel from the supervisor worker so we only launch bgworkers on BDR-enabled DBs.

* A dynamic background worker for each peer node, launched by the per-database worker based on the contents of that database's bdr.bdr_connections table.

What I suspect you're going to want is:

* A static worker launched by your extension when it starts, which launches per-db workers for each DB in which the scheduler is enabled. You could use a GUC listing scheduler-enabled DBs in postgresql.conf and have an on-reload hook to update it, you don't need to do the security label hack.

* A DB scheduler worker, which looks up the scheduled tasks list, finds the next scheduled event, and sleeps on a long latch timeout until then, resetting it when interrupted. When it reaches the scheduled event it would launch a one-shot BGW_NEVER_RESTART worker to run the desired PL/PgSQL procedure over the SPI.

* A task runner worker, which gets launched by the db scheduler to actually run a task using the SPI.
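The "find the next scheduled event and sleep until then" step of the DB scheduler worker could look roughly like this. It is a sketch under simplifying assumptions: a schedule is reduced to a (minutes, hours) pair where None means "any", resolution is one minute, and next_run and next_event are hypothetical names. A real worker would pass the computed delay to its latch wait rather than return it.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Set, Tuple

# Simplified schedule: (allowed minutes, allowed hours); None means "any".
Schedule = Tuple[Optional[Set[int]], Optional[Set[int]]]


def next_run(schedule: Schedule, after: datetime) -> datetime:
    """First whole minute strictly after `after` that satisfies the schedule."""
    minutes, hours = schedule
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # scanning one year of minutes is plenty
        if ((minutes is None or candidate.minute in minutes)
                and (hours is None or candidate.hour in hours)):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("schedule never fires")


def next_event(tasks: Dict[str, Schedule], now: datetime) -> Tuple[str, float]:
    """Return the task that is due first and how many seconds to sleep."""
    name, when = min(
        ((task, next_run(sched, now)) for task, sched in tasks.items()),
        key=lambda pair: pair[1],
    )
    return name, (when - now).total_seconds()
```

When the worker is interrupted, for example because the task list changed, it would simply call next_event again and sleep on the new delay.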

Does that match your current thinking?

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

13 May 2015, 04:33:39

2015-05-13 4:08 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

> That seems reasonable in a v1, so long as there's room to easily extend it
> without pain to add "at"-like one-shot commands, at-startup commands, etc.
>
> I'd prefer to see a scheduling interface that's a close match for cron's
> or that leaves room for it - so things like "*/5" for every five minutes,
> ranges like "Mon-Fri", etc. If there's a way to express similar
> capabilities more cleanly using PostgreSQL's types and conventions that
> makes sense, but I'm not sure a composite type of arrays fits that.

I thought about it too, but the parser for cron time would probably be longer than all the other code. I see a possibility to write constructors that simplify creating a value of this type, something like:

make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);

There are two possible ways: a composite with arrays or a custom composite. I'll decide later.
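To give a feel for how much parsing a constructor of this kind would need, here is a sketch that expands one cron-style field into the int[] form of the composite type. It handles "*", lists, ranges, steps, and day names. The function name expand_field and the 0 = Sunday numbering are assumptions, and wrap-around ranges such as "Fri-Mon" are rejected rather than supported.

```python
from typing import Dict, List, Optional

DOW_NAMES: Dict[str, int] = {
    "sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6,
}


def expand_field(spec: str, lo: int, hi: int,
                 names: Optional[Dict[str, int]] = None) -> List[int]:
    """Expand a cron-style field such as '*/5' or 'Mon-Fri' to sorted ints."""
    names = names or {}

    def to_int(token: str) -> int:
        token = token.strip().lower()
        return names[token] if token in names else int(token)

    values = set()
    for part in spec.split(","):
        base, _, step_txt = part.strip().partition("/")
        step = int(step_txt) if step_txt else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = to_int(first), to_int(last)
        else:
            start = end = to_int(base)
        if step < 1 or not (lo <= start <= end <= hi):
            raise ValueError(f"invalid field part: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For example, expand_field('*/5', 0, 59) yields the minutes 0, 5, ..., 55, and expand_field('Mon-Fri', 0, 6, DOW_NAMES) yields 1 through 5.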

These are the basic points:

1. It does not hold state or the results of commands.

2. It executes a task once in the related time window (from its start to the next start), as soon as the necessary worker is available.

3. When a command fails, it only writes information to the log.

4. When a command runs too long (over the specified timeout), it is killed.

5. When a command waits for a free worker, this is written to the log.

6. When a command was not executed due to missing workers (and max_workers > 0), this is written to the log.
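Points 2, 5, and 6 amount to a small decision made each time the scheduler looks at a task. The sketch below is one possible reading, not the proposed implementation: times are plain seconds, ran_in_window is in-memory scheduler state rather than anything stored in a table, and all names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    DISABLED = "task disabled (max_workers = 0)"
    NOT_DUE = "window has not started yet"
    DONE = "already ran in this window"
    RUN = "start the command now"
    WAIT = "no free worker yet; write to log and retry"
    MISSED = "window closed without a run; write to log"


def decide(now: float, window_start: float, next_start: float,
           ran_in_window: bool, free_workers: int, max_workers: int) -> Action:
    """Decide what to do with one task at time `now`."""
    if max_workers == 0:
        return Action.DISABLED
    if now < window_start:
        return Action.NOT_DUE
    if ran_in_window:
        return Action.DONE
    if now >= next_start:
        return Action.MISSED  # point 6
    if free_workers > 0:
        return Action.RUN     # point 2
    return Action.WAIT        # point 5
```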

> How do you plan to manage the bgworkers?

I am thinking about one static supervisor that holds a calendar in shared memory and starts dynamic bgworkers for commands per database. The scheduler is enabled in every database where the proposed extension is installed.

For the prototype I am planning to use SPI, but maybe it is not necessary, in which case commands like VACUUM, CREATE DATABASE, and DROP DATABASE could be supported too. I have not tested it, and I do not know whether it is possible. The module can also define new hooks, so that other extensions can be based on it.


Re: proposal: contrib module - generic command scheduler

From

Jim Nasby

Date:

13 May 2015, 05:50:57

On 5/12/15 11:32 PM, Pavel Stehule wrote:
>         I would not to store state on this level - so "at" should be
>         implemented on higher level. There is very high number of
>         possible strategies, what can be done with failed tasks - and I
>         would not to open this topic. I believe with proposed scheduler,
>         anybody can simply implement what need in PLpgSQL with dynamic
>         SQL. But on second hand "run once" can be implemented with
>         proposed API too.
>
>
>     That seems reasonable in a v1, so long as there's room to easily
>     extend it without pain to add "at"-like one-shot commands,
>     at-startup commands, etc.

Yeah, being able to run things after certain system events would be nice.

>     I'd prefer to see a scheduling interface that's a close match for
>     cron's or that leaves room for it - so things like "*/5" for every
>     five minutes, ranges like "Mon-Fri", etc. If there's a way to
>     express similar capabilities more cleanly using PostgreSQL's types
>     and conventions that makes sense, but I'm not sure a composite type
>     of arrays fits that.

It seems unfortunate to go with cron's limited syntax when we have such 
fully capable timestamp and interval capabilities already in the 
database. :/

Is there anything worth stealing from pgAgent?

> I though about it too - but the parser for this cron time will be longer
> than all other code probably. I see a possibility to write constructors
> that simplify creating a value of this type. Some like
>
> make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
> make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);

Wouldn't that be just as bad as writing the parser in the first place?

> 1. don't hold a states, results of commands
> ...
> 3. When command fails, it writes info to log only

Unfortunate, but understandable in a first pass.

> 4. When command runs too long (over specified timeout), it is killed.

I think that needs to be optional.

> 5. When command waits to free worker, write to log
> 6. When command was not be executed due missing workers (and
> max_workers > 0), write to log

Also unfortunate. We already don't provide enough monitoring capability 
and this just makes that worse.

Perhaps it would be better to put something into PGXN first; this 
doesn't really feel like it's baked enough for contrib yet. (And I say 
that as someone who's really wanted this ability in the past...)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

13 May 2015, 06:33:03

2015-05-13 7:50 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> It seems unfortunate to go with cron's limited syntax when we have such
> fully capable timestamp and interval capabilities already in the
> database. :/

That is another option; the MySQL event scheduler uses it. Its usage is trivial, but it is a little weak: it is hard to describe some asymmetric events, such as running on working days only. If I use named parameters and an auxiliary constructor function, though, I think that can be supported too.

make_scheduled_time(at => '2015-014-05 10:00:0', repeat => '1day', stop_after => '...')
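A timestamp-plus-interval schedule of this shape is easy to enumerate, as the sketch below suggests. It assumes stop_after is itself a timestamp and is inclusive, which the message does not say, and occurrences is a hypothetical name. With no repeat interval it degenerates to an "at"-style one-shot.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional


def occurrences(at: datetime,
                repeat: Optional[timedelta] = None,
                stop_after: Optional[datetime] = None) -> Iterator[datetime]:
    """Yield the run times of an at/repeat/stop_after schedule in order."""
    if repeat is not None and repeat <= timedelta(0):
        raise ValueError("repeat must be a positive interval")
    current = at
    while stop_after is None or current <= stop_after:
        yield current
        if repeat is None:
            return  # one-shot, like "at"
        current += repeat
```

Note that without stop_after a repeating schedule is an infinite generator, so callers should take only as many values as they need.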

> Is there anything worth stealing from pgAgent?

Surely not, although I have somewhat different goals. pgAgent is a top-end scheduler, and a little complex because it supports jobs and steps. My target is a low-end scheduler; pgAgent and others can be implemented as a layer on top of it. It should be strong enough for simple admin tasks and strong enough to serve as a base for complex scheduler and workflow systems, but it should be as simple as possible. At this moment PL/pgSQL is strong enough to implement a very complex workflow system, but the low-end scheduling functionality is missing.

> > I though about it too - but the parser for this cron time will be longer
> > than all other code probably. I see a possibility to write constructors
> > that simplify creating a value of this type. Some like
> >
> > make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
> > make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);
>
> Wouldn't that be just as bad as writing the parser in the first place?

Yes. I am thinking about a special type whose input function is empty, so that a value has to be created with a constructor function. That can simplify the parser a lot.

> > 1. don't hold a states, results of commands
> > ...
> > 3. When command fails, it writes info to log only
>
> Unfortunate, but understandable in a first pass.
>
> > 4. When command runs too long (over specified timeout), it is killed.
>
> I think that needs to be optional.

You can specify a timeout for any command, so if you specify timeout 0, it will run without a timeout.

> > 5. When command waits to free worker, write to log
> > 6. When command was not be executed due missing workers (and
> > max_workers > 0), write to log
>
> Also unfortunate. We already don't provide enough monitoring capability
> and this just makes that worse.

Theoretically some pg_stat_ view could support this, but I would prefer not to implement a history table for commands. Again, that is a task for higher layers.

> Perhaps it would be better to put something into PGXN first; this
> doesn't really feel like it's baked enough for contrib yet. (And I say
> that as someone who's really wanted this ability in the past...)

That is plan B. I think PostgreSQL is missing a low-end scheduler, so I am asking here. Some features can be implemented later, and some can be implemented elsewhere. I have to specify the limits and borders: what a simple scheduler is, and what it is not. A full-functionality scheduler is a relatively heavy application, so it should not be a contrib module. But a simple generic scheduler can be good enough for 50% of cases and, with some simple PL/pgSQL code, for another 40%.

Regards

Pavel


Re: proposal: contrib module - generic command scheduler

From

Jim Nasby

Date:

14 May 2015, 06:01:33

On 5/13/15 1:32 AM, Pavel Stehule wrote:
>         5. When command waits to free worker, write to log
>         6. When command was not be executed due missing workers (and
>         max_workers
>           > 0), write to log
>
>
>     Also unfortunate. We already don't provide enough monitoring
>     capability and this just makes that worse.
>
>
> theoretically it can be supported some pg_stat_ view - but I would not
> to implement a some history table for commands. Again it is task for
> higher layers.

I don't think we want to log statements, but we should be able to log 
when a job has run and whether it succeeded or not. (log in a table, not 
just a logfile).

This isn't something that can be done at higher layers either; only the 
scheduler will know if the job failed to even start, or whether it tried 
to run the job.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

14 May 2015, 06:37:14

2015-05-14 8:01 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> I don't think we want to log statements, but we should be able to log
> when a job has run and whether it succeeded or not. (log in a table, not
> just a logfile).
>
> This isn't something that can be done at higher layers either; only the
> scheduler will know if the job failed to even start, or whether it tried
> to run the job.

I don't agree. A generic scheduler can run your procedure, and there you can log the start, run other commands, and log the result (it is no longer a problem to catch any nonfatal exception in production). Personally, I am afraid of the responsibility of maintaining this log table: when and by whom it should be cleaned, who can see the results, and so on. This is a job for a top-end scheduler.
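The wrapper idea, where the scheduled procedure itself records its start and outcome, can be sketched as follows. This is an illustration in Python, not PL/pgSQL: the list stands in for a user-maintained log table, and run_logged is a hypothetical name. It can only record runs that actually started.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def run_logged(name: str, job: Callable[[], Any],
               log: List[Dict[str, Any]]) -> bool:
    """Run `job`, appending one log entry with its start, end, and outcome."""
    entry: Dict[str, Any] = {
        "job": name,
        "started": datetime.now(timezone.utc),
        "finished": None,
        "status": "running",
        "error": None,
    }
    log.append(entry)  # the start is recorded before the job runs
    try:
        job()
    except Exception as exc:  # nonfatal errors are caught and recorded
        entry["status"] = "failed"
        entry["error"] = str(exc)
    else:
        entry["status"] = "ok"
    entry["finished"] = datetime.now(timezone.utc)
    return entry["status"] == "ok"
```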

Regards

Pavel


Re: proposal: contrib module - generic command scheduler

From

Jim Nasby

Date:

14 May 2015, 17:13:04

On 5/14/15 1:36 AM, Pavel Stehule wrote:
>     I don't think we want to log statements, but we should be able to
>     log when a job has run and whether it succeeded or not. (log in a
>     table, not just a logfile).
>
>     This isn't something that can be done at higher layers either; only
>     the scheduler will know if the job failed to even start, or whether
>     it tried to run the job.
>
>
> I don't agree - generic scheduler can run your procedure, and there you
> can log start, you can run other commands and you can log result (now
> there is no problem to catch any production nonfatal exception).

And what happens when the job fails to even start? You get no logging.

> Personally I afraid about responsibility to maintain this log table -
> when and by who it should be cleaned, who can see results, ... This is
> job for top end scheduler.

Only if the top-end scheduler has callbacks for every time the bottom-end 
scheduler tries to start a job. Otherwise, the top has no clue what the 
bottom has actually attempted.
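One shape such callbacks could take is sketched below: the low-end scheduler reports every start attempt, including those that fail for lack of a worker, to hooks registered by a higher layer. The class, method names, and outcome strings are all hypothetical; nothing like this is specified in the proposal.

```python
from typing import Callable, List

# A hook receives the job name and the outcome of the start attempt.
AttemptHook = Callable[[str, str], None]


class LowEndScheduler:
    """Toy scheduler that only tracks free workers and notifies hooks."""

    def __init__(self, total_workers: int) -> None:
        self.free_workers = total_workers
        self.hooks: List[AttemptHook] = []

    def add_hook(self, hook: AttemptHook) -> None:
        self.hooks.append(hook)

    def try_start(self, job_name: str) -> bool:
        """Attempt to start a job; hooks are called whether or not it starts."""
        if self.free_workers > 0:
            self.free_workers -= 1
            outcome = "started"
        else:
            outcome = "no_worker"
        for hook in self.hooks:
            hook(job_name, outcome)
        return outcome == "started"
```

A top-end scheduler could register a hook that writes each (job, outcome) pair to its own history table, which keeps that table out of the low-end module.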

To be clear, I don't think these need to be done in a first pass. I am 
concerned about not painting ourselves into a corner though.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

15 May 2015, 05:18:48

2015-05-14 19:12 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> On 5/14/15 1:36 AM, Pavel Stehule wrote:
> >     I don't think we want to log statements, but we should be able to
> >     log when a job has run and whether it succeeded or not. (log in a
> >     table, not just a logfile).
> >
> >     This isn't something that can be done at higher layers either; only
> >     the scheduler will know if the job failed to even start, or whether
> >     it tried to run the job.
> >
> > I don't agree - a generic scheduler can run your procedure, and there you
> > can log the start, run other commands, and log the result (there is now
> > no problem catching any nonfatal exception in production).
>
> And what happens when the job fails to even start? You get no logging.

There is only one such case - when the job is not started because no worker is available. Otherwise the top-end executor is started, and it can run in a protected block.

> > Personally, I am worried about the responsibility of maintaining this log
> > table - when and by whom it should be cleaned, who can see the results, ...
> > This is a job for the top-end scheduler.
>
> Only if the top-end scheduler has callbacks for every time the bottom-end scheduler tries to start a job. Otherwise, the top has no clue what the bottom has actually attempted.

Sure.

> To be clear, I don't think these need to be done in a first pass. I am concerned about not painting ourselves into a corner though.

I understand.

Regards

Pavel

Re: proposal: contrib module - generic command scheduler

From

Alvaro Herrera

Date:

20 August 2015, 14:42:28

Pavel Stehule wrote:

Hi,

> Job schedulers are important and sometimes very complex part of any
> software. PostgreSQL miss it. I propose new contrib module, that can be
> used simply for some tasks, and that can be used as base for other more
> richer schedulers. I prefer minimalist design - but strong enough for
> enhancing when it is necessary. Some complex logic can be implemented in PL
> better than in C. Motto: Simply to learn, simply to use, simply to
> customize.

Have you made any progress on this?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: proposal: contrib module - generic command scheduler

From

Pavel Stehule

Date:

21 August 2015, 05:01:14

2015-08-20 16:42 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Pavel Stehule wrote:

Hi,

> Job schedulers are important and sometimes very complex part of any
> software. PostgreSQL miss it. I propose new contrib module, that can be
> used simply for some tasks, and that can be used as base for other more
> richer schedulers. I prefer minimalist design - but strong enough for
> enhancing when it is necessary. Some complex logic can be implemented in PL
> better than in C. Motto: Simply to learn, simply to use, simply to
> customize.

Have you made any progress on this?

I am working on a second-iteration prototype - or rather, I was working on it a month ago. I finished the basic design and the basic infrastructure. I designed an architecture based on one coordinator and dynamically started workers.

I found that some fairness policy should probably be implemented in the future.

I have to finish other requests now, so I am planning to continue at the end of autumn, but the sources are public:

https://github.com/okbob/generic-scheduler. I am not sure about the code quality - I have not had time to debug it. But the mental model (UI) is almost complete - https://github.com/okbob/generic-scheduler/blob/master/schedulerx--1.0.sql

I find it an interesting idea to handle not only time events but our notifications too. It could be a perfect base for building complex workflow systems. But I have done zero work on this topic.

Regards

Pavel
