Thread: elegant and effective way for running jobs inside a database

elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
Dear Developers,
   I am looking for elegant and effective way for running jobs inside a
database or cluster - for now I can not find that solution.
OK if You say "use cron" or "pgAgent" I say I know that solutions, but
the are not effective and elegant. Compilation of "pgAgent" is a pain
(especially wxWidgets usage on system with no X) - it can run jobs with
minimal 60s periods but what when someone needs run it faster for eg.
with 5s period ? Of course using cron I can do that but it is not
effective and elegant solution. Why PostgreSQL can not have so elegant
solution like Oracle database ? I am working with Oracle databases for
many years, but I like much more PostgreSQL project but this one
thing... I can not understand - the lack of jobs inside the database...

Best regards,
Artur

Attachment

Re: elegant and effective way for running jobs inside a database

From
Fabrízio de Royes Mello
Date:

2012/3/5 Artur Litwinowicz <admin@ybka.com>
Dear Developers,
  I am looking for elegant and effective way for running jobs inside a
database or cluster - for now I can not find that solution.
OK if You say "use cron" or "pgAgent" I say I know that solutions, but
the are not effective and elegant. Compilation of "pgAgent" is a pain
(especially wxWidgets usage on system with no X) - it can run jobs with
minimal 60s periods but what when someone needs run it faster for eg.
with 5s period ? Of course using cron I can do that but it is not
effective and elegant solution. Why PostgreSQL can not have so elegant
solution like Oracle database ? I am working with Oracle databases for
many years, but I like much more PostgreSQL project but this one
thing... I can not understand - the lack of jobs inside the database...


IMHO it is not necessary add this feature to the PostgreSQL core, because the OS already has the capability to schedule and maintain the tasks.

Best regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com

Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Artur Litwinowicz's message of lun mar 05 16:18:56 -0300 2012:
> Dear Developers,
>    I am looking for elegant and effective way for running jobs inside a
> database or cluster - for now I can not find that solution.

Yeah, it'd be good to have something.  Many people say it's not
necessary, and probably some hackers would oppose it; but mainly I think
we just haven't agreed (or even discussed) what the design of such a
scheduler would look like.  For example, do we want it to be able to
just connect and run queries and stuff, or do we want something more
elaborate able to start programs such as running pg_dump?  What if the
program crashes -- should it cause the server to restart?  And so on.
It's not a trivial problem.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Pavel Stehule
Date:
Hello

2012/3/5 Alvaro Herrera <alvherre@commandprompt.com>:
>
> Excerpts from Artur Litwinowicz's message of lun mar 05 16:18:56 -0300 2012:
>> Dear Developers,
>>    I am looking for elegant and effective way for running jobs inside a
>> database or cluster - for now I can not find that solution.
>
> Yeah, it'd be good to have something.  Many people say it's not
> necessary, and probably some hackers would oppose it; but mainly I think
> we just haven't agreed (or even discussed) what the design of such a
> scheduler would look like.  For example, do we want it to be able to
> just connect and run queries and stuff, or do we want something more
> elaborate able to start programs such as running pg_dump?  What if the
> program crashes -- should it cause the server to restart?  And so on.
> It's not a trivial problem.
>

I agree - it is not simple

* workflow support
* dependency support

a general ACID scheduler can be nice (in pg) but it is not really
simple. There was some proposal about using autovacuum demon like
scheduler.

Pavel

> --
> Álvaro Herrera <alvherre@commandprompt.com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
W dniu 2012-03-05 20:56, Alvaro Herrera pisze:
>
> Excerpts from Artur Litwinowicz's message of lun mar 05 16:18:56 -0300 2012:
>> Dear Developers,
>>    I am looking for elegant and effective way for running jobs inside a
>> database or cluster - for now I can not find that solution.
>
> Yeah, it'd be good to have something.  Many people say it's not
> necessary, and probably some hackers would oppose it; but mainly I think
> we just haven't agreed (or even discussed) what the design of such a
> scheduler would look like.  For example, do we want it to be able to
> just connect and run queries and stuff, or do we want something more
> elaborate able to start programs such as running pg_dump?  What if the
> program crashes -- should it cause the server to restart?  And so on.
> It's not a trivial problem.
>

Yes, yes it is not a trivial problem... - tools like "pgAgent" are good
when someone starts play with PostgreSQL - but this great environment
(only one serious against something like Oracle or DB2) needs something
professional, production ready. It can not happen when we are upgrading
database or OS and can not compile "pgAgent" because of "strange"
dependences... and for example whole sofisticated solution like web
application with complicated data flow has a problem... For example I am
using stored functions developed in Lua language, which are writing and
reading data to and from Redis server with a periods less then one
minute. Without "heart beat" like precise job manager it can not works
as professional as it can. Every one can use CRON or something like that
- yes it works but PostgreSQL has so many features and something like
job manager is inalienable in mine mind.

Best regards,
Artur


Attachment

Re: elegant and effective way for running jobs inside a database

From
Pavel Stehule
Date:
2012/3/5 Artur Litwinowicz <admin@ybka.com>:
> W dniu 2012-03-05 20:56, Alvaro Herrera pisze:
>>
>> Excerpts from Artur Litwinowicz's message of lun mar 05 16:18:56 -0300 2012:
>>> Dear Developers,
>>>    I am looking for elegant and effective way for running jobs inside a
>>> database or cluster - for now I can not find that solution.
>>
>> Yeah, it'd be good to have something.  Many people say it's not
>> necessary, and probably some hackers would oppose it; but mainly I think
>> we just haven't agreed (or even discussed) what the design of such a
>> scheduler would look like.  For example, do we want it to be able to
>> just connect and run queries and stuff, or do we want something more
>> elaborate able to start programs such as running pg_dump?  What if the
>> program crashes -- should it cause the server to restart?  And so on.
>> It's not a trivial problem.
>>
>
> Yes, yes it is not a trivial problem... - tools like "pgAgent" are good
> when someone starts play with PostgreSQL - but this great environment
> (only one serious against something like Oracle or DB2) needs something
> professional, production ready. It can not happen when we are upgrading
> database or OS and can not compile "pgAgent" because of "strange"
> dependences... and for example whole sofisticated solution like web
> application with complicated data flow has a problem... For example I am
> using stored functions developed in Lua language, which are writing and
> reading data to and from Redis server with a periods less then one
> minute. Without "heart beat" like precise job manager it can not works
> as professional as it can. Every one can use CRON or something like that
> - yes it works but PostgreSQL has so many features and something like
> job manager is inalienable in mine mind.

Long time a strategy for PostgreSQL was a minimal core and extensible
modules without duplication some system services. This strategy is
valid still but some services are in core - example should be
replication.

Some proposals about custom scheduler exists
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01701.php and
it is part of ToDo - so this feature should be in core (in next 2-4
years).

Why this is not in core? Nobody wrote it :).

Regards

Pavel Stehule

>
> Best regards,
> Artur
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
W dniu 2012-03-05 22:09, Pavel Stehule pisze:
> 2012/3/5 Artur Litwinowicz <admin@ybka.com>:
>> W dniu 2012-03-05 20:56, Alvaro Herrera pisze:
>>>
>>> Excerpts from Artur Litwinowicz's message of lun mar 05 16:18:56 -0300 2012:
>>>> Dear Developers,
>>>>    I am looking for elegant and effective way for running jobs inside a
>>>> database or cluster - for now I can not find that solution.
>>>
>>> Yeah, it'd be good to have something.  Many people say it's not
>>> necessary, and probably some hackers would oppose it; but mainly I think
>>> we just haven't agreed (or even discussed) what the design of such a
>>> scheduler would look like.  For example, do we want it to be able to
>>> just connect and run queries and stuff, or do we want something more
>>> elaborate able to start programs such as running pg_dump?  What if the
>>> program crashes -- should it cause the server to restart?  And so on.
>>> It's not a trivial problem.
>>>
>>
>> Yes, yes it is not a trivial problem... - tools like "pgAgent" are good
>> when someone starts play with PostgreSQL - but this great environment
>> (only one serious against something like Oracle or DB2) needs something
>> professional, production ready. It can not happen when we are upgrading
>> database or OS and can not compile "pgAgent" because of "strange"
>> dependences... and for example whole sofisticated solution like web
>> application with complicated data flow has a problem... For example I am
>> using stored functions developed in Lua language, which are writing and
>> reading data to and from Redis server with a periods less then one
>> minute. Without "heart beat" like precise job manager it can not works
>> as professional as it can. Every one can use CRON or something like that
>> - yes it works but PostgreSQL has so many features and something like
>> job manager is inalienable in mine mind.
>
> Long time a strategy for PostgreSQL was a minimal core and extensible
> modules without duplication some system services. This strategy is
> valid still but some services are in core - example should be
> replication.
>
> Some proposals about custom scheduler exists
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01701.php and
> it is part of ToDo - so this feature should be in core (in next 2-4
> years).
>
> Why this is not in core? Nobody wrote it :).
>
> Regards
>
> Pavel Stehule
>
>>
>> Best regards,
>> Artur
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>

Ouch... "in next 2-4 years" - it broke my heart like a bullet - You
should not write it... ;)
I feel that I need to set aside SQL, Python, PHP and so on and take to
my hands old book about C programming language from university ;)
I hope my words are like drops of water for this idea and in the future
some people will be happy to use professional job manager :)

Best regards,
Artur

Attachment

Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Artur Litwinowicz's message of lun mar 05 18:32:44 -0300 2012:

> Ouch... "in next 2-4 years" - it broke my heart like a bullet - You
> should not write it... ;)
> I feel that I need to set aside SQL, Python, PHP and so on and take to
> my hands old book about C programming language from university ;)
> I hope my words are like drops of water for this idea and in the future
> some people will be happy to use professional job manager :)

Keep in mind that it's not about coding in C but mostly about figuring
out what a sane design out to look like.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
W dniu 2012-03-05 22:44, Alvaro Herrera pisze:
>
> Excerpts from Artur Litwinowicz's message of lun mar 05 18:32:44 -0300 2012:
>
>> Ouch... "in next 2-4 years" - it broke my heart like a bullet - You
>> should not write it... ;)
>> I feel that I need to set aside SQL, Python, PHP and so on and take to
>> my hands old book about C programming language from university ;)
>> I hope my words are like drops of water for this idea and in the future
>> some people will be happy to use professional job manager :)
>
> Keep in mind that it's not about coding in C but mostly about figuring
> out what a sane design out to look like.
>

I understand it... (I meant if you wanna something... do it for your
self - it is the fastest way).
Regarding a functional area I can help... but I can not understand why
this idea is so unappreciated?
It will be so powerfull feature - I am working with systems made for
goverment (Orcale) - jobs are the core gears for data flow between many
systems and other goverment bureaus.

Best regards,
Artur

Attachment

Re: elegant and effective way for running jobs inside a database

From
Jaime Casanova
Date:
On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>
> I understand it... (I meant if you wanna something... do it for your
> self - it is the fastest way).

other way is to fund the work so someone can use his/her time to do it

> Regarding a functional area I can help... but I can not understand why
> this idea is so unappreciated?

is not unappreciated, is just a problem that already *has* a solution
if it were something that currently you can't do it then there would
be more people after it

> It will be so powerfull feature - I am working with systems made for
> goverment (Orcale) - jobs are the core gears for data flow between many
> systems and other goverment bureaus.
>

me too, and we solve it with cron

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


Re: elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
W dniu 2012-03-05 23:09, Jaime Casanova pisze:
> On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>>
>> I understand it... (I meant if you wanna something... do it for your
>> self - it is the fastest way).
>
> other way is to fund the work so someone can use his/her time to do it
>
>> Regarding a functional area I can help... but I can not understand why
>> this idea is so unappreciated?
>
> is not unappreciated, is just a problem that already *has* a solution
> if it were something that currently you can't do it then there would
> be more people after it
>
>> It will be so powerfull feature - I am working with systems made for
>> goverment (Orcale) - jobs are the core gears for data flow between many
>> systems and other goverment bureaus.
>>
>
> me too, and we solve it with cron
>

And You can modulate the jobs frequency, stop them and start from inside
the database automatically using only algorithms and interenal events
without administrator hand work... with cron... I can not belive... I do
not meant just simple: run stored procedure... I am using cron as well,
but in my work I like elegant, complex solutions - many "lego" blocks is
not always the best and simplest solution...

Attachment

Re: elegant and effective way for running jobs inside a database

From
Christopher Browne
Date:
On Mon, Mar 5, 2012 at 4:44 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
> Excerpts from Artur Litwinowicz's message of lun mar 05 18:32:44 -0300 2012:
>
>> Ouch... "in next 2-4 years" - it broke my heart like a bullet - You
>> should not write it... ;)
>> I feel that I need to set aside SQL, Python, PHP and so on and take to
>> my hands old book about C programming language from university ;)
>> I hope my words are like drops of water for this idea and in the future
>> some people will be happy to use professional job manager :)
>
> Keep in mind that it's not about coding in C but mostly about figuring
> out what a sane design out to look like.

Just so.

And it seems to me that the Right Thing here is to go down the road to
having the fabled Stored Procedure Language, which is *not* pl/pgsql,
in that iIt needs to run *outside* transactional context.  It needs to
be able to start transactions, not to run inside them.

Given a language which can do some setup of transactions and then run
them, this could be readily used for a number of useful purposes, of
which a job scheduler would be just a single example.

It would enable turning some backend processes from hand-coded C into
possibly more dynamically-flexible scripted structures.

I'd expect this to be useful for having more customizable/dynamic
policies for the autovacuum process, for instance.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: elegant and effective way for running jobs inside a database

From
Daniel Farina
Date:
On Mon, Mar 5, 2012 at 12:17 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> 2012/3/5 Alvaro Herrera <alvherre@commandprompt.com>:
>>
>> Excerpts from Artur Litwinowicz's message of lun mar 05 16:18:56 -0300 2012:
>>> Dear Developers,
>>>    I am looking for elegant and effective way for running jobs inside a
>>> database or cluster - for now I can not find that solution.
>>
>> Yeah, it'd be good to have something.  Many people say it's not
>> necessary, and probably some hackers would oppose it; but mainly I think
>> we just haven't agreed (or even discussed) what the design of such a
>> scheduler would look like.  For example, do we want it to be able to
>> just connect and run queries and stuff, or do we want something more
>> elaborate able to start programs such as running pg_dump?  What if the
>> program crashes -- should it cause the server to restart?  And so on.
>> It's not a trivial problem.
>>
>
> I agree - it is not simple
>
> * workflow support
> * dependency support
>
> a general ACID scheduler can be nice (in pg) but it is not really
> simple. There was some proposal about using autovacuum demon like
> scheduler.

I've been thinking about making autovacuum a special case of a general
*non*-transactional job-running system because dealing with large
physical changes to a database (where one wants to rewrite 300GB of
data, or whatever) that are prohibitive in a transaction are -- to
understate things -- incredibly painful.  Painful enough that people
will risk taking their site down with a large UPDATE or ALTER TABLE,
hoping that they can survive the duration (and then when they cancel
it and are left with huge volumes of dead tuples, things get a lot
more ugly).

The closest approximation a client program can make is "well, I guess
I'll paginate through the database and rewrite small chunks". Instead,
it may make more sense to have the database spoon-feed work to do the
transformations little-at-a-time ala autovacuum.

--
fdr


Re: elegant and effective way for running jobs inside a database

From
"David Johnston"
Date:
> >
> > Keep in mind that it's not about coding in C but mostly about figuring
> > out what a sane design out to look like.
> >
>

While I can straddle the fence pretty my first reaction is that we are talking about "application" functionality that
fallsoutside what belongs in "core" PostgreSQL.  I'd rather see pgAgent be improved and act as a basic implementation
while,for more complex use-cases, letting the community/marketplace provide solutions. 

Even with simple use-cases you end up having a separate process continually running anyway.  The main benefit to
linkingwith core would be the ability to startup that process after the server starts and shutdown the process before
theserver shutdown.  That communication channel is something to consider outside this specific application and, if
done,could be used to talk with whatever designated "pgAgent"-like application the user chooses.  Other applications
couldalso be communicated with in this way.  Basically some form of API where in the postgres.conf file you specify
whichIP addresses and ports you wish to synchronize and which executable to launch just prior to communicating on said
port. If the startup routine succeeds that Postgres will, within reason, attempt to communicate and wait for these
externalprocess to finish before shutting down.  If the external application closes it should proactively notify
Postgresthat it is doing so AND if you startup a program manually it can look for and talk with a running Postgres
instance.

David J.





Re: elegant and effective way for running jobs inside a database

From
Robert Haas
Date:
On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz <admin@ybka.com> wrote:
> Regarding a functional area I can help... but I can not understand why
> this idea is so unappreciated?

I think it's a bit unfair to say that this idea is unappreciated.
There are LOTS of good features that we don't have yet simply because
nobody's had time to implement them.  There are many things I'd really
like to have that I have spent no time at all on as yet, just because
there are other things that I (or my employer) would like to have even
more.  The good news is that this is an open-source project and there
is always room at the table for more people who would like to
contribute (or fund others so that they can contribute).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: elegant and effective way for running jobs inside a database

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>> Regarding a functional area I can help... but I can not understand why
>> this idea is so unappreciated?

> I think it's a bit unfair to say that this idea is unappreciated.

Well, there is the question of why we should re-invent the cron wheel.

> There are LOTS of good features that we don't have yet simply because
> nobody's had time to implement them.

Implementation work is only part of it.  Any large feature will create
an ongoing, distributed maintenance overhead.  It seems entirely
possible to me that we'd not accept such a feature even if someone
dropped a working implementation on us.

But having said that, it's not apparent to me why such a thing would
need to live "inside the database" at all.  It's very easy to visualize
a task scheduler that runs as a client and requires nothing new from the
core code.  Approaching the problem that way would let the scheduler
be an independent project that stands or falls on its own merits.
        regards, tom lane


Re: elegant and effective way for running jobs inside a database

From
Pavel Stehule
Date:
2012/3/6 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>>> Regarding a functional area I can help... but I can not understand why
>>> this idea is so unappreciated?
>
>> I think it's a bit unfair to say that this idea is unappreciated.
>
> Well, there is the question of why we should re-invent the cron wheel.
>
>> There are LOTS of good features that we don't have yet simply because
>> nobody's had time to implement them.
>
> Implementation work is only part of it.  Any large feature will create
> an ongoing, distributed maintenance overhead.  It seems entirely
> possible to me that we'd not accept such a feature even if someone
> dropped a working implementation on us.
>
> But having said that, it's not apparent to me why such a thing would
> need to live "inside the database" at all.  It's very easy to visualize
> a task scheduler that runs as a client and requires nothing new from the
> core code.  Approaching the problem that way would let the scheduler
> be an independent project that stands or falls on its own merits.

There are a few arguments for scheduler in core

* platform independence
* possible richer workflow and loging possibilities or as minimum -
better integration with SP
* when application has lot of business logic in stored procedures,
then outer scheduler is little bit foreign element - harder
maintaining, harder configuration
* when somebody would to implement some like materialised views, then
have to have use outer schedule for very simple task - just exec SP
every 5 minutes

so I think there are reason why we can have a scheduler on core -
simple or richer, but it can helps. cron and similar works, but
maintaining of external scheduler is more expensive then using some
simple scheduler in core.

Regards

Pavel

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: elegant and effective way for running jobs inside a database

From
Robert Haas
Date:
On Tue, Mar 6, 2012 at 10:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>>> Regarding a functional area I can help... but I can not understand why
>>> this idea is so unappreciated?
>
>> I think it's a bit unfair to say that this idea is unappreciated.
>
> Well, there is the question of why we should re-invent the cron wheel.
>
>> There are LOTS of good features that we don't have yet simply because
>> nobody's had time to implement them.
>
> Implementation work is only part of it.  Any large feature will create
> an ongoing, distributed maintenance overhead.  It seems entirely
> possible to me that we'd not accept such a feature even if someone
> dropped a working implementation on us.
>
> But having said that, it's not apparent to me why such a thing would
> need to live "inside the database" at all.  It's very easy to visualize
> a task scheduler that runs as a client and requires nothing new from the
> core code.  Approaching the problem that way would let the scheduler
> be an independent project that stands or falls on its own merits.

I was trying to make a general comment about PostgreSQL development,
without diving too far into the merits or demerits of this particular
feature.  I suspect you'd agree with me that, in general, a lot of
valuable things don't get done because there aren't enough people or
enough hours in the day, and we can always use more contributors.

But since you brought it up, I think there is a lot of value to having
a scheduler that's integrated with the database.  There are many
things that the database does which could also be done outside the
database, but people want them in the database because it's easier
that way.  If you have a web application that talks to the database,
and which sometimes needs to schedule tasks to run at a future time,
it is much nicer to do that by inserting a row into an SQL table
somewhere, or executing some bit of DDL, than it is to do it by making
your web application know how to connect to a PostgreSQL database and
also how to rewrite crontab (in a concurrency-safe manner, no less).

Now, the extent to which such a schedule requires core support is
certainly arguable.  Maybe it doesn't, and can be an entirely
stand-alone project.  pgAgent aims to do something like this, but it
has a number of deficiencies, including a tendency to quit
unexpectedly and a very klunky interface.  Those are things that could
presumably fixed, or done differently in a new implementation, and
maybe that's all anyone needs.  Or maybe it's not.  But at any rate I
think the idea of a better job scheduler is a good one, and if anyone
is interested in working on that, I think we should encourage them to
do so, regardless of what happens vis-a-vis core.  This is a very
common need, and the current solutions are clearly more awkward than
our users would like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: elegant and effective way for running jobs inside a database

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 6, 2012 at 10:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But having said that, it's not apparent to me why such a thing would
>> need to live "inside the database" at all. �It's very easy to visualize
>> a task scheduler that runs as a client and requires nothing new from the
>> core code. �Approaching the problem that way would let the scheduler
>> be an independent project that stands or falls on its own merits.

> But since you brought it up, I think there is a lot of value to having
> a scheduler that's integrated with the database.  There are many
> things that the database does which could also be done outside the
> database, but people want them in the database because it's easier
> that way.  If you have a web application that talks to the database,
> and which sometimes needs to schedule tasks to run at a future time,
> it is much nicer to do that by inserting a row into an SQL table
> somewhere, or executing some bit of DDL, than it is to do it by making
> your web application know how to connect to a PostgreSQL database and
> also how to rewrite crontab (in a concurrency-safe manner, no less).

Sure, and I would expect that a client-side scheduler would work just
the same way: you make requests to it through database actions such
as inserting a row in a task table.

> Now, the extent to which such a schedule requires core support is
> certainly arguable.  Maybe it doesn't, and can be an entirely
> stand-alone project.  pgAgent aims to do something like this, but it
> has a number of deficiencies, including a tendency to quit
> unexpectedly and a very klunky interface.

Well, if they didn't get it right the first time, that suggests that
it's a harder problem than people would like to think.  All the more
reason to do it as an external project, at least to start with.
I would much rather entertain a proposal to integrate a design that's
been proven by an external implementation, than a proposal to implement
a design that's never been tested at all (which we'll nonetheless have
to support for eternity, even if it turns out to suck).
        regards, tom lane


Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of mar mar 06 12:47:46 -0300 2012:
> Robert Haas <robertmhaas@gmail.com> writes:

> > But since you brought it up, I think there is a lot of value to having
> > a scheduler that's integrated with the database.  There are many
> > things that the database does which could also be done outside the
> > database, but people want them in the database because it's easier
> > that way.  If you have a web application that talks to the database,
> > and which sometimes needs to schedule tasks to run at a future time,
> > it is much nicer to do that by inserting a row into an SQL table
> > somewhere, or executing some bit of DDL, than it is to do it by making
> > your web application know how to connect to a PostgreSQL database and
> > also how to rewrite crontab (in a concurrency-safe manner, no less).
>
> Sure, and I would expect that a client-side scheduler would work just
> the same way: you make requests to it through database actions such
> as inserting a row in a task table.

What such an external scheduler would need from core is support for
starting up and shutting down along postmaster (as well as restarts at
appropriate times).  Postmaster already has the ability to start and
shut down many processes depending on several different policies; I
think it's mostly a matter of exporting that funcionality in a sane way.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
With all the respect to all into this Community...
I have no experience enough rich with C or C++ to say yes I can do that
alone. I do not know the internals of PostgreSQL at all. But I have
quite long experience with other languages.
I imagine if you have a piece of code which can run function like
"Select function(123);" you can reuse it (with some modifications) to
run jobs saved in job manager tables in the same manner. All we need is
two "crazy" (some simplification) loops - one for job running and one
for control and logging purposes - all fast enought with period not
slower then 5s or faster.

Algorithm for first loop:
check jobs exists and is time to run it
   run job as other sql statements (some validity check may be done)
   get next job
no jobs - delay

second loop:
find started job
   check it is still working
      if error log it, calculate next start time
        (may be longer then at the first time)
        if configured and clean up
      if works fine log duration
      if just finished log it, calculate next run and clean up
   find next job
no jobs - delay

And it will be art of state if the job could return (but not have to)
next run time value (for log loop to save).
And it is quite all about I wanted to say - do not understand me bad (I
do not want to teach anyone or something like that ;) - I wanted just
explain what I meant.

Best regards,
Artur


Attachment

Re: elegant and effective way for running jobs inside a database

From
Christopher Browne
Date:
On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz <admin@ybka.com> wrote:
> Algorithm for first loop:
> check jobs exists and is time to run it
>   run job as other sql statements (some validity check may be done)
>   get next job
> no jobs - delay

There are crucial things missing here, namely the need to establish at
least one database connection in order to be able to check for the
existence of jobs, as well as to establish additional connections as
contexts in which to run jobs.

That implies the need for some New Stuff that isn't quite the same as
what we have within server processes today.

There is nothing horrible about this; just that there's some extra
mechanism that needs to come into existence in order to do this.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: elegant and effective way for running jobs inside a database

From
Robert Haas
Date:
On Tue, Mar 6, 2012 at 12:37 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>> Algorithm for first loop:
>> check jobs exists and is time to run it
>>   run job as other sql statements (some validity check may be done)
>>   get next job
>> no jobs - delay
>
> There are crucial things missing here, namely the need to establish at
> least one database connection in order to be able to check for the
> existence of jobs, as well as to establish additional connections as
> contexts in which to run jobs.
>
> That implies the need for some New Stuff that isn't quite the same as
> what we have within server processes today.
>
> There is nothing horrible about this; just that there's some extra
> mechanism that needs to come into existence in order to do this.

And also some interface.  It'd be useful to have background jobs that
executed either immediately or at a certain time or after a certain
delay, as well as repeating jobs that execute at a certain interval or
on a certain schedule.  Figuring out what all that should look like
is, well, part of the work that someone has to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: elegant and effective way for running jobs inside a database

From
Pavel Stehule
Date:
2012/3/6 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Mar 6, 2012 at 12:37 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
>> On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz <admin@ybka.com> wrote:
>>> Algorithm for first loop:
>>> check jobs exists and is time to run it
>>>   run job as other sql statements (some validity check may be done)
>>>   get next job
>>> no jobs - delay
>>
>> There are crucial things missing here, namely the need to establish at
>> least one database connection in order to be able to check for the
>> existence of jobs, as well as to establish additional connections as
>> contexts in which to run jobs.
>>
>> That implies the need for some New Stuff that isn't quite the same as
>> what we have within server processes today.
>>
>> There is nothing horrible about this; just that there's some extra
>> mechanism that needs to come into existence in order to do this.
>
> And also some interface.  It'd be useful to have background jobs that
> executed either immediately or at a certain time or after a certain
> delay, as well as repeating jobs that execute at a certain interval or
> on a certain schedule.  Figuring out what all that should look like
> is, well, part of the work that someone has to do.

+1

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Pavel Stehule's message of mar mar 06 14:57:30 -0300 2012:
> 2012/3/6 Robert Haas <robertmhaas@gmail.com>:
> > On Tue, Mar 6, 2012 at 12:37 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> >> On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz <admin@ybka.com> wrote:
> >>> Algorithm for first loop:
> >>> check jobs exists and is time to run it
> >>>   run job as other sql statements (some validity check may be done)
> >>>   get next job
> >>> no jobs - delay
> >>
> >> There are crucial things missing here, namely the need to establish at
> >> least one database connection in order to be able to check for the
> >> existence of jobs, as well as to establish additional connections as
> >> contexts in which to run jobs.
> >>
> >> That implies the need for some New Stuff that isn't quite the same as
> >> what we have within server processes today.
> >>
> >> There is nothing horrible about this; just that there's some extra
> >> mechanism that needs to come into existence in order to do this.
> >
> > And also some interface.  It'd be useful to have background jobs that
> > executed either immediately or at a certain time or after a certain
> > delay, as well as repeating jobs that execute at a certain interval or
> > on a certain schedule.  Figuring out what all that should look like
> > is, well, part of the work that someone has to do.
>
> +1

It seems to me that we could simply have some sort of external daemon
program running the schedule, i.e. starting up other programs or running
queries; that daemon would connect to the database somehow to fetch
tasks to run.  Separately a client program could be provided to program
tasks using a graphical interface, web, or whatever (more than one, if
we want to get fancy); this would also connect to the database and store
tasks to run by the daemon.  The client doesn't have to talk to the
daemon directly (we can simply have a trigger on the schedule table so
that the daemon receives a notify whenever the client changes stuff).

It seems to me that the only thing that needs core support is the
ability to start up the daemon when postmaster is ready to accept
queries, and shut the daemon down when postmaster kills backends (either
because one crashed, or because it's shutting down).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Merlin Moncure
Date:
On Tue, Mar 6, 2012 at 9:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But having said that, it's not apparent to me why such a thing would
>> need to live "inside the database" at all.  It's very easy to visualize
>> a task scheduler that runs as a client and requires nothing new from the
>> core code.  Approaching the problem that way would let the scheduler
>> be an independent project that stands or falls on its own merits.
>
> I was trying to make a general comment about PostgreSQL development,
> without diving too far into the merits or demerits of this particular
> feature.  I suspect you'd agree with me that, in general, a lot of
> valuable things don't get done because there aren't enough people or
> enough hours in the day, and we can always use more contributors.
>
> But since you brought it up, I think there is a lot of value to having
> a scheduler that's integrated with the database.  There are many
> things that the database does which could also be done outside the
> database, but people want them in the database because it's easier
> that way.  If you have a web application that talks to the database,
> and which sometimes needs to schedule tasks to run at a future time,
> it is much nicer to do that by inserting a row into an SQL table
> somewhere, or executing some bit of DDL, than it is to do it by making
> your web application know how to connect to a PostgreSQL database and
> also how to rewrite crontab (in a concurrency-safe manner, no less).

The counter argument to this is that there's nothing keeping you from
layering your own scheduling system on top of cron.  Cron provides the
heartbeat -- everything else you build out with tables implementing a
work queue or whatever else comes to mind.

The counter-counter argument is that cron has a couple of annoying
limitations -- sub minute scheduling is not possible, lousy windows
support, etc.  It's pretty appealing that you would be able to back up
your database and get all your scheduling configuration back up with
it.  Dealing with cron is a headache for database administrators.

Personally I find the C-unixy way of solving this problem inside
postgres not worth chasing -- that really does belong outside and you
really are rewriting cron.  A (mostly) sql driven scheduler would be
pretty neat though.

I agree with Chris B upthread: I find that what people really need
here is stored procedures, or some way of being able to embed code in
the database that can manage it's own transactions.  That way your
server-side entry, dostuff() called every minute doesn't have to exit
to avoid accumulating locks for everything it needs to do or be broken
up into multiple independent entry points in scripts outside the
database.  Without SP though, you can still do it via 100% sql/plpgsql
using listen/notify and dblink for the AT workaround, and at least one
dedicated task runner.  By 'it' I mean a server side scheduling system
relying on a heartbeat from out of the database code.

merlin


Re: elegant and effective way for running jobs inside a database

From
Christopher Browne
Date:
On Tue, Mar 6, 2012 at 12:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> And also some interface.  It'd be useful to have background jobs that
> executed either immediately or at a certain time or after a certain
> delay, as well as repeating jobs that execute at a certain interval or
> on a certain schedule.  Figuring out what all that should look like
> is, well, part of the work that someone has to do.

Certainly.  It would seem to make sense to have a database schema
indicating this kind of metadata.

It needs to be sophisticated enough to cover *enough* unusual cases.

A schema duplicating crontab might look something like:
create table cron ( id serial primary key, minutes integer[], hours text integer[], dayofmonth integer[], month
integer[],dayofweek integer[], command text 
);

That's probably a bit too minimalist, and that only properly supports
one user's crontab.

The schema needs to include things like:
a) When to perform actions.  Several bases for this, including
time-based, event-based.
b) What actions to perform (including context as to database user,
search_path, desired UNIX $PWD, perhaps more than that)
c) Sequencing information, including what jobs should NOT be run concurrently.
d) Logging.  If a job succeeds, that should be noted.  If it fails,
that should be noted.  Want to know start + end times.
e) What to do on failure.  "Everything blows up" is not a good answer :-).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: elegant and effective way for running jobs inside a database

From
Jaime Casanova
Date:
On Tue, Mar 6, 2012 at 1:14 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
> It seems to me that the only thing that needs core support is the
> ability to start up the daemon when postmaster is ready to accept
> queries, and shut the daemon down when postmaster kills backends (either
> because one crashed, or because it's shutting down).
>

+1

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


Re: elegant and effective way for running jobs inside a database

From
Josh Berkus
Date:
>> It seems to me that the only thing that needs core support is the
>> ability to start up the daemon when postmaster is ready to accept
>> queries, and shut the daemon down when postmaster kills backends (either
>> because one crashed, or because it's shutting down).

I think this could be addressed simply by the ability to call actions at
a predefined interval, i.e.:

CREATE RECURRING JOB {job_name}
FOR EACH {interval}
[ STARTING {timestamptz} ]
[ ENDING {timestamptz} ]
EXECUTE PROCEDURE {procedure name}

CREATE RECURRING JOB {job_name}
FOR EACH {interval}
[ STARTING {timestamptz} ]
[ ENDING {timestamptz} ]
EXECUTE STATEMENT 'some statement'

(obviously, we'd want to adjust the above to use existing reserved
words, but you get the idea)

Activity and discretion beyond that could be defined in PL code,
including run/don't run conditions, activities, and dependancies.  The
only thing Postgres doesn't currently have is a clock which fires
events.  Anything we try to implement which is more complex than the
above is going to not work for someone.  And the pg_agent could be
adapted easily to use the Postgres clock instead of cron.

Oh, and the ability to run VACUUM inside a larger statement in some way.But that's a different TODO.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: elegant and effective way for running jobs inside a database

From
Dimitri Fontaine
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Activity and discretion beyond that could be defined in PL code,
> including run/don't run conditions, activities, and dependancies.  The
> only thing Postgres doesn't currently have is a clock which fires
> events.  Anything we try to implement which is more complex than the
> above is going to not work for someone.  And the pg_agent could be
> adapted easily to use the Postgres clock instead of cron.

Oh, you mean like a ticker?  If only we knew about a project that did
implement a ticker, in C, using the PostgreSQL licence, and who's using
it in large scale production.  While at it, if such a ticker could be
used to implement job queues…
 https://github.com/markokr/skytools/tree/master/sql/ticker

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: elegant and effective way for running jobs inside a database

From
Merlin Moncure
Date:
On Tue, Mar 6, 2012 at 3:44 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Activity and discretion beyond that could be defined in PL code,
>> including run/don't run conditions, activities, and dependancies.  The
>> only thing Postgres doesn't currently have is a clock which fires
>> events.  Anything we try to implement which is more complex than the
>> above is going to not work for someone.  And the pg_agent could be
>> adapted easily to use the Postgres clock instead of cron.
>
> Oh, you mean like a ticker?  If only we knew about a project that did
> implement a ticker, in C, using the PostgreSQL licence, and who's using
> it in large scale production.  While at it, if such a ticker could be
> used to implement job queues…
>
>  https://github.com/markokr/skytools/tree/master/sql/ticker

right -- exactly.  it would be pretty neat if the database exposed
this or a similar feature somehow -- perhaps by having the ticker send
a notify?  then a scheduler could sit on top of it without any
dependencies on the host operating system.

merlin


Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Dimitri Fontaine's message of mar mar 06 18:44:18 -0300 2012:
> Josh Berkus <josh@agliodbs.com> writes:
> > Activity and discretion beyond that could be defined in PL code,
> > including run/don't run conditions, activities, and dependancies.  The
> > only thing Postgres doesn't currently have is a clock which fires
> > events.  Anything we try to implement which is more complex than the
> > above is going to not work for someone.  And the pg_agent could be
> > adapted easily to use the Postgres clock instead of cron.
>
> Oh, you mean like a ticker?  If only we knew about a project that did
> implement a ticker, in C, using the PostgreSQL licence, and who's using
> it in large scale production.  While at it, if such a ticker could be
> used to implement job queues…
>
>   https://github.com/markokr/skytools/tree/master/sql/ticker

Why do we need a ticker?  Just fetch the time of the task closest in the
future, and sleep till that time or a notify arrives (meaning schedule
change).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Merlin Moncure
Date:
On Tue, Mar 6, 2012 at 4:01 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Why do we need a ticker?  Just fetch the time of the task closest in the
> future, and sleep till that time or a notify arrives (meaning schedule
> change).

Because that can't be done in userland (at least, not without stored
procedures) since you'd have to keep an open running transaction while
sleeping.

merlin


Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Merlin Moncure's message of mar mar 06 19:07:51 -0300 2012:
>
> On Tue, Mar 6, 2012 at 4:01 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Why do we need a ticker?  Just fetch the time of the task closest in the
> > future, and sleep till that time or a notify arrives (meaning schedule
> > change).
>
> Because that can't be done in userland (at least, not without stored
> procedures) since you'd have to keep an open running transaction while
> sleeping.

I was thinking that the connection would be kept open but no query would
be running.  Does this preclude reception of notifies?  I mean, you
don't sleep via "SELECT pg_sleep()" but rather a select/poll in the
daemon.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I was thinking that the connection would be kept open but no query would
> be running.  Does this preclude reception of notifies?  I mean, you
> don't sleep via "SELECT pg_sleep()" but rather a select/poll in the
> daemon.

No.  If you're not inside a transaction, notifies will be sent
immediately.  They'd be pretty useless if they didn't work that way ---
the whole point is for clients not to have to busy-wait.
        regards, tom lane


Re: elegant and effective way for running jobs inside a database

From
Christopher Browne
Date:
On Tue, Mar 6, 2012 at 5:01 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Why do we need a ticker?  Just fetch the time of the task closest in the
> future, and sleep till that time or a notify arrives (meaning schedule
> change).

Keep in mind that cron functionality also includes "batch", which
means that the process needs to have the ability to be woken up by the
need to handle some pressing engagement that comes in suddenly.

For some events to be initiated by a NOTIFY received by a LISTENing
batch processor would be pretty slick...
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: elegant and effective way for running jobs inside a database

From
Andrew Dunstan
Date:

On 03/06/2012 06:12 PM, Christopher Browne wrote:
> On Tue, Mar 6, 2012 at 5:01 PM, Alvaro Herrera
> <alvherre@commandprompt.com>  wrote:
>> Why do we need a ticker?  Just fetch the time of the task closest in the
>> future, and sleep till that time or a notify arrives (meaning schedule
>> change).
> Keep in mind that cron functionality also includes "batch", which
> means that the process needs to have the ability to be woken up by the
> need to handle some pressing engagement that comes in suddenly.
>
> For some events to be initiated by a NOTIFY received by a LISTENing
> batch processor would be pretty slick...

We don't slavishly need to reproduce every piece of cron. In any case, 
on my Linux machine at least, batch is part of the "at" package, not the 
"cron" package. If you want anything at all done, then I'd suggest 
starting with a simple scheduler. Just about he quickest way to get 
something rejected in Postgres is to start with something overly complex 
and baroque.

cheers

andrew


Re: elegant and effective way for running jobs inside a database

From
Daniel Farina
Date:
On Tue, Mar 6, 2012 at 3:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote:>
> We don't slavishly need to reproduce every piece of cron. In any case, on my
> Linux machine at least, batch is part of the "at" package, not the "cron"
> package. If you want anything at all done, then I'd suggest starting with a
> simple scheduler. Just about he quickest way to get something rejected in
> Postgres is to start with something overly complex and baroque.

I sort agree with this, I think.  However, I don't see the need to
have Postgres get involved with scheduling and triggering of jobs at
all.  Rather, it just doesn't have support for what I'd think of as a
"job", period, regardless of how it gets triggered.

The crux of the issue for me is that sometimes it's pretty annoying to
have to maintain a socket connection just to get some things to run
for a while: I can't tell the database "execute stored procedure (not
UDF) 'job' in a new backend, I'm going to disconnect now".

Nearly relatedly, I've heard from at least two people in immediate
memory that would like database sessions to be reified somehow from
their socket, so that they could resume work in a session that had a
connection blip.

At the same time, it would really suck to have an "idle in
transaction" because a client died and didn't bother to reconnect and
clean up...a caveat.

Nevertheless, I think session support (think "GNU screen" or "tmux")
is both useful and painful to accomplish without backend support (for
example, the BackendKey might be useful).  And stored procedures are a
familiar quantity at large...

Thoughts?

-- 
fdr


Re: elegant and effective way for running jobs inside a database

From
Simon Riggs
Date:
On Tue, Mar 6, 2012 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> But having said that, it's not apparent to me why such a thing would
> need to live "inside the database" at all.  It's very easy to visualize
> a task scheduler that runs as a client and requires nothing new from the
> core code.  Approaching the problem that way would let the scheduler
> be an independent project that stands or falls on its own merits.

On Tue, Mar 6, 2012 at 4:36 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

> What such an external scheduler would need from core is support for
> starting up and shutting down along postmaster (as well as restarts at
> appropriate times).  Postmaster already has the ability to start and
> shut down many processes depending on several different policies; I
> think it's mostly a matter of exporting that functionality in a sane way.

Tom's question is exactly on the money, and so is Alvaro's answer.

Many, many people have requested code that "runs in core", but the key
point is that all they actually want are the core features required to
build one. The actual projects actively want to live outside of core.
The "run in core" bit is actually just what Alvaro says, the ability
to interact gracefully for startup and shutdown.

What I think we need is an API like the LWlock add in requests, so we
can have a library that requests it is assigned a daemon to run in,
looking very much like autovacuum launcher, with the guts removed. It
would then be a matter for the code authors as to whether it was a
client program that interacts with server, or whether it was a full
blown daemon like autovacuum.

We talked about this at last year's Dev meeting. And we got
sidetracked into "what we really want is stored procedures". Maybe we
want that, but its a completely separate thing. Please lets not get
distracted from a very simple thing because of the existence of other
requirements.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: elegant and effective way for running jobs inside a database

From
Pavel Stehule
Date:
2012/3/7 Simon Riggs <simon@2ndquadrant.com>:
> On Tue, Mar 6, 2012 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> But having said that, it's not apparent to me why such a thing would
>> need to live "inside the database" at all.  It's very easy to visualize
>> a task scheduler that runs as a client and requires nothing new from the
>> core code.  Approaching the problem that way would let the scheduler
>> be an independent project that stands or falls on its own merits.
>
> On Tue, Mar 6, 2012 at 4:36 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>
>> What such an external scheduler would need from core is support for
>> starting up and shutting down along postmaster (as well as restarts at
>> appropriate times).  Postmaster already has the ability to start and
>> shut down many processes depending on several different policies; I
>> think it's mostly a matter of exporting that functionality in a sane way.
>
> Tom's question is exactly on the money, and so is Alvaro's answer.
>
> Many, many people have requested code that "runs in core", but the key
> point is that all they actually want are the core features required to
> build one. The actual projects actively want to live outside of core.
> The "run in core" bit is actually just what Alvaro says, the ability
> to interact gracefully for startup and shutdown.
>
> What I think we need is an API like the LWlock add in requests, so we
> can have a library that requests it is assigned a daemon to run in,
> looking very much like autovacuum launcher, with the guts removed. It
> would then be a matter for the code authors as to whether it was a
> client program that interacts with server, or whether it was a full
> blown daemon like autovacuum.
>

it is true - first step should be short - and maintaining, assign to
jobs and others can be implemented as extension. There is not
necessary SQL api (other than functions).

Regards

Pavel


> We talked about this at last year's Dev meeting. And we got
> sidetracked into "what we really want is stored procedures". Maybe we
> want that, but its a completely separate thing. Please lets not get
> distracted from a very simple thing because of the existence of other
> requirements.
>
> --
>  Simon Riggs                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: elegant and effective way for running jobs inside a database

From
Alvaro Herrera
Date:
Excerpts from Simon Riggs's message of mié mar 07 05:15:03 -0300 2012:

> We talked about this at last year's Dev meeting. And we got
> sidetracked into "what we really want is stored procedures". Maybe we
> want that, but its a completely separate thing. Please lets not get
> distracted from a very simple thing because of the existence of other
> requirements.

Completely agreed.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: elegant and effective way for running jobs inside a database

From
Merlin Moncure
Date:
On Wed, Mar 7, 2012 at 2:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We talked about this at last year's Dev meeting. And we got
> sidetracked into "what we really want is stored procedures". Maybe we
> want that, but its a completely separate thing. Please lets not get
> distracted from a very simple thing because of the existence of other
> requirements.

The reason why stored procedures were brought up is because they are
one way to implement an ad hoc scheduler without rewriting cron.
Another (better) way to do that would be to have the postgres expose a
heartbeat ticker that you could layer a scheduler on top of.  These
are minimalist approaches with the intent of providing scaffolding
upon which robust external solutions can be built.  Not having them
forces dependency on the operating system (cron) or an external daemon
like pgqd.  PGQ does exactly this (over the daemon) so that the bulk
of the algorithm can be kept in SQL which is IMNSHO extremely nice.

With a built in heartbeat you can expose a 100% SQL api that user
applications can call without having to maintain a separate process to
drive everything (although you can certainly do that if you wish).
This is exactly what PGQ (which I consider to be an absolute marvel)
does.  So if you want to start small, do that -- it can be used to do
a number of interesting things that aren't really possible at the
moment.

OTOH, if you want to implement a fully fledged out job scheduler
inside of the postmaster, then do that...it's a great solution to the
problem.  But it's a little unfair to dismiss those who are saying:
"If I had stored procedures, this could get done" and conclude that
scheduling through the postmaster is the only way forward.

merlin


Re: elegant and effective way for running jobs inside a database

From
Simon Riggs
Date:
On Wed, Mar 7, 2012 at 7:55 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Mar 7, 2012 at 2:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We talked about this at last year's Dev meeting. And we got
>> sidetracked into "what we really want is stored procedures". Maybe we
>> want that, but its a completely separate thing. Please lets not get
>> distracted from a very simple thing because of the existence of other
>> requirements.
>
> The reason why stored procedures were brought up is because they are
> one way to implement an ad hoc scheduler without rewriting cron.
> Another (better) way to do that would be to have the postgres expose a
> heartbeat ticker that you could layer a scheduler on top of.  These
> are minimalist approaches with the intent of providing scaffolding
> upon which robust external solutions can be built.  Not having them
> forces dependency on the operating system (cron) or an external daemon
> like pgqd.  PGQ does exactly this (over the daemon) so that the bulk
> of the algorithm can be kept in SQL which is IMNSHO extremely nice.
>
> With a built in heartbeat you can expose a 100% SQL api that user
> applications can call without having to maintain a separate process to
> drive everything (although you can certainly do that if you wish).
> This is exactly what PGQ (which I consider to be an absolute marvel)
> does.  So if you want to start small, do that -- it can be used to do
> a number of interesting things that aren't really possible at the
> moment.
>
> OTOH, if you want to implement a fully fledged out job scheduler
> inside of the postmaster, then do that...it's a great solution to the
> problem.  But it's a little unfair to dismiss those who are saying:
> "If I had stored procedures, this could get done" and conclude that
> scheduling through the postmaster is the only way forward.

It's not the only way, I agree. But we do need a way forwards
otherwise nothing gets done.

The stored procedure route sounds attractive but its a long way off
and doesn't address all of the states needs people have voiced. I'm
not against doing both, I just want to do the quickest and easiest.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: elegant and effective way for running jobs inside a database

From
Merlin Moncure
Date:
On Wed, Mar 7, 2012 at 2:14 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The stored procedure route sounds attractive but its a long way off
> and doesn't address all of the states needs people have voiced. I'm
> not against doing both, I just want to do the quickest and easiest.

sure, I get that, especially in regards to procedures.  a server
ticker though is a pretty small thing and it's fair to ask if maybe
that should be exposed instead of (or perhaps in addition to) a job
scheduling system.

a userland scheduling system has some advantages -- for example it
could be pulled in as an extension.  it would have a very different
feel though since it would be participatory scheduling.  i guess it
really depends on who's writing it and what the objective is (if
anyone is willing to rewrite cron into the postmaster, by all
means...)

merlin


Re: elegant and effective way for running jobs inside a database

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> sure, I get that, especially in regards to procedures.  a server
> ticker though is a pretty small thing and it's fair to ask if maybe
> that should be exposed instead of (or perhaps in addition to) a job
> scheduling system.

I don't want to have a server-side ticker at all, especially not one
that exists only for a client that might or might not be there.  We've
been doing what we can to reduce PG's idle-power consumption, which is
an important consideration for large-data-center applications.  Adding a
new source of periodic wakeups is exactly the wrong direction to be
going.

There is no need for a ticker to drive a job system.  It should be able
to respond to interrupts (if a NOTIFY comes in) and otherwise sleep
until the precalculated time that it next needs to launch a job.
        regards, tom lane


Re: elegant and effective way for running jobs inside a database

From
Kohei KaiGai
Date:
2012/3/6 Alvaro Herrera <alvherre@commandprompt.com>:
> It seems to me that the only thing that needs core support is the
> ability to start up the daemon when postmaster is ready to accept
> queries, and shut the daemon down when postmaster kills backends (either
> because one crashed, or because it's shutting down).
>
+10

Even though it is different from the original requirement, I also would
like to see the feature to run daemon processes managed by extension
according to start/stop of the postmaster.

I'm trying to implement an extension that uses GPU devices to help
calculation of complex qualifiers. CUDA or OpenCL has a limitation
that does not allow a particular number of processes open a device
concurrently.
So, I launches calculation threads that handles all the communication
with GPU devices behalf on the postmaster process, however, it is not
a graceful design, of course.
Each backend communicate with the calculation thread via shared-
memory segment, thus, it should be a child process of postmaster.

So, although my motivation is not something like Cron in core,
it seems to me Alvaro's idea is quite desirable and reasonable,
to be discussed in v9.3.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: elegant and effective way for running jobs inside a database

From
Merlin Moncure
Date:
On Fri, Mar 9, 2012 at 9:36 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> 2012/3/6 Alvaro Herrera <alvherre@commandprompt.com>:
>> It seems to me that the only thing that needs core support is the
>> ability to start up the daemon when postmaster is ready to accept
>> queries, and shut the daemon down when postmaster kills backends (either
>> because one crashed, or because it's shutting down).
>>
> So, although my motivation is not something like Cron in core,
> it seems to me Alvaro's idea is quite desirable and reasonable,
> to be discussed in v9.3.

100% agree  (having re-read the thread and Alvaro's idea having sunk
in).  Being able to set up daemon processes side by side with the
postmaster would fit the bill nicely.  It's pretty interesting to
think of all the places you could go with it.

merlin


Re: elegant and effective way for running jobs inside a database

From
"David E. Wheeler"
Date:
On Mar 9, 2012, at 7:55 AM, Merlin Moncure wrote:

> 100% agree  (having re-read the thread and Alvaro's idea having sunk
> in).  Being able to set up daemon processes side by side with the
> postmaster would fit the bill nicely.  It's pretty interesting to
> think of all the places you could go with it.

pgAgent could use it *right now*. I keep forgetting to restart it after restarting PostgreSQL and finding after a day
orso that no jobs have run. 

Best,

David



Re: elegant and effective way for running jobs inside a database

From
Robert Haas
Date:
On Fri, Mar 9, 2012 at 12:02 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Mar 9, 2012, at 7:55 AM, Merlin Moncure wrote:
>> 100% agree  (having re-read the thread and Alvaro's idea having sunk
>> in).  Being able to set up daemon processes side by side with the
>> postmaster would fit the bill nicely.  It's pretty interesting to
>> think of all the places you could go with it.
>
> pgAgent could use it *right now*. I keep forgetting to restart it after restarting PostgreSQL and finding after a day
orso that no jobs have run. 

That can and should be fixed by teaching pgAgent that failing to
connect to the server, or getting disconnected, is not a fatal error,
but a reason to sleep and retry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: elegant and effective way for running jobs inside a database

From
Andrew Dunstan
Date:

On 03/09/2012 01:40 PM, Robert Haas wrote:
> On Fri, Mar 9, 2012 at 12:02 PM, David E. Wheeler<david@justatheory.com>  wrote:
>> On Mar 9, 2012, at 7:55 AM, Merlin Moncure wrote:
>>> 100% agree  (having re-read the thread and Alvaro's idea having sunk
>>> in).  Being able to set up daemon processes side by side with the
>>> postmaster would fit the bill nicely.  It's pretty interesting to
>>> think of all the places you could go with it.
>> pgAgent could use it *right now*. I keep forgetting to restart it after restarting PostgreSQL and finding after a
dayor so that no jobs have run.
 
> That can and should be fixed by teaching pgAgent that failing to
> connect to the server, or getting disconnected, is not a fatal error,
> but a reason to sleep and retry.

Yeah. It's still not entirely clear to me what a postmaster-controlled 
daemon is going to be able to do that an external daemon can't.

cheers

andrew



Re: elegant and effective way for running jobs inside a database

From
Simon Riggs
Date:
On Fri, Mar 9, 2012 at 6:51 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 03/09/2012 01:40 PM, Robert Haas wrote:
>>
>> On Fri, Mar 9, 2012 at 12:02 PM, David E. Wheeler<david@justatheory.com>
>>  wrote:
>>>
>>> On Mar 9, 2012, at 7:55 AM, Merlin Moncure wrote:
>>>>
>>>> 100% agree  (having re-read the thread and Alvaro's idea having sunk
>>>> in).  Being able to set up daemon processes side by side with the
>>>> postmaster would fit the bill nicely.  It's pretty interesting to
>>>> think of all the places you could go with it.
>>>
>>> pgAgent could use it *right now*. I keep forgetting to restart it after
>>> restarting PostgreSQL and finding after a day or so that no jobs have run.
>>
>> That can and should be fixed by teaching pgAgent that failing to
>> connect to the server, or getting disconnected, is not a fatal error,
>> but a reason to sleep and retry.
>
>
> Yeah. It's still not entirely clear to me what a postmaster-controlled
> daemon is going to be able to do that an external daemon can't.

Start and stop at the same time as postmaster, without any pain.

It's a considerable convenience to be able to design this aspect once
and then have all things linked to the postmaster follow that. It
means people will be able to write code that runs on all OS easily,
without everybody having similar but slightly different code about
starting up, reading parameters, following security rules etc.. Tight
integration, with good usability.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: elegant and effective way for running jobs inside a database

From
Artur Litwinowicz
Date:
W dniu 2012-03-09 16:55, Merlin Moncure pisze:
> On Fri, Mar 9, 2012 at 9:36 AM, Kohei KaiGai <kaigai@kaigai.gr.jp>
> wrote:
>> 2012/3/6 Alvaro Herrera <alvherre@commandprompt.com>:
>>> It seems to me that the only thing that needs core support is
>>> the ability to start up the daemon when postmaster is ready to
>>> accept queries, and shut the daemon down when postmaster kills
>>> backends (either because one crashed, or because it's shutting
>>> down).
>>> 
>> So, although my motivation is not something like Cron in core, it
>> seems to me Alvaro's idea is quite desirable and reasonable, to
>> be discussed in v9.3.
> 
> 100% agree  (having re-read the thread and Alvaro's idea having
> sunk in).  Being able to set up daemon processes side by side with
> the postmaster would fit the bill nicely.  It's pretty interesting
> to think of all the places you could go with it.
> 
> merlin

Good to hear that (I hope that even though English is not my native
language I understand properly posts in this thread). I am convinced,
that all of You will be pround of the new solution like a "heart bit"
for PostgreSQL. May be it is too poetic but considering cron or
pgAgent instead real job manager is like considering defibrillator
instead a real heart. Currently, especially in web applications, the
idea is not where to store a data but how a data can flow and how *fast*.

"It's pretty interesting to think of all the places you could go with
it." - in fact it is :)

Best regards,
Artur


Re: elegant and effective way for running jobs inside a database

From
Andrew Dunstan
Date:

On 03/10/2012 07:11 AM, Simon Riggs wrote:
> On Fri, Mar 9, 2012 at 6:51 PM, Andrew Dunstan<andrew@dunslane.net>  wrote:
>>
>> On 03/09/2012 01:40 PM, Robert Haas wrote:
>>> On Fri, Mar 9, 2012 at 12:02 PM, David E. Wheeler<david@justatheory.com>
>>>   wrote:
>>>> On Mar 9, 2012, at 7:55 AM, Merlin Moncure wrote:
>>>>> 100% agree  (having re-read the thread and Alvaro's idea having sunk
>>>>> in).  Being able to set up daemon processes side by side with the
>>>>> postmaster would fit the bill nicely.  It's pretty interesting to
>>>>> think of all the places you could go with it.
>>>> pgAgent could use it *right now*. I keep forgetting to restart it after
>>>> restarting PostgreSQL and finding after a day or so that no jobs have run.
>>> That can and should be fixed by teaching pgAgent that failing to
>>> connect to the server, or getting disconnected, is not a fatal error,
>>> but a reason to sleep and retry.
>>
>> Yeah. It's still not entirely clear to me what a postmaster-controlled
>> daemon is going to be able to do that an external daemon can't.
> Start and stop at the same time as postmaster, without any pain.
>
> It's a considerable convenience to be able to design this aspect once
> and then have all things linked to the postmaster follow that. It
> means people will be able to write code that runs on all OS easily,
> without everybody having similar but slightly different code about
> starting up, reading parameters, following security rules etc.. Tight
> integration, with good usability.


The devil is in the details, though, pace Mies van der Rohe.

In particular, it's the "tight integration" piece I'm worried about.

What is the postmaster supposed to do if the daemon start fails? What if 
it gets a flood of failures? What access will the daemon have to 
Postgres internals? What OS privileges will it have, since this would 
have to run as the OS postgres user? In general I think we don't want 
arbitrary processes running as the OS postgres user.

I accept that cron might not be the best tool for the jobs, since a) its 
finest granularity is 1 minute and b) it would need a new connection for 
each job. But a well written external daemon that runs as a different 
user and is responsible for making its own connection to the database 
and re-establishing it if necessary, seems to me at least as clean a 
design for a job scheduler as one that is stopped and started by the 
postmaster.

cheers

andrew


Re: elegant and effective way for running jobs inside a database

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I don't want to have a server-side ticker at all, especially not one
> that exists only for a client that might or might not be there.  We've
> been doing what we can to reduce PG's idle-power consumption, which is
> an important consideration for large-data-center applications.  Adding a
> new source of periodic wakeups is exactly the wrong direction to be
> going.

I would guess that's an opt-in solution, as some other of our subprocess
are, much like autovacuum.

> There is no need for a ticker to drive a job system.  It should be able
> to respond to interrupts (if a NOTIFY comes in) and otherwise sleep
> until the precalculated time that it next needs to launch a job.

I think the ticker was proposed as a minimal component allowing to be
developing the job system as an extension.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: elegant and effective way for running jobs inside a database

From
Simon Riggs
Date:
On Sat, Mar 10, 2012 at 2:59 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

> The devil is in the details, though, pace Mies van der Rohe.
>
> In particular, it's the "tight integration" piece I'm worried about.
>
> What is the postmaster supposed to do if the daemon start fails? What if it
> gets a flood of failures? What access will the daemon have to Postgres
> internals? What OS privileges will it have, since this would have to run as
> the OS postgres user? In general I think we don't want arbitrary processes
> running as the OS postgres user.

So why are the answers to those questions different for a daemon than
for a C function executed from an external client? What additional
exposure is there?

> I accept that cron might not be the best tool for the jobs, since a) its
> finest granularity is 1 minute and b) it would need a new connection for
> each job. But a well written external daemon that runs as a different user
> and is responsible for making its own connection to the database and
> re-establishing it if necessary, seems to me at least as clean a design for
> a job scheduler as one that is stopped and started by the postmaster.

As of this thread, you can see that many people don't agree. Bear in
mind that nobody is trying to prevent you from writing a program in
that way if you believe that. That route will remain available.

It's a key aspect of modular software we're talking about. People want
to have programs that are intimately connected to the database, so
that nobody needs to change the operational instructions when they
start or stop the database.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services