Thread: How to run a task continuously in the background
Hi,
another oracle -> postgreSQL migration question.
I have a database in which a job runs continuously to perform tasks that are inserted into a table in the form of records.
The current Oracle implementation is to use DBMS_JOB to start a job (to be precise, there can be multiple jobs working in parallel) that goes through the following loop:
- is there anything to do?
- Yes -> Do Tasks
- No -> 1 second sleep
- Was the EXIT command sent to me?
- Yes -> Exit loop.
The question now is how do I start in PostgreSQL a background task that meets the following requirements:
- It must continue to run even if the connection in which the task was started is terminated.
- I need to be able to check if a background task exists.
- I need to be able to finish the background task without explicitly sending the EXIT command (see above) to the task.
I've already looked at the pg_background extension, but I don't think it meets either requirement 2 or requirement 3. But maybe I just didn’t look hard enough. And with regard to requirement 1, I'm not sure that's guaranteed.
Any suggestions on how this might be realized. Preferably without an external application. ☺
BR
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/
I _think_ it ticks all three of your boxes.
Steve.
This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.
GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice
pg_cron, perhaps?
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/
I _think_ it ticks all three of your boxes.
Steve.
This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.
GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice
A cron job will only run once a minute, not wake up every second. But you could write a PHP program that does a one-second sleep before checking if there's something to do, and a batch job that runs periodically to see if the PHP program is running, and if not, launch it.That's how I handle a job that opens a tunnel from our PostgreSQL server to a MySQL server running at AWS so we can synchronize data between those two database servers. It dies periodically for reasons we've never figured out, so every 5 minutes I check to make sure it's running.
pg_cron, perhaps?
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/
I _think_ it ticks all three of your boxes.
I also checked pg_cron, but this has the disadvantage that it allows starting a job at most once a minute. On the other hand, the job runs as long as there are records to process. And during this time nothing should be queued again.
DIrk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
A cron job will only run once a minute, not wake up every second.
Right, that’s an issue.
But you could write a PHP program that does a one-second sleep before checking if there's something to do, and a batch job that runs periodically to see if the PHP program is running, and if not, launch it.
I would like to avoid external programs if possible. In the current Oracle environment, there are potentially multiple schemas on a server in which processing can be active. And processing can be started, monitored and stopped from a client application. And only for the schema with which the application is connected.
BR
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
A cron job will only run once a minute, not wake up every second.
Right, that’s an issue.
I would like to avoid external programs if possible. In the current Oracle environment, there are potentially multiple schemas on a server in which processing can be active. And processing can be started, monitored and stopped from a client application. And only for the schema with which the application is connected.
BR
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
在 2019年7月12日,下午1:06,Dirk Mika <Dirk.Mika@mikatiming.de> 写道:A cron job will only run once a minute, not wake up every second.Right, that’s an issue.But you could write a PHP program that does a one-second sleep before checking if there's something to do, and a batch job that runs periodically to see if the PHP program is running, and if not, launch it.I would like to avoid external programs if possible. In the current Oracle environment, there are potentially multiple schemas on a server in which processing can be active. And processing can be started, monitored and stopped from a client application. And only for the schema with which the application is connected.BRDirk
--
Dirk Mika
Software Developer<image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png>
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
On 12/07/2019 08:08, lilu wrote: > >> 在 2019年7月12日,下午1:06,Dirk Mika <Dirk.Mika@mikatiming.de >> <mailto:Dirk.Mika@mikatiming.de>> 写道: >> >> A cron job will only run once a minute, not wake up every second. >> Right, that’s an issue. >> But you could write a PHP program that does a one-second sleep before >> checking if there's something to do, and a batch job that runs >> periodically to see if the PHP program is running, and if not, launch >> it. >> I would like to avoid external programs if possible. In the current >> Oracle environment, there are potentially multiple schemas on a >> server in which processing can be active. And processing can be >> started, monitored and stopped from a client application. And only >> for the schema with which the application is connected. >> BR >> Dirk But using a specific program that is good at its job is the unix way and better for it imho. If you have a good scheduler why re-write another one into every application? Tim Clarke IT Director Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420 Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 58031687 | Toronto: +1 647 503 2848 Web: https://www.manifest.co.uk/ Minerva Analytics Ltd - A Solactive Company 9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | United Kingdom ________________________________ Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee youmust not use or disclose such information, instead please report it to admin@minerva.info<mailto:admin@minerva.info> Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The ManifestVoting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/for further information.
But using a specific program that is good at its job is the unix way and
better for it imho. If you have a good scheduler why re-write another
one into every application?
Well, the requirement is basically not job scheduling, but the continuous execution of a task that is to be started and stopped and that can work autonomously.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
You could perhaps tweak the code for pg_cron or pg_agent for a custom solution that supports a finer time resolution?
I might take a look at that. But that would mean that I would have to deal a little with the extension API. ;-)
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
But using a specific program that is good at its job is the unix way and
better for it imho. If you have a good scheduler why re-write another
one into every application?
Well, the requirement is basically not job scheduling, but the continuous execution of a task that is to be started and stopped and that can work autonomously.
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
--
Attachment
Is runseven(extended version of pgbucket) is what you are looking for ?
runseven is still in beta, still require sometime to make it GA release.
You can refer more information about runseven here.
Run7 looks very promising. The only drawback is that it is not controllable via SQL commands.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
On Fri, Jul 12, 2019 at 7:06 AM Dirk Mika <Dirk.Mika@mikatiming.de> wrote: > > > > A cron job will only run once a minute, not wake up every second. > > > > I would like to avoid external programs if possible. In the current Oracle environment, there are potentially multipleschemas on a server in which processing can be active. And processing can be started, monitored and stopped froma client application. And only for the schema with which the application is connected. > Creating a background worker that invokes a stored procedure once per second? <https://www.postgresql.org/docs/11/bgworker.html> But this is not so simple to put in place. Otherwise pg_cron with a function that performs a pg_sleep of one second in a loop. Anyway, it seems to me you are better refactoring your solution: it seems you need to process data when _new data_ comes, not once per second, so it sounds to me like a trigger could solve the problem. Luca
Creating a background worker that invokes a stored procedure once per
But this is not so simple to put in place.
It's not really important that the job runs once a second, but that it starts immediately when I want it to.
If I start a job with pg_cron, it will not be executed until the next full minute at the earliest.
Otherwise pg_cron with a function that performs a pg_sleep of one
second in a loop.
Anyway, it seems to me you are better refactoring your solution: it
seems you need to process data when _new data_ comes, not once per
second, so it sounds to me like a trigger could solve the problem.
The processing of the data via a job is deliberately chosen so as to separate the insertion of the data from their processing.
If a trigger were to do this, the transaction in which the data is inserted would take longer. This is not intended.
It is common for many records to be inserted in a short time, but processing takes a little time. The application that inserts the data should however not be slowed down.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
On Tue, Jul 16, 2019 at 7:32 AM Dirk Mika <Dirk.Mika@mikatiming.de> wrote: > It's not really important that the job runs once a second, but that it starts immediately when I want it to. > > If I start a job with pg_cron, it will not be executed until the next full minute at the earliest. > > The processing of the data via a job is deliberately chosen so as to separate the insertion of the data from their processing. So, as far as I understand, you want asynchronously processing data with a process that can be started manually and/or periodically. I'm probably unable to see what is the goal, but I would go for a combined solution: 1) a trigger that notifies an external process <https://www.postgresql.org/docs/current/sql-notify.html> 2) the process runs when notified (by the trigger) or when started manually or when started by pg_cron (one per minute). Of course the process is "internal", so something like a stored procedure (at least as entry point). The problem with such solution is about race conditions (what if you manually start something that is already running?), but I guess you had this problem on the oracle side too. Hope this helps. Luca
We used a trigger that called pg_notify (https://www.postgresql.org/docs/9.5/sql-notify.html) and then had another
process that LISTENed for notifications.
Sent: Tuesday, July 16, 2019 1:32 AM
To: pgsql-general@lists.postgresql.org
Subject: Re: How to run a task continuously in the background
Creating a background worker that invokes a stored procedure once per
But this is not so simple to put in place.
It's not really important that the job runs once a second, but that it starts immediately when I want it to.
If I start a job with pg_cron, it will not be executed until the next full minute at the earliest.
Otherwise pg_cron with a function that performs a pg_sleep of one
second in a loop.
Anyway, it seems to me you are better refactoring your solution: it
seems you need to process data when _new data_ comes, not once per
second, so it sounds to me like a trigger could solve the problem.
The processing of the data via a job is deliberately chosen so as to separate the insertion of the data from their processing.
If a trigger were to do this, the transaction in which the data is inserted would take longer. This is not intended.
It is common for many records to be inserted in a short time, but processing takes a little time. The application that inserts the data should however not be slowed down.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
We used a trigger that called pg_notify (https://www.postgresql.org/docs/9.5/sql-notify.html) and then had another
process that LISTENed for notifications.
What kind of process is this? I'm assuming that this is an application written in C.
The advantage of LISTEN / NOTIFY is only that the process which should process data does not have to do polling, but is notified when there is something to do.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
So, as far as I understand, you want asynchronously processing data
with a process that can be started manually and/or periodically.
The process should be started manually and then run until it is stopped by a stop_job() function call. In Oracle there is a package which contains the functions engine.start_job() and engine.stop_job(). :-)
I only mentioned “periodic” because the current Oracle implementation polls to see if there is something to process and goes to sleep for a second when there is no data to process.
I'm probably unable to see what is the goal, but I would go for a
combined solution:
1) a trigger that notifies an external process
2) the process runs when notified (by the trigger) or when started
manually or when started by pg_cron (one per minute).
That is basically still my main question. How do I start a background job (e.g. a function) which waits by polling or LISTEN / NOTIFY for records in a table to be processed.
Of course the process is "internal", so something like a stored
procedure (at least as entry point).
The problem with such solution is about race conditions (what if you
manually start something that is already running?), but I guess you
had this problem on the oracle side too.
In the Oracle world, there is a maximum of one job that takes care of processing.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
Also, the NOTIFY wouldn’t significantly extend the lifetime or impact of the trigger.On Jul 17, 2019, at 1:26 AM, Dirk Mika <Dirk.Mika@mikatiming.de> wrote:We used a trigger that called pg_notify (https://www.postgresql.org/docs/9.5/sql-notify.html) and then had another
process that LISTENed for notifications.
What kind of process is this? I'm assuming that this is an application written in C.
The advantage of LISTEN / NOTIFY is only that the process which should process data does not have to do polling, but is notified when there is something to do.
Dirk
--
Dirk Mika
Software Developer<image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png>
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
On Wed, Jul 17, 2019 at 9:38 AM Dirk Mika <Dirk.Mika@mikatiming.de> wrote: > That is basically still my main question. How do I start a background job (e.g. a function) which waits by polling or LISTEN/ NOTIFY for records in a table to be processed. You will have a trigger that, once new tuples are created (or older update and so on) issues a NOTIFY. Somewhere (within PostgreSQL or outside it) there will be a process that issued a LISTEN and is locked until a notify comes in. Then it does process whatever you need to do. As an example your trigger function will be something like <https://github.com/fluca1978/PostgreSQL-11-Quick-Start-Guide/blob/master/Chapter09/Chapter09_Listing05.sql> and your listening process will be something like <https://github.com/fluca1978/PostgreSQL-11-Quick-Start-Guide/blob/master/Chapter09/Chapter09_Listing09.perl>. This makes your processing fully asynchronous, and with some tune allows you to decide the start/stop/resume policy as you need/wish. Besides, it is quite hard for me to get to the point where you need to check for new data every second, and therefore why you are not satisfied with pg_cron or stuff like that. Luca
You will have a trigger that, once new tuples are created (or older
update and so on) issues a NOTIFY.
Somewhere (within PostgreSQL or outside it) there will be a process
that issued a LISTEN and is locked until a notify comes in. Then it
does process whatever you need to do.
As an example your trigger function will be something like
and your listening process will be something like
Thanks for the examples. I’ll look into them.
This makes your processing fully asynchronous, and with some tune
allows you to decide the start/stop/resume policy as you need/wish.
Besides, it is quite hard for me to get to the point where you need to
check for new data every second, and therefore why you are not
satisfied with pg_cron or stuff like that.
pg_cron doesn’t start the task instantly and queues subsequent runs, if the task is still running. I just need to start the task once and it should keep running until stopped / killed.
Maybe I'll have to rephrase it.
Suppose I have a procedure and want to start it without the client where I start the procedure waiting for it to finish. And I want the procedure to continue even if the client that started it quits.
And I want to be able to check if the procedure is still running.
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
Gerard Weatherby | Application Architect
NMRbox | Department of Molecular Biology and Biophysics | UConn Health
263 Farmington Avenue, Farmington, CT 06030-6406
Phone: 860 679 8484
uchc.edu
On Jul 17, 2019, at 5:57 AM, Dirk Mika <Dirk.Mika@mikatiming.de> wrote:You will have a trigger that, once new tuples are created (or olderupdate and so on) issues a NOTIFY.Somewhere (within PostgreSQL or outside it) there will be a processthat issued a LISTEN and is locked until a notify comes in. Then itdoes process whatever you need to do.As an example your trigger function will be something likeand your listening process will be something likeThanks for the examples. I’ll look into them.This makes your processing fully asynchronous, and with some tuneallows you to decide the start/stop/resume policy as you need/wish.Besides, it is quite hard for me to get to the point where you need tocheck for new data every second, and therefore why you are notsatisfied with pg_cron or stuff like that.pg_cron doesn’t start the task instantly and queues subsequent runs, if the task is still running. I just need to start the task once and it should keep running until stopped / killed.Maybe I'll have to rephrase it.Suppose I have a procedure and want to start it without the client where I start the procedure waiting for it to finish. And I want the procedure to continue even if the client that started it quits.And I want to be able to check if the procedure is still running.Dirk
--
Dirk Mika
Software Developer<image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png>
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
On 17/07/19, Luca Ferrari (fluca1978@gmail.com) wrote: > On Wed, Jul 17, 2019 at 9:38 AM Dirk Mika <Dirk.Mika@mikatiming.de> wrote: > > That is basically still my main question. How do I start a > > background job (e.g. a function) which waits by polling or LISTEN / > > NOTIFY for records in a table to be processed. > > You will have a trigger that, once new tuples are created (or older > update and so on) issues a NOTIFY. > Somewhere (within PostgreSQL or outside it) there will be a process > that issued a LISTEN and is locked until a notify comes in. Then it > does process whatever you need to do. We make extensive use of postgresql 'contacting' an external process, but our use case involves many databases in a cluster rather than many schemas. Also we don't have to deal with cancelling the external process. We chose this architecture to avoid many postgresql connections for LISTEN/NOTIFY. We use a pgmemcache interface trigger to update memcached with a notification of the database holding items to be flushed. A python looping process running under supervisord checks memcache for items to be dealt with, and does so in a serial manner. For the use case mentioned a per-schema process may be required or a sub-process/thread created, which could check perhaps for memcache to signal cancellation of processing for the schema. I guess one might then have thread locking/cancellation issues to resolve. Rory
On 2019-07-17 12:34:41 +0100, Rory Campbell-Lange wrote: > We make extensive use of postgresql 'contacting' an external process, > but our use case involves many databases in a cluster rather than many > schemas. Also we don't have to deal with cancelling the external > process. We chose this architecture to avoid many postgresql > connections for LISTEN/NOTIFY. > > We use a pgmemcache interface trigger to update memcached with a > notification of the database holding items to be flushed. Memcached is a cache. It will delete old items if storage is full (or if they expire). Is this not a problem in your case or did you make sure that this cannot happen? hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment
On 19/07/19, Peter J. Holzer (hjp-pgsql@hjp.at) wrote: > On 2019-07-17 12:34:41 +0100, Rory Campbell-Lange wrote: > > We make extensive use of postgresql 'contacting' an external process, > > but our use case involves many databases in a cluster rather than many > > schemas. Also we don't have to deal with cancelling the external > > process. We chose this architecture to avoid many postgresql > > connections for LISTEN/NOTIFY. > > > > We use a pgmemcache interface trigger to update memcached with a > > notification of the database holding items to be flushed. > > Memcached is a cache. It will delete old items if storage is full (or if > they expire). Is this not a problem in your case or did you make sure > that this cannot happen? We have a few hundred databases in a cluster, and the notifications are simply the database name and the value "1" when the database queue requires processing. With only the database name as key memory use memcached is bounded. There is a potential race condition between the queue processor processing the queue before setting the value to "0" and more queue items landing, but that is not an issue for our use case. Rory