Re: Re: Postgresql the right tool (queue using advisory_locks + long transactions) - Mailing list pgsql-general

From Dorian Hoxha
Subject Re: Re: Postgresql the right tool (queue using advisory_locks + long transactions)
Date
Msg-id CANsFX07-zQWTppJpTUVZOno+Wp3GfUA4tdvVe8g-wRYoo8kG0A@mail.gmail.com
Whole thread Raw
In response to Re: Postgresql the right tool (queue using advisory_locks + long transactions)  (David G Johnston <david.g.johnston@gmail.com>)
List pgsql-general
I'll probably ask @pgbouncer mailing list if i can use it with advisory_locks per session. If not, even raw sessions will be enough.
Some comments inline.
Thanks

On Sun, Apr 27, 2014 at 10:07 PM, David G Johnston <david.g.johnston@gmail.com> wrote:
Dorian Hoxha wrote
> Hi list,
>
> I am trying to use postgresql as a queue for long-jobs (max ~4 hours)
> using
> advisory_locks. I can't separate the long-job into sub-jobs.
>
>
>    1. At ultimate-best-case scenario there will be ~100 workers, so no
>    web-scale performance required.
>    Is there a problem with 100 open sessions (behind a connection pooler?)
>    for hours like in my case?
>    The session will commit the progress of the job, but will stay opened
>    and hold the advisory_lock till the job is done or it expires (look 2).
>    2. Is it possible to set a time limit to auto-close an opened session
>    that hasn't made a query in a while ?
>    So if a worker fails,and the session time-outs, postgresql/pgbouncer
>    will close the session and release the lock ?

Do you understand the difference between a session and a transaction?  Your
subject and body indicate either you don't or have a thinko/typo somewhere?
Long-running transactions (idle or otherwise) are a major problem while
long-running idle sessions are simply marginally wasteful (in moderation).

I'm talking about session advisory locks. That's why i choose advisory instead of "SELECT FOR SHARE",a shared row lock because that looks like it must be inside a transaction, from reading the docs.
I'll keep the session open for the duration of the hole job, and create/commit transactions to update the progress.
And in the end close the connection and the lock will be released.
Why not just update the job as dispatched when it is being worked on and
then completed when finished?  You still would need to handle partial
completion somehow but this way you don't waste live resources during the
long waiting period simply monitoring a lock.  Though probably at your
volume this is not that big an issue.

PostgreSQL proper does not have session timeouts that I am aware of.  If a
worker fails it should release its connection and all advisory locks would
be released...
 
I thought that if a worker failed the session was left open. So maybe i don't need the session-timeout thing.
You reference to a connection pooler in the above doesn't make sense to me;
you'd need persistent connections for this to work (they can be from a pool
but the pool size would have to be 100+).

The main decision is whether job process state is part of your data model or
simply an artifact.  I'd suggest that it should be part of the model so
state should be managed directly thorough row-level locking and status
fields instead of indirectly through temporary system advisory locks.
 
I maintain job-status and job-progress on the same row.
But i also want that users can cancel the job, by updating 'canceled' column. So everytime i report the progress, i also "RETURNING canceled" to see if the job has been canceled by the user and abort it.
If you want to keep track of active workers you should setup some kind of
heartbeat update query; and maybe depending on how important this is attach
advisory lock info to that heartbeat record so a monitoring process can
check both the pulse table and the system advisory lock for a two data point
confirmation of activity.

Will probably do it like this.
David J.





--
View this message in context: http://postgresql.1045698.n5.nabble.com/Postgresql-the-right-tool-queue-using-advisory-locks-long-transactions-tp5801667p5801670.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: Hannes Erven
Date:
Subject: Re: Postgresql the right tool (queue using advisory_locks + long transactions)
Next
From: John R Pierce
Date:
Subject: Re: Postgresql the right tool (queue using advisory_locks + long transactions)