Thread: Lock/deadlock issues with priority queue in Postgres - possible VACUUM conflicts

From: Chris Angelico
This has got to be a solved problem, but I can't find a simple example
of what to do, nor a description of what I'm doing wrong.

In this project, we need to have a disk-based priority queue, with a
few snazzy features (eg limited retries and scheduling) but nothing
particularly exotic. Requests are added to the queue by any number of
different processes, and it's not a problem if a request "misses the
check" and is delayed by a few seconds. Requests are processed by a
pool of workers which are unaware of each other (and may be running on
different computers); the sole arbiter is the database itself.

Everything's currently on a single Debian 6.0.3 laptop, with Postgres
(the .deb package from openscg.org) set to max_connections = 5100, and
the kernel reconfigured to allow 1GB shared memory and "500 32000 32
1024" semaphores. 'select version()' gives:
PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
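
(For the record, that kernel tweak works out to roughly the following
in /etc/sysctl.conf - a sketch, with the 1GB expressed in bytes and
the four semaphore values as given above:)

    # sketch of the kernel settings described above
    kernel.shmmax = 1073741824          # allow a 1GB shared memory segment
    kernel.sem = 500 32000 32 1024      # SEMMSL SEMMNS SEMOPM SEMMNI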

The current (PHP) implementation involves a 'claimed' column (NULL for
unclaimed, otherwise an integer worker id, which allows crashed
workers to be detected) and the following query:

"update interfaceout set
claimed=$workerid,lastretry=now(),retrycount=retrycount+1 where
claimed is null and id=(select id from interfaceout where
        claimed is null
        and not done
        and scheduletime<=now()
        order by priority,random() limit 1) returning *"

This query is performed (in autocommit mode), then some (currently
minimal) processing is done, and then another query is done to "close
off" the call:

"update interfaceout set claimed=null,done=$1,response=$2,success=$3
where id=$4"
with parameters set according to whether or not the call was completed
(if not, it gets retried) and to the results of the processing.
Another process is then signalled to handle the response, and that
process will then delete the row.
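
For reference, the table all of this runs against looks roughly like
this (a sketch reconstructed from the queries above; the exact column
types are guesses):

    create table interfaceout (
        id           serial primary key,
        priority     integer not null,
        scheduletime timestamptz not null default now(),
        claimed      integer,             -- NULL = unclaimed, else worker id
        lastretry    timestamptz,
        retrycount   integer not null default 0,
        done         boolean not null default false,
        success      boolean,
        response     text
    );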

With a single worker process, this works fine; but with 16K calls and
50 workers (not ridiculously huge numbers by any means), lock
contention becomes a major issue, and there are even occasional
deadlocks. Performance plummets. I'm not entirely sure, but it seems
that the problems start when autovacuum kicks in.

Monitoring the pg_locks table (joining with pg_stat_activity to see
current query in progress) shows that the time is all being spent on
the first query, attempting to set claimed; the other queries almost
never show up in the list.
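
(The monitoring query is roughly the following - a sketch, using the
9.1 column names procpid/current_query:)

    select l.locktype, l.mode, l.granted, a.procpid, a.waiting,
           a.current_query
      from pg_locks l
      join pg_stat_activity a on a.procpid = l.pid
     order by l.granted, a.procpid;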

Researching the issue brought up a couple of good links:
http://stackoverflow.com/questions/389541/select-unlocked-row-in-postgresql
http://blog.endpoint.com/2008/12/parallel-inventory-access-using.html
http://wiki.postgresql.org/wiki/PGQ_Tutorial

PGQ looks promising, but I can't afford the risk of losing calls in
the event that there are no workers to process them (the correct
action is for them simply to languish in the database until one is
started up).

The StackOverflow thread lists a number of excellent ideas, most of
which are already implemented in the above code, or are unnecessary
(for instance, each worker need only acquire one row - there's no need
to grab the top N). There's some interesting stuff on advisory locks;
is this the only way to do this safely?
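
(For reference, the advisory-lock pattern those links describe is
roughly the following - a sketch, not something I've benchmarked:)

    -- Each worker locks the id of the row it is working on, so other
    -- workers skip rows that are already being processed; the lock is
    -- released with pg_advisory_unlock(id), or on disconnect.
    select id
      from (select id
              from interfaceout
             where claimed is null
               and not done
               and scheduletime <= now()
             order by priority, random()
             limit 10) candidates
     where pg_try_advisory_lock(id)
     limit 1;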

This is surely something that has been done before. Can anyone point
me to a good tutorial on the subject?

Thanks!

Chris Angelico

On Mon, Jan 9, 2012 at 2:58 PM, Chris Angelico <rosuav@gmail.com> wrote:
> In this project, we need to have a disk-based priority queue, with a
> few snazzy features (eg limited retries and scheduling) but nothing
> particularly exotic. Requests are added to the queue by any number of
> different processes, and it's not a problem if a request "misses the
> check" and is delayed by a few seconds. Requests are processed by a
> pool of workers which are unaware of each other (and may be running on
> different computers); the sole arbiter is the database itself.

Since posting this, I've researched advisory locks and ended up going
with a quite different model. In case anyone else wants to build a
system of this nature, here's what we're currently using.

The system is optimized, at any given time, for a particular number of
worker processes. For discussion purposes, let's say 8. (This number
is stored in the database.) Each worker queries the pg_locks table to
find an available slot, then claims it using pg_try_advisory_lock() -
the pg_locks query is just to avoid spamming the lock function
unnecessarily (simply trying each slot in turn would also work). The
slots are numbered 0-7, and represent partitions of the available
work, using the sequential ID modulo 8.
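
In SQL terms it works out to roughly this (a sketch; the 42 used as
the advisory-lock keyspace is arbitrary, chosen so our locks are easy
to spot in pg_locks):

    -- Which slots (0-7) are already held? Two-argument advisory locks
    -- show up in pg_locks with classid = first arg, objid = second arg.
    select objid as taken_slot
      from pg_locks
     where locktype = 'advisory' and classid = 42;

    -- Try to claim slot 3; returns true if we now hold it.
    select pg_try_advisory_lock(42, 3);

    -- While holding slot 3, only ever touch our own partition:
    select * from interfaceout
     where id % 8 = 3
       and not done
       and scheduletime <= now()
     order by priority
     limit 1;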

This is the biggest difference. Instead of marking individual calls as
'claimed', then processing them, then de-claiming them at the end, the
semaphoring is done on the entire partition, using advisory locks -
halving the MVCC spam in the process (one update per call instead of
two). This technique does have a vulnerability, however: if a worker
process terminates (or the whole computer it's running on dies), one
eighth of the work will simply never happen. This is dealt with
out-of-band by having another process "keep an eye on" the pg_locks
table, checking that there are always 8 advisory locks in the
appropriate section.
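
(Its check is essentially just a count - a sketch, using the same
illustrative keyspace as above; if this stays below 8 for too long, a
replacement worker gets started:)

    select count(*) as slots_held
      from pg_locks
     where locktype = 'advisory' and classid = 42;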

Performance is excellent, due in part to an index involving (id%8) -
this index is rebuilt by script any time the partition count is
changed, which is a significant administrative change involving a
shutdown of all workers.
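
The index is along these lines (a sketch - making it partial on the
not-done rows and including priority is a guess at the details):

    create index interfaceout_part_idx
        on interfaceout ((id % 8), priority)
     where not done;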

To anyone who has need of unusual locking techniques, may I commend
advisory locks to your notice. They're high performance and come with
automatic unlocking on disconnect (which was an issue with the
original "update set claimed=N" technique - some process would have to
detect that a worker had gone, and reset the claimed marker for that
row). An excellent feature.

Chris Angelico

On Mon, Jan 9, 2012 at 5:58 AM, Chris Angelico <rosuav@gmail.com> wrote:
> http://wiki.postgresql.org/wiki/PGQ_Tutorial
>
> PGQ looks promising, but I can't afford the risk of losing calls in
> the event that there are no workers to process them (the correct
> action is for them simply to languish in the database until one is
> started up).

PGQ does not lose events - once a consumer registers
on the queue, it is guaranteed to see all events.

So it's a matter of registering your consumers
before anything interesting happens in the database.
The actual consumers do not need to be running
at that moment.
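
Roughly (queue and consumer names here are just examples):

    -- Done once, up front; no consumer process needs to exist yet:
    select pgq.create_queue('call_queue');
    select pgq.register_consumer('call_queue', 'worker_pool');

    -- Producers insert events; pgq retains them until every
    -- registered consumer has processed them:
    select pgq.insert_event('call_queue', 'call', 'payload');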

--
marko

On Tue, Jan 31, 2012 at 4:12 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Mon, Jan 9, 2012 at 5:58 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> http://wiki.postgresql.org/wiki/PGQ_Tutorial
>>
>> PGQ looks promising, but I can't afford the risk of losing calls in
>> the event that there are no workers to process them (the correct
>> action is for them simply to languish in the database until one is
>> started up).
> PGQ does not lose events - once a consumer registers
> on the queue, it is guaranteed to see all events.
>
> So it's a matter of registering your consumers
> before anything interesting happens in the database.
> The actual consumers do not need to be running
> at that moment.

Ah, I think I understand. So registering a consumer simply means
registering its textual name. I was under the impression that it
registered the session/connection it was on. PGQ may still be
unsuitable (it's more geared toward replication than a shared-workload
scenario), but that's my primary concern solved.

ChrisA

On Tue, Jan 31, 2012 at 08:17:57AM +1100, Chris Angelico wrote:
> On Tue, Jan 31, 2012 at 4:12 AM, Marko Kreen <markokr@gmail.com> wrote:
> > On Mon, Jan 9, 2012 at 5:58 AM, Chris Angelico <rosuav@gmail.com> wrote:
> >> http://wiki.postgresql.org/wiki/PGQ_Tutorial
> >>
> >> PGQ looks promising, but I can't afford the risk of losing calls in
> >> the event that there are no workers to process them (the correct
> >> action is for them simply to languish in the database until one is
> >> started up).
> > PGQ does not lose events - once a consumer registers
> > on the queue, it is guaranteed to see all events.
> >
> > So it's a matter of registering your consumers
> > before anything interesting happens in the database.
> > The actual consumers do not need to be running
> > at that moment.
>
> Ah, I think I understand. So registering a consumer simply means
> registering its textual name.

Exactly.

> I was under the impression that it
> registered the session/connection it was on. PGQ may still be
> unsuitable (it's more geared toward replication than a shared-workload
> scenario), but that's my primary concern solved.

Calling it replication-specific is wrong - there is nothing
replication-specific there.  But it *is* geared towards batch
processing, in a situation where a few seconds of latency is OK.  The
idea is that once you are getting >100 events/sec, you cannot care
about sub-second latency anymore.

If that does not describe your use-case - e.g. you need to process a
single event at a time - then yeah, pgq is the wrong tool.

Note that we do have a shared-processing framework (pgq_coop), but
that is also geared towards batch processing - it shares batches
between workers, not events.
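
For reference, the consumer side with plain pgq is a loop over
batches, roughly like this (a sketch; :batch_id stands for the value
returned by next_batch):

    -- NULL means "nothing to do yet, sleep and retry":
    select pgq.next_batch('call_queue', 'worker_pool');

    -- Otherwise fetch the events in that batch, process them,
    -- and acknowledge the whole batch:
    select ev_id, ev_type, ev_data from pgq.get_batch_events(:batch_id);
    select pgq.finish_batch(:batch_id);

(pgq_coop's next_batch additionally takes a subconsumer name - that is
the mechanism that lets several workers share one consumer's stream,
batch by batch.)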

--
marko