Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"? - Mailing list pgsql-performance

From Gunther Schadow
Subject Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?
Date
Msg-id 4f379a07-c49b-14dd-ddba-e0aaf37235d5@pragmaticdata.com
Whole thread Raw
Responses Re: Is there a known bug with SKIP LOCKED and "tuple to be locked was already moved to another partition due to concurrent update"?  (Jim Jarvie <jim@talentstack.to>)
List pgsql-performance
Hi all,

long time ago I devised with your help a task queuing system which uses 
SELECT ... FOR UPDATE SKIP LOCKED for many parallel workers to find 
tasks in the queue, and it used a partitioned table where the hot part 
of the queue is short and so the query for a job is quick and the skip 
locked locking makes sure that one job is only assigned to one worker. 
And this works pretty well for me, except that when we run many workers 
we find a lot of these failures occurring:

"tuple to be locked was already moved to another partition due to 
concurrent update"

This would not exactly look like a bug, because the message says "to be 
locked", so at least it's not allowing two workers to lock the same 
tuple. But it seems that the skip-locked mode should not make an error 
out of this, but treat it as the tuple was already locked. Why would it 
want to lock the tuple (representing the job) if another worker has 
already finished his UPDATE of the job to mark it as "done" (which is 
what makes the tuple move to the "completed" partition.)

Either the SELECT for jobs to do returned a wrong tuple, which was 
already update, or there is some lapse in the locking.

Either way it would seem to be a waste of time throwing all these errors 
when the tuple should not even have been selected for update and locking.

I wonder if anybody knows anything about that issue? Of course you'll 
want to see the DDL and SQL queries, etc. but you can't really try it 
out unless you do some massively parallel magic. So I figured I just ask.

regards,
-Gunther




pgsql-performance by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Recommended value for pg_test_fsync
Next
From: Jeff Janes
Date:
Subject: Re: Recommended value for pg_test_fsync