Thread: Possible bug with SKIP LOCKED behaviour
Hello everyone
I believe I've run into a bug in the behaviour of SKIP LOCKED. I have a program that implements a queue with concurrent workers SELECTing work from some shared tables.
The code in question does a LEFT JOIN across two tables with a FOR UPDATE on the left table and a SKIP LOCKED clause, and then UPDATEs or INSERTs rows into the table on the right side of the JOIN in a way that causes subsequent executions of the same query to no longer match those rows. However, when run concurrently I'm seeing the same row be selected by multiple workers - which shouldn't be possible based on my understanding of the relevant semantics of these operations. Perhaps I'm just holding it wrong, but I would have expected the FOR UPDATE lock on the left table to be sufficient to avoid overlapping results.
I have extracted a fairly minimal reproducing case from our production code, which includes some Go code as a test harness to run the queries concurrently enough to demonstrate the problem - this can be found at https://github.com/glenjamin/postgres-skip-locked-surprise
I wasn't sure how much detail from that reproducing case to repeat in this email, so I've only gone with an outline of the observed and expected behaviour - but I can try to add more detail to this thread if desired.
Cheers
Glen
Hi,
On Sep 29, 2022, 00:56 +0800, Glen Mailer <glen@geckoboard.com>, wrote:
According to the docs:
With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.
this can be found at https://github.com/glenjamin/postgres-skip-locked-surprise
Also, a Go script is not convenient for hackers to use for reproduction. Could you provide some steps that reproduce the bug reliably, if it really is one?
Regards,
Zhang Mingli
Hello
With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.
Yes, I am specifically aiming to avoid lock contention with multiple consumers accessing a queue-like table, and I'm seeing the same row being retrieved by multiple workers.
Also, a Go script is not convenient for hackers to use for reproduction. Could you provide some steps that reproduce the bug reliably, if it really is one?
Reproducing requires running a transaction with queries dependent on the results of earlier queries, and then running a number of these transactions concurrently, and then repeating the test until the unexpected result happens. Currently I'm doing 20 concurrent transactions, and I find that if I repeat the test 100 times I tend to get between zero and 3 failures.
What would be a more convenient way for me to provide this for reproduction?
Thanks
Glen
On Thu, 29 Sept 2022 at 03:41, Zhang Mingli <zmlpostgres@gmail.com> wrote: