Thread: Delete from locking ordering differences
We are wondering if anyone can explain the difference we are having in production with the following queries:
DELETE FROM store
WHERE id IN (
SELECT id FROM store
FOR UPDATE SKIP LOCKED
ORDER BY ID
LIMIT 1000
)
RETURNING id, payload
This query is sometimes executed with high concurrency, and then can hang indefinitely, we assume because of a locking problem that postgresql is not detecting as a deadlock.
This alternative query does not have the hanging problem:
WITH store_ids AS (
SELECT id FROM store
FOR UPDATE SKIP LOCKED
ORDER BY ID
LIMIT 1000
)
Can anyone explain why the first query is expected to fail (hang), and the second query does not have this problem? We would be interested in more understanding on this. Thanks!
On 9/23/22 01:31, Peter Hendriks wrote:
We are wondering if anyone can explain the difference we are having in production with the following queries:DELETE FROM storeWHERE id IN (SELECT id FROM storeFOR UPDATE SKIP LOCKEDORDER BY IDLIMIT 1000)RETURNING id, payloadThis query is sometimes executed with high concurrency, and then can hang indefinitely, we assume because of a locking problem that postgresql is not detecting as a deadlock.This alternative query does not have the hanging problem:WITH store_ids AS (SELECT id FROM storeFOR UPDATE SKIP LOCKEDORDER BY IDLIMIT 1000)Can anyone explain why the first query is expected to fail (hang), and the second query does not have this problem? We would be interested in more understanding on this. Thanks!
The "optimizatin fence" nature of CTEs appears to be a win in this case. Why the "order by"? I assume these are down within a transaction?
Yes, we do not really care about order, but to prevent locking issues. Multiple transactions may run this query at the same time. It should never contend because of the skip locked, so maybe we should try this without an order by too. We can not get this query to fail in test so far, just in production, wo we are bit hesitant to change now that it is finally working... Our guess is to why this makes such a difference is that the delete statement in postgres does not guarantee ordering, so maybe the optimizer makes different choices than expected. Asking the question here, so maybe someone with more understanding can explain why we need the CTE.
Op vr 23 sep. 2022 om 09:31 schreef Peter Hendriks <peter@mindloops.nl>:
We are wondering if anyone can explain the difference we are having in production with the following queries:DELETE FROM storeWHERE id IN (SELECT id FROM storeFOR UPDATE SKIP LOCKEDORDER BY IDLIMIT 1000)RETURNING id, payloadThis query is sometimes executed with high concurrency, and then can hang indefinitely, we assume because of a locking problem that postgresql is not detecting as a deadlock.This alternative query does not have the hanging problem:WITH store_ids AS (SELECT id FROM storeFOR UPDATE SKIP LOCKEDORDER BY IDLIMIT 1000)Can anyone explain why the first query is expected to fail (hang), and the second query does not have this problem? We would be interested in more understanding on this. Thanks!
Met vriendelijke groet,
Peter Hendriks