Thread: BUG #15026: Deadlock using GIST index
The following bug has been logged on the website:

Bug reference:      15026
Logged by:          Mark Scheffer
Email address:      pg2401k@pinkwin.com
PostgreSQL version: 10.1
Operating system:   SLES 12 sp3
Description:

The following (fabricated) example shows there is a path in the GIST code
that leads to deadlocks:

-- Session 1:
DROP TABLE IF EXISTS locked;
CREATE TABLE locked
(
  key text NOT NULL,
  EXCLUDE USING gist (key WITH =)
);
BEGIN;
INSERT INTO locked(key) VALUES ('a');

-- Session 2:
INSERT INTO locked(key) VALUES ('a'); --> lock wait

-- Session 3:
INSERT INTO locked(key) VALUES ('a'); --> lock wait

-- Session 1:
ROLLBACK;

--> Session 3 generates a deadlock:
ERROR:  deadlock detected
DETAIL:  Process 16079 waits for ShareLock on transaction 1420618256; blocked by process 8594.
Process 8594 waits for ShareLock on transaction 1420618257; blocked by process 16079.
HINT:  See server log for query details.
CONTEXT:  while checking exclusion constraint on tuple (0,2) in relation "locked"

Note: Replacing the GIST index with a unique index does not produce this error.
Note: Setting deadlock_timeout to some high value will help analysis.

Version: 10.1-9.1 from the OpenSuse repository
"PostgreSQL 10.1 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit"

Regards, Mark.
Hi,

Digging deeper into the source code of backend/executor/execIndexing.c, I
found the following:

 * Exclusion Constraints
 * ---------------------
 ...
 *
 * There is a chance of deadlock, if two backends insert a tuple at the
 * same time, and then perform the scan to check for conflicts. They will
 * find each other's tuple, and both try to wait for each other. The
 * deadlock detector will detect that, and abort one of the transactions.
 * That's fairly harmless, as one of them was bound to abort with a
 * "duplicate key error" anyway, although you get a different error
 * message.

I guess that my example hits this deadlock...

I'm not so happy with "That's fairly harmless". In my case I'm processing
messages from several sessions at a rate of more than 1000 messages per
second per session. With the default deadlock_timeout of 1 second, at
least two sessions may block, and this may have an impact on other
sessions, which can cascade into a lot of sessions being deadlocked. I had
a situation where throughput dropped to 6 messages per second...

A very simple example of this problem:

1. Look up a record for entity E using the exclusion constraint.
2. If it does not exist: insert the record.
3. If this insert fails, go back to 1.

If n sessions try to execute this in parallel, the wait time will be n-1
seconds.

Any chance of changing the current PostgreSQL behavior?

Regards, Mark
On Sat, Feb 3, 2018 at 2:02 AM, Mark Scheffer <pg2401k@pinkwin.com> wrote:
> * Exclusion Constraints
> * ---------------------
> ...
> *
> * There is a chance of deadlock, if two backends insert a tuple at the
> * same time, and then perform the scan to check for conflicts. They will
> * find each other's tuple, and both try to wait for each other. The
> * deadlock detector will detect that, and abort one of the transactions.
> * That's fairly harmless, as one of them was bound to abort with a
> * "duplicate key error" anyway, although you get a different error
> * message.
>
> I guess that my example has this deadlock...
> Any chance in changing current Postgresql behavior?

While I think that your complaint is a legitimate one, I'd say that it's
very unlikely to be fixed. Ordinary (non-deferrable) unique constraints
don't have this problem because they pessimistically lock the first leaf
page the value could be on ahead of inserting. In contrast, exclusion
constraints generally optimistically detect any conflict in a separate
pass. That's how you can get concurrent insertions to wait on each other,
rather than having a clear "winner" at the precise point that insertion
occurs.

I can't imagine what it would take to avoid this issue with GiST, but I'm
sure it would be very difficult.

--
Peter Geoghegan
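[Editor's note: for comparison, here is a sketch of the same three-session
sequence against an ordinary unique constraint, based on the note in the
original report that a unique index does not produce the deadlock. The
table name locked_u is made up for illustration; behavior as described by
Peter above, not re-verified here.]

-- Session 1:
DROP TABLE IF EXISTS locked_u;
CREATE TABLE locked_u
(
  key text NOT NULL UNIQUE   -- ordinary b-tree unique constraint
);
BEGIN;
INSERT INTO locked_u(key) VALUES ('a');

-- Sessions 2 and 3 (both block, queued on the locked index entry):
INSERT INTO locked_u(key) VALUES ('a');

-- Session 1:
ROLLBACK;
-- One waiter now succeeds; the other fails with
-- "ERROR: duplicate key value violates unique constraint"
-- rather than a deadlock, because the b-tree insertion had a clear
-- winner at the point the value was locked.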
Peter Geoghegan-4 wrote
> While I think that your complaint is a legitimate one, I'd say that
> it's very unlikely to be fixed. Ordinary (non-deferrable) unique
> constraints don't have this problem because they pessimistically lock
> the first leaf page the value could be on ahead of inserting. In
> contrast, exclusion constraints generally optimistically detect any
> conflict in a separate pass. That's how you can get concurrent
> insertions to wait on each other, rather than having a clear "winner"
> at the precise point that insertion occurs.

I was already afraid of this after reading your post on pgsql-hackers
(Jan 24th 2015, "Moving ExecInsertIndexTuples and friends to new file")
where the comment was added to the source code:

> FWIW, both Jeff Davis and Tom Lane were well aware of this issue back
> when exclusion constraints went in - it was judged to be acceptable at
> the time, which I agree with. I happened to discuss this with Jeff in
> New York recently. I agree that it should definitely be documented
> like this (and the fact that ordinary unique indexes are unaffected,
> too).

But how can I work around this deadlock without requiring a more general
lock, like a table lock or a lock on some other related object? I need
guaranteed transactional processing.

In case of (this) deadlock, I attempted to catch the deadlock and retry. A
new issue popped up: if more than two sessions compete for the lock, I get
an infinite loop of deadlocks (cf. the "Looking for workaround to avoid
deadlock when using exclusion constraint" post of Feb 4 2018).

The following code snippet appears to work if the pg_sleep is longer than
the deadlock_timeout value, but I have two problems with that:

- I need a deadlock_timeout as small as possible (10ms might work, not
  sure). This will result in many long-wait messages in the log (which I
  want to monitor...) plus higher CPU load.
- I'm not 100% sure this works in all cases.
DO $$
DECLARE
  key_  locked.key%TYPE;
  ctid_ locked.ctid%TYPE;
BEGIN
  WHILE key_ IS NULL LOOP
    BEGIN
      SELECT key, ctid FROM locked WHERE key = 'a'
        INTO STRICT key_, ctid_;
    EXCEPTION
      WHEN no_data_found THEN
        BEGIN
          INSERT INTO locked(key) VALUES ('a')
            RETURNING key, ctid INTO key_, ctid_;
        EXCEPTION
          WHEN exclusion_violation THEN
            NULL;
          WHEN deadlock_detected THEN
            RAISE WARNING 'Deadlock!! (t=%)', clock_timestamp();
            PERFORM pg_sleep(1.1); -- deadlock_timeout = 1s
        END;
    END;
  END LOOP;
  RAISE WARNING 'key=%,ctid=%', key_, ctid_;
END
$$;

Execute the DO block above from at least 3 sessions after inserting a row
from a 4th session, and roll back after all sessions are started:

DROP TABLE IF EXISTS locked;
CREATE TABLE locked
(
  key text NOT NULL,
  EXCLUDE USING gist (key WITH =)
);
BEGIN;
INSERT INTO locked(key) VALUES ('a');
-- Start 3+ DO blocks
ROLLBACK;
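[Editor's note: one way to sidestep the deadlock entirely, not proposed in
this thread, is to serialize inserters of the same key with a
transaction-scoped advisory lock, so concurrent sessions queue on the lock
instead of discovering each other's speculative tuples during the
exclusion-constraint check. A sketch under that assumption; the lock key
derived via hashtext() is illustrative, and hash collisions merely
serialize unrelated keys:]

BEGIN;
-- Take a per-key advisory lock; released automatically at COMMIT/ROLLBACK.
SELECT pg_advisory_xact_lock(hashtext('a'));
-- Only one session at a time reaches this point for key 'a'.
INSERT INTO locked(key)
SELECT 'a'
WHERE NOT EXISTS (SELECT 1 FROM locked WHERE key = 'a');
COMMIT;

This trades the optimistic conflict check for a pessimistic queue, much
like the leaf-page lock that ordinary unique constraints take, at the cost
of requiring every writer of the table to follow the same locking
convention.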