Thread: INSERT/UPDATE cycles and lack of phantom locking
I've got several tables where I need to either insert new records or update existing ones (identified by the primary key). For performance reasons, I want to do this in batches, so I plan to use something like this:

    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED; -- SERIALIZABLE
    ...
    CREATE TEMPORARY TABLE tmp (key TEXT NOT NULL, new_val INTEGER NOT NULL);
    COPY tmp (key, new_val) FROM STDIN;
    ...
    \.
    -- SAVEPOINT tmp_created; -- (see below)
    CREATE TEMPORARY TABLE tmp2 AS
      SELECT tmp.key, new_val, real.val AS old_val
      FROM tmp LEFT OUTER JOIN real ON tmp.key = real.key;
    UPDATE real SET val = new_val + old_val FROM tmp2
      WHERE old_val IS NOT NULL AND tmp2.key = real.key;
    INSERT INTO real SELECT key, new_val FROM tmp2 WHERE old_val IS NULL;

If this is run concurrently, the INSERT may fail. In that case, I rerun the transaction. Actually, I would prefer to roll back to the tmp_created savepoint, but I don't think this will pick up the new rows in the "real" table, and the INSERT will fail again.

Usually the batch size is small enough that the necessary data is still cached, and concurrent updates aren't the norm, so this approach (complete transaction rollback) is not completely infeasible. However, I still wonder if there is a more straightforward solution. Serializing the updates isn't one, I think.

Is there some form of table-based advisory locking which I could use? That way, I wouldn't lock out ordinary readers (which is crucial), but the reading part of an updating transaction would be blocked. For bonus points, deadlocks would be automatically detected by PostgreSQL (I would order the locks properly in the usual case, but I can't guarantee this for all code paths due to the modularity of the application).

Florian

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Durlacher Allee 47            tel: +49-721-96201-1
D-76131 Karlsruhe             fax: +49-721-96201-99
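PostgreSQL's advisory lock functions can serve as the table-based locking asked about above. A minimal sketch, assuming the application assigns each table a fixed integer key (42 here is arbitrary) and a server version that provides the transaction-scoped pg_advisory_xact_lock (older servers offer only the session-scoped pg_advisory_lock/pg_advisory_unlock pair, which must be released explicitly):

```sql
BEGIN;
-- Take an application-defined lock on "table 42"; concurrent writers
-- calling this block each other, while plain readers, who never call
-- it, proceed unhindered. Advisory locks go through the regular lock
-- manager, so deadlocks between them are detected automatically.
SELECT pg_advisory_xact_lock(42);

-- ... batch UPDATE/INSERT against "real" as in the example above ...

COMMIT;  -- the advisory lock is released automatically here
```

The integer-to-table mapping is purely a convention inside the application; PostgreSQL attaches no meaning to the key.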
Florian Weimer <fweimer@bfk.de> writes:

> BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED; -- SERIALIZABLE
> ...
> -- SAVEPOINT tmp_created; -- (see below)
> CREATE TEMPORARY TABLE tmp2 AS
>   SELECT tmp.key, new_val, real.val AS old_val
>   FROM tmp LEFT OUTER JOIN real ON tmp.key = real.key;
> UPDATE real SET val = new_val + old_val FROM tmp2
>   WHERE old_val IS NOT NULL AND tmp2.key = real.key;
> INSERT INTO real SELECT key, new_val FROM tmp2 WHERE old_val IS NULL;
>
> If this is run concurrently, the INSERT may fail. In this case, I
> rerun the transaction. Actually, I want to rollback to the
> tmp_created checkpoint, but I don't think this will pick up the new
> rows in the "real" table, and the INSERT will fail again.

Why do you think that? If you're running in READ COMMITTED mode then each statement takes a new snapshot.

			regards, tom lane
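The per-statement snapshots Tom refers to are what make the savepoint approach workable under READ COMMITTED. A sketch of the retry (assuming a UNIQUE or PRIMARY KEY constraint on real.key, so the conflict surfaces as a unique_violation error):

```sql
SAVEPOINT tmp_created;

-- ... the UPDATE/INSERT pair from the original batch runs here ...
-- If the INSERT raises unique_violation (SQLSTATE 23505) because a
-- concurrent transaction committed the same key:

ROLLBACK TO SAVEPOINT tmp_created;

-- Under READ COMMITTED, re-running the statements takes fresh
-- snapshots, so the concurrently committed rows are now visible:
-- they land on the UPDATE path instead of the INSERT path.
-- Under SERIALIZABLE, by contrast, the transaction keeps its original
-- snapshot, and the rerun would fail the same way.
```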
* Tom Lane:

> Why do you think that? If you're running in READ COMMITTED mode then
> each statement takes a new snapshot.

Ah, I should have explained that. I might need the SERIALIZABLE isolation level in the future (this code doesn't need it, but other things in the same transaction might require it).

In addition, it occurred to me that I get the INSERT failure only if there is a suitable PRIMARY KEY/UNIQUE constraint on the table. I haven't got that in all cases, so I need that advisory locking anyway, I fear.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Durlacher Allee 47            tel: +49-721-96201-1
D-76131 Karlsruhe             fax: +49-721-96201-99
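The point about the missing constraint matters because without one the race does not fail at all; it silently produces duplicates. A sketch of the two interleaved sessions (table and values are illustrative):

```sql
-- Session A:                          -- Session B:
BEGIN;                                 BEGIN;
-- both sessions see no row 'k',
-- so both take the INSERT path:
INSERT INTO real VALUES ('k', 1);      INSERT INTO real VALUES ('k', 1);
COMMIT;                                COMMIT;

-- Without a UNIQUE/PRIMARY KEY constraint on real.key, both commits
-- succeed and 'k' now appears twice. With the constraint, whichever
-- INSERT commits second raises unique_violation instead, making the
-- conflict detectable and retryable.
```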
* Florian Weimer:

> In addition, it occurred to me that I get the INSERT failure only if
> there is a suitable PRIMARY KEY/UNIQUE constraint on the table. I
> haven't got that in all cases, so I need that advisory locking anyway,

It seems that LOCK TABLE ... IN EXCLUSIVE MODE does exactly what I need: it locks out itself (and write access), but not read access to the table. And deadlocks are detected as well. Yay!

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Durlacher Allee 47            tel: +49-721-96201-1
D-76131 Karlsruhe             fax: +49-721-96201-99