Re: race conditions, intersect in subqueries - Mailing list pgsql-general
From | Stephan Szabo |
---|---|
Subject | Re: race conditions, intersect in subqueries |
Date | |
Msg-id | Pine.BSF.4.10.10009081603410.99831-100000@megazone23.bigpanda.com Whole thread Raw |
In response to | race conditions, intersect in subqueries (Cristóvão Dalla Costa <cdallacosta@bigfoot.com>) |
List | pgsql-general |
On Fri, 8 Sep 2000, [iso-8859-1] Crist�v�o Dalla Costa wrote: > I'm working with an application I wrote which does something along these > lines: > > SELECT ID FROM ITEM WHERE URL='X' FOR UPDATE > IF (ROW RETURNED) { > $ID = ITEM.ID > } ELSE { > SELECT nextval ('item_id_seq') > $ID = nextval > INSERT INTO ITEM.... > } > DO OTHER STUFF WITH $ID > > So, I check if an item with a given url exists. If it does, I get its id and > use it later. If it doesn't, I insert a new item and proceed with the new > id. Everything happens inside a transaction. Now, there is a race condition > where the first line is executed simultaneously by two processes, looking > for the same url, and resulting in duplicate lines. So far, there are about > 40 duplicates in a 80,000 row database, and short of manually correcting > them, I don't know what to do to fix the race condition. I'm assuming that this only happens when both are doing this and the row didn't already exist, both insert, because select for update isn't going to lock the non-existant row and they aren't going to block for each other since no rows match. Do the rows have the same sequence number or just different sequences and the same data? I'm not sure what else select for update could do in the latter case other than allow both through since there are no rows that conflict. If you want a quick hack (other than locking the table explicitly), you could effectively make a simple locking mechanism with another table, always have a row for each "lock" you want, and select for update that, basically locking out one until the other is done. > Finally, it seems I cannot do INTERSECT on subqueries, sice the following > fails with a parse error "at or near INTERSECT", and the subquery by itself > works. Yes, currently things like INTERSECT, EXCEPT, etc, are only allowable at the top level of the query. This is likely to go away in 7.2 (the dreaded querytree work :) ). > SELECT * FROM item WHERE id IN (SELECT item_id FROM item_words, words WHERE > words.id = words_id AND words.word='x' INTERSECT SELECT item_id FROM > item_words, words WHERE words.id = words_id AND words.word='y') > > Basically, I'm using the above query to look for words in a reverse index, > sometimes with as many as 10 different words, causing a lot of rows to be > generated, to be later filtered by the intersects. Are there any better ways > to do that, performance-wise? I'm not sure if it'll be fast, but is the subquery equivalent to (assuming item_id is on item_words): SELECT distinct a.item_id FROM item_words a, words b, item_words c, words d where b.id=a.words_id AND b.word='x' AND d.id=c.words_id AND d.word='y' AND a.item_id=c.item_id Well, then converting that IN into an EXISTS... and adding the id=a.item_id at the end. > BTW, I'm not sure whether this is the appropriate mailing list to report > this, but the query optimizer should read > SELECT * FROM x WHERE id IN (SELECT x_id FROM Y) > as > SELECT * FROM x WHERE EXISTS (SELECT * FROM Y WHERE x_id = x.id) > when the tables are "large", and the necessary indexes exist. This is a known problem, it's mentioned in the FAQ ("Why are my subqueries using IN so slow") with the note that we hope to fix this problem, but it hasn't been tackled yet.
pgsql-general by date: