Re: race conditions, intersect in subqueries - Mailing list pgsql-general

From Stephan Szabo
Subject Re: race conditions, intersect in subqueries
Date
Msg-id Pine.BSF.4.10.10009081603410.99831-100000@megazone23.bigpanda.com
Whole thread Raw
In response to race conditions, intersect in subqueries  (Cristóvão Dalla Costa <cdallacosta@bigfoot.com>)
List pgsql-general
On Fri, 8 Sep 2000, [iso-8859-1] Crist�v�o Dalla Costa wrote:

> I'm working with an application I wrote which does something along these
> lines:
>
> SELECT ID FROM ITEM WHERE URL='X' FOR UPDATE
> IF (ROW RETURNED) {
>     $ID = ITEM.ID
> } ELSE {
>     SELECT nextval ('item_id_seq')
>     $ID = nextval
>     INSERT INTO ITEM....
> }
> DO OTHER STUFF WITH $ID
>
> So, I check if an item with a given url exists. If it does, I get its id and
> use it later. If it doesn't, I insert a new item and proceed with the new
> id. Everything happens inside a transaction. Now, there is a race condition
> where the first line is executed simultaneously by two processes, looking
> for the same url, and resulting in duplicate lines. So far, there are about
> 40 duplicates in a 80,000 row database, and short of manually correcting
> them, I don't know what to do to fix the race condition.

I'm assuming that this only happens when both are doing this and the row
didn't already exist, both insert, because select for update isn't going
to lock the non-existant row and they aren't going to block for each
other since no rows match.  Do the rows have the same sequence number or
just different sequences and the same data?  I'm not sure what else
select for update could do in the latter case other than allow both
through since there are no rows that conflict.

If you want a quick hack (other than locking the table explicitly), you
could effectively make a simple locking mechanism with another table,
always have a row for each "lock" you want, and select for update that,
basically locking out one until the other is done.

> Finally, it seems I cannot do INTERSECT on subqueries, sice the following
> fails with a parse error "at or near INTERSECT", and the subquery by itself
> works.

Yes, currently things like INTERSECT, EXCEPT, etc, are only allowable at
the top level of the query.  This is likely to go away in 7.2 (the dreaded
querytree work :) ).

> SELECT * FROM item WHERE id IN (SELECT item_id FROM item_words, words WHERE
> words.id = words_id AND words.word='x' INTERSECT SELECT item_id FROM
> item_words, words WHERE words.id = words_id AND words.word='y')
>
> Basically, I'm using the above query to look for words in a reverse index,
> sometimes with as many as 10 different words, causing a lot of rows to be
> generated, to be later filtered by the intersects. Are there any better ways
> to do that, performance-wise?

I'm not sure if it'll be fast, but is the subquery equivalent to
(assuming item_id is on item_words):
SELECT distinct a.item_id FROM item_words a, words b, item_words c,
 words d where b.id=a.words_id AND b.word='x' AND d.id=c.words_id AND
 d.word='y' AND a.item_id=c.item_id

Well, then converting that IN into an EXISTS... and adding the
id=a.item_id at the end.


> BTW, I'm not sure whether this is the appropriate mailing list to report
> this, but the query optimizer should read
> SELECT * FROM x WHERE id IN (SELECT x_id FROM Y)
> as
> SELECT * FROM x WHERE EXISTS (SELECT * FROM Y WHERE x_id = x.id)
> when the tables are "large", and the necessary indexes exist.

This is a known problem, it's mentioned in the FAQ
("Why are my subqueries using IN so slow") with the note that
we hope to fix this problem, but it hasn't been tackled yet.



pgsql-general by date:

Previous
From: Cristóvão Dalla Costa
Date:
Subject: race conditions, intersect in subqueries
Next
From: Stephan Szabo
Date:
Subject: Re: Column name case conversion