Thread: Is "WITH () UPDATE" Thread Safe ?
Hello,
I have a lot of tickets, i need to take a batch of tickets and process them.
So the process is :
SELECT ONLY 100 tickets
PROCESS ticket
MARK THEM AS « done »
I’m selecting the tickets with :
WITH t0 AS (
SELECT t.id,
RANDOM() AS rank,
EXTRACT(EPOCH FROM NOW() - b.created_at) as elapsed
FROM tickets AS t
LEFT JOIN batch as b ON b.id = t.batch_id
WHERE (
t.status = 'waiting' OR
(t.status = 'processing' AND t.locked_until IS NOT NULL AND t.locked_until <= NOW())
) AND t.send_at < NOW()
AND (t.send_before IS NULL OR t.send_before > NOW())
ORDER BY
t.priority DESC,
rank ASC
LIMIT 100
FOR UPDATE OF t
)
UPDATE tickets AS t1
SET status = 'processing',
locked_until = NOW() + '1 HOUR’,
extra = t1.extra || hstore('elapsed', t0.elapsed || '') || hstore('rank', rank || '')
FROM t0
WHERE t1.id = t0.id
RETURNING t1.*;
I wonder if this query is thread safe, Can a ticket be updated between the SELECT part (t0) and the UPDATE part ?
If this query is not « thread safe » how can i do this ?
--
Paul
Paul
Paul GOERGLER wrote: > I have a lot of tickets, i need to take a batch of tickets and process them. > So the process is : > SELECT ONLY 100 tickets > PROCESS ticket > MARK THEM AS « done » > > I’m selecting the tickets with : > > WITH t0 AS ( > SELECT t.id, > RANDOM() AS rank, > EXTRACT(EPOCH FROM NOW() - b.created_at) as elapsed > FROM tickets AS t > LEFT JOIN batch as b ON b.id = t.batch_id > WHERE ( > t.status = 'waiting' OR > (t.status = 'processing' AND t.locked_until IS NOT NULL AND t.locked_until <= NOW()) > ) AND t.send_at < NOW() > AND (t.send_before IS NULL OR t.send_before > NOW()) > ORDER BY > t.priority DESC, > rank ASC > LIMIT 100 > FOR UPDATE OF t > ) > UPDATE tickets AS t1 > SET status = 'processing', > locked_until = NOW() + '1 HOUR’, > extra = t1.extra || hstore('elapsed', t0.elapsed || '') || hstore('rank', rank || '') > FROM t0 > WHERE t1.id = t0.id > RETURNING t1.*; > > > I wonder if this query is thread safe, Can a ticket be updated between the SELECT part (t0) and the > UPDATE part ? > If this query is not « thread safe » how can i do this ? There is no race condition in your query because you used SELECT ... FOR UPDATE. That causes the rows found in the WITH clause to be locked against concurrent modification. So you should be fine. Yours, Laurenz Albe
Ok thanks.
--
Paul GOERGLER
Paul GOERGLER
De: Albe Laurenz <laurenz.albe@wien.gv.at>
Répondre: Albe Laurenz <laurenz.albe@wien.gv.at>>
Date: 1 décembre 2014 at 13:21:07
À: Paul GOERGLER *EXTERN* <pgoergler@gmail.com>>, pgsql-general@postgresql.org <pgsql-general@postgresql.org>>
Sujet: RE: [GENERAL] Is "WITH () UPDATE" Thread Safe ?
Paul GOERGLER wrote:
> I have a lot of tickets, i need to take a batch of tickets and process them.
> So the process is :
> SELECT ONLY 100 tickets
> PROCESS ticket
> MARK THEM AS « done »
>
> I’m selecting the tickets with :
>
> WITH t0 AS (
> SELECT t.id,
> RANDOM() AS rank,
> EXTRACT(EPOCH FROM NOW() - b.created_at) as elapsed
> FROM tickets AS t
> LEFT JOIN batch as b ON b.id = t.batch_id
> WHERE (
> t.status = 'waiting' OR
> (t.status = 'processing' AND t.locked_until IS NOT NULL AND t.locked_until <= NOW())
> ) AND t.send_at < NOW()
> AND (t.send_before IS NULL OR t.send_before > NOW())
> ORDER BY
> t.priority DESC,
> rank ASC
> LIMIT 100
> FOR UPDATE OF t
> )
> UPDATE tickets AS t1
> SET status = 'processing',
> locked_until = NOW() + '1 HOUR’,
> extra = t1.extra || hstore('elapsed', t0.elapsed || '') || hstore('rank', rank || '')
> FROM t0
> WHERE t1.id = t0.id
> RETURNING t1.*;
>
>
> I wonder if this query is thread safe, Can a ticket be updated between the SELECT part (t0) and the
> UPDATE part ?
> If this query is not « thread safe » how can i do this ?
There is no race condition in your query because you used SELECT ... FOR UPDATE.
That causes the rows found in the WITH clause to be locked against concurrent modification.
So you should be fine.
Yours,
Laurenz Albe
Albe Laurenz *EXTERN* wrote > Paul GOERGLER wrote: >> I have a lot of tickets, i need to take a batch of tickets and process >> them. >> So the process is : >> SELECT ONLY 100 tickets >> PROCESS ticket >> MARK THEM AS « done » >> >> I’m selecting the tickets with : >> >> WITH t0 AS ( >> SELECT t.id, >> RANDOM() AS rank, >> EXTRACT(EPOCH FROM NOW() - b.created_at) as elapsed >> FROM tickets AS t >> LEFT JOIN batch as b ON b.id = t.batch_id >> WHERE ( >> t.status = 'waiting' OR >> (t.status = 'processing' AND t.locked_until IS NOT NULL AND >> t.locked_until <= NOW()) >> ) AND t.send_at < NOW() >> AND (t.send_before IS NULL OR t.send_before > NOW()) >> ORDER BY >> t.priority DESC, >> rank ASC >> LIMIT 100 >> FOR UPDATE OF t >> ) >> UPDATE tickets AS t1 >> SET status = 'processing', >> locked_until = NOW() + '1 HOUR’, >> extra = t1.extra || hstore('elapsed', t0.elapsed || '') || >> hstore('rank', rank || '') >> FROM t0 >> WHERE t1.id = t0.id >> RETURNING t1.*; >> >> >> I wonder if this query is thread safe, Can a ticket be updated between >> the SELECT part (t0) and the >> UPDATE part ? >> If this query is not « thread safe » how can i do this ? > > There is no race condition in your query because you used SELECT ... FOR > UPDATE. > > That causes the rows found in the WITH clause to be locked against > concurrent modification. > > So you should be fine. > > Yours, > Laurenz Albe I was under the impression that the presence of FOR UPDATE in this situation was unnecessary since the execution of the update occurs in the same statement as the select and thus the relevant data will be locked at execution. The FOR UPDATE is for situations where other code needs to intervene between the select and a subsequent update. The documentation is silent on this distinction, but... Note that the use of a CTE in this example is a convenience and that the top level command is still UPDATE, not SELECT. It may be worthwhile to update the UPDATE page's WITH commentary to note that (if correct) there is no need for a FOR UPDATE clause on the contained subquery (yes, that was quite a mouthful...) David J. -- View this message in context: http://postgresql.nabble.com/Is-WITH-UPDATE-Thread-Safe-tp5828738p5829038.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
David G Johnston wrote: > Albe Laurenz *EXTERN* wrote >> Paul GOERGLER wrote: >>> I have a lot of tickets, i need to take a batch of tickets and process >>> them. >>> So the process is : >>> SELECT ONLY 100 tickets >>> PROCESS ticket >>> MARK THEM AS « done » >>> >>> I’m selecting the tickets with : >>> >>> WITH t0 AS ( >>> SELECT t.id, [...] >>> FROM tickets AS t [...] >>> FOR UPDATE OF t >>> ) >>> UPDATE tickets AS t1 [...] >>> FROM t0 [...] >>> >>> >>> I wonder if this query is thread safe, Can a ticket be updated between >>> the SELECT part (t0) and the >>> UPDATE part ? >>> If this query is not « thread safe » how can i do this ? >> There is no race condition in your query because you used SELECT ... FOR >> UPDATE. >> >> That causes the rows found in the WITH clause to be locked against >> concurrent modification. >> >> So you should be fine. > I was under the impression that the presence of FOR UPDATE in this situation > was unnecessary since the execution of the update occurs in the same > statement as the select and thus the relevant data will be locked at > execution. > > The FOR UPDATE is for situations where other code needs to intervene between > the select and a subsequent update. > > The documentation is silent on this distinction, but... > > Note that the use of a CTE in this example is a convenience and that the top > level command is still UPDATE, not SELECT. > > It may be worthwhile to update the UPDATE page's WITH commentary to note > that (if correct) there is no need for a FOR UPDATE clause on the contained > subquery (yes, that was quite a mouthful...) Well, the SELECT and the UPDATE are part of the same statement, so they run in the same transaction and with the same snapshot. But it could happen that (for example) two of the above queries are running concurrently. Then both could find the same row in the SELECT and modify it (because of the exclusive row lock, one of them will have to wait until the other is done). Then this ticket will be processed twice. With FOR UPDATE, no two SELECT statements can find the same row, because by the time the lock taken by the first SELECT is released, the status of the ticket will have changed and the second SELECT will not find it. Yours, Laurenz Albe