Re: SELECT ... FOR UPDATE performance costs? alternatives? - Mailing list pgsql-general
From | D. Dante Lorenso |
---|---|
Subject | Re: SELECT ... FOR UPDATE performance costs? alternatives? |
Date | |
Msg-id | 46C3B4B8.9040105@lorenso.com Whole thread Raw |
In response to | Re: SELECT ... FOR UPDATE performance costs? alternatives? (Erik Jones <erik@myemma.com>) |
Responses |
Re: SELECT ... FOR UPDATE performance costs?
alternatives?
Re: SELECT ... FOR UPDATE performance costs? alternatives? |
List | pgsql-general |
Erik Jones wrote: > > On Aug 15, 2007, at 2:39 PM, btober@ct.metrocast.net wrote: > >> Erik Jones wrote: >>> On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote: >>> >>>> ...to ensure that only one server is processing the queue item, so >>>> inside PGSQL, use SELECT ... FOR UPDATE to lock the row... >>>> When my server is under severe load, however, this function begins >>>> to take a long time to execute and I begin to suspect that the FOR >>>> UPDATE lock might be locking the whole table and not just the row. >>>> How do I know if this is the case, how can I optimize this >>>> procedure, and how should I be doing this differently? ... >>>> >>>> Thoughts? >>> >>> SELECT ... FOR UPDATE should only be locking the rows returned by >>> your the select statement, in this case the one row. You can check >>> what locks exist on a table (and their type) with the pg_locks system >>> view. >>> >> >> Is that correct? Documentation section 12.3.1. Table-Level Locks >> states 'The list below shows the available lock modes ...Remember that >> all of these lock modes are table-level locks, even if the name >> contains the word "row"'. > > You will notice that SELECT ... FOR UPDATE is not in that list. It's > covered in the next section on row level locks. >> >> I wonder why bother with the SELECT statement at all. Why not just go >> straight to the UPDATE statement with something like >> >> UPDATE queue SET >> status = in_new_status, >> ts_start = NOW(), >> ts_end = NULL, >> hostname = COALESCE(in_hostname, hostname) >> WHERE tcq_id = (SELECT tcq_id FROM queue q WHERE q.status = >> in_status ORDER BY tcq_id ASC LIMIT 1); >> >> He may need to trap an exception for the "not found" case, but what's >> the big deal with that? >> >> UPDATE statements acquire a ROW EXCLUSIVE on the table, which >> conflicts, among other things, with ROW EXCLUSIVE, so it will block >> other UPDATE statements initiated by other transactions. > > That won't work because the update won't lock the row until the select > returns. So, if two process execute that at the same time they will > both execute the subquery and return the same result, the first will > update it and the second will then (redundantly) update it. It also won't work because I need to change AND read the row. If I only do the update, I don't know what was updated. I still need to return the tcq_id to my application. Maybe the update could look like this: UPDATE queue SET status = in_new_status, ts_start = NOW(), ts_end = NULL, hostname = COALESCE(in_hostname, hostname), WHERE status = in_status; But there I don't have the LIMIT 1, and I also don't know which rows got updated. I supposed there might be some magic to find the OID of the affected rows, but I don't know how what would be done. I need logic like "atomic test and set" or pop 1 item off the queue atomically and tell me what that item was. In my situation, there are a dozen or so machines polling this queue periodically looking for work to do. As more polling is occurring, the locks seem to be taking longer so I was worried table-level locks might be occurring. -- Dante
pgsql-general by date: