Re: SELECT ... FOR UPDATE performance costs? alternatives? - Mailing list pgsql-general

From D. Dante Lorenso
Subject Re: SELECT ... FOR UPDATE performance costs? alternatives?
Date
Msg-id 46C3B4B8.9040105@lorenso.com
Whole thread Raw
In response to Re: SELECT ... FOR UPDATE performance costs? alternatives?  (Erik Jones <erik@myemma.com>)
Responses Re: SELECT ... FOR UPDATE performance costs? alternatives?
Re: SELECT ... FOR UPDATE performance costs? alternatives?
List pgsql-general
Erik Jones wrote:
>
> On Aug 15, 2007, at 2:39 PM, btober@ct.metrocast.net wrote:
>
>> Erik Jones wrote:
>>> On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:
>>>
>>>> ...to ensure that only one server is processing the queue item, so
>>>> inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
>>>> When my server is under severe load, however, this function begins
>>>> to take a long time to execute and I begin to suspect that the FOR
>>>> UPDATE lock might be locking the whole table and not just the row.
>>>> How do I know if this is the case, how can I optimize this
>>>> procedure, and how should I be doing this differently?  ...
>>>>
>>>> Thoughts?
>>>
>>> SELECT ... FOR UPDATE should only be locking the rows returned by
>>> your the select statement, in this case the one row.  You can check
>>> what locks exist on a table (and their type) with the pg_locks system
>>> view.
>>>
>>
>> Is that correct? Documentation section 12.3.1. Table-Level Locks
>> states 'The list below shows the available lock modes ...Remember that
>> all of these lock modes are table-level locks, even if the name
>> contains the word "row"'.
>
> You will notice that SELECT ... FOR UPDATE is not in that list.  It's
> covered in the next section on row level locks.
>>
>> I wonder why bother with the SELECT statement at all. Why not just go
>> straight to the UPDATE statement with something like
>>
>>    UPDATE queue SET
>>      status = in_new_status,
>>      ts_start = NOW(),
>>      ts_end = NULL,
>>      hostname = COALESCE(in_hostname, hostname)
>>    WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status =
>> in_status ORDER BY tcq_id ASC LIMIT 1);
>>
>> He may need to trap an exception for the "not found" case, but what's
>> the big deal with that?
>>
>> UPDATE statements acquire a ROW EXCLUSIVE on the table, which
>> conflicts, among other things, with ROW EXCLUSIVE, so it will block
>> other UPDATE statements initiated by other transactions.
>
> That won't work because the update won't lock the row until the select
> returns.  So, if two process execute that at the same time they will
> both execute the subquery and return the same result, the first will
> update it and the second will then (redundantly) update it.

It also won't work because I need to change AND read the row.  If I only
do the update, I don't know what was updated.  I still need to return
the tcq_id to my application.

Maybe the update could look like this:

UPDATE queue SET
   status = in_new_status,
   ts_start = NOW(),
   ts_end = NULL,
   hostname = COALESCE(in_hostname, hostname),
WHERE status = in_status;

But there I don't have the LIMIT 1, and I also don't know which rows got
updated.  I supposed there might be some magic to find the OID of the
affected rows, but I don't know how what would be done.

I need logic like "atomic test and set" or pop 1 item off the queue
atomically and tell me what that item was.

In my situation, there are a dozen or so machines polling this queue
periodically looking for work to do.  As more polling is occurring, the
locks seem to be taking longer so I was worried table-level locks might
be occurring.

-- Dante


pgsql-general by date:

Previous
From: "서기석"
Date:
Subject: please! SPI_finish is strange
Next
From: Tom Lane
Date:
Subject: Re: please! SPI_finish is strange