Thread: SELECT FOR UPDATE performance is bad

SELECT FOR UPDATE performance is bad

From: Mario Splivalo
For the purpose of the application I need to establish some form of
serialization; therefore I use FOR UPDATE. The query, inside the
function, is like this:

pulitzer2=# explain analyze select id FROM messages JOIN
ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=32131.04..34281.86 rows=627 width=16) (actual
time=742.806..1491.864 rows=58005 loops=1)
   Hash Cond: ("outer".message_id = "inner".id)
   ->  Seq Scan on ticketing_codes_played  (cost=0.00..857.17 rows=57217
width=10) (actual time=0.024..209.331 rows=58005 loops=1)
   ->  Hash  (cost=32090.60..32090.60 rows=16177 width=10) (actual
time=742.601..742.601 rows=65596 loops=1)
         ->  Bitmap Heap Scan on messages  (cost=4153.51..32090.60
rows=16177 width=10) (actual time=160.555..489.459 rows=65596 loops=1)
               Recheck Cond: ((service_id = 1102) AND (receiving_time >=
'2006-03-01 00:00:00+01'::timestamp with time zone) AND (receiving_time
<= '2006-06-30 00:00:00+02'::timestamp with time zone))
               ->  BitmapAnd  (cost=4153.51..4153.51 rows=16177 width=0)
(actual time=156.900..156.900 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_service_id
(cost=0.00..469.31 rows=68945 width=0) (actual time=16.661..16.661
rows=66492 loops=1)
                           Index Cond: (service_id = 1102)
                     ->  Bitmap Index Scan on
idx_messages_receiving_time  (cost=0.00..3683.95 rows=346659 width=0)
(actual time=137.526..137.526 rows=360754 loops=1)
                           Index Cond: ((receiving_time >= '2006-03-01
00:00:00+01'::timestamp with time zone) AND (receiving_time <=
'2006-06-30 00:00:00+02'::timestamp with time zone))
 Total runtime: 6401.954 ms
(12 rows)



Now, this query takes between 8 and 30 seconds, which is a lot, since
during the day we have almost 20 requests per minute. I noticed that
during the execution of the above-mentioned query the I/O goes berserk;
iostat tells me the load is around 60%. I tried playing with the WAL
configuration parameters, and even put the log on separate disk spindles,
but it did nothing.

Should I reconsider the need for this particular lock, or is there
something more I could do to speed things up?

    Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: SELECT FOR UPDATE performance is bad

From: Tom Lane
Mario Splivalo <mario.splivalo@mobart.hr> writes:
> For the purpose of the application I need to establish some form of
> serialization; therefore I use FOR UPDATE. The query, inside the
> function, is like this:

> pulitzer2=# explain analyze select id FROM messages JOIN
> ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
> receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

>  Hash Join  (cost=32131.04..34281.86 rows=627 width=16) (actual
> time=742.806..1491.864 rows=58005 loops=1)
                              ^^^^^

> Now, this query takes between 8 and 30 seconds, which is a lot, since
> during the day we have almost 20 requests per minute.

Acquiring a row lock separately for each of 58000 rows is not going to
be a cheap operation.  Especially not if anyone else is locking any of
the same rows and thereby blocking you.  If there is concurrent locking,
you're also running a big risk of deadlock because two processes might
try to lock the same rows in different orders.

Are you really intending to update all 58000 rows?  If not, what is
the serialization requirement exactly (ie, what are you trying to
accomplish)?  Seems like something about this app needs to be
redesigned.

            regards, tom lane

Re: SELECT FOR UPDATE performance is bad

From: Tom Lane
Mario Splivalo <mario.splivalo@mobart.hr> writes:
>> If there is concurrent locking,
>> you're also running a big risk of deadlock because two processes might
>> try to lock the same rows in different orders.

> I think there is no risk of a deadlock, since that particular function
> is called from the middleware (functions are used as an interface to the
> database), and the lock order is always the same.

No, you don't even know what the order is, let alone that it's always
the same.

> Now, I just need serialization: I need to have clients 'line up' in
> order to perform something in the database. Actually, users are
> sending in codes from newspapers, beer cans, Cola cans, and so on, and the
> database needs to check whether the code has already been played. Since
> the system is designed so that it can run multiple code-games (so a
> similar code could exist for both the coke-game and the beer-game), I'm
> using the messages table to see which code-game (i.e. service) a
> particular code belongs to.

I'd suggest using a table that has exactly one row per "code-game", and
doing a SELECT FOR UPDATE on that row to establish the lock you need.
This need not have anything to do with the tables/rows you are actually
intending to update --- although obviously such a convention is pretty
fragile if you have updates coming from a variety of code.  I think it's
reasonably safe when you're funneling all the operations through a bit
of middleware.
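
A minimal sketch of the idea (the lock table and the game id here are
invented, just for illustration):

    -- one row per code-game, created when the game is set up
    CREATE TABLE code_game_locks (
        game_id integer PRIMARY KEY
    );

    -- inside the function, grab the per-game lock before doing anything else
    BEGIN;
    SELECT game_id FROM code_game_locks WHERE game_id = 1102 FOR UPDATE;
    -- ... check whether the code was already played, record it, etc. ...
    COMMIT;

Only one transaction at a time can hold the row lock for a given game, so
everything funneled through the function is serialized per game, and the
lock goes away automatically at COMMIT or ROLLBACK.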

            regards, tom lane

Re: SELECT FOR UPDATE performance is bad

From: PFC
Suppose you have a table codes :
(
    game_id    INT,
    code        TEXT,
    used        BOOL NOT NULL DEFAULT 'f',
    prize        ...
    ...
    PRIMARY KEY (game_id, code)
)

    Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND code=...

    Then check the rowcount : if one row was updated, the code was not used
yet. If no row was updated, the code either did not exist, or was already
used.
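
Wrapped in a plpgsql function, the rowcount check could look roughly like
this (a sketch only, assuming the codes table above; try_use_code is a
made-up name):

    CREATE OR REPLACE FUNCTION try_use_code(p_game_id integer, p_code text)
    RETURNS boolean AS $$
    BEGIN
        -- try to mark the code as used
        UPDATE codes SET used = 't'
         WHERE used = 'f' AND game_id = p_game_id AND code = p_code;
        -- FOUND is true only if the UPDATE actually changed a row
        RETURN FOUND;
    END;
    $$ LANGUAGE plpgsql;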

Another option : create a table used_codes like this :

(
    game_id    INT,
    code        TEXT,
    ...
    PRIMARY KEY (game_id, code)
)

    Then, when trying to use a code, INSERT into this table. If you get a
constraint violation on the uniqueness of the primary key, your code has
already been used.

    Both solutions have a big advantage : they don't require messing with
locks and are extremely simple. The one with UPDATE is IMHO better,
because it doesn't abort the current transaction (although you could use a
savepoint in the INSERT case to intercept the error).
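
For the INSERT variant, the savepoint dance would go roughly like this (a
sketch; used_codes is the table above, the code value is invented):

    BEGIN;
    -- ... other work in the same transaction ...
    SAVEPOINT code_attempt;
    INSERT INTO used_codes (game_id, code) VALUES (1, 'ABC123');
    -- on a unique_violation error the client issues
    --     ROLLBACK TO SAVEPOINT code_attempt;
    -- and knows the code was already played; the rest of the
    -- transaction is unaffected either way
    RELEASE SAVEPOINT code_attempt;
    COMMIT;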



Re: SELECT FOR UPDATE performance is bad

From: Christopher Kings-Lynne
> Suppose you have a table codes :
> (
>     game_id    INT,
>     code        TEXT,
>     used        BOOL NOT NULL DEFAULT 'f',
>     prize        ...
>     ...
>     PRIMARY KEY (game_id, code)
> )
>
>     Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND
> code=...
>
>     Then check the rowcount : if one row was updated, the code was not
> used yet. If no row was updated, the code either did not exist, or was
> already used.

You can use a stored procedure with exceptions, no?

Try this:

http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
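
Adapted to the codes case it would be something like this (just a sketch in
the spirit of that example, using the hypothetical used_codes table from the
previous message):

    CREATE OR REPLACE FUNCTION play_code(p_game_id integer, p_code text)
    RETURNS boolean AS $$
    BEGIN
        -- record the code as played
        INSERT INTO used_codes (game_id, code) VALUES (p_game_id, p_code);
        RETURN true;               -- first time this code was played
    EXCEPTION WHEN unique_violation THEN
        RETURN false;              -- the code was already played
    END;
    $$ LANGUAGE plpgsql;

The EXCEPTION clause makes plpgsql run the block inside an implicit
savepoint, so it is the same trade-off as the explicit-savepoint approach
mentioned earlier.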

Chris



Re: SELECT FOR UPDATE performance is bad

From: Mario Splivalo
On Tue, 2006-04-18 at 11:33 -0400, Tom Lane wrote:
> Mario Splivalo <mario.splivalo@mobart.hr> writes:
> >> If there is concurrent locking,
> >> you're also running a big risk of deadlock because two processes might
> >> try to lock the same rows in different orders.
>
> > I think there is no risk of a deadlock, since that particular function
> > is called from the middleware (functions are used as an interface to the
> > database), and the lock order is always the same.
>
> No, you don't even know what the order is, let alone that it's always
> the same.

You got me confused here! :) If I have just one function that acts
as an interface to the middleware, and all the operations on the database
are done through that one function, and I carefully design that function
so that it grabs the lock first and then does the work, am I not pretty
sure that I won't be having any deadlocks?

>
> > Now, I just need serialization: I need to have clients 'line up' in
> > order to perform something in the database. Actually, users are
> > sending in codes from newspapers, beer cans, Cola cans, and so on, and the
> > database needs to check whether the code has already been played. Since
> > the system is designed so that it can run multiple code-games (so a
> > similar code could exist for both the coke-game and the beer-game), I'm
> > using the messages table to see which code-game (i.e. service) a
> > particular code belongs to.
>
> I'd suggest using a table that has exactly one row per "code-game", and
> doing a SELECT FOR UPDATE on that row to establish the lock you need.
> This need not have anything to do with the tables/rows you are actually
> intending to update --- although obviously such a convention is pretty
> fragile if you have updates coming from a variety of code.  I think it's
> reasonably safe when you're funneling all the operations through a bit
> of middleware.

I tend to design my applications so that I don't have "flying SQL" in my
java/python/c#/php/wherever code; all the database work is done through
functions which are designed as interfaces. Those functions are also
designed so they don't block each other. So, since I need the
serialization, I'll do as you suggested and use a lock table with
exactly one row per "code-game".
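
Something like this, then, as the interface function (just a sketch with
invented names, combining the lock table with the used_codes idea from
PFC's mail):

    CREATE OR REPLACE FUNCTION check_code(p_game_id integer, p_code text)
    RETURNS boolean AS $$
    BEGIN
        -- serialize: grab the per-game lock row before doing anything else
        PERFORM 1 FROM code_game_locks
         WHERE game_id = p_game_id FOR UPDATE;
        -- then do the actual work while holding the lock
        IF EXISTS (SELECT 1 FROM used_codes
                    WHERE game_id = p_game_id AND code = p_code) THEN
            RETURN false;          -- already played
        END IF;
        INSERT INTO used_codes (game_id, code) VALUES (p_game_id, p_code);
        RETURN true;               -- played for the first time
    END;
    $$ LANGUAGE plpgsql;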

Just one more question here; it has to do with postgres internals, but
I'd still like to know why postgres does such huge I/O (in my log file
I see a lot of messages that say "LOG:  archived transaction log
file") when performing that big FOR UPDATE.

    Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: SELECT FOR UPDATE performance is bad

From: Mario Splivalo
On Tue, 2006-04-18 at 19:00 +0200, PFC wrote:
> Suppose you have a table codes :
> (
>     game_id    INT,
>     code        TEXT,
>     used        BOOL NOT NULL DEFAULT 'f',
>     prize        ...
>     ...
>     PRIMARY KEY (game_id, code)
> )
>
>     Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND code=...
>
>     Then check the rowcount : if one row was updated, the code was not used
> yet. If no row was updated, the code either did not exist, or was already
> used.
>
> Another option : create a table used_codes like this :
>
> (
>     game_id    INT,
>     code        TEXT,
>     ...
>     PRIMARY KEY (game_id, code)
> )
>
>     Then, when trying to use a code, INSERT into this table. If you get a
> constraint violation on the uniqueness of the primary key, your code has
> already been used.
>
>     Both solutions have a big advantage : they don't require messing with
> locks and are extremely simple. The one with UPDATE is IMHO better,
> because it doesn't abort the current transaction (although you could use a
> savepoint in the INSERT case to intercept the error).
>
>

This works perfectly, but sometimes the game has no codes, and I still
need to know exactly who came first, who was second, and so on... So a
locking table as Tom suggested is, I guess, a perfect solution for my
situation...

    Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: SELECT FOR UPDATE performance is bad

From: "Jim C. Nasby"
On Wed, Apr 19, 2006 at 10:20:54AM +0200, Mario Splivalo wrote:
> This works perfectly, but sometimes the game has no codes, and I still
> need to know exactly who came first, who was second, and so on... So a
> locking table as Tom suggested is, I guess, a perfect solution for my
> situation...

Depending on your performance requirements, you should look at
contrib/userlock as well, since it will probably be much more performant
than locking a row in a table.
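
The idea, sketched with the advisory-lock calls that replaced
contrib/userlock in later PostgreSQL releases (on 8.1 the contrib module's
own functions play the same role; the key value is arbitrary):

    -- take a per-game exclusive lock that is not tied to any table row
    SELECT pg_advisory_lock(1102);    -- 1102 = the game/service id
    -- ... check and record the code ...
    SELECT pg_advisory_unlock(1102);  -- must be released explicitly:
                                      -- these locks survive COMMIT
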
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461