Thread: SELECT FOR UPDATE performance is bad
For the purpose of the application I need to establish some form of
serialization, therefore I use FOR UPDATE. The query, inside the
function, is like this:

pulitzer2=# explain analyze select id FROM messages JOIN
ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=32131.04..34281.86 rows=627 width=16) (actual time=742.806..1491.864 rows=58005 loops=1)
   Hash Cond: ("outer".message_id = "inner".id)
   ->  Seq Scan on ticketing_codes_played  (cost=0.00..857.17 rows=57217 width=10) (actual time=0.024..209.331 rows=58005 loops=1)
   ->  Hash  (cost=32090.60..32090.60 rows=16177 width=10) (actual time=742.601..742.601 rows=65596 loops=1)
         ->  Bitmap Heap Scan on messages  (cost=4153.51..32090.60 rows=16177 width=10) (actual time=160.555..489.459 rows=65596 loops=1)
               Recheck Cond: ((service_id = 1102) AND (receiving_time >= '2006-03-01 00:00:00+01'::timestamp with time zone) AND (receiving_time <= '2006-06-30 00:00:00+02'::timestamp with time zone))
               ->  BitmapAnd  (cost=4153.51..4153.51 rows=16177 width=0) (actual time=156.900..156.900 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_service_id  (cost=0.00..469.31 rows=68945 width=0) (actual time=16.661..16.661 rows=66492 loops=1)
                           Index Cond: (service_id = 1102)
                     ->  Bitmap Index Scan on idx_messages_receiving_time  (cost=0.00..3683.95 rows=346659 width=0) (actual time=137.526..137.526 rows=360754 loops=1)
                           Index Cond: ((receiving_time >= '2006-03-01 00:00:00+01'::timestamp with time zone) AND (receiving_time <= '2006-06-30 00:00:00+02'::timestamp with time zone))
 Total runtime: 6401.954 ms
(12 rows)

Now, this query takes between 8 and 30 seconds, which is a lot, since
during the day we have almost 20 requests per minute. I notice that
during the execution of the above-mentioned query the I/O goes berserk;
iostat tells me the load is around 60%. I tried playing with the WAL
configuration parameters, and even put the log on separate disk
spindles, but it did nothing.

Shall I reconsider the need for the exact lock I developed, or is there
something more I could do to speed things up?

	Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."
Mario Splivalo <mario.splivalo@mobart.hr> writes:
> For the purpose of the application I need to establish some form of
> serialization, therefore I use FOR UPDATE. The query, inside the
> function, is like this:

> pulitzer2=# explain analyze select id FROM messages JOIN
> ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
> receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

>  Hash Join  (cost=32131.04..34281.86 rows=627 width=16) (actual
> time=742.806..1491.864 rows=58005 loops=1)
                              ^^^^^

> Now, this query takes between 8 and 30 seconds, which is a lot, since
> during the day we have almost 20 requests per minute.

Acquiring a row lock separately for each of 58000 rows is not going to
be a cheap operation.  Especially not if anyone else is locking any of
the same rows and thereby blocking you.  If there is concurrent locking,
you're also running a big risk of deadlock because two processes might
try to lock the same rows in different orders.

Are you really intending to update all 58000 rows?  If not, what is the
serialization requirement exactly (ie, what are you trying to
accomplish)?  Seems like something about this app needs to be
redesigned.

			regards, tom lane
Mario Splivalo <mario.splivalo@mobart.hr> writes:
>> If there is concurrent locking, you're also running a big risk of
>> deadlock because two processes might try to lock the same rows in
>> different orders.

> I think there is no risk of a deadlock, since that particular function
> is called from the middleware (functions are used as the interface to
> the database), and the lock order is always the same.

No, you don't even know what the order is, let alone that it's always
the same.

> Now, I just need to have serialization, I need to have clients 'line up'
> in order to perform something in the database. Actually, users are
> sending codes from the newspaper, beer cans, Cola cans, and stuff, and
> the database needs to check whether the code has already been played.
> Since the system is designed so that it could run multiple code-games
> (and then a similar code could exist for the coke-game and the
> beer-game), I'm using the messages table to see which code-game (i.e.
> service) that particular code belongs to.

I'd suggest using a table that has exactly one row per "code-game", and
doing a SELECT FOR UPDATE on that row to establish the lock you need.
This need not have anything to do with the tables/rows you are actually
intending to update --- although obviously such a convention is pretty
fragile if you have updates coming from a variety of code.  I think it's
reasonably safe when you're funneling all the operations through a bit
of middleware.

			regards, tom lane
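A minimal sketch of the lock-table convention Tom describes, assuming a
hypothetical code_game_locks table keyed by the service id (the table
and column names here are illustrative, not from the thread):

    -- One row per code-game, created once, outside the hot path.
    CREATE TABLE code_game_locks (
        service_id integer PRIMARY KEY
    );
    INSERT INTO code_game_locks VALUES (1102);

    -- Inside the middleware function: grab the per-game lock first...
    BEGIN;
    SELECT service_id FROM code_game_locks
     WHERE service_id = 1102 FOR UPDATE;

    -- ...then check and record the played code. Only one session per
    -- code-game gets past the SELECT FOR UPDATE at a time; the others
    -- block until COMMIT releases the row lock.
    COMMIT;

Because every caller locks exactly one well-known row per game, the lock
order is trivially the same everywhere, which also removes the deadlock
concern raised above.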
	Suppose you have a table codes:

	(
		game_id INT,
		code TEXT,
		used BOOL NOT NULL DEFAULT 'f',
		prize ...
		...
		PRIMARY KEY (game_id, code)
	)

	Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND
code=...

	Then check the rowcount: if one row was updated, the code was not used
yet. If no row was updated, the code either did not exist, or was already
used.

	Another option: create a table used_codes like this:

	(
		game_id INT,
		code TEXT,
		...
		PRIMARY KEY (game_id, code)
	)

	Then, when trying to use a code, INSERT into this table. If you get a
constraint violation on the uniqueness of the primary key, your code has
already been used.

	Both solutions have a big advantage: they don't require messing with
locks and are extremely simple. The one with UPDATE is IMHO better,
because it doesn't abort the current transaction (although you could use
a savepoint in the INSERT case to intercept the error).

On Tue, 18 Apr 2006 17:33:06 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Mario Splivalo <mario.splivalo@mobart.hr> writes:
>>> If there is concurrent locking, you're also running a big risk of
>>> deadlock because two processes might try to lock the same rows in
>>> different orders.
>
>> I think there is no risk of a deadlock, since that particular function
>> is called from the middleware (functions are used as the interface to
>> the database), and the lock order is always the same.
>
> No, you don't even know what the order is, let alone that it's always
> the same.
>
>> Now, I just need to have serialization, I need to have clients 'line up'
>> in order to perform something in the database. Actually, users are
>> sending codes from the newspaper, beer cans, Cola cans, and stuff, and
>> the database needs to check whether the code has already been played.
>> Since the system is designed so that it could run multiple code-games
>> (and then a similar code could exist for the coke-game and the
>> beer-game), I'm using the messages table to see which code-game (i.e.
>> service) that particular code belongs to.
>
> I'd suggest using a table that has exactly one row per "code-game", and
> doing a SELECT FOR UPDATE on that row to establish the lock you need.
> This need not have anything to do with the tables/rows you are actually
> intending to update --- although obviously such a convention is pretty
> fragile if you have updates coming from a variety of code.  I think it's
> reasonably safe when you're funneling all the operations through a bit
> of middleware.
>
> 			regards, tom lane
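A minimal plpgsql sketch of the UPDATE-and-check-rowcount variant, using
the codes table sketched above (the function name play_code is
illustrative):

    CREATE OR REPLACE FUNCTION play_code(p_game_id int, p_code text)
    RETURNS boolean AS $$
    DECLARE
        n integer;
    BEGIN
        UPDATE codes SET used = 't'
         WHERE used = 'f' AND game_id = p_game_id AND code = p_code;
        GET DIAGNOSTICS n = ROW_COUNT;
        -- true: the code existed and had not been used yet;
        -- false: it was already used or never existed.
        RETURN n = 1;
    END;
    $$ LANGUAGE plpgsql;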
> 	Suppose you have a table codes:
> 	(
> 		game_id INT,
> 		code TEXT,
> 		used BOOL NOT NULL DEFAULT 'f',
> 		prize ...
> 		...
> 		PRIMARY KEY (game_id, code)
> 	)
>
> 	Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND
> code=...
>
> 	Then check the rowcount: if one row was updated, the code was not used
> yet. If no row was updated, the code either did not exist, or was
> already used.

You can use a stored procedure with exceptions, no?

Try this:

http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Chris
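For the INSERT variant, a plpgsql exception block can absorb the
duplicate-key error without aborting the caller's transaction, much like
the upsert example in the docs linked above; a sketch, assuming the
used_codes table from the previous message (the function name use_code
is illustrative):

    CREATE OR REPLACE FUNCTION use_code(p_game_id int, p_code text)
    RETURNS boolean AS $$
    BEGIN
        INSERT INTO used_codes (game_id, code) VALUES (p_game_id, p_code);
        RETURN true;                 -- first time this code is played
    EXCEPTION WHEN unique_violation THEN
        RETURN false;                -- the code has already been used
    END;
    $$ LANGUAGE plpgsql;

The EXCEPTION block is implemented with an implicit savepoint, so only
the failed INSERT is rolled back, not the surrounding work.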
On Tue, 2006-04-18 at 11:33 -0400, Tom Lane wrote:
> Mario Splivalo <mario.splivalo@mobart.hr> writes:
> >> If there is concurrent locking, you're also running a big risk of
> >> deadlock because two processes might try to lock the same rows in
> >> different orders.
>
> > I think there is no risk of a deadlock, since that particular function
> > is called from the middleware (functions are used as the interface to
> > the database), and the lock order is always the same.
>
> No, you don't even know what the order is, let alone that it's always
> the same.

You got me confused here! :) If I have just one function that acts as an
interface to the middleware, and all the operations on the database are
done through that one function, and I carefully design that function so
that I first grab the lock and then do the stuff, am I not pretty sure
that I won't be having any deadlocks?

> > Now, I just need to have serialization, I need to have clients 'line up'
> > in order to perform something in the database. Actually, users are
> > sending codes from the newspaper, beer cans, Cola cans, and stuff, and
> > the database needs to check whether the code has already been played.
> > Since the system is designed so that it could run multiple code-games
> > (and then a similar code could exist for the coke-game and the
> > beer-game), I'm using the messages table to see which code-game (i.e.
> > service) that particular code belongs to.
>
> I'd suggest using a table that has exactly one row per "code-game", and
> doing a SELECT FOR UPDATE on that row to establish the lock you need.
> This need not have anything to do with the tables/rows you are actually
> intending to update --- although obviously such a convention is pretty
> fragile if you have updates coming from a variety of code.  I think it's
> reasonably safe when you're funneling all the operations through a bit
> of middleware.

I tend to design my applications so I don't have "flying SQL" in my
java/python/c#/php/wherever code; all the database work is done through
functions which are designed as interfaces. Those functions are also
designed so they don't block each other. So, since I need the
serialization, I'll do as you suggested, using a lock table with exactly
one row per "code-game".

Just one more question here, and it has to do with postgres internals:
why is postgres doing such huge I/O (in my log file I see a lot of
messages saying "LOG: archived transaction log file") when performing
that big FOR UPDATE?

	Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."
On Tue, 2006-04-18 at 19:00 +0200, PFC wrote:
> 	Suppose you have a table codes:
> 	(
> 		game_id INT,
> 		code TEXT,
> 		used BOOL NOT NULL DEFAULT 'f',
> 		prize ...
> 		...
> 		PRIMARY KEY (game_id, code)
> 	)
>
> 	Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND
> code=...
>
> 	Then check the rowcount: if one row was updated, the code was not used
> yet. If no row was updated, the code either did not exist, or was
> already used.
>
> 	Another option: create a table used_codes like this:
>
> 	(
> 		game_id INT,
> 		code TEXT,
> 		...
> 		PRIMARY KEY (game_id, code)
> 	)
>
> 	Then, when trying to use a code, INSERT into this table. If you get a
> constraint violation on the uniqueness of the primary key, your code has
> already been used.
>
> 	Both solutions have a big advantage: they don't require messing with
> locks and are extremely simple. The one with UPDATE is IMHO better,
> because it doesn't abort the current transaction (although you could use
> a savepoint in the INSERT case to intercept the error).

This works perfectly, but sometimes the game has no codes, and I still
need to know exactly who came first, who was second, and so on... So a
locking table, as Tom suggested, is, I guess, a perfect solution for my
situation.

	Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."
On Wed, Apr 19, 2006 at 10:20:54AM +0200, Mario Splivalo wrote:
> This works perfectly, but sometimes the game has no codes, and I still
> need to know exactly who came first, who was second, and so on... So a
> locking table, as Tom suggested, is, I guess, a perfect solution for my
> situation.

Depending on your performance requirements, you should look at
contrib/userlock as well, since it will probably be much more performant
than locking a row in a table.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
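contrib/userlock provides user-level locks that live outside any table;
in later releases (8.2 and up) the same idea is built in as advisory
locks. A sketch of the built-in form, keyed on the game's service id
(an assumption; any integer key scheme works):

    -- Serialize all players of game 1102 without touching table rows.
    SELECT pg_advisory_lock(1102);
    -- ... check and record the played code ...
    SELECT pg_advisory_unlock(1102);

These locks are kept in shared memory, so they avoid the heap writes
(and the WAL that goes with them) involved in locking tens of thousands
of rows with FOR UPDATE.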