Thread: Row locking during UPDATE
Hi,

I have a weird problem and want to know if I understand what's happening.
I have a table like this:

    CREATE TABLE statistics (
        date  DATE DEFAULT current_date,
        key   TEXT,
        value INTEGER DEFAULT 0,
        UNIQUE(date, key)
    );

and I have a bunch of clients running.  Often, a client does something
like this:

    UPDATE statistics SET value = value + 1
     WHERE key = 'KEY' AND date = current_date;

What appears to be happening is this:

1) Sometimes, a whole bunch of clients try updating the same row.  I see
   several postgresql processes marked "UPDATE waiting".

2) Because the client processes must complete rather quickly, a
   supervisor process kills them if they don't complete soon.

3) The postgresql processes in an "UPDATE waiting" state seem to be
   blocked waiting for a semaphore, and they do not notice the closing
   of the connection when the client is killed.

4) New client processes are spawned and eventually try to update the row.

As a result, we end up with more and more postgresql processes until the
connection limit is reached, because processes in the "UPDATE waiting"
state don't notice the connection has been broken.

Questions:

1) Am I on the right track?  Is this how PostgreSQL works?

2) I plan on solving it by making a low-contention table like this:

    CREATE TABLE low_contention_stats (
        date       DATE DEFAULT current_date,
        key        TEXT,
        randomizer INTEGER,
        value      INTEGER DEFAULT 0,
        summarized BOOLEAN DEFAULT FALSE,
        UNIQUE(date, key, randomizer, summarized)
    );

Each client will pick a random "randomizer" value, so the odds of two
clients trying to update the same row at the same time are low.  Once a
day, I summarize the low_contention_stats table and put the summary in
statistics.

A statistics update looks like this:

    UPDATE low_contention_stats SET value = value + 1
     WHERE date = current_date AND key = 'KEY'
       AND randomizer = my_random_val AND NOT summarized;

(If the update updates 0 rows, we do an INSERT.)

The summary process when we copy to statistics looks like this:

    -- "Freeze" existing rows -- any updates during the summary process
    -- will have to insert new rows where summarized is false
    UPDATE low_contention_stats SET summarized = 't';

    -- Summarize stats
    SELECT date, key, sum(value) FROM low_contention_stats
     WHERE summarized GROUP BY date, key;

    -- Then we iterate over the results, updating "statistics"

    DELETE FROM low_contention_stats WHERE summarized;

Will this help?  I can't easily test it, because I only see the problem
under high load, and the only high-load environment I have access to is a
production one. :-(

Regards,

David.
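Spelled out, the client-side update-or-insert step described above would
look roughly like the following (a sketch only; the randomizer value 42
stands in for my_random_val, and the retry note is an assumption about
how the race would be handled, not something stated in the thread):

    -- Try to bump an existing counter for today's (key, randomizer) pair.
    UPDATE low_contention_stats SET value = value + 1
     WHERE date = current_date
       AND key = 'KEY'
       AND randomizer = 42          -- my_random_val, chosen once per client
       AND NOT summarized;

    -- If the UPDATE reported 0 rows affected, no such row exists yet:
    INSERT INTO low_contention_stats (date, key, randomizer, value)
    VALUES (current_date, 'KEY', 42, 1);

    -- A concurrent client that picked the same randomizer could insert
    -- first, in which case this INSERT fails on the UNIQUE constraint
    -- and the client simply retries the UPDATE.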
"David F. Skoll" <dfs@roaringpenguin.com> writes: > What appears to be happening is this: > 1) Sometimes, a whole bunch of clients try updating the same row. I > see sevaral postgresql processes marked "UPDATE waiting" Any process that arrives at the row and finds it already modified by some concurrent transaction will wait for that concurrent transaction to complete. > 2) Because the client processes must complete rather quickly, a supervisor > process kills them if they don't complete soon. Zapping clients that are in the middle of database operations is bad design IMHO. > 3) The postgresql processes in an "UPDATE waiting" state seem to be > blocked waiting for a semaphore, and they do not notice the closing of > the connection when the client is killed. That's correct, a backend will generally not notice client disconnect until it next waits for a client command. It's not totally clear why you've got so many processes waiting to update the same row, though. Which process does have the row lock, and why isn't it completing its transaction? regards, tom lane
On Thu, 4 Sep 2003, Tom Lane wrote:

> Any process that arrives at the row and finds it already modified by
> some concurrent transaction will wait for that concurrent transaction
> to complete.

Right.  And it waits on a semaphore, right?  So there's no way to use
select() to wait for EITHER the semaphore OR the loss of the connection?
I hate SysV IPC. :-(

> Zapping clients that are in the middle of database operations is bad
> design IMHO.

It's required.  The clients are e-mail filters and they must reply
quickly, before the end of the SMTP transaction.  If they take too long,
they must be killed so the SMTP transaction can be tempfailed.  If they
are not killed, the SMTP sessions pile up and eventually kill the
machine.

> That's correct, a backend will generally not notice client disconnect
> until it next waits for a client command.  It's not totally clear why
> you've got so many processes waiting to update the same row, though.

It's on a high-volume mail server that receives around 500K messages/day.
About 180,000 of those are viruses, so we often have multiple processes
trying to update the virus statistics row.

> Which process does have the row lock, and why isn't it completing its
> transaction?

I don't know the details of PostgreSQL's implementation, but it seems
that when lots of processes are waiting to update the same row, it gets
incredibly slow.

--
David.
"David F. Skoll" <dfs@roaringpenguin.com> writes: >> Which process does have the row lock, and why isn't it completing its >> transaction? > I don't know the details of PostgreSQL's implementation, but it seems > that when lots of processes are waiting to update the same row, it > gets incredibly slow. Hmm. That might represent a fixable bug. How many is "lots"? Can you give a self-contained test case? regards, tom lane
On Thu, 4 Sep 2003, David F. Skoll wrote:

> > Zapping clients that are in the middle of database operations is bad
> > design IMHO.
>
> It's required.  The clients are e-mail filters and they must reply
> quickly, before the end of the SMTP transaction.  If they take too
> long, they must be killed so the SMTP transaction can be tempfailed.
> If they are not killed, the SMTP sessions pile up and eventually kill
> the machine.

It might be worth racking your brains to think of other ways.  Query
timeouts?

> > That's correct, a backend will generally not notice client disconnect
> > until it next waits for a client command.  It's not totally clear why
> > you've got so many processes waiting to update the same row, though.
>
> It's on a high-volume mail server that receives around 500K
> messages/day.  About 180,000 of those are viruses, so we often have
> multiple processes trying to update the virus statistics row.
>
> > Which process does have the row lock, and why isn't it completing its
> > transaction?
>
> I don't know the details of PostgreSQL's implementation, but it seems
> that when lots of processes are waiting to update the same row, it gets
> incredibly slow.

Having everything try to access the same row seems a bad idea generally.
Instead, why not store a new record for each instance, and have a cron
job update the statistics from that each day?  It will be more efficient
overall.  It can even be done hourly.

--

Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University
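A sketch of that insert-only idea, assuming a hypothetical event table
named stat_events (the name and layout are illustrative, not from the
thread): clients only ever INSERT, so they never contend for a shared
row, and a periodic job folds the events into "statistics":

    -- Every event is a brand-new row, so concurrent clients never fight
    -- over the same tuple.
    CREATE TABLE stat_events (
        date DATE DEFAULT current_date,
        key  TEXT
    );

    -- Client side, once per event:
    INSERT INTO stat_events (key) VALUES ('KEY');

    -- Cron job, run daily: aggregate everything before today, apply it
    -- to "statistics", then remove what was counted.  Using a date
    -- cutoff means rows inserted while the job runs are left for the
    -- next pass rather than lost.
    SELECT date, key, count(*) AS value
      FROM stat_events
     WHERE date < current_date
     GROUP BY date, key;
    -- ... fold these counts into "statistics" ...
    DELETE FROM stat_events WHERE date < current_date;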
On Thu, 4 Sep 2003, Sam Barnett-Cormack wrote:

> It might be worth racking your brains to think of other ways.  Query
> timeouts?

Either way, if the back-end is waiting on a semaphore, will it time out
the query and terminate?  The problem is lots of waiting back-end
processes.

> why not store a new record for each instance, and have a cron job
> update the statistics from that each day?  It will be more efficient
> overall.  It can even be done hourly.

I posted a compromise solution: a "low-contention" table that reduces the
likelihood of contention.  Adding a new record for every event could
result in pretty large tables that need to be summarised.  I'll play
around with the low-contention table first.

Regards,

David.
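On the query-timeout idea: assuming a server version that has the
statement_timeout setting (value in milliseconds), a statement that has
been blocked longer than the timeout is cancelled rather than waiting
indefinitely for the row lock, so the backend returns to reading client
commands, at which point it will notice a dead connection.  A minimal
sketch:

    -- Per connection (could also be set globally in postgresql.conf):
    SET statement_timeout = 200;    -- milliseconds

    -- An UPDATE that sits blocked on another transaction's row lock for
    -- longer than 200 ms is now cancelled with an error instead of
    -- leaving the backend stuck in "UPDATE waiting".
    UPDATE statistics SET value = value + 1
     WHERE key = 'KEY' AND date = current_date;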
I didn't see any response to this.  Sorry if this is already stale for
you.

On Thu, Sep 04, 2003 at 08:21:17AM -0400, David F. Skoll wrote:

> Questions:
>
> 1) Am I on the right track?  Is this how PostgreSQL works?

More or less, yes.  The significant part here is that the backend won't
notice that the client is gone until it returns from the work it was
trying to do.  It'll eventually come back, but it'll take some time.  How
low does your contention need to be?

> A statistics update looks like this:
>
>     UPDATE low_contention_stats SET value = value + 1
>      WHERE date = current_date AND key = 'KEY'
>        AND randomizer = my_random_val AND NOT summarized;
>
> (If the update updates 0 rows, we do an INSERT.)
>
> The summary process when we copy to statistics looks like this:

If you're going to summarise anyway, why not just always insert into a
"holding" table, and then periodically (infrequently, though) select out
of there and summarise at that point?

Note that if you do this very frequently, and you also have to select the
summary data, it won't work (as I have learned from painful experience)
because the holding table will gradually build up a lot of dead tuples.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
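The dead-tuple build-up comes from MVCC: every UPDATE or DELETE leaves
behind an old row version that only VACUUM reclaims, so a hot holding
table accumulates garbage between vacuums.  One way to watch that happen
(illustrative only):

    -- After a burst of counter updates, this reports how many dead row
    -- versions had accumulated in the holding table and were removed:
    VACUUM VERBOSE low_contention_stats;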
On Fri, 12 Sep 2003, Andrew Sullivan wrote:

> More or less, yes.  The significant part here is that the backend won't
> notice that the client is gone until it returns from the work it was
> trying to do.  It'll eventually come back, but it'll take some time.
> How low does your contention need to be?

Low, low, low... near real-time response is required.

> If you're going to summarise anyway, why not just always insert into a
> "holding" table, and then periodically (infrequently, though) select
> out of there and summarise at that point?

This is the solution I picked (thanks to Tom Lane) and it seems to work
well.

> Note that if you do this very frequently, and you also have to select
> the summary data, it won't work (as I have learned from painful
> experience) because the holding table will gradually build up a lot of
> dead tuples.

That doesn't seem to be a problem; after the summary, I do a VACUUM and
the holding table seems to shrink down nicely.

Regards,

David.
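For reference, the "iterate over the results, updating statistics" step
from the original plan can also be done set-wise in SQL rather than row
by row.  The thread doesn't show the actual code, so the statements below
are only one possible formulation, run after the freeze and before the
cleanup:

    -- Add the frozen counts to rows that already exist in "statistics".
    UPDATE statistics
       SET value = statistics.value + agg.value
      FROM (SELECT date, key, sum(value) AS value
              FROM low_contention_stats
             WHERE summarized
             GROUP BY date, key) AS agg
     WHERE statistics.date = agg.date AND statistics.key = agg.key;

    -- Insert rows for (date, key) pairs that "statistics" lacks so far.
    INSERT INTO statistics (date, key, value)
    SELECT date, key, sum(value)
      FROM low_contention_stats lcs
     WHERE summarized
       AND NOT EXISTS (SELECT 1 FROM statistics s
                        WHERE s.date = lcs.date AND s.key = lcs.key)
     GROUP BY date, key;

    -- Then, as in the original plan, drop the frozen rows and reclaim
    -- the dead tuples the pass leaves behind:
    DELETE FROM low_contention_stats WHERE summarized;
    VACUUM low_contention_stats;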
On Fri, Sep 12, 2003 at 09:53:47AM -0400, David F. Skoll wrote:

> > Note that if you do this very frequently, and you also have to
> > select the summary data, it won't work (as I have learned from
> > painful experience) because the holding table will gradually build
> > up a lot of dead tuples.
>
> That doesn't seem to be a problem; after the summary, I do a VACUUM
> and the holding table seems to shrink down nicely.

You apparently don't have the contention that we did.  Are your
transactions short?  We had a problem with this sort of design, but I
think it was because we had some transactions which ran long and depended
on the summarised results.  But if it works for you, great.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110