Thread: transaction timeout
Hi all,

I have been quite satisfied with the level of support from the PostgreSQL community, but this time I'm getting nothing. So, is transaction timeout option planned at all? What's the alternative solution to a client that's hung in transaction?

thanks,

Eugene
On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote: > So, is transaction timeout option planned at all? > What's the alternative solution to a client that's hung in > transaction? Forcibly end it? -- Alvaro Herrera (<alvherre[a]alvh.no-ip.org>) "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman)
Yeah, that's what we have to resort to now, but that's not a solution. Until we kill the client, the entire database is locked (or, at least the tables that other clients need to write to, which is effectively the same thing). This is annoying enough during the week but it's especially a problem on weekends when none of the developers are in the office.

A single client should not be able to bring the entire database down. The DB should recognize that the client went down and roll back the transaction. That would be the ideal solution. Anything else we can do to remedy the situation?

thanks,

Eugene

--- Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:
> > So, is transaction timeout option planned at all?
> > What's the alternative solution to a client that's hung in transaction?
>
> Forcibly end it?
On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:
> I have been quite satisfied with the level of support from the
> PostgreSQL community, but this time I'm getting nothing.

There have been a couple of replies to your post, although perhaps not what you were hoping for:

http://archives.postgresql.org/pgsql-general/2005-07/msg00984.php
http://archives.postgresql.org/pgsql-general/2005-07/msg00985.php

> So, is transaction timeout option planned at all?

A search through the archives shows that transaction timeout has been discussed recently in pgsql-hackers, but unless I've missed something it hasn't been implemented yet (and therefore probably won't be available in 8.1, since that release is in feature freeze).

> What's the alternative solution to a client that's hung in transaction?

What's the client doing that takes locks strong enough to "lock up the entire database"? Why does the client hang? Since the database doesn't currently have a way to detect and handle these situations, it might be worthwhile to find out what's happening to see if it can be prevented.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
On Tue, 2005-07-26 at 09:40, Dr NoName wrote:
> Hi all,
>
> I have been quite satisfied with the level of support from the
> PostgreSQL community, but this time I'm getting nothing. So, is
> transaction timeout option planned at all? What's the alternative
> solution to a client that's hung in transaction?

The common view on this kind of thing is that if your client is broken, you need to fix it.

That said, I have seen some folks post about writing a perl or shell script that runs every x minutes looking for connections that have been idle for more than a certain amount of time and kills the backend associated with them (SIGTERM, not -9...).

Generally speaking, I'd rather code my own solution to these problems than do it the way Oracle does. Oftentimes the choices someone else makes for you in fixing these problems are suboptimal for your system.
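A rough sketch of the check such a script might run, for illustration only: it assumes the 8.0-era pg_stat_activity columns (procpid, current_query, query_start), that stats_command_string is enabled so idle-in-transaction sessions show up as '<IDLE> in transaction', and a 30-minute threshold picked arbitrarily. A cron wrapper would feed the resulting PIDs to kill -TERM, with the caveats about SIGTERM that come up later in this thread.

    -- Backends that have been idle in a transaction "too long"
    -- (threshold and column names are assumptions; adjust for your version):
    SELECT procpid, usename, query_start
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
      AND query_start < now() - interval '30 minutes';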
On Tue, 2005-07-26 at 10:33, Dr NoName wrote:
> Yeah, that's what we have to resort to now, but that's not a solution.
> Until we kill the client, the entire database is locked (or, at least
> the tables that other clients need to write to, which is effectively
> the same thing). This is annoying enough during the week but it's
> especially a problem on weekends when none of the developers are in
> the office.

OK, for the third or fourth time, what kind of locks is your application taking out that can lock the whole database?

> A single client should not be able to bring the entire database down.

A single client running a large unconstrained join can easily bring either PostgreSQL or Oracle to its knees. Their very nature of handling hundreds of users accessing large amounts of data makes databases prone to such problems, and requires you to carefully design your applications so as not to do things that cause the database to hiccup.

> The DB should recognize that the client went down and roll back the
> transaction.

How, exactly, can PostgreSQL (or any other database) recognize a hung client versus one that's waiting for an hour on user input?

> That would be the ideal solution. Anything else we can do to remedy
> the situation?

Yes, tell us what you're doing that "locks the whole database".
On Tue, Jul 26, 2005 at 08:33:19AM -0700, Dr NoName wrote:
> A single client should not be able to bring the entire database down.
> The DB should recognize that the client went down and roll back the
> transaction. That would be the ideal solution. Anything else we can do
> to remedy the situation?

Now wait just a second. The database is not down at all just because somebody left a transaction open. The real problem is that the open transaction is holding locks on some resources, right?

I guess the real answer is not to leave transactions open. If you do that by design, say because the app shows a data modification window and keeps a transaction open just to be able to save the changes later, then you really need to rethink your app design.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)
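For illustration, the pattern Alvaro is warning about and the shorter-transaction shape it can usually be rewritten into (the "jobs" table here is hypothetical, purely for the sketch):

    -- Pattern to avoid: transaction held open across slow, non-database work.
    BEGIN;
    SELECT payload FROM jobs WHERE id = 17;
    -- ... minutes of file I/O or heavy processing, transaction still open ...
    UPDATE jobs SET status = 'done' WHERE id = 17;
    COMMIT;

    -- Safer shape: do the slow work with no transaction open, then write briefly.
    SELECT payload FROM jobs WHERE id = 17;        -- autocommit read
    -- ... slow work happens here; no open transaction or locks are held ...
    BEGIN;
    UPDATE jobs SET status = 'done' WHERE id = 17; -- short write transaction
    COMMIT;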
> What's the client doing that takes locks strong enough to "lock up
> the entire database"? Why does the client hang?

yeah, good question. I thought postgres uses better-than-row-level locking? Could the total deadlock be caused by a combination of an open transaction and the VACUUM FULL that runs every Sunday?

> Since the database doesn't currently have a way to detect and handle
> these situations, it might be worthwhile to find out what's happening
> to see if it can be prevented.

*Anything* can happen. Like, for instance, last week a user tried to kill the client and only managed to kill some of the threads. But since the process was not fully dead, the socket was not closed, so the transaction was still in progress.

The point is that the client must not be trusted to always do the right thing. That's why we have things like protected memory, pre-emptive multitasking, resource limits, etc. Similarly, a database must have the ability to detect a broken client and kick it out.

thanks,

Eugene
> On Tue, Jul 26, 2005 at 08:33:19AM -0700, Dr NoName wrote:
> > A single client should not be able to bring the entire database
> > down. The DB should recognize that the client went down and roll
> > back the transaction. That would be the ideal solution. Anything
> > else we can do to remedy the situation?
>
> Now wait just a second. The database is not down at all just because
> somebody left a transaction open. The real problem is that the open
> transaction is holding locks on some resources, right?

right, but that's effectively the same thing: users cannot write to the database and in some cases can't even read from it.

> I guess the real answer is not to leave transactions open. If you do
> that by design, say because the app shows a data modification window
> and keeps a transaction open just to be able to save the changes
> later, then you really need to rethink your app design.

There is no user interaction in the middle of a transaction. But there are other things we have to do (file system I/O, heavy processing, etc.). Those operations really do need to be interleaved with the DB writes.

thanks,

Eugene
> OK, for the third or fourth time, what kind of locks is your
> application taking out that can lock the whole database?

I'd like to know that myself. How can a select/insert/update lock an entire table or even multiple tables?

> How, exactly, can PostgreSQL (or any other database) recognize a hung
> client versus one that's waiting for an hour on user input?

transaction timeout. In our application, a one-hour-long transaction is not normal. I want it to abort automatically.

> Yes, tell us what you're doing that "locks the whole database".

I wish I knew. Last Sunday it was locked so bad that even selects were blocked until we killed the offending client.

Eugene
> The common view on this kind of thing is that if your client is
> broken, you need to fix it.

The problem is, we can't fix the users, nor can we fix other software that our client has to interact with. There will always be occasional situations when a client gets stuck.

> That said, I have seen some folks post about writing a perl or shell
> script that runs every x minutes looking for connections that have
> been idle for more than a certain amount of time and kills the backend
> associated with them (SIGTERM, not -9...)

what are the implications of killing a postmaster process?

thanks,

Eugene
On Tue, 2005-07-26 at 11:24, Dr NoName wrote:
> The problem is, we can't fix the users, nor can we fix other software
> that our client has to interact with. There will always be occasional
> situations when a client gets stuck.
>
> what are the implications of killing a postmaster process?

A SIGTERM is generally considered safe. It's -9 and its ilk that you need to be wary of.

I think we and you both need more information about this failure. Do you have any logging turned on that could give us a clue to what's causing this failure? It sounds to me more like one of the user apps is DOSing the server with large unconstrained joins or something similarly dangerous to do.
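For reference, a minimal sketch of the kind of logging that can provide that clue. Parameter names are taken from the 8.0-era documentation and the values are only examples; older releases spell some of these differently, so check your version's docs.

    # postgresql.conf excerpt (illustrative values, not recommendations)
    stats_command_string = true          # lets pg_stat_activity show the current query
    log_min_duration_statement = 1000    # log any statement running longer than 1 second
    log_statement = 'mod'                # log DDL plus INSERT/UPDATE/DELETE statements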
TIP 2: Don't 'kill -9' the postmaster

Dr NoName wrote:
> what are the implications of killing a postmaster process?
> > > That said, I have seen some folks post about writing a perl or
> > > shell script that runs every x minutes looking for connections
> > > that have been idle for more than a certain amount of time and
> > > kills the backend associated with them (SIGTERM, not -9...)
> >
> > what are the implications of killing a postmaster process?
>
> A SIGTERM is generally considered safe. It's -9 and its ilk that you
> need to be wary of.

No, it's not. See the archives.

The only *safe* way to do it ATM is to restart the database. SIGTERM may leave orphaned locks or such things in the system. (Incidentally, -9 on a single backend should be safe, I believe. The postmaster will tell all concurrent connections to abort and restart. It's not nice, but it should be safe -- it should perform normal recovery, the same as if you pull the plug.)

//Magnus
> If you have a second database in the cluster, is it still operational
> when the main database locks up?

we don't have a DB cluster. It would be pretty useless since postgresql doesn't support distributed transactions.

> Also it seems that some diagnostics are needed in the client app to
> log the crash event so you can determine which SQL commands are
> causing the lock.

I'll try to get that next time it happens. But regardless of which SQL commands are running, I know what the root cause is: a client hangs while in a transaction.

> Despite many years of writing buggy code I have not yet locked a whole
> DB in the fashion described. I can not see how a simple select /
> insert / update command sequence can achieve it unless there is a
> particular relation between the tables involved.

As I have already said, I suspect this might be caused by a combination of an open transaction and a weekly VACUUM FULL. Does that sound right?

> If the tables are related / linked via rules / triggers / keys then
> perhaps add a test table that bears no relation to the others and see
> if it is locked when the others appear to have this problem you are
> describing.
>
> Perhaps a simple test: when the DB error occurs can you use PGAdmin to
> read an independent table, or read from another database.

thanks, I'll try that.

Eugene
On Tue, 2005-07-26 at 12:41, Magnus Hagander wrote:
> > A SIGTERM is generally considered safe. It's -9 and its ilk that you
> > need to be wary of.
>
> No, it's not. See the archives.
>
> The only *safe* way to do it ATM is to restart the database. SIGTERM
> may leave orphaned locks or such things in the system. (Incidentally,
> -9 on a single backend should be safe, I believe. The postmaster will
> tell all concurrent connections to abort and restart. It's not nice,
> but it should be safe -- it should perform normal recovery, the same
> as if you pull the plug.)

Really? I was under the impression that doing a "kill <backendpid>" on an idle connection would clean up those things. Was that a discussion on hackers that brought this up? And if so, what was the time period? I'd like to read through it.
> > The only *safe* way to do it ATM is to restart the database. SIGTERM
> > may leave orphaned locks or such things in the system.
>
> Really? I was under the impression that doing a "kill <backendpid>" on
> an idle connection would clean up those things. Was that a discussion
> on hackers that brought this up? And if so, what was the time period?
> I'd like to read through it.

It's been brought up several times on hackers. Once back last autumn when looking at the pg_terminate_backend function, and it was brought up again this spring, I think, during one of the discussions about the instrumentation patch.

I can't find the actual discussion at a quick look, but the quote from Tom on http://archives.postgresql.org/pgsql-hackers/2005-06/msg00970.php certainly shows there is a problem :-) The discussion should be somewhere around that timeframe (I think before it - before I came up with the idea to solve it that didn't work).

//Magnus
On Tue, 2005-07-26 at 12:51, Dr NoName wrote:
> we don't have a DB cluster. It would be pretty useless since
> postgresql doesn't support distributed transactions.

You misunderstood his point. In PostgreSQL parlance, a "cluster" is a single postmaster running on a single machine, with 1 or more databases. So, what he wanted to know was, if your application is hitting a database called fred, and you have a spare database named wilma, would "psql wilma" work when the database is "locked up?"

Can you elaborate on what you mean by a database that is "locked up?"

> I'll try to get that next time it happens. But regardless of which SQL
> commands are running, I know what the root cause is: a client hangs
> while in a transaction.

Actually, unless the client is holding a table-level exclusive lock, most transactions should continue.

> As I have already said, I suspect this might be caused by a
> combination of an open transaction and a weekly VACUUM FULL. Does that
> sound right?

No, VACUUM FULL shouldn't cause this kind of issue. Now, if the database is just running real slow, instead of actually locking up, that's possible with vacuum full.
> You misunderstood his point. In PostgreSQL > parlance, a "cluster" is a > single postmaster running on a single machine, with > 1 or more > databases. So, what he wanted to know was, if your > application is > hitting a database called fred, and you have a spare > database named > wilma, would "psql wilma" work when the database is > "locked up?" ok, I see. That's another thing to try next time. > Can you elaborate on what you mean by a database > that is "locked up?" Here is the ps output from before the offending process was killed. That one "idle in transaction" process caused everything to lock. 2005 07 25 6:07:34 s17 79 > ps -efwww | grep postgres postgres 23281 1 0 Jul18 ? 00:00:29 /usr/bin/postmaster -p 5432 postgres 23285 23281 0 Jul18 ? 00:20:20 postgres: stats buffer process postgres 23287 23285 0 Jul18 ? 00:18:08 postgres: stats collector process postgres 12466 23281 0 Jul18 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 12474 23281 0 Jul18 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 26947 23281 0 Jul19 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 3514 23281 0 Jul19 ? 00:00:00 postgres: siam siam_production 172.16.11.50 idle postgres 6881 23281 0 Jul19 ? 00:00:01 postgres: siam siam_production 172.16.11.71 idle postgres 17750 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.10.159 idle in transaction postgres 26504 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.11.50 idle postgres 12284 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.10.125 idle postgres 16026 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.10.125 idle postgres 25709 23281 0 Jul21 ? 00:01:00 postgres: siam siam_production 172.16.1.17 idle postgres 27980 23281 0 Jul21 ? 00:04:08 postgres: siam siam_production 172.16.1.17 idle postgres 14854 23281 0 Jul21 ? 00:00:03 postgres: siam siam_production 172.16.11.95 idle postgres 19531 23281 0 Jul21 ? 00:00:02 postgres: siam siam_production 172.16.11.95 idle postgres 17590 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 26917 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26933 23281 0 Jul22 ? 00:06:57 postgres: siam siam_production 172.16.1.17 idle postgres 26934 23281 0 Jul22 ? 00:00:56 postgres: siam siam_production 172.16.1.17 idle postgres 26939 23281 0 Jul22 ? 00:01:09 postgres: siam siam_production 172.16.1.17 idle postgres 27362 23281 0 Jul22 ? 00:01:56 postgres: siam siam_production 172.16.1.17 idle postgres 27365 23281 0 Jul22 ? 00:01:03 postgres: siam siam_production 172.16.1.17 idle postgres 27398 23281 0 Jul22 ? 00:00:26 postgres: siam siam_production 172.16.1.17 idle postgres 27856 23281 0 Jul22 ? 00:01:00 postgres: siam siam_production 172.16.1.17 idle postgres 27858 23281 0 Jul22 ? 00:05:26 postgres: siam siam_production 172.16.1.17 idle postgres 27863 23281 0 Jul22 ? 00:00:58 postgres: siam siam_production 172.16.1.17 idle postgres 27865 23281 0 Jul22 ? 00:01:28 postgres: siam siam_production 172.16.1.17 idle postgres 27869 23281 0 Jul22 ? 00:00:29 postgres: siam siam_production 172.16.1.17 idle postgres 28295 23281 0 Jul22 ? 00:00:23 postgres: siam siam_production 172.16.1.17 idle postgres 28313 23281 0 Jul22 ? 00:00:45 postgres: siam siam_production 172.16.1.17 idle postgres 28315 23281 0 Jul22 ? 00:01:06 postgres: siam siam_production 172.16.1.17 idle postgres 28725 23281 0 Jul22 ? 
00:05:07 postgres: siam siam_production 172.16.1.17 idle postgres 13559 23281 0 Jul22 ? 00:00:24 postgres: siam siam_production 172.16.1.17 idle postgres 13595 23281 0 Jul22 ? 00:00:36 postgres: siam siam_production 172.16.1.17 idle postgres 14017 23281 0 Jul22 ? 00:00:52 postgres: siam siam_production 172.16.1.17 idle postgres 25206 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.146 idle postgres 3742 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.142 idle postgres 12016 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle postgres 13782 23281 0 Jul22 ? 00:00:13 postgres: siam siam_production 172.16.10.106 idle postgres 13853 23281 0 Jul22 ? 00:00:02 postgres: siam siam_production 172.16.10.106 idle postgres 14381 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle postgres 14923 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle postgres 17181 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.59 idle postgres 6212 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production [local] VACUUM waiting postgres 5952 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.1.17 SELECT waiting postgres 24644 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.100 SELECT waiting postgres 26271 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 26720 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 26721 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 27161 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 27162 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 28005 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 28450 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 28451 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 SELECT waiting postgres 3049 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.134 SELECT waiting postgres 3875 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.134 SELECT waiting postgres 4286 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.130 SELECT waiting postgres 4700 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.185 SELECT waiting postgres 13850 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 13851 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 13852 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 13854 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 13855 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 13856 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.145 SELECT waiting postgres 14268 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 14269 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 14270 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.138 SELECT waiting postgres 14685 23281 0 Jul24 ? 
00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 14686 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.24 SELECT waiting postgres 15100 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.105 SELECT waiting postgres 15951 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.125 SELECT waiting postgres 16367 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.109 SELECT waiting postgres 25054 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.112 SELECT waiting postgres 25920 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.112 SELECT waiting postgres 25921 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.157 SELECT waiting postgres 25922 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.112 SELECT waiting postgres 26337 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.112 SELECT waiting postgres 26338 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.112 SELECT waiting postgres 10948 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.105 SELECT waiting postgres 12195 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.187 SELECT waiting postgres 12196 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.11.105 SELECT waiting postgres 22691 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.1.17 SELECT waiting postgres 23059 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.121 SELECT waiting postgres 15209 23281 0 04:00 ? 00:00:00 postgres: siam siam_production [local] SELECT waiting Here is the ps output immediately after the hung client was killed. As you can see, a whole shitload of SELECTs suddenly woke up and finished. Unfortunately, I don't have pg_stat_activity output. 2005 07 25 6:14:41 s17 78 > ps -efwww | grep postgres postgres 23281 1 0 Jul18 ? 00:00:29 /usr/bin/postmaster -p 5432 postgres 23285 23281 0 Jul18 ? 00:20:21 postgres: stats buffer process postgres 23287 23285 0 Jul18 ? 00:18:08 postgres: stats collector process postgres 12466 23281 0 Jul18 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 12474 23281 0 Jul18 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 26947 23281 0 Jul19 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 3514 23281 0 Jul19 ? 00:00:00 postgres: siam siam_production 172.16.11.50 idle postgres 6881 23281 0 Jul19 ? 00:00:01 postgres: siam siam_production 172.16.11.71 idle postgres 26504 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.11.50 idle postgres 12284 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.10.125 idle postgres 16026 23281 0 Jul20 ? 00:00:00 postgres: siam siam_production 172.16.10.125 idle postgres 25709 23281 0 Jul21 ? 00:01:00 postgres: siam siam_production 172.16.1.17 idle postgres 27980 23281 0 Jul21 ? 00:04:08 postgres: siam siam_production 172.16.1.17 idle postgres 14854 23281 0 Jul21 ? 00:00:03 postgres: siam siam_production 172.16.11.95 idle postgres 19531 23281 0 Jul21 ? 00:00:02 postgres: siam siam_production 172.16.11.95 idle postgres 17590 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.95 idle postgres 26917 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26933 23281 0 Jul22 ? 00:06:57 postgres: siam siam_production 172.16.1.17 idle postgres 26934 23281 0 Jul22 ? 
00:00:56 postgres: siam siam_production 172.16.1.17 idle postgres 26939 23281 0 Jul22 ? 00:01:09 postgres: siam siam_production 172.16.1.17 idle postgres 27362 23281 0 Jul22 ? 00:01:56 postgres: siam siam_production 172.16.1.17 idle postgres 27365 23281 0 Jul22 ? 00:01:03 postgres: siam siam_production 172.16.1.17 idle postgres 27398 23281 0 Jul22 ? 00:00:26 postgres: siam siam_production 172.16.1.17 idle postgres 27856 23281 0 Jul22 ? 00:01:00 postgres: siam siam_production 172.16.1.17 idle postgres 27858 23281 0 Jul22 ? 00:05:26 postgres: siam siam_production 172.16.1.17 idle postgres 27863 23281 0 Jul22 ? 00:00:58 postgres: siam siam_production 172.16.1.17 idle postgres 27865 23281 0 Jul22 ? 00:01:28 postgres: siam siam_production 172.16.1.17 idle postgres 27869 23281 0 Jul22 ? 00:00:29 postgres: siam siam_production 172.16.1.17 idle postgres 28295 23281 0 Jul22 ? 00:00:23 postgres: siam siam_production 172.16.1.17 idle postgres 28313 23281 0 Jul22 ? 00:00:45 postgres: siam siam_production 172.16.1.17 idle postgres 28315 23281 0 Jul22 ? 00:01:06 postgres: siam siam_production 172.16.1.17 idle postgres 28725 23281 0 Jul22 ? 00:05:07 postgres: siam siam_production 172.16.1.17 idle postgres 13559 23281 0 Jul22 ? 00:00:24 postgres: siam siam_production 172.16.1.17 idle postgres 13595 23281 0 Jul22 ? 00:00:36 postgres: siam siam_production 172.16.1.17 idle postgres 14017 23281 0 Jul22 ? 00:00:52 postgres: siam siam_production 172.16.1.17 idle postgres 25206 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.146 idle postgres 3742 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.142 idle postgres 12016 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle postgres 13782 23281 0 Jul22 ? 00:00:13 postgres: siam siam_production 172.16.10.106 idle postgres 13853 23281 0 Jul22 ? 00:00:02 postgres: siam siam_production 172.16.10.106 idle postgres 14381 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle postgres 14923 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.10.106 idle postgres 17181 23281 0 Jul22 ? 00:00:00 postgres: siam siam_production 172.16.11.59 idle postgres 6212 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production [local] VACUUM waiting postgres 5952 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 24644 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.100 idle postgres 26721 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 idle postgres 27161 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.10.114 idle postgres 22691 23281 0 Jul24 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 15209 23281 0 03:59 ? 00:00:10 postgres: siam siam_production [local] COPY postgres 26975 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26976 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26977 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26978 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26979 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26980 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26981 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26982 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26983 23281 0 06:14 ? 
00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26984 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26985 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26986 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26987 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26988 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26989 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 idle postgres 26990 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.1.17 SELECT waiting postgres 27041 23281 0 06:14 ? 00:00:00 postgres: siam siam_production 172.16.11.130 SELECT waiting costa 27091 26473 0 06:14 pts/0 00:00:00 grep postgres

> No, VACUUM FULL shouldn't cause this kind of issue. Now, if the
> database is just running real slow, instead of actually locking up,
> that's possible with vacuum full.

no, there was *zero* load on the server.

thanks,

Eugene
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> On Tue, 2005-07-26 at 12:41, Magnus Hagander wrote:
>> The only *safe* way to do it ATM is to restart the database. SIGTERM
>> may leave orphaned locks or such things in the system.

> Really? I was under the impression that doing a "kill <backendpid>" on
> an idle connection would clean up those things.

Well, it ought to, but I for one don't consider that code path adequately tested --- and we have seen at least one report (from Rod Taylor, if memory serves) suggesting that there are in fact bugs in it.

We know that SIGTERM'ing all the backends at once leaves the database with a good state on disk; that path is tested every time someone shuts down Postgres. It does not follow that SIGTERM'ing a single backend leaves consistent state in shared memory. Rod's report suggested a corrupt lock table in particular.

> Was that a discussion on hackers that brought this up? And if so, what
> was the time period? I'd like to read through it.

There was a flamewar late during the 8.0 devel cycle where someone was pushing to add a SIGTERM-one-backend function, and I was demurring that I didn't think it was adequately tested. Subsequent events seem to have proven that fear correct. Eventually I'm sure we'll find and fix the problem, but at the moment it's a risky thing to do.

regards, tom lane
On Tue, Jul 26, 2005 at 02:33:04PM -0400, Tom Lane wrote:
> Well, it ought to, but I for one don't consider that code path
> adequately tested --- and we have seen at least one report (from Rod
> Taylor, if memory serves) suggesting that there are in fact bugs in it.
>
> We know that SIGTERM'ing all the backends at once leaves the database
> with a good state on disk; that path is tested every time someone
> shuts down Postgres. It does not follow that SIGTERM'ing a single
> backend leaves consistent state in shared memory. Rod's report
> suggested a corrupt lock table in particular.

Well, is there debugging you can enable to check for corrupted lock tables? If so, would it be worthwhile to set up a system with lots of concurrent transactions, kill processes regularly, and see if anything strange happens?

Also, I've tended to use -INT to abort the current query, then -TERM to kill the backend. Would this be safer, given you know exactly where everything is at that point (aborted transaction)? Does an aborted transaction release its locks straight away, or does it wait until the client issues a ROLLBACK?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
So can anyone suggest a solution that does not involve killing the client when it hangs?

thanks,

Eugene

--- Dr NoName <spamacct11@yahoo.com> wrote:
> Here is the ps output from before the offending process was killed.
> That one "idle in transaction" process caused everything to lock.
> [ps listing snipped]
Dr NoName wrote:
>> What's the client doing that takes locks strong enough to "lock up
>> the entire database"? Why does the client hang?
>
> yeah, good question. I thought postgres uses better-than-row-level
> locking? Could the total deadlock be caused by a combination of an
> open transaction and the VACUUM FULL that runs every Sunday?

Sure. Like this:

Client A accesses table T, and "hangs."
Client B attempts to get an ACCESS EXCLUSIVE lock on table T in preparation for VACUUM FULL.
Client C connects to the database and waits for client B to get and release his lock on table T.
Client D connects to the database and waits for client B to get and release his lock on table T.
Client E connects to the database and waits for client B to get and release his lock on table T.
etc... until all your free connection slots are used up.

This happened to me once, except that client B was trying to rename table T and create a new table T.

(You might think that clients C, D, and E should bypass client B, since their access does not conflict with A's access. However, if that was allowed, then a VACUUM FULL on a busy table would wait forever because client C would slip in before A finished, and client D before C finished, etc., leading to a situation called "lock starvation." This can really only be prevented by granting locks on a first-come-first-serve basis.)

In your case, don't run VACUUM FULL via a cron job (i.e., when you're not there). If you need to run it regularly, you're almost certainly not reserving enough space in the free space map. VACUUM takes no locks that conflict with selecting, inserting, updating, or deleting, so that should be perfectly safe.

Regards,
Paul Tillotson
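When this kind of pile-up is suspected, the lock queue itself can be inspected while it is happening. A rough sketch, using only the pg_locks columns that exist in the 8.0-era view (rows with granted = f are the waiters):

    SELECT l.pid, c.relname, l.mode, l.granted
    FROM pg_locks l
    LEFT JOIN pg_class c ON c.oid = l.relation
    ORDER BY c.relname, l.granted DESC;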
> Sure. Like this:
>
> Client A accesses table T, and "hangs."
> Client B attempts to get an ACCESS EXCLUSIVE lock on table T in
> preparation for VACUUM FULL.
> Client C connects to the database and waits for client B to get and
> release his lock on table T.
> Client D connects to the database and waits for client B to get and
> release his lock on table T.
> Client E connects to the database and waits for client B to get and
> release his lock on table T.
> etc...

oh! my! gawd!
Finally a clear explanation that makes perfect sense. Now why did it take so long?

So all I need to do is take out the FULL? Is regular VACUUM sufficient? How often do we need FULL? (I know it's a stupid question without providing some more context, but how can I estimate it?)

I suppose the ultimate solution would be a wrapper script that works as follows:

check if there are any waiting/idle in transaction processes
if such processes exist, do a regular VACUUM and send out a warning email
otherwise, do VACUUM FULL.

I like this solution a lot more than getting support calls on weekends.

Out of curiosity, how is lock acquisition implemented in postgresql? All the processes have to go through some sort of queue, so that locks are granted in FIFO order, as you described. Just trying to understand it better.

thanks a lot,

Eugene
On Wed, 2005-07-27 at 10:31, Dr NoName wrote:
> oh! my! gawd!
> Finally a clear explanation that makes perfect sense. Now why did it
> take so long?

Because your initial definition of the problem kinda led us all in the wrong direction for 24 hours? :) Remember, it took like three times of folks asking "what's happening that locks your database" before the vacuum full issue came up. From there, 24 more hours. Actually not bad.

And don't forget, the docs on vacuum pretty clearly state:

"The second form is the VACUUM FULL command. This uses a more aggressive algorithm for reclaiming the space consumed by expired row versions. Any space that is freed by VACUUM FULL is immediately returned to the operating system. Unfortunately, this variant of the VACUUM command acquires an exclusive lock on each table while VACUUM FULL is processing it. Therefore, frequently using VACUUM FULL can have an extremely negative effect on the performance of concurrent database queries."

And then later on:

"VACUUM FULL is recommended for cases where you know you have deleted the majority of rows in a table, so that the steady-state size of the table can be shrunk substantially with VACUUM FULL's more aggressive approach. Use plain VACUUM, not VACUUM FULL, for routine vacuuming for space recovery."

So, daily vacuum fulls are not recommended.

> So all I need to do is take out the FULL? Is regular VACUUM
> sufficient? How often do we need FULL? (I know it's a stupid question
> without providing some more context, but how can I estimate it?)

Please read up on vacuuming in the docs, at:

http://www.postgresql.org/docs/8.0/static/maintenance.html#ROUTINE-VACUUMING

It's quite enlightening about this. Basically, assuming your fsm settings are high enough for your update/delete load, yes, plain vacuums should be enough.

> I suppose the ultimate solution would be a wrapper script that works
> as follows:
>
> check if there are any waiting/idle in transaction processes
> if such processes exist, do a regular VACUUM and send out a warning email
> otherwise, do VACUUM FULL.

Nah, that's probably overkill. I'd rather just run plain vacuum verboses and check them by hand once a week or so to make sure I'm reclaiming all the space.

> I like this solution a lot more than getting support calls on weekends.

Amen brother, amen...

> Out of curiosity, how is lock acquisition implemented in postgresql?
> All the processes have to go through some sort of queue, so that locks
> are granted in FIFO order, as you described. Just trying to understand
> it better.

See here:

http://www.postgresql.org/docs/8.0/static/mvcc.html

PostgreSQL's locking system is quite impressive. I kinda giggle when someone says "Well, now MySQL has feature Y, so why bother with PostgreSQL?" It's pretty obvious they haven't really read up on pgsql when they say things like that.
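As a concrete illustration of swapping the weekly VACUUM FULL for routine plain vacuums, something like the cron entry below would do it. The schedule is arbitrary, the database name is just the one visible in the ps output earlier in the thread, and the vacuumdb flags (-z runs ANALYZE as well, -q keeps output quiet) should be checked against your version's docs.

    # crontab sketch: nightly plain VACUUM ANALYZE instead of a weekly VACUUM FULL
    30 3 * * *  vacuumdb -z -q siam_production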
Thanks a lot, everyone! That solved my problem. But I still want to be able to set transaction timeout. Any chance of that in the next release?

Eugene

--- Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> So, daily vacuum fulls are not recommended.
> [...]
> Basically, assuming your fsm settings are high enough for your
> update/delete load, yes, plain vacuums should be enough.
Dr NoName wrote:
> oh! my! gawd!
> Finally a clear explanation that makes perfect sense. Now why did it
> take so long?

I think you did not get the explanation sooner because you did not mention that you were doing VACUUM FULL from a cron job, and you got drawn into an argument about what postgres should do rather than WHY it did what it did. I had a lot of sympathy with your position as something similar happened to me, but you did not give the detail that allowed me to guess (i.e., the VACUUM FULL) until several exchanges had taken place.

> So all I need to do is take out the FULL? Is regular VACUUM
> sufficient? How often do we need FULL? (I know it's a stupid question
> without providing some more context, but how can I estimate it?)

You never have to run VACUUM FULL. The only thing that it does that plain ole VACUUM does not is that it can actually shrink a table. If your table doesn't need shrinking you don't need VACUUM FULL. It is really only for people in desperate straits who let a table get way too large without running regular VACUUM on it.

As another poster already pointed out, you need to set the free space map configuration high enough. The general process is to let the database go 1 day without VACUUMing, and then run VACUUM VERBOSE. This will print a lot of information about each table that you don't really care about, and then at the end, it will tell you how many pages you need in the free space map, like this:

INFO: free space map: 248 relations, 242 pages stored; 4032 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 182 kB shared memory.

This example from my box shows that I have the free space map tracking 1000 relations and 20000 pages, but I only need 248 relations and 4000 pages. In your own case, unless you are short on RAM, multiplying the amount it says you need by a factor of 4 is probably a good rule of thumb.

> Out of curiosity, how is lock acquisition implemented in postgresql?
> All the processes have to go through some sort of queue, so that locks
> are granted in FIFO order, as you described. Just trying to understand
> it better.

Not all locks--only locks that conflict with each other must wait on each other in this fashion. If every lock did, then you would only need 1 lock in the whole database, as it would protect against any sort of concurrent access. :)

There are two main kinds of locks--shared locks and exclusive locks. Multiple shared locks can be granted on the same table or row, but only one exclusive lock can be. select, insert, update, and delete, and regular vacuum take no exclusive locks, hence the excellent general performance of postgres.* (see below)

The important thing to remember is that if 1 process is waiting trying to get an exclusive lock on some table, then every other process asking for a shared lock on the same table will have to wait. I know offhand that VACUUM FULL, ALTER TABLE, and REINDEX take exclusive locks. These are probably the only commands that people would be tempted to run via a cron job.
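To turn those VACUUM VERBOSE numbers into settings, the relevant postgresql.conf parameters look roughly like this. The values are illustrative only, both require a postmaster restart, and the right numbers come from your own "total pages needed" line plus headroom:

    # postgresql.conf excerpt (example values, not recommendations)
    max_fsm_pages = 20000        # should comfortably exceed "total pages needed"
    max_fsm_relations = 1000     # should exceed the number of relations tracked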
You might find this informative:

http://www.postgresql.org/docs/8.0/interactive/explicit-locking.html

Regards,
Paul Tillotson

(*) Actually, you can get this kind of deadlock with just UPDATEs. Suppose that your web application does:

BEGIN;
UPDATE hits SET count = count + 1 WHERE page = 'somepage.aspx';
[other stuff]
COMMIT;

If you have another transaction that tries to update the SAME ROW, then it will wait for the first transaction to finish. Thus, if your client does the update and then hangs while doing [other stuff], every other client that tries to update that row will block until the transaction commits, even though the rest of the database will be unaffected.
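For concreteness, the blocking Paul describes in the footnote can be reproduced in two psql sessions (the table, column, and page values are taken from his hypothetical example):

    -- Session 1:
    BEGIN;
    UPDATE hits SET count = count + 1 WHERE page = 'somepage.aspx';
    -- ... client stalls here, neither committing nor rolling back ...

    -- Session 2: this statement blocks until session 1 commits or rolls back,
    -- but statements touching other rows or tables proceed normally.
    BEGIN;
    UPDATE hits SET count = count + 1 WHERE page = 'somepage.aspx';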
On Wed, Jul 27, 2005 at 05:12:46PM -0700, Dr NoName wrote:
> Thanks a lot, everyone! That solved my problem. But I still want to be
> able to set transaction timeout. Any chance of that in the next
> release?

No, because feature freeze for the next release is one month past already. Anyway, I think it was proposed and shot down several times already for past releases.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La Primavera ha venido. Nadie sabe como ha sido" (A. Machado)
On Tue, 2005-07-26 at 12:51, Dr NoName wrote: > postgresql doesn't support distributed transactions. Hi, I just wanted to confirm this. Is there any reasonable way to configure distributed transactions involving Postgres? Thanks very much, Mike Fahrenkrog
"Mike Fahrenkrog" <mikef@travelpost.com> writes: > Hi, I just wanted to confirm this. Is there any reasonable way to configure > distributed transactions involving Postgres? Not at the moment. 8.1 will have support for 2-phase commit, which would allow its use as part of a distributed transaction (you still have to get your XA control software from somewhere, though). regards, tom lane