Thread: very long updates on very small tables
Dear list,

We have a web application, and when people log in, some information is written to the user tables. We have around 500 active users, but at most 5 people are logged in at the same time. From time to time people log in and the application becomes unresponsive. What we see in the Postgres server logs is that processes are waiting for other transactions to finish, though not because of a deadlock. The log tells me that certain update statements sometimes take about 3-10 minutes. But we are talking about updates on tables with 1000 to 10000 rows, and updates that are supposed to update 1 row.

We are running under Windows 2008 and Postgres 8.4.7. (Sorry for the Windows, it was not MY first choice.)

My only explanation at the moment would be that there must be some kind of Windows process that stops all other processes until it is finished, or something like that. (Could it even be autovacuum?) Is there a way to find out how long autovacuum took? Has anyone seen anything similar? Or could it really be that we need a bigger machine with more I/O? But the one disk in the system still seems not very busy, and response times in the Windows resource monitor are not higher than 28 ms.

Following is an excerpt of our server log:

LOG: process 1660 acquired ShareLock on transaction 74652 after 533354.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4984 acquired ShareLock on transaction 74652 after 1523530.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 956 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 383055.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4312 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 1422677.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 1523567.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 533391.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 5504 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 183216.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 1524 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 376370.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 1422688.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 383067.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 'f', $2 = '31'
LOG: process 4532 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 118851.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4448 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 366304.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 183241.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 376395.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 4204 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 339893.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 366342.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 4760 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 205943.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 339923.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: duration: 205963.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: duration: 124654.000 ms execute <unnamed>: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: process 3844 still waiting for ShareLock on transaction 74839 after 8000.000 ms

Thanks in advance.
Lars

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lars Feistner

Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg,
Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg

E-Mail: feistner@uni-heidelberg.de
Fon: +49-6221-56-8269
Fax: +49-6221-56-7175

WWW: http://www.ims-m.de
     http://www.kompmed.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lars Feistner <feistner@uni-heidelberg.de> wrote:

> The log tells me that certain update statements take sometimes
> about 3-10 minutes. But we are talking about updates on tables
> with 1000 to 10000 rows and updates that are supposed to update 1
> row.

The top possibilities that come to my mind are:

(1) The tables are horribly bloated. If autovacuum is off or not aggressive enough, things can degenerate to this level.

(2) Memory is over-committed and your machine is thrashing.

(3) There are explicit LOCK commands in the software which are contributing to the blocking.

(4) There is some external delay within the transaction, such as waiting for user input while the transaction is open.

Maybe there's a combination of the above at play. Can you rule any of these out?

-Kevin
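P.S. A quick way to check on (1) and on autovacuum is to look at the standard statistics views -- just a sketch, with your table name filled in as an example:

    -- dead-tuple count and last (auto)vacuum time for the table in question
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_all_tables
     WHERE relname = 'extjs_recentlist';

    -- sessions currently waiting on a lock, and what they are running
    SELECT procpid, waiting, current_query
      FROM pg_stat_activity
     WHERE waiting;

Setting log_autovacuum_min_duration = 0 in postgresql.conf will also log every autovacuum run together with its duration, which should answer your question about how long autovacuum took.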
Hello Kevin,

On 03/29/2011 09:28 PM, Kevin Grittner wrote:
> Lars Feistner <feistner@uni-heidelberg.de> wrote:
>
>> The log tells me that certain update statements take sometimes
>> about 3-10 minutes. But we are talking about updates on tables
>> with 1000 to 10000 rows and updates that are supposed to update 1
>> row.
>
> The top possibilities that come to my mind are:
>
> (1) The tables are horribly bloated. If autovacuum is off or not
> aggressive enough, things can degenerate to this level.

Some tables are autovacuumed regularly, others are not. The specific table extjs_recentlist has never been autovacuumed. So I would expect updates on this table to always be very slow, but they are not -- only every fourth or fifth day, for maybe half an hour, and then everything is fine again. And ;-) there is no antivirus installed.

> (2) Memory is over-committed and your machine is thrashing.

We can rule this out. There is enough memory installed and the database is less than 500 MB.

> (3) There are explicit LOCK commands in the software which are
> contributing to the blocking.

We use the JDBC driver. The JDBC driver might do some locking, but we don't do any explicitly.

> (4) There is some external delay within the transaction, such as
> waiting for user input while the transaction is open.

No, no user interaction within a transaction.

> Maybe there's a combination of the above at play. Can you rule any
> of these out?
>
> -Kevin

So, I will try to make autovacuum more aggressive and will report back if nothing changes.

Thanks a lot.
Lars
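P.S. Concretely, the plan is to tune autovacuum for this one table via its storage parameters -- just a sketch, the thresholds are guesses I still have to verify:

    ALTER TABLE extjs_recentlist SET (
        autovacuum_enabled = true,
        autovacuum_vacuum_scale_factor = 0.0,  -- trigger on absolute row counts
        autovacuum_vacuum_threshold = 50       -- vacuum after roughly 50 dead rows
    );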
2011/3/30, Lars Feistner <feistner@uni-heidelberg.de>:
> [...]
> So, I will try to make autovacuum more aggressive and will report
> back if nothing changes.

Hi. Try logging all statements for an hour and show us the result, together with your postgresql.conf.

------------
pasman
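P.S. Something along these lines in postgresql.conf would capture enough detail -- a sketch only, adjust the thresholds to taste:

    log_min_duration_statement = 0   # log every statement together with its duration
    log_lock_waits = on              # log statements that wait longer than deadlock_timeout
    log_line_prefix = '%t [%p] '     # prefix each line with a timestamp and the backend PID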
Lars Feistner <feistner@uni-heidelberg.de> wrote:
> On 03/29/2011 09:28 PM, Kevin Grittner wrote:
>> Lars Feistner <feistner@uni-heidelberg.de> wrote:
>>
>>> The log tells me that certain update statements take sometimes
>>> about 3-10 minutes. But we are talking about updates on tables
>>> with 1000 to 10000 rows and updates that are supposed to update
>>> 1 row.
>>
>> The top possibilities that come to my mind are:

> [all eliminated as possibilities]

If you haven't already done so, you should probably turn on checkpoint logging to see if this corresponds to checkpoint activity. If it does, you can try cranking up how aggressive your background writer is, and perhaps limiting your shared_buffers to something around the size of your RAID controller's BBU cache. (I hope you have a RAID controller with BBU cache configured for write-back, anyway.)

-Kevin
On 03/30/2011 06:54 PM, Kevin Grittner wrote:
> If you haven't already done so, you should probably turn on
> checkpoint logging to see if this corresponds to checkpoint
> activity. If it does, you can try cranking up how aggressive your
> background writer is, and perhaps limiting your shared_buffers to
> something around the size of your RAID controller's BBU cache. (I
> hope you have a RAID controller with BBU cache configured for
> write-back, anyway.)
>
> -Kevin

Hello Kevin,

I am sorry to disappoint you here. As I said in my first e-mail, we don't have much traffic and the database fits easily into memory. The traffic might increase -- at least it has been increasing over the last 12 months -- but the database will always fit into memory.

No, we don't have a RAID and thus we don't have a BBU. Actually we started off with a big SAN that our data centre offered. But sometimes that SAN was a bit slow, and when we first encountered the very long updates I thought there was a connection between the long-running updates and the slowness of the SAN, so I moved the database onto the local disk (we are talking about one disk, not disks). I am still seeing the long-running inserts and updates.

I am still following the autovacuum trail; it still does not run frequently enough.

Thanks a lot for the replies so far. I will keep you informed about my next steps and the results.

Thanks a lot
Lars
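P.S. For reference, this is how I am checking the current autovacuum configuration -- nothing more than a query against pg_settings:

    SELECT name, setting, source
      FROM pg_settings
     WHERE name LIKE 'autovacuum%';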
Lars Feistner <feistner@uni-heidelberg.de> wrote:
> On 03/30/2011 06:54 PM, Kevin Grittner wrote:
>> If you haven't already done so, you should probably turn on
>> checkpoint logging to see if this corresponds to checkpoint
>> activity. If it does, you can try cranking up how aggressive
>> your background writer is, and perhaps limiting your
>> shared_buffers to something around the size of your RAID
>> controller's BBU cache. (I hope you have a RAID controller with
>> BBU cache configured for write-back, anyway.)

> I am sorry to disappoint you here. As I said in my first e-mail, we
> don't have much traffic and the database fits easily into memory.
> The traffic might increase -- at least it has been increasing over
> the last 12 months -- but the database will always fit into memory.
>
> No, we don't have a RAID and thus we don't have a BBU. Actually
> we started off with a big SAN that our data centre offered. But
> sometimes that SAN was a bit slow, and when we first encountered
> the very long updates I thought there was a connection between the
> long-running updates and the slowness of the SAN, so I moved the
> database onto the local disk (we are talking about one disk, not
> disks). I am still seeing the long-running inserts and updates.
> I am still following the autovacuum trail; it still does not run
> frequently enough.

Nothing there makes a write glut on checkpoint less likely to be the cause. Without a BBU write-back cache it is actually *more* likely, and having enough RAM to hold the whole database makes it *more* likely. If you haven't placed your pg_xlog directory on a separate file system, it is also more likely.

Turning on logging of checkpoint activity and checking whether that correlates with your problem times is strongly indicated.

-Kevin
On 4.4.2011 16:32, Kevin Grittner wrote:
> Nothing there makes a write glut on checkpoint less likely to be the
> cause. Without a BBU write-back cache it is actually *more* likely,
> and having enough RAM to hold the whole database makes it *more*
> likely. If you haven't placed your pg_xlog directory on a separate
> file system, it is also more likely.
>
> Turning on logging of checkpoint activity and checking whether that
> correlates with your problem times is strongly indicated.
>
> -Kevin

Checkpoints would be my first guess too, but the whole database is just 500 MB. Lars, how did you get this number? Did you measure the amount of disk space occupied, or something else?

BTW, how much memory is there (total RAM, and how much is dedicated to shared buffers)? How many checkpoint segments are there?

Have you monitored the overall behaviour of the system (what processes are running etc.) when the problems occur? I don't have much experience with Windows, but the tools from Sysinternals are reasonable.

And yet another idea -- have you tried using the stats collected by PostgreSQL? I mean the pg_stat_* views, especially pg_stat_bgwriter and maybe pg_stat_all_tables. Those numbers are cumulative, so take two snapshots while the problems are happening and subtract them to get an idea of what's going on.

regards
Tomas
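P.S. A minimal way to take those snapshots -- just a sketch, the scratch table name is arbitrary:

    -- first snapshot, taken before a problem period
    CREATE TABLE bgwriter_snap AS
        SELECT now() AS snap_time, * FROM pg_stat_bgwriter;

    -- second snapshot, taken after the slow updates have been observed
    INSERT INTO bgwriter_snap
        SELECT now(), * FROM pg_stat_bgwriter;

    -- comparing the two rows shows how many checkpoints ran and how many
    -- buffers were written during the interval
    SELECT * FROM bgwriter_snap ORDER BY snap_time;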