Thread: Server stops responding in every week
PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit is installed in Windows 2008 server and servers 23 users in LAN 10 hours per work day. Approx once per week server stops responding: Users are waiting 4 minutes for query to complete but nothing happens. If postgres service is re-started in Windows Control panel, it continues working for a week and same issue appears again. Server and windows event log does not contian any information about issue. How to fix of diagnose it ? Should I move to 32 Postgres 9.0 server or other idea ? Andrus.
On 1/21/2011 7:16 AM, Andrus wrote: > PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit > is installed in Windows 2008 server and servers 23 users in LAN 10 hours > per work day. > > Approx once per week server stops responding: Users are waiting 4 > minutes for query to complete but nothing happens. > > If postgres service is re-started in Windows Control panel, it continues > working for a week and same issue appears again. > Server and windows event log does not contian any information about issue. > > > How to fix of diagnose it ? > Should I move to 32 Postgres 9.0 server or other idea ? > > Andrus. > Does PG start swapping to disk? Does ram using increase over the week? Have you looked at pg_stat_activity and pg_locks? Do you see "Idle in transaction" (probably in task manager)? -Andy
On Friday 21 January 2011 7:13:23 am Andy Colson wrote: > On 1/21/2011 7:16 AM, Andrus wrote: > > PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit > > is installed in Windows 2008 server and servers 23 users in LAN 10 hours > > per work day. > > > > Approx once per week server stops responding: Users are waiting 4 > > minutes for query to complete but nothing happens. > > > > If postgres service is re-started in Windows Control panel, it continues > > working for a week and same issue appears again. > > Server and windows event log does not contian any information about > > issue. > > > > > > How to fix of diagnose it ? > > Should I move to 32 Postgres 9.0 server or other idea ? > > > > Andrus. > > Does PG start swapping to disk? > Does ram using increase over the week? > Have you looked at pg_stat_activity and pg_locks? > Do you see "Idle in transaction" (probably in task manager)? > > -Andy To add to this. Is there a backup being done? Maybe not pg_dump but some Windows backup program? -- Adrian Klaver adrian.klaver@gmail.com
On 1/21/2011 9:21 AM, Adrian Klaver wrote: > On Friday 21 January 2011 7:13:23 am Andy Colson wrote: >> On 1/21/2011 7:16 AM, Andrus wrote: >>> PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit >>> is installed in Windows 2008 server and servers 23 users in LAN 10 hours >>> per work day. >>> >>> Approx once per week server stops responding: Users are waiting 4 >>> minutes for query to complete but nothing happens. >>> >>> If postgres service is re-started in Windows Control panel, it continues >>> working for a week and same issue appears again. >>> Server and windows event log does not contian any information about >>> issue. >>> >>> >>> How to fix of diagnose it ? >>> Should I move to 32 Postgres 9.0 server or other idea ? >>> >>> Andrus. >> >> Does PG start swapping to disk? >> Does ram using increase over the week? >> Have you looked at pg_stat_activity and pg_locks? >> Do you see "Idle in transaction" (probably in task manager)? >> >> -Andy > > To add to this. > Is there a backup being done? Maybe not pg_dump but some Windows backup program? > Yes yes... or antivirus full scan kicking off?
In response to "Andrus" <kobruleht2@hot.ee>: > PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit > is installed in Windows 2008 server and servers 23 users in LAN 10 hours per > work day. > > Approx once per week server stops responding: Users are waiting 4 minutes > for query to complete but nothing happens. > > If postgres service is re-started in Windows Control panel, it continues > working for a week and same issue appears again. > Server and windows event log does not contian any information about issue. > > > How to fix of diagnose it ? SELECT * FROM pg_locks; If there are exclusive locks being held by some process that kicks off once a week, it could block anything else from occurring. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
Andy, > Does PG start swapping to disk? Earlier I used Stack builder to optimize Postgres as dedicated server. MS SQL server was installed afterwards. In this case message "System is out of virtual memory". Windows is increasing the swap file size. During this memory requests from applications may fail. occurs in server screen if Postgres stops responding. I re-ran Stack builder to optimize conf file as application server and uninstalled MS SQL server. After that I havent noticed any swapping. > Does ram using increase over the week? > Have you looked at pg_stat_activity and pg_locks? > Do you see "Idle in transaction" (probably in task manager)? I havent looked into them. How to automatically re-start postgres service in every night ? Or is it better to switch to 32bit server? Andrus.
On 1/21/2011 9:43 AM, Andrus wrote: > Andy, > >> Does PG start swapping to disk? > > Earlier I used Stack builder to optimize Postgres as dedicated server. > MS SQL server was installed afterwards. > In this case message > "System is out of virtual memory". Windows is increasing the swap file > size. During this memory requests from applications may fail. > > occurs in server screen if Postgres stops responding. > > I re-ran Stack builder to optimize conf file as application server and > uninstalled MS SQL server. > After that I havent noticed any swapping. > Yeah, mssql is a huge memory resource hog. >> Does ram using increase over the week? >> Have you looked at pg_stat_activity and pg_locks? >> Do you see "Idle in transaction" (probably in task manager)? > > I havent looked into them. > > How to automatically re-start postgres service in every night ? Or is it > better to switch to 32bit server? Neither. You need to fix the problem. -Andy
>> To add to this. >> Is there a backup being done? Maybe not pg_dump but some Windows backup >> program? >> > > Yes yes... or antivirus full scan kicking off? There is no antivirus installed. Windows scheduler contains only one task which runs pg_dump at 2:00 every morning. Dump takes few minutes to run. Andrus.
Andy, >> How to automatically re-start postgres service in every night ? Or is it >> better to switch to 32bit server? > > Neither. You need to fix the problem. How to configure postgres so that freeze issue can addressed ? E.q. if query runs more that 60 seconds, postgres dumps its status and long query reason . How to implement this ? No idea can log_statement=all help. Andrus.
On 01/22/2011 03:00 AM, Andrus Moor wrote: > Andy, > >>> How to automatically re-start postgres service in every night ? Or is it >>> better to switch to 32bit server? >> >> Neither. You need to fix the problem. > > How to configure postgres so that freeze issue can addressed ? > E.q. if query runs more that 60 seconds, postgres dumps its status and long query reason . > How to implement this ? > No idea can log_statement=all help. > > Andrus. > Before we can fix the problem, we need to know what the problem is. Wait until the server is really slow and having problems. Then answer these questions: Does ram using increase over the week? Do you see "Idle in transaction" (probably in task manager)? Show us: select * from pg_stat_activity; and select * from pg_locks; A little practice up front might help. When the system is broke, you can interrogate it to see what its doing. Hopefullywe'll see something locked, and a bunch of things waiting. -Andy
Andy, > Then answer these questions: > > Does ram using increase over the week? Server was re-started yesterday and there was little use after restart. server is idle. Task Manager shows now PF Usage 1.22 GB Physical Memory (K) Total 4191872 Avail 348xxxx (last 4 vary) System cache 35676xx (last 2 vary) How to check for memory increase over week ? > Do you see "Idle in transaction" (probably in task manager)? Task Manager shows only image name. Where I should see this ? > Show us: > select * from pg_stat_activity; I ran it from pgadmin. It shows now : 11874;"postgres";212;10;"postgres";"";"127.0.0.1";1183;"2011-01-22 21:24:51.343+02";"";"2011-01-22 21:24:51.5+02";f;"<IDLE>" 45923;"mydb";3080;10;"postgres";"";"127.0.0.1";1184;"2011-01-22 21:24:55.25+02";"";"2011-01-22 21:24:55.281+02";f;"<IDLE>" 45923;"mydb";4732;10;"postgres";"";"127.0.0.1";1185;"2011-01-22 21:24:57.156+02";"2011-01-22 21:25:24.109+02";"2011-01-22 21:25:24.109+02";f;" I will try to run it on next crash if users will not restart server before. > and > select * from pg_locks; I ran it from pgadmin. It shows now : "relation";45923;10985;;;"";;;;;"4/241";4732;"AccessShareLock";t "virtualxid";;;;;"4/241";;;;;"4/241";4732;"ExclusiveLock";t I will try to run it on next crash if users will not restart server before. > A little practice up front might help. When the system is broke, you can > interrogate it to see what its doing. Hopefully we'll see something > locked, and a bunch of things waiting. Users can restart server without contacting me. For this reason I asked for automated dump. Andrus.
I know you're running windows, but if you can get bash working on it, here's a simple bash script I wrote that when it detects too many people connected creates a table called pg_stat_bk_20110120140634 (i.e. date and time) so I can then look over what was in pg_stat_activity when things were acting up. #!/bin/bash threshold=50; dt=`date +%Y%m%d%H%M%S`; active=`/usr/bin/psql www -Atc "select count(*) from pg_stat_activity where current_query not ilike '%idle%';"` if [[ active -gt threshold ]]; then echo "there are "$active" backends"; echo "creating backup for pg_stat as pg_stat_bk_$dt" psql www -c "select * into monitoring.pg_stat_bk_$dt from pg_stat_activity where current_query not ilike '%idle%';" fi note that you should probably change what triggers it to things like if there are any queries waiting or idle in transaction, etc.
Attachment
Scott, Thank you. >I know you're running windows, but if you can get bash working on it, > here's a simple bash script I wrote that when it detects too many > people connected creates a table called pg_stat_bk_20110120140634 > (i.e. date and time) so I can then look over what was in > pg_stat_activity when things were acting up. I can run scripts from windows scheduler for every 4 minutes or maybe from pgadmin or maybe frequently accessed table triggers. Whan command should be used to detect presence of hangup ( maybe query running more than 60 seconds or too many idle transactions?) should pg_stat_activity.query_start column used for this ? How to log memory usage ? Is it best way to use initially: create table activity as select * from pg_stat_activity limit 0; create table locks as select * from pg_locks limit 0; and if script detects hangup it invokes insert into activity select * from pg_stat_activity; insert into locks select * from pg_locks; How to add log timestamp column to activity and locks tables ? How to log memory usage ? Can users notice server perfomance degration due to this? What else to log ? How to create plpgsql procedure for this ? There are few tables which are used frequently. Hangup occurs probably if queries are invoked againt those tables. Is it better to implement this as trigger for those tables? Andrus.
On 01/23/2011 04:38 AM, Andrus Moor wrote: > Scott, > > Thank you. > >> I know you're running windows, but if you can get bash working on it, >> here's a simple bash script I wrote that when it detects too many >> people connected creates a table called pg_stat_bk_20110120140634 >> (i.e. date and time) so I can then look over what was in >> pg_stat_activity when things were acting up. > > I can run scripts from windows scheduler for every 4 minutes or maybe from > pgadmin or maybe frequently accessed table triggers. > > Whan command should be used to detect presence of hangup ( maybe query running more than 60 > seconds or too many idle transactions?) > should pg_stat_activity.query_start column used for this ? > > How to log memory usage ? > > Is it best way to use initially: > > create table activity as select * from pg_stat_activity limit 0; > create table locks as select * from pg_locks limit 0; > > and if script detects hangup it invokes > > insert into activity select * from pg_stat_activity; > insert into locks select * from pg_locks; > > How to add log timestamp column to activity and locks tables ? > How to log memory usage ? > Can users notice server perfomance degration due to this? > What else to log ? > How to create plpgsql procedure for this ? > > There are few tables which are used frequently. Hangup occurs probably if > queries are invoked againt those tables. > Is it better to implement this as trigger for those tables? > > Andrus. > > How about: a) Tell users not to restart database. Make them call you first. Tell them it wont get fixed until that happens. b) If they insist on doing your job, they have to do all of your job: make a script for them to run that collects the info. Tell them they have to run the script before they restart the db. I swear... users are like kids... they'd open windows with hammers if you let them. :-) -Andy