Re: Server stops responding randomly for 5 minutes - Mailing list pgsql-general

From Chris Travers
Subject Re: Server stops responding randomly for 5 minutes
Date
Msg-id CAKt_Zft793=prwYt0kgT5GVnanXsvmUv1f55WRJX_m+t8O8nDA@mail.gmail.com
Whole thread Raw
In response to Re: Server stops responding randomly for 5 minutes  ("Andrus" <kobruleht2@hot.ee>)
List pgsql-general
I think your log files, and the description of your issue provide at
least a decent idea of what is probably happening.  Unfortunately
without being able to observe the server when it happens, I think you
are going to have limited success tracking this down.

Proximal causes could be network errors or software bugs.

On Wed, Jul 13, 2011 at 4:34 AM, Andrus <kobruleht2@hot.ee> wrote:
> Tomáš and Tony,
>
> thank you.
>
>> Have you done some basic monitoring? This typically happens when the
>> machine does a lot of I/O (swapping, checkpoints, ...) - not sure how this
>> is logged.
>
> This is dedicated server, used only for PostgreSql.
>
> I filtered windows event logs near this time (12:16) . The is error
>
> The OpsMgr Connector could not connect to OPSMAN.yyyyy.xxx:5723.  The error
> code is 11004L(The requested name is valid, but no data of the requested
> type was found.).  Please verify there is network connectivity, the server
> is running and has registered it's listening port, and there are no
> firewalls blocking traffic to the destination.
>
> and OpsMgr related warnings. I do'nt know what is OpsMgr but expect that
> this is not realted to this issue.

It's a product called Microsoft Operations Manager or MOM for short.

The error could be related even if the software is not.  I wouldn't
toss it out entirely yet.
>
> postgres log at this time shows
>
> 2011-07-11 11:45:27 EEST LOG:  could not receive data from client: No
> connection could be made because the target machine actively refused it.
>
> 2011-07-11 11:45:27 EEST LOG:  unexpected EOF on client connection
> 2011-07-11 12:18:35 EEST LOG:  could not receive data from client: No
> connection could be made because the target machine actively refused it.

Sounds like evidence of a network problem or unexpected applicatoin termination.

What I suspect is actually happening (although this is based on sparse
evidence, it may be a useful starting point) is that a session which
is locking certain records is getting spuriously disconnected and the
session is timing out along with locks.  During this timeout period,
the transaction is not committed or rolled back and the locks are
still there.  However that's hardly enough to solve the problem.

Given that you are running the application over RDP it would be
interesting to find out whether these happen when the RDP session is
closed.  Maybe the application is killed and has locks that aren't
released until it times out on the server?  Of course this only causes
an issue in the cases where there is a request for the same locks by
another user when the RDP session closes.


>
> 2011-07-11 12:18:35 EEST LOG:  unexpected EOF on client connection
> 2011-07-11 12:18:46 EEST LOG:  could not receive data from client: No
> connection could be made because the target machine actively refused it.
>
>
> 2011-07-11 12:18:46 EEST LOG:  unexpected EOF on client connection
> 2011-07-11 12:20:55 EEST LOG:  could not receive data from client: No
> connection could be made because the target machine actively refused it.
>
> those messages appear all the time so I expect that they are also not
> related this issue.

They could be though.  If you have a lock conflict and a connection
timeout that could be causing your problem.
>
>> First, you have to determine what's wrong. Set up some basic monitoring,
>> on Linux I'd use iostat/vmstat, not sure about the windows - try process
>> explorer from sysinternals.
>
> Users report hangup later, I cannot monitor this at hangup time.
> No idea what / how to monitor more. Anyway I added lines
>
> log_lock_waits = on
> log_temp_files = 2000
> log_min_duration_statement = 10000
> log_line_prefix='%t %u %d '
> log_min_error_statement = warning

add %r also to your log line prefix.
>
> to end of postgresql.conf and restarted server.
>
>>> 10-25 users manually entering sales order, e.q. low server usage
>>
>> So they're entering the data directly into the database?
>
> Try are using windows application from RDP (other computer in LAN) which
> generates insert, update, delete commands to server for every entered order.

What happens when an RDP session times out or is disconnected without
closing an application?

I.e. if a user just closes the remote desktop session without closing
the app first, the app is killed, right?

>
>> Are you sure
>> there's not something wrong in the application (e.g. a loop that takes a
>> lot of time in some cases)?
>
> This application is used for may other sites without hangup.
> I havent written indentionally such loops, statements cannot take so much
> time.

How many others are running over RDP?
>
>>> Server is installed using setup program from postgresql.org directed
>>> site,
>>> postresql.conf file is not modified after installation.
>>
>> Not sure what setup program you mean (there's an install from
>> EnterpriseDB), but the default config values are usually too low (e.g. the
>> default shared buffers is 24MB IIRC, but in your case 512MB would be
>> probably better). Not sure if that's the problem, though.
>
> PostgreSql was installed from EnterpriceDB, postgresql.org site contains
> link into it.
>
> I ran now EnterpiseDb Tuning wizard and optimized postgresql.conf using it
> for mixed app server and restarted service.
>
> postgresql.conf now contains
>
> # NOTE: This has been modified by EnterpriseDB's Tuning Wizard on 2011/07/13
> 14:10:42
> #       Original Value for "shared_buffers" was "32MB"
> shared_buffers = 69608   # min 128kB
>
Hope this provides a starting point that gets you somewhere.

Best wishes,
Chris Travers

pgsql-general by date:

Previous
From: Nicolas Grilly
Date:
Subject: Re: ts_rank seems very slow (140 ranked documents / second on my machine)
Next
From: Duarte Fonseca
Date:
Subject: SerializableSnapshot removed from postgresql 8.4