Re: Stalls on PGSemaphoreLock - Mailing list pgsql-performance

From: Pavy Philippe
Subject: Re: Stalls on PGSemaphoreLock
Msg-id: 5F8F324242D0E14B97060D4D32CD0F5C827A2143D5@FRSPX100.fr01.awl.atosorigin.net
In response to: Re: Stalls on PGSemaphoreLock  (Matthew Spilich <mspilich@tripadvisor.com>)
Responses: Re: Stalls on PGSemaphoreLock  ("Gudmundsson Martin (mg)" <martin.mg.gudmundsson@volvo.com>)
           Re: Stalls on PGSemaphoreLock  (Matthew Spilich <mspilich@tripadvisor.com>)
List: pgsql-performance

Here, transparent hugepages were set to always active:
        cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
        [always] never

We changed to:
        cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
        always [never]
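
In this output, the word in square brackets is the active mode. As a small sketch of how that line could be checked from a script: the helper below is hypothetical (the name thp_active_mode does not come from this thread) and only parses a string, so it can be fed the output of the cat command above. It assumes the mode names contain only lowercase letters, which holds for the values shown here.

```shell
# Hypothetical helper: print the active mode from a transparent hugepage
# "enabled" line, i.e. the word enclosed in square brackets.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

thp_active_mode '[always] never'   # value before the change, prints: always
thp_active_mode 'always [never]'   # value after the change, prints: never
```

On a live system the argument would be the content of the enabled file, for example thp_active_mode "$(cat /sys/kernel/mm/redhat_transparent_hugepage/enabled)".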



For the semaphore, our initial configuration was:
        cat /proc/sys/kernel/sem
        250 32000 32 128

And we changed to:
        cat /proc/sys/kernel/sem
        5010    641280  5010    128
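
For reference, the four values in /proc/sys/kernel/sem are, in order, SEMMSL (maximum semaphores per set), SEMMNS (maximum semaphores system-wide), SEMOPM (maximum operations per semop() call) and SEMMNI (maximum number of semaphore sets). In both configurations above, SEMMNS equals SEMMSL * SEMMNI (250 * 128 = 32000 and 5010 * 128 = 641280). A minimal sketch that labels the fields and redoes that arithmetic follows; describe_sem is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: label the four fields of a kernel.sem line and
# compute SEMMSL * SEMMNI for comparison with SEMMNS.
describe_sem() {
    # Split the single argument on whitespace into $1..$4.
    set -- $1
    echo "SEMMSL=$1 SEMMNS=$2 SEMOPM=$3 SEMMNI=$4 SEMMSL*SEMMNI=$(( $1 * $4 ))"
}

describe_sem '250 32000 32 128'       # initial configuration
describe_sem '5010 641280 5010 128'   # configuration after the change
```

On a live system the argument would be "$(cat /proc/sys/kernel/sem)".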




-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On behalf of Matthew
Spilich
Sent: Tuesday, March 25, 2014 19:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

Thanks all:

Ray:  Thanks, we started to look at the hardware/firmware, but didn't get to the level of detail or running sar.
I will probably collect more detail in this area if I continue to see issues.

Pavy - I hope that you are right that the hugepage setting is the issue. I was under the impression that I had it
disabled already because this has been a known issue for us in the past, but it turns out this was not the case for
this server in question. I have disabled it at this time, but it will take a few days of running without issue before
I am comfortable declaring that this is the solution. Can you elaborate on the change you mention to "upgrade the
semaphore configuration"? I think this is not something I have looked at before.

Ashutosh - Thanks for the reply, I started to do that at first. I turned on log_statement=all for a few hours and
generated a few GB of log file, and I didn't want to leave it running in that state for too long because the issue
happens every few days, and not on any regular schedule, so I reverted that after collecting a few GB of detail in the
pg log. What I'm doing now, sampling every few seconds, is I think giving me a decent picture of what is going on when
the incident occurs, and is a level of data collection that I am more comfortable will not impact operations. I am also
logging at the level of 'mod' and all durations > 500ms. I don't see that large write operations are a contributing
factor leading up to these incidents.

I'm hoping that disabling the hugepage setting will be the solution to this.  I'll check back in a day or two with
feedback.

Thanks,
Matt


________________________________________
From: Pavy Philippe [Philippe.Pavy@worldline.com]
Sent: Tuesday, March 25, 2014 1:45 PM
To: Ray Stell; Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: RE : [PERFORM] Stalls on PGSemaphoreLock

Hello

Recently I had a similar problem. The first symptom was a freeze of connections and 100% system CPU for between 2 and
10 minutes, 1 or 2 times per day.
Connections were impossible and queries were slow. An strace on one backend showed a very long semop() system call.
We have a node with 48 cores and 128 GB of memory.

We disabled the hugepage and upgraded the semaphore configuration, and since then we no longer have any
freeze problems on our instance.

Can you check the hugepage and semaphore configuration on your node?

I am interested in this case, so do not hesitate to get back to me with your findings. Thanks.

Excuse me for my bad English!

________________________________________
From: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org] on behalf of Ray Stell
[stellr@vt.edu]
Sent: Tuesday, March 25, 2014 18:17
To: Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

The symptom: The database machine (running postgres 9.1.9 on CentOS 6.4) is running at low utilization most of the
time, but once every day or two, it will appear to slow down to the point where queries back up and clients are unable
to connect. Once this event occurs, there are lots of concurrent queries, I see slow queries appear in the logs, but
there doesn't appear to be anything abnormal that I have been able to see that causes this behavior.
...
Has anyone on the forum seen something similar? Any suggestions on what to look at next? If it is helpful to describe
the server hardware, it's got 2 E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6 TB RAID 10 local
storage (15K 300 GB drives).


I could be way off here, but years ago I experienced something like this (in oracle land) and after some stressful
chasing, the marginal failure of the raid controller revealed itself.  Same kind of event, steady traffic and then some
i/o would not complete and normal ops would stack up.  Anyway, what you report reminded me of that event.  The E5 is a
few years old, I wonder if the raid controller firmware needs a patch?  I suppose a marginal power supply might cause a
similar "hang."  Anyway, marginal failures are very painful.  Have you checked sar or OS logging at event time?


This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be
privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity
cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the
sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is
virus-free and will not be liable for any damages resulting from any virus transmitted.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

