Re: Debugging shared memory issues on CentOS - Mailing list pgsql-performance

From Tom Lane
Subject Re: Debugging shared memory issues on CentOS
Msg-id 3716.1386737676@sss.pgh.pa.us
In response to Debugging shared memory issues on CentOS  (Mack Talcott <mack.talcott@gmail.com>)
Responses Re: Debugging shared memory issues on CentOS
List pgsql-performance
Mack Talcott <mack.talcott@gmail.com> writes:
> I am trying to debug some shared memory issues with Postgres 9.3.1 and
> CentOS release 6.3 (Final).  I have a database machine that probably has
> some misconfigured shared memory settings.  It's getting into 2+ GB of
> swap.  Restarting postgres frees all of the memory, but after a few hours
> of normal usage it will go back into swap.

Are you sure the kernel isn't just swapping out some idle processes
because it feels like it?  These numbers don't exactly look like a
machine under stress:

> top - 09:38:16 up 1 day, 21:21,  3 users,  load average: 0.40, 0.54, 0.45
> Tasks: 253 total,   2 running, 251 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.7%us,  0.2%sy,  0.0%ni, 97.8%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   6998260k total,  6849048k used,   149212k free,      248k buffers
> Swap: 440478516k total,  1981912k used, 438496604k free,  1541356k cached

In particular, you've got 1.5 gig of filesystem cache, so you're hardly
out of memory.  I don't know where the other 5.5 gig of RAM went, but
it doesn't look like postgres is eating it; what else is running on
this box?
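To make that arithmetic explicit, here is a small Python sketch that uses only the figures from the quoted top header (all in kB). The helper name summarize_mem is hypothetical, "gig" is read loosely as 10^6 kB, and treating buffers plus cached as reclaimable page cache is the usual reading of these top fields, not something measured on this box.

```python
# Figures copied from the quoted top header; all values are in kB.
MEM_TOTAL_KB = 6998260
MEM_USED_KB = 6849048
BUFFERS_KB = 248
CACHED_KB = 1541356  # printed on the "Swap:" line, but it is filesystem (page) cache


def summarize_mem(total_kb, used_kb, buffers_kb, cached_kb):
    """Split physical RAM into page cache and everything else (hypothetical helper)."""
    cache_kb = buffers_kb + cached_kb    # reclaimable: the kernel can drop this under pressure
    non_cache_kb = total_kb - cache_kb   # process memory, shared memory, kernel, plus free RAM
    free_kb = total_kb - used_kb         # RAM not in use at all
    return cache_kb, non_cache_kb, free_kb


if __name__ == "__main__":
    cache, non_cache, free = summarize_mem(MEM_TOTAL_KB, MEM_USED_KB, BUFFERS_KB, CACHED_KB)
    print(f"page cache + buffers: {cache / 1e6:.2f} GB")
    print(f"everything else:      {non_cache / 1e6:.2f} GB (of which free: {free / 1e6:.2f} GB)")
```

On these figures it reports about 1.54 GB of cache and about 5.46 GB for everything else, which is where the "1.5 gig" and "5.5 gig" above come from. It cannot say what is holding the 5.46 GB; that still has to be found by looking at what else runs on the machine.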

These lines look absolutely normal, assuming that you've configured
shared_buffers somewhere in the neighborhood of 1GB:

>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
>  9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
>  6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
> 18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
>  1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle

The key thing to realize is that the SHR column is *shared*
memory, i.e., all these processes are referencing the same chunk of about 1GB
worth of memory.  The process-specific memory is RES minus SHR, and none
of those processes seem tremendously out of line on that measure.  (Note:
the SHR values aren't all exactly the same because top doesn't count a
shared page until the process has physically touched that page.  Even the
guy with 1.1g of SHR might not have touched all of the shared storage yet.)
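The RES-minus-SHR rule can be applied mechanically to the process lines above. This Python sketch is an illustration under stated assumptions: it assumes top's default column order (PID USER PR NI VIRT RES SHR ...), treats an unsuffixed size as KiB and the m/g/t suffixes as binary MiB/GiB/TiB, and the helper names to_mib and private_mib are hypothetical.

```python
import re

# Process lines copied from the quoted top output (quote markers and line wraps removed).
TOP_LINES = [
    "3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle",
    "9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle",
    "6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle",
    "18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle",
    "1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle",
]

# Multipliers to MiB for top's size suffixes (assumption: no suffix means KiB).
_TO_MIB = {"": 1 / 1024, "k": 1 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}


def to_mib(field):
    """Convert a top size field such as '983m' or '1.1g' to MiB."""
    match = re.fullmatch(r"([0-9.]+)([kmgt]?)", field.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized top size field: {field!r}")
    return float(match.group(1)) * _TO_MIB[match.group(2)]


def private_mib(line):
    """Return (pid, RES - SHR in MiB) for one top process line."""
    cols = line.split()
    pid, res, shr = cols[0], cols[5], cols[6]
    return pid, to_mib(res) - to_mib(shr)


if __name__ == "__main__":
    for line in TOP_LINES:
        pid, private = private_mib(line)
        print(f"PID {pid:>5}: ~{private:.0f} MiB private")
```

Once sizes reach gigabytes, top rounds RES and SHR to one decimal place, so each estimate is only good to within roughly 100 MiB. Even so, every backend here comes out at somewhere between about 143 and 307 MiB of private memory, nowhere near the 1.1g+ that the RES column alone suggests.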

I'm not sure you have a problem here.  If you do, these figures aren't
showing it.  Having some stuff shoved out to swap is not a problem unless
you have a problem with the swap I/O rate.  You might try watching
"vmstat 1" for a while to see if the si/so columns show significant activity.
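For the "vmstat 1" check, a small Python filter can pull out the si/so columns and summarize them. This is a sketch that assumes the standard procps vmstat layout (a group-title line, then a column-header line naming si and so, then numeric rows); the script name swapwatch.py and its log-file argument are hypothetical. Because vmstat's first report is an average since boot rather than a live sample, the script drops it.

```python
import sys


def swap_activity(lines):
    """Yield (si, so) for each sample row of vmstat output.

    Assumes the procps layout: a group-title line ("procs ---memory--- ..."),
    then a column-header line containing "si" and "so", then numeric rows.
    Repeated header lines are recognized and skipped.
    """
    si_idx = so_idx = None
    for line in lines:
        cols = line.split()
        if "si" in cols and "so" in cols:
            si_idx, so_idx = cols.index("si"), cols.index("so")
            continue
        if si_idx is None or not cols or not cols[0].isdigit():
            continue  # group-title line, blank line, or anything non-numeric
        yield int(cols[si_idx]), int(cols[so_idx])


def summarize_swap(samples):
    """Return (sample count, samples with any swap-in/out, peak si or so value)."""
    samples = list(samples)
    active = sum(1 for si, so in samples if si or so)
    peak = max((max(si, so) for si, so in samples), default=0)
    return len(samples), active, peak


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: swapwatch.py VMSTAT_LOG   (capture with: vmstat 1 60 > VMSTAT_LOG)")
    else:
        with open(sys.argv[1]) as log:
            # vmstat's first report is an average since boot, so skip it.
            rows = list(swap_activity(log))[1:]
        total, active, peak = summarize_swap(rows)
        print(f"{active} of {total} one-second samples showed swap I/O; peak si/so: {peak}")
```

For example, capture a minute with `vmstat 1 60 > vmstat.log` and run `python3 swapwatch.py vmstat.log`. Mostly zero si/so with an occasional blip fits the "idle processes swapped out" picture. Sustained non-zero values are the kind of swap I/O rate that would actually indicate a problem.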

            regards, tom lane

