Thread: Debugging shared memory issues on CentOS

Debugging shared memory issues on CentOS

From: Mack Talcott
I am trying to debug some shared memory issues with Postgres 9.3.1 and CentOS release 6.3 (Final).  I have a database machine that probably has some misconfigured shared memory settings.  It's getting into 2+ GB of swap.  Restarting postgres frees all of the memory, but after a few hours of normal usage it will go back into swap.  During light usage, postgres will *very* slowly release some memory, but not all.  Using top, I can see that many of the postgres connections are using shared memory:

```
top - 09:38:16 up 1 day, 21:21,  3 users,  load average: 0.40, 0.54, 0.45
Tasks: 253 total,   2 running, 251 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.2%sy,  0.0%ni, 97.8%id,  1.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6998260k total,  6849048k used,   149212k free,      248k buffers
Swap: 440478516k total,  1981912k used, 438496604k free,  1541356k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
 9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
 6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
 1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle
```

There are about 29 total idle connections.  
`sudo ipcs -m` only shows:
```
  ------ Shared Memory Segments --------
  key        shmid      owner      perms      bytes      nattch     status
  0x0052e2c1 163840     postgres   600        48         21
```

Surprisingly, it only shows it using 48 bytes.  Any ideas why that would be?

My shared memory settings are:
```
kernel.shmmax = 8589934592  # 8 GB
kernel.shmall = 2097152     # * 4096 = 8 GB
kernel.shmmni = 4096
```

Do I need to set lower shared memory limits?  In the past, I've run into issues using pg_dump and executing larger transactions with lower values.  If I can monitor the shared memory segment, I can better understand when postgres is allocating and releasing memory.
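
For now, the best I've come up with for watching it over time is something along these lines (a rough sketch):

```
# sample the SysV segments plus overall memory/swap every 30 seconds
watch -n 30 'date; ipcs -m; free -m'
```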

Re: Debugging shared memory issues on CentOS

From: Tom Lane
Mack Talcott <mack.talcott@gmail.com> writes:
> I am trying to debug some shared memory issues with Postgres 9.3.1 and
> CentOS release 6.3 (Final).  I have a database machine that probably has
> some misconfigured shared memory settings.  It's getting into 2+ GB of
> swap.  Restarting postgres frees all of the memory, but after a few hours
> of normal usage it will go back into swap.

Are you sure the kernel isn't just swapping out some idle processes
because it feels like it?  These numbers don't exactly look like a
machine under stress:

> top - 09:38:16 up 1 day, 21:21,  3 users,  load average: 0.40, 0.54, 0.45
> Tasks: 253 total,   2 running, 251 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.7%us,  0.2%sy,  0.0%ni, 97.8%id,  1.2%wa,  0.0%hi,  0.0%si,
>  0.0%st
> Mem:   6998260k total,  6849048k used,   149212k free,      248k buffers
> Swap: 440478516k total,  1981912k used, 438496604k free,  1541356k cached

In particular, you've got 1.5 gig of filesystem cache, so you're hardly
out of memory.  I don't know where the other 5.5 gig of RAM went, but
it doesn't look like postgres is eating it; what else is running on
this box?

These lines look absolutely normal, assuming that you've configured
shared_buffers somewhere in the neighborhood of 1GB:

>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres:
> deploy mtalcott 10.222.154.172(53495) idle
>  9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres:
> deploy mtalcott 10.222.154.167(35811) idle
>  6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres:
> deploy mtalcott 10.222.154.167(37110) idle
> 18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres:
> deploy mtalcott 10.222.154.172(47796) idle
>  1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres:
> deploy mtalcott 10.222.154.172(52560) idle

The key thing to realize about that is that the SHR column is *shared*
memory, ie all these processes are referencing the same chunk of about 1GB
worth of memory.  The process-specific memory is RES minus SHR, and none
of those processes seem tremendously out of line on that measure.  (Note:
the fact that the SHR values aren't all exactly the same is because top
doesn't count a shared page until the process has physically touched that
page.  Even the guy with 1.1g of SHR might not have touched all of the
shared storage yet.)
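
If you want to put actual numbers on the process-specific part, a rough sketch like this (reading resident and shared page counts from /proc/<pid>/statm) would do:

```
# rough per-process private estimate: (resident - shared) pages, in kB
page_kb=$(( $(getconf PAGESIZE) / 1024 ))
for pid in $(pgrep -u postgres); do
    read -r size resident shared rest < "/proc/$pid/statm" || continue
    printf '%6s %8s kB private\n' "$pid" $(( (resident - shared) * page_kb ))
done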

I'm not sure you have a problem here.  If you do, these figures aren't
showing it.  Having some stuff shoved out to swap is not a problem unless
you have a problem with the swap I/O rate.  You might try watching "vmstat
1" for awhile to see if the si/so columns show significant activity.

            regards, tom lane


Re: Debugging shared memory issues on CentOS

From: Mack Talcott
On Tue, Dec 10, 2013 at 8:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mack Talcott <mack.talcott@gmail.com> writes:
>> I am trying to debug some shared memory issues with Postgres 9.3.1 and
>> CentOS release 6.3 (Final).  I have a database machine that probably has
>> some misconfigured shared memory settings.  It's getting into 2+ GB of
>> swap.  Restarting postgres frees all of the memory, but after a few hours
>> of normal usage it will go back into swap.
>
> Are you sure the kernel isn't just swapping out some idle processes
> because it feels like it?  These numbers don't exactly look like a
> machine under stress:
>
>> top - 09:38:16 up 1 day, 21:21,  3 users,  load average: 0.40, 0.54, 0.45
>> Tasks: 253 total,   2 running, 251 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.7%us,  0.2%sy,  0.0%ni, 97.8%id,  1.2%wa,  0.0%hi,  0.0%si,
>>  0.0%st
>> Mem:   6998260k total,  6849048k used,   149212k free,      248k buffers
>> Swap: 440478516k total,  1981912k used, 438496604k free,  1541356k cached
>
> In particular, you've got 1.5 gig of filesystem cache, so you're hardly
> out of memory.  I don't know where the other 5.5 gig of RAM went, but
> it doesn't look like postgres is eating it; what else is running on
> this box?
>
> These lines look absolutely normal, assuming that you've configured
> shared_buffers somewhere in the neighborhood of 1GB:
>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres:
>> deploy mtalcott 10.222.154.172(53495) idle
>>  9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres:
>> deploy mtalcott 10.222.154.167(35811) idle
>>  6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres:
>> deploy mtalcott 10.222.154.167(37110) idle
>> 18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres:
>> deploy mtalcott 10.222.154.172(47796) idle
>>  1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres:
>> deploy mtalcott 10.222.154.172(52560) idle
>
> The key thing to realize about that is that the SHR column is *shared*
> memory, ie all these processes are referencing the same chunk of about 1GB
> worth of memory.  The process-specific memory is RES minus SHR, and none
> of those processes seem tremendously out of line on that measure.  (Note:
> the fact that the SHR values aren't all exactly the same is because top
> doesn't count a shared page until the process has physically touched that
> page.  Even the guy with 1.1g of SHR might not have touched all of the
> shared storage yet.)
>
> I'm not sure you have a problem here.  If you do, these figures aren't
> showing it.  Having some stuff shoved out to swap is not a problem unless
> you have a problem with the swap I/O rate.  You might try watching "vmstat
> 1" for awhile to see if the si/so columns show significant activity.
>
>                         regards, tom lane

Thanks for your reply.  I've included the rest of the top output
below.  This is a dedicated postgres box, so nothing else is running.

shared_buffers is set to 1.8g, to accommodate some of our larger
operations.  It looks like this could be lowered a bit, since the max
shared usage is only 1.1g.

The pattern I am seeing is that postgres processes keep growing in
shared (this makes sense as they access more of the shared memory, as
you've pointed out) but also process-specific memory as they run more
queries.  The largest ones are using around 300mb of process-specific
memory, even when they're idle and outside of any transactions.

As for CentOS using 1.5g for disk caching, I'm at a loss.  I have
played with the 'swappiness', setting it down to 10 from the default
of 60 with sysctl.  It didn't have any effect.
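
(For reference, this is roughly what I ran; the second line just makes it persistent across reboots:)

```
sudo sysctl -w vm.swappiness=10
echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf
```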

Once 70-80% of memory is reached, the machine starts using swap, and
it keeps growing.  At first, queries become slightly slower.  Then
some larger selects start taking 10, then 20, then 30 seconds.  During
this, vmstat shows 5-20 procs waiting on both CPU and I/O.  All of a
sudden, generally after some large transaction, about 1g of swap is
released and the number of blocked procs jumps to 50-80.  Everything
grinds to a halt for a few minutes.  Sometimes my app can recover, and
sometimes it needs a little kick.

As expected, resetting the connection clears the process-specific
memory.  The same number of connections on the same machine only use
20% of memory (with 0 swap) when I periodically reconnect.  What kind
of information are these processes holding on to?  I would expect
long-running, idle postgres processes to have similar memory usage to
brand new, idle ones.

One thing worth mentioning is that I am heavily using schemas.  On
every request, I am setting and resetting search_path.
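
(In practice each request does something equivalent to this; the schema name and query here are made-up examples:)

```
# per-customer schema; "customer_42" and "orders" are illustrative only
psql -Atc "SET search_path TO customer_42, public;
           SELECT count(*) FROM orders;
           RESET search_path;"
```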

This top was captured just before swap was released:
```
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3534 postgres  20   0 2330m 1.4g 1.1g S  0.0 20.4   1:06.99 postgres: deploy mtalcott 10.222.155.179(53495) idle
 9143 postgres  20   0 2221m 1.1g 983m S  0.0 16.9   0:14.75 postgres: deploy mtalcott 10.222.155.164(35811) idle
 6026 postgres  20   0 2341m 1.1g 864m S  0.0 16.4   0:46.56 postgres: deploy mtalcott 10.222.155.164(37110) idle
18538 postgres  20   0 2327m 1.1g 865m S  0.0 16.1   2:06.59 postgres: deploy mtalcott 10.222.155.179(47796) idle
 1575 postgres  20   0 2358m 1.1g 858m S  0.0 15.9   1:41.76 postgres: deploy mtalcott 10.222.155.179(52560) idle
17931 postgres  20   0 2343m 1.1g 834m S  0.0 15.8   2:04.61 postgres: deploy mtalcott 10.222.155.164(54439) idle
18286 postgres  20   0 2363m 1.0g 797m S  1.3 15.6   1:54.97 postgres: deploy mtalcott 10.222.155.179(47588) idle
 4541 postgres  20   0 2343m 1.0g 783m S  0.0 15.2   1:20.75 postgres: deploy mtalcott 10.222.155.179(53938) idle
18763 postgres  20   0 2347m 1.0g 772m S  0.0 14.9   1:49.83 postgres: deploy mtalcott 10.222.155.164(32853) idle
 1088 postgres  20   0 2336m 1.0g 778m S  0.3 14.9   1:35.40 postgres: deploy mtalcott 10.222.155.179(52312) idle
17933 postgres  20   0 2343m 996m 800m S  0.0 14.6   2:11.68 postgres: deploy mtalcott 10.222.155.164(54443) idle
 1089 postgres  20   0 2310m 970m 776m S  1.7 14.2   1:18.34 postgres: deploy mtalcott 10.222.155.164(46130) idle
 3535 postgres  20   0 2354m 950m 779m S  0.0 13.9   1:18.44 postgres: deploy mtalcott 10.222.155.164(33599) idle
 1708 postgres  20   0 2308m 940m 760m S  0.0 13.8   1:08.72 postgres: deploy mtalcott 10.222.155.164(49552) idle
18540 postgres  20   0 2337m 932m 784m S  0.7 13.6   1:50.66 postgres: deploy mtalcott 10.222.155.164(59856) idle
 8471 postgres  20   0 2312m 683m 429m S  0.0 10.0   0:54.35 postgres: deploy mtalcott 10.222.155.179(57867) idle
 5931 postgres  20   0 2327m 589m 340m S  0.0  8.6   0:40.07 postgres: deploy mtalcott 10.222.155.179(55092) idle
 6070 postgres  20   0 2306m 568m 358m S  0.0  8.3   0:42.56 postgres: deploy mtalcott 10.222.155.179(55307) idle
 9135 postgres  20   0 2235m 523m 341m S  0.0  7.7   0:19.65 postgres: deploy mtalcott 10.222.155.164(35140) idle
10996 postgres  20   0 2103m 229m 169m S  0.0  3.4   0:01.65 postgres: deploy mtalcott 10.222.155.179(60798) idle
11001 postgres  20   0 2062m 163m 144m S  0.7  2.4   0:01.90 postgres: deploy mtalcott 10.222.155.164(44039) idle
17697 postgres  20   0 2038m 151m 150m S  0.0  2.2   0:09.82 postgres: checkpointer process
10869 postgres  20   0 2045m  82m  76m S  3.3  1.2   0:12.19 postgres: deploy mtalcott 10.197.52.158(43556) idle in transaction
10994 postgres  20   0 2052m  61m  50m S  0.0  0.9   0:00.77 postgres: deploy mtalcott 10.222.155.179(60757) idle
17680 postgres  20   0 2037m  37m  37m S  0.0  0.6   0:03.34 /usr/local/pgsql9.3/bin/postgres -D /db/pgsql/9.3/data
17698 postgres  20   0 2038m  36m  35m S  0.0  0.5   0:02.85 postgres: writer process
10993 postgres  20   0 2045m  29m  22m S  0.0  0.4   0:00.26 postgres: deploy mtalcott 10.222.155.164(42908) idle
17701 postgres  20   0  134m  21m  272 S  0.0  0.3   1:21.61 postgres: stats collector process
 4905 postgres  20   0 2045m  13m 8408 S  0.0  0.2   0:00.44 postgres: deploy mtalcott 10.222.155.164(47193) idle
 5041 postgres  20   0 2044m  13m 8124 S  0.0  0.2   0:00.54 postgres: deploy mtalcott 10.222.155.164(49813) idle
 5036 postgres  20   0 2044m  12m 7808 S  0.0  0.2   0:00.50 postgres: deploy mtalcott 10.222.155.164(49380) idle
 6452 postgres  20   0 2044m  10m 6112 S  0.0  0.2   0:00.26 postgres: deploy mtalcott 10.222.155.164(44313) idle
 5023 postgres  20   0 2044m  10m 5868 S  0.0  0.2   0:00.50 postgres: deploy mtalcott 10.222.155.164(47882) idle
 5029 postgres  20   0 2045m  10m 6732 S  0.0  0.1   0:00.81 postgres: deploy mtalcott 10.222.155.164(48498) idle
 5808 postgres  20   0 2044m 9408 7040 S  0.0  0.1   0:00.30 postgres: deploy mtalcott 10.222.155.164(33987) idle
17700 postgres  20   0 2039m 4728 4432 S  0.0  0.1   0:00.71 postgres: autovacuum launcher process
10567 deploy    20   0 97820 1372  432 S  0.0  0.0   0:00.02 sshd: deploy@pts/2
10564 root      20   0 97820 1192  284 S  0.0  0.0   0:00.04 sshd: deploy [priv]
10998 deploy    20   0 15168 1044  604 R  0.7  0.0   0:00.59 top -c
```


Re: Debugging shared memory issues on CentOS

From: Tom Lane
Mack Talcott <mack.talcott@gmail.com> writes:
> The pattern I am seeing is that postgres processes keep growing in
> shared (this makes sense as they access more of the shared memory, as
> you've pointed out) but also process-specific memory as they run more
> queries.  The largest ones are using around 300mb of process-specific
> memory, even when they're idle and outside of any transactions.

There's quite a lot of stuff that a PG process will cache in local memory
once it's acquired the info, for example:
- relcache (relation descriptors)
- catcache (system catalog entries)
- compiled trees for plpgsql functions

300mb worth of that stuff seems on the high side, but perhaps you have
lots and lots of tables, or lots and lots of functions?

If this is the explanation then flushing that info just results in
restarting from a cold-cache situation, which doesn't seem likely to
be a win.  You're just going to be paying to read it in again.

> As for CentOS using 1.5g for disk caching, I'm at a loss.  I have
> played with the 'swappiness', setting it down to 10 from the default
> of 60 with sysctl.  It didn't have any effect.

Swappiness has nothing to do with disk cache.  Disk cache just means that
the kernel is free to use any spare memory for copies of file pages it's
read from disk lately.  This is almost always a good thing, because it
saves reading those pages again if they're needed again.  And the key word
there is "spare" --- the kernel is at liberty to drop those cached pages
if it needs the memory for something more pressing.  So there's really no
downside.  Trying to reduce that number is completely counterproductive.
Rather, my observation was that if you had a gig and a half worth of RAM
that the kernel felt it could afford to use for disk caching, then you
weren't having much of a memory problem.  However, apparently that
snapshot wasn't representative of your problem case:

> Once 70-80% of memory is reached, the machine starts using swap, and
> it keeps growing.  At first, queries become slightly slower.  Then
> some larger selects start taking 10, then 20, then 30 seconds.  During
> this, vmstat shows 5-20 procs waiting on both CPU and I/O.

I wonder if the short answer for this isn't that you should be using fewer
backends by running a connection pooler.  If the backends want to cache a
couple hundred meg worth of stuff, it's probably wise to let them do so.
Or maybe you should just buy some more RAM.  8GB is pretty puny for a
server these days (heck, the obsolete laptop I'm typing this mail on
has half that much).

            regards, tom lane


Re: Debugging shared memory issues on CentOS

From: Merlin Moncure
On Wed, Dec 11, 2013 at 9:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mack Talcott <mack.talcott@gmail.com> writes:
>> The pattern I am seeing is that postgres processes keep growing in
>> shared (this makes sense as they access more of the shared memory, as
>> you've pointed out) but also process-specific memory as they run more
>> queries.  The largest ones are using around 300mb of process-specific
>> memory, even when they're idle and outside of any transactions.
>
> There's quite a lot of stuff that a PG process will cache in local memory
> once it's acquired the info, for example:
> - relcache (relation descriptors)
> - catcache (system catalog entries)
> - compiled trees for plpgsql functions
>
> 300mb worth of that stuff seems on the high side, but perhaps you have
> lots and lots of tables, or lots and lots of functions?

This has got to be the problem.  It's known that pathological
workloads (lots and lots of tables, views, and functions) abuse the
cache memory segment.  There's no cap to cache memory so over time it
will just accumulate entries until there's nothing left to cache.  For
most applications, this doesn't even show up on the radar.  However,
300mb per postgres backend will burn through that 8gb pretty quickly.
It's tempting to say, "there should be a limit to backend local cache"
but it's not clear if the extra tracking is really worth it all things
considered.  There was some discussion about this (see the archives).

Workarounds:
*) install a connection pooler (as Tom noted), in particular pgbouncer;
a minimal config sketch follows this list.  For workloads like this you
will want to be spartan with the number of physical connections -- say,
1 * number of cores.  For this option to work you need to use
transaction mode, which in turn limits use of session-dependent
features (advisory locks, NOTIFY, prepared statements).  Also, if your
client stack is Java you need to take some extra steps.
*) add memory
*) force connections to recycle every X period of time
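
A minimal transaction-mode pgbouncer.ini is the general shape of it (the database name, paths, and sizes here are illustrative, not tuned for your box):

```
[databases]
; illustrative target -- point this at your actual database
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; required for the small-pool approach above
default_pool_size = 8       ; ~1 * number of cores, per above
max_client_conn = 200       ; app connections get multiplexed onto the pool
server_lifetime = 3600      ; also recycles server connections periodically
```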

merlin


Re: Debugging shared memory issues on CentOS

From: Tom Lane
Merlin Moncure <mmoncure@gmail.com> writes:
> It's tempting to say, "there should be a limit to backend local cache"
> but it's not clear if the extra tracking is really worth it all things
> considered.  There was some discussion about this (see the archives).

Yeah --- there actually was a limit on total catcache size once, long ago.
We took it out because it was (a) expensive to enforce and (b) either
pointless or counterproductive on most workloads.  The catcache is
probably the least of the memory hogs anyway, so it might be that limiting
the size of relcache or function caches would be more useful.  But that
history is likely to discourage most hackers from investigating.

            regards, tom lane


Re: Debugging shared memory issues on CentOS

From: Mack Talcott
> There's quite a lot of stuff that a PG process will cache in local memory
> once it's acquired the info, for example:
> - relcache (relation descriptors)
> - catcache (system catalog entries)
> - compiled trees for plpgsql functions
>
> 300mb worth of that stuff seems on the high side, but perhaps you have
> lots and lots of tables, or lots and lots of functions?
>
> If this is the explanation then flushing that info just results in
> restarting from a cold-cache situation, which doesn't seem likely to
> be a win.  You're just going to be paying to read it in again.

It does seem a bit on the high side, but that makes sense.  There are
about 90 tables and 5 functions in each schema (all are identical),
but there are several infrequent queries for overall statistics that
do a union over all schemas (using UNION ALL).  That seems like the
most likely culprit, as there are ~500 of these schemas.

However, as the app serves a variety of customers, each request makes
queries in a different schema.  Seems like eventually these caches
would get pretty large even without the all-schema queries.
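
(A quick way to gauge how much catalog a single backend could end up caching -- r/i/v are tables, indexes, and views:)

```
psql -Atc "SELECT relkind, count(*) FROM pg_class
           WHERE relkind IN ('r','i','v') GROUP BY relkind ORDER BY relkind"
```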

> Swappiness has nothing to do with disk cache.  Disk cache just means that
> the kernel is free to use any spare memory for copies of file pages it's
> read from disk lately.  This is almost always a good thing, because it
> saves reading those pages again if they're needed again.  And the key word
> there is "spare" --- the kernel is at liberty to drop those cached pages
> if it needs the memory for something more pressing.  So there's really no
> downside.  Trying to reduce that number is completely counterproductive.
> Rather, my observation was that if you had a gig and a half worth of RAM
> that the kernel felt it could afford to use for disk caching, then you
> weren't having much of a memory problem.  However, apparently that
> snapshot wasn't representative of your problem case:

I see.  So, maybe the kernel is _first_ determining that some of the
inactive processes' memory should be swapped out.  Then, since there
is free memory, it's being used for disk cache?

> I wonder if the short answer for this isn't that you should be using fewer
> backends by running a connection pooler.

If I can figure out the maximum number of connections that my server
can handle, that's definitely a possibility.

> If the backends want to cache a
> couple hundred meg worth of stuff, it's probably wise to let them do so.
> Or maybe you should just buy some more RAM.  8GB is pretty puny for a
> server these days (heck, the obsolete laptop I'm typing this mail on
> has half that much).

More memory is definitely a good solution.  This server is on EC2, and
I'm working on replacing it with an instance with twice as much.
However, my concern is that if I double the number of app servers to
handle higher load, I will run into the same issue.

I assume the memory of each process grows until it has all 90 tables
from all 500 schemas cached in some way.  Any ideas for optimizations
that would allow less memory usage in this case with many identical
schemas?  I'm guessing using views rather than select statements
wouldn't help.  Any postgres configs concerning caching I should take
a look at?  Different approaches to data organization?

Thanks, Tom.  I really appreciate your feedback!
