Thread: Tons of free RAM. Can't make it go away.

Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
Hey everyone!

This is pretty embarrassing, but I've never seen this before. This is
our system's current memory allocation from 'free -m':

              total       used       free     buffers     cached
Mem:         72485      58473      14012           3      34020
-/+ buffers/cache:      24449      48036

So, I've got 14GB of RAM that the OS is just refusing to use for disk or
page cache. Does anyone know what might cause that?

Our uname -sir, for reference:

Linux 3.2.0-31-generic x86_64
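
As a rough illustration (not part of the original report), the second row of the `free -m` output can be re-derived from the `Mem:` row. The Python sketch below assumes the older `free` layout shown above; the name `summarize_free` is made up for this example. Its results land within 1 MB of the printed `-/+ buffers/cache` row, presumably because `free` rounds each column separately.

```python
# Illustrative only: re-derive the "-/+ buffers/cache" row from the "Mem:" row.
# The text below is copied from the free -m output quoted in this message.
FREE_OUTPUT = """\
             total       used       free     buffers     cached
Mem:         72485      58473      14012           3      34020
-/+ buffers/cache:      24449      48036
"""


def summarize_free(text):
    """Return memory figures (in MB) computed from the 'Mem:' row."""
    for line in text.splitlines():
        if line.startswith("Mem:"):
            total, used, free, buffers, cached = (
                int(value) for value in line.split()[1:6]
            )
            return {
                "total": total,
                "truly_free": free,
                # Memory held by programs, excluding buffers and page cache.
                "used_by_programs": used - buffers - cached,
                # Memory that is free or could be reclaimed from cache.
                "free_plus_cache": free + buffers + cached,
            }
    raise ValueError("no 'Mem:' row found")


summary = summarize_free(FREE_OUTPUT)
print(summary)
```

The point of the complaint is the `truly_free` figure: about 14 GB sits completely unused instead of being absorbed into the page cache.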

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Tons of free RAM. Can't make it go away.

From
Claudio Freire
Date:
On Mon, Oct 22, 2012 at 2:35 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> So, I've got 14GB of RAM that the OS is just refusing to use for disk or
> page cache. Does anyone know what might cause that?

Maybe there's just nothing to put inside?

How big is your database? How much of it gets accessed?


Re: Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
On 10/22/2012 12:44 PM, Claudio Freire wrote:


> Maybe there's just nothing to put inside?
> How big is your database? How much of it gets accessed?

Trust me, there's plenty. We have a DB that's 6x larger than RAM that's
currently experiencing 6000TPS, and according to iostat, anywhere from
20-60% disk utilization that's mostly reads.

It's pretty aggressively keeping that 14GB free, and it's driving me
nuts. :)


--
Shaun Thomas


Re: Tons of free RAM. Can't make it go away.

From
Frank Lanitz
Date:
On Mon, 22 Oct 2012 12:35:32 -0500
Shaun Thomas <sthomas@optionshouse.com> wrote:

> Hey everyone!
>
> This is pretty embarrassing, but I've never seen this before. This is
> our system's current memory allocation from 'free -m':
>
>               total       used       free     buffers     cached
> Mem:         72485      58473      14012           3      34020
> -/+ buffers/cache:      24449      48036
>
> So, I've got 14GB of RAM that the OS is just refusing to use for disk
> or page cache. Does anyone know what might cause that?

Maybe it's not needed? What makes you think the OS should allocate all the
memory?
--
Frank Lanitz <frank@frank.uvena.de>


Re: Tons of free RAM. Can't make it go away.

From
Claudio Freire
Date:
On Mon, Oct 22, 2012 at 2:49 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
>> Maybe there's just nothing to put inside?
>> How big is your database? How much of it gets accessed?
>
>
> Trust me, there's plenty. We have a DB that's 6x larger than RAM that's
> currently experiencing 6000TPS, and according to iostat, anywhere from
> 20-60% disk utilization that's mostly reads.
>
> It's pretty aggressively keeping that 14GB free, and it's driving me nuts.
> :)

Did you check the kernel's zone_reclaim_mode ?


Re: Tons of free RAM. Can't make it go away.

From
Marcus Larsson
Date:
On Mon, Oct 22, 2012 at 12:49:49PM -0500, Shaun Thomas wrote:

> Trust me, there's plenty. We have a DB that's 6x larger than RAM
> that's currently experiencing 6000TPS, and according to iostat,
> anywhere from 20-60% disk utilization that's mostly reads.

Could it be related to zone_reclaim_mode? What is vm.zone_reclaim_mode set to?
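
For anyone checking the same setting, here is a minimal Python sketch of reading and decoding it. The bit meanings follow the kernel's vm sysctl documentation; the function names are invented for this example, and the `/proc` path is the usual location on Linux.

```python
# Illustrative helper for inspecting vm.zone_reclaim_mode on Linux.
ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def describe_zone_reclaim(mode):
    """Translate the zone_reclaim_mode bitmask into readable flags."""
    if mode == 0:
        return ["zone reclaim off"]
    flags = []
    if mode & 1:
        flags.append("zone reclaim on")
    if mode & 2:
        flags.append("zone reclaim writes dirty pages out")
    if mode & 4:
        flags.append("zone reclaim swaps pages")
    return flags


def read_zone_reclaim(path=ZONE_RECLAIM_PATH):
    """Read the current value; raises OSError if the file is unavailable."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        print(describe_zone_reclaim(read_zone_reclaim()))
    except (OSError, ValueError) as error:
        print("could not read zone_reclaim_mode:", error)
```

A value of 0 means the kernel will satisfy allocations from another node before reclaiming local page cache, which is generally what a database server with a large working set wants.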

/marcus



Re: Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
On 10/22/2012 12:53 PM, Claudio Freire wrote:

> Did you check the kernel's zone_reclaim_mode ?

It's currently set to 0, which as I'm led to believe, is the setting I
want there. But here's something interesting:

numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 36853 MB
node 0 free: 13816 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 36863 MB
node 1 free: 751 MB
node distances:
node   0   1
   0:  10  20
   1:  20  10


Looks like CPU 0 is hoarding memory. :(
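
To put numbers on the imbalance, the following Python sketch (an illustration added for readers, with a made-up function name `node_memory`) parses the `numactl --hardware` output above and reports the free share per node. It assumes the `node N size:` / `node N free:` line format shown here.

```python
import re

# Copied from the numactl --hardware output in this message.
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 36853 MB
node 0 free: 13816 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 36863 MB
node 1 free: 751 MB
"""


def node_memory(text):
    """Return {node: {'size': MB, 'free': MB}} from numactl --hardware text."""
    nodes = {}
    pattern = r"^node (\d+) (size|free): (\d+) MB$"
    for node, field, megabytes in re.findall(pattern, text, re.MULTILINE):
        nodes.setdefault(int(node), {})[field] = int(megabytes)
    return nodes


nodes = node_memory(NUMACTL_OUTPUT)
for node, mem in sorted(nodes.items()):
    percent = 100.0 * mem["free"] / mem["size"]
    print(f"node {node}: {mem['free']} of {mem['size']} MB free ({percent:.1f}%)")
```

Node 0 has well over a third of its memory free while node 1 has only a couple of percent, so nearly all of the unused 14 GB sits on node 0.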


--
Shaun Thomas


Re: Tons of free RAM. Can't make it go away.

From
Claudio Freire
Date:
On Mon, Oct 22, 2012 at 3:01 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
>
>> Did you check the kernel's zone_reclaim_mode ?
>
>
> It's currently set to 0, which as I'm led to believe, is the setting I want
> there.

Yep

> But here's something interesting:
>
> numactl --hardware
>
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
> node 0 size: 36853 MB
> node 0 free: 13816 MB
> node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
> node 1 size: 36863 MB
> node 1 free: 751 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
>
> Looks like CPU 0 is hoarding memory. :(

You may want to try setting the numa policy before launching postgres:

numactl --interleave=all pg_ctl start

or

numactl --preferred=+0 pg_ctl start
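
If the server is started from a script rather than by hand, the same two invocations can be assembled programmatically. This is only a sketch: the helper name `numactl_command` is hypothetical, and the two flags are exactly the ones suggested above.

```python
# Illustrative wrapper: build the argv for starting a command under a NUMA policy.
POLICIES = {
    # Spread pages round-robin across all nodes.
    "interleave": "--interleave=all",
    # Prefer the first node of the allowed set, falling back to other nodes.
    "preferred": "--preferred=+0",
}


def numactl_command(policy, command):
    """Prefix `command` (a list of strings) with the chosen numactl policy."""
    return ["numactl", POLICIES[policy]] + list(command)


argv = numactl_command("interleave", ["pg_ctl", "start"])
print(" ".join(argv))
# To actually launch it, pass argv to subprocess.run(argv, check=True).
```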


Re: Tons of free RAM. Can't make it go away.

From
"Franklin, Dan (FEN)"
Date:
This is a good general discussion of the problem - looks like you could
replace "MySQL" with "PostgreSQL" everywhere without loss of generality:

http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/


Dan



Re: Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
On 10/22/2012 01:20 PM, Franklin, Dan (FEN) wrote:

> http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/

Yeah, I remember reading that a while back. While interesting, it
doesn't really apply to PG, in that unlike MySQL, we don't allocate any
large memory segments directly to any large block. With MySQL, it's not
uncommon to dedicate over 50% of RAM to the MySQL process itself, but I
don't often see PG systems with more than 8GB in shared_buffers.

All the rest should be available for random allocation in general. At
least, in theory.

--
Shaun Thomas


Re: Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
On 10/22/2012 01:14 PM, Claudio Freire wrote:

> You may want to try setting the numa policy before launching postgres:
>
> numactl --interleave=all pg_ctl start

I thought about that. I'd try it on one of our stage nodes, but both of
them show an even memory split. I'm not sure why our prod node is acting
this way. We've used bcfg2 so every server has the exact same
configuration, including kernel parameters, startup settings, and so on.
I can only conclude that there's something about the activity itself
that's causing it.

I'll have to take another look after the market closes to see if the
unallocated chunk shrinks.


--
Shaun Thomas


Re: Tons of free RAM. Can't make it go away.

From
Claudio Freire
Date:
On Mon, Oct 22, 2012 at 3:24 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
>> http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
>
>
> Yeah, I remember reading that a while back. While interesting, it doesn't
> really apply to PG, in that unlike MySQL, we don't allocate any large memory
> segments directly to any large block. With MySQL, it's not uncommon to
> dedicate over 50% of RAM to the MySQL process itself, but I don't often see
> PG systems with more than 8GB in shared_buffers.

Actually, one problem that crops up in PG is that shared buffers
tend to be allocated all within one node (the postmaster's), stealing
a lot from workers.

I had written a patch that sets the policy to interleave in the
master, while launching (and setting up shared buffers), and then back
to preferring local when forking a worker.

I never had a chance to test it. I only have one numa system, and it's
in production so I can't really test much there.

I think, unless it gives you trouble with the page cache, numactl
--preferred=+0 should work nicely for postgres overall. Failing that,
numactl --interleave=all would, IMO, be better than the system
default.


Re: Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
On 10/22/2012 01:44 PM, Claudio Freire wrote:

> I think, unless it gives you trouble with the page cache, numactl
> --preferred=+0 should work nicely for postgres overall. Failing that,
> numactl --interleave=all would, IMO, be better than the system
> default.

Thanks, I'll consider that.

FWIW, our current stage cluster node is *not* doing this at all. In
fact, here's a numastat from stage:

                            node0           node1
numa_hit              1623243097      1558610594
numa_miss              257459057       310098727
numa_foreign           310098727       257459057
interleave_hit          25822175        26010606
local_node            1616379287      1545600377
other_node             264322867       323108944

Then from prod:

                            node0           node1
numa_hit              4987625178      3695967931
numa_miss             1678204346       418284176
numa_foreign           418284176      1678204370
interleave_hit             27578           27720
local_node            4988131216      3696305260
other_node            1677698308       417946847


Note how ridiculously uneven node0 and node1 are in comparison to what
we're seeing in stage. I'm willing to bet something is just plain wrong
with our current production node. So I'm working with our NOC team to
schedule a failover to the alternate node. If that resolves it, I'll see
if I can't get some kind of answer from our infrastructure guys to share
in case someone else encounters this.

Yes, even if that answer is "reboot." :)
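
To make the stage/prod comparison concrete, this Python sketch (added as an illustration; `miss_share` is a made-up name) computes, for each node, the share of pages allocated there that were originally intended for the other node, using the `numa_hit` and `numa_miss` counters quoted above.

```python
# Counters copied from the numastat output above, as (node0, node1) pairs.
STAGE = {
    "numa_hit": (1623243097, 1558610594),
    "numa_miss": (257459057, 310098727),
}
PROD = {
    "numa_hit": (4987625178, 3695967931),
    "numa_miss": (1678204346, 418284176),
}


def miss_share(stats):
    """Per node: fraction of pages allocated there that another node wanted."""
    return tuple(
        miss / (hit + miss)
        for hit, miss in zip(stats["numa_hit"], stats["numa_miss"])
    )


for name, stats in (("stage", STAGE), ("prod", PROD)):
    shares = miss_share(stats)
    print(name, ", ".join(f"node{i}: {s:.1%}" for i, s in enumerate(shares)))
```

On stage the two nodes come out at roughly 14% and 17%, while on prod they are roughly 25% and 10%, so prod's node0 is absorbing far more allocations that were meant for node1.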

Thanks again!

--
Shaun Thomas


Re: Tons of free RAM. Can't make it go away.

From
Віталій Тимчишин
Date:
Sorry for the late response, but maybe you are still struggling.

It could be that some queries use a lot of work memory, either because of a high work_mem setting or because of a planner error. In that case, a query grabs memory while it runs and returns it afterwards, at which point it shows up as free. This usually appears as a spike in active memory followed by a lot of free memory.

-- 
--
Best regards,
 Vitalii Tymchyshyn

Re: Tons of free RAM. Can't make it go away.

From
Shaun Thomas
Date:
On 10/27/2012 10:49 PM, Віталій Тимчишин wrote:

> It could be that some queries use a lot of work memory, either because
> of a high work_mem setting or because of a planner error. In that case,
> a query grabs memory while it runs and returns it afterwards, at which
> point it shows up as free. This usually appears as a spike in active
> memory followed by a lot of free memory.

Yeah, I had briefly considered that. But our work_mem is only 16MB, and
even a giant query would have trouble allocating 10+GB in work_mem-sized
chunks.
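
As a back-of-the-envelope check (an illustration, not a measurement): PostgreSQL applies work_mem per sort or hash operation rather than per query, so the question is how many such operations would have to hold a full allocation at once to reach 10 GB.

```python
# Rough arithmetic only; both figures come from the message above.
WORK_MEM_MB = 16   # configured work_mem
TARGET_GB = 10     # the "10+GB" that would need explaining

# Number of simultaneous full-size work_mem allocations needed.
chunks_needed = TARGET_GB * 1024 // WORK_MEM_MB
print(chunks_needed)
```

That works out to 640 concurrent sort or hash operations, each using its whole 16 MB allowance, which is why a work_mem spike looks like an unlikely explanation here.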

That's why I later listed the numa info. In our case, processor 0 is
heavily unbalanced in its memory accesses compared to processor 1. I
think the theory is that because we didn't start with interleave, the 8GB
(our shared_buffers) segment landed entirely on processor 0, which
unbalanced a lot of other stuff.

Of course, that leaves 4-6GB unaccounted for. And numactl still shows a
heavy preference for freeing memory from proc 0. It seems to only do it
on this node, so we're going to switch nodes soon and see if the problem
reappears. We may have to perform a node hardware audit if this persists.

Thanks for your input, though. :)

--
Shaun Thomas