Re: BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always) - Mailing list pgsql-bugs

From Scott Carey
Subject Re: BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)
Date
Msg-id BDFBB77C9E07BE4A984DAAE981D19F961ACA17DA01@EXVMBX018-1.exch018.msoutlookonline.net
In response to Re: BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)
List pgsql-bugs
I am 99.9% certain it's not a fluke of "top" (or 'free'), or a fluke of
the drop_caches Linux VM signal.  Otherwise, the system would not spin at
100% system CPU, getting no work done, whenever an attempt is made to
allocate memory above that threshold.  That was the original symptom that
led me down this path of investigation.

So on a 32GB machine, setting shared_buffers to 8GB, filling them up via
index scans, waiting for a checkpoint and syncing to be sure the dirty
pages are negligible, then trying to allocate (and use) more than 16GB of
RAM will make the system unresponsive.  Set shared_buffers to 4GB and
repeat, and it takes 24GB of allocations to cause the problem.  The system
essentially behaves as if the shared_buffers space were 2x its size,
except that the more useless half of it cannot be paged out.
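
The arithmetic behind those two data points can be written down as a small
sketch.  It only encodes the hypothesis that every shared_buffers page is
held twice; the function name is made up for illustration, and the model
ignores kernel and other process memory.

```python
def headroom_gb(total_ram_gb, shared_buffers_gb):
    """Memory left to allocate under the 'held twice' hypothesis.

    Assumption (not a measured fact): each page in shared_buffers also
    occupies an unreclaimable page in the OS page cache.
    """
    return total_ram_gb - 2 * shared_buffers_gb


for shared_buffers_gb in (8, 4):
    print(shared_buffers_gb, headroom_gb(32, shared_buffers_gb))
```

With 32GB of RAM this gives 16GB for shared_buffers = 8GB and 24GB for
shared_buffers = 4GB, matching the two thresholds observed above.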

When I first saw this behavior, I did not connect the 8GB that the OS
could not free to the 8GB that postgres was caching, and blamed Linux,
spending much effort on Linux VM tunables.  But now that they are
connected, it would seem more likely to be on this side of things.  After
all, how can Linux even know what pages postgres still has in cache?
AFAIK, you aren't memory mapping pages, just read()ing them?  Wouldn't the
OS only know that there is memory allocated to shared space in a process,
and not know that those contents are mapped pages?

Alternatively, the same behavior could occur if only the first pages put
in the buffer, the ones that replace 'empty space', are being pinned.  I
would need some sort of tool that can tell me which blocks are in the OS
page cache and which are in postgres to distinguish between those, given
the symptoms.
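
For the OS half of such a tool, one possible approach is the mincore(2)
system call, which reports which pages of a mapping are resident.  The
sketch below is an assumption about how this could be done, not something
that was run as part of this report: resident_pages is a hypothetical
helper, it is specific to Linux-like systems, and it only gives a snapshot.
The postgres half could be inspected with the pg_buffercache contrib
module.

```python
import ctypes
import ctypes.util
import mmap
import os


def resident_pages(path):
    """Return (resident, total) page counts for `path` in the OS page cache."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]
    libc.mincore.restype = ctypes.c_int

    size = os.path.getsize(path)
    if size == 0:
        return 0, 0
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE

    with open(path, "rb") as f:
        # A private copy-on-write mapping gives ctypes a writable buffer
        # without modifying the file; mapping alone does not read pages in.
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
    buf = None
    try:
        buf = (ctypes.c_char * size).from_buffer(mm)
        vec = (ctypes.c_ubyte * total)()
        rc = libc.mincore(ctypes.addressof(buf), size, vec)
        if rc != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        # The low bit of each byte is set when that page is resident.
        return sum(b & 1 for b in vec), total
    finally:
        buf = None  # release the buffer export so the mapping can close
        mm.close()
```

Running this over the relation files in the data directory before and
after filling shared_buffers would show how much of each file the kernel
still holds.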

If there are any ideas on how I could truly distinguish where this bug
actually is, or further characterize it in ways that would be useful to
that end, that would be great.

I should have time to set shared_buffers above 50% of RAM and see if it
kills things later today -- this sort of test would not require any
special cache flushing particular to Linux to see.

Thanks,

Scott

________________________________________
From: Tom Lane [tgl@sss.pgh.pa.us]
Sent: Thursday, December 11, 2008 5:57 AM
To: Scott Carey
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

"Scott Carey" <scott@richrelevance.com> writes:
> I have determined that nearly every cached page within shared_buffers is
> being pinned in memory, preventing the OS from dropping any such pages from
> its page cache.

Shouldn't you be complaining to kernel folk rather than here?
What do you think we could do about it?

(I'm not convinced that you're seeing anything except a
platform-specific vagary in how "top" counts things, anyway.)

                        regards, tom lane
