Re: ARC Memory Usage analysis

From:           Mark Wong
Subject:        Re: ARC Memory Usage analysis
Msg-id:         20041027143422.A6199@osdl.org
In response to: Re: ARC Memory Usage analysis (Jan Wieck <JanWieck@Yahoo.com>)
List:           pgsql-hackers
On Mon, Oct 25, 2004 at 11:34:25AM -0400, Jan Wieck wrote:
> On 10/22/2004 4:09 PM, Kenneth Marshall wrote:
> > On Fri, Oct 22, 2004 at 03:35:49PM -0400, Jan Wieck wrote:
> >> On 10/22/2004 2:50 PM, Simon Riggs wrote:
> >>
> >> > I've been using the ARC debug options to analyse memory usage on
> >> > the PostgreSQL 8.0 server. This is a precursor to more complex
> >> > performance analysis work on the OSDL test suite.
> >> >
> >> > I've simplified some of the ARC reporting into a single log line,
> >> > which is enclosed here as a patch on freelist.c. This includes
> >> > reporting of:
> >> > - the total memory in use, which wasn't previously reported
> >> > - the cache hit ratio, which was previously calculated slightly
> >> >   incorrectly
> >> > - a useful-ish value for looking at the "B" lists in ARC
> >> > (This is a patch against cvstip, but I'm not sure whether it has
> >> > potential for inclusion in 8.0...)
> >> >
> >> > The total memory in use is useful because it allows you to tell
> >> > whether shared_buffers is set too high. If it is set too high,
> >> > memory usage will continue to grow slowly up to the max, without
> >> > any corresponding increase in the cache hit ratio. If
> >> > shared_buffers is too small, memory usage will climb quickly and
> >> > linearly to its maximum.
> >> >
> >> > The last one I've called "turbulence" in an attempt to ascribe
> >> > some useful meaning to B1/B2 hits - I've tried a few other
> >> > measures, though without much success. Turbulence is the hit
> >> > ratio of the B1 and B2 lists added together. By observation, this
> >> > is zero when ARC gives smooth operation, and goes above zero
> >> > otherwise. Typically, turbulence occurs when shared_buffers is
> >> > too small for the working set of the database/workload
> >> > combination and ARC repeatedly re-balances the lengths of T1/T2
> >> > as a result of "near-misses" on the B1/B2 lists. Turbulence
> >> > doesn't usually cut in until the cache is fully utilized, so
> >> > there is usually some delay after startup.
> >> >
> >> > We also recently discussed that I would add some further memory
> >> > analysis features for 8.1, so I've been trying to figure out how.
> >> >
> >> > The idea that B1 and B2 represent something really useful doesn't
> >> > seem to have been borne out - though I'm open to persuasion
> >> > there.
> >> >
> >> > I originally envisaged a "shadow list" operating as an extension
> >> > of the main ARC list. This will require some re-coding, since the
> >> > variables and macros are all hard-coded to a single set of lists.
> >> > No complaints; it will just take a little longer than we all
> >> > thought (for me, that is...).
> >> >
> >> > My proposal is to alter the code to allow an array of in-memory
> >> > linked lists. The actual list would be [0]; additional lists
> >> > would be created dynamically as required, i.e. not using IFDEFs,
> >> > since I want this to be controlled by a SIGHUP GUC to allow
> >> > on-site tuning, not just lab work. This will then allow reporting
> >> > against the additional lists, so that cache hit ratios can be
> >> > seen with various other "prototype" shared_buffers settings.
> >>
> >> All the existing lists live in shared memory, so that dynamic
> >> approach suffers from the fact that the memory has to be allocated
> >> during ipc_init.
> >>
> >> What do you think about my other theory, to make C actually 2x the
> >> effective cache size and NOT keep T1 in shared buffers, but to
> >> assume T1 lives in the OS buffer cache?
> >>
> >> Jan
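To make Simon's "turbulence" figure concrete, here is a minimal sketch
of how it could be computed from per-list hit counters. The struct and
field names are hypothetical illustrations, not the actual freelist.c
symbols:

    /*
     * Sketch of the "turbulence" figure described above, computed
     * from per-list hit counters.  All names here are hypothetical,
     * not the actual freelist.c variables.
     */
    typedef struct ArcStats
    {
        unsigned long t1_hits;  /* hits on resident T1 pages */
        unsigned long t2_hits;  /* hits on resident T2 pages */
        unsigned long b1_hits;  /* "near-misses" on the B1 ghost list */
        unsigned long b2_hits;  /* "near-misses" on the B2 ghost list */
        unsigned long misses;   /* complete cache misses */
    } ArcStats;

    static double
    arc_turbulence(const ArcStats *s)
    {
        unsigned long total = s->t1_hits + s->t2_hits +
                              s->b1_hits + s->b2_hits + s->misses;

        if (total == 0)
            return 0.0;         /* no lookups yet, e.g. just after startup */

        /*
         * Turbulence: B1+B2 hits as a fraction of all lookups.  Zero
         * while ARC runs smoothly; above zero when ghost-list hits
         * keep forcing T1/T2 re-balancing.
         */
        return (double) (s->b1_hits + s->b2_hits) / (double) total;
    }

In use, something like this would be sampled at the debug-log point and
printed alongside the ordinary cache hit ratio.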
> >
> > Jan,
> >
> > From the articles that I have seen on the ARC algorithm, I do not
> > think that using the effective cache size to set C would be a win.
> > The design of the ARC process is to allow the cache to optimize its
> > use in response to the actual workload. It may be the best use of
> > the cache in some cases to have the entire cache allocated to T1,
> > and similarly for T2. In fact, the ability to alter the behavior as
> > needed is one of the key advantages.
>
> Only the "working set" of the database, that is the pages that are
> very frequently used, is worth holding in shared memory at all. The
> rest should be copied in and out of the OS disc buffers.
>
> The problem is that with a too-small directory, ARC cannot
> guesstimate what might be in the kernel buffers. Nor can it
> guesstimate what recently was in the kernel buffers and got pushed
> out from there. That results in a way too small B1 list, and
> therefore we don't get B1 hits when in fact the data was found in
> memory. B1 hits are what increase the T1target, and since we are
> missing them with a too-small directory size, our implementation of
> ARC is probably using a T2 size larger than the working set. That is
> not optimal.
>
> If we replaced the dynamic T1 buffers with a max_backends*2 area of
> shared buffers, used a C value representing the effective cache size,
> and limited the T1target on the lower bound to effective cache size
> minus shared buffers, then we would basically have moved the T1 cache
> into the OS buffers.
>
> This all only holds water if the OS is allowed to swap out shared
> memory. And that was my initial question: how likely is it to find
> this to be true these days?
>
> Jan

I've asked our Linux kernel guys some quick questions, and they say
you can lock mmapped memory and SysV shared memory with mlock() and
SHM_LOCK, respectively. Otherwise the OS will swap memory out as it
sees fit, whether or not it's shared.

Mark
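For reference, a minimal standalone sketch (mine, not from the thread)
showing both calls: mlock() for an anonymous mmap region and
shmctl(SHM_LOCK) for a SysV segment. On Linux both normally require
root or the CAP_IPC_LOCK capability, and the sizes are arbitrary:

    /*
     * Minimal Linux sketch of the two locking calls mentioned above.
     * Both normally require root or CAP_IPC_LOCK.
     */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/mman.h>
    #include <sys/shm.h>

    int
    main(void)
    {
        size_t      len = 1 << 20;  /* 1 MB for both examples */

        /* SysV shared memory: create a segment, pin it with SHM_LOCK. */
        int         shmid = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);

        if (shmid < 0)
        {
            perror("shmget");
            return 1;
        }
        if (shmctl(shmid, SHM_LOCK, NULL) < 0)
            perror("shmctl(SHM_LOCK)"); /* fails without CAP_IPC_LOCK */

        /* Anonymous mmap: pin the mapping with mlock(). */
        void       *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED)
        {
            perror("mmap");
            return 1;
        }
        if (mlock(p, len) < 0)
            perror("mlock");            /* likewise privileged */

        /* Without SHM_LOCK/mlock the kernel may swap either region. */
        munlock(p, len);
        munmap(p, len);
        shmctl(shmid, IPC_RMID, NULL);
        return 0;
    }

Note that SHM_LOCK pins a whole SysV segment, while mlock() operates
on an address range, which is why each pairs with one of the two kinds
of shared memory Mark mentions.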