Re: [HACKERS] ARC Memory Usage analysis - Mailing list pgsql-patches

From Jan Wieck
Subject Re: [HACKERS] ARC Memory Usage analysis
Msg-id 417D1D01.3090401@Yahoo.com
In response to ARC Memory Usage analysis  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [HACKERS] ARC Memory Usage analysis
Re: [HACKERS] ARC Memory Usage analysis
List pgsql-patches
On 10/22/2004 4:09 PM, Kenneth Marshall wrote:

> On Fri, Oct 22, 2004 at 03:35:49PM -0400, Jan Wieck wrote:
>> On 10/22/2004 2:50 PM, Simon Riggs wrote:
>>
>> >I've been using the ARC debug options to analyse memory usage on the
>> >PostgreSQL 8.0 server. This is a precursor to more complex performance
>> >analysis work on the OSDL test suite.
>> >
>> >I've simplified some of the ARC reporting into a single log line, which
>> >is enclosed here as a patch on freelist.c. This includes reporting of:
>> >- the total memory in use, which wasn't previously reported
>> >- the cache hit ratio, which was slightly incorrectly calculated
>> >- a useful-ish value for looking at the "B" lists in ARC
>> >(This is a patch against CVS tip, but I'm not sure whether this has
>> >potential for inclusion in 8.0...)
>> >
>> >The total memory in use is useful because it allows you to tell whether
>> >shared_buffers is set too high. If it is set too high, then memory usage
>> >will continue to grow slowly up to the max, without any corresponding
>> >increase in cache hit ratio. If shared_buffers is too small, then memory
>> >usage will climb quickly and linearly to its maximum.
>> >
>> >The last one I've called "turbulence" in an attempt to ascribe some
>> >useful meaning to B1/B2 hits - I've tried a few other measures though
>> >without much success. Turbulence is the hit ratio of B1+B2 lists added
>> >together. By observation, this is zero when ARC gives smooth operation,
>> >and goes above zero otherwise. Typically, turbulence occurs when
>> >shared_buffers is too small for the working set of the database/workload
>> >combination and ARC repeatedly re-balances the lengths of T1/T2 as a
>> >result of "near-misses" on the B1/B2 lists. Turbulence doesn't usually
>> >cut in until the cache is fully utilized, so there is usually some delay
>> >after startup.
>> >
>> >We also recently discussed that I would add some further memory analysis
>> >features for 8.1, so I've been trying to figure out how.
>> >
>> >The idea that B1, B2 represent something really useful doesn't seem to
>> >have been borne out - though I'm open to persuasion there.
>> >
>> >I originally envisaged a "shadow list" operating in extension of the
>> >main ARC list. This will require some re-coding, since the variables and
>> >macros are all hard-coded to a single set of lists. No complaints, just
>> >it will take a little longer than we all thought (for me, that is...)
>> >
>> >My proposal is to alter the code to allow an array of memory linked
>> >lists. The actual list would be [0] - other additional lists would be
>> >created dynamically as required, i.e. not using IFDEFs, since I want this
>> >to be controlled by a SIGHUP GUC to allow on-site tuning, not just lab
>> >work. This will then allow reporting against the additional lists, so
>> >that cache hit ratios can be seen with various other "prototype"
>> >shared_buffers settings.
>>
>> All the existing lists live in shared memory, so that dynamic approach
>> suffers from the fact that the memory has to be allocated during ipc_init.
>>
>> What do you think about my other theory to make C actually 2x effective
>> cache size and NOT to keep T1 in shared buffers but to assume T1 lives
>> in the OS buffer cache?
>>
>>
>> Jan
>>
> Jan,
>
> From the articles that I have seen on the ARC algorithm, I do not think
> that using the effective cache size to set C would be a win. The design
> of the ARC process is to allow the cache to optimize its use in response
> to the actual workload. It may be the best use of the cache in some cases
> to have the entire cache allocated to T1 and similarly for T2. In fact,
> the ability to alter the behavior as needed is one of the key advantages.

Only the "working set" of the database, that is, the pages that are used
very frequently, is worth holding in shared memory at all. The rest
should be copied in and out of the OS disc buffers.

The problem is that with a too small directory, ARC cannot guesstimate
what might be in the kernel buffers. Nor can it guesstimate what was
recently in the kernel buffers and got pushed out from there. That results
in a B1 list that is way too small, and therefore we don't get B1 hits when
in fact the data was found in memory. B1 hits are what increase the
T1target, and since we are missing them with a too small directory size,
our implementation of ARC is probably using a T2 size larger than the
working set. That is not optimal.

If we replaced the dynamic T1 buffers with a max_backends*2 area of
shared buffers, used a C value representing the effective cache size, and
limited the T1target on the lower bound to effective cache size minus shared
buffers, then we would basically have moved the T1 cache into the OS buffers.

This all only holds water if the OS is allowed to swap out shared
memory. And that was my initial question: how likely is it to find this
to be true these days?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
