Thread: Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

From: Thomas F. O'Connell

Simon,

As a postgres DBA, I find your comments about how not to use
effective_cache_size instructive, but I'm still not sure how I should
arrive at a target value for it.

On most of the machines on which I admin postgres, I generally set
shared_buffers to 10,000 (using what seems to have been the recent
conventional wisdom of the lesser of 10,000 or 10% of RAM). I haven't
really settled on an optimal value for effective_cache_size, and now
I'm again confused as to how I might even benchmark it.
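For scale, here is the arithmetic behind that rule of thumb, sketched as a hypothetical helper (assuming the default 8 kB PostgreSQL block size; the function name is my own, not anything in postgres):

```python
# "The lesser of 10,000 buffers or 10% of RAM", with shared_buffers
# counted in blocks of the default 8 kB BLCKSZ.
BLCKSZ = 8 * 1024

def shared_buffers_rule_of_thumb(ram_bytes):
    ten_percent_in_blocks = int(0.10 * ram_bytes / BLCKSZ)
    return min(10_000, ten_percent_in_blocks)

# On a 2 GB machine, 10% of RAM is ~26,214 blocks, so the 10,000 cap wins
# (10,000 blocks is roughly 80 MB of shared memory):
print(shared_buffers_rule_of_thumb(2 * 1024**3))  # 10000
```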

Here are the documents on which I've based my knowledge:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html#effcache
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html

From Bruce's document, I gather that effective_cache_size treats both
shared buffers and unused RAM as valid sources of cached pages for the
purposes of assessing plans.

As a result, I was intending to inflate the value of
effective_cache_size to closer to the amount of unused RAM on some of
the machines I admin (once I've verified that they all have a unified
buffer cache). Is that correct?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Oct 26, 2004, at 3:49 AM, Simon Riggs wrote:

> On Mon, 2004-10-25 at 16:34, Jan Wieck wrote:
>> The problem is, with a too small directory ARC cannot guesstimate what
>> might be in the kernel buffers. Nor can it guesstimate what recently was
>> in the kernel buffers and got pushed out from there. That results in a
>> way too small B1 list, and therefore we don't get B1 hits when in fact
>> the data was found in memory. B1 hits are what increase the T1target,
>> and since we are missing them with a too small directory size, our
>> implementation of ARC is probably using a T2 size larger than the
>> working set. That is not optimal.
>
> I think I have seen that the T1 list shrinks "too much", but I need
> more tests...with some good test results.
>
> The effectiveness of ARC relies upon the balance between the often
> conflicting requirements of "recency" and "frequency". It seems
> possible, even likely, that pgsql's version of ARC may need some subtle
> changes to rebalance it - if we are unlucky enough to find cases where
> it genuinely is out of balance. Many performance tests are required,
> together with a few ideas on extra parameters to include....hence my
> support of Jan's ideas.
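For readers following the T1/T2/B1/B2 discussion, here is a minimal sketch of the adaptation rule from the original ARC paper (a simplification for illustration, not pgsql's buffer-manager code; the function name is invented):

```python
# ARC keeps a target size p for T1 (recency list) within a cache of
# capacity c. A hit in ghost list B1 means recency was undervalued, so
# p grows; a hit in ghost list B2 shrinks it.
def adapt_t1_target(p, c, b1_len, b2_len, hit_in_b1):
    if hit_in_b1:
        delta = max(b2_len / b1_len, 1) if b1_len else 1
        return min(p + delta, c)   # favor recency: grow T1 target
    delta = max(b1_len / b2_len, 1) if b2_len else 1
    return max(p - delta, 0)       # favor frequency: shrink T1 target

# A too-small directory starves B1 of hits, so p never grows:
# T1 stays small and T2 can exceed the true working set.
```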
>
> That's also why I called the B1+B2 hit ratio "turbulence" because it
> relates to how much oscillation is happening between T1 and T2. In
> physical systems, we expect the oscillations to be damped, but there is
> no guarantee that we have a nearly critically damped oscillator. (Note
> that the absence of turbulence doesn't imply that T1+T2 is optimally
> sized, just that it is balanced.)
>
> [...and although the discussion has wandered away from my original
> patch...would anybody like to commit, or decline the patch?]
>
>> If we would replace the dynamic T1 buffers with a max_backends*2 area
>> of shared buffers, use a C value representing the effective cache
>> size, and limit the T1target on the lower bound to effective cache
>> size - shared buffers, then we basically moved the T1 cache into the
>> OS buffers.
>
> Limiting the minimum size of T1len to be 2*max_backends sounds like an
> easy way to prevent overbalancing of T2, but I would like to follow up
> on ways to have T1 naturally stay larger. I'll do a patch with this idea
> in, for testing. I'll call this "T1 minimum size" so we can discuss it.
>
> Any other patches are welcome...
>
> It could be that B1 is too small and so we could use a larger value of C
> to keep track of more blocks. I think what is being suggested is two
> GUCs: shared_buffers (as is), plus another one, larger, which would
> allow us to track what is in shared_buffers and what is in OS cache.
>
> I have comments on "effective cache size" below....
>
> On Mon, 2004-10-25 at 17:03, Tom Lane wrote:
>> Jan Wieck <JanWieck@Yahoo.com> writes:
>>> This all only holds water if the OS is allowed to swap out shared
>>> memory. And that was my initial question: how likely is it to find
>>> this to be true these days?
>>
>> I think it's more likely than not that the OS will consider shared
>> memory to be potentially swappable.  On some platforms there is a
>> shmctl call you can make to lock your shmem in memory, but (a) we don't
>> use it and (b) it may well require privileges we haven't got anyway.
>
> Are you saying we shouldn't, or we don't yet? I simply assumed that we
> did use that function - surely it must be at least an option? RHEL
> supports this at least....
>
> It may well be that we don't have those privileges, in which case we
> turn off the option. Often, we (or I?) will want to install a dedicated
> server, so we should have all the permissions we need, in which case...
>
>> This has always been one of the arguments against making
>> shared_buffers really large, of course --- if the buffers aren't all
>> heavily used, and the OS decides to swap them to disk, you are worse
>> off than you would have been with a smaller shared_buffers setting.
>
> Not really, just an argument against making them *too* large. Large
> *and* utilised is OK, so we need ways of judging optimal sizing.
>
>> However, I'm still really nervous about the idea of using
>> effective_cache_size to control the ARC algorithm.  That number is
>> usually entirely bogus.  Right now it is only a second-order influence
>> on certain planner estimates, and I am afraid to rely on it any more
>> heavily than that.
>
> ...ah yes, effective_cache_size.
>
> The manual describes effective_cache_size as if it had something to do
> with the OS, and some of this discussion has picked up on that.
>
> effective_cache_size is used in only two places in the code (both in
> the planner), as an estimate for calculating the cost of a)
> nonsequential access and b) index access, mainly as a way of avoiding
> overestimates of access costs for small tables.
>
> There is absolutely no implication in the code that effective_cache_size
> measures anything in the OS; what it gives is an estimate of the number
> of blocks that will be available from *somewhere* in memory (i.e. in
> shared_buffers OR OS cache) for one particular table (the one currently
> being considered by the planner).
>
> Crucially, the "size" referred to is the size of the *estimate*, not the
> size of the OS cache (nor the size of the OS cache + shared_buffers). So
> setting effective_cache_size = total memory available, or setting
> effective_cache_size = total memory - shared_buffers, or any other
> assumption that directly links memory size to that parameter, is wildly
> irrelevant. Talking about "effective_cache_size" as if it were the OS
> cache isn't the right thing to do.
>
> ...It could be that we use a very high % of physical memory as
> shared_buffers - in which case the effective_cache_size would represent
> the contents of shared_buffers.
>
> Note also that the planner assumes that all tables are equally likely to
> be in cache. Increasing effective_cache_size in postgresql.conf seems
> destined to give the wrong answer in planning unless you absolutely
> understand what it does.
>
> I will submit a patch to correct the description in the manual.
>
> Further comments:
> The two estimates appear to use effective_cache_size differently:
> a) assumes that a table of size effective_cache_size will be 50% in
> cache
> b) assumes that effective_cache_size blocks are available, so a table
> of size == effective_cache_size will be 100% available
>
> IMHO the GUC should be renamed "estimated_cached_blocks", with the old
> name deprecated to force people to re-read the manual description of
> what effective_cache_size means and then set accordingly.....all of
> that in 8.0....
>
> --
> Best Regards, Simon Riggs
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

From: Josh Berkus

Thomas,

> As a result, I was intending to inflate the value of
> effective_cache_size to closer to the amount of unused RAM on some of
> the machines I admin (once I've verified that they all have a unified
> buffer cache). Is that correct?

Currently, yes.  Right now, e_c_s is used just to inform the planner and make
index vs. table scan and join order decisions.

The problem which Simon is bringing up is part of a discussion about doing
*more* with the information supplied by e_c_s.    He points out that it's not
really related to the *real* probability of any particular table being
cached.   At least, if I'm reading him right.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

From: Gaetano Mendola

Thomas F. O'Connell wrote:
>
> As a result, I was intending to inflate the value of
> effective_cache_size to closer to the amount of unused RAM on some of
> the machines I admin (once I've verified that they all have a unified
> buffer cache). Is that correct?
>

Effective cache size is IMHO a "bogus" parameter in postgresql.conf,
because:

1) That parameter is not intended to instruct postgres to use that RAM;
    it is only a hint to the engine about how much memory the DBA
    *believes* the OS is caching for postgres
2) This parameter changes only the cost evaluation of plans (and not by
    much)

so don't expect that doubling this parameter will push postgres to use
more RAM.



Regards
Gaetano Mendola





Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

From: Simon Riggs

On Wed, 2004-10-27 at 01:39, Josh Berkus wrote:
> Thomas,
>
> > As a result, I was intending to inflate the value of
> > effective_cache_size to closer to the amount of unused RAM on some of
> > the machines I admin (once I've verified that they all have a unified
> > buffer cache). Is that correct?
>
> Currently, yes.

I now believe the answer to that is "no, that is not fully correct",
following investigation into how to set that parameter correctly.

> Right now, e_c_s is used just to inform the planner and make
> index vs. table scan and join order decisions.

Yes, I agree that is what e_c_s is used for.

...let's go deeper:

effective_cache_size is used to calculate the number of I/Os required to
index scan a table, which varies according to the size of the available
cache (whether this be OS cache or shared_buffers). The reason to do
this is because whether a table is in cache can make a very great
difference to access times; *small* tables tend to be the ones that vary
most significantly. PostgreSQL currently uses the Mackert and Lohman
[1989] equation to assess how much of a table is in cache in a blocked
DBMS with a finite cache.

The Mackert and Lohman equation is accurate, as long as the parameter b
is reasonably accurately set. [I'm discussing only the current behaviour
here, not what it can or should or could be] If it is incorrectly set,
then the equation will give the wrong answer for small tables. The same
answer (i.e. same asymptotic behaviour) is returned for very large
tables, but they are the ones we didn't worry about anyway. Getting the
equation wrong means you will choose sub-optimal plans, potentially
reducing your performance considerably.
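To make the shape of that equation concrete, here is a simplified reconstruction of the Mackert & Lohman estimate roughly as PostgreSQL's planner applies it (paraphrased from memory of the cost model, not copied from costsize.c; T = pages in the table, N = tuples fetched via the index, b = effective_cache_size, all in blocks):

```python
# Estimated number of actual page fetches for an index scan touching
# N tuples of a T-page table, given a cache of b blocks (Mackert &
# Lohman [1989] style approximation).
def index_pages_fetched(N, T, b):
    T = max(float(T), 1.0)
    b = max(float(b), 1.0)
    if T <= b:
        # Table fits in the assumed cache: fetches approach T but
        # never exceed it.
        return min(2.0 * T * N / (2.0 * T + N), T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if N <= lim:
        return 2.0 * T * N / (2.0 * T + N)
    # Cache saturated: further tuples cost close to one real I/O each.
    return b + (N - lim) * (T - b) / T

# A small table (10 pages) under a generous cache costs at most 10
# fetches no matter how many tuples are read:
print(index_pages_fetched(100, 10, 1000))  # 10.0
```

This is where a badly set b bites: for tables comparable in size to b, the estimate swings between "mostly cached" and "mostly real I/O", which is exactly the sub-optimal-plan risk described above.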

As I read it, effective_cache_size is equivalent to the parameter b,
defined as (p.3) "minimum buffer size dedicated to a given scan". M&L
point out (p.3) "We...do not consider interactions of multiple users
sharing the buffer for multiple file accesses".

Either way, M&L aren't talking about "the total size of the cache",
which we would interpret to mean shared_buffers + OS cache, in our
effort to not forget the beneficial effect of the OS cache. They use the
phrase "dedicated to a given scan"....

AFAICS "effective_cache_size" should be set to a value that reflects how
many other users of the cache there might be. If you know for certain
you're the only user, set it according to the existing advice. If you
know you aren't, then set it an appropriate factor lower. Setting that
accurately on a system-wide basis is clearly difficult, and setting it
high will often be inappropriate.
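As an illustration of that advice (all values invented for a hypothetical 2 GB machine; these are not recommendations):

```ini
# postgresql.conf -- hypothetical 2 GB machine, default 8 kB blocks
shared_buffers = 10000          # ~80 MB of shared memory
# Start from the blocks plausibly cached for postgres (shared_buffers
# plus OS cache), then divide down by an estimated number of concurrent
# scans/users sharing that cache, per the reasoning above:
effective_cache_size = 25000    # ~200 MB worth of 8 kB blocks
```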

The manual is not clear as to how to set effective_cache_size. Other
advice misses the effect of the many-scans/many-tables issue and so
gives the wrong answer for many calculations, producing incorrect plans
for 8.0 (and earlier releases also).

This is something that needs to be documented rather than a bug fix.
It's a complex one, so I'll await all of your objections before I write
a new doc patch.

[Anyway, I do hope I've missed something somewhere in all that, though
I've read their paper twice now. Fairly accessible, but requires
interpretation to the PostgreSQL case. Mackert and Lohman [1989] "Index
Scans using a finite LRU buffer: A validated I/O model"]

> The problem which Simon is bringing up is part of a discussion about doing
> *more* with the information supplied by e_c_s.    He points out that it's not
> really related to the *real* probability of any particular table being
> cached.   At least, if I'm reading him right.

Yes, that was how Jan originally meant to discuss it, but not what I meant.

Best regards,

Simon Riggs