Re: [HACKERS] ARC Memory Usage analysis - Mailing list pgsql-patches
From: Simon Riggs
Subject: Re: [HACKERS] ARC Memory Usage analysis
Msg-id: 1098780555.6807.136.camel@localhost.localdomain
In response to: Re: [HACKERS] ARC Memory Usage analysis (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-patches
On Mon, 2004-10-25 at 16:34, Jan Wieck wrote:
> The problem is, with a too small directory ARC cannot guesstimate what
> might be in the kernel buffers. Nor can it guesstimate what recently was
> in the kernel buffers and got pushed out from there. That results in a
> way too small B1 list, and therefore we don't get B1 hits when in fact
> the data was found in memory. B1 hits is what increases the T1target,
> and since we are missing them with a too small directory size, our
> implementation of ARC is probably using a T2 size larger than the
> working set. That is not optimal.

I think I have seen that the T1 list shrinks "too much", but I need more tests... with some good test results.

The effectiveness of ARC relies upon the balance between the often conflicting requirements of "recency" and "frequency". It seems possible, even likely, that pgsql's version of ARC may need some subtle changes to rebalance it - if we are unlucky enough to find cases where it genuinely is out of balance. Many performance tests are required, together with a few ideas on extra parameters to include... hence my support of Jan's ideas.

That's also why I called the B1+B2 hit ratio "turbulence": it relates to how much oscillation is happening between T1 and T2. In physical systems we expect the oscillations to be damped, but there is no guarantee that we have a nearly critically damped oscillator. (Note that the absence of turbulence doesn't imply that T1+T2 is optimally sized, just that it is balanced.)

[...and although the discussion has wandered away from my original patch... would anybody like to commit, or decline the patch?]

> If we would replace the dynamic T1 buffers with a max_backends*2 area of
> shared buffers, use a C value representing the effective cache size and
> limit the T1target on the lower bound to effective cache size - shared
> buffers, then we basically moved the T1 cache into the OS buffers.

Limiting the minimum size of T1len to be 2*max_backends sounds like an easy way to prevent overbalancing of T2, but I would like to follow up on ways to have T1 naturally stay larger. I'll do a patch with this idea in, for testing. I'll call this "T1 minimum size" so we can discuss it. Any other patches are welcome...

It could be that B1 is too small, and so we could use a larger value of C to keep track of more blocks. I think what is being suggested is two GUCs: shared_buffers (as is), plus another one, larger, which would allow us to track what is in shared_buffers and what is in the OS cache. I have comments on "effective cache size" below...

On Mon, 2004-10-25 at 17:03, Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > This all only holds water, if the OS is allowed to swap out shared
> > memory. And that was my initial question, how likely is it to find this
> > to be true these days?
>
> I think it's more likely than not that the OS will consider shared
> memory to be potentially swappable. On some platforms there is a shmctl
> call you can make to lock your shmem in memory, but (a) we don't use it
> and (b) it may well require privileges we haven't got anyway.

Are you saying we shouldn't, or that we don't yet? I simply assumed that we did use that function - surely it must be at least an option? RHEL supports this, at least... It may well be that we don't have those privileges, in which case we turn off the option. Often, we (or I?) will want to install a dedicated server, so we should have all the permissions we need, in which case... (a minimal sketch of the call follows)
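For the archives, here is a minimal sketch of the shmctl locking Tom refers to. This is my own illustration, not PostgreSQL source: it assumes a System V segment id already obtained from shmget(), and SHM_LOCK is a SysV/Linux extension that typically needs root or CAP_IPC_LOCK - exactly the privilege concern raised above.

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Try to pin a System V shared memory segment in RAM so the OS
 * cannot swap it out. Returns 0 on success, -1 if not permitted. */
static int
lock_shmem_if_possible(int shmid)
{
    if (shmctl(shmid, SHM_LOCK, NULL) != 0)
    {
        /* Most likely EPERM: insufficient privilege. Fall back
         * gracefully, i.e. "turn off the option" as suggested above. */
        perror("shmctl(SHM_LOCK)");
        return -1;
    }
    return 0;   /* segment pages are now locked and non-swappable */
}

On a dedicated server started with sufficient privilege this would make shared_buffers genuinely non-swappable; otherwise the call fails cleanly and behaviour is unchanged.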
> This has always been one of the arguments against making shared_buffers
> really large, of course --- if the buffers aren't all heavily used, and
> the OS decides to swap them to disk, you are worse off than you would
> have been with a smaller shared_buffers setting.

Not really - just an argument against making them *too* large. Large *and* utilised is OK, so we need ways of judging optimal sizing.

> However, I'm still really nervous about the idea of using
> effective_cache_size to control the ARC algorithm. That number is
> usually entirely bogus. Right now it is only a second-order influence
> on certain planner estimates, and I am afraid to rely on it any more
> heavily than that.

...ah yes, effective_cache_size. The manual describes effective_cache_size as if it had something to do with the OS, and some of this discussion has picked up on that.

effective_cache_size is used in only two places in the code (both in the planner), as an estimate for calculating the cost of a) nonsequential access and b) index access, mainly as a way of avoiding overestimates of access costs for small tables.

There is absolutely no implication in the code that effective_cache_size measures anything in the OS; what it gives is an estimate of the number of blocks that will be available from *somewhere* in memory (i.e. in shared_buffers OR the OS cache) for one particular table (the one currently being considered by the planner).

Crucially, the "size" referred to is the size of the *estimate*, not the size of the OS cache (nor the size of the OS cache + shared_buffers). So setting effective_cache_size = total memory available, or setting effective_cache_size = total memory - shared_buffers, are both wildly irrelevant things to do, as is any assumption that directly links memory size to that parameter. So talking about effective_cache_size as if it were the OS cache isn't the right thing to do.

...It could be that we use a very high % of physical memory as shared_buffers - in which case effective_cache_size would represent the contents of shared_buffers.

Note also that the planner assumes that all tables are equally likely to be in cache. Increasing effective_cache_size in postgresql.conf seems destined to give the wrong answer in planning unless you absolutely understand what it does.

I will submit a patch to correct the description in the manual.

Further comments: the two estimates appear to use effective_cache_size differently (see the sketch below):
a) assumes that a table of size effective_cache_size will be 50% in cache
b) assumes that effective_cache_size blocks are available, so a table of size == effective_cache_size will be 100% available

IMHO the GUC should be renamed "estimated_cached_blocks", with the old name deprecated to force people to re-read the manual description of what effective_cache_size means and then set it accordingly... all of that in 8.0...
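For anyone wanting to check case b) above: a back-of-envelope sketch of the Mackert-Lohman page-fetch estimate that the index-access costing drives with effective_cache_size. This is my own illustration with my own variable names, not the PostgreSQL source - the real code lives in src/backend/optimizer/path/costsize.c. T = pages in the table, N = tuples in the table, s = scan selectivity, b = effective_cache_size in blocks.

#include <stdio.h>

static double
pages_fetched_estimate(double T, double N, double s, double b)
{
    double Ns = N * s;          /* expected number of tuples fetched */

    if (T <= b)
    {
        /* Whole table assumed cacheable: repeat visits to a page are
         * free, so fetches cannot exceed T. */
        double pf = (2.0 * T * Ns) / (2.0 * T + Ns);
        return (pf < T) ? pf : T;
    }
    else if (Ns <= 2.0 * T * b / (2.0 * T - b))
    {
        /* Table bigger than cache, but the scan is small enough that
         * we stay in the cached regime. */
        return (2.0 * T * Ns) / (2.0 * T + Ns);
    }
    else
    {
        /* Cache exhausted: each further tuple costs a fresh physical
         * read with probability (T - b)/T. */
        return b + (Ns - 2.0 * T * b / (2.0 * T - b)) * (T - b) / T;
    }
}

int
main(void)
{
    /* 10,000-page table, 1M rows, fetching 1% of them: compare a small
     * cache estimate against one covering the whole table. */
    printf("b=1000:  %.0f pages\n", pages_fetched_estimate(10000, 1e6, 0.01, 1000));
    printf("b=20000: %.0f pages\n", pages_fetched_estimate(10000, 1e6, 0.01, 20000));
    return 0;
}

Running it makes the point: with b covering the whole table the estimate saturates well below the raw fetch count (~6667 pages here), while a small b pushes the cost toward one physical read per page (~9053 pages) - which is why an inflated effective_cache_size skews index costs downward.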
--
Best Regards, Simon Riggs