Re: [WIP] cache estimates, cache access cost - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: [WIP] cache estimates, cache access cost
Msg-id BANLkTi=L=cOGEWDzHk90V4iMksSdzGFfFQ@mail.gmail.com
In response to Re: [WIP] cache estimates, cache access cost  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2011/6/19 Robert Haas <robertmhaas@gmail.com>:
> On Sun, Jun 19, 2011 at 9:38 AM, Greg Stark <stark@mit.edu> wrote:
>> On Tue, Jun 14, 2011 at 4:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> 1. ANALYZE happens far too infrequently to believe that any data taken
>>> at ANALYZE time will still be relevant at execution time.
>>> 2. Using data gathered by ANALYZE will make plans less stable, and our
>>> users complain not infrequently about the plan instability we already
>>> have, therefore we should not add more.
>>> 3. Even if the data were accurate and did not cause plan instability, we
>>> have no evidence that using it will improve real-world performance.
>>
>> I feel like this is all baseless FUD. ANALYZE isn't perfect but it's
>> our interface for telling postgres to gather stats and we generally
>> agree that having stats and modelling the system behaviour as
>> accurately as practical is the right direction so we need a specific
>> reason why this stat and this bit of modeling is a bad idea before we
>> dismiss it.
>>
>> I think the kernel of truth in these concerns is simply that
>> everything else ANALYZE looks at mutates only on DML. If you load the
>> same data into two databases and run ANALYZE you'll get (modulo random
>> sampling) the same stats. And if you never modify it and analyze it
>> again a week later you'll get the same stats again. So autovacuum can
>> guess when to run analyze based on the number of DML operations, it
>> can run it without regard to how busy the system is, and it can hold
>> off on running it if the data hasn't changed.
>>
>> In the case of the filesystem buffer cache the cached percentage will
>> vary over time regardless of whether the data changes. Plain select
>> queries will change it, even other activity outside the database will
>> change it. There are a bunch of strategies for mitigating this
>> problem: we might want to look at the cache situation more frequently,
>> discount the results we see more aggressively, and possibly
>> maintain a kind of running average over time.
>>
>> There's another problem which I haven't seen mentioned. Because the
>> access method will affect the cache there's the possibility of
>> feedback loops. e.g. A freshly loaded system prefers sequential scans
>> for a given table because without the cache the seeks of random reads
>> are too expensive... causing it to never load that table into cache...
>> causing that table to never be cached and never switch to an index
>> method. It's possible there are mitigation strategies for this as well
>> such as keeping a running average over time and discounting the
>> estimates with some heuristic values.
>
> *scratches head*
>
> Well, yeah.  I completely agree with you that these are the things we
> need to worry about.  Maybe I did a bad job explaining myself, because
> ISTM you said my concerns were FUD and then went on to restate them in
> different words.
>
> I'm not bent out of shape about using ANALYZE to try to gather the
> information.  That's probably a reasonable approach if it turns out we
> actually need to do it at all.  I am not sure we do.  What I've argued
> for in the past is that we start by estimating the percentage of the
> relation that will be cached based on its size relative to
> effective_cache_size, and allow the administrator to override the
> percentage on a per-relation basis if it turns out to be wrong.  That
> would avoid all of these concerns and allow us to focus on the issue
> of how the caching percentages impact the choice of plan, and whether
> the plans that pop out are in fact better when you provide information
> on caching as input.  If we have that facility in core, then people
> can write scripts or plug-in modules to do ALTER TABLE .. SET
> (caching_percentage = XYZ) every hour or so based on the sorts of
> statistics that Cedric is gathering here, and users will be able to
> experiment with a variety of algorithms and determine which ones work
> the best.
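
(As a rough illustration of the approach Robert describes above -- the
function and parameter names here are mine, not taken from any patch -- a
size-based guess with a per-relation override could look like this:)

/*
 * Hypothetical sketch, not from any patch: guess the cached fraction of a
 * relation from its size relative to effective_cache_size, and let a
 * per-relation setting override that guess.
 */
static double
guessed_cache_fraction(double rel_pages,
                       double effective_cache_pages,
                       double override_fraction)   /* < 0 means "not set" */
{
    /* an explicit ALTER TABLE-style override always wins */
    if (override_fraction >= 0.0)
        return override_fraction;

    /* a relation that fits in effective_cache_size is assumed fully cached */
    if (rel_pages <= effective_cache_pages)
        return 1.0;

    /* otherwise assume only a cache-sized slice of it stays resident */
    return effective_cache_pages / rel_pages;
}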

Robert, I am very surprised.
My patch does offer that.

1st, I used ANALYZE because it is the way I found to update pg_class.
You are suggesting ALTER TABLE instead; that is fine, but give me that
lock-free :) else we have the, ahem, Alvaro's pg_class_ng (I find this
one interesting because it will be a lot easier to have different values
on a standby server if we find a way to make pg_class_ng 'updatable' per
server).
So, as long as the value can be changed without problems, I don't care
where it resides.

2nd, I provided the patches in the last CF precisely so we could move on
to the exciting part: the cost-estimate changes. (After all, we can work
on the cost estimates now, and if we later find a way to use ALTER TABLE
or pg_class_ng, we can just switch to that instead of the ANALYZE magic.)
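
(To illustrate what I mean by the cost-estimate part -- this is only a
sketch with made-up names, not the patch itself -- the planner could blend
a cached and an uncached page cost using the per-relation fraction:)

/*
 * Sketch only, names are made up: charge a blended page cost instead of
 * random_page_cost for every page, using the fraction of the relation
 * believed to be in cache (e.g. rel_oscache, between 0 and 1).
 */
static double
blended_page_cost(double pages_fetched,
                  double cached_fraction,
                  double random_page_cost,    /* cost of an uncached read */
                  double cached_page_cost)    /* cost of a cache hit */
{
    return pages_fetched *
        (cached_fraction * cached_page_cost +
         (1.0 - cached_fraction) * random_page_cost);
}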

3rd, you can already write a plugin to set the value of rel_oscache
(exactly like the one you would write for an ALTER TABLE SET reloscache ...):

float4
RelationGetRelationOSCacheInFork(Relation relation, ForkNumber forkNum)
{
    float4  percent = 0;

    /* if a plugin is present, let it manage things */
    if (OSCache_hook)
        percent = (*OSCache_hook) (relation, forkNum);

    return percent;
}
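
For example, here is a standalone sketch of what such a plugin could
compute (this is not pgfincore and not the patch, just an illustration
using POSIX mincore(2) on one relation segment file):

/*
 * Standalone illustration, not the patch: estimate the fraction of a
 * file's pages resident in the OS page cache with mincore(2).  An
 * OSCache_hook implementation could do the same for each segment file
 * of the relation fork it is asked about.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static double
oscache_fraction(const char *path)
{
    int     fd = open(path, O_RDONLY);
    struct stat st;
    long    pagesize = sysconf(_SC_PAGESIZE);
    size_t  npages, resident = 0;
    void   *map;
    unsigned char *vec;

    if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
    {
        if (fd >= 0)
            close(fd);
        return 0.0;
    }

    npages = (st.st_size + pagesize - 1) / pagesize;
    map = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0);
    vec = malloc(npages);

    if (map != MAP_FAILED && vec != NULL &&
        mincore(map, st.st_size, vec) == 0)
    {
        size_t  i;

        /* the low bit of each vector entry says "page is resident" */
        for (i = 0; i < npages; i++)
            if (vec[i] & 1)
                resident++;
    }

    if (map != MAP_FAILED)
        munmap(map, st.st_size);
    free(vec);
    close(fd);

    return (double) resident / (double) npages;
}

int
main(int argc, char **argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <segment file>\n", argv[0]);
        return 1;
    }
    printf("%.1f%% of %s is in the OS cache\n",
           100.0 * oscache_fraction(argv[1]), argv[1]);
    return 0;
}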

It looks like the main fear comes from my use of the word ANALYZE...

PS: ANALYZE OSCACHE does *not* run as part of ANALYZE; those are distinct
operations. (ANALYZE won't do the job of ANALYZE OSCACHE. We can discuss
the grammar; maybe an ANALYZE ([OSCACHE], [DATA], ...) syntax would be
better.)
--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

