Re: cache estimates, cache access cost - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: cache estimates, cache access cost
Date:
Msg-id: BANLkTimZpDpPizxei4A-szs1HYb0Y8x_HA@mail.gmail.com
In response to: Re: cache estimates, cache access cost (Greg Smith <greg@2ndquadrant.com>)
Responses: Re: cache estimates, cache access cost (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List: pgsql-hackers
On Sun, May 15, 2011 at 11:52 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Cédric Villemain wrote:
>>
>> http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache
>
> This rebases easily to make Cedric's changes move to the end; I just pushed
> a version with that change to
> https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone
> wants a cleaner one to browse. I've attached a patch too if that's more
> your thing.

Thank you. I don't much like sucking in other people's git repos - it tends to take a lot longer than just opening a patch file, and if I add the repo as a remote then my git repo ends up bloated. :-(

> The more important question is how to store the data collected and
> then use it for optimizing queries.

Agreed, but unless I'm missing something, this patch does nothing about that. I think the first step needs to be to update all the formulas that are based on random_page_cost and seq_page_cost to properly take cache_page_cost into account - and in some cases it may be a bit debatable what the right mathematics are.

For what it's worth, I don't believe for a minute that an analyze process that may run on a given table only every six months has a chance of producing useful statistics about the likelihood that a table will be cached. The buffer cache can turn over completely in under a minute, and a minute is a lot less than a month. Now, if we measured this information periodically for a long period of time and averaged it, that might be a believable basis for setting an optimizer parameter. But I think we should take the approach recently discussed on performance: allow it to be manually set by the administrator on a per-relation basis, with some reasonable default (maybe based on the size of the relation relative to effective_cache_size) if the administrator doesn't intervene.
I don't want to be excessively negative about the approach of examining the actual behavior of the system and using that to guide system behavior - indeed, I think there are quite a few places where we would do well to incorporate that approach to a greater degree than we do currently. But I think that it's going to take a lot of research, and a lot of work, and a lot of performance testing, to convince ourselves that we've come up with an appropriate feedback mechanism that will actually deliver better performance across a large variety of workloads. It would be much better, IMHO, to *first* get a cached_page_cost parameter added, even if the mechanism by which caching percentages are set is initially quite crude - that will give us a clear-cut benefit that people can begin enjoying immediately.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company