Re: cache estimates, cache access cost - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: cache estimates, cache access cost
Date:
Msg-id: BANLkTimZpDpPizxei4A-szs1HYb0Y8x_HA@mail.gmail.com
In response to: Re: cache estimates, cache access cost (Greg Smith <greg@2ndquadrant.com>)
Responses: Re: cache estimates, cache access cost (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List: pgsql-hackers
On Sun, May 15, 2011 at 11:52 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Cédric Villemain wrote:
>>
>> http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache
>
> This rebases easily to make Cedric's changes move to the end; I just pushed
> a version with that change to
> https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone
> wants a cleaner one to browse.  I've attached a patch too if that's more
> your thing.

Thank you.  I don't much like sucking in other people's git repos - it
tends to take a lot longer than just opening a patch file, and if I
add the repo as a remote then my git repo ends up bloated.  :-(

> The more important question is how to store the data collected and
> then use it for optimizing queries.

Agreed, but unless I'm missing something, this patch does nothing
about that.  I think the first step needs to be to update all the
formulas that are based on random_page_cost and seq_page_cost to
properly take cache_page_cost into account - and in some cases it may
be a bit debatable what the right mathematics are.
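
To make that concrete, here is a minimal sketch of the kind of
blending I have in mind; none of these names exist in costsize.c
today, and the real math may well need to be more subtle:

    /*
     * Hypothetical sketch only: blend the per-page fetch cost using an
     * estimated fraction of the relation already in cache.  The names
     * blended_page_cost, cached_frac, and cache_page_cost are
     * illustrative, not existing PostgreSQL symbols.
     */
    static double
    blended_page_cost(double cached_frac, double cache_page_cost,
                      double uncached_page_cost)
    {
        return cached_frac * cache_page_cost +
            (1.0 - cached_frac) * uncached_page_cost;
    }

The debatable part is what to pass as uncached_page_cost in each
formula - random_page_cost, seq_page_cost, or some interpolation
between the two.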

For what it's worth, I don't believe for a minute that an analyze
process that may only run on a given table every six months has a
chance of producing useful statistics about the likelihood that a
table will be cached.  The buffer cache can turn over completely in
under a minute, and a minute is a whole lot less than six
months.  Now, if we
measured this information periodically for a long period of time and
averaged it, that might be a believable basis for setting an optimizer
parameter.  But I think we should take the approach recently discussed
on pgsql-performance: allow it to be manually set by the administrator on a
per-relation basis, with some reasonable default (maybe based on the
size of the relation relative to effective_cache_size) if the
administrator doesn't intervene.  I don't want to be excessively
negative about the approach of examining the actual behavior of the
system and using that to guide system behavior - indeed, I think there
are quite a few places where we would do well to incorporate that
approach to a greater degree than we do currently.  But I think that
it's going to take a lot of research, and a lot of work, and a lot of
performance testing, to convince ourselves that we've come up with an
appropriate feedback mechanism that will actually deliver better
performance across a large variety of workloads.  It would be much
better, IMHO, to *first* get a cache_page_cost parameter added, even
if the mechanism by which caching percentages are set is initially
quite crude - that will give us a clear-cut benefit that people can
begin enjoying immediately.
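
As a strawman for that default - again just a sketch with made-up
names, not a worked-out proposal - the assumed cached fraction could
fall off as the relation outgrows effective_cache_size:

    /*
     * Hypothetical sketch only: derive a default cached fraction for a
     * relation from its size relative to effective_cache_size, clamped
     * to [0, 1].  Both arguments are in pages.
     */
    static double
    default_cached_fraction(double rel_pages, double effective_cache_pages)
    {
        double      frac;

        if (rel_pages <= 0.0)
            return 1.0;         /* empty or tiny relation: assume cached */
        frac = effective_cache_pages / rel_pages;
        return (frac > 1.0) ? 1.0 : frac;
    }

A per-relation override set by the administrator - presumably a
reloption - would then simply take precedence over this computed
default.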

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

