Re: How does PG know if data is in memory? - Mailing list pgsql-performance

From Craig Ringer
Subject Re: How does PG know if data is in memory?
Date
Msg-id 4CA5D2F5.5070900@postnewspapers.com.au
Whole thread Raw
In response to Re: How does PG know if data is in memory?  (Fabrício dos Anjos Silva<fabricio.silva@linkcom.com.br>)
Responses Re: How does PG know if data is in memory?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: How does PG know if data is in memory?  ("Pierre C" <lists@peufeu.com>)
Re: How does PG know if data is in memory?  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
On 1/10/2010 7:12 PM, Fabrício dos Anjos Silva wrote:
>
>     Thank you all for the replies.
>
>     If PG does not know whether needed data is in memory, how does it
> estimate cost? There is a huge difference between access time in memory
> and in secondary storage. Not taking this into account results in almost
> "useless" estimates.

It's generally configured with the conservative assumption that data
will have to come from disk.

Note that the query planner's job isn't to figure out how long the query
will take. It's to compare various possible query plans and decide which
will be fastest. There are certainly cases where knowing what's cached
would help with this - for example: if an index is cached but the table
data isn't, it's more likely to be worth using the index to reduce disk
reads. But I don't know just how much difference it really makes.

Because the query often only wants a small subset of the data, and whole
relations are rarely fully cached, it's not enough to know that "some of
relation X is cached", it has to know if the cached parts are the parts
that'll be required, or at least an approximation of that. It sounds
horrendously complicated to keep track of to me, and in the end it won't
make query execution any faster, it'll just potentially help the planner
pick a better plan. I wonder if that'd be worth the extra CPU time spent
managing the cache and cache content stats, and using those cache stats
when planning? It'd be an interesting experiment, but the outcome is
hardly obvious.

As you can see, I don't really agree that the planner's estimates are
useless just because it's not very aware of the cache's current
contents. It has a pretty good idea of the system's memory and how much
of that can be used for cache, and knows how big various indexes and
relations are. That seems to work pretty well.

If some kind of cache awareness was to be added, I'd be interested in
seeing a "hotness" measure that tracked how heavily a given
relation/index has been accessed and how much has been read from it
recently. A sort of age-scaled blocks-per-second measure that includes
both cached and uncached (disk) reads. This would let the planner know
how likely parts of a given index/relation are to be cached in memory
without imposing the cost of tracking the cache in detail. I'm still not
sure it'd be all that useful, though...

 > I am not saying that PG does a pour job, but I've
> been using it for 4 years and from time to time I notice very pour
> estimates.

Most of the issues reported here, at least, are statistics issues,
rather than lack of knowledge about cache status. The planner thinks
it'll find (say) 2 tuples maching a filter, and instead finds 100,000,
so it chooses a much less efficient join type. That sort of thing is
really independent of the cache state.

>     Recently, I faced poor performance again, but this time because we
> started to work with larger tables (10M rows). This encourage me to
> study PG tuning again, trying to understand how the planner works and
> trying to get the best of it. Unfortunately, it does not seem to be an
> easy task.

No argument there! Like any database there's a fair bit of black magic
involved, and a whole lot of benchmarking. The key thing is to have
appropriate statistics (usually high), get a reasonable random_page_cost
and seq_page_cost, to set your effective cache size appropriately, and
to set reasonable work_mem.

"Reasonable" is hard to work out for work_mem, because Pg's work_mem
limit is per-sort (etc) not per-query or per-backend. I understand that
making it per-query is way, way harder than it sounds at face value,
though, so we must make do.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

pgsql-performance by date:

Previous
From: Fabrício dos Anjos Silva
Date:
Subject: Re: How does PG know if data is in memory?
Next
From: "Kevin Grittner"
Date:
Subject: Re: How does PG know if data is in memory?