Thread: How does PG know if data is in memory?
Hi,
After reading lots of documentation, I still don't fully understand how PG knows whether some needed data is in memory or in secondary storage.
While choosing the best query plan, the optimizer must take this into account. Does PG consider this? If so, how does it know?
I presume it checks its shared buffers. But if the data is not in the shared buffers, it could still be in the OS cache, and the plan would not be optimal.
Can PG check the OS cache? If not, considering a dedicated DB server, is it advisable to raise the shared buffer size up to a maximum that does not affect the rest of the system? That way, the shared buffer check would have a higher probability of returning a correct answer.
When setting seq_page_cost and random_page_cost, do I have to consider the probability that data will be in memory? Or does seq_page_cost mean "sequential access on disk" and random_page_cost mean "random access on disk"?
I would appreciate it if someone could clear this up.
Thanks!
Fabrício dos Anjos Silva
LinkCom Soluções em T.I.
Fabrício dos Anjos Silva <fabricio.silva@linkcom.com.br> wrote:

> After reading lots of documentation, I still don't understand
> fully how PG knows if some needed data is in memory or in
> secondary storage.
>
> Does PG consider this?

No.

> When setting seq_page_cost and random_page_cost, do I have to
> consider the probability that data will be in memory?

Yes.

-Kevin
2010/9/29 Fabrício dos Anjos Silva <fabricio.silva@linkcom.com.br>
> When setting seq_page_cost and random_page_cost, do I have to
> consider the probability that data will be in memory? Or does
> seq_page_cost mean "sequential access on disk" and random_page_cost
> mean "random access on disk"?
The reason seq_page_cost and random_page_cost exist as parameters is so that you can inform the optimizer what the relative costs of those actions are, which is directly related to the expected size of the filesystem cache, ratio of total db size to available cache memory, and the performance of your disk i/o subsystems (and any other disk-related work the host may be doing). effective_cache_size allows you to tell postgres how big you believe all available cache memory is - shared_buffers and OS cache.
As to your question about increasing shared_buffers to be some significant proportion of available RAM - apparently, that is not a good idea. I've seen advice that said you shouldn't go above 8GB for shared_buffers and I've also seen 12GB suggested as an upper limit, too. On my host with 48GB of RAM, I didn't see much difference between 8GB and 12GB on a fairly wide variety of tests, so mine is set at 8GB with an effective_cache_size of 36GB.
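For anyone following along, an illustrative way to inspect and experiment with the settings mentioned above; the specific numbers are examples, not recommendations, and "mydb" is a hypothetical database name:

    -- Current planner cost settings and cache-size hint.
    SHOW seq_page_cost;          -- default 1.0
    SHOW random_page_cost;       -- default 4.0
    SHOW effective_cache_size;   -- should roughly equal shared_buffers + expected OS cache

    -- Illustrative experiment: if most of the working set is expected to be
    -- cached, the random/sequential gap is often narrowed.
    SET random_page_cost = 2.0;                        -- this session only
    ALTER DATABASE mydb SET random_page_cost = 2.0;    -- persist for database "mydb"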
Thank you all for the replies.
If PG does not know whether needed data is in memory, how does it estimate cost? There is a huge difference between access time in memory and in secondary storage. Not taking this into account results in almost "useless" estimates. I am not saying that PG does a poor job, but I've been using it for 4 years and from time to time I notice very poor estimates. After some testing 2 years ago, the only configuration I could manage to use was to tell PG to avoid Seq Scans and Index Scans. I know that in many situations these techniques are the best to choose, but when they are chosen where they are not suitable, I get very bad plans.
Recently, I faced poor performance again, but this time because we started to work with larger tables (10M rows). This encouraged me to study PG tuning again, trying to understand how the planner works and trying to get the best out of it. Unfortunately, it does not seem to be an easy task.
If someone could point me to good books about PG tuning, I would appreciate it. I found some yet-to-be-released books about PG 9. Any comments on them?
Thank you all.
Fabrício dos Anjos Silva
LinkCom Soluções em T.I.
On 1/10/2010 7:12 PM, Fabrício dos Anjos Silva wrote:
> Thank you all for the replies.
>
> If PG does not know whether needed data is in memory, how does it
> estimate cost? There is a huge difference between access time in
> memory and in secondary storage. Not taking this into account
> results in almost "useless" estimates.

It's generally configured with the conservative assumption that data will have to come from disk.

Note that the query planner's job isn't to figure out how long the query will take. It's to compare various possible query plans and decide which will be fastest.

There are certainly cases where knowing what's cached would help with this - for example: if an index is cached but the table data isn't, it's more likely to be worth using the index to reduce disk reads. But I don't know just how much difference it really makes.

Because the query often only wants a small subset of the data, and whole relations are rarely fully cached, it's not enough to know that "some of relation X is cached", it has to know if the cached parts are the parts that'll be required, or at least an approximation of that. It sounds horrendously complicated to keep track of to me, and in the end it won't make query execution any faster, it'll just potentially help the planner pick a better plan. I wonder if that'd be worth the extra CPU time spent managing the cache and cache content stats, and using those cache stats when planning? It'd be an interesting experiment, but the outcome is hardly obvious.

As you can see, I don't really agree that the planner's estimates are useless just because it's not very aware of the cache's current contents. It has a pretty good idea of the system's memory and how much of that can be used for cache, and knows how big various indexes and relations are. That seems to work pretty well.

If some kind of cache awareness was to be added, I'd be interested in seeing a "hotness" measure that tracked how heavily a given relation/index has been accessed and how much has been read from it recently. A sort of age-scaled blocks-per-second measure that includes both cached and uncached (disk) reads. This would let the planner know how likely parts of a given index/relation are to be cached in memory without imposing the cost of tracking the cache in detail. I'm still not sure it'd be all that useful, though...

> I am not saying that PG does a poor job, but I've been using it for
> 4 years and from time to time I notice very poor estimates.

Most of the issues reported here, at least, are statistics issues, rather than lack of knowledge about cache status. The planner thinks it'll find (say) 2 tuples matching a filter, and instead finds 100,000, so it chooses a much less efficient join type. That sort of thing is really independent of the cache state.

> Recently, I faced poor performance again, but this time because we
> started to work with larger tables (10M rows). This encouraged me to
> study PG tuning again, trying to understand how the planner works
> and trying to get the best of it. Unfortunately, it does not seem to
> be an easy task.

No argument there! Like any database there's a fair bit of black magic involved, and a whole lot of benchmarking. The key thing is to have appropriate statistics (usually high), get a reasonable random_page_cost and seq_page_cost, to set your effective_cache_size appropriately, and to set reasonable work_mem.

"Reasonable" is hard to work out for work_mem, because Pg's work_mem limit is per-sort (etc), not per-query or per-backend. I understand that making it per-query is way, way harder than it sounds at face value, though, so we must make do.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
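As a side note for readers hitting the statistics problems Craig describes, comparing the planner's row estimate against reality is the usual first step; the table and column below are hypothetical, so adapt them to your own schema:

    -- Look at "rows=" (estimate) vs. "actual ... rows=" in the output; a large
    -- mismatch usually points at statistics, not cache awareness.
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE status = 'pending';

    -- If the estimate is badly off, raise the statistics target for that
    -- column, re-analyze, and compare the plans again:
    ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
    ANALYZE orders;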
Craig Ringer <craig@postnewspapers.com.au> wrote:

> It sounds horrendously complicated to keep track of to me, and in
> the end it won't make query execution any faster, it'll just
> potentially help the planner pick a better plan. I wonder if that'd
> be worth the extra CPU time spent managing the cache and cache
> content stats, and using those cache stats when planning? It'd be
> an interesting experiment, but the outcome is hardly obvious.

I agree with that, but I think there's an even more insidious issue here. Biasing plans heavily toward using what is already in cache could have a destabilizing effect on performance. Let's say that some query or maintenance skews the cache toward some plan which is much slower when cached than another plan would be if cached. Let's also postulate that this query runs very frequently. It will always settle for what's fastest *this* time, not what would make for fastest performance if consistently used. If it never chooses the plan which would run better if cached, the data used for that plan may never make it into cache, and you will limp along with the inferior plan forever.

If you set the overall level of caching you expect, the optimizer will tend to wind up with data cached to support the optimal plans for that level of caching for the frequently run queries.

-Kevin
> It sounds horrendously complicated to keep track of to me, and in
> the end it won't make query execution any faster, it'll just
> potentially help the planner pick a better plan. I wonder if that'd
> be worth the extra CPU time spent managing the cache and cache
> content stats, and using those cache stats when planning? It'd be
> an interesting experiment, but the outcome is hardly obvious.

Well, suppose you pick an index scan, the only way to know which index (and heap) pages you'll need is to actually do the index scan... which isn't really something you'd do when planning. So you scan,
Craig,
I agree with you. Not completely, but I do.
I'm just stuck in a situation where I can't figure out what values to use for the parameters. I can't even think of a way to test and discover them.
I followed Josh Berkus' GUC spreadsheet and some tutorials on the PG wiki, but how do I test whether my configuration is good or bad? I see in the PG log that some queries have bad plans, but what should I do in order to tell PG to make better decisions? I tried different values with no success.
I understand that parameters have no "work everywhere" values. Each database has its characteristics and each server has its HW specifications.
Is there any automated test tool? I can compile a list of real-world queries, and provide an exact copy of my db server just for testing. But how do I do it? Write a bunch of scripts? Is there any serious tool that tries different parameters, runs a load test, processes the results and generates reports?
Again, thanks all of you for the replies.
Cheers,
Fabrício dos Anjos Silva
LinkCom Soluções em T.I.
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

> I agree with that, but I think there's an even more insidious issue
> here. Biasing plans heavily toward using what is already in cache
> could have a destabilizing effect on performance.

Not to mention the destabilizing effect on the plans themselves. Behavior like that would make EXPLAIN nearly useless, because the plan you get would vary from moment to moment even when "nothing is changing". It's fairly clear that people don't actually want that.

			regards, tom lane
2010/10/1 Fabrício dos Anjos Silva <fabricio.silva@linkcom.com.br>
> Craig,
>
> I agree with you. Not completely, but I do.
>
> I'm just stuck in a situation where I can't figure out what values
> to use for the parameters. I can't even think of a way to test and
> discover them.
> I followed Josh Berkus' GUC spreadsheet and some tutorials on the PG
> wiki, but how do I test whether my configuration is good or bad? I
> see in the PG log that some queries have bad plans, but what should
> I do in order to tell PG to make better decisions? I tried different
> values with no success.

You can set different values for most configuration params on individual db connections. You can test different values for individual slow-running queries. Rather than disabling whole features in the entire database - which may well make lots of other queries run less quickly - you can, at the very least, just disable those features before running the queries that are known to be slow and for which you could not find global values which worked well. Disable sequence plans just before running query x, or boost work_mem to a very high value just for query y.

It is also possible that you've simply outstripped your hardware's capability. We had a database with a number of tables containing tens of millions of rows and queries which frequently required aggregating over whole tables. Moving from 8GB of RAM to 48GB of RAM (so that a large chunk of the db fits in memory) and from 6 spindles to 12, and then just modifying the global config to suit the new hardware, gave us a huge performance boost that we could never have gotten on the old hardware, no matter how much tuning of individual queries we did. I was actually able to drop all of the custom config tweaks that we had on individual queries, though I'm sure I'll eventually wind up adding some back - queries that aggregate over large tables really benefit from a lot of work_mem - more than I want to configure globally.
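A minimal sketch of the per-query approach described above (the query and the values are hypothetical, not recommendations); SET LOCAL changes revert automatically at the end of the transaction, so the rest of the workload is unaffected:

    BEGIN;
    SET LOCAL work_mem = '512MB';        -- plenty of sort/hash memory for this query only
    SET LOCAL enable_seqscan = off;      -- discourage (not forbid) sequential scans
    EXPLAIN ANALYZE
    SELECT customer_id, sum(amount)
    FROM   orders
    GROUP  BY customer_id;
    COMMIT;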
On 3/10/2010 7:39 AM, Richard Troy wrote:
> I can't speak for modern "OpenVMS", but "back in the day", VMS had a
> very effective memory management strategy which, in effect, made it
> as if all memory was a cache for disk. It did this by means of a
> mechanism by which to identify all potentially reachable disk space.
> When disk was read in, an entry would be made mapping the memory to
> the disk space from which it came - and if it was later updated, the
> mapping entry was marked "dirty." Whenever disk access was
> contemplated, a check was made to see if it was already in memory
> and if so, it'd provide access to the in-memory copy instead of
> doing the read again. (This also permitted, under some
> circumstances, to reduce write activity as well.)

That's how Linux's memory management works, too, at least if I understand you correctly. Pretty much every modern OS does it. Pg is reliant on the operating system's disk cache, and has some minimal knowledge of it (see effective_cache_size). I don't know how shared_buffers management works, but certainly at the OS cache level that's what already happens.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
Samuel Gendler wrote:
> As to your question about increasing shared_buffers to be some
> significant proportion of available RAM - apparently, that is not a
> good idea. I've seen advice that said you shouldn't go above 8GB for
> shared_buffers and I've also seen 12GB suggested as an upper limit,
> too.

The publicly discussed tests done at Sun suggested 10GB was the effective upper limit on Solaris before performance started dropping instead of increasing on some of their internal benchmarks. And I've heard privately from two people who have done similar experiments on Linux and found closer to 8GB to be the point where performance started to drop. I'm hoping to get some hardware capable of providing some more public results in this area, and some improvements if we can get better data about what causes this drop in efficiency.

Given that some write-heavy workloads start to suffer considerable checkpoint issues when shared_buffers is set to a really high value, there's at least two reasons to be conservative here. The big win is going from the tiny default to hundreds of megabytes. Performance keeps going up for many people into the low gigabytes range, but the odds of hitting a downside increase too. Since PostgreSQL uses the OS cache, too, I see some systems with a whole lot of RAM where the 512MB - 1GB range still ends up being optimal, just in terms of balancing the improvements you get from things being in the cache vs. the downsides of heavy checkpoint writes.

--
Greg Smith, 2ndQuadrant US    greg@2ndQuadrant.com    Baltimore, MD
PostgreSQL Training, Services and Support    www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"
Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book
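For anyone wanting to check where a given server stands, a small illustrative snippet; the figures in the comments are simply the ones discussed in this thread, not universal recommendations:

    -- Current values (shared_buffers can only be changed in postgresql.conf
    -- plus a server restart; effective_cache_size is merely a planner hint).
    SHOW shared_buffers;
    SHOW effective_cache_size;

    -- Example postgresql.conf lines for a dedicated box with ~48GB of RAM,
    -- mirroring the numbers mentioned above:
    --   shared_buffers       = 8GB
    --   effective_cache_size = 36GB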
Fabrício dos Anjos Silva wrote:
> After reading lots of documentation, I still don't understand fully
> how PG knows if some needed data is in memory or in secondary
> storage. While choosing the best query plan, the optimizer must take
> this into account. Does PG consider this? If so, how does it know?

There are actually two different questions here, and I'm not sure both have been completely clarified for you by the discussion yet.

PostgreSQL has its own dedicated pool of memory, sized by shared_buffers. When you request a page, if it's already in memory, you get pointed toward that copy and no physical I/O happens. If it's not, PostgreSQL asks the OS to read the page. It's possible that will return a page that's in the OS cache; the database currently has no idea when this does or doesn't happen though. The hit % counters in the database only reflect shared_buffers hits, not OS ones. Some work to integrate the OS cache information into the database has been done; the current leading project in that area is pgfincore: http://pgfoundry.org/projects/pgfincore/

However, none of this information is considered at all by the query optimizer. It makes plans without any knowledge of what is or isn't in RAM right now, either the dedicated database memory or the OS cache. Only the ratios of the planner constants are really considered. You can set those on a query by query basis to provide subtle hints when you know something the planner doesn't, but you have to be very careful about doing that, as those plans tend to get obsolete eventually when you do that trick.

I had a brain-storming session on this subject with a few of the hackers in the community a while back that I haven't had a chance to do something with yet (it exists only as a pile of scribbled notes so far). There's a couple of ways to collect data on what's in the database and OS cache, and a couple of ways to then expose that data to the optimizer. But that needs to be done very carefully, almost certainly as only a manual process at first, because something that's producing cache feedback all of the time will cause plans to change all the time, too. Where I suspect this is going is that we may end up tracking various statistics over time, then periodically providing a way to export a mass of "typical % cached" data back to the optimizer for use in plan cost estimation purposes. But the idea of monitoring continuously and always planning based on the most recent data available has some stability issues, both from a "too many unpredictable plan changes" and a "bad short-term feedback loop" perspective, as mentioned by Tom and Kevin already.

--
Greg Smith, 2ndQuadrant US    greg@2ndQuadrant.com
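If it helps to see the hit counters Greg refers to, here is an illustrative query against the standard statistics views; note, as he says, that it reflects shared_buffers hits only, never OS cache hits:

    -- Rough per-table shared_buffers hit ratio for the busiest tables.
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(100.0 * heap_blks_hit
                 / NULLIF(heap_blks_hit + heap_blks_read, 0), 1) AS shared_buffers_hit_pct
    FROM   pg_statio_user_tables
    ORDER  BY heap_blks_hit + heap_blks_read DESC
    LIMIT  10;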
Craig Ringer wrote:
> If some kind of cache awareness was to be added, I'd be interested
> in seeing a "hotness" measure that tracked how heavily a given
> relation/index has been accessed and how much has been read from it
> recently. A sort of age-scaled blocks-per-second measure that
> includes both cached and uncached (disk) reads. This would let the
> planner know how likely parts of a given index/relation are to be
> cached in memory without imposing the cost of tracking the cache in
> detail. I'm still not sure it'd be all that useful, though...

Yup, that's one of the design ideas scribbled in my notes, as is the idea of what someone dubbed a "heat map" that tracked which parts of the relation were actually the ones in RAM, the other issue you mentioned.

The problem facing a lot of development possibilities in this area is that we don't have any continuous benchmarking of complicated plans going on right now. So if something really innovative is done, there's really no automatic way to test the result and then see what types of plans it improves and what it makes worse. Until there's some better performance regression work like that around, development on the optimizer has to favor being very conservative.

--
Greg Smith, 2ndQuadrant US    greg@2ndQuadrant.com
Fabrício dos Anjos Silva wrote:
> Is there any automated test tool? I can compile a list of real-world
> queries, and provide an exact copy of my db server just for testing.
> But how do I do it? Write a bunch of scripts? Is there any serious
> tool that tries different parameters, runs a load test, processes
> the results and generates reports?

There's a list of tools for playing back a test workload at http://wiki.postgresql.org/wiki/Statement_Playback

I'm not aware of anyone beyond some academic research that has taken that idea and built something to test many database parameter combinations. They did that at http://www.cs.duke.edu/~shivnath/papers/ituned.pdf but I don't think that code went public; I asked about it at one point and never heard anything really useful back.

--
Greg Smith, 2ndQuadrant US    greg@2ndQuadrant.com
Fabrício dos Anjos Silva wrote:
> If someone could point me to good books about PG tuning, I would
> appreciate it. I found some yet-to-be-released books about PG 9.
> Any comments on them?

The largest treatment of the subject already in print I'm aware of is in the Korry and Susan Douglas "PostgreSQL" book from 2005, which has around 30 pages covering PostgreSQL 8.0. That material is very good and much of it still applies, but it's mostly theory without many examples, and there have been a lot of changes to the optimizer since then.

There's a good talk by Robert Haas on this subject too that's quite current; you can find his slides at http://sites.google.com/site/robertmhaas/presentations and a recording of one version of him giving it is at http://www.pgcon.org/2010/schedule/events/208.en.html

My "PostgreSQL 9.0 High Performance", due out later this month if things continue on schedule, has about 80 pages dedicated to indexing and query optimization (waiting for final typesetting to know the exact count). The main difference with what I do compared to every other treatment I've seen of this subject is that I suggest a sample small but not trivial data set, then show plans for real queries run against it. So you should be able to duplicate the examples, and then tinker with them on your own system to see their plans change as you adjust parameters to follow along. That really is the only way to gain expertise here. Starting with the troublesome queries from your live system will work just as well for that, once you get familiar with enough of the basics.

I'd suggest taking a look and a listen to Robert's talk for now; that will get you started in the right direction. Combine it with reading all of the documentation in the manual on this subject. That should keep you busy for a while, and by the time you're done you may find my book is available too.

--
Greg Smith, 2ndQuadrant US    greg@2ndQuadrant.com
2010/10/4 Greg Smith <greg@2ndquadrant.com>:
> Yup, that's one of the design ideas scribbled in my notes, as is the
> idea of what someone dubbed a "heat map" that tracked which parts of
> the relation were actually the ones in RAM, the other issue you
> mentioned.

* Tracking a specific block is not very easy because of readahead. You end up measuring whether a block was in memory at the moment you requested it physically, not at the moment the first seek/fread happened. It is still an interesting stat, IMHO; I wonder how it could add value to the planner.

* If the planner knows more about the OS cache, it can guess effective_cache_size on its own, which is probably already nice to have.

Extract from the postgres code:

    * We use an approximation proposed by Mackert and Lohman, "Index Scans
    * Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
    * on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.

The planner uses that in conjunction with effective_cache_size to guess whether it is interesting to scan the index. The question is whether this model is still valid given more precise knowledge of the OS page cache... and also whether it matches how different systems like Windows and Linux handle the page cache.

Hooks around cost estimation should help in writing a module to rethink that part of the planner and make it use statistics about the cache. I wonder whether adding such hooks to core would impact its performance? Anyway, doing that is probably the easiest and shortest way to test the behavior.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
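For readers who want to play with it, here is a rough sketch of that Mackert-Lohman approximation as a standalone SQL function, transcribed from the piecewise formula in the costsize.c comments (T = pages in the table, N = tuples, s = selectivity, b = cache pages assumed available). This is an illustration, not the planner's actual code path:

    CREATE OR REPLACE FUNCTION ml_pages_fetched(
        t numeric,   -- $1: pages in the table
        n numeric,   -- $2: tuples in the table
        s numeric,   -- $3: selectivity (fraction of tuples fetched)
        b numeric)   -- $4: cache pages assumed available
    RETURNS numeric LANGUAGE sql IMMUTABLE AS $$
      SELECT CASE
               WHEN $1 <= $4 THEN
                 LEAST(2 * $1 * $2 * $3 / (2 * $1 + $2 * $3), $1)
               WHEN $2 * $3 <= 2 * $1 * $4 / (2 * $1 - $4) THEN
                 2 * $1 * $2 * $3 / (2 * $1 + $2 * $3)
               ELSE
                 $4 + ($2 * $3 - 2 * $1 * $4 / (2 * $1 - $4)) * ($1 - $4) / $1
             END
    $$;

    -- e.g. a 100k-page table, 10M rows, fetching 1% of them, 50k cache pages:
    SELECT ml_pages_fetched(100000, 10000000, 0.01, 50000);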
On 10/04/2010 04:22 AM, Greg Smith wrote:
> Where I suspect this is going is that we may end up tracking various
> statistics over time, then periodically providing a way to export a
> mass of "typical % cached" data back to the optimizer for use in
> plan cost estimation purposes. But the idea of monitoring
> continuously and always planning based on the most recent data
> available has some stability issues, both from a "too many
> unpredictable plan changes" and a "bad short-term feedback loop"
> perspective, as mentioned by Tom and Kevin already.

Why not monitor the distribution of response times, rather than "cached" vs. not?

That a) avoids the issue of discovering what was a cache hit, b) deals neatly with multilevel caching, and c) feeds directly into cost estimation.

Cheers,
   Jeremy
On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris <jgh@wizmail.org> wrote:
> Why not monitor the distribution of response times, rather than
> "cached" vs. not?
>
> That a) avoids the issue of discovering what was a cache hit,
> b) deals neatly with multilevel caching, and c) feeds directly into
> cost estimation.

I was hot on doing better cache modeling a year or two ago, but the elephant in the room is that it's unclear that it solves any real-world problem. The OP is clearly having a problem, but there's not enough information in his post to say what is actually causing it, and it's probably not caching effects. We get occasional complaints of the form "the first time I run this query it's slow, and then after that it's fast" but, as Craig Ringer pointed out upthread, not too many. And even with respect to the complaints we do get, it's far from clear that the cure is any better than the disease. Taking caching effects into account could easily result in the first execution being slightly less slow and all of the subsequent executions being moderately slow. That would not be an improvement for most people. The reports that seem really painful to me are the ones where people with really big machines complain of needing HOURS for the cache to warm up, and having the system bogged down to a standstill until then. But changing the cost model isn't going to help them either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
An approach that works can be found in DB2, and likely elsewhere.

The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/bufferpool match is defined. Then tables and indexes are assigned to the tablespace (and implicitly, the bufferpool). As a result, one can effectively pin data in memory. This is very useful, but not low hanging fruit to implement.

The introduction of rudimentary tablespaces is a first step. I assumed that the point was to get to a DB2-like structure at some point. Yes?

Robert
On Mon, Oct 11, 2010 at 11:11 PM, <gnuoytr@rcn.com> wrote:
> The key is that tablespaces/tables/indexes/buffers are all attached
> through the bufferpool (the DB2 term). [...] As a result, one can
> effectively pin data in memory.
>
> The introduction of rudimentary tablespaces is a first step. I
> assumed that the point was to get to a DB2-like structure at some
> point. Yes?

We already have tablespaces, and our data already is accessed through the buffer pool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
The discussions I've seen indicated that, in use, tablespaces were at the database level, but, yes, the docs do say that a table can be assigned to a defined tablespace. What I still can't find is syntax which establishes buffers/caches/whatever and assigns them to tablespaces. Without that, I'm not sure what benefit there is to tablespaces, other than a sort of RAID-lite.

Robert

<gnuoytr@rcn.com> wrote:
> An approach that works can be found in DB2, and likely elsewhere.
>
> The key is that tablespaces/tables/indexes/buffers are all
> attached through the bufferpool (the DB2 term). A tablespace/
> bufferpool match is defined. Then tables and indexes are assigned
> to the tablespace (and implicitly, the bufferpool). As a result,
> one can effectively pin data in memory. This is very useful, but
> not low hanging fruit to implement.

This sounds similar to Sybase named caches. You can segment off portions of the memory for specific caches, break that up into space reserved for different I/O buffer sizes, and bind specific database objects (tables and indexes) to specific caches. On the few occasions where someone had failed to configure the named caches when setting up a machine, it was caught almost immediately after deployment because of end-user complaints about poor performance. This was so critical to performance for us when we were using Sybase that one of my first reactions on finding it missing in PostgreSQL was distress over the inability to tune as I had.

When I posted to the list about it, the response was that LRU eviction was superior to any tuning any human would do. I didn't and don't believe that, but have found it's close enough in the PostgreSQL environment to be *way* down my list of performance issues. In fact, when looking at the marginal benefits it would generate in PostgreSQL when done right, versus the number of people who would shoot themselves in the foot with it, even I have come around to feeling it's probably not a good idea.

FWIW, the four main reasons for using it were:

(1) Heavily used data could be kept fully cached in RAM and not driven out by transient activity.

(2) You could flag a cache used for (1) above as using "relaxed LRU accounting" -- it saved a lot of time tracking repeated references, leaving more CPU for other purposes.

(3) Each named cache had its own separate set of locks, reducing contention.

(4) Large tables for which the heap was often scanned in its entirety or for a range on the clustered index could be put in a relatively small cache with large I/O buffers. This avoided blowing out the default cache space for situations which almost always required disk I/O anyway.

None of that is anything for amateurs to play with. You need to set up caches like that based on evidence from monitoring and do careful benchmarking of the results to actually achieve improvements over LRU logic.

> The introduction of rudimentary tablespaces is a first step. I
> assumed that the point was to get to a DB2-like structure at some
> point. Yes?

As far as I can tell, there is nobody with that intent.

-Kevin
Couldn't have said it better myself; covered all the bases. If PG wants to become an industrial strength database, worthy of replacing DB2/etc., then these are the sorts of knobs and switches it will need.

-- None of that is anything for amateurs to play with.

Not to jam a stick in anybody's eye, but shouldn't database pros not be amateurs? Or are most PG-ers coders who don't really want to design and tune a database?

Robert
<gnuoytr@rcn.com> wrote:

> -- None of that is anything for amateurs to play with.
>
> Not to jam a stick in anybody's eye, but shouldn't database pros
> not be amateurs?

While many PostgreSQL installations are managed by professional DBAs, or programmers or consultants with a deep enough grasp of the issues to tune a knob like that appropriately, PostgreSQL is also used in environments without such staff. In fact, there is pressure to make PostgreSQL easier to configure for exactly that reason. If we add more knobs which are this hard to tune correctly, we would risk inundation with complaints from people who tried to use it and made things worse.

-Kevin
Kevin Grittner wrote:
> ...Sybase named caches...segment off portions of the memory for
> specific caches... bind specific database objects (tables and
> indexes) to specific caches. ...
>
> When I posted to the list about it, the response was that LRU
> eviction was superior to any tuning any human would do. I didn't
> and don't believe that....
>
> FWIW, the four main reasons for using it were:
> (1) Heavily used data could be kept fully cached in RAM...

Lightly-used-but-important data seems like another use case.

LRU's probably far better than me at optimizing for the total throughput and/or average response time. But if there's a requirement:
   "Even though this query's very rare, it should respond ASAP,
    even at the expense of the throughput of the rest of the system."
it sounds like this kind of hand-tuning might be useful.
On Tue, Oct 12, 2010 at 10:35 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> (1) Heavily used data could be kept fully cached in RAM and not
> driven out by transient activity.

We've attempted to address this problem by adding logic to prevent the buffer cache from being trashed by vacuums, bulk loads, and sequential scans. It would be interesting to know if anyone has examples of that logic falling over or proving inadequate.

> (2) You could flag a cache used for (1) above as using "relaxed LRU
> accounting" -- it saved a lot of time tracking repeated references,
> leaving more CPU for other purposes.

We never do strict LRU accounting.

> (3) Each named cache had its own separate set of locks, reducing
> contention.

We have lock partitions, but as discussed recently on -hackers, they seem to start falling over around 26 cores. We probably need to improve that, but I'd rather do that by making the locking more efficient and by increasing the number of partitions rather than by allowing users to partition the buffer pool by hand.

> (4) Large tables for which the heap was often scanned in its
> entirety or for a range on the clustered index could be put in a
> relatively small cache with large I/O buffers. This avoided blowing
> out the default cache space for situations which almost always
> required disk I/O anyway.

I think, but am not quite sure, that my answer to point #1 is also relevant here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2010/10/13 Ron Mayer <rm_pg@cheapcomplexdevices.com>:
> LRU's probably far better than me at optimizing for the total
> throughput and/or average response time. But if there's a
> requirement:
>    "Even though this query's very rare, it should respond ASAP,
>     even at the expense of the throughput of the rest of the system."
> it sounds like this kind of hand-tuning might be useful.

It is exactly one of the purposes of pgfincore:
http://villemain.org/projects/pgfincore#load_a_table_or_an_index_in_os_page_cache

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
Kevin Grittner wrote:
> While many PostgreSQL installations are managed by professional
> DBAs, or programmers or consultants with a deep enough grasp of the
> issues to tune a knob like that appropriately, PostgreSQL is also
> used in environments without such staff. In fact, there is pressure
> to make PostgreSQL easier to configure for exactly that reason. If
> we add more knobs which are this hard to tune correctly, we would
> risk inundation with complaints from people who tried to use it and
> made things worse.

Agreed. Here is a blog entry that explains some of the tradeoffs of adding knobs:

	http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Greg Smith writes:
> heard privately from two people who have done similar experiments on
> Linux and found closer to 8GB to be the point where performance
> started

So on a machine with 72GB is 8GB still the recommended value? Usually have only 10 to 20 connections.
2010/10/28 Francisco Reyes <lists@stringsutils.com>:
> So on a machine with 72GB is 8GB still the recommended value?

Yes, as a maximum, not a minimum. (Some applications will work better with less shared_buffers than others.)

> Usually have only 10 to 20 connections.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support