From: Jeff Janes
Subject: Re: PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Date:
Msg-id: CAMkU=1zUHzWzCWJWiUkq+rhC_yZuK8h6FaTC-wjPRPgZXTfzNA@mail.gmail.com
In response to: Re: PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory? (Stefan Keller <sfkeller@gmail.com>)
List: pgsql-performance

On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller <sfkeller@gmail.com> wrote:
> 2012/2/28 Claudio Freire <klaussfreire@gmail.com>:
>>
>> In the OP, you say "There is enough main memory to hold all table
>> contents.". I'm assuming, there you refer to your current system, with
>> 4GB memory.
>
> Sorry for the confusion: I'm doing these tests on this machine with
> one table (osm_point) and one country. This table has a size of 2.6GB
> and 10 million tuples. The other machine has to deal with at least 5
> tables in total and will hold more than one country plus routing
> etc.

What is your shared_buffers set to?  2.6GB is uncomfortably close to
4GB, considering the computer has other things it needs to use memory
for as well.
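
If you're not sure, you can just ask the server; a quick check from a
shell, assuming a database named "osm" (substitute your own):

    # Show the current shared_buffers setting and the on-disk size of
    # the osm_point table mentioned above.
    psql -d osm -c "SHOW shared_buffers;"
    psql -d osm -c "SELECT pg_size_pretty(pg_total_relation_size('osm_point'));"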

A common problem is that shared_buffers and the OS cache end up
holding basically copies of one another, rather than complementing
each other.  So for read-only workloads, the effectively useful size
of the total cache turns out to be roughly max(shared_buffers, RAM -
2*shared_buffers - unknown_overhead).
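
To put rough numbers on that for a 4GB machine: shared_buffers at
2.5GB gives max(2.5GB, 4GB - 5GB - overhead) = 2.5GB of useful cache,
so the OS cache contributes little beyond a duplicate copy; at 0.5GB
it gives max(0.5GB, 4GB - 1GB - overhead), roughly 2.5-3GB, almost
all of it in the OS cache.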

So one choice is setting shared_buffers low (<0.5GB) and letting the OS
cache be your main cache.  Advantages of this are that the OS cache
survives PG server restarts, gets populated even by sequential scans,
and can be pre-warmed by the tar trick.  Disadvantages are that pages
can be driven out of the OS cache by non-PG related activity, which
can be hard to monitor and control.  Also, there is some small cost to
constantly transferring data from the OS cache to the PG cache, but in
your case I think that would be negligible.
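
For reference, the "tar trick" just means reading the table's data
files so the kernel caches them; a rough sketch, with the database
name and paths obviously needing to be adjusted:

    # Find the data file behind the table (path is relative to the
    # data directory; large tables also have .1, .2, ... segments).
    psql -d osm -Atc "SELECT current_setting('data_directory')
        || '/' || pg_relation_filepath('osm_point');"

    # Read it (and any extra segments) to /dev/null; tar is
    # traditional, but cat works just as well.
    tar cf /dev/null /path/to/pgdata/base/16384/24576*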

The other choice is setting shared_buffers high (>3GB) and having it
be your main cache.  The advantage is that non-PG activity generally
won't drive it out.  The disadvantages are that it is hard to
pre-populate: the tar trick won't work, and neither will sequential
scans on the tables, because large sequential scans use a small ring
buffer rather than the whole of shared_buffers.
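
One workaround people sometimes try (a sketch only, assuming an index
on an osm_id column, and I haven't measured whether it's worth the
trouble) is to force the read through the regular buffer pool instead
of a sequential scan, for instance via an index:

    # Disable seqscan for this command so the planner reads the table
    # through the index; those reads skip the ring buffer, so the
    # heap pages land in shared_buffers.
    psql -d osm -c "SET enable_seqscan = off;
                    SELECT count(*) FROM osm_point WHERE osm_id > 0;"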

Actually, the tar trick might work somewhat if applied either shortly
before or shortly after the database is started.  If the database
starts out not using its full allotment of memory, the OS will use it
for cache, and you can pre-populate that cache.  Then as the database
runs, the PG cache gets larger by copying needed data from the OS
cache into it.  As the PG cache grows, pages need to get evicted from
OS cache to make room for it.  Ideally, the pages evicted from the OS
cache would be the ones just copied into PG, but the kernel is not
aware of that.  So the whole thing is rather sub-optimal.
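
So if you go that route, the ordering would be roughly (sketch; the
restart command and paths depend on your installation):

    # Restart, then warm the OS cache before PG has grown into its
    # full shared_buffers allotment.
    pg_ctl -D /path/to/pgdata restart
    tar cf /dev/null /path/to/pgdata/base/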

Cheers,

Jeff
