Thread: Configuring for maximum memory usage

Configuring for maximum memory usage

From
Christiaan Willemsen
Date:
Hi there,

I configured OpenSolaris on our database server. Specs:

2x Quad 2.6 Ghz Xeon
64 GB of memory
16x 15k5 SAS

The filesystem is configured using ZFS, and I think I have found a
configuration that performs fairly well.

I installed the standard PostgreSQL that came with the OpenSolaris disk
(8.3), and later added support for PostGIS. All fine.
I also tried to tune postgresql.conf to maximize performance and also
memory usage.

Since PostgreSQL is the only thing running on this machine, we want it
to take full advantage of the hardware. For the ZFS cache, we have 8 GB
reserved. The rest can be used by postgres.

The problem is getting it to use that much. At the moment it uses only
about 9 GB, which is far from enough. I hope you can help me put
together a working configuration.

Here are the parameters I set in the config file:

shared_buffers = 8192MB
work_mem = 128MB
maintenance_work_mem = 2048MB
max_fsm_pages = 204800
max_fsm_relations = 2000
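
(The values actually in effect can be double-checked from psql via the
standard pg_settings view:)

select name, setting, unit from pg_settings
 where name in ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'max_fsm_pages', 'max_fsm_relations');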

Database is about 250 GB in size, so we really need to have as much data
as possible in memory.

I hope you can help us tweak a few parameters to make sure all memory
will be used.




Re: Configuring for maximum memory usage

From
Ulrich
Date:
Hi,
you could set effective_cache_size to a high value (free memory on your
server that is used for caching).
Christiaan Willemsen wrote:
> Hi there,
>
> I configured OpenSolaris on our database server. Specs:
>
> 2x Quad 2.6 Ghz Xeon
> 64 GB of memory
> 16x 15k5 SAS
>
> The filesystem is configured using ZFS, and I think I have found a
> configuration that performs fairly well.
>
> I installed the standard PostgreSQL that came with the OpenSolaris
> disk (8.3), and later added support for PostGIS. All fine.
> I also tried to tune postgresql.conf to maximize performance and also
> memory usage.
>
> Since PostgreSQL is the only thing running on this machine, we want it
> to take full advantage of the hardware. For the ZFS cache, we have 8
> GB reserved. The rest can be used by postgres.
>
> The problem is getting it to use that much. At the moment it uses only
> about 9 GB, which is far from enough. I hope you can help me put
> together a working configuration.
>
> Here are the parameters I set in the config file:
>
> shared_buffers = 8192MB
> work_mem = 128MB
> maintenance_work_mem = 2048MB
> max_fsm_pages = 204800
> max_fsm_relations = 2000
>
> Database is about 250 GB in size, so we really need to have as much
> data as possible in memory.
>
> I hope you can help us tweak a few parameters to make sure all memory
> will be used.
>
>
>
>


Re: Configuring for maximum memory usage

From
"Joshua D. Drake"
Date:
Christiaan Willemsen wrote:
> Hi there,

> The problem is getting it to use that much. At the moment it uses only
> about 9 GB, which is far from enough. I hope you can help me put
> together a working configuration.

PostgreSQL is only going to use what it needs. It relies on the OS for
much of the caching etc...

>
> Here are the parameters I set in the config file:
>
> shared_buffers = 8192MB

I wouldn't take this any higher.

> work_mem = 128MB

This is quite high but it might be o.k. depending on what you are doing.

> maintenance_work_mem = 2048MB

This is only used during maintenance operations (VACUUM, CREATE INDEX, etc.),
so you won't see this much in use during normal operation.

> max_fsm_pages = 204800
> max_fsm_relations = 2000

This uses very little memory.

>
> Database is about 250 GB in size, so we really need to have as much data
> as possible in memory.
>
> I hope you can help us tweak a few parameters to make sure all memory
> will be used.

You are missing effective_cache_size. Try setting that to 32G.

You also didn't mention checkpoint_segments or default_statistics_target
(neither is a memory setting, but both are still important).
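
For example, in postgresql.conf (the numbers are illustrative starting
points, not measured recommendations):

effective_cache_size = 32GB      # a planner estimate, not an allocation
checkpoint_segments = 30         # the 8.3 default of 3 is far too low for heavy writes
default_statistics_target = 150  # the 8.3 default of 10 is usually too low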

Joshua D. Drake


>
>
>
>


Re: Configuring for maximum memory usage

From
Christiaan Willemsen
Date:
Joshua D. Drake wrote:
>
> PostgreSQL is only going to use what it needs. It relies on the OS for
> much of the caching etc...
>
So that would actually mean that I could raise the setting of the ARC
cache to far more than 8 GB? As I said, our database is 250 GB, so I
would expect that postgres needs more than it is using right now...
Several tables have over 500 million records (obviously partitioned).
At the moment we are doing queries over large datasets, so I would
assume that postgres would need a bit more memory than this...
>
> You are missing effective_cache_size. Try setting that to 32G.
That one was set to 24 GB. But this setting only tells postgres how much
caching it can expect from the OS? This is not actually memory that it
will allocate, is it?
>
> You also didn't mention checkpoint_segments (which isn't memory but
> still important) and default_statistics_target (which isn't memory but
> still important).
>
is at the moment set to:

checkpoint_segments = 40
default_statistics_target is set to default (I think that is 10)

Thanks already,

Christiaan

Re: Configuring for maximum memory usage

From
"Joshua D. Drake"
Date:
On Thu, 2008-10-30 at 16:58 +0100, Christiaan Willemsen wrote:
> Joshua D. Drake wrote:
> >
> > PostgreSQL is only going to use what it needs. It relies on the OS for
> > much of the caching etc...
> >
> So that would actually mean that I could raise the setting of the ARC
> cache to far more than 8 GB? As I said, our database is 250 GB, so I
> would expect that postgres needs more than it is using right now...
> Several tables have over 500 million records (obviously partitioned).
> At the moment we are doing queries over large datasets, so I would
> assume that postgres would need a bit more memory than this...

Well I actually can't answer this definitively. My knowledge of Solaris is
slimmer than my knowledge of other operating systems. However it appears
from a brief google that ARC cache is some predefined file level cache
that you can set with Solaris? If so, then you want that to be high
enough to keep your most active relations hot in cache.

Remember that PostgreSQL doesn't cache anything on its own, so if you
don't want to hit disk it has to be in the file cache.

> >
> > You are missing effective_cache_size. Try setting that to 32G.
> That one was set to 24 GB. But this setting only tells postgres how much
> caching it can expect from the OS? This is not actually memory that it
> will allocate, is it?

That is correct, it is not an actual allocation, but it does vastly affect
your query plans. PostgreSQL uses this parameter to help determine whether
a page is likely to be cached (see comment about file cache above).

> >
> > You also didn't mention checkpoint_segments (which isn't memory but
> > still important) and default_statistics_target (which isn't memory but
> > still important).
> >
> is at the moment set to:
>
> checkpoint_segments = 40
> default_statistics_target is set to default (I think that is 10)
>

10 is likely way too low. Try 150 and make sure you analyze after.
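
For example (table and column names are placeholders):

default_statistics_target = 150   # postgresql.conf, then reload the server

-- or raise it only for the most heavily filtered columns:
ALTER TABLE my_big_table ALTER COLUMN event_time SET STATISTICS 150;
ANALYZE my_big_table;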

As I recall some other databases allow you to say, "you have this much
memory, use it". PostgreSQL doesn't do that. You just give it pointers
and it will use as much as it needs within the limits of the OS. The key
word here is needs.

There is obviously some variance to that (like work_mem).

Joshua D. Drake



--


Re: Configuring for maximum memory usage

From
"Scott Carey"
Date:
You must either increase the memory that ZFS uses, or increase PostgreSQL shared_buffers and work_mem to get the aggregate of the two to use more RAM.

I believe that you have not told ZFS to reserve 8GB, but rather told it to limit itself to 8GB.
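
(On OpenSolaris the ARC ceiling is normally capped via /etc/system; the
line below is just your existing 8GB limit expressed in bytes, and a
reboot is needed for a change to take effect:)

set zfs:zfs_arc_max = 8589934592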

Some comments below:

On Thu, Oct 30, 2008 at 8:15 AM, Christiaan Willemsen <cwillemsen@technocon.com> wrote:
Hi there,

I configured OpenSolaris on our database server. Specs:

2x Quad 2.6 Ghz Xeon
64 GB of memory
16x 15k5 SAS
If you do much writing, and even more so with ZFS, it is critical to put the WAL on a different ZFS volume (and perhaps different disks) than the data and indexes.
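
(A rough sketch of what that looks like; the device names and data
directory path are made up for the example, so adjust them to your layout:)

# dedicated mirrored pair for the WAL
zpool create walpool mirror c3t0d0 c3t1d0
zfs create walpool/pgwal

# with PostgreSQL stopped, relocate pg_xlog and leave a symlink behind
mv /pgdata/pg_xlog /walpool/pgwal/pg_xlog
ln -s /walpool/pgwal/pg_xlog /pgdata/pg_xlog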
 


The filesystem is configured using ZFS, and I think I have found a configuration that performs fairly well.

I installed the standard PostgreSQL that came with the OpenSolaris disk (8.3), and later added support for PostGIS. All fine.
I also tried to tune postgresql.conf to maximize performance and also memory usage.

Since PostgreSQL is the only thing running on this machine, we want it to take full advantage of the hardware. For the ZFS cache, we have 8 GB reserved. The rest can be used by postgres.

What setting reserves (but does not limit) ZFS to a memory size?  I am not familiar with one that behaves that way, but I could be wrong.  Try setting this to 48GB (leaving 16 for the db and misc).
 

The problem is getting it to use that much. At the moment it uses only about 9 GB, which is far from enough. I hope you can help me put together a working configuration.

Are you counting both the memory used by postgres and the memory used by the ZFS ARC cache?  It is the combination you are interested in, and performance will be better if it is biased towards one being a good chunk larger than the other.  In my experience, if you are doing more writes, a larger file system cache is better, if you are doing reads, a larger postgres cache is better (the overhead of calling read() in 8k chunks to the os, even if it is cached, causes CPU use to increase).
 

Here are the parameters I set in the config file:

shared_buffers = 8192MB
You probably want shared_buffers + the ZFS ARC cache (the ZFS file system cache, for those unfamiliar with ZFS) to be about 56GB, unless you have a lot of connections and heavily use temp tables or work_mem.  In that case make the total less.
I recommend trying:
shared_buffers = 48GB with the ZFS ARC limited to 8GB, and
shared_buffers = 8GB with the ZFS ARC limited to 48GB

work_mem = 128MB
maintenance_work_mem = 2048MB
max_fsm_pages = 204800
max_fsm_relations = 2000

If you do very large aggregates, you may even need 1GB of work_mem.  However, a setting that high would require very careful tuning and a reduction of the space used by shared_buffers and the ZFS ARC.  It's dangerous since each connection with a large aggregate or sort may consume a lot of memory.
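
(One way to limit the risk is to raise it only for the session that runs
the big aggregate; the query below is just a placeholder:)

-- postgresql.conf keeps a modest default, e.g. work_mem = 128MB
SET work_mem = '1GB';    -- for this session only
SELECT customer_id, count(*) FROM big_fact_table GROUP BY customer_id;
RESET work_mem;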
 

Database is about 250 GB in size, so we really need to have as much data as possible in memory.

I hope you can help us tweak a few parameters to make sure all memory will be used.





Re: Configuring for maximum memory usage

From
"Scott Carey"
Date:


On Thu, Oct 30, 2008 at 9:05 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
On Thu, 2008-10-30 at 16:58 +0100, Christiaan Willemsen wrote:
> Joshua D. Drake wrote:
> >
> > PostgreSQL is only going to use what it needs. It relies on the OS for
> > much of the caching etc...
> >
> So that would actually mean that I could raise the setting of the ARC
> cache to far more than 8 GB? As I said, our database is 250 GB, so I
> would expect that postgres needs more than it is using right now...
> Several tables have over 500 million records (obviously partitioned).
> At the moment we are doing queries over large datasets, so I would
> assume that postgres would need a bit more memory than this...

Well I actually can't answer this definitively. My knowledge of Solaris is
slimmer than my knowledge of other operating systems. However it appears
from a brief google that ARC cache is some predefined file level cache
that you can set with Solaris? If so, then you want that to be high
enough to keep your most active relations hot in cache.

Remember that PostgreSQL doesn't cache anything on its own, so if you
don't want to hit disk it has to be in the file cache.

By my understanding, this is absolutely false.  Postgres caches pages from tables/indexes in shared_buffers. You can make this very large if you wish. 

The Solaris ZFS ARC is the filesystem cache for ZFS. It will yield to other apps by default, but you will
get better performance and consistency if you limit it so that it does not compete with apps you know you want in memory, like a database.

In older versions, the postgres shared_buffers page cache was not very efficient, and the OS page caches were just better, so setting shared_buffers too large was a bad idea.
However, postgres now uses a reasonable eviction algorithm that doesn't evict recently used items as readily as it used to, or let full table scans kick out heavily accessed data (8.3+).
The current tradeoff is that going from postgres to the OS cache incurs CPU overhead for reading, but the OS may be better at caching the more relevant pages.
A good OS cache, like the ZFS ARC, is much more sophisticated in the algorithms used for determining what to cache and what to evict, and so it may be better at limiting disk usage.  But accessing it rather than the postgres page cache in shared_buffers incurs extra CPU cost, as both caches must search, load, and potentially evict pages, rather than just one.
 


> >
> > You are missing effective_cache_size. Try setting that to 32G.
> That one was set to 24 GB. But this setting only tells postgres how much
> caching it can expect from the OS? This is not actually memory that it
> will allocate, is it?

That is correct, it is not an actual allocation, but it does vastly affect
your query plans. PostgreSQL uses this parameter to help determine whether
a page is likely to be cached (see comment about file cache above).

It should be set to the expected size of the OS file cache (the size of the ZFS ARC cache in this case).
However, it will have very little impact on large data queries that don't use indexes.  It has larger impact
for things that may do index scans when shared_buffers is small compared to file system cache + shared_buffers.
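
(In other words, something like the following, assuming the 8GB
shared_buffers / 48GB ARC split discussed above:)

effective_cache_size = 56GB   # roughly ARC limit + shared_buffers; a planner hint, not an allocation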

> >
> > You also didn't mention checkpoint_segments (which isn't memory but
> > still important) and default_statistics_target (which isn't memory but
> > still important).
> >
> is at the moment set to:
>
> checkpoint_segments = 40
> default_statistics_target is set to default (I think that is 10)
>

10 is likely way too low. Try 150 and make sure you analyze after.

As I recall some other databases allow you to say, "you have this much
memory, use it". PostgreSQL doesn't do that. You just give it pointers
and it will use as much as it needs within the limits of the OS. The key
word here is needs.

There is obviously some variance to that (like work_mem).

Joshua D. Drake



--



Re: Configuring for maximum memory usage

From
"Joshua D. Drake"
Date:
On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:

>
>         Remember that PostgreSQL doesn't cache anything on its own so
>         if you don't
>         want to hit disk it has to be in file cache.
>
> By my understanding, this is absolutely false.  Postgres caches pages
> from tables/indexes in shared_buffers. You can make this very large if
> you wish.

You can make it very large with a potentially serious performance hit.
It is very expensive to manage large amounts of shared buffers. It can
also nail your IO on checkpoint if you are not careful (even with
checkpoint smoothing). You are correct that I did not explain what I
meant very well because shared buffers are exactly that, shared
buffers.

However that isn't the exact same thing as a "cache" at least as I was
trying to describe it. shared buffers are used to keep track of pages
(as well as some other stuff) and their current status. That is not the
same as caching a relation.

It is not possible to pin a relation to memory using PostgreSQL.
PostgreSQL relies on the operating system for that type of caching.

Joshua D. Drake





--


Re: Configuring for maximum memory usage

From
Christiaan Willemsen
Date:
Hi Scott,

Thanks for the clear answers!

Scott Carey wrote:
You must either increase the memory that ZFS uses, or increase PostgreSQL shared_buffers and work_mem to get the aggregate of the two to use more RAM.

I believe that you have not told ZFS to reserve 8GB, but rather told it to limit itself to 8GB.
That is correct, but since it will use the whole 8 GB anyway, I can just as easily say that it will reserve that memory ;)

Some comments below:

On Thu, Oct 30, 2008 at 8:15 AM, Christiaan Willemsen <cwillemsen@technocon.com> wrote:
Hi there,

I configured OpenSolaris on our database server. Specs:

2x Quad 2.6 Ghz Xeon
64 GB of memory
16x 15k5 SAS
If you do much writing, and even more so with ZFS, it is critical to put the WAL on a different ZFS volume (and perhaps different disks) than the data and indexes.
I already did that. I also have a separate disk pair for the ZFS intent log.

Are you counting both the memory used by postgres and the memory used by the ZFS ARC cache?  It is the combination you are interested in, and performance will be better if it is biased towards one being a good chunk larger than the other.  In my experience, if you are doing more writes, a larger file system cache is better, if you are doing reads, a larger postgres cache is better (the overhead of calling read() in 8k chunks to the os, even if it is cached, causes CPU use to increase).
No, the figure I gave is without the ARC cache.
If you do very large aggregates, you may even need 1GB of work_mem.  However, a setting that high would require very careful tuning and a reduction of the space used by shared_buffers and the ZFS ARC.  It's dangerous since each connection with a large aggregate or sort may consume a lot of memory.
Well, some tasks may need a lot, but I guess most will do fine with the settings we use right now.

So it looks like I can tune the ARC to use more memory, and also increase shared_buffers to let postgres cache more tables?

Re: Configuring for maximum memory usage

From
Alvaro Herrera
Date:
Joshua D. Drake wrote:

> However that isn't the exact same thing as a "cache" at least as I was
> trying to describe it. shared buffers are used to keep track of pages
> (as well as some other stuff) and their current status. That is not the
> same as caching a relation.

Um, having a page in shared buffers means exactly that the page is
cached (a.k.a. it won't have to be read from the lower level next time).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Configuring for maximum memory usage

From
"Joshua D. Drake"
Date:
On Thu, 2008-10-30 at 14:00 -0300, Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>
> > However that isn't the exact same thing as a "cache" at least as I was
> > trying to describe it. shared buffers are used to keep track of pages
> > (as well as some other stuff) and their current status. That is not the
> > same as caching a relation.
>
> Um, having a page in shared buffers means exactly that the page is
> cached (a.k.a. it won't have to be read from the lower level next time).
>

Yes but that wasn't my point. Sorry if I wasn't being clear enough.

Joshua D. Drake

--


Re: Configuring for maximum memory usage

From
"Scott Carey"
Date:


If you do very large aggregates, you may even need 1GB of work_mem.  However, a setting that high would require very careful tuning and a reduction of the space used by shared_buffers and the ZFS ARC.  It's dangerous since each connection with a large aggregate or sort may consume a lot of memory.
Well, some tasks may need a lot, but I guess most will do fine with the settings we use right now.

So it looks like I can tune the ARC to use more memory, and also increase shared_buffers to let postgres cache more tables?

I would recommend tuning one upwards and leaving the other smaller.  The worst case is when they are both similarly sized; that leaves the most opportunity for duplication of data and produces the worst feedback on checkpoint writes.
You may want to compare the performance with:
 larger ARC and smaller shared_buffers
vs
 smaller ARC and larger shared_buffers

The results will be rather dependent on how you use postgres, the types of queries you do, and for writes, how you tune checkpoints and such.
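
(If you want a quick, repeatable number for that comparison, pgbench from
contrib is one option; the scale, client counts, and database name below are
arbitrary, and a run against your own queries is more meaningful:)

pgbench -i -s 1000 testdb          # one-time init, on the order of 15GB of data
pgbench -c 16 -t 10000 testdb      # mixed read/write run
pgbench -c 16 -t 10000 -S testdb   # select-only run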

Re: Configuring for maximum memory usage

From
"Scott Carey"
Date:


On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:

>
>         Remember that PostgreSQL doesn't cache anything on its own so
>         if you don't
>         want to hit disk it has to be in file cache.
>
> By my understanding, this is absolutely false.  Postgres caches pages
> from tables/indexes in shared_buffers. You can make this very large if
> you wish.

You can make it very large with a potentially serious performance hit.
It is very expensive to manage large amounts of shared buffers. It can
also nail your IO on checkpoint if you are not careful (even with
checkpoint smoothing). You are correct that I did not explain what I
meant very well because shared buffers are exactly that, shared
buffers.

You can slam your I/O by having too large an OS file cache or too large a shared_buffers, and you have to tune both.
In the case of large shared_buffers you have to tune postgres and especially the background writer and checkpoints.
In the case of a large OS cache, you have to tune parameters to limit the amount of dirty pages there and force writes out smoothly.
Both layers attempt to delay writes for their own, often similar reasons, and suffer when a large sync comes along with lots of dirty data. 

Recent ZFS changes have been made to limit this (http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle);
in earlier ZFS versions this is what usually killed databases -- ZFS in some situations would delay writes too long (even if "long" is 5 seconds) and get in trouble.  This still has to be tuned well, combined with good checkpoint tuning in Postgres, as you mention. On Linux there are similar issues that have to be tuned on many kernels, or up to 40% of RAM can fill with dirty pages that have not been written to disk.
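
(On Linux the usual knobs for that are the vm.dirty_* sysctls; the values
below are just commonly used conservative settings, not a recommendation
for this particular box:)

vm.dirty_background_ratio = 5    # start background writeback early
vm.dirty_ratio = 10              # cap dirty pages at a small fraction of RAM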

Letting the OS do it doesn't get rid of the problem, both levels of cache share very similar issues with large sizes and dirty pages followed by a sync.

The buffer cache in shared_buffers is a lot more efficient for large scanning queries -- A select count(*) test will be CPU bound if it comes from shared_buffers or the OS page cache, and in the former case I have seen it execute up to 50% faster than the latter, by avoiding calling out to the OS to get pages, purely as a result of less CPU used.
 
 


However that isn't the exact same thing as a "cache" at least as I was
trying to describe it. shared buffers are used to keep track of pages
(as well as some other stuff) and their current status. That is not the
same as caching a relation.

It is not possible to pin a relation to memory using PostgreSQL.
PostgreSQL relies on the operating system for that type of caching.

The OS can't pin a relation either; from its point of view it's all just a bunch of disk data blocks, not relations -- so it is all roughly equivalent.  The OS can do a somewhat better job at data prefetch on sequential scans or other predictable seek sequences (ARC stands for Adaptive Replacement Cache) than postgres currently does (no intelligent prefetch in postgres AFAIK).

So I apologize if I made it sound like Postgres caches the actual relation; it's just pages -- but it is basically the same thing as the OS cache, kept in-process, closer to the code that needs it.  It's a cache that prevents disk reads.

My suggestion for the OP is to try it both ways, and see what is better for his workload / OS / Hardware combination.
 

Joshua D. Drake





--


Re: Configuring for maximum memory usage

From
Christiaan Willemsen
Date:
Thanks guys,

Lots of info here that I didn't know about! Since I have one of the latest OpenSolaris builds, I guess the write throttle feature is already in there. Sadly, the blog doesn't say which build has it included.

For writes, I do everything synchronized because we really need a consistent database on disk. We can see that during large inserts, the intent log is used a lot.

What I'm going to be testing is a smaller shared_buffers value with a large ARC cache, and exactly the other way around.

Another question: since we have huge tables with hundreds of millions of rows, we partitioned the database (it is actually creating the partitions dynamically now on insert, with very good performance :D ). The question is: is the size of the partitions important for the memory parameters in the config file? How can we determine the optimal partition size? Obviously, when doing selects, you want them to preferably touch only a single partition, for speed. At the moment that is the case in the majority of situations. But there might be some other things to think about...
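
(For reference, we use the usual 8.3 inheritance-style partitioning,
roughly like the sketch below; table and column names are simplified:)

-- child partition with a CHECK constraint the planner can use
CREATE TABLE measurements_2008_10 (
    CHECK (logged_at >= DATE '2008-10-01' AND logged_at < DATE '2008-11-01')
) INHERITS (measurements);

-- postgresql.conf: needed so selects can skip partitions that cannot match
constraint_exclusion = on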

Kind regards,

Christiaan





Re: Configuring for maximum memory usage

From
"Scott Carey"
Date:


On Thu, Oct 30, 2008 at 2:06 PM, Christiaan Willemsen <cwillemsen@technocon.com> wrote:
Thanks guys,

Lots of info here that I didn't know about! Since I have one of the latest OpenSolaris builds, I guess the write throttle feature is already in there. Sadly, the blog doesn't say which build has it included.

If I recall correctly, it went in at about build 89 or so (I think the bottom of the link I provided has a comment to that effect).  So it's in there now, but not in OpenSolaris 2008.05.
 

For writes, I do everything synchronized because we really need a consistent database on disk. We can see that during large inserts, the intent log is used a lot.

The DB synchronizes the WAL log automatically, and the table and index data are written non-synchronously until the commit at the end of a checkpoint, in which case sync is called on them.  This keeps things consistent on disk.  With ZFS, each block written is always consistent, with a checksum kept in the parent block.  There are no partial page writes, ever.  In theory, you can disable full page writes on the WAL log if there is a bottleneck there since ZFS guarantees fully transactional consistent state of the file system, even if you have a RAID controller or hardware failure that causes a partial write.  But WAL log activity is probably not your bottleneck so turning off full page writes on the WAL log is not necessary.
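
(The parameter in question is full_page_writes; as said above, only
consider this if the filesystem really guarantees no torn pages, and it is
probably not worth changing here:)

full_page_writes = off   # only safe on filesystems like ZFS that never write partial pages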

 

What I'm going to be testing is a smaller shared_buffers value with a large ARC cache, and exactly the other way around.

Another question: since we have huge tables with hundreds of millions of rows, we partitioned the database (it is actually creating the partitions dynamically now on insert, with very good performance :D ). The question is: is the size of the partitions important for the memory parameters in the config file? How can we determine the optimal partition size? Obviously, when doing selects, you want them to preferably touch only a single partition, for speed. At the moment that is the case in the majority of situations. But there might be some other things to think about...

Kind regards,

Christiaan

