Thread: Optimizing Postgresql server and FreeBSD for heavy read and writes

Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Amitabh Kant
Date:
Hello

I have a server dedicated for Postgres with the following specs:

RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @ 2.33GHz 
OS: FreeBSD 8.0

It runs multiple (approx 10) databases ranging from 500MB to over 24 GB in size. All of them are of the same structure, and almost all of them have very heavy read and writes.

pgtune (http://pgfoundry.org/projects/pgtune/) suggests the settings to be changed as :

maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
work_mem = 160MB # pg_generate_conf wizard 2010-02-03
wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
max_connections = 100 # pg_generate_conf wizard 2010-02-03


While this gives me the changes for postgresql.conf, I am not sure of  of the chnages that I need to make in FreeBSD to support such large memory allocations. The last time I tried, Postgres refused to start and I had to fall back to the default settings.

I would appreciate if somebody could point out the sysctl/loader.conf settings that I need to have in FreeBSD.

With regards

Amitabh Kant

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Ivan Voras
Date:
On 02/03/10 16:10, Amitabh Kant wrote:
> Hello
>
> I have a server dedicated for Postgres with the following specs:
>
> RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
> 2.33GHz
> OS: FreeBSD 8.0

If you really do have "heavy read and write" load on the server, nothing
will save you from the bottleneck of having only 4 drives in the system
(or more accurately: adding more memory will help reads but nothing
helps writes except more drivers or faster (SSD) drives). If you can,
add another 2 drives in RAID 1 and move+symlink the pg_xlog directory to
the new array.

> maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
> checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
> effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
> work_mem = 160MB # pg_generate_conf wizard 2010-02-03
> wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
> checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
> shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
> max_connections = 100 # pg_generate_conf wizard 2010-02-03

> I would appreciate if somebody could point out the sysctl/loader.conf
> settings that I need to have in FreeBSD.

Firstly, you need to run a 64-bit version ("amd64") of FreeBSD.

In /boot/loader.conf you will probably need to increase the number of
sysv ipc semaphores:

kern.ipc.semmni=512
kern.ipc.semmns=1024

This depends mostly on the number of connections allowed to the server.
The example values I gave above are more than enough but since this is a
boot-only tunable it is expensive to modify later.

In /etc/sysctl.conf you will need to increase the shared memory sizes,
e.g. for a 3900 MB shared_buffer:

kern.ipc.shmmax=4089446400
This is the maximum shared memory segment size, in bytes.

kern.ipc.shmall=1050000
This is the maximum amount of memory allowed to be used as sysv shared
memory, in 4 kB pages.

If the database contains many objects (tables, indexes, etc.) you may
need to increase the maximum number of open files and the amount of
memory for the directory list cache:

kern.maxfiles=16384
vfs.ufs.dirhash_maxmem=4194304

If you estimate you will have large sequential reads on the database,
you should increase read-ahead count:

vfs.read_max=32

Be sure that soft-updates is enabled on the file system you are using
for data. Ignore all Linux-centric discussions about problems with
journaling and write barriers :)

All settings in /etc/sysctl.conf can be changed at runtime (individually
or by invoking "/etc/rc.d/sysctl restart"), settings in loader.conf are
boot-time only.

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Andy Colson
Date:
On 2/3/2010 9:10 AM, Amitabh Kant wrote:
> Hello
>
> I have a server dedicated for Postgres with the following specs:
>
> RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
> 2.33GHz
> OS: FreeBSD 8.0
>
> It runs multiple (approx 10) databases ranging from 500MB to over 24 GB
> in size. All of them are of the same structure, and almost all of them
> have very heavy read and writes.
>
>
> With regards
>
> Amitabh Kant

What problems are you having?  Is it slow?  Is there something you are
trying to fix, or is this just the first tune up?


 > memory allocations. The last time I tried, Postgres refused to start and
 > I had to fall back to the default settings.

Its probably upset about the amount of shared mem.  There is probably a
way in bsd to set the max amount of shared memory available.  A Quick
google turned up:

kern.ipc.shmmax

Dunno if thats right.  When you try to start PG, if it cannot allocate
enough shared mem it'll spit out an error message into its log saying
how much it tried to allocate.

Check:
http://archives.postgresql.org/pgsql-admin/2004-06/msg00155.php




 > maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
 > checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
 > effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
 > work_mem = 160MB # pg_generate_conf wizard 2010-02-03
 > wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
 > checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
 > shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
 > max_connections = 100 # pg_generate_conf wizard 2010-02-03

Some of these seem like too much.  I'd recommend starting with one or
two and see how it runs.  Then increase if you're still slow.

Start with effective_cache_size, shared_buffers and checkpoint_segments.

Wait until very last to play with work_mem and maintenance_work_mem.


-Andy

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Robert Haas
Date:
On Wed, Feb 3, 2010 at 10:10 AM, Amitabh Kant <amitabhkant@gmail.com> wrote:
> work_mem = 160MB # pg_generate_conf wizard 2010-02-03

Overall these settings look sane, but this one looks like an
exception.  That is an enormous value for that parameter...

...Robert

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Greg Smith
Date:
Robert Haas wrote:
On Wed, Feb 3, 2010 at 10:10 AM, Amitabh Kant <amitabhkant@gmail.com> wrote: 
work_mem = 160MB # pg_generate_conf wizard 2010-02-03   
Overall these settings look sane, but this one looks like an
exception.  That is an enormous value for that parameter... 

Yeah, I think I need to retune the suggestions for that parameter.  The idea behind the tuning profile used in the "web" and "OLTP" setups is that you're unlikely to have all the available connections doing something involving sorting at the same time with those workloads, and when it does happen you want it to use the fastest approach possible even if that takes more RAM so the client waiting for a response is more likely to get one on time.  That's why the work_mem figure in those situations is set very aggressively:  total_mem / connections, so on a 16GB server that comes out to the 160MB seen here.  I'm going to adjust that so that it's capped a little below (total_mem - shared_buffers) / connections instead.

pgtune just got a major bit of refactoring recently from Matt Harrison to make it more Python-esque, and I'll be pushing toward an official 1.0 with all the major loose ends cleaned up and an adjusted tuning model that will be available before 9.0 ships.  I'm seeing enough people interested in it now to justify putting another block of work into improving it.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Amitabh Kant
Date:
On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras <ivoras@freebsd.org> wrote:
On 02/03/10 16:10, Amitabh Kant wrote:
Hello

I have a server dedicated for Postgres with the following specs:

RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
2.33GHz
OS: FreeBSD 8.0

If you really do have "heavy read and write" load on the server, nothing will save you from the bottleneck of having only 4 drives in the system (or more accurately: adding more memory will help reads but nothing helps writes except more drivers or faster (SSD) drives). If you can, add another 2 drives in RAID 1 and move+symlink the pg_xlog directory to the new array.


Can't do anything about this server now, but would surely keep in mind before upgrading other servers. Would you recommend the same speed drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS or even SATA II)?

 

maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
work_mem = 160MB # pg_generate_conf wizard 2010-02-03
wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
max_connections = 100 # pg_generate_conf wizard 2010-02-03

I would appreciate if somebody could point out the sysctl/loader.conf
settings that I need to have in FreeBSD.

Firstly, you need to run a 64-bit version ("amd64") of FreeBSD.


Yes, its running amd64 arch.
 
In /boot/loader.conf you will probably need to increase the number of sysv ipc semaphores:

kern.ipc.semmni=512
kern.ipc.semmns=1024

This depends mostly on the number of connections allowed to the server. The example values I gave above are more than enough but since this is a boot-only tunable it is expensive to modify later.

In /etc/sysctl.conf you will need to increase the shared memory sizes, e.g. for a 3900 MB shared_buffer:

kern.ipc.shmmax=4089446400
This is the maximum shared memory segment size, in bytes.

kern.ipc.shmall=1050000
This is the maximum amount of memory allowed to be used as sysv shared memory, in 4 kB pages.

If the database contains many objects (tables, indexes, etc.) you may need to increase the maximum number of open files and the amount of memory for the directory list cache:

kern.maxfiles=16384
vfs.ufs.dirhash_maxmem=4194304

If you estimate you will have large sequential reads on the database, you should increase read-ahead count:

vfs.read_max=32

Be sure that soft-updates is enabled on the file system you are using for data. Ignore all Linux-centric discussions about problems with journaling and write barriers :)

All settings in /etc/sysctl.conf can be changed at runtime (individually or by invoking "/etc/rc.d/sysctl restart"), settings in loader.conf are boot-time only.

Thanks Ivan. That's a great explanation of the variables involved.


With regards

Amitabh Kant

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Amitabh Kant
Date:
On Thu, Feb 4, 2010 at 12:11 AM, Andy Colson <andy@squeakycode.net> wrote:
On 2/3/2010 9:10 AM, Amitabh Kant wrote:
Hello

I have a server dedicated for Postgres with the following specs:

RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
2.33GHz
OS: FreeBSD 8.0

It runs multiple (approx 10) databases ranging from 500MB to over 24 GB
in size. All of them are of the same structure, and almost all of them
have very heavy read and writes.


With regards

Amitabh Kant

What problems are you having?  Is it slow?  Is there something you are trying to fix, or is this just the first tune up?

This is the first tune up. The system has worked pretty fine till now, but  it does lag once in a while, and I would like to optimize it before it becomes a bigger issue.



> memory allocations. The last time I tried, Postgres refused to start and
> I had to fall back to the default settings.

Its probably upset about the amount of shared mem.  There is probably a way in bsd to set the max amount of shared memory available.  A Quick google turned up:

kern.ipc.shmmax

Dunno if thats right.  When you try to start PG, if it cannot allocate enough shared mem it'll spit out an error message into its log saying how much it tried to allocate.

Check:
http://archives.postgresql.org/pgsql-admin/2004-06/msg00155.php





> maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
> checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
> effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
> work_mem = 160MB # pg_generate_conf wizard 2010-02-03
> wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
> checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
> shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
> max_connections = 100 # pg_generate_conf wizard 2010-02-03

Some of these seem like too much.  I'd recommend starting with one or two and see how it runs.  Then increase if you're still slow.

Start with effective_cache_size, shared_buffers and checkpoint_segments.

Wait until very last to play with work_mem and maintenance_work_mem.


-Andy

I would keep that in mind. Thanks Andy

With regards

Amitabh

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Ivan Voras
Date:
On 4 February 2010 10:02, Amitabh Kant <amitabhkant@gmail.com> wrote:
> On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>>
>> On 02/03/10 16:10, Amitabh Kant wrote:
>>>
>>> Hello
>>>
>>> I have a server dedicated for Postgres with the following specs:
>>>
>>> RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
>>> 2.33GHz
>>> OS: FreeBSD 8.0
>>
>> If you really do have "heavy read and write" load on the server, nothing
>> will save you from the bottleneck of having only 4 drives in the system (or
>> more accurately: adding more memory will help reads but nothing helps writes
>> except more drivers or faster (SSD) drives). If you can, add another 2
>> drives in RAID 1 and move+symlink the pg_xlog directory to the new array.
>
> Can't do anything about this server now, but would surely keep in mind
> before upgrading other servers. Would you recommend the same speed
> drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
> or even SATA II)?

Again, it depends on your load. It would probably be best if they are
approximately the same speed; the location of pg_xlog will dictate
your write (UPDATE / INSERT / CREATE) speed.

Writes to your database go like this: the data is first written to the
WAL (this is the pg_xlog directory - the transaction log), then it is
read and written to the "main" database. If the main database is very
busy reading, transfers from WAL to the database will be slower.

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Amitabh Kant
Date:


On Thu, Feb 4, 2010 at 3:10 PM, Ivan Voras <ivoras@freebsd.org> wrote:
On 4 February 2010 10:02, Amitabh Kant <amitabhkant@gmail.com> wrote:
> On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>>
>> On 02/03/10 16:10, Amitabh Kant wrote:
>>>
>>> Hello
>>>
>>> I have a server dedicated for Postgres with the following specs:
>>>
>>> RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
>>> 2.33GHz
>>> OS: FreeBSD 8.0
>>
>> If you really do have "heavy read and write" load on the server, nothing
>> will save you from the bottleneck of having only 4 drives in the system (or
>> more accurately: adding more memory will help reads but nothing helps writes
>> except more drivers or faster (SSD) drives). If you can, add another 2
>> drives in RAID 1 and move+symlink the pg_xlog directory to the new array.
>
> Can't do anything about this server now, but would surely keep in mind
> before upgrading other servers. Would you recommend the same speed
> drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
> or even SATA II)?

Again, it depends on your load. It would probably be best if they are
approximately the same speed; the location of pg_xlog will dictate
your write (UPDATE / INSERT / CREATE) speed.

Writes to your database go like this: the data is first written to the
WAL (this is the pg_xlog directory - the transaction log), then it is
read and written to the "main" database. If the main database is very
busy reading, transfers from WAL to the database will be slower.

Thanks Ivan. I have to go in for upgrade of couple of more servers. I will  be going in for RAID 1 (OS + pg_xlog ) and RAID 10 (Pgsql data), all of them of same speed.

With regards

Amitabh Kant

Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Amitabh Kant
Date:
On Thu, Feb 4, 2010 at 3:29 AM, Greg Smith <greg@2ndquadrant.com> wrote:
Robert Haas wrote:
On Wed, Feb 3, 2010 at 10:10 AM, Amitabh Kant <amitabhkant@gmail.com> wrote: 
work_mem = 160MB # pg_generate_conf wizard 2010-02-03   
Overall these settings look sane, but this one looks like an
exception.  That is an enormous value for that parameter... 

Yeah, I think I need to retune the suggestions for that parameter.  The idea behind the tuning profile used in the "web" and "OLTP" setups is that you're unlikely to have all the available connections doing something involving sorting at the same time with those workloads, and when it does happen you want it to use the fastest approach possible even if that takes more RAM so the client waiting for a response is more likely to get one on time.  That's why the work_mem figure in those situations is set very aggressively:  total_mem / connections, so on a 16GB server that comes out to the 160MB seen here.  I'm going to adjust that so that it's capped a little below (total_mem - shared_buffers) / connections instead.

Thanks Robert & Greg.  From what others have suggested,  I am going in for the following changes:
/boot/loader.conf:

kern.ipc.semmni=512
kern.ipc.semmns=1024
kern.ipc.semmnu=512



/etc/sysctl.conf:

kern.ipc.shm_use_phys=1
kern.ipc.shmmax=4089446400
kern.ipc.shmall=1050000
kern.maxfiles=16384
kern.ipc.semmsl=1024
kern.ipc.semmap=512
vfs.ufs.dirhash_maxmem=4194304
vfs.read_max=32



/usr/local/pgsql/data/postgresql.conf:

maintenance_work_mem            = 960MB        # pg_generate_conf wizard 2010-02-03
checkpoint_completion_target    = 0.9        # pg_generate_conf wizard 2010-02-03
effective_cache_size            = 11GB        # pg_generate_conf wizard 2010-02-03
work_mem                        = 110MB        # pg_generate_conf wizard 2010-02-03 Reduced as per Robert/Greg suggestions
wal_buffers                        = 8MB        # pg_generate_conf wizard 2010-02-03
checkpoint_segments                = 16        # pg_generate_conf wizard 2010-02-03
shared_buffers                    = 3840MB    # pg_generate_conf wizard 2010-02-03
max_connections                    = 100        # pg_generate_conf wizard 2010-02-03


Hope this works out good in my case.

With regards

Amitabh Kant


Re: Optimizing Postgresql server and FreeBSD for heavy read and writes

From
Matthew Wakeling
Date:
On Thu, 4 Feb 2010, Amitabh Kant wrote:
> On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>> If you can, add another 2 drives in RAID 1 and move+symlink the pg_xlog
>> directory to the new array.
>
> Can't do anything about this server now, but would surely keep in mind
> before upgrading other servers. Would you recommend the same speed
> drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
> or even SATA II)?

The performance requirements for the WAL are significantly lower than for
the main database. This is for two reasons - firstly the WAL is
write-only, and has no other activity. The WAL only gets read again in the
event of a crash. Secondly, writes to the WAL are sequential writes, which
is the fastest mode of operation for a disc, whereas the main database
discs will have to handle random access.

The main thing you need to make sure of is that the WAL is on a disc
system that has a battery-backed up cache. That way, it will be able to
handle the high rate of fsyncs that the WAL generates, and the cache will
convert that into a simple sequential write. Otherwise, you will be
limited to one fsync every 5ms (or whatever the access speed of your WAL
discs is).

If you make sure of that, then there is no reason to get expensive fast
discs for the WAL at all (assuming they are expensive enough to not lie
about flushing writes properly).

Matthew

--
So, given 'D' is undeclared too, with a default of zero, C++ is equal to D.
  mnw21, commenting on the "Surely the value of C++ is zero, but C is now 1"
  response to "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should
  really be called 1" response to "C++ -- shouldn't it be called D?"