Thread: shared_buffers/effective_cache_size on 96GB server
Hi everyone,
I have a PostgreSQL 9.1 dedicated server with 16 cores, 96GB RAM, and RAID10 15K SCSI drives, which is running CentOS 6.2 x64. This server is mainly used for inserting/updating large amounts of data via COPY/INSERT/UPDATE commands, and seldom for running SELECT queries.
Here are the relevant configuration parameters I changed:
shared_buffers = 10GB
effective_cache_size = 90GB
work_mem = 32MB
maintenance_work_mem = 512MB
checkpoint_segments = 64
checkpoint_completion_target = 0.8
My biggest concerns are shared_buffers and effective_cache_size. Should I increase shared_buffers and decrease effective_cache_size? I read that values above 10GB for shared_buffers give lower performance than smaller amounts?
free is currently reporting (during the loading of data):
$ free -m
             total       used       free     shared    buffers     cached
Mem:         96730      96418        311          0         71      93120
-/+ buffers/cache:       3227      93502
Swap:        21000         51      20949
So it did a little swapping, but only minor; still, I should probably decrease shared_buffers so there is no swapping at all.
Thanks in advance,
Strahinja
Hm, I just noticed that shared_buffers + effective_cache_size = 100GB > 96GB, which can't be right. effective_cache_size should probably be 80GB.
Strahinja Kustudić | System Engineer | Nordeus
On 10.10.2012 09:12, Strahinja Kustudić wrote:
> So it did a little swapping, but only minor, still I should probably
> decrease shared_buffers so there is no swapping at all.

That's hardly caused by shared buffers. The main point is that effective_cache_size is just a hint to the optimizer about how much cache (shared buffers + page cache) to expect. So it's unlikely PostgreSQL is going to allocate 100GB of RAM or something.

What have you set the main /proc/sys/vm/ parameters to? Mainly these three:

/proc/sys/vm/swappiness
/proc/sys/vm/overcommit_memory
/proc/sys/vm/overcommit_ratio

Tomas
Strahinja Kustudic wrote:
> Hm, I just noticed that shared_buffers + effective_cache_size = 100GB > 96GB,
> which can't be right. effective_cache_size should probably be 80GB.

I think you misunderstood effective_cache_size. It does not influence memory usage, but query planning. It gives the planner an idea of how much memory there is for caching data, including the filesystem cache.

So a good value for effective_cache_size would be total memory minus what the OS and others need minus what private memory the PostgreSQL backends need. The latter can be estimated as work_mem times max_connections.

To avoid swapping, consider setting vm.swappiness to 0 in /etc/sysctl.conf.

10GB of shared_buffers is quite a lot. If you can run realistic performance tests, start with a lower value and increase until you cannot see a notable improvement.

Yours,
Laurenz Albe
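[Editor's note: Laurenz's sizing rule can be sketched as plain arithmetic. The OS allowance and max_connections below are illustrative assumptions, not values from this thread; substitute your own numbers.]

```shell
# Sketch of "total RAM - OS needs - work_mem * max_connections".
total_mb=96730          # total RAM, from the `free -m` output in this thread
os_overhead_mb=2048     # rough guess at what the OS and other processes need (assumption)
work_mem_mb=32          # work_mem from this thread's config
max_connections=100     # PostgreSQL's default; check with SHOW max_connections
backend_mb=$((work_mem_mb * max_connections))
echo "effective_cache_size upper bound: $((total_mb - os_overhead_mb - backend_mb)) MB"
```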
On 10/10/2012 09:12, Strahinja Kustudić wrote:
> Hi everyone,

Hello,

> shared_buffers = 10GB

Generally going over 4GB for shared_buffers doesn't help; some of the overhead of bgwriter and checkpoints is more or less linear in the size of shared_buffers.

> effective_cache_size = 90GB

effective_cache_size should be ~75% of the RAM (if it's a dedicated server).

> work_mem = 32MB

With 96GB of RAM I would raise the default work_mem to something like 128MB.

> maintenance_work_mem = 512MB

Again, with 96GB of RAM you can raise maintenance_work_mem to something like 4GB.

Julien

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
Thanks for very fast replies everyone :)
@Laurenz I know that effective_cache_size is only used by the query planner. What I was saying is that if I tell it that it can have 90GB of cached items, that is not true, since the OS and the Postgres processes themselves can take more than 6GB, which would mean 90GB is not the correct value. But if effective_cache_size should be shared_buffers + page cache, as Tomas said, then 90GB won't be a problem.
@Tomas here are the values:
# cat /proc/sys/vm/swappiness
60
# cat /proc/sys/vm/overcommit_memory
0
# cat /proc/sys/vm/overcommit_ratio
50
I will turn off swapping with swappiness; I was meaning to do that. I don't know much about the overcommit settings, though, so I will read up on what they do.
@Julien thanks for the suggestions, I will tweak them like you suggested.
Strahinja Kustudić | System Engineer | Nordeus
On 10/10/2012 10:30, Strahinja Kustudić wrote:
Also, with 15K SCSI you can reduce random_page_cost to 3.5 (instead of 4.0).
I also recommend raising cpu_tuple_cost to 0.05 (instead of 0.01), setting vm.swappiness to 0 and vm.overcommit_memory to 2, and finally raising the read-ahead (to something like 8192).
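[Editor's note: applied literally, the OS-level suggestions above would look something like the following config-and-commands sketch. The /dev/sda device name is a placeholder, everything here needs root, and random_page_cost/cpu_tuple_cost go in postgresql.conf instead.]

```shell
# Sketch only -- /dev/sda is a placeholder for your data volume.
cat >> /etc/sysctl.conf <<'EOF'
vm.swappiness = 0
vm.overcommit_memory = 2
EOF
sysctl -p                        # reload kernel settings
blockdev --setra 8192 /dev/sda   # read-ahead of 8192 512-byte sectors
```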
Thanks for your help everyone.
I set:
shared_buffers = 4GB
effective_cache_size = 72GB
work_mem = 128MB
maintenance_work_mem = 4GB
checkpoint_segments = 64
checkpoint_completion_target = 0.9
random_page_cost = 3.5
cpu_tuple_cost = 0.05
Where can I get values for random_page_cost and cpu_tuple_cost, since they depend on hardware? I know that for SSDs random_page_cost should be 1.0, but I have no idea what the value should be for different types of drives.
I also set:
vm.swappiness = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
But I don't understand why I need to set overcommit_memory, since I only have Postgres running; nothing else would allocate memory anyway?
I will set the read-ahead later; first I want to see how this works.
Strahinja Kustudić | System Engineer | Nordeus
On 10/10/2012 02:12 AM, Strahinja Kustudić wrote:
>              total       used       free     shared    buffers     cached
> Mem:         96730      96418        311          0         71      93120

Wow, look at all that RAM. Something nobody has mentioned yet: you'll want to set some additional kernel parameters for this, to avoid getting occasional IO storms caused by dirty memory flushes.

vm.dirty_background_ratio = 1
vm.dirty_ratio = 5

Again, these would go in sysctl.conf, or /etc/sysctl.d/10-dbserver.conf or something. If you have a newer kernel, look into vm.dirty_background_bytes and vm.dirty_bytes.

The why of this is brought up occasionally here, but it comes down to your vast amount of memory. The defaults for even late Linux kernels are 5% for dirty_background_ratio and 10% for dirty_ratio. So if you multiply it out, the kernel will allow about 4.8GB of dirty memory before attempting to flush it to disk. If that number reaches 9.6GB, the system goes synchronous, and no other disk writes can take place until *all 9.6GB* is flushed. Even with a fast disk subsystem, that's a pretty big gulp.

The idea here is to keep it writing in the background by setting a low limit, so it never reaches a critical mass that causes it to snowball into the more dangerous upper limit. If you have a newer kernel, the ability to set "bytes" is a much more granular knob that can be used to match RAID buffer sizes. You'll probably want to experiment with this a bit before committing to a setting.

> So it did a little swapping, but only minor, still I should probably
> decrease shared_buffers so there is no swapping at all.

Don't worry about that amount of swapping. As others have said here, you can reduce swappiness to 0, and even then the OS will still swap something occasionally. It's really just a hint to the kernel about how much swapping you want to go on, and it's free to ignore it in cases where it knows some data won't be accessed after initialization or something, so it swaps it out anyway.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
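[Editor's note: the 4.8GB/9.6GB figures above are just the default ratios applied to this machine's RAM, which can be checked with a line of arithmetic.]

```shell
# Default dirty-memory thresholds, computed from the `free -m` total above.
ram_mb=96730
bg_mb=$((ram_mb * 5 / 100))     # dirty_background_ratio default (5%): ~4836 MB
hard_mb=$((ram_mb * 10 / 100))  # dirty_ratio default (10%): ~9673 MB
echo "background flush starts at ${bg_mb} MB, writes go synchronous at ${hard_mb} MB"
```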
Shaun,
running these commands:
# sysctl vm.dirty_ratio
vm.dirty_ratio = 40
# sysctl vm.dirty_background_ratio
vm.dirty_background_ratio = 10
shows that these values are even higher by default. When you said RAID buffer size, did you mean the controller's cache memory size?
Regards,
Strahinja
On 10/10/2012 09:35 AM, Strahinja Kustudić wrote:
> # sysctl vm.dirty_ratio
> vm.dirty_ratio = 40
> # sysctl vm.dirty_background_ratio
> vm.dirty_background_ratio = 10

Ouuuuch. That looks a lot like an old RHEL or CentOS system. Change those ASAP. Currently your system won't start writing dirty buffers until it hits 9.6GB. :(

> shows that these values are even higher by default. When you said
> RAID buffer size, did you mean the controller's cache memory size?

Yeah, that. :)

--
Shaun Thomas
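[Editor's note: the fix Shaun is pointing at is the pair of settings from his earlier message, which as a config fragment (filename taken from his suggestion) would be:]

```ini
# /etc/sysctl.d/10-dbserver.conf
vm.dirty_background_ratio = 1
vm.dirty_ratio = 5
```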
I will change those, but I don't think this is that big of an issue if most of the IO is done by Postgres, since Postgres has its own mechanism to tell the OS to sync the data to disk. For example, when it's writing a WAL file, or when it's writing a checkpoint, those do not get cached.
Regards,
Strahinja
On 10/10/2012 09:49 AM, Strahinja Kustudić wrote:
> I will change those, but I don't think this is that big of an issue if
> most of the IO is done by Postgres, since Postgres has its own
> mechanism to tell the OS to sync the data to disk. For example when it's
> writing a WAL file, or when it's writing a checkpoint, those do not get
> cached.

You'd be surprised. Greg Smith did a bunch of work a couple of years back that supported these changes. Most DBAs with heavily utilized systems could even see this in action by turning on checkpoint logging; there's an occasional period where the sync time lags into the minutes due to a synchronous IO switch.

--
Shaun Thomas
On Wed, Oct 10, 2012 at 09:12:20AM +0200, Strahinja Kustudić wrote:
> So it did a little swapping, but only minor, still I should probably
> decrease shared_buffers so there is no swapping at all.

You might want to read my blog entry about swap space:

http://momjian.us/main/blogs/pgblog/2012.html#July_25_2012

It is probably swapping unused memory _out_ to make more use of RAM for cache.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On Wed, Oct 10, 2012 at 10:11:30AM +0200, Julien Cigar wrote:
> effective_cache_size should be ~75% of the RAM (if it's a dedicated server)

Why guess? Use 'free' to tell you the kernel cache size:

http://momjian.us/main/blogs/pgblog/2012.html#May_4_2012

--
Bruce Momjian
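[Editor's note: Bruce's "don't guess" approach boils down to reading the cached column from `free -m`. The sketch below uses the exact output posted earlier in this thread, where the older procps layout puts "cached" in the 7th field of the Mem: line.]

```shell
# Pull the kernel cache size (MB) out of `free -m` output, old procps layout.
# On a live system you would pipe the real thing:  free -m | awk '/^Mem:/ {print $7}'
echo 'Mem:         96730      96418        311          0         71      93120' \
  | awk '/^Mem:/ {print $7}'
# prints 93120 -- the cached MB, a measured starting point for effective_cache_size
```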
On Wed, Oct 10, 2012 at 1:10 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> effective_cache_size should be ~75% of the RAM (if it's a dedicated server)
>
> Why guess? Use 'free' to tell you the kernel cache size:
>
> http://momjian.us/main/blogs/pgblog/2012.html#May_4_2012

Why does nobody ever mention that concurrent access has to be taken into account?

Ie: if I expect concurrent access to 10 really big indices, I'll set effective_cache_size = free RAM / 10.
On Wed, Oct 10, 2012 at 02:05:20PM -0300, Claudio Freire wrote:
> Why does nobody ever mention that concurrent access has to be taken
> into account?
>
> Ie: if I expect concurrent access to 10 really big indices, I'll set
> effective_cache_size = free RAM / 10

It is true that the estimate assumes a single session is using all the cache, but I think that is based on the assumption that there is a major overlap between the cache needs of multiple sessions.

--
Bruce Momjian
On 10/10/2012 12:05 PM, Claudio Freire wrote:
> Why does nobody ever mention that concurrent access has to be taken
> into account?

That's actually a good point. But if you have one giant database, the
overlap of which tables are being accessed by various sessions is going
to be immense.

There probably should be a point about this in the docs, though. There
are more and more shared-hosting setups, or places that spread their
data horizontally across separate databases for various clients, and in
those cases parallel usage does not imply overlap.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On Wed, Oct 10, 2012 at 3:18 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
>> Why does nobody ever mention that concurrent access has to be taken
>> into account?
>
> That's actually a good point. But if you have one giant database, the
> overlap of which tables are being accessed by various sessions is
> going to be immense.

That's why I said "several huge indices": if regularly accessed indices
are separate, and big, it means they don't overlap, nor do they fit in
any cache.
On Wed, Oct 10, 2012 at 12:12 AM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
> Hi everyone,
>
> I have a PostgreSQL 9.1 dedicated server with 16 cores, 96GB RAM and
> RAID10 15K SCSI drives, running CentOS 6.2 x64.

How many drives in the RAID?

> This server is mainly used for inserting/updating large amounts of
> data via copy/insert/update commands, and seldom for running select
> queries.

Are there a lot of indexes?

> Here are the relevant configuration parameters I changed:
>
> shared_buffers = 10GB
> effective_cache_size = 90GB
> work_mem = 32MB
> maintenance_work_mem = 512MB
> checkpoint_segments = 64
> checkpoint_completion_target = 0.8
>
> My biggest concerns are shared_buffers and effective_cache_size;
> should I increase shared_buffers and decrease effective_cache_size?

Are you experiencing performance problems?  If so, what are they?

> I read that values above 10GB for shared_buffers give lower
> performance than smaller amounts?

There are reports that large shared_buffers can lead to latency
spikes.  I don't know how sensitive your workload is to latency,
though, nor how much those reports apply to 9.1.

> free is currently reporting (during the loading of data):
>
> $ free -m
>              total       used       free     shared    buffers     cached
> Mem:         96730      96418        311          0         71      93120
> -/+ buffers/cache:       3227      93502
> Swap:        21000         51      20949
>
> So it did a little swapping, but only minor,

The kernel has, over the entire time the server has been up, found
51MB of process memory to swap.  That doesn't really mean anything.
Do you see active swapping going on, like with vmstat?

Cheers,
Jeff
@Bruce Thanks for your articles; after reading them all, I don't think
disabling swap is a good idea now. You also said that to determine
effective_cache_size I should check with free. My question is: should I
use the value free shows as cached, or a somewhat lower one, since not
everything in the cache is there because of Postgres?
@Claudio So you are basically saying that if I have
effective_cache_size set to 10GB and I have 10 concurrent processes,
each using a different 2GB index, it would be better to set
effective_cache_size to 1GB? Since if I leave it at 10GB, the query
planner of each running process will think the whole index is in cache,
and that won't be true? Did I get that right?
@Jeff I have 4 drives in RAID10. The database has around 80GB of
indices. I'm not experiencing any slowdowns; I would just like to
increase the performance of updates/inserts, since the server needs to
insert a lot of data, and to make the select queries faster, since they
are done on a lot of big tables. I do experience a lot of performance
problems when autovacuum kicks in for a few big tables, since it slows
things down a lot. I didn't notice any swapping, and I know those 51MB
which were swapped were just staying there, so swap isn't an issue at
all.
Strahinja Kustudić | System Engineer | Nordeus
On Wed, Oct 10, 2012 at 5:12 PM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
> @Claudio So you are basically saying that if I have set
> effective_cache_size to 10GB and I have 10 concurrent processes which
> are using 10 different indices which are for example 2GB, it would be
> better to set the effective_cache_size to 1GB? Since if I leave it at
> 10GB each running process' query planner will think the whole index
> is in cache and that won't be true? Did I get that right?

Yep. You might get away with setting 2GB if you're willing to bet there
won't be 100% concurrency, but the safest setting would be 1GB.
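The arithmetic behind Claudio's "safest setting", spelled out with the numbers from this exchange (the variable names are mine): when N concurrent queries touch non-overlapping working sets, each one can only count on 1/N of the kernel cache.

```python
# Claudio's rule of thumb: divide the kernel cache evenly among concurrent
# queries whose working sets do not overlap.
kernel_cache_gb = 10      # what the planner would otherwise be told is free
concurrent_queries = 10   # ten processes, each scanning its own 2GB index
per_query_gb = kernel_cache_gb / concurrent_queries
print(per_query_gb)       # 1.0 -- the "safest setting" of 1GB
```

As Bruce notes in the next message, this is pessimistic when sessions share hot tables; the division only applies to the non-overlapping part of the workload.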
On Wed, Oct 10, 2012 at 10:12:51PM +0200, Strahinja Kustudić wrote:
> @Bruce Thanks for your articles; after reading them all I don't think
> disabling swap is a good idea now. Also, you said that to see the
> effective_cache_size I should check it with free. My question is:
> should I use the value that free is showing as cached, or a little
> lower one, since not everything in the cache is because of Postgres?

Well, you are right that some of that might not be Postgres, so yeah,
you can lower it somewhat.

> @Claudio So you are basically saying that if I have set
> effective_cache_size to 10GB and I have 10 concurrent processes which
> are using 10 different indices which are for example 2GB, it would be
> better to set the effective_cache_size to 1GB? Since if I leave it at
> 10GB each running process' query planner will think the whole index
> is in cache and that won't be true? Did I get that right?

Well, the real question is whether, while traversing the index, some of
the pages are going to be removed from the cache by other processes'
cache usage.  effective_cache_size is not figuring the cache will
remain between queries.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Hi,

On 10 October 2012 19:11, Julien Cigar <jcigar@ulb.ac.be> wrote:
>> shared_buffers = 10GB
>
> Generally going over 4GB for shared_buffers doesn't help; some of the
> overhead of bgwriter and checkpoints is more or less linear in the
> size of shared_buffers.

Nothing is black or white; it's all shades of grey :) It depends on the
workload. In my case external consultants recommended 8GB, and I was
able to increase it up to 10GB. This was a mostly read-only workload.
From my experience, a large buffer cache acts as a handbrake for
write-heavy workloads.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)
On Wed, Oct 10, 2012 at 7:06 PM, Ondrej Ivanič <ondrej.ivanic@gmail.com> wrote:
>> Generally going over 4GB for shared_buffers doesn't help; some of
>> the overhead of bgwriter and checkpoints is more or less linear in
>> the size of shared_buffers.
>
> Nothing is black or white; it's all shades of grey :) It depends on
> the workload. In my case external consultants recommended 8GB, and I
> was able to increase it up to 10GB. This was a mostly read-only
> workload. From my experience, a large buffer cache acts as a handbrake
> for write-heavy workloads.

Which makes me ask... why can't checkpoint_timeout be set above 1h?
Mostly for the checkpoint target thing. I know, you'd need an unholy
amount of WAL and recovery time, but modern systems I think can handle
that (especially if you don't care much about recovery time).

I usually set checkpoint_timeout to approach the time between periodic
mass updates, and it works rather nicely. Except when those updates are
spaced more than 1h apart, my hands are tied.
On Wed, Oct 10, 2012 at 2:03 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Oct 10, 2012 at 10:12:51PM +0200, Strahinja Kustudić wrote:
>> @Claudio So you are basically saying that if I have set
>> effective_cache_size to 10GB and I have 10 concurrent processes
>> which are using 10 different indices which are for example 2GB, it
>> would be better to set the effective_cache_size to 1GB? Since if I
>> leave it at 10GB each running process' query planner will think the
>> whole index is in cache and that won't be true? Did I get that right?
>
> Well, the real question is whether, while traversing the index, some
> of the pages are going to be removed from the cache by other
> processes' cache usage.  effective_cache_size is not figuring the
> cache will remain between queries.

Does anyone see effective_cache_size make a difference anyway?  If so,
in what circumstances?

In my hands, queries for which effective_cache_size might come into
play (for deciding between seq scan and index scan) are instead planned
as bitmap scans.

Cheers,
Jeff
On Wed, Oct 10, 2012 at 7:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Well, the real question is whether, while traversing the index, some
>> of the pages are going to be removed from the cache by other
>> processes' cache usage.  effective_cache_size is not figuring the
>> cache will remain between queries.
>
> Does anyone see effective_cache_size make a difference anyway?  If
> so, in what circumstances?

In my case, if I set it too high, I get impossibly suboptimal plans
when an index scan over millions of rows hits the disk way too often
and way too randomly. The difference is minutes for a seqscan vs. hours
for the index scan. In fact, I prefer setting it too low than too high.
On Wed, Oct 10, 2012 at 4:37 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> In my case, if I set it too high, I get impossibly suboptimal plans
> when an index scan over millions of rows hits the disk way too often
> and way too randomly. The difference is minutes for a seqscan vs.
> hours for the index scan. In fact, I prefer setting it too low than
> too high.

There's a corollary for very fast disk subsystems. If you've got, say,
40 15krpm disks in a RAID-10, you can get sequential read speeds into
the gigabytes per second, so that sequential page access costs MUCH
less than random page access, to the point that if seq page access is
rated 1, random page access should be much higher, sometimes on the
order of 100 or so. I.e. sequential accesses are almost always
preferred, especially if you're getting more than a tiny portion of the
table at one time.

As for the arguments for / against having a swap space: no one has
mentioned the one I've run into on many older kernels, and that is
BUGS. I have had to turn off swap on very large memory machines with
2.6.xx series kernels in the past. These machines all had properly set
vm.* settings for dirty buffers and percent etc. It didn't matter:
after 2 to 4 weeks of hard-working uptime, I'd get an alert on the db
server for high load, log in, and see kswapd working its butt off
doing... NOTHING. Load would be in the 50 to 150 range. iostat showed
NOTHING in terms of si/so/bi/bo and so on.

kswapd wasn't in a D (iowait) state, but rather R, pegging a CPU core
at 100% while running, and apparently blocking a lot of other processes
that wanted to access memory, all of which were S(leeping) or
R(unning). Two seconds after a sudo swapoff -a completed, my machine
went back to a load of 2 to 5, as was normal for it.

Honestly, if you're running out of memory on a machine with 256G and
needing swap, you've got other very real memory usage issues you've
been ignoring to get to that point. Are all those bugs fixed in the
3.0.latest kernels? Not sure, but I haven't had this issue on any
big-memory servers lately, and they've all had swap turned on.
Jeff,

> Does anyone see effective_cache_size make a difference anyway?  If
> so, in what circumstances?

E_C_S, together with random_page_cost, the table and index sizes, the
row estimates, and the cpu_* costs, forms an equation which estimates
the cost of doing various kinds of scans, particularly index scan vs.
table scan. If you have an extremely small database (< shared_buffers)
or a very large database (> 50X RAM), the setting of E_C_S probably
doesn't matter, but in the fairly common case where some tables and
indexes fit in RAM and some don't, it matters.

> In my hands, queries for which effective_cache_size might come into
> play (for deciding between seq scan and index scan) are instead
> planned as bitmap scans.

You have a very unusual workload, or a very small database.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, Oct 11, 2012 at 11:17 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> Does anyone see effective_cache_size make a difference anyway?  If
>> so, in what circumstances?
>
> E_C_S, together with random_page_cost, the table and index sizes, the
> row estimates, and the cpu_* costs, forms an equation which estimates
> the cost of doing various kinds of scans, particularly index scan vs.
> table scan.

E_C_S only comes into play when the same table pages are (predicted to
be) visited repeatedly during the index scan, but this is the same
situation in which a bitmap scan is generally preferred anyway. In fact
the two seem to be conceptually very similar (either avoid actually
visiting the block repeatedly, or avoid the IO cost of visiting the
block repeatedly), and I'm not sure why the bitmap scan comes out on
top--there doesn't seem to be a CPU cost estimate for visiting a block
which is assumed to already be in memory, nor is the bitmap scan given
credit for the use of effective_io_concurrency.

But I found a simple case (over in "Unused index influencing sequential
scan plan") which is very sensitive to E_C_S. When the index scan is
being done to avoid a costly sort or aggregation, it can't be usefully
replaced with a bitmap scan, since that won't produce index-order
sorted output.

>> In my hands, queries for which effective_cache_size might come into
>> play (for deciding between seq scan and index scan) are instead
>> planned as bitmap scans.
>
> You have a very unusual workload, or a very small database.

I think all real workloads are unusual, otherwise benchmarking would be
easy... but since complex queries are intractable for figuring out what
the planner is thinking, I'm biased toward using simple ones when
trying to figure out general principles. I can make the database look
as big or small as I want (relative to RAM) by feeding
effective_cache_size false information.

Anyway, it seems like the consequences of overestimating E_C_S (by
underestimating the number of processes that might expect to benefit
from it concurrently) are worse than the consequences of
underestimating it--assuming you have the types of queries for which it
makes much of a difference.

Cheers,
Jeff
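For readers following the "equation" Josh and Jeff are debating: the repeated-visit estimate lives in the planner's index_pages_fetched(), which uses the Mackert-Lohman approximation. The sketch below is my paraphrase of the formula described in costsize.c's comments, not the actual PostgreSQL source; b is the assumed cache size in pages, which the planner derives from effective_cache_size.

```python
# Sketch of the Mackert-Lohman approximation: given N tuple fetches against a
# table of T pages with a cache of b pages, estimate distinct page reads.
def index_pages_fetched(N, T, b):
    N, T, b = float(N), float(T), float(b)
    if T <= b:
        # Table fits in cache: repeat visits are free, so reads cap at T.
        return min(2.0 * T * N / (2.0 * T + N), T)
    # Cache smaller than table: up to `lim` fetches behave as if cached...
    lim = 2.0 * T * b / (2.0 * T - b)
    if N <= lim:
        return 2.0 * T * N / (2.0 * T + N)
    # ...beyond that, each fetch misses with probability (T - b) / T.
    return b + (N - lim) * (T - b) / T

# A larger effective_cache_size (bigger b) means fewer estimated page reads
# for the same scan, which is exactly how E_C_S tilts plans toward index use.
print(index_pages_fetched(100000, 50000, 10000))   # small cache: many reads
print(index_pages_fetched(100000, 50000, 100000))  # table cached: far fewer
```

This also makes Jeff's point concrete: when N is small relative to T (few repeated visits), b barely changes the result, so E_C_S only matters for scans that revisit pages.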
On Wed, Oct 10, 2012 at 1:12 PM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
> @Claudio So you are basically saying that if I have set
> effective_cache_size to 10GB and I have 10 concurrent processes which
> are using 10 different indices which are for example 2GB,

It is the size of the table, not the index, which is primarily of
concern.  However, that mostly factors into how postgres uses
effective_cache_size, not how you set it.

> it would be better to set the effective_cache_size to 1GB?

If 10GB were the correct setting for a system with only one process
trying to run that type of query at a time, then 1GB would be the
correct setting for 10 such queries running concurrently.  But I think
there is little reason to believe that 10GB actually would be the
correct setting for the first case, so there is little reason to think
1GB is the correct setting in the second case.

Since you have 96GB of RAM, I would think that 10GB is an appropriate
setting *already taking concurrency into account*, and would be too low
if you were not expecting any concurrency.  In any case, the setting of
effective_cache_size shouldn't affect simple inserts or copies at all,
since those operations don't use large index range scans.

> Since if I leave it at 10GB each running process' query planner will
> think the whole index is in cache and that won't be true? Did I get
> that right?

It isn't mostly about how much of the index is in cache, but rather how
much of the table is in cache.

> @Jeff I have 4 drives in RAID10. The database has around 80GB of
> indices.

That seems like a pretty small disk set for a server of this size.  Do
you know what percentage of that 80GB of indices gets dirtied during
any given round of batch loading/updating?  I think that could easily
be your bottleneck: how fast you can write out dirtied index pages,
which are likely being written randomly rather than sequentially.

> I'm not experiencing any slowdowns, I would just like to increase the
> performance of updates/inserts, since the server needs to insert a
> lot of data, and to make the select queries faster since they are
> done on a lot of big tables.

I think these two things are in tension.  The faster the inserts and
updates run, the more resources they will take away from the selects
during those periods.  If you are doing batch copies, then as long as
one batch has finished before the next one needs to start, isn't that
fast enough?  Maybe the goal should be to throttle the inserts so that
the selects see a steadier competition for IO.

> I am experiencing a lot of performance problems when autovacuum kicks
> in for a few big tables, since it slows things down a lot.

You can tune the autovacuum to make it slower.  But it sounds like
maybe you should have put more money into spindles and less into CPU
cores.  (I think that is a very common situation to be in.)

Cheers,
Jeff
On Wed, Oct 10, 2012 at 10:36 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Oct 10, 2012 at 4:37 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> In my case, if I set it too high, I get impossibly suboptimal plans
>> when an index scan over millions of rows hits the disk way too often
>> and way too randomly. The difference is minutes for a seqscan vs.
>> hours for the index scan. In fact, I prefer setting it too low than
>> too high.
>
> There's a corollary for very fast disk subsystems. If you've got, say,
> 40 15krpm disks in a RAID-10, you can get sequential read speeds into
> the gigabytes per second, so that sequential page access costs MUCH
> less than random page access, to the point that if seq page access is
> rated 1, random page access should be much higher, sometimes on the
> order of 100 or so.

On the other hand, if you have 40 very busy connections, then if they
are all doing sequential scans on different tables, they will interfere
with each other and will have to divide up the RAID throughput, while
if they are doing random fetches they will get along nicely on that
RAID.  So you have to know how much concurrency of the relevant type
you expect to see.

The default page cost settings already assume that random fetches are
far more likely to be cache hits than sequential fetches are.  If that
is not true, then the default random page cost is way too low,
regardless of the number of spindles or the concurrency.

Cheers,
Jeff
On Thu, Oct 18, 2012 at 4:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> @Claudio So you are basically saying that if I have set
>> effective_cache_size to 10GB and I have 10 concurrent processes
>> which are using 10 different indices which are for example 2GB,
>
> It is the size of the table, not the index, which is primarily of
> concern.  However, that mostly factors into how postgres uses
> effective_cache_size, not how you set it.

You're right, I just noticed that a few minutes ago (talk about
telepathy).
On Thu, Oct 18, 2012 at 1:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Oct 10, 2012 at 10:36 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Wed, Oct 10, 2012 at 4:37 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> In my case, if I set it too high, I get impossibly suboptimal plans
>>> when an index scan over millions of rows hits the disk way too often
>>> and way too randomly. The difference is minutes for a seqscan vs.
>>> hours for the index scan. In fact, I prefer setting it too low than
>>> too high.
>>
>> There's a corollary for very fast disk subsystems. If you've got,
>> say, 40 15krpm disks in a RAID-10, you can get sequential read
>> speeds into the gigabytes per second, so that sequential page access
>> costs MUCH less than random page access, to the point that if seq
>> page access is rated 1, random page access should be much higher,
>> sometimes on the order of 100 or so.
>
> On the other hand, if you have 40 very busy connections, then if they
> are all doing sequential scans on different tables, they will
> interfere with each other and will have to divide up the RAID
> throughput, while if they are doing random fetches they will get along
> nicely on that RAID.  So you have to know how much concurrency of the
> relevant type you expect to see.

My experience is that both read-ahead and caching will make it more
resilient than that, to the point that it can take several times the
number of read clients as the number of spindles before things get as
slow as random access. While it's a performance knee to be aware of,
often by the time you have enough clients for it to matter, you've
already maxed out either memory bandwidth or all the CPU cores. But
again, this is with dozens to hundreds of disk drives, not four or
eight etc. And it's very access-pattern dependent.

On the last machine I had where we cranked up random versus sequential
costs, it was a reporting server with 5 or 6TB of data that regularly
got trundled through for aggregation. Daily slices of data, in
partitions, were 1GB to 10GB. For this machine the access patterns were
almost never random. Just a handful of queries running in a random
access pattern could interfere with a 5-minute-long sequential
reporting query and make it suddenly take hours. It had 16x 7200rpm 2TB
drives and would regularly handle three or four queries at a time, all
running plenty fast. But crank up one thread of pgbench and it would
slow to a crawl.