Vacuums on large busy databases

From
Francisco Reyes
Date:
My setup:
Freebsd 6.1
Postgresql 8.1.4
Memory: 8GB
SATA Disks

Raid 1 10 spindles (2 as hot spares)
500GB disks (16MB buffer), 7200 rpm
Raid 10

Raid 2 4 spindles
150GB 10K rpm disks
Raid 10

shared_buffers = 10000
temp_buffers = 1500
work_mem = 32768                # 32MB
maintenance_work_mem = 524288   # 512MB

checkpoint_segments = 64
Just increased to 64 today.. after reading this may help. Was 5 before.

pg_xlog on second raid (which sees very little activity)

Database sizes: one 200GB+ DB and two 100GB+ DBs

I run 3 daily "vacuumdb -azv". The vacuums were taking 2 to 3 hours.
Recently we have started to do some data mass loading and now the vacuums
are taking close to 5 hours AND it seems they may be slowing down the loads.

These are not bulk loads in the sense that we don't have a big file that we
can do a copy.. instead it is data which several programs are processing
from some temporary tables so we have lots of inserts. There are also
updates to keep track of some totals.

I am looking to either improve the vacuum time or decrease its impact on
the loads.
Are the variables:
#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 0-10000 credits

Is that the way to go to decrease impact?
Or should I try increasing maintenance_work_mem to 1GB?
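
If cost-based delay is the way to go, I assume the uncommented settings would
look something like this (the values below are only a guess on my part as a
starting point, not something I have tested):

vacuum_cost_delay = 10          # sleep 10 ms each time vacuum_cost_limit worth of work is done
vacuum_cost_limit = 200         # keep the default limit for now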

A sum of all running processes from "ps auxw" shows about 3.5GB in "VSZ" and
1.5GB in "RSS".

I am also going to check if I have enough space to move the stage DB to the
second raid which shows very little activity in iostat.

Any other suggestions?

Re: Vacuums on large busy databases

From
Dave Cramer
Date:
On 14-Sep-06, at 11:23 AM, Francisco Reyes wrote:

> My setup:
> Freebsd 6.1
> Postgresql 8.1.4
> Memory: 8GB
> SATA Disks
> Raid 1 10 spindles (2 as hot spares)
> 500GB disks (16MB buffer), 7200 rpm
> Raid 10
>
> Raid 2 4 spindles
> 150GB 10K rpm disks
> Raid 10
>
> shared_buffers = 10000
shared buffers  should be considerably more, depending on what else
is running
> temp_buffers = 1500
> work_mem = 32768                # 32MB
> maintenance_work_mem = 524288   # 512MB
>
> checkpoint_segments = 64
> Just increased to 64 today.. after reading this may help. Was 5
> before.

What is effective_cache set to ?
>
> pg_xlog on second raid (which sees very little activity)
>
> Database sizes: 1 200GB+ Db and 2 100GB+
>
> I run 3 daily "vacuumdb -azv". The vacuums were taking 2 to 3 hours.
why not just let autovac do its thing?

> Recently we have started to do some data mass loading and now the
> vacuums are taking close to 5 hours AND it seems they may be
> slowing down the loads.
>
> These are not bulk loads in the sense that we don't have a big file
> that we can do a copy.. instead it is data which several programs
> are processing from some temporary tables so we have lots of
> inserts. There are also updates to keep track of some totals.
>
> I am looking to either improve the time of the vacuum or decrease
> its impact on the loads.
> Are the variables:
> #vacuum_cost_delay = 0                  # 0-1000 milliseconds
> #vacuum_cost_page_hit = 1               # 0-10000 credits
> #vacuum_cost_page_miss = 10             # 0-10000 credits
> #vacuum_cost_page_dirty = 20            # 0-10000 credits
> #vacuum_cost_limit = 200                # 0-10000 credits
>
> Is that the way to go to decrease impact?
> Or should I try increasing maintenance_work_mem to 1GB?
>
> A sum of all running processes from "ps auxw" shows about 3.5GB in
> "VSZ" and 1.5GB in "RSS".
>
> I am also going to check if I have enough space to move the stage
> DB to the second raid which shows very little activity in iostat.
>
> Any other suggestions?
>


Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Dave Cramer writes:

> What is effective_cache set to ?

Default of 1000. Was just reading about this parameter.
Will try increasing it to 8192 (8192 * 8K = 64MB)

> why not just let autovac do its thing?

Have been playing with decreasing the autovac values. With 100GB+ tables, even
1% in autovacuum_vacuum_scale_factor is going to be 1GB.

Right now trying:
autovacuum_vacuum_threshold = 50000
autovacuum_analyze_threshold = 100000
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.1

Initially I had tried autovacuum_vacuum_scale_factor = 0.2 and that was not
enough. Had to end up bumping fsm_pages several times. After I started to do
the 3 daily vacuums they are holding steady.. perhaps I will try only 1
vacuum now that I decreased the scale factor to 0.05

I would be curious what others have in their autovacuum parameters for
100GB+ databases

Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Dave Cramer writes:

> What is effective_cache set to ?

Increasing this seems to have helped a web app significantly. Load times
seem orders of magnitude faster.

Increased it to effective_cache_size = 12288 # 96MB

What is a reasonable number?
I estimate I have at least 1 to 2 GB free of memory.

Don't want to get too carried away right now with too many changes.. because
right now we have very few connections to that database (usually less than
10), but I expect it to go to a norm of 20+.. so need to make sure I won't
make changes that will be a problem in that scenario.

So far I only see one setting that could be an issue: work_mem,
so I have it set to only 32768.

Re: Vacuums on large busy databases

From
Dave Cramer
Date:
Francisco
On 14-Sep-06, at 1:36 PM, Francisco Reyes wrote:

> Dave Cramer writes:
>
>> What is effective_cache set to ?
>
> Increasing this seems to have helped significantly a web app. Load
> times seem magnitudes faster.
>
> Increased it to effective_cache_size = 12288 # 96MB
>
> What is a reasonable number?
> I estimate I have at least 1 to 2 GB free of memory.
You are using 6G of memory for something else?

effective cache should be set to 75% of free memory
>
> Don't want to get too carried away right now with too many
> changes.. because right now we have very few connections to that
> database (usually less than 10), but I expect it to go to a norm of
> 20+.. so need to make sure I won't make changes that will be a
> problem in that scenario.
>
> So far only see one setting that can be an issue: work_mem so  have
> it set to only 32768.
>


Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Dave Cramer writes:

>> What is a reasonable number?
>> I estimate I have at least 1 to 2 GB free of memory.
> You are using 6G of memory for something else ?

Right now, adding up from ps, I have about 2GB of memory in use.
Have an occasional program which uses up to 2GB.

Then I want to give some breathing room for when we have more connections so
that work_mem doesn't make the machine hit swap.
At 32MB, say worst case scenario, I may have 50 operations using those 32MB;
that's about 1.5GB.

2+2+1.5 = 5.5
So I believe I have about 2.5GB free.

> effective cache should be set to 75% of free memory

So I will increase to 1.5GB then.

I may have more memory, but this is likely a safe value.
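
In postgresql.conf that should work out to roughly (assuming 8K pages and
that my math is right):

effective_cache_size = 196608   # 196608 * 8KB = 1.5GB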

Thanks for the feedback.. Didn't even know about this setting.

Re: Vacuums on large busy databases

From
Michael Stone
Date:
On Thu, Sep 14, 2006 at 04:30:46PM -0400, Francisco Reyes wrote:
>Right now adding up from ps the memory I have about 2GB.

That's not how you find out how much memory you have. Try "free" or
somesuch.

Mike Stone


Re: Vacuums on large busy databases

From
Jeff Davis
Date:
On Thu, 2006-09-14 at 11:23 -0400, Francisco Reyes wrote:
> My setup:
> Freebsd 6.1
> Postgresql 8.1.4
> Memory: 8GB
> SATA Disks
>
> Raid 1 10 spindles (2 as hot spares)
> 500GB disks (16MB buffer), 7200 rpm
> Raid 10
>
> Raid 2 4 spindles
> 150GB 10K rpm disks
> Raid 10
>
> shared_buffers = 10000

Why so low? You have a lot of memory, and shared_buffers are an
important performance setting. I have a machine with 4GB of RAM, and I
found my best performance was around 150000 shared buffers, which is a
little more than 1GB.

The default value of 1000 was chosen so that people who use PostgreSQL
only incidentally among many other programs do not notice an impact on
their system. It should be drastically increased when using PostgreSQL
on a dedicated system, particularly with versions 8.1 and later.

Also, a VACUUM helps a table that gets UPDATEs and DELETEs. If you're
doing mostly inserts on a big table, there may be no need to VACUUM it 3
times per day. Try VACUUMing the tables that get more UPDATEs and
DELETEs more often, and if a table has few UPDATEs/DELETEs, VACUUM it
only occasionally. You can run ANALYZE more frequently on all the
tables, because it does not have to read the entire table and doesn't
interfere with the rest of the operations.
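
For example (database and table names here are made up; the per-table
counters also need row-level stats enabled, which autovacuum requires
anyway), you could check which tables actually see UPDATE/DELETE traffic
and vacuum just those:

psql -d yourdb -c "SELECT relname, n_tup_upd, n_tup_del
                     FROM pg_stat_user_tables
                    ORDER BY n_tup_upd + n_tup_del DESC LIMIT 20;"
vacuumdb -v -z -t busy_totals_table yourdb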

Regards,
    Jeff Davis


Re: Vacuums on large busy databases

From
Dave Cramer
Date:
Francisco
On 14-Sep-06, at 4:30 PM, Francisco Reyes wrote:

> Dave Cramer writes:
>
>>> What is a reasonable number?
>>> I estimate I have at least 1 to 2 GB free of memory.
>> You are using 6G of memory for something else ?
>
> Right now adding up from ps the memory I have about 2GB.
> Have an occasional program which uses up to 2GB.
>
> Then I want to give some breathing room for when we have more
> connections so that work_mem doesn't make the machine hit swap.
> At 32MB say worst case scenario I may have 50 operations using
> those 32MB, that's about 1.5GB.
>
> 2+2+1.5 = 5.5
> So I believe I have free about 2.5GB
>> effective cache should be set to 75% of free memory
>
> So I will increase to 1.5GB then.
personally, I'd set this to about 6G. This doesn't actually consume
memory; it is just a setting to tell postgresql how much memory is
being used for cache and kernel buffers
>
> I may have more memory, but this is likely a safe value.
>
> Thanks for the feedback.. Didn't even know about this setting.
regarding shared buffers I'd make this much bigger, like 2GB or more

dave


Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Jeff Davis writes:

>> shared_buffers = 10000
>
> Why so low?

My initial research was not thorough enough with regard to how to compute
how many to use.

> You have a lot of memory, and shared_buffers are an
> important performance setting. I have a machine with 4GB of RAM, and I
> found my best performance was around 150000 shared buffers, which is a
> little more than 1GB.

Going to make it 256,000 (2GB)

> on a dedicated system, particularly with versions 8.1 and later.

Was reading that. Seems to be that around 1/4 of real memory is a good
starting point.
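
So for this machine the arithmetic is roughly (assuming the kernel shared
memory limits get raised to match):

shared_buffers = 256000         # 256000 * 8KB = ~2GB, about 1/4 of the 8GB of RAM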

> Also, a VACUUM helps a table that gets UPDATEs and DELETEs. If you're
> doing mostly inserts on a big table, there may be no need to VACUUM it 3
> times per day. Try VACUUMing the tables that get more UPDATEs and
> DELETEs more often, and if a table has few UPDATEs/DELETEs, VACUUM it
> only occasionally.

Will have to talk to the developers. In particular, for every insert there
are updates. I know they have at least one table that gets updated to have
summarized totals.

One of the reasons I was doing the vacuumdb of the entire DB was to get the
number of shared-buffers. Now that I have an idea of how much I need I will
likely do something along the lines of what you suggest. One full for
everything at night and during the days perhaps do the tables that get more
updated. I also set more aggressive values on autovacuum, so that should help
some too.

> You can run ANALYZE more frequently on all the
> tables, because it does not have to read the entire table and doesn't
> interfere with the rest of the operations.

On a related question. Right now I have my autovacuums set as:
autovacuum_vacuum_threshold = 50000
autovacuum_analyze_threshold = 100000
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.1

Based on what you described above then I could set my analyze values to the
same as the vacuum to have something like
autovacuum_vacuum_threshold = 50000
autovacuum_analyze_threshold = 50000
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05

For DBs with hundreds of GBs, would it be better to set
autovacuum_analyze_scale_factor as low as 0.01? The permanent DB is over 200GB
and growing.. the 100GB ones are staging.. By the time we have finished
migrating all the data from the old system it will be at least 300GB. 0.01
is still 3GB.. pretty sizable.

Do the thresholds take precedence over the scale factors? Is it basically
that if either one of them gets hit the action will take place?

Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Dave Cramer writes:

> personally, I'd set this to about 6G. This doesn't actually consume
> memory it is just a setting to tell postgresql how much memory is
> being used for cache and kernel buffers

Gotcha. Will increase further.


> regarding shared buffers I'd make this much bigger, like 2GB or more

Will do 2GB on the weekend. From what I read this requires more shared memory,
so I have to restart my machine (FreeBSD).

If I plan to give shared buffers 2GB, how much more than that should I give
the total shared memory, kern.ipc.shmmax? 2.5GB?

Also will shared buffers impact inserts/updates at all?
I wish the postgresql.org site docs would mention what will be impacted.

Comments like: This setting must be at least 16, as well as at least twice
the value of max_connections; however, settings significantly higher than
the minimum are usually needed for good performance.

Are useful, but could use some improvement.. increase for what? All
performance? Inserts? Updates? Selects?

For instance, increasing effective_cache_size has made a noticeable
difference in selects. However, as I talk to the developers, we are still
doing only marginally on the inserts. About 150/min.

There are spare CPU cycles, and both raid cards are doing considerably less
than they can do.. so next I am going to try and research what parameters I
need to bump to increase inserts. Today I increased checkpoint_segments from
the default to 64. Now looking at wal_buffers.
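
What I have in mind is something along these lines (the wal_buffers value is
just a guess to experiment with, not a recommendation):

checkpoint_segments = 64        # raised today from 5
wal_buffers = 64                # 64 * 8KB = 512KB of WAL buffers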

It would be most helpful to have something in the docs to specify what each
setting affects most, such as reads, writes, updates, inserts, etc.

Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Michael Stone writes:

> On Thu, Sep 14, 2006 at 04:30:46PM -0400, Francisco Reyes wrote:
>>Right now adding up from ps the memory I have about 2GB.
>
> That's not how you find out how much memory you have. Try "free" or
> somesuch.

Wasn't trying to get an accurate value, just a ballpark figure.

When you say "free", are you referring to the free value from top? Or some
program called free?

Re: Vacuums on large busy databases

From
Dave Cramer
Date:
On 14-Sep-06, at 7:50 PM, Francisco Reyes wrote:

> Dave Cramer writes:
>
>> personally, I'd set this to about 6G. This doesn't actually
>> consume  memory it is just a setting to tell postgresql how much
>> memory is  being used for cache and kernel buffers
>
> Gotcha. Will increase further.
>
>> regarding shared buffers I'd make this much bigger, like 2GB or more
>
> Will do 2GB on the weekend. From what I read this requires shared
> memory so have to restart my machine (FreeBSD).
>
> if I plan to give shared buffers 2GB, how much more over that
> should I give the total shared memory kern.ipc.shmmax? 2.5GB?

I generally make it slightly bigger. Is shmmax the size of the
maximum chunk allowed or the total?
>
> Also will shared buffers impact inserts/updates at all?
> I wish the postgresql.org site docs would mention what will be
> impacted.
Yes, it will, however not as dramatically as what you are seeing with
effective_cache
>
> Comments like: This setting must be at least 16, as well as at
> least twice the value of max_connections; however, settings
> significantly higher than the minimum are usually needed for good
> performance.
>
> Are useful, but could use some improvement.. increase on what? All
> performance? inserts? updates? selects?
>
> For instance, increasing effective_cache_size has made a noticeable
> difference in selects. However as I talk to the developers we are
> still doing marginally in the inserts. About 150/min.
The reason is that with effective_cache the select plans changed (for
the better); it's unlikely that the insert plans will change.
>
> There are spare CPU cycles, and both raid cards are doing considerably
> less than they can do.. so next I am going to try and research what
> parameters I need to bump to increase inserts. Today I increased
> checkpoint_segments from the default to 64. Now looking at
> wal_buffers.
>
> It would be most helpful to have something in the docs to specify
> what each setting affects most such as reads, writes, updates,
> inserts, etc..
It's an art unfortunately.
>

Dave


Re: Vacuums on large busy databases

From
Michael Stone
Date:
On Thu, Sep 14, 2006 at 08:04:39PM -0400, Francisco Reyes wrote:
>Wasn't trying to get an accurate value, just a ballpark figure.

Won't even be a ballpark.

>When you say "free" are you referring to the free value from top? or some
>program called free?

Depends on your OS.

Mike Stone

Re: Vacuums on large busy databases

From
Jeff Davis
Date:
On Thu, 2006-09-14 at 19:30 -0400, Francisco Reyes wrote:
> Will have to talk to the developers. In particular for every insert there
> are updates. I know they have at least one table that gets updated to have
> summarized totals.
>

If the table being updated is small, you have no problems at all. VACUUM
that table frequently, and the big tables rarely. If the big tables are
only INSERTs and SELECTs, the only reason to VACUUM is to avoid the xid
wraparound. See:

<http://www.postgresql.org/docs/8.1/static/maintenance.html>

See which tables need VACUUM, and how often. Use the statistics to see
if VACUUMing will gain you anything before you do it.

> One of the reasons I was doing the vacuumdb of the entire DB was to get the
> number of shared-buffers. Now that I have an idea of how much I need I will
> likely do something along the lines of what you suggest. One full for
> everything at night and during the days perhaps do the tables that get more
> updated. I also set more aggressive values on autovacuum so that should help
> some too.

Why VACUUM FULL? That is generally not needed. Re-evaluate whether
you're gaining things with all these VACUUMs.

> > You can run ANALYZE more frequently on all the
> > tables, because it does not have to read the entire table and doesn't
> > interfere with the rest of the operations.
>
> On a related question. Right now I have my autovacuums set as:
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 100000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.1
>
> Based on what you described above then I could set my analyze values to the
> same as the vacuum to have something like
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 50000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
>
> For DBs with hundreds of GBs would it be better to get
> autovacuum_analyze_scale_factor to even 0.01? The permanent DB is over 200GB
> and growing.. the 100GB ones are staging.. By the time we have finished
> migrating all the data from the old system it will be at least 300GB. 0.01
> is still 3GB.. pretty sizable.

Just test how long an ANALYZE takes, and compare that to how quickly
your statistics get out of date. As long as postgres is choosing correct
plans, you are ANALYZE-ing often enough.

ANALYZE takes statistical samples to avoid reading the whole table, so
it's really not a major influence on performance in my experience.

> Do the thresholds take precedence over the scale factors? Is it basically
> that if either one of them gets hit the action will take place?

u = number of tuples UPDATE-ed or DELETE-ed (i.e. dead tuples)
r = the (estimated) number of total live tuples in the relation

In a loop, autovacuum checks to see if u >
(r*autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold), and if
so, it runs VACUUM. If not, it sleeps. It works the same way for
ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a small
table, the threshold is the dominant term. But both are taken into
account.
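
To make that concrete with made-up numbers: with your settings
(autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_threshold = 50000),
a table with r = 10,000,000 live tuples is vacuumed once

    u > 10,000,000 * 0.05 + 50,000 = 550,000

dead tuples have accumulated, while a small table with r = 10,000 is
dominated by the threshold and needs u > 50,500.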

Regards,
    Jeff Davis


Re: Vacuums on large busy databases

From
Jeff Davis
Date:
On Thu, 2006-09-14 at 20:04 -0400, Francisco Reyes wrote:
> Michael Stone writes:
>
> > On Thu, Sep 14, 2006 at 04:30:46PM -0400, Francisco Reyes wrote:
> >>Right now adding up from ps the memory I have about 2GB.
> >
> > That's not how you find out how much memory you have. Try "free" or
> > somesuch.
>
> Wasn't trying to get an accurate value, just a ballpark figure.
>
> When you say "free" are you referring to the free value from top? or some
> program called free?
>

Any long-running system will have very little "free" memory. Free memory
is wasted memory, so the OS finds some use for it.

The VM subsystem of an OS uses many tricks, including the sharing of
memory among processes and the disk buffer cache (which is shared also).
It's hard to put a number on the memory demands of a given process, and
it's also hard to put a number on the ability of a system to accommodate
a new process with new memory demands.

You have 8GB total, which sounds like plenty to me. Keep in mind that if
you have the shared_memory all allocated on physical memory (i.e.
"kern.ipc.shm_use_phys: 1" on FreeBSD), then that amount of physical
memory will never be available to processes other than postgres. At 2GB,
that still leaves 6GB for the other process, so you should be fine.

Regards,
    Jeff Davis



Re: Vacuums on large busy databases

From
Jeff Davis
Date:
On Thu, 2006-09-14 at 20:07 -0400, Dave Cramer wrote:
> On 14-Sep-06, at 7:50 PM, Francisco Reyes wrote:
>
> > Dave Cramer writes:
> >
> >> personally, I'd set this to about 6G. This doesn't actually
> >> consume  memory it is just a setting to tell postgresql how much
> >> memory is  being used for cache and kernel buffers
> >
> > Gotcha. Will increase further.
> >
> >> regarding shared buffers I'd make this much bigger, like 2GB or more
> >
> > Will do 2GB on the weekend. From what I read this requires shared
> > memory so have to restart my machine (FreeBSD).
> >
> > if I plan to give shared buffers 2GB, how much more over that
> > should I give the total shared memory kern.ipc.shmmax? 2.5GB?
>
> I generally make it slightly bigger. is shmmax the size of the
> maximum chunk allowed or the total ?

That's the total on FreeBSD, per process. I think to allow more than 2GB
there you may need a special compile option in the kernel.

> > Also will shared buffers impact inserts/updates at all?
> > I wish the postgresql.org site docs would mention what will be
> > impacted.
> Yes, it will, however not as dramatically as what you are seeing with
> effective_cache
> >
> > Comments like: This setting must be at least 16, as well as at
> > least twice the value of max_connections; however, settings
> > significantly higher than the minimum are usually needed for good
> > performance.
> >
> > Are useful, but could use some improvement.. increase on what? All
> > performance? inserts? updates? selects?
> >
> > For instance, increasing effective_cache_size has made a noticeable
> > difference in selects. However as I talk to the developers we are
> > still doing marginally in the inserts. About 150/min.
> The reason is that with effective_cache the select plans changed (for
> the better) ; it's unlikely that the insert plans will change.

There aren't multiple INSERT plans (however, there could be a subselect
or something, which would be planned separately). INSERT is INSERT. That
means effective_cache_size will have zero effect on INSERT.

Regards,
    Jeff Davis


Re: Vacuums on large busy databases

From
Michael Stone
Date:
On Thu, Sep 14, 2006 at 05:52:02PM -0700, Jeff Davis wrote:
>Any long-running system will have very little "free" memory. Free memory
>is wasted memory, so the OS finds some use for it.

The important part of the output of "free" in this context isn't how
much is free, it's how much is cache vs how much is allocated to
programs. Other os's have other ways of telling the same thing. Neither
of those numbers generally has much to do with how much shows up in ps
when large amounts of shared memory are in use.
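
On FreeBSD, for example, something like this gives the same breakdown (a
rough sketch; the counters are in pages, so multiply by the page size):

sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
       vm.stats.vm.v_cache_count vm.stats.vm.v_free_count

or just read the Mem: line at the top of top(1)'s output.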

Mike Stone

Re: Vacuums on large busy databases

From
Jeff Davis
Date:
On Thu, 2006-09-14 at 19:50 -0400, Francisco Reyes wrote:
> > regarding shared buffers I'd make this much bigger, like 2GB or more
>
> Will do 2GB on the weekend. From what I read this requires shared memory so
> have to restart my machine (FreeBSD).
>

You should be able to do:
# sysctl -w kern.ipc.shmmax=2147483647

> if I plan to give shared buffers 2GB, how much more over that should I give
> the total shared memory kern.ipc.shmmax? 2.5GB?
>

To get it higher than 2GB, you may need to recompile the kernel, but you
should be able to get 2GB without a restart.
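
To make that stick across reboots you'd put something like this in
/etc/sysctl.conf (illustrative values; shared_buffers * 8KB plus some
overhead has to fit under shmmax):

# maximum size of a shared memory segment, in bytes
kern.ipc.shmmax=2147483647
# total shared memory, in 4KB pages (524288 * 4KB = 2GB)
kern.ipc.shmall=524288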

> Also will shared buffers impact inserts/updates at all?
> I wish the postgresql.org site docs would mention what will be impacted.
>

They will not have a real impact on INSERTs, because an INSERT still has
to be logged in the WAL before commit. Technically, it may make a
difference, but I would not expect much.

shared_buffers has a big impact on UPDATEs, because an UPDATE needs to
find the record to UPDATE first. An UPDATE is basically a DELETE and an
INSERT in one transaction.

> Comments like: This setting must be at least 16, as well as at least twice
> the value of max_connections; however, settings significantly higher than
> the minimum are usually needed for good performance.
>
> Are usefull, but could use some improvement.. increase on what? All
> performance? inserts? updates? selects?

More shared_buffers means fewer reads from disk. If you have 10MB worth
of tables, having 100MB worth of shared buffers is useless because they
will be mostly empty. However, if you have 100MB of shared buffers and
you access records randomly from a 100 petabyte database, increasing
shared_buffers to 200MB doesn't help much, because the chances that the
record you need is in a shared buffer already are almost zero.

Shared buffers are a cache, pure and simple. When you have "locality of
reference", caches are helpful. Sometimes that's temporal locality (if
you are likely to access data that you recently accessed), and sometimes
that's spatial locality (if you access block 10, you're likely to access
block 11). If you have "locality of reference" -- and almost every
database does -- shared_buffers help.

> For instance, increasing effective_cache_size has made a noticeable
> difference in selects. However as I talk to the developers we are still
> doing marginally in the inserts. About 150/min.

effective_cache_size affects only the plan generated. INSERTs aren't
planned because, well, it's an INSERT and there's only one thing to do
and only one way to do it.

> There are spare CPU cycles, and both raid cards are doing considerably less
> than they can do.. so next I am going to try and research what parameters I
> need to bump to increase inserts. Today I increased checkpoint_segments from
> the default to 64. Now looking at wal_buffers.

You won't see any amazing increases from those. You can improve INSERTs
a lot if you have a battery-backed cache on your RAID card and set it to
WriteBack mode (make sure to disable disk caches though, those aren't
battery backed and you could lose data). If you do this, you should be
able to do 1000's of inserts per second.

Another thing to look at is "commit_delay". If you are trying to commit
many INSERTs at once, normally they will be fsync()d individually, which
is slow. However, by adding a commit delay, postgres can batch a few
inserts into one fsync() call, which can help a lot.
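
If you try it, something along these lines in postgresql.conf would be a
starting point (numbers are only illustrative, not tuned for your load):

commit_delay = 10000        # microseconds to wait for other commits to share an fsync; 0 disables
commit_siblings = 5         # only wait if at least this many other transactions are active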

> It would be most helpful to have something in the docs to specify what each
> setting affects most such as reads, writes, updates, inserts, etc..

I agree that they could be improved. It gets complicated quickly though,
and it's hard to generalize the effect that a performance setting will
have. They are all very interdependent.

Regards,
    Jeff Davis



Re: Vacuums on large busy databases

From
Jeff Davis
Date:
On Thu, 2006-09-14 at 21:04 -0400, Michael Stone wrote:
> On Thu, Sep 14, 2006 at 05:52:02PM -0700, Jeff Davis wrote:
> >Any long-running system will have very little "free" memory. Free memory
> >is wasted memory, so the OS finds some use for it.
>
> The important part of the output of "free" in this context isn't how
> much is free, it's how much is cache vs how much is allocated to
> programs. Other os's have other ways of telling the same thing. Neither
> of those numbers generally has much to do with how much shows up in ps
> when large amounts of shared memory are in use.

Right, ps doesn't give you much help. But he didn't tell us about the
process. If a process is using all the buffer cache, and you take away
that memory, it could turn all the reads that previously came from the
buffer cache into disk reads, leading to major slowdown and interference
with the database.

Conversely, if you have a large program running, it may not use much of
its own memory, and perhaps some rarely-accessed pages could be paged
out in favor of more buffer cache. So even if all your memory is taken
with resident programs, your computer may easily accommodate more
processes by paging out rarely-used process memory.

If he knows a little more about the process then he can make a better
determination. But I don't think it will be much of a problem with 8GB
of physical memory.

Regards,
    Jeff Davis


Re: Vacuums on large busy databases

From
Markus Schaber
Date:
Hi, Francisco,

Francisco Reyes wrote:

> I am looking to either improve the time of the vacuum or decrease its
> impact on the loads.
> Are the variables:
> #vacuum_cost_delay = 0                  # 0-1000 milliseconds
> #vacuum_cost_page_hit = 1               # 0-10000 credits
> #vacuum_cost_page_miss = 10             # 0-10000 credits
> #vacuum_cost_page_dirty = 20            # 0-10000 credits
> #vacuum_cost_limit = 200                # 0-10000 credits

Just to avoid a silly mistake:

You pasted those settings with the # sign, which means that PostgreSQL
treats them as comments and uses the defaults instead. You should make
sure that you use "real" (uncommented) settings in your config.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: Vacuums on large busy databases

From
"Jim C. Nasby"
Date:
On Thu, Sep 14, 2006 at 11:23:01AM -0400, Francisco Reyes wrote:
> My setup:
> Freebsd 6.1
> Postgresql 8.1.4
> Memory: 8GB
> SATA Disks
>
> Raid 1 10 spindles (2 as hot spares)
> 500GB disks (16MB buffer), 7200 rpm
> Raid 10
>
> Raid 2 4 spindles
> 150GB 10K rpm disks
> Raid 10
>
> shared_buffers = 10000
> temp_buffers = 1500
> work_mem = 32768                # 32MB
> maintenance_work_mem = 524288   # 512MB
>
> checkpoint_segments = 64
> Just increased to 64 today.. after reading this may help. Was 5 before.
>
> pg_xlog on second raid (which sees very little activity)

BTW, on some good raid controllers (with battery backup and
write-caching), putting pg_xlog on a separate partition doesn't really
help, so you might want to try combining everything.

Even if you stay with 2 partitions, I'd cut pg_xlog back to just a
simple mirror.
--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Vacuums on large busy databases

From
Francisco Reyes
Date:
Jim C. Nasby writes:

> BTW, on some good raid controllers (with battery backup and
> write-caching), putting pg_xlog on a separate partition doesn't really
> help, so you might want to try combining everything.

Planning to put a busy database on the second raid, or perhaps some index files.
So far the second raid is highly underutilized.

> Even if you stay with 2 partitions, I'd cut pg_xlog back to just a
> simple mirror.

I am considering putting the pg_xlog back on the main raid. Primarily because
we have two hot spares.. on top of RAID 10.. so it is safer.
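
If I do move it back, I assume it is basically just this with the postmaster
stopped (paths are made up for illustration; our actual layout may differ):

rm /usr/local/pgsql/data/pg_xlog                # assuming this is currently a symlink to the second raid
mv /raid2/pg_xlog /usr/local/pgsql/data/pg_xlog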