Re: Vacuums on large busy databases - Mailing list pgsql-performance

From Jeff Davis
Subject Re: Vacuums on large busy databases
Date
Msg-id 1158284062.29889.234.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: Vacuums on large busy databases  (Francisco Reyes <lists@stringsutils.com>)
List pgsql-performance
On Thu, 2006-09-14 at 19:50 -0400, Francisco Reyes wrote:
> > regarding shared buffers I'd make this much bigger, like 2GB or more
>
> Will do 2GB on the weekend. From what I read this requires shared memory so
> have to restart my machine (FreeBSD).
>

You should be able to do:
# sysctl -w kern.ipc.shmmax=2147483647

> if I plan to give shared buffers 2GB, how much more over that should I give
> the total shared memory kern.ipc.shmmax? 2.5GB?
>

To get it higher than 2GB, you may need to recompile the kernel, but you
should be able to get 2GB without a restart.

> Also will shared buffers impact inserts/updates at all?
> I wish the postgresql.org site docs would mention what will be impacted.
>

They will not have a real impact on INSERTs, because an INSERT still has
to be logged in the WAL before commit. Technically, it may make a
difference, but I would not expect much.

shared_buffers has a big impact on UPDATEs, because an UPDATE needs to
find the record to UPDATE first. An UPDATE is basically a DELETE and an
INSERT in one transaction.

> Comments like: This setting must be at least 16, as well as at least twice
> the value of max_connections; however, settings significantly higher than
> the minimum are usually needed for good performance.
>
> Are usefull, but could use some improvement.. increase on what? All
> performance? inserts? updates? selects?

More shared_buffers means fewer reads from disk. If you have 10MB worth
of tables, having 100MB worth of shared buffers is useless because they
will be mostly empty. However, if you have 100MB of shared buffers and
you access records randomly from a 100 petabyte database, increasing
shared_buffers to 200MB doesn't help much, because the chances that the
record you need is in a shared buffer already are almost zero.

Shared buffers are a cache, pure and simple. When you have "locality of
reference", caches are helpful. Sometimes that's temporal locality (if
you are likely to access data that you recently accessed), and sometimes
that's spatial locality (if you access block 10, you're likely to access
block 11). If you have "locality of referece" -- and almost every
database does -- shared_buffers help.

> For instance, increasing effective_cache_size has made a noticeable
> difference in selects. However as I talk to the developers we are still
> doing marginally in the inserts. About 150/min.

effective_cache_size affects only the plan generated. INSERTs aren't
planned because, well, it's an INSERT and there's only one thing to do
and only one way to do it.

> There is spare CPU cycles, both raid cards are doing considerably less they
> can do.. so next I am going to try and research what parameters I need to
> bump to increase inserts. Today I increased checkpoint_segments from the
> default to 64. Now looking at wall_buffers.

You won't see any amazing increases from those. You can improve INSERTs
a lot if you have a battery-backed cache on your RAID card and set it to
WriteBack mode (make sure to disable disk caches though, those aren't
battery backed and you could lose data). If you do this, you should be
able to do 1000's of inserts per second.

Another thing to look at is "commit_delay". If you are trying to commit
many INSERTs at once, normally they will be fsync()d individually, which
is slow. However, by adding a commit delay, postgres can batch a few
inserts into one fsync() call, which can help a lot.

> It would be most helpfull to have something on the docs to specify what each
> setting affects most such as reads, writes, updates, inserts, etc..

I agree that they could be improved. It gets complicated quickly though,
and it's hard to generalize the effect that a performance setting will
have. They are all very interdependent.

Regards,
    Jeff Davis



pgsql-performance by date:

Previous
From: Michael Stone
Date:
Subject: Re: Vacuums on large busy databases
Next
From: Jeff Davis
Date:
Subject: Re: Vacuums on large busy databases