Re: Postgres 7.3.1 poor insert/update/search performance - Mailing list pgsql-performance

From Andrew Sullivan
Subject Re: Postgres 7.3.1 poor insert/update/search performance
Msg-id 20030122070524.F27014@mail.libertyrms.com
In response to Re: Postgres 7.3.1 poor insert/update/search performance  (Brian Hirt <bhirt@mobygames.com>)
List pgsql-performance
On Tue, Jan 21, 2003 at 06:44:57PM -0700, Brian Hirt wrote:
>
> So my question is:  if the kernel is caching all this data, what's the
> benefit of setting this to 1000 or higher?   Why wouldn't i just set it
> to 0 if I believe my kernel is doing a good job.

If Postgres needs a block of data that is already in its own shared
buffers, it does not even need to make a system call to fetch it.
The fetch is extremely fast.

The problem is that managing that shared memory comes at some cost.

If the data is not in a shared buffer, then Postgres makes exactly
the same call to the OS kernel, no matter what, asking for the data
from disk.  It may happen, however, that the kernel already has the
data in its disk cache.  The total cost of the operation is
therefore much lower when the data is in the kernel's disk cache
than when it really has to come off the disk.  It is nevertheless
still higher than fetching the data from Postgres's own shared
buffers, because the system call still has to be made.
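If you want a rough sense of how often you are hitting the shared
buffers rather than going back to the kernel, the block-level
statistics can tell you.  A quick sketch, assuming you have the
statistics collector running with stats_block_level turned on in
postgresql.conf:

    SELECT datname, blks_hit, blks_read,
           round(blks_hit::numeric / (blks_hit + blks_read), 2) AS hit_ratio
    FROM pg_stat_database
    WHERE blks_hit + blks_read > 0;

A high hit_ratio means most requests never leave Postgres's own
buffers; a low one means you are leaning on the kernel's cache (or
the disk) instead.  Note that this tells you nothing about how much
the kernel is caching on your behalf.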

So the question is this: where is the "sweet spot" at which the cost
of managing the shared buffer is low enough that avoiding the system
call is worth it?  (As you point out, the calculation is complicated
by the potential to waste memory by caching the data twice -- once
in the shared buffer and once in the kernel's disk cache.  Some
systems, like Solaris, allow you to turn off the disk cache, so the
problem may not be one you face.)  The trouble is that there is no
agreement on the answer to that question, and precious little
evidence that settles it.

The only way to determine the best setting, then, is to run your
system under a more-or-less simulated production load and see where
the sweet spot lies.  You have to do this over time, because some
inefficiencies only turn up after the system has been running for a
while.  In one experiment we tried, we used a 2 GB shared buffer on
a 12 GB machine.  It looked brilliantly fast at first, but 48 hours
later it was _crawling_; that points to a problem with shared-buffer
management on Postgres's part, I guess, but it was hard to say more
than that.  We ultimately settled on a value somewhere under 1 GB as
appropriate for our use.  But if I had to justify the exact number I
picked (as opposed to one a few hundred buffers higher or lower),
I'd have a tough time.
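For reference, shared_buffers in 7.3 is counted in disk buffers
(8 kB each, assuming the default block size) rather than in bytes,
so the numbers above translate roughly like this -- illustrative
values only, not a recommendation, and the kernel's shared-memory
limit (SHMMAX) has to allow a segment that large:

    # postgresql.conf (7.3) -- illustrative values only
    shared_buffers = 1000        # 1000   * 8 kB =  ~8 MB
    #shared_buffers = 131072     # 131072 * 8 kB =   1 GB
    #shared_buffers = 262144     # 262144 * 8 kB =   2 GB (the setting that crawled above)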

>  From all the discussion on this topic, it's still not clear to me how
> to calculate what value this should be set at and why.  I've read these
> documents and others and have yet to find explanations and
> recommendations that i can use.

I'm afraid what I'm saying is that it's a bit of a black art.  The
pg_autotune project is an attempt to help make this a little more
scientific.  It relies on pgbench, which has its own problems,
however.
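If you want to try that sort of measurement yourself, a crude
starting point with the stock contrib/pgbench might look something
like this (the database name "bench" and the scale factor are just
placeholders):

    $ pgbench -i -s 50 bench        # initialize test tables at scale factor 50
    $ pgbench -c 10 -t 1000 bench   # 10 concurrent clients, 1000 transactions each

Run it against several shared_buffers settings, and over a long
enough period -- as the 48-hour story above suggests -- before
trusting the numbers.  And as I said, pgbench's workload is
artificial, so treat the results as a rough guide at best.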

Hope that's helpful, but I fear it doesn't give you the answer you'd
like.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

