Re: Postgres 7.3.1 poor insert/update/search performance - Mailing list pgsql-performance
From | Andrew Sullivan
---|---
Subject | Re: Postgres 7.3.1 poor insert/update/search performance
Date |
Msg-id | 20030122070524.F27014@mail.libertyrms.com
In response to | Re: Postgres 7.3.1 poor insert/update/search performance (Brian Hirt <bhirt@mobygames.com>)
Responses | Re: Postgres 7.3.1 poor insert/update/search performance
List | pgsql-performance
On Tue, Jan 21, 2003 at 06:44:57PM -0700, Brian Hirt wrote:

> So my question is: if the kernel is caching all this data, what's the
> benefit of setting this to 1000 or higher? Why wouldn't I just set it
> to 0 if I believe my kernel is doing a good job?

If Postgres tries to fetch a bit of data which is in its own shared
buffer, it does not even need to make a system call in order to fetch
it. The data fetch is extremely fast. The problem is that managing
that shared memory comes at some cost.

If the data is not in a shared buffer, then Postgres makes exactly the
same call to the OS kernel, no matter what, asking for the data from
disk. It might happen, however, that the kernel has the data in its
disk cache. The total cost of the operation is therefore much lower
when the data is in the kernel's disk cache than when it actually has
to come from the disk. It is nevertheless still higher, for any single
fetch, than getting the data from Postgres's own shared buffer.

So the question is this: where is the "sweet spot" at which the cost
of managing the shared buffer is low enough that the saved system
calls are worth it? (As you point out, this calculation is complicated
by the potential to waste memory by caching the data twice -- once in
the shared buffer and once in the disk cache. Some systems, like
Solaris, allow you to turn off the disk cache, so the problem may not
be one you face.)

The trouble is that there is no agreement on the answer to that
question, and precious little evidence that settles it. The only way
to determine the best setting, then, is to use your system under
more-or-less simulated production loads and see where the best setting
lies. You have to do this over time, because some inefficiencies turn
up only after running for a while.

In an experiment we tried, we used a 2 GB shared buffer on a 12 GB
machine. It looked brilliantly fast at first, but 48 hours later it
was _crawling_; that suggests a problem with shared-buffer management
on the part of Postgres, I guess, but it was hard to say more than
that. We ultimately settled on a value somewhere below 1 GB as
appropriate for our use. But if I had to justify the number I picked
(as opposed to one a few hundred buffers higher or lower), I'd have a
tough time.

> From all the discussion on this topic, it's still not clear to me how
> to calculate what value this should be set at and why. I've read these
> documents and others and have yet to find explanations and
> recommendations that I can use.

I'm afraid what I'm saying is that it's a bit of a black art. The
pg_autotune project is an attempt to help make this a little more
scientific. It relies on pgbench, which has its own problems, however.

Hope that's helpful, but I fear it doesn't give you the answer you'd
like.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
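P.S. To make the knob concrete: in 7.3, shared_buffers counts 8 kB
buffer pages, so (to pick an arbitrary, illustrative value, not a
recommendation) a setting like this in postgresql.conf gives you
roughly 8 MB of shared buffer:

    # postgresql.conf -- illustrative value only, not a recommendation
    shared_buffers = 1000       # 1000 x 8 kB pages, roughly 8 MB

Remember that the kernel's shared-memory limits (SHMMAX and friends)
have to be large enough for whatever value you pick, or the postmaster
will refuse to start.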
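P.P.S. For the "simulated load" step, something like the following
pgbench session is the usual starting point (the scale factor and
client counts here are made up for the example). Run it at a few
different shared_buffers settings, and over long enough periods to
catch the kind of slow degradation I described above:

    $ createdb bench
    $ pgbench -i -s 10 bench        # initialise at scale 10: ~1,000,000 rows in "accounts"
    $ pgbench -c 10 -t 1000 bench   # 10 clients, 1000 transactions each; compare the tps figures

As I said, pgbench has its own problems -- its workload may look
nothing like yours -- so treat the numbers as a rough guide at best.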