On Mon, 2007-03-05 at 21:34 -0800, Sherry Moore wrote:
> - Based on a lot of the benchmarks and workloads I traced, the
> target buffer of read operations are typically accessed again
> shortly after the read, while writes are usually not. Therefore,
> the default operation mode is to bypass L2 for writes, but not
> for reads.
Hi Sherry,
I'm trying to relate what you've said to how we should proceed from
here. My understanding of what you've said is:
- Tom's assessment that the observed performance quirk could be fixed in
the OS kernel is correct and you have the numbers to prove it
- currently Solaris only does NTA for 128K reads, which we don't
currently issue. If we were to request 16 blocks at a time (128K with
the default 8K block size), we would get this benefit, on Solaris at
least. The copyout_max_cached parameter can be patched, but isn't a
normal system tunable.
- other workloads you've traced *do* reuse the same buffer again very
soon after a sequential read (though not after writes). Reducing the
working set size is an effective technique for improving performance if
we don't have a kernel that does NTA or we don't read in big enough
chunks (we need both to get NTA to kick in).
and what you haven't said:
- all of this is orthogonal to the issue of buffer cache spoiling in
PostgreSQL itself. That issue does still exist as a non-OS issue, but
we've been discussing in detail the specific case of L2 cache effects
with specific kernel calls. All of the test results have been
stand-alone, so we've not done any measurements in that area. I say this
because you make the point that reducing the working set size of write
workloads has no effect on the L2 cache issue, but ISTM it's still
potentially a cache spoiling issue.
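
To make the "16 blocks at a time" idea concrete, here is a rough sketch
of a scan loop that issues 128K read() requests into one reused buffer.
It is only an illustration: request_bytes(), scan_fd() and
BLOCKS_PER_READ are hypothetical names, not existing PostgreSQL code,
and whether a request of this size actually triggers NTA depends on the
kernel and on copyout_max_cached.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192           /* PostgreSQL's default block size */
#define BLOCKS_PER_READ 16    /* assumption: 16 x 8K = 128K per request */

/* Size in bytes of one multi-block read request. */
static size_t
request_bytes(size_t nblocks)
{
    assert(nblocks > 0);
    return nblocks * BLCKSZ;
}

/*
 * Read fd to end-of-file using requests of BLOCKS_PER_READ blocks,
 * reusing a single buffer so the working set stays small.
 * Returns the total number of bytes read, or -1 on error.
 */
static long long
scan_fd(int fd)
{
    static char buf[BLOCKS_PER_READ * BLCKSZ];
    long long   total = 0;
    ssize_t     n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;           /* a real scan would process buf here */

    return (n < 0) ? -1 : total;
}
```

Calling scan_fd() on an open relation file would then ask the kernel
for request_bytes(BLOCKS_PER_READ) bytes per call, rather than one 8K
block at a time.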
-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com