Re: Bug: Buffer cache is not scan resistant - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Bug: Buffer cache is not scan resistant
Msg-id 1173220058.3760.2140.camel@silverbirch.site
In response to Re: Bug: Buffer cache is not scan resistant  (Sherry Moore <sherry.moore@Sun.COM>)
Responses Re: Bug: Buffer cache is not scan resistant  (Sherry Moore <sherry.moore@sun.com>)
List pgsql-hackers
On Mon, 2007-03-05 at 21:34 -0800, Sherry Moore wrote:

>     - Based on a lot of the benchmarks and workloads I traced, the
>       target buffer of read operations are typically accessed again
>       shortly after the read, while writes are usually not.  Therefore,
>       the default operation mode is to bypass L2 for writes, but not
>       for reads.

Hi Sherry,

I'm trying to relate what you've said to how we should proceed from
here. My understanding of what you've said is:

- Tom's assessment that the observed performance quirk could be fixed in
the OS kernel is correct and you have the numbers to prove it

- Solaris currently does NTA (non-temporal access) copies only for 128K
reads, which we don't issue today. If we were to request 16 blocks at a
time (16 x 8K = 128K with the default block size), we would get this
benefit, on Solaris at least. The copyout_max_cached parameter can be
patched, but it isn't a normal system tunable.

- other workloads you've traced *do* reuse the same buffer very soon
afterwards when reading sequentially (but not when writing). Reducing the
working set size is an effective technique for improving performance if
we don't have a kernel that does NTA or we don't read in big enough
chunks (we need both to get NTA to kick in).
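To make the "16 blocks at a time" idea concrete, here is a minimal Python sketch, assuming PostgreSQL's default 8K block size and taking the 128K figure above at face value. `plan_reads` and `scan_file` are hypothetical helpers, not PostgreSQL functions; the only point is that the scan issues one 128K pread() per 16 blocks instead of sixteen separate 8K reads, which is the read size that, per the discussion, would let the Solaris NTA path kick in. `os.pread` is available on Unix only.

```python
import os

BLCKSZ = 8192          # PostgreSQL's default block size in bytes (assumed)
BLOCKS_PER_READ = 16   # 16 x 8K = 128K per read request

def plan_reads(nblocks, blocks_per_read=BLOCKS_PER_READ):
    """Split a sequential scan of nblocks into (first_block, count) requests."""
    return [(start, min(blocks_per_read, nblocks - start))
            for start in range(0, nblocks, blocks_per_read)]

def scan_file(path, blocks_per_read=BLOCKS_PER_READ):
    """Hypothetical scan: yield BLCKSZ-sized blocks, fetched one large pread() per chunk."""
    nblocks = -(-os.path.getsize(path) // BLCKSZ)   # ceiling division
    fd = os.open(path, os.O_RDONLY)
    try:
        for first, count in plan_reads(nblocks, blocks_per_read):
            # One system call covering `count` blocks (128K for a full chunk).
            buf = os.pread(fd, count * BLCKSZ, first * BLCKSZ)
            for off in range(0, len(buf), BLCKSZ):
                yield buf[off:off + BLCKSZ]
    finally:
        os.close(fd)
```

For a 40-block relation, `plan_reads(40)` yields three requests of 16, 16 and 8 blocks, so the kernel sees 128K reads for all but the tail.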

and what you haven't said:

- all of this is orthogonal to the issue of buffer cache spoiling in
PostgreSQL itself. That issue does still exist as a non-OS issue, but
we've been discussing in detail the specific case of L2 cache effects
with specific kernel calls. All of the test results have been
stand-alone, so we've not done any measurements in that area. I say this
because you make the point that reducing the working set size of write
workloads has no effect on the L2 cache issue, but ISTM it's still
potentially a cache spoiling issue.
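The cache spoiling point concerns PostgreSQL's own shared buffers rather than the CPU's L2. One common mitigation is to confine a large sequential scan to a small ring of buffers that it keeps recycling, so it cannot push hot pages out. The toy sketch below assumes a plain LRU cache (PostgreSQL actually uses clock-sweep replacement); the class, its methods and the ring size are hypothetical and only illustrate the working-set-reduction idea.

```python
from collections import OrderedDict, deque

class BufferCache:
    """Toy LRU buffer cache (hypothetical; not PostgreSQL's buffer manager)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.lru = OrderedDict()              # page id -> None, least recent first

    def _insert(self, page):
        while len(self.lru) >= self.nbuffers:
            self.lru.popitem(last=False)      # evict the least recently used page
        self.lru[page] = None

    def read(self, page):
        """Normal access: a hit refreshes recency, a miss inserts the page."""
        if page in self.lru:
            self.lru.move_to_end(page)
        else:
            self._insert(page)

    def scan(self, pages, ring_size):
        """Sequential scan that holds at most ring_size buffers of its own."""
        ring = deque()
        for page in pages:
            if page in self.lru:
                self.lru.move_to_end(page)
                continue
            if len(ring) >= ring_size:
                # Recycle the scan's oldest buffer instead of evicting hot pages.
                self.lru.pop(ring.popleft(), None)
            self._insert(page)
            ring.append(page)
```

With 10 buffers and 8 hot pages, a 100-page scan through a 2-buffer ring leaves every hot page cached, whereas reading the same 100 pages through `read()` evicts them all.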

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com
