Re: Quad Xeon vs. Dual Itanium - Mailing list pgsql-general

From Christopher Browne
Subject Re: Quad Xeon vs. Dual Itanium
Msg-id m3znblydzz.fsf@wolfe.cbbrowne.com
In response to Re: Quad Xeon vs. Dual Itanium  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-general
Martha Stewart called it a Good Thing when DCorbit@connx.com ("Dann Corbit") wrote:
>> -----Original Message-----
>> From: Andrew Sullivan [mailto:ajs@crankycanuck.ca]
>> Sent: Friday, February 13, 2004 9:05 PM
>> To: pgsql-general@postgresql.org
>> Subject: Re: [GENERAL] Quad Xeon vs. Dual Itanium
>>
>> On Fri, Feb 13, 2004 at 10:46:18PM -0500, Tom Lane wrote:
>>
>> > Quite honestly, I suspect we may be wasting our time hacking the
>> > Postgres buffer replacement algorithm at all.  There are a bunch
>> > of reasons why the PG shared buffer arena should never be more
>> > than a small fraction of physical RAM, and under those conditions
>> > the cache replacement algorithm that will matter is the kernel's,
>> > not ours.
>>
>> Well, unless the Postgres cache is more efficient than the OS's,
>> no?  You could then use the nocache filesystem option, and just let
>> Postgres handle the whole thing.  Of course, that's a pretty big
>> unless, and not one that I'm volunteering to make go away!
>
> Most database systems I have tried scale very well with increased
> memory.  For instance, Oracle and SQL*Server will definitely benefit
> greatly from adding more memory.  I suspect (therefore) that there
> must be some way to squeeze some benefit out of it.

You'll certainly "squeeze _some_ benefit" out of increased memory used
for caching.

The troublesome question is whether or not you win more by having:
 a) More cache managed by PG, or
 b) More cache managed by the OS.

There are certain "use cases" where we know that PG can do better.

For instance, if you're vacuuming a database, you _know_ that PG is
walking through all of the data in the entire database, and you _know_
that this just happens once.  There is no 'locality of reference'; the
vacuum will look at every page once, and not return to it.

In that case, what would be ideal is for data read in from disk to be
treated as "flushable."  We're reading that data once; there's no
reason to expect it to be re-read.  Might as well read it in page by
page and throw out a page every time you read one, so that there's
only one page of "vacuuming work" consuming memory.

One of Jan's cache management changes involves using that very
strategy for the PostgreSQL buffer.  Pages brought in by VACUUM get
thrown into the "least-recently-used" location even though they are
"most-recently-used" because you _know_ that the data isn't
particularly interesting to keep around, certainly less so than the
data already cached.
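
To make the idea concrete, here is a toy sketch in Python (not the
actual PostgreSQL code, and not necessarily how Jan's patch is
structured).  The names BufferPool, read, scan_hint and run_demo are
made up for illustration.  It assumes a plain LRU list in which pages
read with the scan hint are parked at the "least-recently-used" end
rather than the "most-recently-used" end.

```python
from collections import OrderedDict


class BufferPool:
    """Toy LRU buffer pool (hypothetical, for illustration only).

    The OrderedDict runs from least-recently-used (first key) to
    most-recently-used (last key); the first key is evicted next.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def read(self, page_id, scan_hint=False):
        if page_id in self.pages:
            if not scan_hint:
                # Ordinary hit: promote the page to the MRU end.
                self.pages.move_to_end(page_id)
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            # Pool is full: evict the page at the LRU end.
            self.pages.popitem(last=False)
        self.pages[page_id] = "data for page %d" % page_id
        if scan_hint:
            # Scan page: park it at the LRU end so it goes out first.
            self.pages.move_to_end(page_id, last=False)
        return self.pages[page_id]


def run_demo(scan_hint):
    """Warm three hot pages, scan ten cold ones, return what is cached."""
    pool = BufferPool(capacity=4)
    for page_id in (1, 2, 3):
        pool.read(page_id)
    for page_id in range(100, 110):
        pool.read(page_id, scan_hint=scan_hint)
    return list(pool.pages)


if __name__ == "__main__":
    print("with hint:   ", run_demo(scan_hint=True))
    print("without hint:", run_demo(scan_hint=False))
```

With the hint, the three hot pages survive the scan and only one slot
is ever occupied by scan data; without it, the scan pushes all of the
hot pages out.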

That change doesn't touch OS buffering, though, so we'll still find
that a VACUUM will tend to evict commonly-used data from the OS cache.

That all adds up to there being some incentive to want more control
over OS cache.  Wouldn't it be nice to be able to tell the OS: "This
page isn't very important; treat it as LRU"?  Unfortunately, doing
that is troublesome, at best.  There aren't good "hooks" to control
that.  Certainly not portable ones.
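
For what it's worth, the nearest thing I know of to such a hook is
posix_fadvise() with POSIX_FADV_DONTNEED, on systems that provide it.
It is only advisory, the kernel is free to ignore it, and not every
platform has it, which is exactly the portability problem.  A rough
sketch in Python follows; read_once is a made-up name, the 8192-byte
chunk size is just an assumption, and nothing here is what PostgreSQL
actually does.

```python
import os


def read_once(path, chunk_size=8192):
    """Read a file sequentially, hinting that each chunk won't be reused.

    Returns the number of bytes read.  The hint is best-effort: it is
    skipped where os.posix_fadvise is missing and ignored if it fails.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
            if hasattr(os, "posix_fadvise"):
                try:
                    # Tell the kernel we don't expect to re-read this range.
                    os.posix_fadvise(fd, offset, len(chunk),
                                     os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # advisory only; carry on without the hint
            offset += len(chunk)
    finally:
        os.close(fd)
    return total
```

Whether the kernel actually drops those pages, and how quickly, is up
to the kernel; the caller gets no guarantee either way.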

The "big name" guys often prefer to store data on raw partitions,
without OS caching, which means they set up Really Enormous DBMS
buffers, and manage inclusion into/eviction from cache themselves.

We might conceivably convince Linus Torvalds to include something in
Linux, but that would worsen things, in a way, because it would
probably lead to a code fork between "PostgreSQL for Linux" and
"PostgreSQL for portable platforms."  (Substitute something else for
Linux and the adverse fork merely changes names...)
--
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/internet.html
E.V.A., pod 5, launching...
