Re: Large tables (was: RAID 0 not as fast as - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Large tables (was: RAID 0 not as fast as
Msg-id 20060922140113.GW28987@nasby.net
In response to Re: Large tables (was: RAID 0 not as fast as  ("Luke Lonergan" <llonergan@greenplum.com>)
Responses Re: Large tables (was: RAID 0 not as fast as
List pgsql-performance
On Thu, Sep 21, 2006 at 08:46:41PM -0700, Luke Lonergan wrote:
> Mark,
>
> On 9/21/06 8:40 PM, "mark@mark.mielke.cc" <mark@mark.mielke.cc> wrote:
>
> > I'd advise against using this call unless it can be shown that the page
> > will not be used in the future, or at least, that the page is less useful
> > than all other pages currently in memory. This is what the call really means.
> > It means, "There is no value to keeping this page in memory".
>
> Yes, it's a bit subtle.
>
> I think the topic is similar to "cache bypass", used in cache capable vector
> processors (Cray, Convex, Multiflow, etc) in the 90's.  When you are
> scanning through something larger than the cache, it should be marked
> "non-cacheable" and bypass caching altogether.  This avoids a copy, and
> keeps the cache available for things that can benefit from it.
>
> WRT the PG buffer cache, the rule would have to be: "if the heap scan is
> going to be larger than "effective_cache_size", then issue the
> posix_fadvise(BLOCK_NOT_NEEDED) call".  It doesn't sound very efficient to
> do this in block/extent increments though, and it would possibly mess with
> subsets of the block space that would be re-used for other queries.
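
A minimal sketch of the rule quoted above, assuming a POSIX system that
provides posix_fadvise(). BLOCK_NOT_NEEDED appears to be shorthand; the
closest real advice value is POSIX_FADV_DONTNEED, which hints the OS page
cache, not PostgreSQL's own buffer cache. The helper names and the
byte-based size comparison are hypothetical, for illustration only:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: nonzero when the scan is larger than the cache,
 * i.e. when caching its blocks cannot help a later pass over them. */
int scan_exceeds_cache(off_t scan_bytes, off_t effective_cache_bytes)
{
    return scan_bytes > effective_cache_bytes;
}

/* Hypothetical helper: after reading [offset, offset + len) from fd,
 * tell the OS the range is no longer needed, but only for oversized scans.
 * Returns 0 on success or when no advice was issued, otherwise the error
 * number returned by posix_fadvise(). */
int advise_after_read(int fd, off_t offset, off_t len,
                      off_t scan_bytes, off_t effective_cache_bytes)
{
    assert(offset >= 0 && len >= 0);
    if (!scan_exceeds_cache(scan_bytes, effective_cache_bytes))
        return 0;
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```

Calling advise_after_read() once per block or extent is exactly the
fine-grained pattern the quoted text worries about, and the advice applies
to every reader of that file range, not only to the scan that issued it.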

Another issue is that if you start two large seqscans on the same table
at about the same time, right now you should only be issuing one set of
reads for both requests, because one of them will just pull the blocks
back out of cache. If we weren't caching, then each query would have to
physically read the table on its own (which would be horrid).

There's been talk of adding code that would have a seqscan detect if
another seqscan is happening on the table at the same time, and if it
is, to start its own scan wherever the other seqscan is currently
running. That would probably ensure that we weren't reading from the
table in two different places, even if we weren't caching.
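
A rough sketch of that idea, not actual PostgreSQL code: a new scan picks
its starting block from a shared hint and then wraps around, so it still
visits every block once. The ScanHint struct and both function names are
hypothetical, and locking around the shared hint is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared state describing a seqscan already in progress. */
typedef struct {
    int    scan_active;    /* nonzero while another seqscan is running */
    size_t current_block;  /* block that scan most recently read */
} ScanHint;

/* Pick the block a new seqscan should start from: join the running scan
 * at its current position, or start at block 0 if none is running. */
size_t choose_start_block(const ScanHint *hint, size_t nblocks)
{
    assert(nblocks > 0);
    if (hint->scan_active && hint->current_block < nblocks)
        return hint->current_block;
    return 0;
}

/* Map the step-th read of a scan to a block number, wrapping past the
 * end of the table so all nblocks blocks are visited exactly once. */
size_t block_at_step(size_t start, size_t step, size_t nblocks)
{
    assert(nblocks > 0 && start < nblocks && step < nblocks);
    return (start + step) % nblocks;
}
```

With a 10-block table and another scan currently at block 7, the new scan
would read blocks 7, 8, 9, 0, 1, and so on up to 6, staying alongside the
first scan for the blocks they have in common.
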
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
