Re: Bug: Buffer cache is not scan resistant - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Bug: Buffer cache is not scan resistant
Msg-id 1173134582.13722.385.camel@dogma.v10.wvs
In response to Re: Bug: Buffer cache is not scan resistant  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
On Mon, 2007-03-05 at 21:03 +0000, Heikki Linnakangas wrote:
> Another approach I proposed back in December is to not have a variable 
> like that at all, but scan the buffer cache for pages belonging to the 
> table you're scanning to initialize the scan. Scanning all the 
> BufferDescs is a fairly CPU- and lock-heavy operation, but it might be ok 
> given that we're talking about large I/O bound sequential scans. It 
> would require no DBA tuning and would work more robustly in varying 
> conditions. I'm not sure where you would continue after scanning the 
> in-cache pages. At the highest in-cache block number, perhaps.
> 
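A rough sketch of the approach quoted above, for illustration only. It assumes each shared buffer can be reduced to a hypothetical `(relation_id, block_number)` tag. Real BufferDescs carry more fields and would need buffer-header locking, which this ignores. The scan reads the in-cache blocks first, then continues after the highest in-cache block number and wraps around. `in_cache_blocks` and `scan_order` are made-up names, not code from any patch.

```python
from typing import Iterable, List, Tuple

# Hypothetical simplification of a buffer tag: (relation_id, block_number).
BufferTag = Tuple[int, int]


def in_cache_blocks(tags: Iterable[BufferTag], rel_id: int) -> List[int]:
    """Walk every buffer tag once and collect the cached blocks of rel_id."""
    return sorted({blk for rel, blk in tags if rel == rel_id})


def scan_order(tags: Iterable[BufferTag], rel_id: int, n_blocks: int) -> List[int]:
    """Cached blocks first, then continue after the highest cached block,
    wrapping around so every block of the table is visited exactly once."""
    cached = in_cache_blocks(tags, rel_id)
    if not cached:
        return list(range(n_blocks))  # nothing cached: plain sequential scan
    done = set(cached)
    start = (cached[-1] + 1) % n_blocks
    rest = [(start + i) % n_blocks for i in range(n_blocks)]
    return cached + [b for b in rest if b not in done]
```

For example, `scan_order([(1, 5), (2, 0), (1, 3), (1, 4)], 1, 8)` visits blocks 3, 4 and 5 from cache and then continues at block 6. The walk over every tag is the O(shared_buffers) cost mentioned above.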

I assume you're referring to this:

"each backend keeps a bitmap of pages it has processed during the scan,
and reads the pages in the order they're available in cache."
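To make the bitmap idea concrete, here is a toy model, not an implementation. It assumes a hypothetical callback `cached_now()` that reports which of the table's blocks are currently in cache. One bit per block records what this backend has processed. Whenever an unprocessed block is cached, it is taken first. Otherwise the scan falls back to the lowest unprocessed block. All names are illustrative.

```python
from typing import Callable, Iterable, List


class ScanBitmap:
    """Per-backend bitmap with one bit per heap block."""

    def __init__(self, n_blocks: int) -> None:
        self.bits = bytearray((n_blocks + 7) // 8)

    def is_set(self, blk: int) -> bool:
        return bool(self.bits[blk // 8] & (1 << (blk % 8)))

    def set(self, blk: int) -> None:
        self.bits[blk // 8] |= 1 << (blk % 8)


def bitmap_scan(n_blocks: int, cached_now: Callable[[], Iterable[int]]) -> List[int]:
    """Return the order in which one backend would process the table's blocks."""
    bm = ScanBitmap(n_blocks)
    order: List[int] = []
    cursor = 0  # next candidate for a physical read, in block order
    while len(order) < n_blocks:
        # Prefer any unprocessed block that is already cached right now.
        ready = [b for b in sorted(cached_now())
                 if 0 <= b < n_blocks and not bm.is_set(b)]
        if ready:
            blk = ready[0]
        else:
            # Otherwise take the lowest unprocessed block, so the reads
            # that do go to disk stay in ascending block order.
            while bm.is_set(cursor):
                cursor += 1
            blk = cursor
        bm.set(blk)
        order.append(blk)
    return order
```

With `bitmap_scan(6, lambda: {4, 1})` the order is 1, 4, 0, 2, 3, 5. The monotonic `cursor` is one possible answer to the sequential-I/O question below, but the model says nothing about what the OS cache holds.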

which I think is a great idea. However, I was unable to devise a good
answer to all these questions at once:

* How do we attempt to maintain sequential reads on the underlying I/O
layer?

* My current implementation takes advantage of the OS buffer cache; how
could we keep that advantage with PostgreSQL-specific cache logic?

* How do I test to see whether it actually helps in a realistic
scenario? It seems like it would help the most when scans are
progressing at different rates, but how often do people have CPU-bound
queries on tables that don't fit into physical memory (and how long
would it take for me to benchmark such a query)?

It seems like your idea is more analytical, and my current
implementation is more guesswork. I like the analytical approach, but I
don't know that we have enough information to pull it off because we're
missing what's in the OS buffer cache. The OS buffer cache is crucial to
Synchronized Scanning, because shared buffers are evicted based on a
more complex set of circumstances, whereas the OS buffer cache is
usually LRU and forms a nicer "cache trail" (upon which Synchronized
Scanning is largely based). 
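The "cache trail" effect can be illustrated with a toy model. It assumes a strict LRU cache, which a real OS cache only approximates, and two full scans of the same table advancing in lockstep, one block each per step. If scan B reads right behind scan A, every block B needs is still cached. If B is half a table away and the cache is smaller than the gap, every read misses. `LRUCache` and `disk_reads` are hypothetical helpers, not part of the patch.

```python
from collections import OrderedDict


class LRUCache:
    """Strict LRU page cache: an idealized stand-in for the OS buffer cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = OrderedDict()
        self.misses = 0

    def read(self, blk: int) -> None:
        if blk in self.pages:
            self.pages.move_to_end(blk)  # hit: mark as most recently used
            return
        self.misses += 1  # miss: would be a physical read
        self.pages[blk] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page


def disk_reads(n_blocks: int, cache_pages: int, offset: int) -> int:
    """Two full scans in lockstep; scan B starts `offset` blocks ahead of A."""
    cache = LRUCache(cache_pages)
    for step in range(n_blocks):
        cache.read(step)                        # scan A
        cache.read((step + offset) % n_blocks)  # scan B
    return cache.misses
```

With a 100-block table and a 10-page cache, `disk_reads(100, 10, 0)` counts one physical read per block, while `disk_reads(100, 10, 50)` reads every block twice.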

If you have some tests you'd like me to run, I'm planning to do some
benchmarks this week and next. I can see if my current patch holds up
under the scenarios you're worried about.

Regards,
Jeff Davis