Re: Bug: Buffer cache is not scan resistant - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Bug: Buffer cache is not scan resistant
Date
Msg-id 1173234511.13722.560.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: Bug: Buffer cache is not scan resistant  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Bug: Buffer cache is not scan resistant  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
On Tue, 2007-03-06 at 18:29 +0000, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
> >> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
> >>> Another approach I proposed back in December is to not have a  
> >>> variable like that at all, but scan the buffer cache for pages  
> >>> belonging to the table you're scanning to initialize the scan.  
> >>> Scanning all the BufferDescs is a fairly CPU and lock heavy  
> >>> operation, but it might be ok given that we're talking about large  
> >>> I/O bound sequential scans. It would require no DBA tuning and  
> >>> would work more robustly in varying conditions. I'm not sure where  
> >>> you would continue after scanning the in-cache pages. At the  
> >>> highest in-cache block number, perhaps.
> >> If there was some way to do that, it'd be what I'd vote for.
> >>
> > 
> > I still don't know how to make this take advantage of the OS buffer
> > cache. 
> 
> Yep, I don't see any way to do that. I think we could live with that, 
> though. If we went with the sync_scan_offset approach, you'd have to 
> leave a lot of safety margin in that as well.
> 

Right, there would certainly have to be a safety margin with
sync_scan_offset. However, your plan only works when the shared buffers
are dominated by this sequential scan. Let's say you have 40% of
physical memory for shared buffers, and say that 50% are being used for
hot pages in other parts of the database. That means you have access to
only 20% of physical memory to optimize for this sequential scan, and
20% of the physical memory is basically unavailable (being used for
other parts of the database).

In my current implementation, you could set sync_scan_offset to 1.0
(meaning 1.0 x shared_buffers), giving you 40% of physical memory that
would be used for starting this sequential scan. In this case, that
should be a good margin of error, considering that as much as 80% of the
physical memory might actually be in cache (OS or PG cache).

This all needs to be backed up by testing, of course. I'm just
extrapolating some numbers that look vaguely reasonable to me.

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: ITAGAKI Takahiro
Date:
Subject: Re: Aggressive freezing in lazy-vacuum
Next
From: "Luke Lonergan"
Date:
Subject: Re: Bug: Buffer cache is not scan resistant