Re: Bug: Buffer cache is not scan resistant - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Bug: Buffer cache is not scan resistant |
Date | |
Msg-id | 45EDB2FD.4070705@enterprisedb.com |
In response to | Re: Bug: Buffer cache is not scan resistant (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Bug: Buffer cache is not scan resistant |
List | pgsql-hackers |
Jeff Davis wrote:
> On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
>> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
>>> Another approach I proposed back in December is to not have a
>>> variable like that at all, but scan the buffer cache for pages
>>> belonging to the table you're scanning to initialize the scan.
>>> Scanning all the BufferDescs is a fairly CPU and lock heavy
>>> operation, but it might be ok given that we're talking about large
>>> I/O bound sequential scans. It would require no DBA tuning and
>>> would work more robustly in varying conditions. I'm not sure where
>>> you would continue after scanning the in-cache pages. At the
>>> highest in-cache block number, perhaps.
>> If there was some way to do that, it'd be what I'd vote for.
>>
>
> I still don't know how to make this take advantage of the OS buffer
> cache.

Yep, I don't see any way to do that. I think we could live with that,
though. If we went with the sync_scan_offset approach, you'd have to
leave a lot of safety margin in that as well.

> However, no DBA tuning is a huge advantage, I agree with that.
>
> If I were to implement this idea, I think Heikki's bitmap of pages
> already read is the way to go. Can you guys give me some pointers about
> how to walk through the shared buffers, reading the pages that I need,
> while being sure not to read a page that's been evicted, and also not
> potentially causing a performance regression somewhere else?

You could take a look at BufferSync, for example. It walks through the
buffer cache, syncing all dirty buffers.

FWIW, I've attached a function I wrote some time ago when I was playing
with the same idea for vacuums. A call to the new function loops through
the buffer cache and returns the next buffer that belongs to a certain
relation. I'm not sure that it's correct and safe, and there aren't many
comments, but it should work if you want to play with it...
--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.214
diff -c -r1.214 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c 5 Jan 2007 22:19:37 -0000 1.214
--- src/backend/storage/buffer/bufmgr.c 22 Jan 2007 16:38:37 -0000
***************
*** 97,102 ****
--- 97,134 ----
  static void AtProcExit_Buffers(int code, Datum arg);
  
  
+ Buffer
+ ReadAnyBufferForRelation(Relation reln)
+ {
+     static int last_buf_id = 0;
+     int new_buf_id;
+     volatile BufferDesc *bufHdr;
+ 
+     /* Make sure we will have room to remember the buffer pin */
+     ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
+ 
+     new_buf_id = last_buf_id;
+     do
+     {
+         if (++new_buf_id >= NBuffers)
+             new_buf_id = 0;
+ 
+         bufHdr = &BufferDescriptors[new_buf_id];
+         LockBufHdr(bufHdr);
+ 
+         if ((bufHdr->flags & BM_VALID) && RelFileNodeEquals(bufHdr->tag.rnode, reln->rd_node))
+         {
+             PinBuffer_Locked(bufHdr);
+             last_buf_id = new_buf_id;
+             return BufferDescriptorGetBuffer(bufHdr);
+         }
+         UnlockBufHdr(bufHdr);
+     } while(new_buf_id != last_buf_id);
+     last_buf_id = new_buf_id;
+     return InvalidBuffer;
+ }
+ 
+ 
  /*
   * ReadBuffer -- returns a buffer containing the requested
   *      block of the requested relation.  If the blknum
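
For anyone who wants to experiment with the wrap-around walk outside the backend, here is a minimal standalone sketch of the same loop shape. It is a toy model, not PostgreSQL code: ToyBufferDesc, toy_pool, toy_next_buffer_for_relation and toy_collect_cached_blocks are hypothetical names invented for this illustration, and there is no locking, pinning or eviction. The seen[] array is an assumption about how a caller might use a "bitmap of pages already read" to stop. Without it, repeated calls keep cycling through the same buffers for as long as any page of the relation is cached.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 8            /* toy pool size; stand-in for NBuffers */
#define INVALID_BUFFER (-1)   /* stand-in for InvalidBuffer */

/* Hypothetical, simplified descriptor: only the fields the walk needs. */
typedef struct ToyBufferDesc
{
    bool valid;      /* models the BM_VALID flag */
    int  rel_id;     /* models tag.rnode */
    int  block_num;  /* models the block number stored in the tag */
} ToyBufferDesc;

static ToyBufferDesc toy_pool[NBUFFERS];

/*
 * Return the index of the next valid buffer belonging to rel_id, resuming
 * just after the position returned by the previous call and wrapping around
 * the pool once. Returns INVALID_BUFFER if no buffer matches.
 */
static int
toy_next_buffer_for_relation(int rel_id)
{
    static int last_buf_id = 0;
    int new_buf_id = last_buf_id;

    do
    {
        if (++new_buf_id >= NBUFFERS)
            new_buf_id = 0;

        if (toy_pool[new_buf_id].valid && toy_pool[new_buf_id].rel_id == rel_id)
        {
            last_buf_id = new_buf_id;
            return new_buf_id;
        }
    } while (new_buf_id != last_buf_id);

    return INVALID_BUFFER;
}

/*
 * Visit every cached block of rel_id exactly once and store the block
 * numbers in out_blocks. seen[] plays the role of the bitmap of pages
 * already read: the first repeated buffer means the walk has gone all
 * the way around. Returns the number of blocks stored.
 */
static int
toy_collect_cached_blocks(int rel_id, int *out_blocks, int max_blocks)
{
    bool seen[NBUFFERS] = {false};
    int count = 0;

    assert(max_blocks >= NBUFFERS);  /* caller must leave room for the whole pool */

    for (;;)
    {
        int buf = toy_next_buffer_for_relation(rel_id);

        if (buf == INVALID_BUFFER || seen[buf])
            break;
        seen[buf] = true;
        out_blocks[count++] = toy_pool[buf].block_num;
    }
    return count;
}
```

In the toy, the pool never changes during the walk, so indexing seen[] by buffer id is enough. In a real backend a buffer can be evicted and reused between calls, so the tracking would presumably have to be keyed by block number instead.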