Re: Bug: Buffer cache is not scan resistant - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bug: Buffer cache is not scan resistant
Date
Msg-id 45EDB2FD.4070705@enterprisedb.com
In response to Re: Bug: Buffer cache is not scan resistant  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Bug: Buffer cache is not scan resistant
List pgsql-hackers
Jeff Davis wrote:
> On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote:
>> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote:
>>> Another approach I proposed back in December is to not have a
>>> variable like that at all, but scan the buffer cache for pages
>>> belonging to the table you're scanning to initialize the scan.
>>> Scanning all the BufferDescs is a fairly CPU and lock heavy
>>> operation, but it might be ok given that we're talking about large
>>> I/O bound sequential scans. It would require no DBA tuning and
>>> would work more robustly in varying conditions. I'm not sure where
>>> you would continue after scanning the in-cache pages. At the
>>> highest in-cache block number, perhaps.
>> If there was some way to do that, it'd be what I'd vote for.
>>
>
> I still don't know how to make this take advantage of the OS buffer
> cache.

Yep, I don't see any way to do that. I think we could live with that,
though. If we went with the sync_scan_offset approach, you'd have to
leave a lot of safety margin in that as well.

> However, no DBA tuning is a huge advantage, I agree with that.
>
> If I were to implement this idea, I think Heikki's bitmap of pages
> already read is the way to go. Can you guys give me some pointers about
> how to walk through the shared buffers, reading the pages that I need,
> while being sure not to read a page that's been evicted, and also not
> potentially causing a performance regression somewhere else?

You could take a look at BufferSync, for example. It walks through the
buffer cache, syncing all dirty buffers.
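
To make the "bitmap of pages already read" idea from the quoted text a bit more concrete, here is a minimal standalone sketch of the bookkeeping only. It assumes the scan first visits whatever blocks happen to be in the buffer cache, marks each one, and then does an ordinary sequential pass that skips marked blocks. BlockBitmap and the bitmap_* functions are hypothetical names made up for this sketch, not existing backend APIs, and nothing here deals with locking or eviction.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-scan bitmap: one bit per block of the relation. */
typedef struct BlockBitmap
{
    unsigned       nblocks;
    unsigned char *bits;
} BlockBitmap;

/* Allocate a zeroed bitmap covering nblocks blocks; NULL on out of memory. */
static BlockBitmap *
bitmap_create(unsigned nblocks)
{
    size_t       nbytes = ((size_t) nblocks + CHAR_BIT - 1) / CHAR_BIT;
    BlockBitmap *bm = malloc(sizeof(BlockBitmap));

    if (bm == NULL)
        return NULL;
    if (nbytes == 0)
        nbytes = 1;             /* avoid a zero-sized allocation */
    bm->nblocks = nblocks;
    bm->bits = calloc(nbytes, 1);
    if (bm->bits == NULL)
    {
        free(bm);
        return NULL;
    }
    return bm;
}

/* Remember that block blk has been processed by this scan. */
static void
bitmap_mark_read(BlockBitmap *bm, unsigned blk)
{
    assert(blk < bm->nblocks);
    bm->bits[blk / CHAR_BIT] |= (unsigned char) (1u << (blk % CHAR_BIT));
}

/* Has block blk already been processed? */
static bool
bitmap_is_read(const BlockBitmap *bm, unsigned blk)
{
    assert(blk < bm->nblocks);
    return (bm->bits[blk / CHAR_BIT] >> (blk % CHAR_BIT)) & 1u;
}

/* First unprocessed block at or after start, or -1 if none is left. */
static long
bitmap_next_unread(const BlockBitmap *bm, unsigned start)
{
    unsigned blk;

    for (blk = start; blk < bm->nblocks; blk++)
        if (!bitmap_is_read(bm, blk))
            return (long) blk;
    return -1;
}

static void
bitmap_free(BlockBitmap *bm)
{
    free(bm->bits);
    free(bm);
}
```

With something like this, the in-cache phase would call bitmap_mark_read() for every block it processes, and the sequential phase would loop on bitmap_next_unread() until it returns -1. One bit per block keeps the memory cost small even for large tables.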

FWIW, I've attached a function I wrote some time ago when I was playing
with the same idea for vacuums. Each call to the new function loops
through the buffer cache and returns the next buffer that belongs to a
given relation. I'm not sure that it's correct and safe, and there
aren't many comments, but it should work if you want to play with it...
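
For anyone who wants to try the iteration logic outside the backend, the wrap-around walk can be sketched as a standalone toy. This is only an illustration under stated assumptions: MockBufferDesc, mock_pool, N_BUFFERS and next_buffer_for_relation are hypothetical stand-ins invented here, and the header locking and pinning that the real function needs are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BUFFERS   8           /* hypothetical pool size for the sketch */
#define INVALID_BUF (-1)

/* Hypothetical stand-in for a buffer header: owning relation and validity. */
typedef struct MockBufferDesc
{
    unsigned rel_id;
    unsigned block_num;
    bool     valid;
} MockBufferDesc;

/* Zero-initialized, so every slot starts out invalid. */
static MockBufferDesc mock_pool[N_BUFFERS];

/*
 * Return the index of the next valid buffer that belongs to rel_id.
 * The search resumes just after the slot returned by the previous call and
 * wraps around at the end of the pool.  Returns INVALID_BUF after one full
 * pass over the pool without a match.
 */
static int
next_buffer_for_relation(unsigned rel_id)
{
    static int last_buf_id = 0; /* position remembered across calls */
    int        buf_id = last_buf_id;

    assert(last_buf_id >= 0 && last_buf_id < N_BUFFERS);

    do
    {
        if (++buf_id >= N_BUFFERS)
            buf_id = 0;

        if (mock_pool[buf_id].valid && mock_pool[buf_id].rel_id == rel_id)
        {
            last_buf_id = buf_id;
            return buf_id;
        }
    } while (buf_id != last_buf_id);

    return INVALID_BUF;
}
```

Note that repeated calls keep cycling: once every matching buffer has been returned, the next call hands back the first one again. A caller therefore has to track which blocks it has already processed in order to know when to stop.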

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.214
diff -c -r1.214 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c    5 Jan 2007 22:19:37 -0000    1.214
--- src/backend/storage/buffer/bufmgr.c    22 Jan 2007 16:38:37 -0000
***************
*** 97,102 ****
--- 97,134 ----
  static void AtProcExit_Buffers(int code, Datum arg);


+ Buffer
+ ReadAnyBufferForRelation(Relation reln)
+ {
+     static int last_buf_id = 0;
+     int new_buf_id;
+     volatile BufferDesc *bufHdr;
+
+     /* Make sure we will have room to remember the buffer pin */
+     ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
+
+     new_buf_id = last_buf_id;
+     do
+     {
+         if (++new_buf_id >= NBuffers)
+             new_buf_id = 0;
+
+         bufHdr = &BufferDescriptors[new_buf_id];
+         LockBufHdr(bufHdr);
+
+         if ((bufHdr->flags & BM_VALID) && RelFileNodeEquals(bufHdr->tag.rnode, reln->rd_node))
+         {
+             PinBuffer_Locked(bufHdr);
+             last_buf_id = new_buf_id;
+             return BufferDescriptorGetBuffer(bufHdr);
+         }
+         UnlockBufHdr(bufHdr);
+     } while(new_buf_id != last_buf_id);
+     last_buf_id = new_buf_id;
+     return InvalidBuffer;
+ }
+
+
  /*
   * ReadBuffer -- returns a buffer containing the requested
   *        block of the requested relation.  If the blknum
