Re: double-buffering page writes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: double-buffering page writes
Msg-id 4900A948.1000606@enterprisedb.com
In response to Re: double-buffering page writes  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: double-buffering page writes
List pgsql-hackers
Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
> 
>> I have some comments about the double-buffering:
> 
> Since posting this patch I have realized that this implementation is
> bogus.  I'm now playing with WAL-logging hint bits though.  

Yeah, the torn page + hint bit updates problem is the tough question.

>> - Is it ok to allocate dblbuf[BLCKSZ] as local variable?
>>   It might be unaligned. AFAICS we avoid such usages in other places.
> 
> I thought about that too.  I admit I am not sure if this really works
> portably; however I don't want to add a palloc() to that routine.

It should work, AFAIK, but unaligned memcpy()s and write()s can be
significantly slower. Only one write() can be in progress in a backend
at any time, so you could just palloc() a single 8k buffer in
TopMemoryContext at backend startup, and always use that.
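
A minimal standalone sketch of the "allocate once, reuse for every write"
idea follows. It is not PostgreSQL code: posix_memalign() stands in for a
palloc() in TopMemoryContext, and the names dblbuf_init, write_page_copy
and IO_ALIGN (with its 4096-byte value) are hypothetical, chosen only for
illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ   8192   /* PostgreSQL's default block size */
#define IO_ALIGN 4096   /* assumed alignment, hypothetical choice */

/* One scratch buffer per process, allocated once and never freed. */
static char *dblbuf = NULL;

/* Call once at process start. Returns 0 on success, -1 on failure. */
static int
dblbuf_init(void)
{
    void *p = NULL;

    if (dblbuf != NULL)
        return 0;               /* already initialized */
    if (posix_memalign(&p, IO_ALIGN, BLCKSZ) != 0)
        return -1;
    assert(((uintptr_t) p % IO_ALIGN) == 0);
    dblbuf = p;
    return 0;
}

/*
 * Copy the page into the aligned scratch buffer and write the copy.
 * Safe to reuse the same buffer because a single process issues only
 * one such write at a time.
 */
static ssize_t
write_page_copy(int fd, const char *page)
{
    assert(dblbuf != NULL);
    memcpy(dblbuf, page, BLCKSZ);
    return write(fd, dblbuf, BLCKSZ);
}
```

The point of the design is that the allocation cost is paid once at
startup, so the write path itself needs no palloc() and the buffer is
always aligned.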

>> - Are there any other modules that can share in the benefits of
>>   double-buffering? For example, we could avoid waiting for
>>   LockBufferForCleanup(). It is cool if the double-buffering can
>>   be used for multiple purposes.
> 
> Not sure on this.

You'd need to keep both versions of the buffer simultaneously in the 
buffer cache for that. When we talked about the various designs for HOT, 
that was one of the ideas I had to enable more aggressive pruning: if 
you can't immediately get a vacuum lock, allocate a new buffer in the 
buffer cache for the same block, copy the page to the new buffer, and do 
the pruning, including moving tuples around, there. Any new ReadBuffer 
calls would return the new page version, but old readers would keep 
referencing the old one. The intrusive part of that approach, in 
addition to the obvious changes required in the buffer manager to keep 
around multiple copies of the same block, is that all modifications must 
be done on the new version, so anyone who needs to lock the page for 
modification would need to switch to the new page version at the 
LockBuffer call.
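
The version-switching rule described above can be sketched as a toy,
single-threaded model. Nothing here is buffer manager code: PageVersion,
BlockSlot, read_buffer, fork_for_pruning and switch_to_latest are all
hypothetical names, and real locking, pin accounting and eviction are
left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

/* One in-memory copy of a block; older copies link to newer ones. */
typedef struct PageVersion
{
    char        data[BLCKSZ];
    int         pins;            /* readers currently holding this copy */
    struct PageVersion *newer;   /* NULL if this is the latest version */
} PageVersion;

/* Lookup entry for one block: always points at the latest version. */
typedef struct BlockSlot
{
    PageVersion *current;
} BlockSlot;

/* New readers always get the latest version. */
static PageVersion *
read_buffer(BlockSlot *slot)
{
    assert(slot->current != NULL);
    slot->current->pins++;
    return slot->current;
}

/* Copy the page into a new version; pruning would then happen there. */
static PageVersion *
fork_for_pruning(BlockSlot *slot)
{
    PageVersion *old = slot->current;
    PageVersion *copy = malloc(sizeof *copy);

    if (copy == NULL)
        return NULL;
    memcpy(copy->data, old->data, BLCKSZ);
    copy->pins = 0;
    copy->newer = NULL;
    old->newer = copy;           /* old readers can find the new copy */
    slot->current = copy;        /* new readers see only the new copy */
    return copy;
}

/* What a would-be modifier does at lock time: move to the latest copy. */
static PageVersion *
switch_to_latest(PageVersion *v)
{
    while (v->newer != NULL)
    {
        v->pins--;
        v = v->newer;
        v->pins++;
    }
    return v;
}
```

Readers that never modify the page simply keep their old pointer; only a
caller that wants to modify has to go through switch_to_latest(), which is
the intrusive part noted above.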

As discussed in the other thread with Simon, we also use vacuum locks in 
b-tree for waiting out index scans, so avoiding the waiting there would 
be just wrong.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

