Re: Should I implement DROP INDEX CONCURRENTLY? - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Should I implement DROP INDEX CONCURRENTLY?
Date
Msg-id 06CF0B08-8E1E-4E47-81F3-90C6771D58F9@nasby.net
In response to Re: Should I implement DROP INDEX CONCURRENTLY?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
> On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jim Nasby <jim@nasby.net> writes:
>>> Yeah, but the problem we run into is that with every backend trying to run the clock on its own we end up with
>>> high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be
>>> cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list
>>> that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as
>>> much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
>>
>> Well, maybe, but I think the historical evidence suggests that that
>> approach will be a loser, simply because no matter how cheap, the
>> freelist will remain a centralized and heavily contended data structure.
>> IMO we need to be looking for a mechanism that has no single point of
>> contention, and modifying the clock sweep rules looks like the best way
>> to get there.
>>
>> Still, he who wants to do the work can try whatever approach he feels
>> like.
>
> It might be possible to partition the free list.  So imagine, for
> example, 8 free lists.  The background writer runs the clock sweep and
> finds some buffers that are about ready to be reallocated and
> distributes one-eighth of them to each free list.  Then, when a
> backend wants to allocate a buffer, it picks a free list somehow
> (round robin?) and pulls a buffer off it.  If the buffer turns out to
> have been used since it was added to the free list, we give up on it
> and try again.  This hopefully shouldn't happen too often, though, as
> long as we only add enough buffers to the free list to satisfy the
> requests that we expect to get over the next
> small-fraction-of-a-second.
>
> Of course you have to think about what happens if you find that your
> chosen free list is empty.  In that case you probably want to try a
> different one, and if in the worst case where they're all empty, run
> the clock sweep in the foreground.  You probably also want to kick the
> background writer in the pants if even a single one is empty (or
> nearly empty?) and tell it to hurry up and add some more buffers to
> the freelist.
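
For reference, here is roughly how I read the scheme Robert is describing, sketched in C. None of this is real
PostgreSQL code: the partition count, the struct layouts, and all of the function names (bgwriter_refill,
backend_get_buffer, kick_bgwriter, run_clock_sweep_foreground) are invented purely for illustration.

/*
 * Sketch of partitioned freelists fed by the bgwriter's clock sweep.
 * Plain pthread mutexes are used just to keep the example self-contained.
 */
#include <pthread.h>
#include <stddef.h>

#define NUM_FREELISTS 8

typedef struct BufDesc
{
    int             buf_id;
    int             usage_count;    /* bumped whenever the buffer is touched */
    struct BufDesc *free_next;      /* link within one freelist */
} BufDesc;

typedef struct FreeList
{
    pthread_mutex_t lock;
    BufDesc        *head;
} FreeList;

static FreeList freelists[NUM_FREELISTS];

extern void     kick_bgwriter(void);                /* "hurry up and refill" */
extern BufDesc *run_clock_sweep_foreground(void);   /* worst-case fallback   */

/* Background writer: distribute the victims its clock sweep found (with
 * usage_count already at zero) round-robin across the freelists. */
void
bgwriter_refill(BufDesc **victims, int nvictims)
{
    for (int i = 0; i < nvictims; i++)
    {
        FreeList *fl = &freelists[i % NUM_FREELISTS];

        pthread_mutex_lock(&fl->lock);
        victims[i]->free_next = fl->head;
        fl->head = victims[i];
        pthread_mutex_unlock(&fl->lock);
    }
}

/* Backend: pick a list round-robin, skip buffers that were touched after
 * being listed, try other lists if one is empty, and only fall back to a
 * foreground clock sweep when every list has run dry. */
BufDesc *
backend_get_buffer(void)
{
    static int next_list = 0;       /* per-backend cursor (backends are processes) */

    for (int tries = 0; tries < NUM_FREELISTS; tries++)
    {
        FreeList *fl = &freelists[next_list];
        BufDesc  *buf;

        next_list = (next_list + 1) % NUM_FREELISTS;

        pthread_mutex_lock(&fl->lock);
        buf = fl->head;
        if (buf != NULL)
            fl->head = buf->free_next;
        pthread_mutex_unlock(&fl->lock);

        if (buf == NULL)
        {
            kick_bgwriter();        /* this list ran dry: ask for more refills */
            continue;               /* and try the next partition */
        }
        if (buf->usage_count > 0)
            continue;               /* used since it was listed; give up on it */
        return buf;
    }
    return run_clock_sweep_foreground();
}

The open question in that sketch is how aggressively the bgwriter refills, i.e. how it estimates the demand over the
next small fraction of a second that Robert mentions.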

If it comes down to it, we can look at partitioning the free list. But here's the thing: this is the strategy that
FreeBSD (and I think now Linux as well) use to service memory requests, be they for free memory or for reading data
from disk. If it's good enough for an OS to use, I would expect we could make it work as well.

I would expect that pulling a page off the free list would be an extremely fast operation... take the read lock on the
list (we should probably have separate locks for putting stuff on the list vs. taking it off), pin the buffer (which
shouldn't be contentious), update the read pointer, and drop the read lock. Even in C code that's probably less than
100 machine instructions, and no looping. Compared to the cost of running a clock sweep it should be significantly
faster. So while there will be contention on the free list read lock, none of that contention should last very long.
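
To put some shape on that, here's the kind of thing I'm picturing: a fixed-size ring of buffer ids with one lock for
the bgwriter filling it and another for the backends draining it. Again, just a sketch; the names, the ring size, and
the PinBuffer signature are invented, and a real version would need memory barriers around the shared positions.

#include <pthread.h>
#include <stdbool.h>

#define FREELIST_RING_SIZE 1024

typedef struct FreeListRing
{
    pthread_mutex_t read_lock;      /* taken by backends pulling buffers off */
    pthread_mutex_t write_lock;     /* taken by the bgwriter adding buffers  */
    volatile int    read_pos;
    volatile int    write_pos;
    int             buf_ids[FREELIST_RING_SIZE];
} FreeListRing;

extern void PinBuffer(int buf_id);

/* Backend side: take the read lock, pin, advance the read pointer, drop the
 * lock.  No looping; returns -1 if the ring is momentarily empty. */
int
freelist_pop(FreeListRing *ring)
{
    int buf_id = -1;

    pthread_mutex_lock(&ring->read_lock);
    if (ring->read_pos != ring->write_pos)      /* anything available? */
    {
        buf_id = ring->buf_ids[ring->read_pos];
        PinBuffer(buf_id);                      /* shouldn't be contentious */
        ring->read_pos = (ring->read_pos + 1) % FREELIST_RING_SIZE;
    }
    pthread_mutex_unlock(&ring->read_lock);

    return buf_id;
}

/* Bgwriter side: only the write lock is taken, so filling the list doesn't
 * block backends that are draining it. */
bool
freelist_push(FreeListRing *ring, int buf_id)
{
    bool pushed = false;

    pthread_mutex_lock(&ring->write_lock);
    int next = (ring->write_pos + 1) % FREELIST_RING_SIZE;
    if (next != ring->read_pos)                 /* ring not full */
    {
        ring->buf_ids[ring->write_pos] = buf_id;
        ring->write_pos = next;
        pushed = true;
    }
    pthread_mutex_unlock(&ring->write_lock);

    return pushed;
}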

Do we have any other places where we take extremely short locks like this and still run into problems?
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



