Home > mailing lists

Re: Replacement Selection - Mailing list pgsql-hackers

From
Subject	Re: Replacement Selection
Date	November 26, 2007 14:44:52
Msg-id	BAY132-DS30F1930B2A37D80D98377E6750@phx.gbl Whole thread Raw
In response to	Autovacuum and OldestXmin (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Replacement Selection Re: Replacement Selection Re: Replacement Selection Re: Replacement Selection Re: Replacement Selection
List	pgsql-hackers

Tree view

I must precise that it's not the improvement. Other more complex algorithms 
correspond to the refinements, but at the moment I just want to know which 
part of PostgreSQL code does what. I also implemented Replacement Selection 
(RS) so if I'm able to integrate my RS I hope I would be able to integrate 
the others too.

Anyway, even in my RS implementation a longer run is created. The first M 
initialization elements will surely form part of the current run. M is the 
memory size so at least a run sized M will be created. After initialization, 
the elements are not suddenly output, but an element from heap is output 
into run as soon as I get an element from stream. In other words, for each 
element from stream, the root element of the heap is output, and the input 
element takes the root place into the heap. If that element is a "good 
record" I just heapify (since the element will be placed at the now free 
root place). If that input element is a dead record I swap it with the last 
leaf and reduce the heap size.

--------------------------------------------------
From: "Tom Lane" <tgl@sss.pgh.pa.us>
Sent: Monday, November 26, 2007 7:31 PM
To: <mac_man2005@hotmail.it>
Cc: <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Replacement Selection

> <mac_man2005@hotmail.it> writes:
>> 3) Start run generation. As for this phase, I see PostgreSQL code (as 
>> Knuth
>> algorithm) marks elements belonging to runs in otder to know which run 
>> they
>> belong to and to know when the current heap has finished building the
>> current run. I don't memorize this kind of info. I just output from heap 
>> to
>> run all of the elements going into the current run. The elements supposed 
>> to
>> go into the next run (I call them "dead records") are still stored into 
>> main
>> memory, but as leaves of the heap. This implies reducing the heap size 
>> and
>> so heapifying a smaller number of elements each time I get a dead record
>> (it's not necessary to sort dead records). When the heap size is zero a 
>> new
>> run is created heapifying all the dead records currently present into 
>> main
>> memory.
>
> Why would this be an improvement over Knuth?  AFAICS you can't generate
> longer runs this way, and it's not saving any time --- in fact it's
> costing time, because re-heapifying adds a lot of new comparisons.
>
> regards, tom lane
>

pgsql-hackers by date:

From: Tom Lane
Date: 26 November 2007, 14:32:52
Subject: Re: Replacement Selection

From: "Guillaume Smet"
Date: 26 November 2007, 15:04:30
Subject: Re: 8.3devel slower than 8.2 under read-only load

Re: Replacement Selection - Mailing list pgsql-hackers

Previous

Next