Re: Replacement Selection - Mailing list pgsql-hackers
From | |
---|---|
Subject | Re: Replacement Selection |
Date | |
Msg-id | BAY132-DS30840B8472DA2670B81C2E6760@phx.gbl |
In response to | Autovacuum and OldestXmin (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Replacement Selection; Re: Replacement Selection |
List | pgsql-hackers |
Hi all. It seems a previous mail of mine with the following body was never sent; sorry if you end up getting it twice. I have since modified that body, so it is worth reading again. Thanks for your attention. Regards.

------------PREVIOUS MAIL--------------------------

Well, the refinements are the following.

Use two heaps instead of just one: one heap creating a "descending" run and the other one creating an "ascending" run, both associated with the same "logical" run. Suppose we want the input elements to end up sorted in ascending order. We could quicksort the first M initialization elements in RAM and then divide them into two parts. Suppose the first heap creates the run 10 9 8 and the second heap creates the run 3 5 7. Those two runs can be seen as just one by mergesort, since they could be physically merged into a single run: first we write the elements 3, 5, 7, and then the elements of the other run, read upside down (8, 9, 10).

Possible advantages: having two heaps of this kind lets replacement selection (RS) adapt better to local variations in the input trend. The technique can be called Two-Way Replacement Selection (2WRS) precisely because of those two heaps. As an extreme example, input that is already sorted in reverse order no longer leads to the worst case: with 2WRS it does not matter whether the input is already sorted in ascending or descending order; in either case we produce just one run, instead of the maximum number of runs as in the RS worst case (input in reverse order). Moreover, it lets us grow the current run in two directions. Imagine writing runs to a regular file: with 2WRS this can be seen as starting output from the middle of the file, the descending heap writing from the middle upwards and the ascending heap writing from the middle downwards. This can result in fewer "dead records" (as I said in previous mails, a dead record is an element that cannot become part of the current run) and therefore longer runs.

Other optimizations can be obtained with the "virtual concatenation" technique: storing a cache of pairs (first_element, last_element) for each created run. This is useful whenever we find two pairs (first_element_1, last_element_1) and (first_element_2, last_element_2) with last_element_1 <= first_element_2: those runs, too, can be seen as belonging to the same "logical run" (they are two different physical RS runs, or even four with 2WRS, but mergesort can treat them as just one). Of course, once those two (or four) runs are logically merged into one, the resulting run can in turn be merged with other runs. What does all this imply? Mergesort actually considers a smaller number of runs (since it only has to work on logical runs), which means fewer jumps between runs on disk.

Now, to test those refinements I should integrate my code into PostgreSQL... but that is not so easy for me...

Thanks for your attention.

------------PREVIOUS MAIL--------------------------
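For readers who want to see the two-heap run formation concretely, here is a minimal sketch of the 2WRS idea described above. It is not the poster's code (none is shown in the thread): the function name, the use of Python's heapq, the even 50/50 split of the initialization elements, the policy of flushing the fuller heap when a dead record arrives, and the way dead records are carried into the next run are all assumptions made for illustration.

```python
import heapq
import math


def two_way_replacement_selection(stream, memory_size):
    """Illustrative sketch of 2WRS run formation (assumed details, not the
    poster's implementation): a min-heap emits an ascending physical run and
    a max-heap emits a descending one; reading the ascending part forward and
    the descending part backwards yields one sorted "logical" run.
    """
    it = iter(stream)
    carry = []          # dead records deferred to the next logical run
    runs = []

    while True:
        # Workspace = deferred dead records plus fresh input, up to M elements.
        workspace = carry
        carry = []
        while len(workspace) < memory_size:
            try:
                workspace.append(next(it))
            except StopIteration:
                break
        if not workspace:
            return runs

        # Initialization: sort the M elements and divide them in two, as in
        # the mail.  The even split is an arbitrary choice for this sketch:
        # smaller half -> ascending (min-)heap, larger half -> descending
        # (max-)heap, stored negated because heapq only provides a min-heap.
        workspace.sort()
        mid = len(workspace) // 2
        asc_heap = workspace[:mid]
        desc_heap = [-v for v in workspace[mid:]]
        heapq.heapify(asc_heap)
        heapq.heapify(desc_heap)

        asc_run, desc_run = [], []
        # Everything routed to the ascending side must stay <= everything
        # routed to the descending side, so the two parts join into one run.
        asc_ceiling = workspace[mid - 1] if mid > 0 else -math.inf
        desc_floor = workspace[mid]

        for x in it:
            asc_top = asc_heap[0] if asc_heap else math.inf
            desc_top = -desc_heap[0] if desc_heap else -math.inf
            if asc_top <= x <= desc_floor:
                # x still fits the ascending side: emit its minimum, keep x.
                asc_run.append(heapq.heapreplace(asc_heap, x))
                asc_ceiling = max(asc_ceiling, x)
            elif asc_ceiling <= x <= desc_top:
                # x still fits the descending side: emit its maximum, keep x.
                desc_run.append(-heapq.heapreplace(desc_heap, -x))
                desc_floor = min(desc_floor, x)
            else:
                # Dead record: defer it and flush one element from the fuller
                # heap so total in-memory elements stay at memory_size.
                carry.append(x)
                if asc_heap and len(asc_heap) >= len(desc_heap):
                    asc_run.append(heapq.heappop(asc_heap))
                elif desc_heap:
                    desc_run.append(-heapq.heappop(desc_heap))
                if not asc_heap and not desc_heap:
                    break   # every in-memory slot now belongs to the next run

        # Drain both heaps and close the logical run: the ascending part
        # followed by the descending part read upside down.
        while asc_heap:
            asc_run.append(heapq.heappop(asc_heap))
        while desc_heap:
            desc_run.append(-heapq.heappop(desc_heap))
        runs.append(asc_run + desc_run[::-1])


if __name__ == "__main__":
    data = [10, 3, 9, 5, 8, 7, 2, 11, 1, 6, 4, 12]
    for run in two_way_replacement_selection(data, memory_size=6):
        assert run == sorted(run)
        print(run)
```

The fixed even split and the dead-record routing above are deliberately simple placeholders; how to divide the initialization elements between the two heaps and which side to flush when a record fits neither are exactly the policy decisions the proposal leaves open, and a real implementation inside PostgreSQL's tuplesort would handle them (and the bookkeeping for dead records) quite differently.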