Re: Implementing Sorting Refinements - Mailing list pgsql-hackers

From Decibel!
Subject Re: Implementing Sorting Refinements
Date
Msg-id E8E2C8B9-EA19-41A1-B477-F36D0DAC8413@decibel.org
Whole thread Raw
In response to Implementing Sorting Refinements  (Manolo _ <mac_man2005@hotmail.it>)
Responses Re: Implementing Sorting Refinements  (<mac_man2005@hotmail.it>)
List pgsql-hackers
You'll get better response if you don't hijack threads. :) Also,  
there's nothing in here that describes what the benefits of this  
change are.

On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:

>
> Hi to all.
>
> This mail is aimed at asking some suggestion to face PostgreSQL  
> (PG) development to implement a refinement to PG source code. I'll  
> briefly summarize the idea of the "2-Way Replacement  
> Selection" (2WRS) refinement, just in case. If you already remember  
> what 2WRS is, you can please jump directly to the IMPLEMENTATION  
> ISSUES part of this mail.
>
>
> 2WRS.
> Refinement of the Replacement Selection (RS) technique currently  
> used by PG in src/backend/utils/sort/tuplesort.c .
> The 2WRS uses two heaps instead of just one in order to create the  
> current (logical) run. Here there are some fundamental points of  
> the 2WRS technique:
> - 'qsort' the initial unsorted 'memtuples' array
> - divide the 'memtuples' array into two parts and each of those  
> will be managed as a heap
> - one of the heaps will arrange its elements in ascending order,  
> while the other heap in descending order
> - each heap will spill its current root in its corresponding run  
> (i.e.: we have a run per each of those two heaps), so we are  
> actually creating 2 physical current runs
> - those 2 current physical runs could "theoretically" be merged  
> into the same logical run, actually we can make 'mergesort' think  
> they do belong to the same physical run. That reduces the number of  
> comparisons 'mergesort' has to do at each merge step (that means  
> less seek delay time on mass storage). We can also think the  
> average length of logical runs produced by 2WRS will probably be  
> greater than the average length produced by RS (again less seek  
> delay time: the longer each run the less number of final runs  
> produced, that means the less number of comparisons at each  
> 'mergesort' step).
>
>
> IMPLEMENTATION ISSUES.
>     Where to place those heaps?
> 1) I think that both heaps could be arranged on the same  
> 'memtuples' array. That allows easily subsequent resizing those  
> heaps according to their effective use or according to some  
> heuristic, without reallocating memory.
>     
>     How to arrange those heaps?
> 2a) It's convenient to arrange those heaps "root to root". That is  
> arranging those heaps with their roots toward the center of  
> 'memtuples' (in a way we can say they lay "face to face", or "root  
> to root" as said before) while their leaves lay towards the extreme  
> indexes of the 'memtuples' array (that is the last leaf of one heap  
> will lay at index 0, the last leaf of the other heap laying at  
> index memtupsize-1. This arrangement prevents overlapping elements  
> between those physical runs associated to the same current logical  
> run.
> PRO: once we qsort memtuples and divide it into 2 parts we already  
> get those two heaps, no need to build them.
> CONTRA: ???
>
> 2b) As in 2a) but arranging heaps "leaf to leaf", that is their  
> roots will lay at the extreme indexes of 'memtuples' while their  
> leaves towards the center of the 'memtuples' array.
> Or even start building heaps as soon as we get initial elements,  
> instead of qsort the whole 'memtuples' array.
> Any PRO/CONTRA compared to 2a)???
>
>     Current run numbers
> I think I should duplicate the 'int currentRun' variable in the  
> Tuplesortstate struct. I'll replace it with a 'int currentRunUP'  
> and 'int currentRunDOWN' variables in order to distinguish those  
> two physical runs associated to those 2 heaps. In this case I will  
> give a run number (max{currentRunUP,currentRunDOWN} + 1) to  
> elements not belonging to the current logical run. I suppose no  
> need to touch 'long availMem' nor 'long allowedMem' variables nor  
> any others.
>
>     Heap functions
> I will duplicate all the heap management functions in order to  
> adapt them to the kind of heap they should be applied to (for  
> example, the tuplesort_heap_siftup function should be replaced with  
> tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).
>
>     Merge Plan
> This technique would use a sort of "merge plan"    to instruct  
> mergesort on how to use those physical runs. Actually mergesort  
> should consider at first "odd runs" before "pair runs". That is,  
> for example, mergesort should start merging runs with run number  
> 1,3,5,7,... and when run number X terminates start considering run  
> number X+1. Obviously that doesn't need any merge plan, but I  
> remember someone else (as Simon Riggs) was interested in sorting  
> improvements so it could be a good thing to know if I should  
> consider any conventions or paramethers in order to possibly create  
> that merge plan.
>
>
> DEVELOPMENT CONTEXT
> I preferred to use the "last stable release" at the moment, that is  
> 8.2.
>
>
> Any comment/suggestion/advice ?
>
> Thanks for your attention and for your time.
> Regards, Manolo.
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's  
> FREE!
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



pgsql-hackers by date:

Previous
From: Andrew - Supernews
Date:
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Next
From:
Date:
Subject: Re: Implementing Sorting Refinements