Thread: Index Page Split logging
When we split an index page we perform a multi-block operation that is both fairly expensive and complex to reconstruct should we crash partway through. If we could log *only* the insert that caused the split, rather than the split itself, we would avoid that situation entirely. The recovery code would then resolve the split by performing a full logical split rather than replaying pieces of the original physical split. Doing that would remove a ton of complexity, as well as reducing log volume. We would need to ensure that the right-hand page of the split reached disk before the left-hand page. If a crash occurs when only the right-hand page has reached disk, there would be no link (on disk) to it, so it would be ignored. We would need an orphaned-page detection mechanism to allow the page to be VACUUMed sometime in the future. There would also be some sort of ordering required in the buffer manager, so that pages which must be written last are kept pinned until the first page is written. That sounds fairly straightforward, and it would allow a generic mechanism that worked for all index splits, rather than requiring individual code for each rmgr. ISTM that would require Direct I/O to perform physical writes in a specific order, rather than just issuing the writes and fsyncing. Which probably kills it for now, even assuming you followed me on every point up till now... So I'm mentioning this really to get the idea out there and see if anybody has any bright ideas, rather than as a well-formed proposal for immediate implementation. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes: > If we could log *only* the insert that caused the split, rather than the > split itself, we would avoid that situation entirely. How are you going to avoid the need to run user-defined functions (specifically, the btree comparison functions) during replay? regards, tom lane
Hi to all.

This mail is aimed at asking for some suggestions on how to approach PostgreSQL (PG) development to implement a refinement to the PG source code. I'll briefly summarize the idea of the "2-Way Replacement Selection" (2WRS) refinement, just in case. If you already remember what 2WRS is, please jump directly to the IMPLEMENTATION ISSUES part of this mail.

2WRS.
Refinement of the Replacement Selection (RS) technique currently used by PG in src/backend/utils/sort/tuplesort.c . 2WRS uses two heaps instead of just one in order to create the current (logical) run. Here are some fundamental points of the 2WRS technique:
- 'qsort' the initial unsorted 'memtuples' array
- divide the 'memtuples' array into two parts, each of which will be managed as a heap
- one of the heaps will arrange its elements in ascending order, the other heap in descending order
- each heap will spill its current root into its corresponding run (i.e. we have one run per heap), so we are actually creating 2 physical current runs
- those 2 current physical runs could "theoretically" be merged into the same logical run; actually we can make 'mergesort' think they belong to the same physical run. That reduces the number of comparisons 'mergesort' has to do at each merge step (which means less seek delay time on mass storage). We can also expect the average length of logical runs produced by 2WRS to be greater than the average length produced by RS (again less seek delay time: the longer each run, the fewer final runs produced, which means fewer comparisons at each 'mergesort' step).

IMPLEMENTATION ISSUES.
Where to place those heaps?
1) I think that both heaps could be arranged in the same 'memtuples' array. That easily allows subsequent resizing of those heaps according to their effective use or according to some heuristic, without reallocating memory.

How to arrange those heaps?
2a) It's convenient to arrange those heaps "root to root". That is, arrange them with their roots toward the center of 'memtuples' (so that they lay "face to face", or "root to root" as said before) while their leaves lay towards the extreme indexes of the 'memtuples' array (that is, the last leaf of one heap will lay at index 0, the last leaf of the other heap at index memtupsize-1). This arrangement prevents overlapping elements between the physical runs associated with the same current logical run.
PRO: once we qsort memtuples and divide it into 2 parts we already have those two heaps, no need to build them.
CONTRA: ???

2b) As in 2a) but arranging the heaps "leaf to leaf", that is, their roots will lay at the extreme indexes of 'memtuples' while their leaves point towards the center of the array. Or even start building the heaps as soon as we get initial elements, instead of qsorting the whole 'memtuples' array.
Any PRO/CONTRA compared to 2a)???

Current run numbers
I think I should duplicate the 'int currentRun' variable in the Tuplesortstate struct. I'll replace it with 'int currentRunUP' and 'int currentRunDOWN' variables in order to distinguish the two physical runs associated with the 2 heaps. In this case I will give a run number of (max{currentRunUP, currentRunDOWN} + 1) to elements not belonging to the current logical run. I suppose there is no need to touch the 'long availMem' or 'long allowedMem' variables, nor any others.

Heap functions
I will duplicate all the heap management functions in order to adapt them to the kind of heap they should be applied to (for example, the tuplesort_heap_siftup function should be replaced with tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions).

Merge Plan
This technique would use a sort of "merge plan" to instruct mergesort on how to use those physical runs. Mergesort should consider "odd runs" before "even runs". That is, for example, mergesort should start merging runs with run numbers 1,3,5,7,... and when run number X terminates, start considering run number X+1. Obviously that doesn't need any merge plan, but I remember someone else (such as Simon Riggs) was interested in sorting improvements, so it would be good to know if I should consider any conventions or parameters in order to possibly create that merge plan.

DEVELOPMENT CONTEXT
I preferred to use the "last stable release" at the moment, that is 8.2.

Any comment/suggestion/advice?

Thanks for your attention and for your time. Regards, Manolo.
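[Editor's sketch: the "two physical runs, one logical run" claim above can be demonstrated in a few lines. This is an illustrative Python sketch of the idea only, not the proposed tuplesort.c patch — it uses Python's heapq (a max-heap via negation) instead of the in-place root-to-root memtuples layout, and it ignores incremental tuple arrival.]

```python
import heapq

def two_way_runs(memtuples):
    """2WRS sketch: qsort, split into an ascending and a descending heap,
    spill each into its own physical run, and show that the two runs
    concatenate into one sorted logical run."""
    memtuples = sorted(memtuples)            # the initial qsort
    mid = len(memtuples) // 2

    up_heap = memtuples[:mid]                # lower half: min-heap, spills ascending
    heapq.heapify(up_heap)
    down_heap = [-t for t in memtuples[mid:]]  # upper half: max-heap, spills descending
    heapq.heapify(down_heap)

    run_up, run_down = [], []
    while up_heap:
        run_up.append(heapq.heappop(up_heap))
    while down_heap:
        run_down.append(-heapq.heappop(down_heap))

    # Mergesort can treat run_up followed by run_down read backwards
    # as a single pre-sorted logical run.
    return run_up, run_down, run_up + run_down[::-1]

up, down, logical = two_way_runs([5, 1, 4, 2, 8, 7, 3, 6])
assert logical == sorted([5, 1, 4, 2, 8, 7, 3, 6])
```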
On Tue, 2008-01-01 at 14:02 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > If we could log *only* the insert that caused the split, rather than the > > split itself, we would avoid that situation entirely. > > How are you going to avoid the need to run user-defined functions > (specifically, the btree comparison functions) during replay? Seems like a good objection. Just exercising my lateral thought muscles. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Tue, Jan 01, 2008 at 08:55:58PM +0000, Simon Riggs wrote: > On Tue, 2008-01-01 at 14:02 -0500, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > If we could log *only* the insert that caused the split, rather than the > > > split itself, we would avoid that situation entirely. > > > > How are you going to avoid the need to run user-defined functions > > (specifically, the btree comparison functions) during replay? > > Seems like a good objection. Just exercising my lateral thought muscles. It seems to me you should be able to manage an intermediate version where (in for example GiST) you store the output of picksplit, i.e. you describe your split as: items A,B,C went to the left and the rest went to the right page. This would allow you to reconstruct the split completely without actually having to write all the data. The only thing I'm not sure about is whether it's easy to get the relevant information to make this efficient. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
On Tue, 2008-01-01 at 22:20 +0100, Martijn van Oosterhout wrote: > On Tue, Jan 01, 2008 at 08:55:58PM +0000, Simon Riggs wrote: > > On Tue, 2008-01-01 at 14:02 -0500, Tom Lane wrote: > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > If we could log *only* the insert that caused the split, rather than the > > > > split itself, we would avoid that situation entirely. > > > > > > How are you going to avoid the need to run user-defined functions > > > (specifically, the btree comparison functions) during replay? > > > > Seems like a good objection. Just exercising my lateral thought muscles. > > It seems to me you should be able to manage an intermediate version > where (in for example GiST) you store the output of picksplit. i.e. you > describe your split as: items A,B,C went to the left and the rest went > to the right page. This would allow you to reconstruct the split > completely without actually having to write all the data. Yes, but you may have to handle recursive splits in the parent, so the problem is not fully solved by that manoeuvre. Perhaps we could do this only in cases where the parent is not split? Hmm, starting to sound too far fetched for me now. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Jan 2, 2008 2:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2008-01-01 at 14:02 -0500, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > If we could log *only* the insert that caused the split, rather than the
> > > split itself, we would avoid that situation entirely.
> >
> > How are you going to avoid the need to run user-defined functions
> > (specifically, the btree comparison functions) during replay?
>
> Seems like a good objection. Just exercising my lateral thought muscles.

Can this be explained in more detail???

Thanks,
Gokul.
On Wed, Jan 02, 2008 at 02:49:35PM +0530, Gokulakannan Somasundaram wrote: > On Jan 2, 2008 2:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Tue, 2008-01-01 at 14:02 -0500, Tom Lane wrote: > > > How are you going to avoid the need to run user-defined functions > > > (specifically, the btree comparison functions) during replay? > > > > Seems like a good objection. Just exercising my lateral thought muscles. > > Can this be explained in more detail??? If the goal is to only store the insert, then we need to determine during recovery which page the record needs to be added to. To do this you need to go through the index, which can only be done by calling user-defined functions. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
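[Editor's sketch: the objection can be made concrete with a toy descent routine. Illustrative only — the node layout and int_cmp below are invented, not PostgreSQL structures. It shows that locating the target leaf for a bare "insert" record requires calling the comparison function at every level, which is exactly the user-defined code recovery cannot safely run.]

```python
def descend(node, key, cmp):
    """Find the leaf page an insert belongs to; cmp stands in for the
    operator class's comparison support function."""
    while not node["is_leaf"]:
        # Each separator test below is a call into user-defined code.
        for sep, child in zip(node["separators"], node["children"]):
            if cmp(key, sep) < 0:
                node = child
                break
        else:
            node = node["children"][-1]
    return node

leaf_lo = {"is_leaf": True, "id": "lo"}
leaf_hi = {"is_leaf": True, "id": "hi"}
root = {"is_leaf": False, "separators": [10], "children": [leaf_lo, leaf_hi]}

int_cmp = lambda a, b: (a > b) - (a < b)     # stand-in for the opclass proc
assert descend(root, 7, int_cmp)["id"] == "lo"
assert descend(root, 42, int_cmp)["id"] == "hi"
```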
On Jan 2, 2008 3:35 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
> If the goal is to only store the insert, then we need to determine
> during recovery which page the record needs to be added to. To do this
> you need to go through the index, which can only be done by calling
> user-defined functions.

Correct me if I am wrong.
User-defined functions act as comparison functions in GiST. But when do they act as comparison functions in b-tree? I am not able to understand exactly that part.
Thanks for the explanation.

--
Thanks,
Gokul.
On Wed, Jan 02, 2008 at 04:04:48PM +0530, Gokulakannan Somasundaram wrote: > On Jan 2, 2008 3:35 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > > If the goal is to only store the insert, then we need to determine > > during recovery which page the record needs to be added to. To do this > > you need to go through the index, which can only be done by calling > > user-defined functions. > > Correct me if I am wrong. > User-defined functions act as comparison functions in GiST. But when do they > act as comparison functions in b-tree? I am not able to understand exactly > that part. All indexes are done by user-defined functions, even b-trees. People can make their own b-tree indexes by defining an operator class. Note that "user-defined" in this case means anything called via the fmgr interface. For reference, hash indexes are defined by user-defined functions as well... It's all about the interaction between Index AMs and operator classes. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
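[Editor's sketch of that dispatch path. All names here are hypothetical toy stand-ins — the real lookup goes through pg_opclass, pg_amproc and the fmgr interface — but the shape is the same: even a "builtin" b-tree reaches its comparison function through the same lookup a user-defined operator class would use.]

```python
# Toy stand-in for the opclass catalogs: opclass name -> comparison proc.
CATALOG = {
    "int4_ops": lambda a, b: (a > b) - (a < b),
    "text_ops": lambda a, b: (a > b) - (a < b),
}

def index_compare(opclass, a, b):
    """Every index comparison is a catalog lookup followed by a call
    through the function-manager interface."""
    support_proc = CATALOG[opclass]
    return support_proc(a, b)

assert index_compare("int4_ops", 1, 2) == -1
assert index_compare("text_ops", "b", "a") == 1
```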
> All indexes are done by user-defined functions, even b-trees. People can
> make their own b-tree indexes by defining an operator class. Note that
> "user-defined" in this case means anything called via the fmgr
> interface.

Again, I think I have one more wrong understanding. My understanding is:
We are discussing user-defined functions because they might actually be mutable functions that the user has classified as immutable. This might cause some problems while replaying the log. In the case of hash indexes, if the hash function is mutable, then the user has a corrupted index.
Is there some other problem with user-defined functions that would stall recovery in the proposed idea?
In our standard b-tree (with no user-defined operator classes), there shouldn't be any problem with replaying, right?
Also, shouldn't we call a b-tree with user-defined op-classes a gist-btree? If not, what is the difference between the two?
Again thanks for the explanation.
Thanks,
Gokul.
On Wed, Jan 02, 2008 at 04:46:11PM +0530, Gokulakannan Somasundaram wrote: > > All indexes are done by user-defined functions, even b-trees. People can > > make their own b-tree indexes by defining an operator class. Note that > > "user-defined" in this case means anything called via the fmgr > > interface. > > Again, I think I have one more wrong understanding. My understanding is, > We are discussing user-defined functions because they might > actually be mutable functions, but the user might have classified them as > immutable. This is where it gets a bit beyond my depth, so someone may have to correct me. The point is that during the recovery the system is not yet fully running and not everything works yet. For example, what happens if the system crashed halfway through updating a page in pg_proc and that page needs to be recovered from WAL? Yet to insert into the index you need to be able to read pg_am, pg_amproc, pg_proc at least, probably more. The point being, you can't rely on anything except WAL during recovery. > In our standard b-tree (with no user-defined operator classes), there > shouldn't be any problem with replaying, right? Even inserting into our "standard b-tree" involves looking up the operator class and associated functions and using the fmgr interface. There is no difference in the system between the builtin operator classes and "user defined" ones. > Again shouldn't we say b-tree with user-defined op-classes as gist-btree? If > not what is the difference between both? gist-btree is a nice example but IIRC it takes more disk space and can't handle uniqueness. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
On Wed, 2008-01-02 at 13:54 +0100, Martijn van Oosterhout wrote: > On Wed, Jan 02, 2008 at 04:46:11PM +0530, Gokulakannan Somasundaram wrote: > > > All indexes are done by user-defined functions, even b-trees. People can > > > make their own b-tree indexes by defining an operator class. Note that > > > "user-defined" in this case means anything called via the fmgr > > > interface. > > > > Again, I think I have one more wrong understanding. My understanding is, > > We are discussing user-defined functions because they might be > > actually be mutable functions, but the user might have classified as > > immutable. > > This is where it gets a bit beyond my depth, so someone may have to > correct me. The point is that during the recovery the system is not yet > fully running and not everything works yet. For example, what happens > if the system crashed halfway through updating a page in pg_proc and > that page needs to be recovered from WAL. Yet to insert into the index > you need to be able to read pg_am, pg_amproc, pg_proc at least, > probably more. That's right; shame I forgot this before I started the thread... -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On Jan 2, 2008 6:24 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
> This is where it gets a bit beyond my depth, so someone may have to
> correct me. The point is that during the recovery the system is not yet
> fully running and not everything works yet. For example, what happens
> if the system crashed halfway through updating a page in pg_proc and
> that page needs to be recovered from WAL. Yet to insert into the index
> you need to be able to read pg_am, pg_amproc, pg_proc at least,
> probably more.
>
> The point being, you can't rely on anything except WAL during recovery.

Thanks a lot for the nice explanation. That was something new for me....:((

--
Thanks,
Gokul.
On Wed, Jan 02, 2008 at 01:17:03PM +0000, Simon Riggs wrote: > That's right; shame I forgot this before I started the thread... Actually, I think your idea has merit, it's just not as easy as originally thought. All splits, including multiple-level splits, can be described as a sequence of "split page X with items {S} going to the left" followed by an "insert item into page X". The trick is that for multi-level splits you have to split from the top down. Then each split becomes a simple loggable operation. But, I think in the live system we split from the bottom up (the split and insert is a single operation), so I don't think you can actually combine the current split algorithm with the logging operation I suggest above. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy
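[Editor's sketch: the top-down decomposition Martijn describes — every multi-level split expressed as a sequence of "split page X with items {S} going to the left" followed by "insert item into page X" — can be shown as a tiny replay loop. Illustrative Python with an invented WAL record shape, not PostgreSQL code.]

```python
def apply(pages, op):
    """Replay one primitive operation against a dict of page -> items."""
    if op[0] == "split":            # ("split", page, left_item_set, new_page)
        _, pid, left_items, new_pid = op
        items = pages[pid]
        pages[new_pid] = [i for i in items if i not in left_items]
        pages[pid] = [i for i in items if i in left_items]
    elif op[0] == "insert":         # ("insert", page, item)
        _, pid, item = op
        pages[pid].append(item)

pages = {"root": ["A", "B", "C", "D"]}
wal = [
    ("split", "root", {"A", "B"}, "right"),  # top-down: split first
    ("insert", "right", "E"),                # then the insert that caused it
]
for op in wal:
    apply(pages, op)
assert pages == {"root": ["A", "B"], "right": ["C", "D", "E"]}
```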
You'll get better response if you don't hijack threads. :) Also, there's nothing in here that describes what the benefits of this change are.

On Jan 1, 2008, at 2:09 PM, Manolo _ wrote:
> Hi to all.
>
> This mail is aimed at asking some suggestion to face PostgreSQL
> (PG) development to implement a refinement to PG source code. I'll
> briefly summarize the idea of the "2-Way Replacement
> Selection" (2WRS) refinement, just in case.
> [...]

-- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Well, sorry for hijacking... ummm how did I do that?

Anyway I'll thank you for giving a "sign of life" when I was almost losing my hopes of getting any kind of answer from "-hackers". I suppose the lack of answers was due to the way I wrote my mail. At that moment I supposed that at least someone remembered the 2WRS technique and the possible benefits described in previous posts. I think I was wrong, so I'll write it once again, hoping meanwhile to get some suggestions on: HOWTO write a mail to which "-hackers" will give an answer :) hehehehe

Thanks for your attention. Manolo.

--------------------------------------------------
From: "Decibel!" <decibel@decibel.org>
Sent: Tuesday, January 08, 2008 12:34 AM
To: "Manolo _" <mac_man2005@hotmail.it>
Cc: <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Implementing Sorting Refinements

> You'll get better response if you don't hijack threads. :) Also, there's
> nothing in here that describes what the benefits of this change are.
>
> [...]
On Jan 8, 2008 1:04 AM, <mac_man2005@hotmail.it> wrote: > Well, sorry for hijacking... ummm how did I do that? > > Anyway I'll thank you for giving a "sign of life" when I was almost losing > my hopes to get any kind of answer from "-hackers". Don't forget that we're just a few days/weeks away from the 8.3 release, so attention is a bit focused on it at the moment (and I'm not speaking of the security releases of today). Don't feel disappointed about the lack of attention you're suffering at the moment; just post your proposal again after the 8.3 release, explain what you plan to do and why, and perhaps you'll have the time to write a mock-up and get some numbers to prove your point before that. That could help too. Keep up the good work. Regards, -- Guillaume
On Tue, 08 Jan 2008, mac_man2005@hotmail.it wrote: > Well, sorry for hijacking... ummm how did I do that? You replied to a post instead of creating a new, unrelated e-mail. It is different. Just try to use threaded mode of your e-mail client and you'll get the idea. Regards Tometzky -- ...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh