Thread: Polyphase Merge
I'm trying to refine the sorting module of tuplesort.c
During run creations I use two heaps instead of just one (yeah, it's still me... the one of the two heaps still trying to get some answer/help from -hackers)
Those two runs are built in a way such that if we would concatenate one of them to the other one red upside down, they will still form a run (recall that following Knuth, a "run" is a sorted sequence of data). There are a lot of possibility that with that refinement "logical runs" could be longer than ordinary runs built by the ordinary replacement selection. Remark we build runs: logical runs it's just a concept used to understand why we build runs that way.
ISSUES
a) how to distribute logical runs (that is both of its physical runs) into tapes?
b) one of the 2 physical runs of the logical run is to be red upside down while merging: how to efficiently do it?
Well, that's all for now.
Hope you can please give to me few seconds of you precious time. That would allow me to go on developing my refinement. Or at least tell me don't bother till the day next PostgreSQL release is out (when will it be released?) or don't bother anymore since nobody will ever answer to me.
Thanks for your attention.
Manolo.
On Mon, Jan 21, 2008 at 07:42:24PM +0100, mac_man2005@hotmail.it wrote: > During run creations I use two heaps instead of just one (yeah, > it's still me... the one of the two heaps still trying to get some > answer/help from -hackers) Hi again! > ISSUES > a) how to distribute logical runs (that is both of its physical runs) > into tapes? > b) one of the 2 physical runs of the logical run is to be red upside > down while merging: how to efficiently do it? It's really up to you to find answers to these questions, especially the first one. Once you've designed an efficient algorithm then the second point (which I'm interpreting as how you'd go about changing tuplestore(?) so that things can be read in reverse order) should just drop out as an implementation detail :) I'm guessing you'll end up not reading the store in reverse order but arranging things differently---it'll be interesting to see. What does your current code look like and how have you solved it there? If you've already written it then you'll need to be much more specific with your questions about integrating it into PG. Sam
Sam Mason <sam@samason.me.uk> writes: > It's really up to you to find answers to these questions, especially > the first one. Once you've designed an efficient algorithm then the > second point (which I'm interpreting as how you'd go about changing > tuplestore(?) so that things can be read in reverse order) should > just drop out as an implementation detail :) I'm guessing you'll > end up not reading the store in reverse order but arranging things > differently---it'll be interesting to see. I agree --- having to read the run back from external storage, only to write it out again with no further useful work done on it, sounds like a guaranteed loser. To make this work you'll need some kind of ju-jitsu rearrangement that logically puts the run where it needs to go without physically moving any data. regards, tom lane
On Mon, Jan 21, 2008 at 04:13:32PM -0500, Tom Lane wrote: > Sam Mason <sam@samason.me.uk> writes: > > It's really up to you to find answers to these questions, especially > > the first one. Once you've designed an efficient algorithm then the > > second point (which I'm interpreting as how you'd go about changing > > tuplestore(?) so that things can be read in reverse order) should > > just drop out as an implementation detail :) I'm guessing you'll > > end up not reading the store in reverse order but arranging things > > differently---it'll be interesting to see. > > I agree --- having to read the run back from external storage, only to > write it out again with no further useful work done on it, sounds like > a guaranteed loser. Manolo's idea (wherever it came from) will generate longer runs in some specific non-random data distributions (i.e. hopefully real life), but it'll obviously only be a net win if this is offset by not having to do any extra work reordering data. It would be great if it could be got to work! > To make this work you'll need some kind of ju-jitsu > rearrangement that logically puts the run where it needs to go without > physically moving any data. yup, that's the fun part :) Sam
-------------------------------------------------- From: "Tom Lane" <tgl@sss.pgh.pa.us> Sent: Monday, January 21, 2008 10:13 PM To: "Sam Mason" <sam@samason.me.uk> Cc: <pgsql-hackers@postgresql.org> Subject: Re: [HACKERS] Polyphase Merge > > I agree --- having to read the run back from external storage, only to > write it out again with no further useful work done on it, sounds like > a guaranteed loser. To make this work you'll need some kind of ju-jitsu > rearrangement that logically puts the run where it needs to go without > physically moving any data. I'm not going to write it back with no useful work on it. I should just write them in reverse order during run formation (ju-jitsu couldn't help me in this case) or read them in reverse order while merging (ju-jitsu may help... the point is that I'm not so good in ju-jitsu). An idea could be managing a list of pointers to runs contained into tapes. Any comment? > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >