Thread: Polyphase Merge

Polyphase Merge

From
Date:
I'm trying to refine the sorting module of tuplesort.c
 
During run creations I use two heaps instead of just one (yeah, it's still me... the one of the two heaps still trying to get some answer/help from -hackers)
 
Those two runs are built in a way such that if we would concatenate one of them to the other one red upside down, they will still form a run (recall that following Knuth, a "run" is a sorted sequence of data). There are a lot of possibility that with that refinement "logical runs" could be longer than ordinary runs built by the ordinary replacement selection. Remark we build runs: logical runs it's just a concept used to understand why we build runs that way.
 
ISSUES
a) how to distribute logical runs (that is both of its physical runs) into tapes?
b) one of the 2 physical runs of the logical run is to be red upside down while merging: how to efficiently do it?
 
Well, that's all for now.
 
Hope you can please give to me few seconds of you precious time. That would allow me to go on developing my refinement. Or at least tell me don't bother till the day next PostgreSQL release is out (when will it be released?) or don't bother anymore since nobody will ever answer to me.
 
Thanks for your attention.
Manolo.

Re: Polyphase Merge

From
Sam Mason
Date:
On Mon, Jan 21, 2008 at 07:42:24PM +0100, mac_man2005@hotmail.it wrote:
> During run creations I use two heaps instead of just one (yeah,
> it's still me... the one of the two heaps still trying to get some
> answer/help from -hackers)

Hi again!

> ISSUES
> a) how to distribute logical runs (that is both of its physical runs)
> into tapes?
> b) one of the 2 physical runs of the logical run is to be red upside
> down while merging: how to efficiently do it?

It's really up to you to find answers to these questions, especially
the first one.  Once you've designed an efficient algorithm then the
second point (which I'm interpreting as how you'd go about changing
tuplestore(?) so that things can be read in reverse order) should
just drop out as an implementation detail :) I'm guessing you'll
end up not reading the store in reverse order but arranging things
differently---it'll be interesting to see.

What does your current code look like and how have you solved it there?
If you've already written it then you'll need to be much more specific
with your questions about integrating it into PG.

 Sam


Re: Polyphase Merge

From
Tom Lane
Date:
Sam Mason <sam@samason.me.uk> writes:
> It's really up to you to find answers to these questions, especially
> the first one.  Once you've designed an efficient algorithm then the
> second point (which I'm interpreting as how you'd go about changing
> tuplestore(?) so that things can be read in reverse order) should
> just drop out as an implementation detail :) I'm guessing you'll
> end up not reading the store in reverse order but arranging things
> differently---it'll be interesting to see.

I agree --- having to read the run back from external storage, only to
write it out again with no further useful work done on it, sounds like
a guaranteed loser.  To make this work you'll need some kind of ju-jitsu
rearrangement that logically puts the run where it needs to go without
physically moving any data.
        regards, tom lane


Re: Polyphase Merge

From
Sam Mason
Date:
On Mon, Jan 21, 2008 at 04:13:32PM -0500, Tom Lane wrote:
> Sam Mason <sam@samason.me.uk> writes:
> > It's really up to you to find answers to these questions, especially
> > the first one.  Once you've designed an efficient algorithm then the
> > second point (which I'm interpreting as how you'd go about changing
> > tuplestore(?) so that things can be read in reverse order) should
> > just drop out as an implementation detail :) I'm guessing you'll
> > end up not reading the store in reverse order but arranging things
> > differently---it'll be interesting to see.
> 
> I agree --- having to read the run back from external storage, only to
> write it out again with no further useful work done on it, sounds like
> a guaranteed loser.  

Manolo's idea (wherever it came from) will generate longer runs in some
specific non-random data distributions (i.e. hopefully real life), but
it'll obviously only be a net win if this is offset by not having to do
any extra work reordering data.  It would be great if it could be got to
work!

> To make this work you'll need some kind of ju-jitsu
> rearrangement that logically puts the run where it needs to go without
> physically moving any data.

yup, that's the fun part :)

 Sam


Re: Polyphase Merge

From
Date:

--------------------------------------------------
From: "Tom Lane" <tgl@sss.pgh.pa.us>
Sent: Monday, January 21, 2008 10:13 PM
To: "Sam Mason" <sam@samason.me.uk>
Cc: <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Polyphase Merge

>
> I agree --- having to read the run back from external storage, only to
> write it out again with no further useful work done on it, sounds like
> a guaranteed loser.  To make this work you'll need some kind of ju-jitsu
> rearrangement that logically puts the run where it needs to go without
> physically moving any data.

I'm not going to write it back with no useful work on it. I should just 
write them in reverse order during run formation (ju-jitsu couldn't help me 
in this case) or read them in reverse order while merging (ju-jitsu may 
help... the point is that I'm not so good in ju-jitsu).

An idea could be managing a list of pointers to runs contained into tapes. 
Any comment?


> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>