Thread: Question about maxTapes & selectnewtape & dumptuples

Question about maxTapes & selectnewtape & dumptuples

From
Andy Fan
Date:
Hi,

merge sorts requires all the tuples in each input are pre-sorted (1),
and in tuplesort.c, when the workmem is full, we dumptuples into
destTape. (2).  it looks to me that we need a unlimited number of Tapes
if the number of input tuples is big enough. However we set a maxTapes
in 'inittapes' and we may use a pre-used tapes for storing new tuples in
'selectnewtape'. Wouldn't this break the prerequisite (1)? 

selectnewtape(Tuplesortstate *state)
{
    if (state->nOutputTapes < state->maxTapes)
    {..}
    else
    {
        /*
         * We have reached the max number of tapes.  Append to an existing
         * tape.
         */
        state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes];
        state->nOutputRuns++;
    }
}


for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7),
and later (2, 4, 6, 8) are put into it.  so the overall of (1, 3, 5, 7,
2, 4, 6, 8) are not sorted? Where did I go wrong?

-- 
Best Regards
Andy Fan




Re: Question about maxTapes & selectnewtape & dumptuples

From
Heikki Linnakangas
Date:
On 30/06/2024 12:48, Andy Fan wrote:
> merge sorts requires all the tuples in each input are pre-sorted (1),
> and in tuplesort.c, when the workmem is full, we dumptuples into
> destTape. (2).  it looks to me that we need a unlimited number of Tapes
> if the number of input tuples is big enough. However we set a maxTapes
> in 'inittapes' and we may use a pre-used tapes for storing new tuples in
> 'selectnewtape'. Wouldn't this break the prerequisite (1)?
> 
> selectnewtape(Tuplesortstate *state)
> {
>     if (state->nOutputTapes < state->maxTapes)
>     {..}
>      else
>      {
>         /*
>          * We have reached the max number of tapes.  Append to an existing
>          * tape.
>          */
>         state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes];
>         state->nOutputRuns++;
>     }
> }
> 
> 
> for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7),
> and later (2, 4, 6, 8) are put into it.  so the overall of (1, 3, 5, 7,
> 2, 4, 6, 8) are not sorted? Where did I go wrong?

There's a distinction between "runs" and "tapes". Each "run" is sorted, 
but there can be multiple runs on a tape. In that case, multiple merge 
passes are needed. Note the call to markrunend() between the runs. In 
your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4, 
6, 8).

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Question about maxTapes & selectnewtape & dumptuples

From
Andy Fan
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:

> On 30/06/2024 12:48, Andy Fan wrote:

>> for example, at the first use of outputTapes[x], it stores (1, 3, 5,
>> 7),
>> and later (2, 4, 6, 8) are put into it.  so the overall of (1, 3, 5, 7,
>> 2, 4, 6, 8) are not sorted? Where did I go wrong?
>
> There's a distinction between "runs" and "tapes". Each "run" is sorted,
> but there can be multiple runs on a tape. In that case, multiple merge
> passes are needed. Note the call to markrunend() between the runs. In
> your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4,
> 6, 8).

I see, Thank you for your answer!

-- 
Best Regards
Andy Fan