Re: Question about maxTapes & selectnewtape & dumptuples - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Question about maxTapes & selectnewtape & dumptuples
Date
Msg-id 72fe610e-ecfb-48d4-b966-083c5bd20c9e@iki.fi
Whole thread Raw
In response to Question about maxTapes & selectnewtape & dumptuples  (Andy Fan <zhihuifan1213@163.com>)
Responses Re: Question about maxTapes & selectnewtape & dumptuples  (Andy Fan <zhihuifan1213@163.com>)
List pgsql-hackers
On 30/06/2024 12:48, Andy Fan wrote:
> merge sorts requires all the tuples in each input are pre-sorted (1),
> and in tuplesort.c, when the workmem is full, we dumptuples into
> destTape. (2).  it looks to me that we need a unlimited number of Tapes
> if the number of input tuples is big enough. However we set a maxTapes
> in 'inittapes' and we may use a pre-used tapes for storing new tuples in
> 'selectnewtape'. Wouldn't this break the prerequisite (1)?
> 
> selectnewtape(Tuplesortstate *state)
> {
>     if (state->nOutputTapes < state->maxTapes)
>     {..}
>      else
>      {
>         /*
>          * We have reached the max number of tapes.  Append to an existing
>          * tape.
>          */
>         state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes];
>         state->nOutputRuns++;
>     }
> }
> 
> 
> for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7),
> and later (2, 4, 6, 8) are put into it.  so the overall of (1, 3, 5, 7,
> 2, 4, 6, 8) are not sorted? Where did I go wrong?

There's a distinction between "runs" and "tapes". Each "run" is sorted, 
but there can be multiple runs on a tape. In that case, multiple merge 
passes are needed. Note the call to markrunend() between the runs. In 
your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4, 
6, 8).

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Question about maxTapes & selectnewtape & dumptuples
Next
From: "Joel Jacobson"
Date:
Subject: Re: Optimize numeric.c mul_var() using the Karatsuba algorithm