Tom, Robert,
thank you.
Now it is clearer how space on tapes is recycled.
I tried to follow Robert's example, but with each tape stored in its own file.
Read in the first block of each run (each run lives on its own tape, and so in
its own file) and write the merged output to extra storage, wherever that
extra storage is.
Those first input blocks are now garbage, as before: is that correct?
If so, what happens when we try to recycle those garbage blocks to hold
the result of merging the second block of each run?
On 18/06/2010 23:29, Robert Haas wrote:
> On Fri, Jun 18, 2010 at 3:46 PM, mac_man2005@hotmail.it
> <mac_man2005@hotmail.it> wrote:
>
>> What is the difference between having more than one tape in a file and
>> having one tape per file?
>>
> It makes it easier to recycle space a little at a time. Suppose
> you're merging two runs of 100 blocks each. You read in a block from
> each run and write out two output blocks. Now that you've done that,
> the first block of each of the input runs is garbage and can be
> recycled - but if the input runs and the output run are in three
> separate files, there's no easy way to do that. You can truncate a
> file (and throw away the end) but there's no easy way to throw away
> the BEGINNING of a file. So you'll probably have to hold on to the
> entirety of both inputs until you've written the entirety of the
> output.
>
> On the other hand, suppose you have all the blocks in one big file.
> The first input run is in blocks 1-100; the second is in blocks
> 101-200. You can read blocks 1 and 101, say, and write the results to
> blocks 201 and 202, using extra storage, but only a little bit. When
> you then read blocks 2 and 102, you write the results to blocks 1 and
> 101, which are no longer needed, because you've already merged them.
> When you get done with that, blocks 2 and 102 are now no longer needed
> and can be used to write the next part of the output. Of course, you
> have to keep track of which order to reread the blocks in when the
> sort is done: 201, 202, 1, 101, ... but that's a manageable problem.
>
>
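
To make the single-file case concrete, here is a minimal sketch in Python.
It is only an illustration under assumptions of mine, not how tuplesort or
logtape actually implement it: the names TapeFile, write_run and merge_runs
are hypothetical, a "block" is just a small list of integers, and an input
block is put on the free list as soon as it has been read into memory. With
that last assumption the output can reuse freed blocks right away, so this
variant needs no extra blocks at all, whereas in Robert's example the first
two output blocks go to new blocks 201 and 202.

```python
import heapq

BLOCK = 4  # items per block (hypothetical, tiny on purpose)


class TapeFile:
    """One file shared by several logical tapes, with a free-block list."""

    def __init__(self):
        self.blocks = {}    # block number -> list of items
        self.free = []      # min-heap of block numbers that can be recycled
        self.next_new = 1   # next never-used block number (file grows here)

    def write(self, items):
        # Prefer a recycled block; extend the file only if none is free.
        if self.free:
            n = heapq.heappop(self.free)
        else:
            n = self.next_new
            self.next_new += 1
        self.blocks[n] = list(items)
        return n

    def read_and_free(self, n):
        # Once a block is in memory its on-disk copy is garbage.
        items = self.blocks.pop(n)
        heapq.heappush(self.free, n)
        return items


def write_run(f, sorted_items):
    """Store a sorted run; return the list of block numbers it occupies."""
    return [f.write(sorted_items[i:i + BLOCK])
            for i in range(0, len(sorted_items), BLOCK)]


def merge_runs(f, run_a, run_b):
    """Merge two runs; return the output run's blocks in reread order."""
    pending = [list(run_a), list(run_b)]  # blocks not yet read, per run
    bufs = [[], []]                       # one in-memory block per run
    out_blocks, out_buf = [], []

    def refill(i):
        if not bufs[i] and pending[i]:
            bufs[i] = f.read_and_free(pending[i].pop(0))

    refill(0)
    refill(1)
    while bufs[0] or bufs[1]:
        # Take the smaller head item; fall back to whichever run is left.
        if not bufs[1] or (bufs[0] and bufs[0][0] <= bufs[1][0]):
            i = 0
        else:
            i = 1
        out_buf.append(bufs[i].pop(0))
        refill(i)
        if len(out_buf) == BLOCK:
            out_blocks.append(f.write(out_buf))
            out_buf = []
    if out_buf:
        out_blocks.append(f.write(out_buf))
    return out_blocks


f = TapeFile()
run_a = write_run(f, list(range(0, 40, 2)))   # occupies blocks 1-5
run_b = write_run(f, list(range(1, 41, 2)))   # occupies blocks 6-10
out = merge_runs(f, run_a, run_b)
merged = [x for n in out for x in f.blocks[n]]
```

The list returned by merge_runs is the bookkeeping Robert mentions: the
output blocks are scattered over the recycled positions, so the order in
which to reread them has to be remembered. With one tape per file there is
no shared free list to draw from, which is exactly the case my question
above is about.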