Re: [PERFORM] A Better External Sort? - Mailing list pgsql-hackers

From Ron Peacetree
Subject Re: [PERFORM] A Better External Sort?
Date
Msg-id 6812451.1128049039096.JavaMail.root@elwamui-polski.atl.sa.earthlink.net
Whole thread Raw
In response to [PERFORM] A Better External Sort?  (Ron Peacetree <rjpeace@earthlink.net>)
List pgsql-hackers
>From: Zeugswetter Andreas DAZ SD <ZeugswetterA@spardat.at>
>Sent: Sep 29, 2005 9:28 AM
>Subject: RE: [HACKERS] [PERFORM] A Better External Sort?
>
>>In my original example, a sequential scan of the 1TB of 2KB
>>or 4KB records, => 250M or 500M records of data, being sorted
>>on a binary value key will take ~1000x more time than reading
>>in the ~1GB Btree I described that used a Key+RID (plus node
>>pointers) representation of the data.
>
>Imho you seem to ignore the final step your algorithm needs of
>collecting the data rows. After you sorted the keys the collect
>step will effectively access the tuples in random order (given a
>sufficiently large key range).
>
"Collecting" the data rows can be done for each RAM buffer full of
of data in one pass through RAM after we've built the Btree.  Then
if desired those data rows can be read out to HD in sorted order
in essentially one streaming burst.  This combination of index build
+ RAM buffer rearrangement + write results to HD can be repeat
as often as needed until we end up with an overall Btree index and
a set of sorted sublists on HD.  Overall HD IO for the process is only
two effectively sequential passes through the data.

Subsequent retrieval of the sorted information from HD can be
done at full HD streaming speed and whatever we've decided to
save to HD can be reused later if we desire.

Hope this helps,
Ron

pgsql-hackers by date:

Previous
From: Ron Peacetree
Date:
Subject: Re: [PERFORM] A Better External Sort?
Next
From: Ron Peacetree
Date:
Subject: Re: [PERFORM] A Better External Sort?