Home > mailing lists

Re: [PERFORM] A Better External Sort? - Mailing list pgsql-hackers

From	Ron Peacetree
Subject	Re: [PERFORM] A Better External Sort?
Date	September 30, 2005 13:08:05
Msg-id	6812451.1128049039096.JavaMail.root@elwamui-polski.atl.sa.earthlink.net Whole thread Raw
In response to	[PERFORM] A Better External Sort? (Ron Peacetree <rjpeace@earthlink.net>)
List	pgsql-hackers

Tree view

>From: Zeugswetter Andreas DAZ SD <ZeugswetterA@spardat.at>
>Sent: Sep 29, 2005 9:28 AM
>Subject: RE: [HACKERS] [PERFORM] A Better External Sort?
>
>>In my original example, a sequential scan of the 1TB of 2KB
>>or 4KB records, => 250M or 500M records of data, being sorted
>>on a binary value key will take ~1000x more time than reading
>>in the ~1GB Btree I described that used a Key+RID (plus node
>>pointers) representation of the data.
>
>Imho you seem to ignore the final step your algorithm needs of
>collecting the data rows. After you sorted the keys the collect
>step will effectively access the tuples in random order (given a
>sufficiently large key range).
>
"Collecting" the data rows can be done for each RAM buffer full of
of data in one pass through RAM after we've built the Btree.  Then
if desired those data rows can be read out to HD in sorted order
in essentially one streaming burst.  This combination of index build
+ RAM buffer rearrangement + write results to HD can be repeat
as often as needed until we end up with an overall Btree index and
a set of sorted sublists on HD.  Overall HD IO for the process is only
two effectively sequential passes through the data.

Subsequent retrieval of the sorted information from HD can be
done at full HD streaming speed and whatever we've decided to
save to HD can be reused later if we desire.

Hope this helps,
Ron

pgsql-hackers by date:

From: Ron Peacetree
Date: 30 September 2005, 13:03:32
Subject: Re: [PERFORM] A Better External Sort?

From: Ron Peacetree
Date: 30 September 2005, 13:09:51
Subject: Re: [PERFORM] A Better External Sort?

Re: [PERFORM] A Better External Sort? - Mailing list pgsql-hackers

Previous

Next