Home > mailing lists

Re: qsort again (was Re: [PERFORM] Strange Create - Mailing list pgsql-hackers

From	Martijn van Oosterhout
Subject	Re: qsort again (was Re: [PERFORM] Strange Create
Date	February 16, 2006 13:00:06
Msg-id	20060216165959.GH26127@svana.org Whole thread Raw
In response to	Re: qsort again (was Re: [PERFORM] Strange Create (Ron <rjpeace@earthlink.net>)
List	pgsql-hackers

Tree view

On Thu, Feb 16, 2006 at 11:32:55AM -0500, Ron wrote:
> At 10:52 AM 2/16/2006, Ron wrote:
> >In fact we can do better.
> >Using hash codes or what-not to map datums to keys and then sorting
> >just the keys and the pointers to those datums followed by an
> >optional final pass where we do the actual data movement is also a
> >standard technique for handling large data structures.

Or in fact required if the Datums are not all the same size, which is
the case in PostgreSQL.

> I thought some follow up might be in order here.
>
> Let's pretend that we have the typical DB table where rows are ~2-4KB
> apiece.  1TB of storage will let us have 256M-512M rows in such a table.
>
> A 32b hash code can be assigned to each row value such that only
> exactly equal rows will have the same hash code.
> A 32b pointer can locate any of the 256M-512M rows.

That hash code is impossible the way you state it, since the set of
strings is not mappable to a 32bit integer. You probably meant that a
hash code can be assigned such that equal rows have equal hashes (drop
the only).

> Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b=
> 64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an optional
> pass to rearrange the actual rows if we so wish.
>
> We get the same result while only examining and manipulating 1/50 to
> 1/25 as much data during the sort.

But this is what we do now. The tuples are loaded, we sort an array of
pointers, then we write the output. Except we don't have the hash, so
we require access to the 1TB of data to do the actual comparisons. Even
if we did have the hash, we'd *still* need access to the data to handle
tie-breaks.

That's why your comment about moves always being more expensive than
compares makes no sense. A move can be acheived simply by swapping two
pointers in the array. A compare actually needs to call all sorts of
functions. If and only if we have functions for every data type to
produce an ordered hash, we can optimise sorts based on single
integers.

For reference, look at comparetup_heap(). It's just 20 lines, but each
function call there expands to maybe a dozen lines of code. And it has
a loop. I don't think we're anywhere near the stage where locality of
reference makes much difference.

We very rarely needs the tuples actualised in memory in the required
order, just the pointers are enough.

Have a ncie day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgsql-hackers by date:

From: Ron
Date: 16 February 2006, 12:33:37
Subject: Re: qsort again (was Re: [PERFORM] Strange Create

From: Tom Lane
Date: 16 February 2006, 13:15:17
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index

Re: qsort again (was Re: [PERFORM] Strange Create - Mailing list pgsql-hackers

Previous

Next