Re: qsort again (was Re: [PERFORM] Strange Create Index - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: qsort again (was Re: [PERFORM] Strange Create Index
Date
Msg-id 20060217163123.GE9254@svana.org
Whole thread Raw
In response to Re: qsort again (was Re: [PERFORM] Strange Create Index  (Scott Lamb <slamb@slamb.org>)
List pgsql-hackers
On Fri, Feb 17, 2006 at 08:18:41AM -0800, Scott Lamb wrote:
> On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:
> >Data types which could probably provide a useful function for f
> >would be
> >int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
>
> ...and with some work, floats (I think just the exponent would work,
> if nothing else). bytea. Probably just about anything.
>
> Interesting. If you abandon the idea that collisions should be
> impossible (they're not indexes) or extremely rare (they're not
> hashes), it's pretty easy to come up with a decent hint to avoid a
> lot of dereferences.

Yep, pretty much for any datatype you create a mapping function to map
it to a signed int32. All you have to guarentee is that f(a) > f(b)
implies that a > b. Only if f(a) == f(b) do you need to compare a and
b.

You then change the sorting code to have an array of (Datum,int32)
(ouch, double the storage) where the int32 is the f(Datum). And in the
comparison routines you first check the int32. If they give an order
you're done. On match you do the full comparison.

For integer types (int2,int4,int8,oid) the conversion is straight
forward. For float you'd use the exponent and the first few bits of the
mantissa. For strings you'd have to bail, or use a strxfrm equivalent.
NULL would be INT_MAX pr INT_MIN depending on where you want it. Thing
is, even if you don't have such a function and always return zero, the
results will still be right.

Not a new idea, but it would be very nice to implement. If would
produce nice speedups for type where comparisons are expensive. And
more importantly, the bulk of the comparisons can be moved inline and
make the whole cache-friendlyness discussed here much more meaningful.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgsql-hackers by date:

Previous
From: Scott Lamb
Date:
Subject: Re: qsort again (was Re: [PERFORM] Strange Create Index
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Updated email signature