Re: Review: GiST support for UUIDs - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: Review: GiST support for UUIDs
Date
Msg-id 55F72EBF.70608@sigaev.ru
In response to Re: Review: GiST support for UUIDs  (Paul Jungwirth <pj@illuminatedcomputing.com>)
Responses Re: Review: GiST support for UUIDs  (Paul Jungwirth <pj@illuminatedcomputing.com>)
List pgsql-hackers

Paul Jungwirth wrote:
>> 2)
>>      static double
>>      uuid2num(const pg_uuid_t *i)
>>      {
>>          return *((uint64 *)i);
>>      }
>>     This doesn't look like a correct transformation to me. Maybe it's
>>     better to transform to the numeric type (a UUID looks like a 16-digit
>>     hexadecimal number) and follow the gbt_numeric_penalty() logic (or
>>     even call it directly).
>
> Thanks for the review! A UUID is actually not stored as a string of
> hexadecimal digits. (It is normally displayed that way, but with 32
> digits, not 16.) Rather it is stored as an unstructured 128-bit value
> (which in C is 16 unsigned chars). Here is the easy-to-misread
> declaration from src/backend/utils/adt/uuid.c:
I got the number of digits wrong, but that doesn't matter for the idea.
The original code uses only 8 of the 16 bytes to compute the penalty, which
could cause a problem with index performance. A simple way is to print each
group of 4 bits with the %02d modifier into a string and then make a numeric
value with the help of numeric_in.
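
A minimal stand-alone sketch of the string-building step described above, assuming a plain 16-byte array in place of pg_uuid_t. The function name uuid_to_decimal_string and the buffer-size macro are hypothetical, and the final numeric_in call is left out because it needs the backend environment. Each 4-bit group becomes a fixed-width two-digit field, so the digit string keeps the byte-wise ordering of the UUID, although the distances between values are stretched compared with a true base-16 conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UUID_LEN 16
/* 32 groups of 4 bits, two decimal digits each, plus the terminating NUL */
#define UUID_DEC_BUFLEN (UUID_LEN * 2 * 2 + 1)

/*
 * Hypothetical helper: write each 4-bit group of the UUID as a two-digit
 * decimal number ("00" .. "15"), most significant group first.
 */
static void
uuid_to_decimal_string(const unsigned char data[UUID_LEN],
                       char out[UUID_DEC_BUFLEN])
{
    size_t      pos = 0;

    assert(data != NULL && out != NULL);

    for (int i = 0; i < UUID_LEN; i++)
    {
        int         hi = data[i] >> 4;      /* upper 4 bits of the byte */
        int         lo = data[i] & 0x0F;    /* lower 4 bits of the byte */

        /* writes four digits and a NUL; the NUL is overwritten next round */
        snprintf(out + pos, UUID_DEC_BUFLEN - pos, "%02d%02d", hi, lo);
        pos += 4;
    }
}
```

The resulting 64-character string is what would then be handed to numeric_in in the backend.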

Or something like this in pseudocode:

numeric = int8_numeric(*(uint64 *)(&i->data[0])) * 
int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(&i->data[8]))
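
For comparison, here is a stand-alone sketch of the same "high half times a constant plus low half" arithmetic done exactly in a 128-bit integer. It assumes the compiler provides unsigned __int128 (a GCC/Clang extension, not standard C and not something the backend can rely on everywhere), and it uses 2^64 as the multiplier, which is my reading of the intent rather than what the pseudocode literally says. The name combine_halves is hypothetical.

```c
#include <stdint.h>

/* GCC/Clang extension; assumed to be available for this sketch only */
typedef unsigned __int128 uint128;

/*
 * Combine the two 64-bit halves into one exact 128-bit value:
 * hi * 2^64 + lo.
 */
static uint128
combine_halves(uint64_t hi, uint64_t lo)
{
    return ((uint128) hi << 64) | lo;
}
```

With this layout any difference in the high half outweighs every possible value of the low half, which is the property the numeric version is after.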

> The only other 128-bit type I found in btree_gist was Interval. For that
> type we convert to a double using INTERVAL_TO_SEC, then call
> penalty_num. By my read that accepts a similar loss of precision.
Right, but the precision of a double is enough to represent a 1-century
interval with 0.00001-second accuracy, which is enough for practical
usage. In the UUID case you take only half of the value into account. Of
course, GiST will work even with a penalty function that returns a constant,
but then each scan could become a full index scan.
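
To make the "half of the value" concern concrete, here is a stand-alone sketch with a stand-in for the reviewed uuid2num(). The name first_half_as_double is hypothetical, and the bytes are read big-endian so the result does not depend on the host, whereas the cast in the patch reads them in host byte order.

```c
#include <stdint.h>

#define UUID_LEN 16

/*
 * Stand-in for the reviewed uuid2num(): only the first 8 bytes take part,
 * so bytes 8..15 have no influence on the result.
 */
static double
first_half_as_double(const unsigned char data[UUID_LEN])
{
    uint64_t    v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | data[i];     /* big-endian accumulation */

    return (double) v;
}
```

Two UUIDs that share their first 8 bytes map to the same number here, so a penalty computed from these values cannot tell them apart.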

>
> If I'm mistaken about 128-bit integer support, let me know, and maybe we
> can do the penalty computation on the whole UUID. Or maybe I should just
> convert the uint64 to a double before calling penalty_num? I don't
> completely understand what the penalty calculation is all about, so I
> welcome suggestions here.

The penalty method calculates how much the union key will be enlarged if the
insert goes into the current subtree. It directly affects the selectivity of
the subtree.
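
A simplified sketch of that idea for one-dimensional range keys. The RangeKey type and range_penalty function are hypothetical and are not the penalty_num code from btree_gist; they only show the "how far would the union key grow" measure on plain doubles.

```c
/* Hypothetical union key of a subtree: covers [lower, upper] */
typedef struct
{
    double      lower;
    double      upper;
} RangeKey;

/*
 * Return how much the subtree's key must grow to also cover "add".
 * Zero means the new entry already fits inside the existing key.
 */
static double
range_penalty(RangeKey orig, RangeKey add)
{
    double      penalty = 0.0;

    if (add.lower < orig.lower)
        penalty += orig.lower - add.lower;  /* growth on the low side */
    if (add.upper > orig.upper)
        penalty += add.upper - orig.upper;  /* growth on the high side */

    return penalty;
}
```

An insert picks the subtree with the smallest penalty. If every candidate returns the same value, the choice is effectively arbitrary, the union keys end up overlapping, and scans have to visit many subtrees.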

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/


