Re: Review: GiST support for UUIDs - Mailing list pgsql-hackers

From Ildus Kurbangaliev
Subject Re: Review: GiST support for UUIDs
Date
Msg-id 20151223191053.00ab94e1@lp
Whole thread Raw
In response to Re: Review: GiST support for UUIDs  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: Review: GiST support for UUIDs  (Paul Jungwirth <pj@illuminatedcomputing.com>)
List pgsql-hackers
On Tue, 15 Sep 2015 09:03:00 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> Paul Jungwirth wrote:
> >> Or something like this in pseudocode:
> >>
> >> numeric = int8_numeric(*(uint64 *)(&i->data[0])) *
> >> int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(&i->data[8]))
> >
> > This is more like what I was hoping for, rather than converting to a
> > string and back. Would you mind confirming for me: int8_numeric
> > turns an int64 into an arbitrary-precision numeric Datum? So there
> > is no overflow risk here?
> Sure, no risk. Numeric precision is limited 1000 digits with
> magnitude 1000
>
>
> >
> > But it looks like int8_numeric takes a *signed* integer. Isn't that
> > a problem? I suppose I could get it working though by jumping
> > through some hoops.
> signed vs unsigned problem does not exist actually, because of
> precision of numeric is much better than we need and presence of
> numeric_abs.
>
>
> > Yes, but that seems like an unrealistic concern. Even "only" 2^64 is
> > 18446744073709551616 records. Or another way of thinking about it,
> > if 64
> :) "only"
>
> > bits (or 32) is a good enough penalty input for all the other data
> > types, why not for UUIDs? Keep in mind type 3-5 UUIDs should be
> > evenly distributed. Perhaps we could use the bottom half (instead
> > of the top) to ensure even distribution for type 1 and 2 too.
> it must be. But UUID could be taken for unknown source and we can't
> predict distribution. I believe pg generates them correctly, but
> other generators could be not so good.
>
> > It seems to me that using only the top half should be okay, but if
> > you believe it's not I'll go with the int8_numeric approach (in
> > three chunks instead of two to work around signed-vs-unsigned).
> Yes, I believe. It is not good case when we can ruin index
> performance with special set of value.
>
> Some difficulty  which I see is how to transform numeric penalty to
> double as it requires by GiST. May be, penalty/(INT_MAX64*INT_MAX64 +
> INT_MAX64)? Keep in mind, that penalty is how range will be enlarged
> after range union, so minimal penalty is 0 and maximum is
> 0xffffffffffffffffffffffffffffffff
>

There is a more improved version of the patch. Main idea is to present
uuid as two uint64 values, and make comparisons and penalty calculation
based on these values. This approach is much faster than using memcmp
for uuid comparisons.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PROPOSAL] Backup and recovery of pg_statistic
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench --latency-limit option