Thread: GiST support for UUIDs

GiST support for UUIDs

From
Paul A Jungwirth
Date:
Hello,

I'm interested in adding GiST support for the UUID column type from
the uuid-ossp extension. This has been requested and attempted before:
   http://dba.stackexchange.com/questions/83604/optimizing-postgres-row-overlap-constraints-involving-uuids-and-gist
http://www.postgresql.org/message-id/flat/CAH3i69njJ9AX1fzHwt6uoUzCMBqnaDBwhmAPhRQzzLWifb2WOA@mail.gmail.com#CAH3i69njJ9AX1fzHwt6uoUzCMBqnaDBwhmAPhRQzzLWifb2WOA@mail.gmail.com

I've used Postgres for a long time, but I've only dabbled a bit in the
source code. So I'm curious where this change would go? The btree_gist
extension? The uuid-ossp extension? Somewhere else?

If anyone has any advice I'd be grateful to hear it.

Thanks,
Paul



Re: GiST support for UUIDs

From
Tom Lane
Date:
Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> I'm interested in adding GiST support for the UUID column type from
> the uuid-ossp extension. This has been requested and attempted before:
> I've used Postgres for a long time, but I've only dabbled a bit in the
> source code. So I'm curious where this change would go? The btree_gist
> extension? The uuid-ossp extension? Somewhere else?

btree_gist, I'd think, assuming you are only thinking of btree-equivalent
semantics and not anything more outre.  uuid-ossp is only concerned with
UUID generation algorithms, not with storage or indexing.  btree_gist
already has a bunch of infrastructure for this type of GiST opclass,
so it should be pretty straightforward to add it there (at least up to
making it work; the overhead of an extension version bump will probably
exceed the useful payload :-( ).
        regards, tom lane



Re: GiST support for UUIDs

From
Paul A Jungwirth
Date:
On Thu, Jun 25, 2015 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
>> I'm interested in adding GiST support for the UUID column type
>> . . . . So I'm curious where this change would go?
> btree_gist, I'd think

Okay, thank you for your answer! I was worried about the effects of
having btree_gist depend on uuid-ossp. People won't have to say
`CREATE EXTENSION "uuid-ossp"` if they only want `btree_gist`, right?

> the overhead of an extension version bump will probably
> exceed the useful payload :-(

Sorry to put more work on your plate. :-) I'm trying to pick something
easy to get my feet wet.

Yours,
Paul



Re: GiST support for UUIDs

From
Tom Lane
Date:
Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> On Thu, Jun 25, 2015 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
>>> I'm interested in adding GiST support for the UUID column type
>>> . . . . So I'm curious where this change would go?

>> btree_gist, I'd think

> Okay, thank you for your answer! I was worried about the effects of
> having btree_gist depend on uuid-ossp. People won't have to say
> `CREATE EXTENSION "uuid-ossp"` if they only want `btree_gist`, right?

No, the uuid type exists in core.  It's only some creation functions that
are in that extension.

>> the overhead of an extension version bump will probably
>> exceed the useful payload :-(

> Sorry to put more work on your plate. :-) I'm trying to pick something
> easy to get my feet wet.

That work will be on your plate actually ;-).  Read the docs concerning
creation of new extension versions, and perhaps look in our git history
for previous commits that have upgraded contrib extensions.
        regards, tom lane



Re: GiST support for UUIDs

From
Paul A Jungwirth
Date:
Hello,

This patch adds UUID support to the `btree_gist` extension. I've also
submitted it to the September commitfest (hopefully correctly!)

Many people are looking for this feature, e.g.:

    http://dba.stackexchange.com/questions/83604/optimizing-postgres-row-overlap-constraints-involving-uuids-and-gist
    http://stackoverflow.com/questions/22720130/how-to-use-uuid-with-postgresql-gist-index-type
    http://www.postgresql.org/message-id/CAH3i69njJ9AX1fzHwt6uoUzCMBqnaDBwhmAPhRQzzLWifb2WOA@mail.gmail.com
    http://www.postgresql.org/message-id/C59F2565-4753-4C83-BDCD-A0F9430B1638@datafax.com

As seen from those links, indexing UUIDs in a GiST is necessary when
including a UUID as part of an exclusion constraint. Here is my note
to the HACKERS list proposing this addition:

    http://www.postgresql.org/message-id/CA+renyVepHxTO1c7dFbVjP1GYMUc0-3qDNWPN30-noo5MPyaVQ@mail.gmail.com

As discussed there, my changes are restricted to `contrib/btree_gist`.
I pretty much did a copy/paste of the code for intervals, since like
UUIDs they are 16 bytes. In some cases I could simplify, since
sometimes intervals can be less than 16 bytes, but not UUIDs.

This patch also includes tests and new files to bump the extension
version from 1.1 to 1.2.

One possible wart is that because the pg_uuid_t struct is defined in
uuid.c and hence invisible here, I had to repeat its definition (so
that I could include it in the GiST key struct). Alternately I could
move the definition in the core code into uuid.h, but my goal was to
touch only code in contrib. Let me know if you'd rather I made the
struct definition public and I can make that change.

This is my first patch, so my apologies if anything is missing. I went
the guidelines and I think I have everything covered. :-)

Thanks!
Paul Jungwirth

On Thu, Jun 25, 2015 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
>> On Thu, Jun 25, 2015 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
>>>> I'm interested in adding GiST support for the UUID column type
>>>> . . . . So I'm curious where this change would go?
>
>>> btree_gist, I'd think
>
>> Okay, thank you for your answer! I was worried about the effects of
>> having btree_gist depend on uuid-ossp. People won't have to say
>> `CREATE EXTENSION "uuid-ossp"` if they only want `btree_gist`, right?
>
> No, the uuid type exists in core.  It's only some creation functions that
> are in that extension.
>
>>> the overhead of an extension version bump will probably
>>> exceed the useful payload :-(
>
>> Sorry to put more work on your plate. :-) I'm trying to pick something
>> easy to get my feet wet.
>
> That work will be on your plate actually ;-).  Read the docs concerning
> creation of new extension versions, and perhaps look in our git history
> for previous commits that have upgraded contrib extensions.
>
>                         regards, tom lane

Attachment

Re: GiST support for UUIDs

From
Michael Paquier
Date:
On Wed, Aug 19, 2015 at 1:25 PM, Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> This is my first patch, so my apologies if anything is missing. I went
> the guidelines and I think I have everything covered. :-)

I am moving this patch to next CF, removing Julien Rouhaud and Teodor
Sigaev as reviewers because they have not showed up once on this
thread for reviews.
-- 
Michael



Re: GiST support for UUIDs

From
Adam Brusselback
Date:
So I apologize in advance if I didn't follow the processes exactly, I was going to attempt to review this to move it along, but ran into issues applying the patch cleanly to master.  I fixed the issues I was having applying it, and created a new patch (attached).

Managed to test it out after I got it applied, and it is working as expected for exclusion constraints, as well as normal indexes.  

This test was performed on Windows 10 (64 bit), and Postgres was compiled using MSYS2.
Attachment

Re: GiST support for UUIDs

From
Haribabu Kommi
Date:


On Wed, Nov 2, 2016 at 3:49 AM, Adam Brusselback <adambrusselback@gmail.com> wrote:
So I apologize in advance if I didn't follow the processes exactly, I was going to attempt to review this to move it along, but ran into issues applying the patch cleanly to master.  I fixed the issues I was having applying it, and created a new patch (attached).

Managed to test it out after I got it applied, and it is working as expected for exclusion constraints, as well as normal indexes.  

This test was performed on Windows 10 (64 bit), and Postgres was compiled using MSYS2
 
Thanks for the review. The proposed patch still applies cleanly. If you don't have any 
comments on the patch, please change the patch status as "ready for committer" for
committer's attention. This will help us in smoother operation of commitfest.


Regards,
Hari Babu
Fujitsu Australia

Re: GiST support for UUIDs

From
Tom Lane
Date:
Adam Brusselback <adambrusselback@gmail.com> writes:
> [ btree_gist_uuid_7.patch ]

I spent awhile looking at this.  I have exactly no faith that it won't
crash on alignment-picky hardware, because this declaration:

union pg_uuid_t
{unsigned char data[UUID_LEN];uint64 v64[2];
};

is not the same as the existing pg_uuid_t; it instructs the compiler
that union pg_uuid_t must be double-aligned.  The existing type is
only byte-aligned, and is declared that way in pg_type, which means
that values on-disk are quite likely not to meet the alignment
expectation.

I spent a little bit of time trying to get the patch to crash on a PPC
machine, without success, but the compiler I have on that (gcc 4.0.1) is
very old and is not aggressive about things like optimizing memcpy calls
on supposedly-aligned arguments.  I think a modern gcc version on such
hardware would probably generate code that fails.  (Also, PPC is
big-endian, so some of the at-risk code isn't used; a picky little-endian
machine would have more scope for crashing.  Not real sure, but I think
ARM might be like that.)

What I would suggest is that you forget the union hack and just use
memcmp in all the comparison functions.  It's not likely to be worth
the trouble to try to get those calls to be safely aligned.  The
only place where you need go to any trouble is in uuid_parts_distance,
which could be redesigned to memcpy a not-necessarily-aligned source
into a local uint64[2] array and then byte-swap if needed.

(BTW, considering that we're going to return a float distance that has
only got twenty-something significant bits, do we *really* need to deal
with do-it-yourself int128 arithmetic in uuid_parts_distance?  Color
me skeptical.)

Also, I can't say that I think it's good style to be duplicating the
declaration of pg_uuid_t out of uuid.c.  (And it absolutely, positively,
is vile style to have two different declarations of the same struct in
different files, quite aside from whether they're ABI-compatible or not.)

I don't entirely see the point of making pg_uuid_t an opaque struct when
the only interesting thing about it, namely UUID_LEN, is exposed anyway.
So my inclination would be to just do

typedef struct pg_uuid_t
{unsigned char data[UUID_LEN];
} pg_uuid_t;

in uuid.h and forget the idea of it being opaque.

Also, the patch could be made a good deal smaller (and easier to review)
in the wake of commit 40b449ae8: you no longer need to convert
btree_gist--1.2.sql into btree_gist--1.3.sql, just leave it alone and add
btree_gist--1.2--1.3.sql.  That way we don't have to worry about whether
the upgrade script matches the change in the base script.
        regards, tom lane



Re: GiST support for UUIDs

From
Chris Bandy
Date:
On Mon, Nov 28, 2016 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> What I would suggest is that you forget the union hack and just use
> memcmp in all the comparison functions.  It's not likely to be worth
> the trouble to try to get those calls to be safely aligned.  The
> only place where you need go to any trouble is in uuid_parts_distance,
> which could be redesigned to memcpy a not-necessarily-aligned source
> into a local uint64[2] array and then byte-swap if needed.

Done. I only have one architecture to test on (Linux, Intel x86_64) so
I must defer to others in this regard.

> I don't entirely see the point of making pg_uuid_t an opaque struct when
> the only interesting thing about it, namely UUID_LEN, is exposed anyway.
> So my inclination would be to just do
>
> typedef struct pg_uuid_t
> {
>         unsigned char data[UUID_LEN];
> } pg_uuid_t;
>
> in uuid.h and forget the idea of it being opaque.

Done.

> Also, the patch could be made a good deal smaller (and easier to review)
> in the wake of commit 40b449ae8: you no longer need to convert
> btree_gist--1.2.sql into btree_gist--1.3.sql, just leave it alone and add
> btree_gist--1.2--1.3.sql.  That way we don't have to worry about whether
> the upgrade script matches the change in the base script.

Done.


-- Chris

Attachment

Re: GiST support for UUIDs

From
Tom Lane
Date:
Chris Bandy <bandy.chris@gmail.com> writes:
> [ btree_gist_uuid_8.patch ]

Um ... is there a reason why the penalty logic in gbt_uuid_penalty()
is completely unlike that for any other btree_gist module?

As best I can tell from the (admittedly documentation-free) code
elsewhere, the penalty ought to be proportional to the fraction
by which the original range is expanded; that's not what this
code is doing.  It also seems to be missing the machinations related
to scaling per-column results in a multi-column index.

I'm kind of inclined to change uuid_parts_distance to just convert
a given pg_uuid_t to "double" and then apply penalty_num(), as is
done in gbt_macad_penalty.
        regards, tom lane



Re: GiST support for UUIDs

From
Tom Lane
Date:
I wrote:
> I'm kind of inclined to change uuid_parts_distance to just convert
> a given pg_uuid_t to "double" and then apply penalty_num(), as is
> done in gbt_macad_penalty.

Pushed with that change and some other mostly-cosmetic tweaking.

One not too cosmetic fix was that gbt_uuid_union was declared with the
wrong return type.  That's probably mostly harmless given that core GiST
pays little attention to the declared signatures of the support functions,
but it's not a good thing.  This would've been caught if anyone had
thought to run the amvalidate functions on the updated extension.
I think I will go and put a call to that into the regression tests of
all the contrib modules that define new opclasses.
        regards, tom lane



Re: GiST support for UUIDs

From
Chris Bandy
Date:
On Tue, Nov 29, 2016 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pushed with that change and some other mostly-cosmetic tweaking.

Thank you for addressing all those issues, Tom! I tested some
exclusion constraints that are interesting to me, and everything seems
to be working well.

-- Chris