Thread: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
"Dawid Kuroczko"
Date:
Hello.

I am currently playing with the UUID data type and trying to use it to store
data provided by a third-party (Hewlett-Packard) application.  The problem is
that they format UUIDs as 0000-0000-0000-0000-0000-0000-0000-0000, so I have
to use replace(text,'-','')::uuid for this kind of data.
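
The workaround can be mimicked outside the database. This is only a minimal
sketch: Python's standard uuid module stands in for the ::uuid cast, and the
function name hp_to_uuid is hypothetical.

```python
import uuid


def hp_to_uuid(text: str) -> uuid.UUID:
    """Drop every hyphen, then parse the remaining 32 hex digits."""
    return uuid.UUID(hex=text.replace("-", ""))


# An HP-style value comes back in the standard 8-4-4-4-12 layout.
print(hp_to_uuid("0123-4567-89ab-cdef-0123-4567-89ab-cdef"))
```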

Now, the case is quite simple, and it might be that there are other
applications formatting UUIDs too liberally.

I am working on a patch to support this format (yes, it is a simple
modification).

In the meantime, I would like to ask what you think about it.

Cons: Such format is not standard.

Pros: This will help UUID data type adoption. [1]  While good applications
format their data well, there are others which don't follow standards.  Also,
I think it is easier for a human being to enter a UUID as 8 times 4 digits.

Your thoughts?  Should I submit a patch?
  Regards,    Dawid

[1]: My first thought when I received the error message was "hey! this is
not a UUID, it is too long/too short!"; only later did I check and see that
they just don't format it too well.


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Josh Berkus
Date:
Dawid,

> I am working on a patch to support this format (yes, it is a simple
> modification).

I'd suggest writing a formatting function for UUIDs instead.  Not sure what 
it should be called, though.  "to_char" is pretty overloaded right now.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
"Gevik Babakhani"
Date:
> > I am working on a patch to support this format (yes, it is a simple 
> > modification).

There was a proposal and a discussion about what this datatype would look
like before I started developing it. We decided to go with the format proposed
in the RFC. Unless there is a strong case, I doubt any non-standard formatting
will be accepted into core. IIRC we were also opposed to supporting Java-like
formatted UUIDs back then. This is no different.

Regards,
Gevik.



Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> I am working on a patch to support this format (yes, it is a simple
>> modification).

> I'd suggest writing a formatting function for UUIDs instead.

That seems like overkill, if not outright encouragement of people to
come up with yet other nonstandard formats for UUIDs.

I think the question we have to answer is whether we want to be
complicit in the spreading of a nonstandard UUID format.  Even if
we answer "yes" for this HP case, it doesn't follow that we should
create a mechanism for anybody to do anything with 'em.  That way
lies the madness people already have to cope with for datetime
data :-(
        regards, tom lane


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
"Jochem van Dieten"
Date:
On Thu, Feb 28, 2008 at 1:19 AM, Tom Lane wrote:
>  I think the question we have to answer is whether we want to be
>  complicit in the spreading of a nonstandard UUID format.

I don't.

I have patched the UUID input and output functions to be compatible
with Adobe ColdFusion (http://adobe.com/products/coldfusion/ uses
8x-4x-4x-16x), and while I have released them I have deliberately made
the changes incompatible with other formats and will not submit them
to PostgreSQL because I want Adobe to fix ColdFusion to use the
standard format.

Jochem


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Josh Berkus
Date:
Tom,

> I think the question we have to answer is whether we want to be
> complicit in the spreading of a nonstandard UUID format.  Even if
> we answer "yes" for this HP case, it doesn't follow that we should
> create a mechanism for anybody to do anything with 'em.  That way
> lies the madness people already have to cope with for datetime
> data :-(

Well, I guess the question is: if we don't offer some builtin way to render 
non-standard formats built into company products, will those companies fix 
their format or just not use PostgreSQL?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Andrew Sullivan
Date:
On Thu, Feb 28, 2008 at 08:58:01AM -0800, Josh Berkus wrote:

> Well, I guess the question is: if we don't offer some builtin way to render 
> non-standard formats built into company products, will those companies fix 
> their format or just not use PostgreSQL?

Well, there is an advantage that Postgres has that some others don't: you
can extend Postgres pretty easily.  That suggests to me a reason to be
conservative in what we "build in".  This is consistent with the principle,
"Be conservative in what you send, and liberal in what you accept."

A



Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
"Zeugswetter Andreas ADI SD"
Date:
> > Well, I guess the question is: if we don't offer some builtin way to render
> > non-standard formats built into company products, will those companies fix
> > their format or just not use PostgreSQL?
>
> Well, there is an advantage that Postgres has that some others don't: you
> can extend Postgres pretty easily.  That suggests to me a reason to be
> conservative in what we "build in".  This is consistent with the principle,
> "Be conservative in what you send, and liberal in what you accept."

Well, then the uuid input function should most likely disregard all -,
and accept the 4x-, 8x- formats and the like on input.
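
A minimal sketch of such an input routine, assuming the only rule is "ignore
every hyphen, then require 32 hex digits". The name parse_uuid_liberal is
hypothetical and this is not the actual uuid_in code.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def parse_uuid_liberal(text: str) -> bytes:
    """Ignore all hyphens and return the 16 raw bytes of the UUID."""
    digits = text.replace("-", "")
    if len(digits) != 32 or not HEX_DIGITS.issuperset(digits):
        raise ValueError(f"not a UUID: {text!r}")
    return bytes.fromhex(digits)
```

Under this rule the 4x and 8x layouts parse to the same value, but so does any
input with hyphens scattered in arbitrary places.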

Andreas



Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Kenneth Marshall
Date:
On Thu, Feb 28, 2008 at 08:06:46PM +0100, Zeugswetter Andreas ADI SD wrote:
> > > Well, I guess the question is: if we don't offer some builtin way to render
> > > non-standard formats built into company products, will those companies fix
> > > their format or just not use PostgreSQL?
> >
> > Well, there is an advantage that Postgres has that some others don't: you
> > can extend Postgres pretty easily.  That suggests to me a reason to be
> > conservative in what we "build in".  This is consistent with the principle,
> > "Be conservative in what you send, and liberal in what you accept."
>
> Well, then the uuid input function should most likely disregard all -,
> and accept the 4x-, 8x- formats and the like on input.
>
> Andreas

We need to support the standard definition. People not using the standard
need to know that and explicitly acknowledge that by implementing the
conversion process themselves. Accepting random input puts a performance
hit on everybody following the standard. It is the non-standard users who
should pay that cost. 

Cheers,
Ken


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
James Mansion
Date:
Kenneth Marshall wrote:
> conversion process themselves. Accepting random input puts a performance
> hit on everybody following the standard.
Why is that necessarily the case?

Why not have a liberal parser and a configurable switch that determines
whether non-standard forms are liberally accepted, accepted with a logged
warning, or rejected?
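
A rough sketch of what such a switch could look like. The mode names
"accept", "warn" and "reject" and the function parse_uuid are assumptions
made for illustration; nothing like this exists in PostgreSQL.

```python
import logging
import re

STANDARD = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
HEX_32 = re.compile(r"[0-9a-fA-F]{32}")
MODES = ("accept", "warn", "reject")


def parse_uuid(text: str, nonstandard: str = "reject") -> bytes:
    """Parse a UUID; `nonstandard` decides what happens to other layouts."""
    if nonstandard not in MODES:
        raise ValueError(f"unknown mode: {nonstandard!r}")
    if STANDARD.fullmatch(text):
        return bytes.fromhex(text.replace("-", ""))
    digits = text.replace("-", "")
    if nonstandard == "reject" or not HEX_32.fullmatch(digits):
        raise ValueError(f"not a UUID: {text!r}")
    if nonstandard == "warn":
        logging.getLogger(__name__).warning("non-standard UUID format: %r", text)
    return bytes.fromhex(digits)
```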

James




Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Mark Mielke
Date:
James Mansion wrote:
> Kenneth Marshall wrote:
>> conversion process themselves. Accepting random input puts a performance
>> hit on everybody following the standard.
> Why is that necessarily the case?
>
> Why not have a liberal parser and a configurable switch that 
> determines whether non-standard
> forms are liberally accepted, accepted with a logged warning, or 
> rejected?

I recall there being a measurable performance difference between the 
most liberal parser, and the most optimized parser, back when I wrote 
one for PostgreSQL. I don't know how good the one in use for PostgreSQL 
8.3 is. As to whether the cost is noticeable to people or not - that 
depends on what they are doing. The problem is that a UUID is pretty 
big, and parsing it liberally means a loop.

My personal opinion is that this is entirely a philosophical issue, and 
that both sides have merits. There is no reason for PostgreSQL to 
support all formats, no matter how non-standard, for every single type. 
So, why would UUID be special? "Because it's easy to do" is not 
necessarily a good reason. But then, it's not a bad reason either.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
James Mansion
Date:
Mark Mielke wrote:
> I recall there being a measurable performance difference between the 
> most liberal parser, and the most optimized parser, back when I wrote 
> one for PostgreSQL. I don't know how good the one in use for 
> PostgreSQL 8.3 is. As to whether the cost is noticeable to people or 
> not - that depends on what they are doing. The problem is that a UUID 
> is pretty big, and parsing it liberally means a loop.
>
It just seems odd - I would have thought one would use re2c or ragel to 
generate something and the performance would essentially be O(n) on the 
input length in characters - using either a collection of allowed forms 
or an engine that normalises case and discards the '-' characters 
between any hex pairs.  So yes, these would have a control loop.  Is that 
so bad?

Either way, it's hard to imagine how parsing a string of this length could 
create a measurable performance issue compared to what will happen with 
the value post parse.

James



Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Sam Mason
Date:
On Thu, Feb 28, 2008 at 06:45:18PM -0500, Mark Mielke wrote:
> My personal opinion is that this is entirely a philosophical issue, and 
> that both sides have merits. 

I think it depends on what you're optimising for: initial development
time, maintenance time or run time.

> There is no reason for PostgreSQL to 
> support all formats, no matter how non-standard, for every single type. 
> So, why would UUID be special? Because it's easy to do is not 
> necessarily a good reason. But then, it's not a bad reason either.

I never really buy the "performance" argument.  I much prefer the
correctness argument: if the code is doing something strange I'd prefer
to know about it as soon as possible.  This generally means that I'm
optimising for maintenance.

It's a similar argument to why lots of automatic casts were removed from
8.3: it generally doesn't hurt, but the few times it does it's going to
be bad, and if you're doing something strange to start with it's better
to be explicit about it.

 Sam


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Tom Lane
Date:
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> "Be conservative in what you send, and liberal in what you accept."

Yeah, I was about to quote that same maxim myself.  I don't have a big
problem with allowing uuid_in to accept known format variants.  (I'm
not sure about allowing a hyphen *anywhere*, because that could lead to
accepting things that weren't meant to be a UUID at all, but this HP
format seems regular enough that that's not a serious objection to it.)
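
One way to read "known format variants" is a fixed whitelist of layouts
rather than "ignore hyphens anywhere". A minimal sketch under that
assumption; the list of layouts and the name parse_uuid_known are
illustrative only.

```python
import re

H = "[0-9a-fA-F]"
KNOWN_LAYOUTS = [
    re.compile(rf"{H}{{8}}-{H}{{4}}-{H}{{4}}-{H}{{4}}-{H}{{12}}"),  # RFC: 8-4-4-4-12
    re.compile(rf"{H}{{4}}(?:-{H}{{4}}){{7}}"),                      # HP: eight groups of 4
    re.compile(rf"{H}{{32}}"),                                        # no hyphens at all
]


def parse_uuid_known(text: str) -> bytes:
    """Accept only whitelisted layouts; a stray hyphen is an error."""
    if not any(layout.fullmatch(text) for layout in KNOWN_LAYOUTS):
        raise ValueError(f"not a recognised UUID layout: {text!r}")
    return bytes.fromhex(text.replace("-", ""))
```

With this approach an input such as 0-123-4567-... is still rejected, which
is the case the parenthetical above worries about.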

What I was really complaining about was Josh's suggestion that we invent
a function to let users *output* UUIDs in random-format-of-the-week.
I can't imagine much good coming of that.  I think we should keep
uuid_out emitting only the RFC-standardized format.
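
For reference, emitting only the standard layout from the 16 stored bytes is
a fixed slicing exercise. A sketch, with format_rfc as a hypothetical name
rather than the real uuid_out:

```python
def format_rfc(raw: bytes) -> str:
    """Render 16 bytes as lowercase hex in the 8-4-4-4-12 layout."""
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    h = raw.hex()
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"


print(format_rfc(bytes(range(16))))  # 00010203-0405-0607-0809-0a0b0c0d0e0f
```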
        regards, tom lane


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Mark Mielke
Date:
James Mansion wrote:
> Mark Mielke wrote:
>> I recall there being a measurable performance difference between the 
>> most liberal parser, and the most optimized parser, back when I wrote 
>> one for PostgreSQL. I don't know how good the one in use for 
>> PostgreSQL 8.3 is. As to whether the cost is noticeable to people or 
>> not - that depends on what they are doing. The problem is that a UUID 
>> is pretty big, and parsing it liberally means a loop.
>>
> It just seems odd - I would have thought one would use re2c or ragel 
> to generate something and the performance would essentially be O(n) on 
> the input length in characters - using either a collection of allowed 
> forms or an engine that normalises case and discards the '-' 
> characters between any hex pairs. 

Instruction level parallelism allows for multiple hex values to be 
processed in parallel, whereas a loop relies on branch prediction and 
speculative load and store? :-)

The liberal version is difficult to unroll. The strict version is easy 
to unroll.
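
The structural difference can be sketched as follows. Python will not show
the instruction-level effects described above; the point is only that the
strict parser works from fixed offsets while the liberal one has to decide
per character. Both function names are hypothetical.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def parse_strict(text: str) -> bytes:
    # Length and hyphen positions are known up front, so the digits can be
    # taken as five fixed slices with no per-character branching.
    if len(text) != 36 or text[8] + text[13] + text[18] + text[23] != "----":
        raise ValueError(f"not a standard UUID: {text!r}")
    digits = text[0:8] + text[9:13] + text[14:18] + text[19:23] + text[24:36]
    if not HEX_DIGITS.issuperset(digits):
        raise ValueError(f"not a standard UUID: {text!r}")
    return bytes.fromhex(digits)


def parse_liberal(text: str) -> bytes:
    # Hyphens may appear anywhere, so every character needs its own decision
    # and the output position depends on what has been seen so far.
    digits = []
    for ch in text:
        if ch == "-":
            continue
        if ch not in HEX_DIGITS:
            raise ValueError(f"not a UUID: {text!r}")
        digits.append(ch)
    if len(digits) != 32:
        raise ValueError(f"not a UUID: {text!r}")
    return bytes.fromhex("".join(digits))
```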

> So yes these would have a control loop.  Is that so bad?
>
> Either way, it's hard to imagine how parsing a string of this length 
> could create a measurable performance issue compared to what will 
> happen with the value post parse.

I think so too.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
"Tom Dunstan"
Date:
On Fri, Feb 29, 2008 at 9:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Sullivan <ajs@crankycanuck.ca> writes:
>  > "Be conservative in what you send, and liberal in what you accept."
>
>  Yeah, I was about to quote that same maxim myself.  I don't have a big
>  problem with allowing uuid_in to accept known format variants.  (I'm
>  not sure about allowing a hyphen *anywhere*, because that could lead to
>  accepting things that weren't meant to be a UUID at all, but this HP
>  format seems regular enough that that's not a serious objection to it.)

This seems like a good enough opportunity to mention an idea that I
had while/after doing the enum patch. The patch was fairly intrusive
for something that was just adding a type because postgresql isn't
really set up for parameterized types other than core types. The idea
would be to extend the enum mechanism to allow UDTs etc to be
parameterized, and enums would just become one use of the mechanism.
Other obvious examples that I had in mind were allowing variable
lengths for that binary data type with hex IO for e.g. differently
sized checksums that people want, and allowing different formats for
uuids.

So the idea as applied to this case would be to do the enum-style
typesafe thing, ie:

create type coldfusion_uuid as generic_uuid('xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx');

...then just use that. I had some thoughts about whether it would be
worth allowing inline declarations of such types inside table creation
statements as well, and there are various related issues and thoughts
on implementation which I won't go into in this email. Do people think
the idea has legs, though?

>  What I was really complaining about was Josh's suggestion that we invent
>  a function to let users *output* UUIDs in random-format-of-the-week.
>  I can't imagine much good coming of that.  I think we should keep
>  uuid_out emitting only the RFC-standardized format.

Well, if the application is handing them to us in that format, it
might be a bit surprised if it gets back a "fixed" one. The custom
type approach wouldn't have that side effect.
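
To make the idea concrete, here is a small sketch of a pattern-driven format
that parses and prints in the same layout. The class UuidFormat and its
pattern syntax ('x' for a hex digit, '-' for a literal hyphen) are
assumptions for illustration, not an existing PostgreSQL feature.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


class UuidFormat:
    """A UUID text layout described by a pattern of 'x' and '-'."""

    def __init__(self, pattern: str):
        if pattern.count("x") != 32 or set(pattern) - {"x", "-"}:
            raise ValueError(f"bad pattern: {pattern!r}")
        self.pattern = pattern

    def parse(self, text: str) -> bytes:
        if len(text) != len(self.pattern):
            raise ValueError(f"wrong length for this layout: {text!r}")
        digits = []
        for expected, ch in zip(self.pattern, text):
            if expected == "-":
                if ch != "-":
                    raise ValueError(f"hyphen expected: {text!r}")
            elif ch in HEX_DIGITS:
                digits.append(ch)
            else:
                raise ValueError(f"hex digit expected: {text!r}")
        return bytes.fromhex("".join(digits))

    def format(self, raw: bytes) -> str:
        if len(raw) != 16:
            raise ValueError("a UUID is exactly 16 bytes")
        digits = iter(raw.hex())
        return "".join("-" if p == "-" else next(digits) for p in self.pattern)
```

A value read through a ColdFusion-style layout such as
UuidFormat('xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx') is printed back in that
same layout, although this sketch always emits lowercase hex, so letter case
is not preserved.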

Cheers

Tom


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Tom Lane
Date:
"Tom Dunstan" <pgsql@tomd.cc> writes:
> This seems like a good enough opportunity to mention an idea that I
> had while/after doing the enum patch. The patch was fairly intrusive
> for something that was just adding a type because postgresql isn't
> really set up for parameterized types other than core types. The idea
> would be to extend the enum mechanism to allow UDTs etc to be
> parameterized, and enums would just become one use of the mechanism.

Isn't this reasonably well covered by Teodor's work to support
typmods for user-defined types?  We've discussed how the typmod could
be effectively a key into a system catalog someplace, thus allowing it
to represent more than just an int32 worth of stuff.  I'm not seeing
where your proposal accomplishes more than that can.
        regards, tom lane


Re: UUID data format 4x-4x-4x-4x-4x-4x-4x-4x

From
Bruce Momjian
Date:
Added to TODO:
* Allow the UUID type to accept non-standard formats
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php



--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +