Thread: Re: [GENERAL] UUID's as primary keys

Re: [GENERAL] UUID's as primary keys

From

Greg Stark

Date:

28 June 2006, 21:34:56

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Martijn van Oosterhout <kleptog@svana.org> writes:
> > The input functions get it, the output functions (bpcharout,
> > bpcharsend, etc) don't. Which makes it kind of hard to print a raw
> > value if you don't know how long it's going to be. They used to, but
> > that was removed some time back.

> Even back then you couldn't rely on the typmod value to be supplied;
> it was quite likely to be passed as -1.  The issue is not actually
> with on-disk storage, it is with function/operator arguments and
> results.  Those have never been identified any more closely than by
> giving a type OID.  So for any value that came from a function,
> you won't have a typmod, and you'd better be able to find out all
> you need to know just by inspecting the value itself.  Hence, length
> words.

Hm, so it could be stored on disk without the length header as long as the
length header is added to the in-memory representation? I don't think the type
system has hooks for reading and storing data to disk though.

> This is all pretty off-topic for pgsql-general, isn't it?

[moved to -hackers]

-- 
greg

Re: [GENERAL] UUID's as primary keys

From

Tom Lane

Date:

29 June 2006, 00:23:27

Greg Stark <gsstark@mit.edu> writes:
> Hm, so it could be stored on disk without the length header as long as
> the length header is added to the in-memory representation? I don't
> think the type system has hooks for reading and storing data to disk
> though.

No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way.  Allowing them to be different would force a lot of
format conversion steps that currently need not happen.
        regards, tom lane

Re: [GENERAL] UUID's as primary keys

From

Greg Stark

Date:

29 June 2006, 03:40:33

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Hm, so it could be stored on disk without the length header as long as
> > the length header is added to the in-memory representation? I don't
> > think the type system has hooks for reading and storing data to disk
> > though.
> 
> No, it doesn't, and we'd pay a nonzero price for allowing that.
> Currently the executor doesn't have to care (much) about whether a
> tuple is on-disk or in-memory --- the individual datums look the same
> either way.  Allowing them to be different would force a lot of
> format conversion steps that currently need not happen.

Is there ever a case where an entire tuple is passed around without knowing
the typmod of an attribute in the tuple?

The conversion would only really have to happen when the attribute is fetched
or stored, not when the tuple is being passed around wholesale. But I have a
feeling that would be more intrusive than just making the entire system typmod
aware.

-- 
greg

Re: [GENERAL] UUID's as primary keys

From

Martijn van Oosterhout

Date:

29 June 2006, 06:33:44

On Thu, Jun 29, 2006 at 02:40:15AM -0400, Greg Stark wrote:
> > Greg Stark <gsstark@mit.edu> writes:
> > No, it doesn't, and we'd pay a nonzero price for allowing that.
> > Currently the executor doesn't have to care (much) about whether a
> > tuple is on-disk or in-memory --- the individual datums look the same
> > either way.  Allowing them to be different would force a lot of
> > format conversion steps that currently need not happen.
>
> Is there ever a case where an entire tuple is passed around without knowing
> the typmod of an attribute in the tuple?

A tuple is just an array of datums, with some header information. The
problems come when you don't have a tuple anymore, but only the datum,
like in arguments for functions.

I think it's more a case that most places that deal with datums simply
don't know about typmods. For example, the return type of a function
can only be char, not char(16). If you consider the case of a function
returning a RAW, the caller will have no way of knowing the typmod,
they do know the type though.

To be honest, it seems like a lot of work to save the four bytes of
overhead for the varlena structure on disk if you're going to need it
in memory anyway. And anything like RAW(16) which people want for
UUIDs, if it's going to have a lot of functions associated with it, may
as well just be a new type.

I think time would be much better spent finding a way of allowing
user-defined types to be created without using C functions.

> The conversion would only really have to happen when the attribute is fetched
> or stored, not when the tuple is being passed around wholesale. But I have a
> feeling that would be more intrusive than just making the entire system typmod
> aware.

I'm not sure if tuples are ever passed wholesale very far. The first
node to actually do anything with it (any join, expression or condition
test) is going to need to deconstruct it. Consider where we currently we
have a "Filter Cond" on a "Seq Scan". Currently the filter can access
the datums directly on the disk page, with what you're proposing, it
can't.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [GENERAL] UUID's as primary keys

From

Greg Stark

Date:

29 June 2006, 10:47:20

Martijn van Oosterhout <kleptog@svana.org> writes:

> A tuple is just an array of datums, with some header information. The
> problems come when you don't have a tuple anymore, but only the datum,
> like in arguments for functions.
> 
> I think it's more a case that most places that deal with datums simply
> don't know about typmods. For example, the return type of a function
> can only be char, not char(16). If you consider the case of a function
> returning a RAW, the caller will have no way of knowing the typmod,
> they do know the type though.
> 
> To be honest, it seems like a lot of work to save the four bytes of
> overhead for the varlena structure on disk if you're going to need it
> in memory anyway. And anything like RAW(16) which people want for
> UUIDs, if it's going to have a lot of functions associated with it, may
> as well just be a new type. 

For large databases storage density leads directly to speed. Saving four bytes
of overhead on a 16-byte data structure would mean a 20% speed increase. Even
if that's only helpful on a tenth of the columns you're still talking about a
2% speed increase for all queries on the table. A lot of databases use CHAR(1)
for flags. The overhead is even worse there.

> Consider where we currently we have a "Filter Cond" on a "Seq Scan".
> Currently the filter can access the datums directly on the disk page, with
> what you're proposing, it can't.

Well it only can't if the data type has conversion functions. I'm not sure how
complex it would be having pointers that *getattr sometimes return pointers to
the disk page and sometimes return pointers to a palloced copy though.

-- 
greg

Re: [GENERAL] UUID's as primary keys

From

Thomas Hallgren

Date:

29 June 2006, 10:54:46

Greg Stark wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>
>   
>> To be honest, it seems like a lot of work to save the four bytes of
>> overhead for the varlena structure on disk if you're going to need it
>> in memory anyway. And anything like RAW(16) which people want for
>> UUIDs, if it's going to have a lot of functions associated with it, may
>> as well just be a new type. 
>>     
>
> For large databases storage density leads directly to speed. Saving four bytes
> of overhead on a 16-byte data structure would mean a 20% speed increase. Even
> if that's only helpful on a tenth of the columns you're still talking about a
> 2% speed increase for all queries on the table. A lot of databases use CHAR(1)
> for flags. The overhead is even worse there.
>
>   
I have to concur with this. Assume you use a bytea for a UUID that in 
turn is used as a primary key. The extra overhead will be reflected in 
all indexes, all foreign keys, etc. In a normalized database some tables 
may consist of UUID columns only.

Regards,
Thomas Hallgren

Re: [GENERAL] UUID's as primary keys

From

Martijn van Oosterhout

Date:

29 June 2006, 12:07:54

On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
> I have to concur with this. Assume you use a bytea for a UUID that in
> turn is used as a primary key. The extra overhead will be reflected in
> all indexes, all foreign keys, etc. In a normalized database some tables
> may consist of UUID columns only.

So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths. What I'm saying is that it's
easier to create new fixed length types for the cases that need it,
than it is to redo the entire type handling of the backend.

And for people that want char(1), they should be using "char", which
really is one byte (ex padding ofcourse).

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [GENERAL] UUID's as primary keys

From

Greg Stark

Date:

29 June 2006, 12:54:59

Martijn van Oosterhout <kleptog@svana.org> writes:

> On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
> > I have to concur with this. Assume you use a bytea for a UUID that in 
> > turn is used as a primary key. The extra overhead will be reflected in 
> > all indexes, all foreign keys, etc. In a normalized database some tables 
> > may consist of UUID columns only.
> 
> So you create a UUID type. It's cheap enough to create new types after
> all, that's one of postgresql's strengths. What I'm saying is that it's
> easier to create new fixed length types for the cases that need it,
> than it is to redo the entire type handling of the backend.

I guess my motivation here is that I feel currently char(n) is basically
broken in Postgres. Sure it satisfies the letter of the specification, but
it's failing to actually achieve anything for the users. There's no point at
all in using char(n) in Postgres since it takes exactly the same amount of
space as varchar() if you're always stuffing it full and more space if you're
not.

In the current setup the only reason for Postgres to have this data type at
all is purely for legacy compatibility. It doesn't actually "work" in that it
doesn't provide the space savings it's intended to and that would give users
an actual reason to use it in new databases.

-- 
greg

Re: [GENERAL] UUID's as primary keys

From

Tom Lane

Date:

29 June 2006, 13:13:06

Greg Stark <gsstark@mit.edu> writes:
> In the current setup the only reason for Postgres to have this data type at
> all is purely for legacy compatibility.

Yes.  So?
        regards, tom lane

Re: [GENERAL] UUID's as primary keys

From

Thomas Hallgren

Date:

29 June 2006, 13:40:23

Martijn van Oosterhout wrote:
> On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
>   
>> I have to concur with this. Assume you use a bytea for a UUID that in 
>> turn is used as a primary key. The extra overhead will be reflected in 
>> all indexes, all foreign keys, etc. In a normalized database some tables 
>> may consist of UUID columns only.
>>     
>
> So you create a UUID type. It's cheap enough to create new types after
> all, that's one of postgresql's strengths.
It would be a whole lot easier if I could use a domain.

>  What I'm saying is that it's
> easier to create new fixed length types for the cases that need it,
> than it is to redo the entire type handling of the backend.
>
>   
Of course. But it's a matter of who does what. Your reasoning push the 
burden to the users.

Regards,
Thomas Hallgren

Re: [GENERAL] UUID's as primary keys

From

Martijn van Oosterhout

Date:

29 June 2006, 13:47:39

On Thu, Jun 29, 2006 at 06:40:13PM +0200, Thomas Hallgren wrote:
> Martijn van Oosterhout wrote:
> >On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
> >
> >>I have to concur with this. Assume you use a bytea for a UUID that in
> >>turn is used as a primary key. The extra overhead will be reflected in
> >>all indexes, all foreign keys, etc. In a normalized database some tables
> >>may consist of UUID columns only.
> >>
> >
> >So you create a UUID type. It's cheap enough to create new types after
> >all, that's one of postgresql's strengths.
> It would be a whole lot easier if I could use a domain.

It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [GENERAL] UUID's as primary keys

From

mark@mark.mielke.cc

Date:

06 July 2006, 13:12:58

On Thu, Jun 29, 2006 at 06:47:17PM +0200, Martijn van Oosterhout wrote:
> It seems to me that maybe the backend should include a 16-byte fixed
> length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> then people can use that to build whatever they like, using domains,
> for example...

Sooooo... Back to this.

It won't happen unless somebody does it - and I realize that people
are busy with their own projects, so unless somebody more willing and
better suited will step up, I'm going to take a stab at getting
advanced consensus.

Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.

1) The added 128-bit type should take the form of:
   a) UUID, with all functions   b) UUID, with only basic generation functions + encode/decode/indexable   c) UUID,
withonly encode/decode/indexable - generic except for the      name of the type, and the encoding format.   d) Generic
128-bittype - same as c) except may not encode or decode      as UUID (dashes). Either a large number (hex string?), or
binarydata.   e) Generic n-byte binary data type generator. Not sure of feasibility      of this at this point. See
thread.

2) According to your answer in 1), the added 128-bit type should be:
   a) In core first.   b) In contrib first.   c) In pgfoundry first.

Thanks,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: [GENERAL] UUID's as primary keys

From

Martijn van Oosterhout

Date:

06 July 2006, 13:38:32

On Thu, Jul 06, 2006 at 12:12:18PM -0400, mark@mark.mielke.cc wrote:
> Please answer the below questions, and state whether your opinion is
> just an opinion, or whether you are stating it as a PostgreSQL
> maintainer and it is law. If you wish, you can rank preferences.

Do I have to pick only one? I'd choose firstly for:

1c) UUID, with only encode/decode/indexable - generic except for the        name of the type, and the encoding format.
2a) In core first

And in addation to that:

1b) UUID, with only basic generation functions +        encode/decode/indexable
2b) In contrib first.

And maybe finally:

1a)  UUID, with all functions
2c)  In pgfoundry first.

IOW, I'm not so convinced that full UUID support should appear in core,
but I think a 16-byte type should be available in core, with basic UUID
functions in contrib and the full suite on pgfoundry.

But that's just my opinion ofcourse.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [GENERAL] UUID's as primary keys

From

"Jochem van Dieten"

Date:

06 July 2006, 13:42:15

On 7/6/06, mark@mark.mielke.cc wrote:
>
> Please answer the below questions, and state whether your opinion is
> just an opinion, or whether you are stating it as a PostgreSQL
> maintainer and it is law. If you wish, you can rank preferences.
>
> 1) The added 128-bit type should take the form of:
>
>     a) UUID, with all functions

> 2) According to your answer in 1), the added 128-bit type should be:
>
>     a) In core first.

Opinion, 1 a, 2 a

Jochem

Re: [GENERAL] UUID's as primary keys

From

Greg Stark

Date:

06 July 2006, 14:42:16

mark@mark.mielke.cc writes:

>     e) Generic n-byte binary data type generator. Not sure of feasibility
>        of this at this point. See thread.

I don't like the idea of a generator that would have to be manually invoked,
though such a thing would be a fine tool for contrib or pgfoundry, I think it
would never be a clean enough interface for core.

On the other hand core could conceivably translate things like char(n) into
such a type generated on the fly. That is, instead of having a single char oid
it could check a cache of fixed length char(n) data types and if there isn't
one already generate one on the fly. That would be somewhat grotty of an
implementation but the user interface at least would be entirely transparent.
If one day we change things to pass around typmod database designs wouldn't
have to change at all.

(Actually Postgres can never do this for char(n), at least not as long as we
insist on making char/varchar/text locale-aware. Personally I think the
default char/varchar/text locale should be C unless you specify otherwise on a
per-column basis. But that seems to be a minority opinion. Postgres could
however do this for separate raw binary datatypes like bit(n) or bytea(n).)

In answer to your question, though my opinion carries no special weight at
all, I would suggest adding a bare bones 16-byte data type to core and a
second binary-compatible data type based on it that parsed/output as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.

-- 
greg

Re: [GENERAL] UUID's as primary keys

From

"Sander Steffann"

Date:

06 July 2006, 14:56:14

Hi,

Just MHO:

> 1) The added 128-bit type should take the form of:
> 
>    c) UUID, with only encode/decode/indexable - generic except for the
>       name of the type, and the encoding format.
> 
> 2) According to your answer in 1), the added 128-bit type should be:
> 
>    a) In core first.

1c is what I would need. 1b or 1a would be nice to have.

- Sander

Re: [GENERAL] UUID's as primary keys

From

Thomas Hallgren

Date:

06 July 2006, 17:02:57

Greg Stark wrote:

> In answer to your question, though my opinion carries no special weight at
> all, I would suggest adding a bare bones 16-byte data type to core and a
> second binary-compatible data type based on it that parsed/output as uuids.
> The extended uuid libraries should only go in pgfoundry/contrib.
> 
I second that.

Regards,
Thomas Hallgren

Re: [GENERAL] UUID's as primary keys

From

Jim Nasby

Date:

08 July 2006, 18:54:51

On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote:
>> In answer to your question, though my opinion carries no special  
>> weight at
>> all, I would suggest adding a bare bones 16-byte data type to core  
>> and a
>> second binary-compatible data type based on it that parsed/output  
>> as uuids.
>> The extended uuid libraries should only go in pgfoundry/contrib.
> I second that.

+1. If there's enough user demand we can look at adding the type to  
core (I don't see any real advantage to contrib over pgFoundry for  
this). I'm not sure if it makes sense to add a generic 16 byte RAW to  
core, either. I'd *much* rather see effort expended on a generic RAW  
type which had it's size defined as part of the type and didn't use  
varlena.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: [GENERAL] UUID's as primary keys

From

"Joshua D. Drake"

Date:

09 July 2006, 00:13:15

On Saturday 08 July 2006 14:54, Jim Nasby wrote:
> On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote:
> >> In answer to your question, though my opinion carries no special
> >> weight at
> >> all, I would suggest adding a bare bones 16-byte data type to core
> >> and a
> >> second binary-compatible data type based on it that parsed/output
> >> as uuids.
> >> The extended uuid libraries should only go in pgfoundry/contrib.
> >
> > I second that.
>
> +1. If there's enough user demand we can look at adding the type to
> core (I don't see any real advantage to contrib over pgFoundry for
> this).

The advantage of contrib over pgFoundry is that it will be packaged by the 
major distributions. Every distribution includes a package of the contrib
modules.

Sincerely,

Joshua D. Drake


--   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240  Providing the most comprehensive  PostgreSQL
solutionssince 1997            http://www.commandprompt.com/

Re: [GENERAL] UUID's as primary keys

From

Martijn van Oosterhout

Date:

09 July 2006, 10:35:06

On Sat, Jul 08, 2006 at 05:54:26PM -0400, Jim Nasby wrote:
> +1. If there's enough user demand we can look at adding the type to
> core (I don't see any real advantage to contrib over pgFoundry for
> this). I'm not sure if it makes sense to add a generic 16 byte RAW to
> core, either. I'd *much* rather see effort expended on a generic RAW
> type which had it's size defined as part of the type and didn't use
> varlena.

You could place a nice wrapper around type generators, which would let
you say:

DECLARE TYPE RAW(16);

After which point you could use that type in function declarations and
such. It would create an OID for that type would could be used as
normal.

I think that trying to get the backend to pay more attention to typmods
is not going to be successful. Simply because functions and operators
have an affect on the typmod and once you start relying on typmods to
decode a tuple, you've got a real problem.

As an example, what do you get when you concatenate two CHAR(4)'s? Do
you get another CHAR(4) or is it a CHAR(8)? How does the backend know?
You'd have to accompany each function with another function just to
tell you how the typmods would be related.

The only way out I can think of is that RAW(n) is merely a sort of
template and RAW(x) cannot be used in a place where RAW(y) is expected
(if x<>y). Hence it makes sense to have a template that people can
instantiate instances of and let the rest of the system treat them as
new types, unrelated to anything else.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.