Thread: UUID's as primary keys
I'm building an app where I have several entities that are identified using a UUID (i.e. a 128 bit quantity). My current implementation uses a composite primary key consisting of two int8 values. It's a bit cumbersome and I would much rather have a distinct type. An earlier implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL doesn't have that and the BYTEA adds extra overhead. What would be the best (as in most efficient) mapping for a 128 bit primary key? Regards, Thomas Hallgren
On Wed, Jun 28, 2006 at 09:01:49AM +0200, Thomas Hallgren wrote: > I'm building an app where I have several entities that are identified > using a UUID (i.e. a 128 bit quantity). My current implementation uses a > composite primary key consisting of two int8 values. It's a bit > cumbersome and I would much rather have a distinct type. An earlier > implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL > doesn't have that and the BYTEA adds extra overhead. > > What would be the best (as in most efficient) mapping for a 128 bit > primary key? Sounds like something for a custom type. There's one here[1] though I have no idea how good it is. [1] http://gborg.postgresql.org/project/pguuid/projdisplay.php Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Martijn van Oosterhout wrote: > On Wed, Jun 28, 2006 at 09:01:49AM +0200, Thomas Hallgren wrote: > >> I'm building an app where I have several entities that are identified >> using a UUID (i.e. a 128 bit quantity). My current implementation uses a >> composite primary key consisting of two int8 values. It's a bit >> cumbersome and I would much rather have a distinct type. An earlier >> implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL >> doesn't have that and the BYTEA adds extra overhead. >> >> What would be the best (as in most efficient) mapping for a 128 bit >> primary key? >> > > Sounds like something for a custom type. There's one here[1] though I > have no idea how good it is. > > [1] http://gborg.postgresql.org/project/pguuid/projdisplay.php > > Have a nice day, > Thanks. That would of course work but at the same time it increases the complexity of my app. Yet another component to install and keep track of. It's also a bit of an overkill since the only thing I need is an opaque bit storage. Why is it that PostgreSQL lack a fixed length binary type similar to the RAW type in Oracle? ISTM that could be very useful and not very hard to implement. Regards, Thomas Hallgren
On Wed, Jun 28, 2006 at 10:22:45AM +0200, Thomas Hallgren wrote: > Thanks. That would of course work but at the same time it increases the > complexity of my app. Yet another component to install and keep track > of. It's also a bit of an overkill since the only thing I need is an > opaque bit storage. Why is it that PostgreSQL lack a fixed length binary > type similar to the RAW type in Oracle? ISTM that could be very useful > and not very hard to implement. AIUI, it can't be done because of a basic rule of the type system: the typmod can't be necessary to interpret the binary representation of a value. For something like RAW(16) the type would be the oid for "raw" and the typmod would be 16. However, when reading the value from a disk page, you're not given the typmod, so you have no way of determining the length. That's why there is a verlena header for types where you don't know the length, which adds overhead. The alternative is to create a type of the exact length you want, but in your case that's not acceptable either. I'm not sure if there are any other alternatives. Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Martijn van Oosterhout wrote: > On Wed, Jun 28, 2006 at 10:22:45AM +0200, Thomas Hallgren wrote: >> Thanks. That would of course work but at the same time it increases the >> complexity of my app. Yet another component to install and keep track >> of. It's also a bit of an overkill since the only thing I need is an >> opaque bit storage. Why is it that PostgreSQL lack a fixed length binary >> type similar to the RAW type in Oracle? ISTM that could be very useful >> and not very hard to implement. > > AIUI, it can't be done because of a basic rule of the type system: the > typmod can't be necessary to interpret the binary representation of a > value. For something like RAW(16) the type would be the oid for "raw" > and the typmod would be 16. However, when reading the value from a disk > page, you're not given the typmod, so you have no way of determining > the length. > OK. I thought you always had a type descriptor handy when reading the binary representation. I've noticed that the typmod is expected in some receive functions (bpcharrecv and numeric_recv for instance). Are you saying that there are times when you don't use that? Regards, Thomas Hallgren
On Wed, Jun 28, 2006 at 12:03:40PM +0200, Thomas Hallgren wrote: > OK. I thought you always had a type descriptor handy when reading the > binary representation. I've noticed that the typmod is expected in some > receive functions (bpcharrecv and numeric_recv for instance). Are you > saying that there are times when you don't use that? The input functions get it, the output functions (bpcharout, bpcharsend, etc) don't. Which makes it kind of hard to print a raw value if you don't know how long it's going to be. They used to, but that was removed some time back. It's a security issue IIRC, since any user could call raw_out(field, 2048) and get whatever was in the 2K of data after that field. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Martijn van Oosterhout wrote: > On Wed, Jun 28, 2006 at 12:03:40PM +0200, Thomas Hallgren wrote: > >> OK. I thought you always had a type descriptor handy when reading the >> binary representation. I've noticed that the typmod is expected in some >> receive functions (bpcharrecv and numeric_recv for instance). Are you >> saying that there are times when you don't use that? >> > > The input functions get it, the output functions (bpcharout, > bpcharsend, etc) don't. Which makes it kind of hard to print a raw > value if you don't know how long it's going to be. They used to, but > that was removed some time back. It's a security issue IIRC, since any > user could call raw_out(field, 2048) and get whatever was in the 2K of > data after that field. > > A user that is trusted with installing a C-function in the backend is free to scan the process memory anyway so in what way did that increase the security? IMHO, the only relevant security in that context is to have trusted people install trusted modules. I'm surprised that something like that made you remove significant functionality. Regards, Thomas Hallgren
On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote: > A user that is trusted with installing a C-function in the backend is > free to scan the process memory anyway so in what way did that increase > the security? IMHO, the only relevant security in that context is to > have trusted people install trusted modules. I'm surprised that > something like that made you remove significant functionality. You're missing the point. The type output function is not generally a priveledged function. Think bpcharout, text_out, numeric_out, etc... These can be called by users directly and the input to those functions cannot be trusted. If the type output function needs an additional parameter to correctly and safely decode the actual Datum, you're screwed because then users can pass invalid parameters to affect the decoding. If you have a way of telling what the right value is, then you didn't need to pass it in the first place. Hence, you have to be able to decode a datum knowing only its type, irrespective of typmod. So say you had a field of type RAW(16) you would have to be able to decode it knowing only that it is of type "RAW". So you need a header to tell you how long it is i.e. a varlena structure. Hope this clarifies it a bit, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > The input functions get it, the output functions (bpcharout, > bpcharsend, etc) don't. Which makes it kind of hard to print a raw > value if you don't know how long it's going to be. They used to, but > that was removed some time back. Even back then you couldn't rely on the typmod value to be supplied; it was quite likely to be passed as -1. The issue is not actually with on-disk storage, it is with function/operator arguments and results. Those have never been identified any more closely than by giving a type OID. So for any value that came from a function, you won't have a typmod, and you'd better be able to find out all you need to know just by inspecting the value itself. Hence, length words. This is all pretty off-topic for pgsql-general, isn't it? regards, tom lane