Thread: Using PK value as a String

Using PK value as a String

From
Jay
Date:
I have a table named table_Users:

CREATE TABLE table_Users (
   UserID       character(40)  NOT NULL default '',
   Username   varchar(256)  NOT NULL default '',
   Email          varchar(256) NOT NULL default ''
   etc...
);

The UserID is a character(40) and is generated using UUID function. We
started making making other tables and ended up not really using
UserID, but instead using Username as the unique identifier for the
other tables. Now, we pass and insert the Username to for discussions,
wikis, etc, for all the modules we have developed. I was wondering if
it would be a performance improvement to use the 40 Character UserID
instead of Username when querying the other tables, or if we should
change the UserID to a serial value and use that to query the other
tables. Or just keep the way things are because it doesn't really make
much a difference.

We are still in development and its about half done, but if there is
going to be performance issues because using PK as a String value, we
can just take a day change it before any production as been started.
Anyway advice you can give would be much appreciated.

Postgres performance guru where are you?

Re: Using PK value as a String

From
Gregory Stark
Date:
"Jay" <arrival123@gmail.com> writes:

> I have a table named table_Users:
>
> CREATE TABLE table_Users (
>    UserID       character(40)  NOT NULL default '',
>    Username   varchar(256)  NOT NULL default '',
>    Email          varchar(256) NOT NULL default ''
>    etc...
> );
>
> The UserID is a character(40) and is generated using UUID function. We
> started making making other tables and ended up not really using
> UserID, but instead using Username as the unique identifier for the
> other tables. Now, we pass and insert the Username to for discussions,
> wikis, etc, for all the modules we have developed. I was wondering if
> it would be a performance improvement to use the 40 Character UserID
> instead of Username when querying the other tables, or if we should
> change the UserID to a serial value and use that to query the other
> tables. Or just keep the way things are because it doesn't really make
> much a difference.

Username would not be any slower than UserID unless you have a lot of
usernames longer than 40 characters.

However making UserID an integer would be quite a bit more efficient. It would
take 4 bytes instead of as the length of the Username which adds up when it's
in all your other tables... Also internationalized text collations are quite a
bit more expensive than a simple integer comparison.

But the real question here is what's the better design. If you use Username
you'll be cursing if you ever want to provide a facility to allow people to
change their usernames. You may not want such a facility now but one day...

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Using PK value as a String

From
Valentin Bogdanov
Date:
--- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote:

> From: Gregory Stark <stark@enterprisedb.com>
> Subject: Re: [PERFORM] Using PK value as a String
> To: "Jay" <arrival123@gmail.com>
> Cc: pgsql-performance@postgresql.org
> Date: Monday, 11 August, 2008, 10:30 AM
> "Jay" <arrival123@gmail.com> writes:
>
> > I have a table named table_Users:
> >
> > CREATE TABLE table_Users (
> >    UserID       character(40)  NOT NULL default
> '',
> >    Username   varchar(256)  NOT NULL default
> '',
> >    Email          varchar(256) NOT NULL default
> ''
> >    etc...
> > );
> >
> > The UserID is a character(40) and is generated using
> UUID function. We
> > started making making other tables and ended up not
> really using
> > UserID, but instead using Username as the unique
> identifier for the
> > other tables. Now, we pass and insert the Username to
> for discussions,
> > wikis, etc, for all the modules we have developed. I
> was wondering if
> > it would be a performance improvement to use the 40
> Character UserID
> > instead of Username when querying the other tables, or
> if we should
> > change the UserID to a serial value and use that to
> query the other
> > tables. Or just keep the way things are because it
> doesn't really make
> > much a difference.
>
> Username would not be any slower than UserID unless you
> have a lot of
> usernames longer than 40 characters.
>
> However making UserID an integer would be quite a bit more
> efficient. It would
> take 4 bytes instead of as the length of the Username which
> adds up when it's
> in all your other tables... Also internationalized text
> collations are quite a
> bit more expensive than a simple integer comparison.
>
> But the real question here is what's the better design.
> If you use Username
> you'll be cursing if you ever want to provide a
> facility to allow people to
> change their usernames. You may not want such a facility
> now but one day...
>

I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to surrogate
onesis a better design strategy, even when it comes to performance considerations and even more so if there are complex
relationshipswithin the database. 

Regards,
Valentin


> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>   Ask me about EnterpriseDB's On-Demand Production
> Tuning
>
> --
> Sent via pgsql-performance mailing list
> (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


      __________________________________________________________
Not happy with your email address?.
Get the one you really want - millions of new email addresses available now at Yahoo!
http://uk.docs.yahoo.com/ymail/new.html

答复: [PERFORM] Using PK value as a String

From
"jay"
Date:
If UserID just be unique internal key and the unique id of other tables, I'd
like sequence, which is unique and just use 8 bytes(bigint) When it querying
other tables, it will faster , and disk space smaller than UUID(40 bytes).

     莫建祥
阿里巴巴软件(上海)有限公司
研发中心-IM服务端开发部
联系方式:86-0571-85022088-13072
贸易通ID:jaymo 淘宝ID:jackem
公司网站:www.alisoft.com
wiki:http://10.0.32.21:1688/confluence/pages/viewpage.action?pageId=10338

-----邮件原件-----
发件人: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] 代表 Jay
发送时间: 2008年8月11日 15:35
收件人: pgsql-performance@postgresql.org
主题: [PERFORM] Using PK value as a String

I have a table named table_Users:

CREATE TABLE table_Users (
   UserID       character(40)  NOT NULL default '',
   Username   varchar(256)  NOT NULL default '',
   Email          varchar(256) NOT NULL default ''
   etc...
);

The UserID is a character(40) and is generated using UUID function. We
started making making other tables and ended up not really using
UserID, but instead using Username as the unique identifier for the
other tables. Now, we pass and insert the Username to for discussions,
wikis, etc, for all the modules we have developed. I was wondering if
it would be a performance improvement to use the 40 Character UserID
instead of Username when querying the other tables, or if we should
change the UserID to a serial value and use that to query the other
tables. Or just keep the way things are because it doesn't really make
much a difference.

We are still in development and its about half done, but if there is
going to be performance issues because using PK as a String value, we
can just take a day change it before any production as been started.
Anyway advice you can give would be much appreciated.

Postgres performance guru where are you?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Using PK value as a String

From
ries van Twisk
Date:
On Aug 11, 2008, at 4:30 AM, Gregory Stark wrote:

> "Jay" <arrival123@gmail.com> writes:
>
>> I have a table named table_Users:
>>
>> CREATE TABLE table_Users (
>>   UserID       character(40)  NOT NULL default '',
>>   Username   varchar(256)  NOT NULL default '',
>>   Email          varchar(256) NOT NULL default ''
>>   etc...
>> );
>>
>> The UserID is a character(40) and is generated using UUID function.
>> We
>> started making making other tables and ended up not really using
>> UserID, but instead using Username as the unique identifier for the
>> other tables. Now, we pass and insert the Username to for
>> discussions,
>> wikis, etc, for all the modules we have developed. I was wondering if
>> it would be a performance improvement to use the 40 Character UserID
>> instead of Username when querying the other tables, or if we should
>> change the UserID to a serial value and use that to query the other
>> tables. Or just keep the way things are because it doesn't really
>> make
>> much a difference.
>
> Username would not be any slower than UserID unless you have a lot of
> usernames longer than 40 characters.
>
> However making UserID an integer would be quite a bit more
> efficient. It would
> take 4 bytes instead of as the length of the Username which adds up
> when it's
> in all your other tables... Also internationalized text collations
> are quite a
> bit more expensive than a simple integer comparison.
>
> But the real question here is what's the better design. If you use
> Username
> you'll be cursing if you ever want to provide a facility to allow
> people to
> change their usernames. You may not want such a facility now but one
> day...
>

If you generate UUID's with the UUID function  and you are on 8.3,
why not use the UUID type to store it?

Ries


> --
>  Gregory Stark
>  EnterpriseDB          http://www.enterprisedb.com
>  Ask me about EnterpriseDB's On-Demand Production Tuning
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Ries van Twisk
tags: Freelance TYPO3 Glassfish JasperReports JasperETL Flex Blaze-DS
WebORB PostgreSQL DB-Architect
email: ries@vantwisk.nl
web:   http://www.rvantwisk.nl/
skype: callto://r.vantwisk




Re: Using PK value as a String

From
Craig James
Date:
Valentin Bogdanov wrote:
> --- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote:
>
>> From: Gregory Stark <stark@enterprisedb.com>
>> Subject: Re: [PERFORM] Using PK value as a String
>> To: "Jay" <arrival123@gmail.com>
>> Cc: pgsql-performance@postgresql.org
>> Date: Monday, 11 August, 2008, 10:30 AM
>> "Jay" <arrival123@gmail.com> writes:
>>
>>> I have a table named table_Users:
>>>
>>> CREATE TABLE table_Users (
>>>    UserID       character(40)  NOT NULL default
>> '',
>>>    Username   varchar(256)  NOT NULL default
>> '',
>>>    Email          varchar(256) NOT NULL default
>> ''
>>>    etc...
>>> );
>>>
...
>> But the real question here is what's the better design.
>> If you use Username
>> you'll be cursing if you ever want to provide a
>> facility to allow people to
>> change their usernames. You may not want such a facility
>> now but one day...
>>
>
> I don't understand Gregory's suggestion about the design. I thought
> using natural primary keys as opposed to surrogate ones is a better
> design strategy, even when it comes to performance considerations
> and even more so if there are complex relationships within the database.

No, exactly the opposite.  Data about users (such as name, email address, etc.) are rarely a good choice as a foreign
key,and shouldn't be considered "keys" in most circumstances.  As Gregory points out, you're spreading the user's name
acrossthe database, effectively denormalizing it. 

Instead, you should have a user record, with an arbitrary key, an integer or OID, that you use as the foreign key for
allother tables.  That way, when the username changes, only one table will be affected.  And it's much more efficient
touse an integer as the key than a long string. 

Craig

Re: Using PK value as a String

From
Mario Weilguni
Date:
Valentin Bogdanov schrieb:
> --- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote:
>
>
>> From: Gregory Stark <stark@enterprisedb.com>
>> Subject: Re: [PERFORM] Using PK value as a String
>> To: "Jay" <arrival123@gmail.com>
>> Cc: pgsql-performance@postgresql.org
>> Date: Monday, 11 August, 2008, 10:30 AM
>> "Jay" <arrival123@gmail.com> writes:
>>
>>
>>> I have a table named table_Users:
>>>
>>> CREATE TABLE table_Users (
>>>    UserID       character(40)  NOT NULL default
>>>
>> '',
>>
>>>    Username   varchar(256)  NOT NULL default
>>>
>> '',
>>
>>>    Email          varchar(256) NOT NULL default
>>>
>> ''
>>
>>>    etc...
>>> );
>>>
>>> The UserID is a character(40) and is generated using
>>>
>> UUID function. We
>>
>>> started making making other tables and ended up not
>>>
>> really using
>>
>>> UserID, but instead using Username as the unique
>>>
>> identifier for the
>>
>>> other tables. Now, we pass and insert the Username to
>>>
>> for discussions,
>>
>>> wikis, etc, for all the modules we have developed. I
>>>
>> was wondering if
>>
>>> it would be a performance improvement to use the 40
>>>
>> Character UserID
>>
>>> instead of Username when querying the other tables, or
>>>
>> if we should
>>
>>> change the UserID to a serial value and use that to
>>>
>> query the other
>>
>>> tables. Or just keep the way things are because it
>>>
>> doesn't really make
>>
>>> much a difference.
>>>
>> Username would not be any slower than UserID unless you
>> have a lot of
>> usernames longer than 40 characters.
>>
>> However making UserID an integer would be quite a bit more
>> efficient. It would
>> take 4 bytes instead of as the length of the Username which
>> adds up when it's
>> in all your other tables... Also internationalized text
>> collations are quite a
>> bit more expensive than a simple integer comparison.
>>
>> But the real question here is what's the better design.
>> If you use Username
>> you'll be cursing if you ever want to provide a
>> facility to allow people to
>> change their usernames. You may not want such a facility
>> now but one day...
>>
>>
>
> I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to
surrogateones is a better design strategy, even when it comes to performance considerations and even more so if there
arecomplex relationships within the database. 
>
> Regards,
> Valentin
>
>
UUID is already a surrogate key not a natural key, in no aspect better
than a numeric key, just taking a lot more space.

So why not use int4/int8?




Re: Using PK value as a String

From
"Jay D. Kang"
Date:
You guys totally rock!

I guess, bottom line, we should take that extra day to convert our PK and FK to a numerical value, using BIG INT to be on the save side. (Even though Wikipedia's UserID uses just an integer as data type)

To Gregory: Thank you for you valuable statement.
"But the real question here is what's the better design. If you use Username you'll be cursing if you ever want to provide a facility to allow people to change their usernames. You may not want such a facility now but one day" I think you hit the nail on the head with this comment. If a member really wants to change their username, IE: Choose to go with IloveUSara, only to be dumped on the alter, who am I to say no.

To Valentin: I wish someone would prove us both wrong or right. I still thought it wasn't a bad idea to use username a varchar(256) to interact with all the modules... Well thats what I thought when I first started writing the tables...

To Jay: Thanks for keeping it short and simple. "I'd like sequence, which is unique and just use 8 bytes(bigint) When it querying other tables, it will faster , and disk space smaller than UUID(40 bytes)." I'm taking your advice on this^^ Although wikipedia's postgresql database schema still stands.

To Craig: Yes, I agree. Please see my comment on IloveUSara.

To Mario: Let's go! I'm Mario... Sorry, I love Mario Kart. Especially on the old super famacon. Going with int8, thank you for the advice.


On Tue, Aug 12, 2008 at 6:58 PM, Mario Weilguni <mweilguni@sime.com> wrote:
Valentin Bogdanov schrieb:

--- On Mon, 11/8/08, Gregory Stark <stark@enterprisedb.com> wrote:

 
From: Gregory Stark <stark@enterprisedb.com>
Subject: Re: [PERFORM] Using PK value as a String
To: "Jay" <arrival123@gmail.com>
Cc: pgsql-performance@postgresql.org
Date: Monday, 11 August, 2008, 10:30 AM
"Jay" <arrival123@gmail.com> writes:

   
I have a table named table_Users:

CREATE TABLE table_Users (
  UserID       character(40)  NOT NULL default
     
'',
   
  Username   varchar(256)  NOT NULL default
     
'',
   
  Email          varchar(256) NOT NULL default
     
''
   
  etc...
);

The UserID is a character(40) and is generated using
     
UUID function. We
   
started making making other tables and ended up not
     
really using
   
UserID, but instead using Username as the unique
     
identifier for the
   
other tables. Now, we pass and insert the Username to
     
for discussions,
   
wikis, etc, for all the modules we have developed. I
     
was wondering if
   
it would be a performance improvement to use the 40
     
Character UserID
   
instead of Username when querying the other tables, or
     
if we should
   
change the UserID to a serial value and use that to
     
query the other
   
tables. Or just keep the way things are because it
     
doesn't really make
   
much a difference.
     
Username would not be any slower than UserID unless you
have a lot of
usernames longer than 40 characters.

However making UserID an integer would be quite a bit more
efficient. It would
take 4 bytes instead of as the length of the Username which
adds up when it's
in all your other tables... Also internationalized text
collations are quite a
bit more expensive than a simple integer comparison.

But the real question here is what's the better design.
If you use Username
you'll be cursing if you ever want to provide a
facility to allow people to
change their usernames. You may not want such a facility
now but one day...

   

I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to surrogate ones is a better design strategy, even when it comes to performance considerations and even more so if there are complex relationships within the database.

Regards,
Valentin

 
UUID is already a surrogate key not a natural key, in no aspect better than a numeric key, just taking a lot more space.

So why not use int4/int8?






--
Regards,
Jay Kang


This e-mail is intended only for the proper person to whom it is addressed and may contain legally privileged and/or confidential information. If you received this communication erroneously, please notify me by reply e-mail, delete this e-mail and all your copies of this e-mail and do not review, disseminate, redistribute, make other use of, rely upon, or copy this communication. Thank you.

Re: Using PK value as a String

From
Gregory Stark
Date:
"Mario Weilguni" <mweilguni@sime.com> writes:

> UUID is already a surrogate key not a natural key, in no aspect better than a
> numeric key, just taking a lot more space.
>
> So why not use int4/int8?

The main reason to use UUID instead of sequences is if you want to be able to
generate unique values across multiple systems. So, for example, if you want
to be able to send these userids to another system which is taking
registrations from lots of places. Of course that only works if that other
system is already using UUIDs and you're all using good generators.

You only need int8 if you might someday have more than 2 *billion* users...
Probably not an urgent issue.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: Using PK value as a String

From
Bill Moran
Date:
In response to Gregory Stark <stark@enterprisedb.com>:

> "Mario Weilguni" <mweilguni@sime.com> writes:
>
> > UUID is already a surrogate key not a natural key, in no aspect better than a
> > numeric key, just taking a lot more space.
> >
> > So why not use int4/int8?
>
> The main reason to use UUID instead of sequences is if you want to be able to
> generate unique values across multiple systems. So, for example, if you want
> to be able to send these userids to another system which is taking
> registrations from lots of places. Of course that only works if that other
> system is already using UUIDs and you're all using good generators.

Note that in many circumstances, there are other options than UUIDs.  If
you have control over all the systems generating values, you can prefix
each generated value with a system ID (i.e. make the high 8 bits the
system ID and the remaining bits come from a sequence)  This allows
you to still use int4 or int8.

UUID is designed to be a universal solution.  But universal solutions
are frequently less efficient than custom-tailored solutions.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Using PK value as a String

From
Mark Mielke
Date:
Bill Moran wrote:
The main reason to use UUID instead of sequences is if you want to be able to
generate unique values across multiple systems. So, for example, if you want
to be able to send these userids to another system which is taking
registrations from lots of places. Of course that only works if that other
system is already using UUIDs and you're all using good generators.   
Note that in many circumstances, there are other options than UUIDs.  If
you have control over all the systems generating values, you can prefix
each generated value with a system ID (i.e. make the high 8 bits the
system ID and the remaining bits come from a sequence)  This allows
you to still use int4 or int8.

UUID is designed to be a universal solution.  But universal solutions
are frequently less efficient than custom-tailored solutions. 

Other benefits include:
    - Reduced management cost. As described above, one would have to allocate keyspace in each system. By using a UUID, one can skip this step.
    - Increased keyspace. Even if keyspace allocation is performed, an int4 only has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% allocated as an example of how this can happen. 128-bits has a LOT more keyspace than 32-bits or 64-bits.
    - Reduced sequence predictability. Certain forms of exploits when the surrogate key is exposed to the public, are rendered ineffective as guessing the "next" or "previous" generated key is far more difficult.
    - Used as the key into a cache or other lookup table. Multiple types of records can be cached to the same storage as the sequence is intended to be universally unique.
    - Flexibility to merge systems later, even if unplanned. For example, System A and System B are run independently for some time. Then, it is determined that they should be merged. If unique keys are specific to the system, this becomes far more difficult to implement than if the unique keys are universal.

That said, most uses of UUID do not require any of the above. It's a "just in case" measure, that suffers the performance cost, "just in case."

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: Using PK value as a String

From
Gregory Stark
Date:
"Mark Mielke" <mark@mark.mielke.cc> writes:

>    - Increased keyspace. Even if keyspace allocation is performed, an int4 only
> has 32-bit of keyspace to allocate. The IPv4 address space is already over 85%
> allocated as an example of how this can happen. 128-bits has a LOT more
> keyspace than 32-bits or 64-bits.

The rest of your points are valid (though not particularly convincing to me
for most applications) but this example is bogus. The IPv4 address space is
congested because of the hierarchic nature of allocations. Not because there
is an actual shortage of IPv4 addresses themselves. There would be enough IPv4
for every ethernet device on the planet for decades to come if we could
allocate them individually -- but we can't.

That is, when allocating an organization 100 addresses if they want to be able
to treat them as a contiguous network they must be allocated 128 addresses.
And if they might ever grow to 129 they're better off just justifying 256
addresses today.

That's not an issue for a sequence generated primary key. Arguably it *is* a
problem for UUID which partitions up that 128-bits much the way the original
pre-CIDR IPv4 addressing scheme partitioned up the address. But 128-bits is so
much bigger it avoids running into the issue.

The flip side is that sequence generated keys have to deal with gaps if record
is deleted later. So the relevant question is not whether you plan to have 2
billion users at any single point in the future but rather whether you plan to
ever have had 2 billion users total over your history. I suspect large
networks like Yahoo or Google might be nearing or past that point now even
though they probably only have a few hundred million current users.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: Using PK value as a String

From
Mathias Stjernström
Date:
Hi!

We use normal sequences to generate id's across multiple nodes.  We
use the "increment" parameter for the sequence and we specify each
node to increment its sequence with for example 10 and the the first
node to start the sequence at 1 and the second at 2 and so on. In that
way you get an unique ID across each nodes thats an INT. Not in
chronological order but it's unique ;)

The only issue with this is that the value you chose for increment
value is your node limit.

Cheers!

Mathias


On 12 aug 2008, at 14.51, Gregory Stark wrote:

> "Mario Weilguni" <mweilguni@sime.com> writes:
>
>> UUID is already a surrogate key not a natural key, in no aspect
>> better than a
>> numeric key, just taking a lot more space.
>>
>> So why not use int4/int8?
>
> The main reason to use UUID instead of sequences is if you want to
> be able to
> generate unique values across multiple systems. So, for example, if
> you want
> to be able to send these userids to another system which is taking
> registrations from lots of places. Of course that only works if that
> other
> system is already using UUIDs and you're all using good generators.
>
> You only need int8 if you might someday have more than 2 *billion*
> users...
> Probably not an urgent issue.
>
> --
>  Gregory Stark
>  EnterpriseDB          http://www.enterprisedb.com
>  Ask me about EnterpriseDB's Slony Replication support!
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Attachment

Re: Using PK value as a String

From
Mark Mielke
Date:
Gregory Stark wrote:
"Mark Mielke" <mark@mark.mielke.cc> writes:
 
   - Increased keyspace. Even if keyspace allocation is performed, an int4 only
has 32-bit of keyspace to allocate. The IPv4 address space is already over 85%
allocated as an example of how this can happen. 128-bits has a LOT more
keyspace than 32-bits or 64-bits.   
The rest of your points are valid (though not particularly convincing to me
for most applications) but this example is bogus. The IPv4 address space is
congested because of the hierarchic nature of allocations. Not because there
is an actual shortage of IPv4 addresses themselves. There would be enough IPv4
for every ethernet device on the planet for decades to come if we could
allocate them individually -- but we can't. 

I don't disagree. Obviously, most systems people work with do not require 2**32 records. You trimmed my bottom statement where "most systems don't require any of these benefits - it's only a just in case." :-)

The point is valid - 128-bits has more keyspace than 32-bits or 64-bits. The relevance of this point to a particular application other than Facebook, Google, or Yahoo, is probably low or non-existent.

Cheers,
mark


That is, when allocating an organization 100 addresses if they want to be able
to treat them as a contiguous network they must be allocated 128 addresses.
And if they might ever grow to 129 they're better off just justifying 256
addresses today.

That's not an issue for a sequence generated primary key. Arguably it *is* a
problem for UUID which partitions up that 128-bits much the way the original
pre-CIDR IPv4 addressing scheme partitioned up the address. But 128-bits is so
much bigger it avoids running into the issue.

The flip side is that sequence generated keys have to deal with gaps if record
is deleted later. So the relevant question is not whether you plan to have 2
billion users at any single point in the future but rather whether you plan to
ever have had 2 billion users total over your history. I suspect large
networks like Yahoo or Google might be nearing or past that point now even
though they probably only have a few hundred million current users.
 


-- 
Mark Mielke <mark@mielke.cc>

Re: Using PK value as a String

From
"Merlin Moncure"
Date:
On Tue, Aug 12, 2008 at 9:46 AM, Gregory Stark <stark@enterprisedb.com> wrote:
> "Mark Mielke" <mark@mark.mielke.cc> writes:
>
>>    - Increased keyspace. Even if keyspace allocation is performed, an int4 only
>> has 32-bit of keyspace to allocate. The IPv4 address space is already over 85%
>> allocated as an example of how this can happen. 128-bits has a LOT more
>> keyspace than 32-bits or 64-bits.
>
> The rest of your points are valid (though not particularly convincing to me
> for most applications) but this example is bogus. The IPv4 address space is
> congested because of the hierarchic nature of allocations. Not because there
> is an actual shortage of IPv4 addresses themselves. There would be enough IPv4
> for every ethernet device on the planet for decades to come if we could
> allocate them individually -- but we can't.

Only because of NAT.   There are a _lot_ of IP devices out there maybe
not billions, but maybe so, and 'enough for decades' is quite a
stretch.

merlin

Re: Using PK value as a String

From
Moritz Onken
Date:
We chose UUID as PK because there is still some information in an
integer key.
You can see if a user has registered before someone else (user1.id <
user2.id)
or you can see how many new users registered in a specific period of
time
(compare the id of the newest user to the id a week ago). This is
information
which is in some cases critical.

moritz

Re: Using PK value as a String

From
Bill Moran
Date:
In response to Moritz Onken <onken@houseofdesign.de>:

> We chose UUID as PK because there is still some information in an
> integer key.
> You can see if a user has registered before someone else (user1.id <
> user2.id)
> or you can see how many new users registered in a specific period of
> time
> (compare the id of the newest user to the id a week ago). This is
> information
> which is in some cases critical.

So you're accidentally storing critical information in magic values
instead of storing it explicitly?

Good luck with that.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Using PK value as a String

From
Moritz Onken
Date:
Am 12.08.2008 um 17:04 schrieb Bill Moran:

> In response to Moritz Onken <onken@houseofdesign.de>:
>
>> We chose UUID as PK because there is still some information in an
>> integer key.
>> You can see if a user has registered before someone else (user1.id <
>> user2.id)
>> or you can see how many new users registered in a specific period of
>> time
>> (compare the id of the newest user to the id a week ago). This is
>> information
>> which is in some cases critical.
>
> So you're accidentally storing critical information in magic values
> instead of storing it explicitly?
>
> Good luck with that.
>


How do I store critical information? I was just saying that it easy
to get some information out of a primary key which is an incrementing
integer. And it makes sense, in some rare cases, to have a PK which
is some kind of random like UUIDs where you cannot guess the next value.

moritz

Re: Using PK value as a String

From
Bill Moran
Date:
In response to Moritz Onken <onken@houseofdesign.de>:

>
> Am 12.08.2008 um 17:04 schrieb Bill Moran:
>
> > In response to Moritz Onken <onken@houseofdesign.de>:
> >
> >> We chose UUID as PK because there is still some information in an
> >> integer key.
> >> You can see if a user has registered before someone else (user1.id <
> >> user2.id)
> >> or you can see how many new users registered in a specific period of
> >> time
> >> (compare the id of the newest user to the id a week ago). This is
> >> information
> >> which is in some cases critical.
> >
> > So you're accidentally storing critical information in magic values
> > instead of storing it explicitly?
> >
> > Good luck with that.
>
> How do I store critical information? I was just saying that it easy
> to get some information out of a primary key which is an incrementing
> integer. And it makes sense, in some rare cases, to have a PK which
> is some kind of random like UUIDs where you cannot guess the next value.

I just repeated your words.  Read above "this is information which is in
some cases critical."

If I misunderstood, then I misunderstood.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Using PK value as a String

From
Moritz Onken
Date:
Am 12.08.2008 um 17:21 schrieb Bill Moran:

> In response to Moritz Onken <onken@houseofdesign.de>:
>
>>
>> Am 12.08.2008 um 17:04 schrieb Bill Moran:
>>
>>> In response to Moritz Onken <onken@houseofdesign.de>:
>>>
>>>> We chose UUID as PK because there is still some information in an
>>>> integer key.
>>>> You can see if a user has registered before someone else
>>>> (user1.id <
>>>> user2.id)
>>>> or you can see how many new users registered in a specific period
>>>> of
>>>> time
>>>> (compare the id of the newest user to the id a week ago). This is
>>>> information
>>>> which is in some cases critical.
>>>
>>> So you're accidentally storing critical information in magic values
>>> instead of storing it explicitly?
>>>
>>> Good luck with that.
>>
>> How do I store critical information? I was just saying that it easy
>> to get some information out of a primary key which is an incrementing
>> integer. And it makes sense, in some rare cases, to have a PK which
>> is some kind of random like UUIDs where you cannot guess the next
>> value.
>
> I just repeated your words.  Read above "this is information which
> is in
> some cases critical."
>
> If I misunderstood, then I misunderstood.

If you are using incrementing integers as pk then you are storing this
data implicitly with your primary key. Using UUIDs is a way to avoid
that.


Re: Using PK value as a String

From
Steve Atkins
Date:
On Aug 12, 2008, at 8:21 AM, Bill Moran wrote:

> In response to Moritz Onken <onken@houseofdesign.de>:
>
>>
>> Am 12.08.2008 um 17:04 schrieb Bill Moran:
>>
>>> In response to Moritz Onken <onken@houseofdesign.de>:
>>>
>>>> We chose UUID as PK because there is still some information in an
>>>> integer key.
>>>> You can see if a user has registered before someone else
>>>> (user1.id <
>>>> user2.id)
>>>> or you can see how many new users registered in a specific period
>>>> of
>>>> time
>>>> (compare the id of the newest user to the id a week ago). This is
>>>> information
>>>> which is in some cases critical.
>>>
>>> So you're accidentally storing critical information in magic values
>>> instead of storing it explicitly?
>>>
>>> Good luck with that.
>>
>> How do I store critical information? I was just saying that it easy
>> to get some information out of a primary key which is an incrementing
>> integer. And it makes sense, in some rare cases, to have a PK which
>> is some kind of random like UUIDs where you cannot guess the next
>> value.
>
> I just repeated your words.  Read above "this is information which
> is in
> some cases critical."
>
> If I misunderstood, then I misunderstood.
>

I think Moritz is more concerned about leakage of critical information,
rather than intentional storage of it. When a simple incrementing
integer
is used as an identifier in publicly visible places (webapps, ticketing
systems) then that may leak more information than intended.

Cheers,
   Steve


Re: Using PK value as a String

From
Bill Moran
Date:
In response to Steve Atkins <steve@blighty.com>:

>
> On Aug 12, 2008, at 8:21 AM, Bill Moran wrote:
>
> > In response to Moritz Onken <onken@houseofdesign.de>:
> >
> >>
> >> Am 12.08.2008 um 17:04 schrieb Bill Moran:
> >>
> >>> In response to Moritz Onken <onken@houseofdesign.de>:
> >>>
> >>>> We chose UUID as PK because there is still some information in an
> >>>> integer key.
> >>>> You can see if a user has registered before someone else
> >>>> (user1.id <
> >>>> user2.id)
> >>>> or you can see how many new users registered in a specific period
> >>>> of
> >>>> time
> >>>> (compare the id of the newest user to the id a week ago). This is
> >>>> information
> >>>> which is in some cases critical.
> >>>
> >>> So you're accidentally storing critical information in magic values
> >>> instead of storing it explicitly?
> >>>
> >>> Good luck with that.
> >>
> >> How do I store critical information? I was just saying that it easy
> >> to get some information out of a primary key which is an incrementing
> >> integer. And it makes sense, in some rare cases, to have a PK which
> >> is some kind of random like UUIDs where you cannot guess the next
> >> value.
> >
> > I just repeated your words.  Read above "this is information which
> > is in
> > some cases critical."
> >
> > If I misunderstood, then I misunderstood.
> >
>
> I think Moritz is more concerned about leakage of critical information,
> rather than intentional storage of it. When a simple incrementing
> integer
> is used as an identifier in publicly visible places (webapps, ticketing
> systems) then that may leak more information than intended.

Then I did misunderstand.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Using PK value as a String

From
Mark Mielke
Date:
Bill Moran wrote:
>>>>>> We chose UUID as PK because there is still some information in an
>>>>>> integer key.
>>>>>> You can see if a user has registered before someone else
>>>>>> (user1.id <
>>>>>> user2.id)
>>>>>> or you can see how many new users registered in a specific period
>>>>>> of
>>>>>> time
>>>>>> (compare the id of the newest user to the id a week ago). This is
>>>>>> information
>>>>>> which is in some cases critical.
>> I think Moritz is more concerned about leakage of critical information,
>> rather than intentional storage of it. When a simple incrementing
>> integer
>> is used as an identifier in publicly visible places (webapps, ticketing
>> systems) then that may leak more information than intended.
>>

While we are on this distraction - UUID will sometimes encode "critical"
information such as: 1) The timestamp (allowing users to be compared),
and 2) The MAC address of the computer that generated it.

So, I wouldn't say that UUID truly protects you here unless you are sure
to use one of the UUID formats that is not timestamp or MAC address
based. The main benefit of UUID here is the increased keyspace, so
predicting sequence becomes more difficult.

(Note that an all-random UUID is not better than two pairs of all-random
64-bit integers with a good random number source. :-) )

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>


Re: Using PK value as a String

From
"H. Hall"
Date:
Bill Moran wrote:
> In response to Steve Atkins <steve@blighty.com>:
>
>
>> On Aug 12, 2008, at 8:21 AM, Bill Moran wrote:
>>
>>
>>> In response to Moritz Onken <onken@houseofdesign.de>:
>>>
>>>
>>>> Am 12.08.2008 um 17:04 schrieb Bill Moran:
>>>>
>>>>
>>>>> In response to Moritz Onken <onken@houseofdesign.de>:
>>>>>
>>>>>
>>>>>> We chose UUID as PK because there is still some information in an
>>>>>> integer key.
>>>>>> You can see if a user has registered before someone else
>>>>>> (user1.id <
>>>>>> user2.id)
>>>>>> or you can see how many new users registered in a specific period
>>>>>> of
>>>>>> time
>>>>>> (compare the id of the newest user to the id a week ago). This is
>>>>>> information
>>>>>> which is in some cases critical.
>>>>>>
>>>>> So you're accidentally storing critical information in magic values
>>>>> instead of storing it explicitly?
>>>>>
>>>>> Good luck with that.
>>>>>
>>>> How do I store critical information? I was just saying that it easy
>>>> to get some information out of a primary key which is an incrementing
>>>> integer. And it makes sense, in some rare cases, to have a PK which
>>>> is some kind of random like UUIDs where you cannot guess the next
>>>> value.
>>>>
Interesting. Ordered chronologically and the next value is unguessable.

>>> I just repeated your words.  Read above "this is information which
>>> is in
>>> some cases critical."
>>>
>>> If I misunderstood, then I misunderstood.
>>>
>>>
>> I think Moritz is more concerned about leakage of critical information,
>> rather than intentional storage of it. When a simple incrementing
>> integer
>> is used as an identifier in publicly visible places (webapps, ticketing
>> systems) then that may leak more information than intended.
>>

I think there are better ways to accomplish this than encoding and
decoding/decrypting a PK. Store the sensitive data in a session variable
or store session data in the database neither of which is accessible to
users.

It is usually a big mistake to de-normalize a table by encoding several
fields in a single column PK or not.  If you want to do something with
the encoded data such as find one or more rows with an encoded value
then  you will have to inspect every row and decode it and then compare
it or add it or whatever.


>
> Then I did misunderstand.
>
>


--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com