Thread: Drawbacks of using BYTEA for PK?

Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

11 January 2004, 11:56:50

Are there any drawbacks of using BYTEA for PK compared to using a
primitive/atomic data types like INT/SERIAL? (like significant
performance hit, peculiar FK behaviour, etc).

I plan to use BYTEA for GUID (of course, temporarily I hope, until
PostgreSQL officially supports GUID data type), since it seems to be the
most convenient+compact compared to other data types currently
available. I use GUIDs for most PK columns.

--
dave

Re: Drawbacks of using BYTEA for PK?

From

"D. Dante Lorenso"

Date:

11 January 2004, 18:05:57

David Garamond wrote:

> Are there any drawbacks of using BYTEA for PK compared to using a
> primitive/atomic data types like INT/SERIAL? (like significant
> performance hit, peculiar FK behaviour, etc).
>
> I plan to use BYTEA for GUID (of course, temporarily I hope, until
> PostgreSQL officially supports GUID data type), since it seems to be
> the most convenient+compact compared to other data types currently
> available. I use GUIDs for most PK columns.

GUID?  Isn't that really nothing more than an MD5 on a sequence?

    SELECT (MD5(NEXTVAL('my_table_seq'))) AS my_guid;

Since 7.4 has the md5 function built-in, there's your support ;-)
Now just add that to your table's trigger and your good to go.
I think in MS products, they format the guid with dashes in the
style 8-4-4-4-12 but it still looks to me like a 32 character hex
string or a 16 byte (128 bit) value.  You can choose to store the
value however you like, I'm not sure what would be optimal, but
bits are bits, right?

Dante

Re: Drawbacks of using BYTEA for PK?

From

Richard Huxton

Date:

12 January 2004, 06:04:46

On Sunday 11 January 2004 22:05, D. Dante Lorenso wrote:
> David Garamond wrote:
> > Are there any drawbacks of using BYTEA for PK compared to using a
> > primitive/atomic data types like INT/SERIAL? (like significant
> > performance hit, peculiar FK behaviour, etc).
> >
> > I plan to use BYTEA for GUID (of course, temporarily I hope, until
> > PostgreSQL officially supports GUID data type), since it seems to be
> > the most convenient+compact compared to other data types currently
> > available. I use GUIDs for most PK columns.
>
> GUID?  Isn't that really nothing more than an MD5 on a sequence?
>
>     SELECT (MD5(NEXTVAL('my_table_seq'))) AS my_guid;

I think the point of a GUID is it's supposed to be unique across any number of
machines without requiring those machines to coordinate their use of GUID
values.

I think the typical approach is to use something like:
hash_fn( network_mac_address || other_hopefully_unique_constant ||
sequence_val )
and make sure that the probability of getting collisions is acceptably low.

ISTR a long discussion a year or two back on one of the lists, for those that
are interested.
--
  Richard Huxton
  Archonet Ltd

Re: Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

12 January 2004, 09:51:05

D. Dante Lorenso wrote:
> GUID?  Isn't that really nothing more than an MD5 on a sequence?
>
>    SELECT (MD5(NEXTVAL('my_table_seq'))) AS my_guid;

I know there are several algorithms to generate GUID, but this is
certainly inadequate :-) You need to make sure that the generated GUID
will be unique throughout cyberspace (or to be more precise, the GUID
should have a very very small chance of colliding with other people's
GUID). Even OID is not a good seed at all.

Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
unique seed like MD5 of '/sbin/ifconfig' output)...

> Since 7.4 has the md5 function built-in, there's your support ;-)

Well, until there's a GUID or INT128 or BIGBIGINT builtin type I doubt
many people will regard PostgreSQL as fully supporting GUID. I believe
there's the pguuid project in GBorg site that does something like this.

--
dave

Re: Drawbacks of using BYTEA for PK?

From

Tom Lane

Date:

12 January 2004, 10:51:16

David Garamond <lists@zara.6.isreserved.com> writes:
> Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
> unique seed like MD5 of '/sbin/ifconfig' output)...

Adding an MD5 hash contributes *absolutely zero*, except waste of space,
to any attempt to make a GUID.  The hash will add no uniqueness that was
not there before.

            regards, tom lane

Re: Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

12 January 2004, 11:53:34

Tom Lane wrote:
> David Garamond <lists@zara.6.isreserved.com> writes:
>
>>Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
>>unique seed like MD5 of '/sbin/ifconfig' output)...
>
> Adding an MD5 hash contributes *absolutely zero*, except waste of space,
> to any attempt to make a GUID.  The hash will add no uniqueness that was
> not there before.

Of course, in the above case, MD5 is used to compress the "uniqueness"
(which should be more than 128-bit, comprised of: a) [good] random
number; b) timestamp; c) a "node ID" element, either from /sbin/config
output which contain MAC address, or from the hash of harddisk content,
etc) into a 128-bit space.

--
dave

Re: Drawbacks of using BYTEA for PK?

From

"D. Dante Lorenso"

Date:

12 January 2004, 14:01:56

> Tom Lane wrote:
>
>> Adding an MD5 hash contributes *absolutely zero*, except waste of space,
>> to any attempt to make a GUID.  The hash will add no uniqueness that was
>> not there before.
>
The cool thing about a 'GUID' (or in my example a hashed sequence number
[sure
toss in some entropy if you want it]) is that if you happen to reference
that
value as a primary key on a table, the URL that passes the argument can not
be guessed at easily.  For example using a sequence:

    http://domain.com/application/load_record.html?customer_id=12345

Then, users of the web will assume that you have at most 12345
customers.  And
they can try to look up information on other customers by doing:

    http://domain.com/application/load_record.html?customer_id=12346
    http://domain.com/application/load_record.html?customer_id=12344

...basically walking the sequence.  Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you
want
to happen.  NOW, if you use a GUID:

http://domain.com/application/load_record.html?customer_id=f46d6296-5362-2526-42e3-1b8ce9dcccc1

Right, so now try to guess the next value in this sequence.  It's a little
more protective and obfuscated (an advantage in using GUIDs).

Dante

Re: Drawbacks of using BYTEA for PK?

From

"scott.marlowe"

Date:

12 January 2004, 16:13:14

On Mon, 12 Jan 2004, D. Dante Lorenso wrote:

>
> > Tom Lane wrote:
> >
> >> Adding an MD5 hash contributes *absolutely zero*, except waste of space,
> >> to any attempt to make a GUID.  The hash will add no uniqueness that was
> >> not there before.
> >
> The cool thing about a 'GUID' (or in my example a hashed sequence number
> [sure
> toss in some entropy if you want it]) is that if you happen to reference
> that
> value as a primary key on a table, the URL that passes the argument can not
> be guessed at easily.  For example using a sequence:
>
>     http://domain.com/application/load_record.html?customer_id=12345
>
> Then, users of the web will assume that you have at most 12345
> customers.  And
> they can try to look up information on other customers by doing:
>
>     http://domain.com/application/load_record.html?customer_id=12346
>     http://domain.com/application/load_record.html?customer_id=12344
>
> ...basically walking the sequence.  Sure, you will protect against this with
> access rights, BUT...seeing the sequence is a risk and not something you
> want
> to happen.  NOW, if you use a GUID:

Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security.  Real security comes from secure code, period.

Re: Drawbacks of using BYTEA for PK?

From

Greg Stark

Date:

12 January 2004, 16:52:49

"scott.marlowe" <scott.marlowe@ihs.com> writes:

> > they can try to look up information on other customers by doing:
> >
> >     http://domain.com/application/load_record.html?customer_id=12346
> >     http://domain.com/application/load_record.html?customer_id=12344
> >
> > ...basically walking the sequence.  Sure, you will protect against this with
> > access rights, BUT...seeing the sequence is a risk and not something you
> > want
> > to happen.  NOW, if you use a GUID:
>
> Security != obscurity.
>
> While using GUIDs may make it harder to get hacked, it in no way actually
> increases security.  Real security comes from secure code, period.

Well, uh, you're both wrong.

On the one hand if your GUIDs are just an MD5 of a sequence then they're just
as guessable as the sequence. The attacker can try MD5 of various numbers
until he finds the one he is (it's probably on the web site somewhere anyways)
and then run MD5 himself on whatever number he feels.

On the other hand it is possible to do this right. Include a secret of some
kind in the MD5 hash, something that's not publically available. That secret
is in essence the password to the scheme. Now it's not really "obscurity" any
more than any password based scheme is "security through obscurity".

However even that isn't ideal, since you have to be able to change the
password periodically in case it's leaked. I believe there are techniques to
solve this though I can' think of any off the top of my head.

But if your only threat model is people attacking based on the publicly
visible information then an MD5 of the combination of a sequence and a secret
is a perfectly reasonable approach.

In the past I happily exposed the sequence but used an MD5 of the sequence and
a secret as a protection against spoofing. I find exposing the sequence is
very convenient for programming and debugging problems. Spoofing is a serious
security hazard, but worrying about leaking information like the size of the
customer database is usually a sign of people hoping for security through
obscurity.

--
greg

Re: Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

12 January 2004, 18:00:43

Greg Stark wrote:
> On the other hand it is possible to do this right. Include a secret of some
> kind in the MD5 hash, something that's not publically available. That secret
> is in essence the password to the scheme. Now it's not really "obscurity" any
> more than any password based scheme is "security through obscurity".
>
> However even that isn't ideal, since you have to be able to change the
> password periodically in case it's leaked. I believe there are techniques to
> solve this though I can' think of any off the top of my head.
>
> But if your only threat model is people attacking based on the publicly
> visible information then an MD5 of the combination of a sequence and a secret
> is a perfectly reasonable approach.

We're originally talking about using MD5 as a means to generate unique
ID right (and not to store password hash to be checked against later)?

Then this "secret key" is unnecessary. Just get some truly random bits
(if the number of bits is 128, then you can use it as it is. If the
number of bits is > 128, you can hash it using MD5 to get 128 bit. If
the number of bits is < 128, you're "screwed" anyway :-)

--
dave

Re: Drawbacks of using BYTEA for PK?

From

"Keith C. Perry"

Date:

12 January 2004, 18:13:39

Quoting Greg Stark <gsstark@mit.edu>:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
>
> > > they can try to look up information on other customers by doing:
> > >
> > >     http://domain.com/application/load_record.html?customer_id=12346
> > >     http://domain.com/application/load_record.html?customer_id=12344
> > >
> > > ...basically walking the sequence.  Sure, you will protect against this
> with
> > > access rights, BUT...seeing the sequence is a risk and not something you
>
> > > want
> > > to happen.  NOW, if you use a GUID:
> >
> > Security != obscurity.
> >
> > While using GUIDs may make it harder to get hacked, it in no way actually
> > increases security.  Real security comes from secure code, period.
>
> Well, uh, you're both wrong.
>
> On the one hand if your GUIDs are just an MD5 of a sequence then they're
> just
> as guessable as the sequence. The attacker can try MD5 of various numbers
> until he finds the one he is (it's probably on the web site somewhere
> anyways)
> and then run MD5 himself on whatever number he feels.
>
> On the other hand it is possible to do this right. Include a secret of some
> kind in the MD5 hash, something that's not publically available. That secret
> is in essence the password to the scheme. Now it's not really "obscurity"
> any
> more than any password based scheme is "security through obscurity".
>
> However even that isn't ideal, since you have to be able to change the
> password periodically in case it's leaked. I believe there are techniques to
> solve this though I can' think of any off the top of my head.
>
> But if your only threat model is people attacking based on the publicly
> visible information then an MD5 of the combination of a sequence and a
> secret
> is a perfectly reasonable approach.
>
> In the past I happily exposed the sequence but used an MD5 of the sequence
> and
> a secret as a protection against spoofing. I find exposing the sequence is
> very convenient for programming and debugging problems. Spoofing is a
> serious
> security hazard, but worrying about leaking information like the size of the
> customer database is usually a sign of people hoping for security through
> obscurity.
>
> --
> greg
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>

Its not a question of right or wrong.  Its the method.  One thing I see here is
a failing to use several security methods at different layers.  That really is
necessary for a production environment.  If you want customer id's kept private,
then you need a private connection or to not expose them.  Using an MD5 hash to
"hide" them will slow your app down by some delta and not protect your
connection.  Granted garbling that id with a password is somewhat more secure
but your connection could still be attacked or even hijacked.

In the URL's you gave above, why are you not using HTTPS (i.e. authentication)?
 What about using a crytographic cookies to identify your session and link that
to you userid (after authorization)?

'Just seems like you're not using the right tool (method) for the job here.

$-0.02

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

Re: Drawbacks of using BYTEA for PK?

From

Alex Satrapa

Date:

12 January 2004, 18:41:18

David Garamond wrote:
> Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
> unique seed like MD5 of '/sbin/ifconfig' output)...

As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32 bit
timestamp and you have a 64bit unique identifier that is "universally"
unique (to one second).

Protection From Inference (was Re: Drawbacks of using BYTEA for PK?)

From

Alex Satrapa

Date:

12 January 2004, 19:16:18

Greg Stark wrote:
> ...  worrying about leaking information like the size of the
> customer database is usually a sign of people hoping for security through
> obscurity.

To prevent the size of your database being guessed at from the serial
numbers of your customers' accounts, don't issue the numbers sequentially.

One simplistic method of non-sequential assignment is: generate a random
number between "00...00" and "99...99"*, check if it's already in use -
if not, issue it, if so, regenerate.  When presenting the number, always
format it as an N-digit number with leading zeroes - for Perl
programmers, this would be achieved along the lines of printf("%010d",
$account_number)

Thus you will end up with customer numbers evenly spread over the number
space. This will prevent people inferring the size of your database (or
company) through the account numbers they observe.

To protect the customer's account from being accessed by unauthorised
persons, use form-based password access (not HTTP basic**) and/or X.509
certificates over a secure connection.

As Scotty says, "use the right tool for the right job!"

HTH
Alex Satrapa

*make the number space much larger than your expected number of
accounts. This reduces collisions in random number generation. Another
option is to increment through the number space in the event of a
collision, rather than generating another random number.

**using form-based access, the user can log out when leaving the
terminal. Using HTTP basic, the browser is likely to remember their
login for the entire session, and sometimes even between sessions.

Re: Protection From Inference (was Re: Drawbacks of using BYTEA for PK?)

From

Kragen Sitaker

Date:

12 January 2004, 20:13:27

On Tue, Jan 13, 2004 at 10:15:47AM +1100, Alex Satrapa wrote:
> **using form-based access, the user can log out when leaving the
> terminal. Using HTTP basic, the browser is likely to remember their
> login for the entire session, and sometimes even between sessions.

You can persuade the browser to forget the password just by sending it
a 401.  Unfortunately, the user then has to know to hit 'cancel' on the
resulting dialog box.

Re: Drawbacks of using BYTEA for PK?

From

"D. Dante Lorenso"

Date:

12 January 2004, 22:24:38

>>>>they can try to look up information on other customers by doing:
>>>>
>>>>    http://domain.com/application/load_record.html?customer_id=12346
>>>>    http://domain.com/application/load_record.html?customer_id=12344
>>>>
>>>>...basically walking the sequence.  Sure, you will protect against this
>>>>
>>>>
>>>>to happen.  NOW, if you use a GUID:
>>>>
>>>>
>>>Security != obscurity.
>>>
>>>While using GUIDs may make it harder to get hacked, it in no way actually
>>>increases security.  Real security comes from secure code, period.
>>>
>>>
>>Well, uh, you're both wrong.
>>On the one hand if your GUIDs are just an MD5 of a sequence then they're
>>just as guessable as the sequence.
>>
>>
>Its not a question of right or wrong.  Its the method.  One thing I see here is
>a failing to use several security methods at different layers....why are you not using HTTPS (i.e. authentication)?
> What about using a crytographic cookies to identify your session and link that
>to you userid (after authorization)?
>
>
Ok, my point is not one of security as much as the obscurity.  I have the
security aspect already covered whereby I only select the customer
record from
the database where the logged in account has access to the record.  So, if
you are not the admin or the actual customer, the select will return a code
indicating that you do not have permission to view the given record.

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc...  If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?

For instance, if I generate sequential numbers for invoice ids and the
customer
sees #123 as an invoice number one month and sees #128 the next month,
it might
imply that there are only 4 customers getting invoiced each month.

Another example ... let's say customers can create 'Widgets' in their
account.
There might be a page that lists all their 'widgets'.  If you click on the
widget, you can edit it.  A link to do this might look as follows:

    http://.../account/widget_list.html
    http://.../account/widget_edit.html?widget_id=12345

Well, if the widget_id is a sequence (global to the widget table), then
by creating
one widget, customer would get widget id (WIDG_1) and another widget
(WIDG_2),
the customer could see that the widget_id increased by only an amount of

    N = WIDG_2 - WIDG_1

and would therefore provide the assumption that the number of customers
creating
widgets in total does not exceed N.  I don't see this as much of a
problem about
'security' in the respect of who can access the data as much as who can make
conclusions about the company beind the data.

See what I mean?  What do you propose as the best solution for this?
Not expose
the sequences to the user and use user-enumerated ids?  Then a trigger
on the
table would assign ids like:

    SELECT (MAX(widget_id)+1) INTO NEW.widget_id
    WHERE cust_id = NEW.cust_id;

But I think after several hundred customer records, this trigger would start
getting slow.  I don't know really, any ideas?

Dante

Re: Drawbacks of using BYTEA for PK?

From

Greg Stark

Date:

13 January 2004, 01:25:19

"D. Dante Lorenso" <dante@lorenso.com> writes:

> Maybe a better example of my problem is with records throughout the system
> like invoices, customer data, etc...  If any of these items use a sequence
> and that sequence is global to the table in the database and the number is
> exposed externally, then it is possible to infer the success of the company
> underneath, is it not?

Except that's exactly the way business has always been done. Though people
usually start new accounts with check# 50000 or something like that for
precisely that reason. But it's still pretty transparent, and they don't
really worry about it too much.

What you're saying is fundamentally valid, but I tend to think these kinds of
concerns are just generically overblown.

My only comment was that just taking an MD5 of the sequence gives you no
security. At the very least you have to include a secret. Even then I suspect
there are further subtle cryptographic issues. There always are.

--
greg

Re: Drawbacks of using BYTEA for PK?

From

"Chris Travers"

Date:

13 January 2004, 04:43:22

----- Original Message -----
From: "Alex Satrapa" <alex@lintelsys.com.au>
> As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
> host should be unique for the Internet. Append/prepend a 32 bit
> timestamp and you have a 64bit unique identifier that is "universally"
> unique (to one second).

Aarrgh...  So if you have 2 inserts in the same second, you have key
collision?  Why not append a sequence to that so you have:  Unique address
|| timestamp || sequence value.  In a case such as this I can see why you
might want to use md5() to hash that value.

Best Wishes,
Chris Travers

Re: Drawbacks of using BYTEA for PK?

From

"Chris Travers"

Date:

13 January 2004, 04:43:28

Answers inline.

----- Original Message -----
From: "Greg Stark" <gsstark@mit.edu>

> On the one hand if your GUIDs are just an MD5 of a sequence then they're
just
> as guessable as the sequence. The attacker can try MD5 of various numbers
> until he finds the one he is (it's probably on the web site somewhere
anyways)
> and then run MD5 himself on whatever number he feels.
>
> On the other hand it is possible to do this right. Include a secret of
some
> kind in the MD5 hash, something that's not publically available. That
secret
> is in essence the password to the scheme. Now it's not really "obscurity"
any
> more than any password based scheme is "security through obscurity".

You still have the following problem:  the PK is not really used for very
much in this case except referencing data.  This is done internally
(invoices, etc), so the application is presumed to know the ID when looking
up a customer.  Nothing you do will prevent any attack based on searching
the database, i.e. select customer_id from customers; if such an attack is
possible in an application.  I actually think that developers should enforce
security as far back (towards the database) as possible, so if this needs to
be prevented, using a view which only provides access to the customers
required is the preferred solution.  You could also use triggers.

If, however, you want a global unique id which will never collide with any
other records (f. ex. for distributed server solutions), then you have
another problem-- MD5 is NOT guaranteed to be unique.  Think about it-- if
the return digest is of a set length, then there must be many different
values which will create that same digest.  Instead MD5 is designed to
prevent deliberate duplication, which is not what we are talking about here
(accidental duplication) and so you may want to be cautious about hashing
your keys.  In this case, a more open, transparent key would be better.  For
example:

machine identifier || sequence.

You *could* hash these, but it is unnecessary and may actually create
collisions if the machine identifier is sufficiently large.  However,
mac_address || ipv4 address should be sufficient, I would think.  It would
still be attackable in your view, so you could add a timestamp :-) but
again, I see limited utility of guids as a security feature.

Best Wishes,
Chris Travers

cryptography, was Drawbacks of using BYTEA for PK?

From

"Chris Travers"

Date:

13 January 2004, 04:43:33

From: "Keith C. Perry" <netadmin@vcsn.com>
> Using an MD5 hash to
> "hide" them will slow your app down by some delta and not protect your
> connection.  Granted garbling that id with a password is somewhat more
secure
> but your connection could still be attacked or even hijacked.
>
> In the URL's you gave above, why are you not using HTTPS (i.e.
authentication)?
>  What about using a crytographic cookies to identify your session and link
that
> to you userid (after authorization)?

Https I can see.  I am having difficulty understanding how you could use
cryptographic cookies to prevent session hijacking though given the current
setup.  Also you could use ssl between the web server and PostgreSQL to
secure that connection.

As a side question:  Does PostgreSQL support using Kerberos for encrypted
connections (beyond authentication), or do you need to use SSL for that?

Best Wishes,
Chris Travers

Re: Drawbacks of using BYTEA for PK?

From

"Chris Travers"

Date:

13 January 2004, 04:43:39

Sounds to me you have concerns more along the lines of counterintelligence.

> Maybe a better example of my problem is with records throughout the system
> like invoices, customer data, etc...  If any of these items use a sequence
> and that sequence is global to the table in the database and the number is
> exposed externally, then it is possible to infer the success of the
company
> underneath, is it not?

IMO, the solution here is to start your sequences at an arbitrary value
(preferably not round) such as 1543691.  Therefore the first customer
doesn't know that you don't have 1.5M other customers :-)  This could be
calculated for each sequence with a formula such as
SELECT (random() * 1000000 + 1000000)::bigint;

>
> For instance, if I generate sequential numbers for invoice ids and the
> customer
> sees #123 as an invoice number one month and sees #128 the next month,
> it might
> imply that there are only 4 customers getting invoiced each month.
>
Another solution I have seen is to use a formula for your invoices based on:
Letter key for invoice type followed by YYYYMMDD followed by a numeric
sequence.  This also helps to obscure things since the customer may not know
how often you reset the sequence (could be every month, or every day).  The
letter key can uniquely identify your server on your network thereby
creating a GUID.  In other words your sequence need only be unique to a
given time frame.  You could even add a timestamp and a sequence that wraps
around after 9 :-)  That way as long as you don't create 10 invoices in the
same second you are OK.

>     http://.../account/widget_list.html
>     http://.../account/widget_edit.html?widget_id=12345

Provided that each customer is only creating one widget at a time, you could
then take the customer_id and append to it a value of a customer-specific
sequence.  You could even have this as a compound primary key.  That way,
each customer can only determine how many widgets they have created :-)

> See what I mean?  What do you propose as the best solution for this?

Create GUIDS which contain only the information you want.  No need to hash.
See above for examples.

Best Wishes,
Chris Travers

Re: Drawbacks of using BYTEA for PK?

From

"John Sidney-Woollett"

Date:

13 January 2004, 04:55:57

Careful...

If two (or more) clients (in the same network) are going through a
firewall that performs NAT, then they could appear to have the same IP
address if the NAT address pool is small (single address).

Appending a sequence would help resolve that issue though.

John Sidney-Woollett

Chris Travers said:
>
> ----- Original Message -----
> From: "Alex Satrapa" <alex@lintelsys.com.au>
>> As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
>> host should be unique for the Internet. Append/prepend a 32 bit
>> timestamp and you have a 64bit unique identifier that is "universally"
>> unique (to one second).
>
> Aarrgh...  So if you have 2 inserts in the same second, you have key
> collision?  Why not append a sequence to that so you have:  Unique address
> || timestamp || sequence value.  In a case such as this I can see why you
> might want to use md5() to hash that value.
>
> Best Wishes,
> Chris Travers
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

Re: Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

13 January 2004, 05:21:37

Alex Satrapa wrote:
> As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
> host should be unique for the Internet. Append/prepend a 32 bit
> timestamp and you have a 64bit unique identifier that is "universally"
> unique (to one second).

Remember that /sbin/ifconfig output usually include MAC address too. Not
that MAC addresses are 100% unique, but that should increase the uniqueness.

--
dave

Re: Drawbacks of using BYTEA for PK?

From

"Chris Travers"

Date:

13 January 2004, 08:59:43

From: "John Sidney-Woollett" <johnsw@wardbrook.com>

>Careful...

> If two (or more) clients (in the same network) are going through a
> firewall that performs NAT, then they could appear to have the same IP
> address if the NAT address pool is small (single address).

Sorry, I should have been more specific about what I meant by "Unique
address."  The unique address is some identifier which can be reasonably
considered to be unique to the system md5(serial number of the first CPU ||
make/model of the CPU) or md5(mac address || ipv4 address) or ipv6 address.
Concatinating this with a timestamp and a sequence should provide very
little chance of a collision.

Here is a more interesting application though-- suppose you have a
distributed environment and want a way of transparently authenticating where
a record is supposed to be.  In this scenario, you have a situation where
you need some sort of GUID for the database server, which can be specified
when an addition is made or where the referring record exists.  If the
record exists in a different location the other child records could be
deflected there too (perhaps via dblink).  In this way, the pseudo-guid can
contain the customer number along with the location guid.  Any suggestions
on handling this in a manageable and opaque fashion (i.e. not giving
customers case numbers with the mac address fo the server appended ;-))

Best Wishes,
Chris Travers

Re: Drawbacks of using BYTEA for PK?

From

Bruno Wolff III

Date:

13 January 2004, 11:51:28

On Mon, Jan 12, 2004 at 20:24:10 -0600,
  "D. Dante Lorenso" <dante@lorenso.com> wrote:
>
> See what I mean?  What do you propose as the best solution for this?
> Not expose
> the sequences to the user and use user-enumerated ids?  Then a trigger
> on the
> table would assign ids like:
>
>    SELECT (MAX(widget_id)+1) INTO NEW.widget_id
>    WHERE cust_id = NEW.cust_id;
>
> But I think after several hundred customer records, this trigger would start
> getting slow.  I don't know really, any ideas?

I think it would be better to have a per customer counter. Then the GID
would be customer, customer_sequence. You probably wouldn't want to use
postgres sequences for this. I would expect that generating new ID numbers
isn't so common that updating a counter row in a customer counter table
would be a real problem.

Re: cryptography, was Drawbacks of using BYTEA for PK?

From

"Keith C. Perry"

Date:

13 January 2004, 12:05:11

Quoting Chris Travers <chris@travelamericas.com>:

> From: "Keith C. Perry" <netadmin@vcsn.com>
> > Using an MD5 hash to
> > "hide" them will slow your app down by some delta and not protect your
> > connection.  Granted garbling that id with a password is somewhat more
> secure
> > but your connection could still be attacked or even hijacked.
> >
> > In the URL's you gave above, why are you not using HTTPS (i.e.
> authentication)?
> >  What about using a crytographic cookies to identify your session and link
> that
> > to you userid (after authorization)?
>
> Https I can see.  I am having difficulty understanding how you could use
> cryptographic cookies to prevent session hijacking though given the current
> setup.

Cryptographic cookies are actually how TCP SYN flood protection is done on Linux
and I think Solaris so in my case the OS is handling that.  What is implemented
there could be implemented at the application layer but I don't think that
becomes valid once you are using HTTPS since is provide similar facilities.

In my applications, I simply have Apache push a cookie to the browser (during
authorization) which is then used as the session key.  Additionally, I almost
always use POST methods instead of GET (I hate exposing application logic that
way).  Ever time a user does something, the presence of that cookie is checked
in the database.

> Also you could use ssl between the web server and PostgreSQL to
> secure that connection.

True but that is only half the story.  You're client interface is what is
public.  I would SSL the web <--> db connection as a standard but I would be
less concerned about (what I'm assumming is) a local connection behind the DMZ.

> As a side question:  Does PostgreSQL support using Kerberos for encrypted
> connections (beyond authentication), or do you need to use SSL for that?
>
> Best Wishes,
> Chris Travers
>

Not sure about that one but if so, I'm sure someone will speak up  :)

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

Re: cryptography, was Drawbacks of using BYTEA for PK?

From

Greg Stark

Date:

13 January 2004, 17:37:00

"Keith C. Perry" <netadmin@vcsn.com> writes:

> Additionally, I almost always use POST methods instead of GET
> (I hate exposing application logic that way).

Well this is pretty far afield from postgres. But I hope you're not using POST
for idempotent operations like search or record lookups. I hate it when I'm on
a site and i can't bookmark pages usefully or use the back button or open
pages in new windows etc.

POST is ideally only used for non idempotent operations like updates, deletes,
etc. Things that you wouldn't want to be repeated when the user hits reload,
and wouldn't want repeated every time a user revisits a bookmark.

--
greg

Re: Drawbacks of using BYTEA for PK?

From

Alex Satrapa

Date:

13 January 2004, 18:26:36

David Garamond wrote:
> Remember that /sbin/ifconfig output usually include MAC address too. Not
> that MAC addresses are 100% unique, but that should increase the
> uniqueness.

How do you increase uniqueness?  Either a value is unique or it isn't -
if you've got multiple hosts on the network with the same network
address, you're in big trouble!

32 bits for an IP address is a huge number space... but why you'd really
need that much space as a base for your GUID is beyond me. The "host"
part of the address (eg: the last 8 bits in a /24 network block) would
be enough to uniquely identify the 254 hosts on your network. Then add a
32 bit timestamp, and you have 24 bits left for uniquely identifying
things that are created within the same second on the same server -
that's 16M things per second. Busy little shop you'd be running to
exhaust that unique space ;)

Adding extra number space doesn't increase the uniqueness of any
particular key - you have to know how little you can get away with to be
unique. Like distinguishing two humans from each other - you don't need
to unravel the DNA to 3,000 base pairs (3k bits!) if you can settle for
"blonde" versus "auburn" (1 bit!).

I can't remember who said it, but there's a nice quote that's relevant
in this situation: "The true mark of a well designed system is not that
there's nothing left to add, it's that there's nothing left to take away!"

Regards
Alex Satrapa

Re: Drawbacks of using BYTEA for PK?

From

"Nigel J. Andrews"

Date:

13 January 2004, 19:00:37

On Wed, 14 Jan 2004, Alex Satrapa wrote:

> David Garamond wrote:
> > Remember that /sbin/ifconfig output usually include MAC address too. Not
> > that MAC addresses are 100% unique, but that should increase the
> > uniqueness.
>
> How do you increase uniqueness?  Either a value is unique or it isn't -
> if you've got multiple hosts on the network with the same network
> address, you're in big trouble!

It's easily done; you've misread it. David said _MAC_ address. Of course your
comment still stands if you've got the same MAC address on a segment more than
once.

>
> 32 bits for an IP address is a huge number space... but why you'd really
> need that much space as a base for your GUID is beyond me. The "host"
> part of the address (eg: the last 8 bits in a /24 network block) would
> be enough to uniquely identify the 254 hosts on your network.

I might have more that 256 hosts.

I can't comment on the real content of this discussion though since a) I
haven't be reading it and b) I probably wouldn't know what it was on about if
I had been.

--
Nigel J. Andrews

Re: Drawbacks of using BYTEA for PK?

From

Karsten Hilbert

Date:

14 January 2004, 04:53:49

> >Remember that /sbin/ifconfig output usually include MAC address too. Not
> >that MAC addresses are 100% unique, but that should increase the
> >uniqueness.
>
> How do you increase uniqueness?  Either a value is unique or it isn't -
It increases the *likelihood* of uniqueness, IOW the expected
collision frequency.

> if you've got multiple hosts on the network with the same network
> address, you're in big trouble!
He is talking about the MAC which in itself is supposed to be
globally unique. Nothing to do with IP numbers.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Drawbacks of using BYTEA for PK?

From

Karsten Hilbert

Date:

14 January 2004, 05:33:48

> > How do you increase uniqueness?  Either a value is unique or it isn't -
> It increases the *likelihood* of uniqueness, IOW the expected
> collision frequency.
..., IOW ... decreases

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Drawbacks of using BYTEA for PK?

From

"Bob Parkinson"

Date:

14 January 2004, 08:14:54

>He is talking about the MAC which in itself is supposed to be
>globally unique. Nothing to do with IP numbers.

Supposed is the word...

Don't count on MACs being unique. Under some OS's you can set them to what you want.





Bob
==================================
When logic and proportion
Have fallen sloppy dead,
And the White Knight is talking backwards
And the Red Queen's "off with her head!"
Remember what the dormouse said:
"Feed your head. Feed your head. Feed your head"

Re: Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

15 January 2004, 01:53:28

Alex Satrapa wrote:
> David Garamond wrote:
>
>> Remember that /sbin/ifconfig output usually include MAC address too.
>> Not that MAC addresses are 100% unique, but that should increase the
>> uniqueness.
>
> How do you increase uniqueness?  Either a value is unique or it isn't -

Ok, let's say "mathematically unique" and "practically unique". I was
referring to the second one.

> if you've got multiple hosts on the network with the same network
> address, you're in big trouble!
>
> 32 bits for an IP address is a huge number space... but why you'd really
> need that much space as a base for your GUID is beyond me.

The point of using /sbin/ifconfig output (and then hashing it to get a
128bit number) is because the output contains MAC address and IP
address. IP addresses are not unique, particularly in intranet or
clustering situation. MAC addresses are more likely to be unique, but
I've heard stories about cheap ethernet cards having duplicate MAC's,
and beside in many cases MAC addresses can be altered by software.

The probability of both MAC and IP address (and other strings in
/sbin/ifconfig output) being totally the same for two hosts is of course
much smaller, and thus the term "increase [the practical] uniqueness."

--
dave

Re: Drawbacks of using BYTEA for PK?

From

David Garamond

Date:

15 January 2004, 01:55:42

Nigel J. Andrews wrote:
> I can't comment on the real content of this discussion though since a) I
> haven't be reading it and b) I probably wouldn't know what it was on about if
> I had been.

Um, any insight on the original question (see subject)? :-)

--
dave

Re: Drawbacks of using BYTEA for PK?

From

"Chris Travers"

Date:

15 January 2004, 03:25:15

Regarding MAC address as a machine key.

I suspect that you could take md5(MAC_Address || IPV4_Address) to create a
machine ID which you can reasonably expect to be unique (i.e. the chance of
having 2 identical MAC addresses on the same IPV4 address is small enough to
disregard).  I only suggest using MD5 to hide the details of the MAC and
IPV4 address from the outside world, and it is not strictly necessary for
the purposes of this discussion.

Best Wishes,
Chris Travers

Re: Drawbacks of using BYTEA for PK?

From

Bruce Momjian

Date:

16 January 2004, 13:32:28

David Garamond wrote:
> Nigel J. Andrews wrote:
> > I can't comment on the real content of this discussion though since a) I
> > haven't be reading it and b) I probably wouldn't know what it was on about if
> > I had been.
>
> Um, any insight on the original question (see subject)? :-)

A bytea join will use a bytea indexed column just fine, I think.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073