Mark, Nathan,
I'm moving this over to the PGSQL-SQL list, away from -hackers, as it's no
longer a -hackers type discussion. Hope you don't mind!
> On Wed, Sep 07, 2005 at 11:31:16AM -0700, Josh Berkus wrote:
> > I'm also a little baffled to come up with any real application where
> > making an id number for most tables "unguessable" would provide any
> > kind of real protection not far better provided by other means. For
> > your "users" table, sure, but that's a very special case.
>
> It should never be the sole means of defense, however, it can be quite
> effective at prevention.
>
> For a rather simple example, consider a site that associates a picture
> with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc.
> it makes it ridiculously easy to write a script to pull all of the
> pictures off the site. This can be bothersome, as the only type of
> person who would do this, is the type of person with an illegitimate
> motivation. I want the data to be easily and freely accessible as
> specific objects, but I do not wish to provide an easy way of
> dumping all of the data as a unit.
>
> By making the picture identifier unguessable, it discourages the most
> common sort of abuse of the system. If the number is unguessable, and
> they can't access the directory as a listing, it will be sufficiently
> difficult as to discourage the common abuser of the system. On the
> other hand, an obviously guessable identifier may *encourage* the
> common person to consider abuse.
>
> In my case, it isn't only pictures. I don't want people pulling all
> the data off the site as a dump, and using it how they wish, but I do
> wish to make the data freely available, and easily accessible from a
> web browser.
>
> I'm not under the impression that it is impossible for a competent
> person to dump my database. I am under the impression that the people
> who would do such a thing, tend not to be intelligent, and will be
> stopped by this simple tactic.
>
> I could use any identifier at all. It could be a random sequence of
> characters. The UUID appeals to me, as I don't have to re-invent
> the concept. This use of UUID falls outside the scope of using it
> to join tables. It's a handle that is associated with the data,
> for external identification of the object.
Seems like this would be better served by simply encrypting the three-part
universal key (server|table|row) using an encryption key which is not
public/obvious. That would preserve the obscurity of object naming while
still allowing the UUID to contain useful information.
> I happen to also use it as an internal primary key for the objects
> that fit this category, as I wish to benefit from the built-in merge
> capabilities of UUID over SERIAL, and I don't currently see the
> point of keeping a SERIAL and a UUID for each object. On the last
> point, I did start to do this, but every single one of my queries
> become more complicated as a result. Using the SERIAL for joining,
> and the UUID for identifying a set of rows was becoming a little
> ridiculous for my purposes. Using only the UUID to provide for
> all my purposes is suiting my requirements for the cost of 1.5X
> the size of a primary key index, 2X the size of a index for
> a n to n relation mapping UUID to UUID, and an far less significant
> increase in table space (much less than 1.5X, although I haven't
> finished calculating it yet).
>
> Not that everybody should rip out SERIAL and replace it with UUID,
> but it really isn't that bad, and in some cases, such as mine,
> I don't see the point of using both, and choose to instead allow
> UUID to solve many of my concerns at the same time, with an
> acceptable for me cost in database pages.
Oh, you won't get any argument from me on that one -- no need for *two*
surrogate keys in a table. IME, most tables don't need even *one*.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco