Thread: design, ref integrity and performance

design, ref integrity and performance

From
Ivan Sergio Borgonovo
Date:
Hi,

I've to generate unique password and associate them with emails.
Association with emails is just to mail the password, email +
password aren't "the password", just the password is.

So a bunch of emails may be associated with the same password.

So there are 2 kind of passwords:
- shared, multiple use
- individual (they could be multiple or single use)

I've as input a list of emails and according to the kind of
passwords I've to generate I fill a table that is
create table pw_email(
  password varchar(16),
  email varchar(64)
);

So data inside may look like

/* bunch of shared passwords */
abcdefg, 1@example.com
abcdefg, 2@example.com
abcdefg, 3@example.com
abcdefg, 4@example.com
/* bunch of individual passwords */
abcdefg1, 1@example.com
abcdefg2, 2@example.com
abcdefg3, 5@example.com
abcdefg4, 6@example.com

Now each password is linked to the access of one or more resource.
I could just add the resource id to the pw_email table but:
- it is not normalized
- considering many password will be duplicated, I'll have to check
  on a larger table to see which password give access to what
- I'd like to associate different passwords to different resource so
  that one password doesn't grant access to more than one resource.

create table pw_resource(
  res int references ...
  password varchar(16)
);

Of course if one password is in pw_email but it is not in
pw_resource I have a problem.

But I've to generate password/email couples first before filling
pw_resource.

I could make the constraint deferrable, add a on delete cascade to
pw_email.password but:
- I'm not sure it is a good design
- I'm still not sure it can work as expected

Before testing if it can work I'd like to hear some comment on the
design.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: design, ref integrity and performance

From
Richard Huxton
Date:
Ivan Sergio Borgonovo wrote:
> Hi,
>
> I've to generate unique password and associate them with emails.
> Association with emails is just to mail the password, email +
> password aren't "the password", just the password is.
>
> So a bunch of emails may be associated with the same password.
>
> So there are 2 kind of passwords:
> - shared, multiple use
> - individual (they could be multiple or single use)

So are you saying I login using *just* a password, not using my email as
a user-name?

How do you know who is logging in and what does it mean for the password
to be shared?

--
  Richard Huxton
  Archonet Ltd

Re: design, ref integrity and performance

From
Ivan Sergio Borgonovo
Date:
On Tue, 27 Oct 2009 09:17:59 +0000
Richard Huxton <dev@archonet.com> wrote:

> Ivan Sergio Borgonovo wrote:
> > Hi,

> > I've to generate unique password and associate them with emails.
> > Association with emails is just to mail the password, email +
> > password aren't "the password", just the password is.

> > So a bunch of emails may be associated with the same password.

> > So there are 2 kind of passwords:
> > - shared, multiple use
> > - individual (they could be multiple or single use)

> So are you saying I login using *just* a password, not using my
> email as a user-name?

yes

> How do you know who is logging in and what does it mean for the
> password to be shared?

I don't care who "logged in".
Shared just mean several people could use a resource providing the
same password.
They are a sort of "promotional codes". Some of them can be shared,
I don't mind if people give them to others provided they let our
service be known by others.
Then there are unique passwords on which we may or may not check if
they are used by multiple users of the site... but that's another
problem.

Association between email and password is just meant to build up a
queue for mailing and there is no uniqueness constraint on
(password, email) pair.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: design, ref integrity and performance

From
Richard Huxton
Date:
Ivan Sergio Borgonovo wrote:
> On Tue, 27 Oct 2009 09:17:59 +0000
> Richard Huxton <dev@archonet.com> wrote:
>
>> Ivan Sergio Borgonovo wrote:
>>> Hi,
>
>>> I've to generate unique password and associate them with emails.
>>> Association with emails is just to mail the password, email +
>>> password aren't "the password", just the password is.
>
>>> So a bunch of emails may be associated with the same password.
>
>>> So there are 2 kind of passwords:
>>> - shared, multiple use
>>> - individual (they could be multiple or single use)
>
>> So are you saying I login using *just* a password, not using my
>> email as a user-name?
>
> yes
>
>> How do you know who is logging in and what does it mean for the
>> password to be shared?
>
> I don't care who "logged in".
> Shared just mean several people could use a resource providing the
> same password.
> They are a sort of "promotional codes". Some of them can be shared,
> I don't mind if people give them to others provided they let our
> service be known by others.
> Then there are unique passwords on which we may or may not check if
> they are used by multiple users of the site... but that's another
> problem.

OK. So not really passwords at all.

> Association between email and password is just meant to build up a
> queue for mailing and there is no uniqueness constraint on
> (password, email) pair.

> create table pw_email(
>   password varchar(16),
>   email varchar(64)
> );

> create table pw_resource(
>   res int references ...
>   password varchar(16)
> );

So the tables are clear enough, with pw_resource having "password" as a
primary key.

> But I've to generate password/email couples first before filling
> pw_resource.

The simplest thing would be to do them the other way around, but
assuming you can't...

The "most relational" way would be to have a separate table containing
just passwords, then reference that from both pw_resource and pw_email.

INSERT INTO passwords VALUES ('abcde');
INSERT INTO pw_email VALUES('abcde', 'someone@example.com');
...
INSERT INTO pw_resource VALUES ('abcde', 12345);

Failing that you could allow nulls for pw_resource.res and do something
like:

INSERT INTO pw_resource VALUES ('abcde', null);
INSERT INTO pw_email VALUES('abcde', 'someone@example.com');
...
UPDATE pw_resource SET res=12345 WHERE password='abcde';

--
  Richard Huxton
  Archonet Ltd

still on joining array/inline values was and is: design, ref integrity and performance

From
Ivan Sergio Borgonovo
Date:
On Tue, 27 Oct 2009 10:54:06 +0000
Richard Huxton <dev@archonet.com> wrote:

> > Association between email and password is just meant to build up
> > a queue for mailing and there is no uniqueness constraint on
> > (password, email) pair.
>
> > create table pw_email(
> >   password varchar(16),
> >   email varchar(64)
> > );
>
> > create table pw_resource(
> >   res int references ...
> >   password varchar(16)
> > );

> > But I've to generate password/email couples first before filling
> > pw_resource.
>
> The simplest thing would be to do them the other way around, but
> assuming you can't...

Trying to understand why I can't do the other way around I made
clearer to myself the constraints.

- There is no reason to have a password without an associated
  recipient.
- There is no reason to have a password without an associated
  resource resource.
- I don't want the same password for more than one resource.

So to satisfy the first 2 requirements... it looks to me the table
should be:


create table resources(
  res int primary key
  /* other characteristics of the resource... */
);

create table pw(
  res int references resources(ref) on delete cascade,
  password varchar(16) not null,
  email varchar(64) not null,
  /* Should I chose some unique constraint on the couples? which? */
);

To have the 3rd constraint I'd have a table:
create table pw_res(
  password varchar(16) primary key,
  res int references resources (res) on delete cascade
);

This comes handy for 2 reasons:
- it helps me to enforce the third constraint
- it makes it easier to find which resource is associated with a
  password that will be a common search

But this introduces one more problem if I decide to delete a
password. This could happen for "shared" and non shared passwords.
I don't think it is something that may happen frequently... but it
may happen... and once you start to introduce mess in your data it
is hard to come back.

So a fk on pw.password may help... but... once I've that fk I can't
insert res,pw,email without pw baing in pw_res.

If I do the other way around inserting password(s) in pw_res I may
end up with passwords with no destination associated.

Surely I could wrap everything in a transaction so that if I can't
succede to insert email/pw records everything will be rolled back.

But I can't see how to make it cleaner.

I can get emails and associate them with a resource and a password
in one sql statement if I can defer the constraint check on password.
The next statement would be used to fill pw_res with distinct values
from pw.

If I do the other way around in case I need different passwords for
different emails I'll have to:
- count the emails and fill pw_res with as many password as needed.
  I think I'll need generate_series.
- find a way to associate these passwords with emails

I don't know how to do the later.
Still supposing I knew how to write the sql statement that will take
values('email1', 'email2'...)
as suggested by Merlin Moncure in a different thread I can see some
pros and cons of these approaches.

The first approach requires a distinct/group by that may be
expensive.
The second one requires I keep in memory all the emails while the
first statement run.

I'd think that if memory consumption start to be an issue, that
means that the number of emails start to be too large to be loaded
from an HTTP request, so I could have it on file and load it in a
temp table and so on...

Still before I make any test I need to find how to associate emails
with passwords considering that I'd like to pass email "inline" the
SQL statement and I'll have the passwords in pw_res.


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


On Wed, Oct 28, 2009 at 4:50 AM, Ivan Sergio Borgonovo
<mail@webthatworks.it> wrote:
>
> To have the 3rd constraint I'd have a table:
> create table pw_res(
>  password varchar(16) primary key,
>  res int references resources (res) on delete cascade
> );
>
> This comes handy for 2 reasons:
> - it helps me to enforce the third constraint
> - it makes it easier to find which resource is associated with a
>  password that will be a common search
>
> But this introduces one more problem if I decide to delete a
> password. This could happen for "shared" and non shared passwords.
> I don't think it is something that may happen frequently... but it
> may happen... and once you start to introduce mess in your data it
> is hard to come back.

Not sure why this is a mess?  Cascaded deletes can clean up the data.
If you need it for some business reason add an active flag.

>
> So a fk on pw.password may help... but... once I've that fk I can't
> insert res,pw,email without pw baing in pw_res.
>
> If I do the other way around inserting password(s) in pw_res I may
> end up with passwords with no destination associated.
>
> Surely I could wrap everything in a transaction so that if I can't
> succede to insert email/pw records everything will be rolled back.
>
> But I can't see how to make it cleaner.

At some point you need business logic to enforce your business
requirements.  There is little point in trying to enforce business
constraints in the database.  Stick to pure low level data integrity
for the database and keep the business constraints at some higher
level.

>
> I can get emails and associate them with a resource and a password
> in one sql statement if I can defer the constraint check on password.
> The next statement would be used to fill pw_res with distinct values
> from pw.
>
> If I do the other way around in case I need different passwords for
> different emails I'll have to:
> - count the emails and fill pw_res with as many password as needed.
>  I think I'll need generate_series.
> - find a way to associate these passwords with emails
>
> I don't know how to do the later.
> Still supposing I knew how to write the sql statement that will take
> values('email1', 'email2'...)
> as suggested by Merlin Moncure in a different thread I can see some
> pros and cons of these approaches.
>
> The first approach requires a distinct/group by that may be
> expensive.
> The second one requires I keep in memory all the emails while the
> first statement run.

Unless you're dealing with 100,000's of these things I think you're
engaging in a process of "premature optimization".  Group by can work
efficiently over millions of rows.

>
> I'd think that if memory consumption start to be an issue, that
> means that the number of emails start to be too large to be loaded
> from an HTTP request, so I could have it on file and load it in a
> temp table and so on...
>
> Still before I make any test I need to find how to associate emails
> with passwords considering that I'd like to pass email "inline" the
> SQL statement and I'll have the passwords in pw_res.
>

Do the simplest thing possible.  Get it working, then see if you have
any new problems you need to solve.  Every issue you've described so
far is database design 101 and should present no real problem.  I
think you're agonizing over nothing...

--
Peter Hunsberger

Re: still on joining array/inline values was and is: design, ref integrity and performance

From
Ivan Sergio Borgonovo
Date:
On Wed, 28 Oct 2009 10:12:19 -0500
Peter Hunsberger <peter.hunsberger@gmail.com> wrote:

> > The first approach requires a distinct/group by that may be
> > expensive.
> > The second one requires I keep in memory all the emails while the
> > first statement run.

> Unless you're dealing with 100,000's of these things I think you're
> engaging in a process of "premature optimization".  Group by can
> work efficiently over millions of rows.

We may get in the range of half that number occasionally but not
feeding emails directly from a HTTP request.
Still the number of passwords generated in one run may be in the
range of 50K. But well I could calmly wait 2 or 3 seconds.
Making some very rough test on a similar box to the one I'll have to
use it takes few milliseconds on a not indexed table.

> Do the simplest thing possible.  Get it working, then see if you
> have any new problems you need to solve.  Every issue you've
> described so far is database design 101 and should present no real
> problem.  I think you're agonizing over nothing...

That's always a good advice. Sometimes you're out just for moral
support.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it