Thread: design, ref integrity and performance
Hi, I've to generate unique password and associate them with emails. Association with emails is just to mail the password, email + password aren't "the password", just the password is. So a bunch of emails may be associated with the same password. So there are 2 kind of passwords: - shared, multiple use - individual (they could be multiple or single use) I've as input a list of emails and according to the kind of passwords I've to generate I fill a table that is create table pw_email( password varchar(16), email varchar(64) ); So data inside may look like /* bunch of shared passwords */ abcdefg, 1@example.com abcdefg, 2@example.com abcdefg, 3@example.com abcdefg, 4@example.com /* bunch of individual passwords */ abcdefg1, 1@example.com abcdefg2, 2@example.com abcdefg3, 5@example.com abcdefg4, 6@example.com Now each password is linked to the access of one or more resource. I could just add the resource id to the pw_email table but: - it is not normalized - considering many password will be duplicated, I'll have to check on a larger table to see which password give access to what - I'd like to associate different passwords to different resource so that one password doesn't grant access to more than one resource. create table pw_resource( res int references ... password varchar(16) ); Of course if one password is in pw_email but it is not in pw_resource I have a problem. But I've to generate password/email couples first before filling pw_resource. I could make the constraint deferrable, add a on delete cascade to pw_email.password but: - I'm not sure it is a good design - I'm still not sure it can work as expected Before testing if it can work I'd like to hear some comment on the design. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > Hi, > > I've to generate unique password and associate them with emails. > Association with emails is just to mail the password, email + > password aren't "the password", just the password is. > > So a bunch of emails may be associated with the same password. > > So there are 2 kind of passwords: > - shared, multiple use > - individual (they could be multiple or single use) So are you saying I login using *just* a password, not using my email as a user-name? How do you know who is logging in and what does it mean for the password to be shared? -- Richard Huxton Archonet Ltd
On Tue, 27 Oct 2009 09:17:59 +0000 Richard Huxton <dev@archonet.com> wrote: > Ivan Sergio Borgonovo wrote: > > Hi, > > I've to generate unique password and associate them with emails. > > Association with emails is just to mail the password, email + > > password aren't "the password", just the password is. > > So a bunch of emails may be associated with the same password. > > So there are 2 kind of passwords: > > - shared, multiple use > > - individual (they could be multiple or single use) > So are you saying I login using *just* a password, not using my > email as a user-name? yes > How do you know who is logging in and what does it mean for the > password to be shared? I don't care who "logged in". Shared just mean several people could use a resource providing the same password. They are a sort of "promotional codes". Some of them can be shared, I don't mind if people give them to others provided they let our service be known by others. Then there are unique passwords on which we may or may not check if they are used by multiple users of the site... but that's another problem. Association between email and password is just meant to build up a queue for mailing and there is no uniqueness constraint on (password, email) pair. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > On Tue, 27 Oct 2009 09:17:59 +0000 > Richard Huxton <dev@archonet.com> wrote: > >> Ivan Sergio Borgonovo wrote: >>> Hi, > >>> I've to generate unique password and associate them with emails. >>> Association with emails is just to mail the password, email + >>> password aren't "the password", just the password is. > >>> So a bunch of emails may be associated with the same password. > >>> So there are 2 kind of passwords: >>> - shared, multiple use >>> - individual (they could be multiple or single use) > >> So are you saying I login using *just* a password, not using my >> email as a user-name? > > yes > >> How do you know who is logging in and what does it mean for the >> password to be shared? > > I don't care who "logged in". > Shared just mean several people could use a resource providing the > same password. > They are a sort of "promotional codes". Some of them can be shared, > I don't mind if people give them to others provided they let our > service be known by others. > Then there are unique passwords on which we may or may not check if > they are used by multiple users of the site... but that's another > problem. OK. So not really passwords at all. > Association between email and password is just meant to build up a > queue for mailing and there is no uniqueness constraint on > (password, email) pair. > create table pw_email( > password varchar(16), > email varchar(64) > ); > create table pw_resource( > res int references ... > password varchar(16) > ); So the tables are clear enough, with pw_resource having "password" as a primary key. > But I've to generate password/email couples first before filling > pw_resource. The simplest thing would be to do them the other way around, but assuming you can't... The "most relational" way would be to have a separate table containing just passwords, then reference that from both pw_resource and pw_email. INSERT INTO passwords VALUES ('abcde'); INSERT INTO pw_email VALUES('abcde', 'someone@example.com'); ... INSERT INTO pw_resource VALUES ('abcde', 12345); Failing that you could allow nulls for pw_resource.res and do something like: INSERT INTO pw_resource VALUES ('abcde', null); INSERT INTO pw_email VALUES('abcde', 'someone@example.com'); ... UPDATE pw_resource SET res=12345 WHERE password='abcde'; -- Richard Huxton Archonet Ltd
still on joining array/inline values was and is: design, ref integrity and performance
From
Ivan Sergio Borgonovo
Date:
On Tue, 27 Oct 2009 10:54:06 +0000 Richard Huxton <dev@archonet.com> wrote: > > Association between email and password is just meant to build up > > a queue for mailing and there is no uniqueness constraint on > > (password, email) pair. > > > create table pw_email( > > password varchar(16), > > email varchar(64) > > ); > > > create table pw_resource( > > res int references ... > > password varchar(16) > > ); > > But I've to generate password/email couples first before filling > > pw_resource. > > The simplest thing would be to do them the other way around, but > assuming you can't... Trying to understand why I can't do the other way around I made clearer to myself the constraints. - There is no reason to have a password without an associated recipient. - There is no reason to have a password without an associated resource resource. - I don't want the same password for more than one resource. So to satisfy the first 2 requirements... it looks to me the table should be: create table resources( res int primary key /* other characteristics of the resource... */ ); create table pw( res int references resources(ref) on delete cascade, password varchar(16) not null, email varchar(64) not null, /* Should I chose some unique constraint on the couples? which? */ ); To have the 3rd constraint I'd have a table: create table pw_res( password varchar(16) primary key, res int references resources (res) on delete cascade ); This comes handy for 2 reasons: - it helps me to enforce the third constraint - it makes it easier to find which resource is associated with a password that will be a common search But this introduces one more problem if I decide to delete a password. This could happen for "shared" and non shared passwords. I don't think it is something that may happen frequently... but it may happen... and once you start to introduce mess in your data it is hard to come back. So a fk on pw.password may help... but... once I've that fk I can't insert res,pw,email without pw baing in pw_res. If I do the other way around inserting password(s) in pw_res I may end up with passwords with no destination associated. Surely I could wrap everything in a transaction so that if I can't succede to insert email/pw records everything will be rolled back. But I can't see how to make it cleaner. I can get emails and associate them with a resource and a password in one sql statement if I can defer the constraint check on password. The next statement would be used to fill pw_res with distinct values from pw. If I do the other way around in case I need different passwords for different emails I'll have to: - count the emails and fill pw_res with as many password as needed. I think I'll need generate_series. - find a way to associate these passwords with emails I don't know how to do the later. Still supposing I knew how to write the sql statement that will take values('email1', 'email2'...) as suggested by Merlin Moncure in a different thread I can see some pros and cons of these approaches. The first approach requires a distinct/group by that may be expensive. The second one requires I keep in memory all the emails while the first statement run. I'd think that if memory consumption start to be an issue, that means that the number of emails start to be too large to be loaded from an HTTP request, so I could have it on file and load it in a temp table and so on... Still before I make any test I need to find how to associate emails with passwords considering that I'd like to pass email "inline" the SQL statement and I'll have the passwords in pw_res. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: still on joining array/inline values was and is: design, ref integrity and performance
From
Peter Hunsberger
Date:
On Wed, Oct 28, 2009 at 4:50 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote: > > To have the 3rd constraint I'd have a table: > create table pw_res( > password varchar(16) primary key, > res int references resources (res) on delete cascade > ); > > This comes handy for 2 reasons: > - it helps me to enforce the third constraint > - it makes it easier to find which resource is associated with a > password that will be a common search > > But this introduces one more problem if I decide to delete a > password. This could happen for "shared" and non shared passwords. > I don't think it is something that may happen frequently... but it > may happen... and once you start to introduce mess in your data it > is hard to come back. Not sure why this is a mess? Cascaded deletes can clean up the data. If you need it for some business reason add an active flag. > > So a fk on pw.password may help... but... once I've that fk I can't > insert res,pw,email without pw baing in pw_res. > > If I do the other way around inserting password(s) in pw_res I may > end up with passwords with no destination associated. > > Surely I could wrap everything in a transaction so that if I can't > succede to insert email/pw records everything will be rolled back. > > But I can't see how to make it cleaner. At some point you need business logic to enforce your business requirements. There is little point in trying to enforce business constraints in the database. Stick to pure low level data integrity for the database and keep the business constraints at some higher level. > > I can get emails and associate them with a resource and a password > in one sql statement if I can defer the constraint check on password. > The next statement would be used to fill pw_res with distinct values > from pw. > > If I do the other way around in case I need different passwords for > different emails I'll have to: > - count the emails and fill pw_res with as many password as needed. > I think I'll need generate_series. > - find a way to associate these passwords with emails > > I don't know how to do the later. > Still supposing I knew how to write the sql statement that will take > values('email1', 'email2'...) > as suggested by Merlin Moncure in a different thread I can see some > pros and cons of these approaches. > > The first approach requires a distinct/group by that may be > expensive. > The second one requires I keep in memory all the emails while the > first statement run. Unless you're dealing with 100,000's of these things I think you're engaging in a process of "premature optimization". Group by can work efficiently over millions of rows. > > I'd think that if memory consumption start to be an issue, that > means that the number of emails start to be too large to be loaded > from an HTTP request, so I could have it on file and load it in a > temp table and so on... > > Still before I make any test I need to find how to associate emails > with passwords considering that I'd like to pass email "inline" the > SQL statement and I'll have the passwords in pw_res. > Do the simplest thing possible. Get it working, then see if you have any new problems you need to solve. Every issue you've described so far is database design 101 and should present no real problem. I think you're agonizing over nothing... -- Peter Hunsberger
Re: still on joining array/inline values was and is: design, ref integrity and performance
From
Ivan Sergio Borgonovo
Date:
On Wed, 28 Oct 2009 10:12:19 -0500 Peter Hunsberger <peter.hunsberger@gmail.com> wrote: > > The first approach requires a distinct/group by that may be > > expensive. > > The second one requires I keep in memory all the emails while the > > first statement run. > Unless you're dealing with 100,000's of these things I think you're > engaging in a process of "premature optimization". Group by can > work efficiently over millions of rows. We may get in the range of half that number occasionally but not feeding emails directly from a HTTP request. Still the number of passwords generated in one run may be in the range of 50K. But well I could calmly wait 2 or 3 seconds. Making some very rough test on a similar box to the one I'll have to use it takes few milliseconds on a not indexed table. > Do the simplest thing possible. Get it working, then see if you > have any new problems you need to solve. Every issue you've > described so far is database design 101 and should present no real > problem. I think you're agonizing over nothing... That's always a good advice. Sometimes you're out just for moral support. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it