Thread: where to divide application and database
I was just reading over a reply from David Fetter from a couple of
days ago; the thread is archived[1] but this question doesn't really
relate to it much.  It was a question about how to arrange tables, and
David made the following comments:

On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> On Tue, Feb 17, 2009 at 04:40:58PM +0000, Sam Mason wrote:
> > >     user_name varchar(50) NOT NULL,
> >
> > As a general design question; should user_name have a UNIQUE
> > constraint on it? i.e.
> >
> >     user_name VARCHAR(50) NOT NULL UNIQUE,
>
> Yes, it's good to have a UNIQUE constraint, but not this one.  To
> have a sane one, it needs further constraints and, in 8.4, the
> case-insensitive text (citext) type.  Here's one that is reasonably
> sane until citext is available:
>
>     user_name TEXT, -- unless length is an integrity constraint,
>                     -- use TEXT instead of VARCHAR
>
> then later:
>
>     CREATE UNIQUE INDEX unique_user_name_your_table
>         ON your_table(LOWER(TRIM(user_name)))
>
> You might also require that whitespace be treated in some consistent
> way, one example of which is simply forbidding whitespace in
> user_name at all.  This you can do via CHECK constraints or a DOMAIN.

The reason behind this appears to be moving some of the checks into
the database and away from the application.  When I've solved similar
problems before, I've tended to make the application more aware of
what's going on by having something like:

    user_name VARCHAR(50) NOT NULL UNIQUE
        CHECK (user_name ~ '^[a-z][a-z0-9_]*$')

I explicitly don't want enormously long usernames; 15 characters
should be enough, but let's give people something to play with if they
want.
It's partly in case I ever want to get the code interacting with
something like C and want to reduce my exposure to buffer overflows as
much as possible (yes, I'll still be writing code that should be safe
from buffer overflows, but I'm human and bugs occur).  There's also
the fact that a 1MB username is probably going to be copied around the
place pretty freely because the code isn't expecting it to be big.
Then there are layout issues; displaying a bit of text that long is
awkward.

Next, the plain UNIQUE constraint: it allows me to use the table as
the target of a REFERENCES constraint if I want.  If/when the
optimizer knows that UNIQUE constraints mean that only a single row is
returned, it'll be able to optimize things better as well.

Finally, the CHECK constraint is filtering out "bad" usernames.  I
don't want people embedding HTML or whatever else in their username to
break my systems, so plain text only here and no spaces at the end.

I don't think that either mine or David's is better in general; they
apply to different situations.  It's just interesting to see how
different people solve problems, so I was wondering if other people do
things differently.

--
Sam  http://samason.me.uk/

[1] http://archives.postgresql.org/pgsql-general/2009-02/msg00770.php
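[For concreteness, the two approaches discussed above can be sketched
side by side; this is an editorial sketch, with illustrative table and
column names, not code from the thread:]

```sql
-- Sam's approach: one column definition carries all the rules.
CREATE TABLE users_app_style (
    -- bounded length, unique, and restricted to lower-case
    -- alphanumerics/underscore starting with a letter
    user_name VARCHAR(50) NOT NULL UNIQUE
        CHECK (user_name ~ '^[a-z][a-z0-9_]*$')
);

-- David's approach: unconstrained TEXT plus a functional unique
-- index to get case-insensitive uniqueness.
CREATE TABLE users_db_style (
    user_name TEXT NOT NULL
);
CREATE UNIQUE INDEX unique_user_name_users_db_style
    ON users_db_style (LOWER(TRIM(user_name)));
```

Note the practical difference: the plain UNIQUE constraint is
case-sensitive ('Bob' and 'bob' can coexist), while the functional
index treats them as duplicates.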
On Thu, Feb 19, 2009 at 11:43:19PM +0000, Sam Mason wrote:
> I was just reading over a reply from David Fetter from a couple of
> days ago; the thread is archived[1] but this question doesn't really
> relate to it much.  It was a question about how to arrange tables,
> and David made the following comments:
>
> On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> > On Tue, Feb 17, 2009 at 04:40:58PM +0000, Sam Mason wrote:
> > > >     user_name varchar(50) NOT NULL,
> > >
> > > As a general design question; should user_name have a UNIQUE
> > > constraint on it? i.e.
> > >
> > >     user_name VARCHAR(50) NOT NULL UNIQUE,
> >
> > Yes, it's good to have a UNIQUE constraint, but not this one.  To
> > have a sane one, it needs further constraints and, in 8.4, the
> > case-insensitive text (citext) type.  Here's one that is
> > reasonably sane until citext is available:
> >
> >     user_name TEXT, -- unless length is an integrity constraint,
> >                     -- use TEXT instead of VARCHAR
> >
> > then later:
> >
> >     CREATE UNIQUE INDEX unique_user_name_your_table
> >         ON your_table(LOWER(TRIM(user_name)))
> >
> > You might also require that whitespace be treated in some
> > consistent way, one example of which is simply forbidding
> > whitespace in user_name at all.  This you can do via CHECK
> > constraints or a DOMAIN.
>
> The reason behind this appears to be moving some of the checks into
> the database and away from the application.

Since a useful database has *many* applications instead of "the"
application, I think this is an excellent move.  Single Point of Truth
and all that.

> When I've solved similar problems before, I've tended to make the
> application more aware of what's going on by having something like:
>
>     user_name VARCHAR(50) NOT NULL UNIQUE
>         CHECK (user_name ~ '^[a-z][a-z0-9_]*$')

My point there was that simply limiting the length isn't enough for
many purposes, and when you're adding DOMAIN or other constraints on
the value, that's a place to put the length checks in, too.
For example, you might well want to set a lower bound on the size of a
user_name, not just an upper bound.

> I don't think that either mine or David's is better in general; they
> apply to different situations.

I don't even think they're *different* in general ;)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
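[Putting David's suggestions together, a DOMAIN is a natural place to
hang both the character rules and the length bounds.  An editorial
sketch; the domain name, the bounds, and the whitespace rule are
illustrative policy choices, not something prescribed in the thread:]

```sql
-- A reusable username type: no whitespace, 3 to 50 characters.
CREATE DOMAIN user_name_t AS TEXT
    CHECK (VALUE !~ '[[:space:]]')
    CHECK (char_length(VALUE) BETWEEN 3 AND 50);

CREATE TABLE your_table (
    user_name user_name_t NOT NULL
);

-- Case-insensitive uniqueness until citext is available:
CREATE UNIQUE INDEX unique_user_name_your_table
    ON your_table (LOWER(TRIM(user_name)));
```

Any table that uses user_name_t inherits the same rules, so the
"Single Point of Truth" lives in the domain rather than being repeated
per column.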
On Fri, 20 Feb 2009 06:50:22 -0800
David Fetter <david@fetter.org> wrote:

> > The reason behind this appears to be moving some of the checks
> > into the database and away from the application.
>
> Since a useful database has *many* applications instead of "the"
> application, I think this is an excellent move.  Single Point of
> Truth and all that.

I generally prefer code clarity and security over presumed performance
gains, but I was wondering if "checks" may have an impact on
performance and if pg does some optimisation over them.

e.g. suppose I do:

    INSERT INTO bla (a, b, c) SELECT a, b, c FROM bhu;

and bla.a and bhu.a have the same constraint/data type etc.  Is
postgresql going to check that bhu.a fits in bla.a every time?

I might expect this kind of optimisation to be done on types and
domains, but it would be too expensive/smart to do it on checks, since
postgresql would have to understand the equivalence or inclusion of
some checks.  So if a lot of stuff is moving around tables... I'd use
domains and user defined types rather than checks.  Reasonable?

If postgresql does this kind of optimisation, pushing "checks" into
the DB is going to gain more extra points compared to doing checks at
the application level.

What I find a bit annoying is politely dealing with the error once it
is reported back to the application, *and* the connection and
*bandwidth* costs of moving clearly wrong data back and forth.

If you've a good mapping between pg types and the application
language/library types it becomes easier to keep those checks in sync;
otherwise it is a really boring job and DB checks become just one more
security net to maintain.  In some places you REALLY appreciate/need
that layer... sometimes it just gets in the way.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
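[Editorial note on the question above: a shared DOMAIN at least keeps
the definition of the constraint in one place, though PostgreSQL still
evaluates the target column's check for every inserted row rather than
proving one column's constraint implies another's.  The names below
are illustrative:]

```sql
-- One definition of the constraint, used by both columns.
CREATE DOMAIN positive_int AS int CHECK (VALUE > 0);

CREATE TABLE bhu (a positive_int, b int, c int);
CREATE TABLE bla (a positive_int, b int, c int);

-- bla.a's domain check is evaluated per row; the planner does not
-- reason that bhu.a already satisfies it and skip the check.
INSERT INTO bla (a, b, c) SELECT a, b, c FROM bhu;
```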
On Fri, Feb 20, 2009 at 06:50:22AM -0800, David Fetter wrote:
> On Thu, Feb 19, 2009 at 11:43:19PM +0000, Sam Mason wrote:
> > On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> > >     user_name TEXT, -- unless length is an integrity constraint,
> > >                     -- use TEXT instead of VARCHAR
> > >
> > > then later:
> > >
> > >     CREATE UNIQUE INDEX unique_user_name_your_table
> > >         ON your_table(LOWER(TRIM(user_name)))
> > >
> > > You might also require that whitespace be treated in some
> > > consistent way, one example of which is simply forbidding
> > > whitespace in user_name at all.  This you can do via CHECK
> > > constraints or a DOMAIN.
> >
> > The reason behind this appears to be moving some of the checks
> > into the database and away from the application.
>
> Since a useful database has *many* applications instead of "the"
> application, I think this is an excellent move.  Single Point of
> Truth and all that.

Oops, I think I failed to read your original message very well then.
I failed to notice the "forbidding whitespace" comment even though I
deliberately left it in.  Doh!

> > I don't think that either mine or David's is better in general;
> > they apply to different situations.
>
> I don't even think they're *different* in general ;)

No, they're not really, are they. :)

--
Sam  http://samason.me.uk/
On Fri, Feb 20, 2009 at 04:51:33PM +0100, Ivan Sergio Borgonovo wrote:
> What I find a bit annoying is politely dealing with the error once
> it is reported back to the application, *and* the connection and
> *bandwidth* costs of moving clearly wrong data back and forth.

This sounds a bit like premature optimization to me; I don't think
many people worry about optimizing the failure code paths.  I know I
prefer to make sure that things go quickly when they're working.  If
you're worried about someone performing a DOS attack on a failure path
then you'd want to optimize it, but surely you'd want the checks early
in the application code anyway.

> If you've a good mapping between pg types and the application
> language/library types it becomes easier to keep those checks in
> sync; otherwise it is a really boring job and DB checks become just
> one more security net to maintain.

It does, but constraints like that aren't going to be changing too
regularly, are they?

--
Sam  http://samason.me.uk/
Ivan Sergio Borgonovo wrote:
> On Fri, 20 Feb 2009 06:50:22 -0800
> David Fetter <david@fetter.org> wrote:
> > > ... moving some of the checks into the database and away from
> > > the application.
> >
> > Since a useful database has *many* applications instead of "the"
> > application, I think this is an excellent move.
>
> ....I was wondering if "checks" may have an impact on performance
> and if pg does some optimisation over them.

Are you suggesting there would be a positive or negative impact on
performance?

Moving some checks into the database should *improve* performance by
giving the planner better information.  For example, unique
constraints indicate when only 0-1 rows may come out of a query; and
range constraints could let the database know when a partition doesn't
even need to be visited.  No doubt other checks (say, spell-checking a
column) would have performance costs.

I'm with David Fetter's perspective of considering the multiple
applications that can run on top of a database.  If a particular check
should apply to all conceivable applications that run on a database
(say, foreign key constraints), it seems to me it belongs in the
database.  If a particular check should apply to just one application,
but other applications might have reason not to enforce such a check
(say, one app might do spell checking in English; another in a
different language), that belongs in the app.
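[The partition point above can be made concrete with the
inheritance-based partitioning of that era; this is an editorial
sketch with illustrative names:]

```sql
-- Range CHECK constraints on child tables let the planner skip
-- partitions that cannot contain matching rows.
CREATE TABLE events (logged date NOT NULL);
CREATE TABLE events_2008
    (CHECK (logged <  DATE '2009-01-01')) INHERITS (events);
CREATE TABLE events_2009
    (CHECK (logged >= DATE '2009-01-01')) INHERITS (events);

SET constraint_exclusion = on;

-- With constraint exclusion enabled, EXPLAIN shows this scanning
-- only events_2009; events_2008 is excluded by its CHECK constraint.
SELECT count(*) FROM events WHERE logged >= DATE '2009-02-01';
```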
On Sat, 21 Feb 2009 15:02:55 -0800
Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:

> Ivan Sergio Borgonovo wrote:
> > ....I was wondering if "checks" may have an impact on performance
> > and if pg does some optimisation over them.
>
> Are you suggesting there would be a positive or negative impact on
> performance?
>
> Moving some checks into the database should *improve* performance
> by giving the planner better information.  For example, unique
> constraints indicate when only 0-1 rows may come out of a query;
> and range constraints could let the database know when a partition
> doesn't even need to be visited.
>
> No doubt other checks (say, spell-checking a column) would have
> performance costs.

I was wondering where and whether they could have a performance impact
(positive or negative).  We're talking about PostgreSQL, not an
abstract DB or another implementation.  Would you delegate constraint
checking to *any other DB*?

> I'm with David Fetter's perspective of considering the multiple
> applications that can run on top of a database.

Me too.  Postgresql is a mature application; the chances that the most
frequently useful optimisations are not already there are smaller than
the chances of my making mistakes putting optimisations and constraint
checks in my application.  Still, it is better to know than to guess.
That could help in engineering the constraints differently, or in
better exploiting their "performance boost".

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Fri, 20 Feb 2009 20:45:20 +0000
Sam Mason <sam@samason.me.uk> wrote:

> On Fri, Feb 20, 2009 at 04:51:33PM +0100, Ivan Sergio Borgonovo
> wrote:
> > What I find a bit annoying is politely dealing with the error
> > once it is reported back to the application, *and* the connection
> > and *bandwidth* costs of moving clearly wrong data back and
> > forth.
>
> This sounds a bit like premature optimization to me; I don't think

Well... I'd just like to know how things work: not to optimise at the
starting blocks, but rather to avoid cutting off my path to
optimisation later.  I'm glad to learn that not only is postgresql not
seriously affected by constraints, but that it may even take advantage
of them, as suggested by Ron Mayer.

> many people worry about optimizing the failure code paths.  I know
> I prefer to make sure that things go quickly when they're working.
> If you're worried about someone performing a DOS attack on a
> failure then you'd want to optimize it, but surely you'd want the
> checks early in the application code.

There may be several reasons to "duplicate" checks in the application
too.  Sometimes the failure path is more frequent than the success
path, sometimes you need quick feedback, sometimes it is a matter of
bandwidth, etc.

> > If you've a good mapping between pg types and the application
> > language/library types it becomes easier to keep those checks in
> > sync; otherwise it is a really boring job and DB checks become
> > just one more security net to maintain.
>
> It does, but constraints like that aren't going to be changing too
> regularly, are they?

Actually, if I were omniscient I wouldn't be so deeply involved with
programming... but even if I were, an application may serve different
needs during its lifespan.  And still, having to write constraints in
the application and in the DB is twice the work.  Furthermore, a DB
reports errors in a way that may not be useful to the user.
    create table test.zau (a int, b int);
    insert into test.zau values ('z', 'z');
    ERROR:  invalid input syntax for integer: "z"

    create table test.zau (a int check (a > 0), b int);
    insert into test.zau values (-1, 5);
    ERROR:  new row for relation "zau" violates check constraint "zau_a_check"

And in a less than ideal world you may be tempted to put constraints
just in the client.

Once upon a time, long ago, I had a look at RoR, and I vaguely
remember you could define tables with constraints in ruby and somehow
you automatically got constraints in the DB and some primitive checks
on the client too.  But maybe I was daydreaming.  I wonder how all
this magic works once you have to refactor.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
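[Editorial note: one small mitigation for the unfriendly error text is
a named DOMAIN, which at least gets a meaningful name into the
message, making it easier for the application to map the failure back
to something user-facing.  The domain name below is illustrative:]

```sql
-- Naming the rule makes the eventual error message self-describing.
CREATE DOMAIN pos_int AS int CHECK (VALUE > 0);
CREATE TABLE zau (a pos_int, b int);

INSERT INTO zau VALUES (-1, 5);
-- ERROR:  value for domain pos_int violates check constraint "pos_int_check"
```

The application still has to translate the error, but matching on a
domain name is less brittle than parsing an auto-generated constraint
name like "zau_a_check".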