Thread: where to divide application and database
I was just reading over a reply from David Fetter from a couple of
days ago; the thread is archived[1] but this question doesn't really
relate to it much.  It was a question about how to arrange tables, and
David made the following comments:

On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> On Tue, Feb 17, 2009 at 04:40:58PM +0000, Sam Mason wrote:
> > >     user_name varchar(50) NOT NULL,
> >
> > As a general design question; should user_name have a UNIQUE
> > constraint on it? i.e.
> >
> >     user_name VARCHAR(50) NOT NULL UNIQUE,
>
> Yes, it's good to have a UNIQUE constraint, but not this one.  To
> have a sane one, it needs further constraints and, in 8.4, the
> case-insensitive text (citext) type.  Here's one that is reasonably
> sane until citext is available:
>
>     user_name TEXT, -- unless length is an integrity constraint,
>                     -- use TEXT instead of VARCHAR
>
> then later:
>
>     CREATE UNIQUE INDEX unique_user_name_your_table
>         ON your_table(LOWER(TRIM(user_name)))
>
> You might also require that whitespace be treated in some consistent
> way, one example of which is simply forbidding whitespace in
> user_name at all.  This you can do via CHECK constraints or a DOMAIN.

The reason behind this appears to be moving some of the checks into
the database and away from the application.  When I've solved similar
problems before, I've tended to make the application more aware of
what's going on by having something like:

    user_name VARCHAR(50) NOT NULL UNIQUE
        CHECK (user_name ~ '^[a-z][a-z0-9_]*$')

I explicitly don't want enormously long usernames; 15 characters
should be enough, but let's give people something to play with if they
want.
It's partly in case I ever want to get the code interacting with
something like C and want to reduce my exposure to buffer overflows as
much as possible (yes, I'll still be writing code that should be safe
from buffer overflows, but I'm human and bugs occur).  There's also
the fact that a 1MB username is probably going to be copied around the
place pretty freely because the code isn't expecting it to be big.
Then there are layout issues; displaying a bit of text that long is
awkward.

Next, the plain UNIQUE constraint: it allows me to use the table as
the target of a REFERENCES constraint if I want.  If/when the
optimizer knows that UNIQUE constraints mean that only a single row is
returned, it'll be able to optimize things better as well.

Finally, the CHECK constraint is filtering out "bad" usernames.  I
don't want people embedding HTML or whatever else in their username to
break my systems, so plain text only here and no spaces at the end.

I don't think that either mine or David's is better in general; they
apply to different situations.  It's just interesting to see how
different people solve problems, so I was wondering if other people do
things differently.

--
Sam  http://samason.me.uk/

[1] http://archives.postgresql.org/pgsql-general/2009-02/msg00770.php
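[For concreteness, the two approaches discussed above can be sketched
side by side; this is an editorial sketch, with illustrative table and
column names, not code from the thread:]

```sql
-- Sam's approach: one column definition carries all the rules.
CREATE TABLE users_app_style (
    -- bounded length, unique, and restricted to lower-case
    -- alphanumerics/underscore starting with a letter
    user_name VARCHAR(50) NOT NULL UNIQUE
        CHECK (user_name ~ '^[a-z][a-z0-9_]*$')
);

-- David's approach: unconstrained TEXT plus a functional unique
-- index to get case-insensitive uniqueness.
CREATE TABLE users_db_style (
    user_name TEXT NOT NULL
);
CREATE UNIQUE INDEX unique_user_name_users_db_style
    ON users_db_style (LOWER(TRIM(user_name)));
```

Note the practical difference: the plain UNIQUE constraint is
case-sensitive ('Bob' and 'bob' can coexist), while the functional
index treats them as duplicates.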
On Thu, Feb 19, 2009 at 11:43:19PM +0000, Sam Mason wrote:
> I was just reading over a reply from David Fetter from a couple of
> days ago; the thread is archived[1] but this question doesn't really
> relate to it much.  It was a question about how to arrange tables,
> and David made the following comments:
>
> On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> > On Tue, Feb 17, 2009 at 04:40:58PM +0000, Sam Mason wrote:
> > > >     user_name varchar(50) NOT NULL,
> > >
> > > As a general design question; should user_name have a UNIQUE
> > > constraint on it? i.e.
> > >
> > >     user_name VARCHAR(50) NOT NULL UNIQUE,
> >
> > Yes, it's good to have a UNIQUE constraint, but not this one.  To
> > have a sane one, it needs further constraints and, in 8.4, the
> > case-insensitive text (citext) type.  Here's one that is
> > reasonably sane until citext is available:
> >
> >     user_name TEXT, -- unless length is an integrity constraint,
> >                     -- use TEXT instead of VARCHAR
> >
> > then later:
> >
> >     CREATE UNIQUE INDEX unique_user_name_your_table
> >         ON your_table(LOWER(TRIM(user_name)))
> >
> > You might also require that whitespace be treated in some
> > consistent way, one example of which is simply forbidding
> > whitespace in user_name at all.  This you can do via CHECK
> > constraints or a DOMAIN.
>
> The reason behind this appears to be moving some of the checks into
> the database and away from the application.

Since a useful database has *many* applications instead of "the"
application, I think this is an excellent move.  Single Point of Truth
and all that.

> When I've solved similar problems before, I've tended to make the
> application more aware of what's going on by having something like:
>
>     user_name VARCHAR(50) NOT NULL UNIQUE
>         CHECK (user_name ~ '^[a-z][a-z0-9_]*$')

My point there was that simply limiting the length isn't enough for
many purposes, and when you're adding DOMAIN or other constraints on
the value, that's a place to put the length checks in, too.
For example, you might well want to set a lower bound on the size of a
user_name, not just an upper bound.

> I don't think that either mine or David's is better in general; they
> apply to different situations.

I don't even think they're *different* in general ;)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
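[Putting David's suggestions together, a DOMAIN is a natural place to
hang both the character rules and the length bounds.  An editorial
sketch; the domain name, the bounds, and the whitespace rule are
illustrative policy choices, not something prescribed in the thread:]

```sql
-- A reusable username type: no whitespace, 3 to 50 characters.
CREATE DOMAIN user_name_t AS TEXT
    CHECK (VALUE !~ '[[:space:]]')
    CHECK (char_length(VALUE) BETWEEN 3 AND 50);

CREATE TABLE your_table (
    user_name user_name_t NOT NULL
);

-- Case-insensitive uniqueness until citext is available:
CREATE UNIQUE INDEX unique_user_name_your_table
    ON your_table (LOWER(TRIM(user_name)));
```

Any table that uses user_name_t inherits the same rules, so the
"Single Point of Truth" lives in the domain rather than being repeated
per column.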
On Fri, 20 Feb 2009 06:50:22 -0800
David Fetter <david@fetter.org> wrote:

> > The reason behind this appears to be moving some of the checks
> > into the database and away from the application.
>
> Since a useful database has *many* applications instead of "the"
> application, I think this is an excellent move.  Single Point of
> Truth and all that.

I generally prefer code clarity and security over presumed performance
gains, but I was wondering if "checks" may have an impact on
performance and if pg does some optimisation over them.

e.g. suppose I do:

    INSERT INTO bla (a, b, c) SELECT a, b, c FROM bhu;

and bla.a and bhu.a have the same constraint/data type etc.  Is
postgresql going to check that bhu.a fits in bla.a every time?

I might expect this kind of optimisation to be done on types and
domains, but it would be too expensive/smart to do it on checks, since
postgresql would have to understand the equivalence or inclusion of
some checks.  So if a lot of stuff is moving around tables... I'd use
domains and user defined types rather than checks.  Reasonable?

If postgresql does this kind of optimisation, pushing "checks" into
the DB is going to gain more extra points compared to doing checks at
the application level.

What I find a bit annoying is politely dealing with the error once it
is reported back to the application, *and* the connection and
*bandwidth* costs of moving clearly wrong data back and forth.

If you've a good mapping between pg types and the application
language/library types it becomes easier to keep those checks in sync;
otherwise it is a really boring job and DB checks become just one more
security net to maintain.  In some places you REALLY appreciate/need
that layer... sometimes it just gets in the way.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
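[Editorial note on the question above: a shared DOMAIN at least keeps
the definition of the constraint in one place, though PostgreSQL still
evaluates the target column's check for every inserted row rather than
proving one column's constraint implies another's.  The names below
are illustrative:]

```sql
-- One definition of the constraint, used by both columns.
CREATE DOMAIN positive_int AS int CHECK (VALUE > 0);

CREATE TABLE bhu (a positive_int, b int, c int);
CREATE TABLE bla (a positive_int, b int, c int);

-- bla.a's domain check is evaluated per row; the planner does not
-- reason that bhu.a already satisfies it and skip the check.
INSERT INTO bla (a, b, c) SELECT a, b, c FROM bhu;
```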
On Fri, Feb 20, 2009 at 06:50:22AM -0800, David Fetter wrote:
> On Thu, Feb 19, 2009 at 11:43:19PM +0000, Sam Mason wrote:
> > On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> > >     user_name TEXT, -- unless length is an integrity constraint,
> > >                     -- use TEXT instead of VARCHAR
> > >
> > > then later:
> > >
> > >     CREATE UNIQUE INDEX unique_user_name_your_table
> > >         ON your_table(LOWER(TRIM(user_name)))
> > >
> > > You might also require that whitespace be treated in some
> > > consistent way, one example of which is simply forbidding
> > > whitespace in user_name at all.  This you can do via CHECK
> > > constraints or a DOMAIN.
> >
> > The reason behind this appears to be moving some of the checks
> > into the database and away from the application.
>
> Since a useful database has *many* applications instead of "the"
> application, I think this is an excellent move.  Single Point of
> Truth and all that.

Oops, I think I failed to read your original message very well then.
I failed to notice the "forbidding whitespace" comment even though I
deliberately left it in.  Doh!

> > I don't think that either mine or David's is better in general;
> > they apply to different situations.
>
> I don't even think they're *different* in general ;)

No, they're not really, are they. :)

--
Sam  http://samason.me.uk/
On Fri, Feb 20, 2009 at 04:51:33PM +0100, Ivan Sergio Borgonovo wrote:
> What I find a bit annoying is politely dealing with the error once
> it is reported back to the application, *and* the connection and
> *bandwidth* costs of moving clearly wrong data back and forth.

This sounds a bit like premature optimization to me; I don't think
many people worry about optimizing the failure code paths.  I know I
prefer to make sure that things go quickly when they're working.  If
you're worried about someone performing a DOS attack on a failure path
then you'd want to optimize it, but surely you'd want the checks early
in the application code anyway.

> If you've a good mapping between pg types and the application
> language/library types it becomes easier to keep those checks in
> sync; otherwise it is a really boring job and DB checks become just
> one more security net to maintain.

It does, but constraints like that aren't going to be changing too
regularly, are they?

--
Sam  http://samason.me.uk/
Ivan Sergio Borgonovo wrote:
> On Fri, 20 Feb 2009 06:50:22 -0800
> David Fetter <david@fetter.org> wrote:
> > > ... moving some of the checks into the database and away from
> > > the application.
> >
> > Since a useful database has *many* applications instead of "the"
> > application, I think this is an excellent move.
>
> ....I was wondering if "checks" may have an impact on performance
> and if pg does some optimisation over them.

Are you suggesting there would be a positive or negative impact on
performance?

Moving some checks into the database should *improve* performance by
giving the planner better information.  For example, unique
constraints indicate when only 0-1 rows may come out of a query; and
range constraints could let the database know when a partition doesn't
even need to be visited.  No doubt other checks (say, spell-checking a
column) would have performance costs.

I'm with David Fetter's perspective of considering the multiple
applications that can run on top of a database.  If a particular check
should apply to all conceivable applications that run on a database
(say, foreign key constraints), it seems to me it belongs in the
database.  If a particular check should apply to just one application,
but other applications might have reason not to enforce such a check
(say, one app might do spell checking in English; another in a
different language), that belongs in the app.
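[The partition point above can be made concrete with the
inheritance-based partitioning of that era; this is an editorial
sketch with illustrative names:]

```sql
-- Range CHECK constraints on child tables let the planner skip
-- partitions that cannot contain matching rows.
CREATE TABLE events (logged date NOT NULL);
CREATE TABLE events_2008
    (CHECK (logged <  DATE '2009-01-01')) INHERITS (events);
CREATE TABLE events_2009
    (CHECK (logged >= DATE '2009-01-01')) INHERITS (events);

SET constraint_exclusion = on;

-- With constraint exclusion enabled, EXPLAIN shows this scanning
-- only events_2009; events_2008 is excluded by its CHECK constraint.
SELECT count(*) FROM events WHERE logged >= DATE '2009-02-01';
```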
On Sat, 21 Feb 2009 15:02:55 -0800
Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:

> Ivan Sergio Borgonovo wrote:
> > ....I was wondering if "checks" may have an impact on performance
> > and if pg does some optimisation over them.
>
> Are you suggesting there would be a positive or negative impact on
> performance?
>
> Moving some checks into the database should *improve* performance
> by giving the planner better information.  For example, unique
> constraints indicate when only 0-1 rows may come out of a query;
> and range constraints could let the database know when a partition
> doesn't even need to be visited.
>
> No doubt other checks (say, spell-checking a column) would have
> performance costs.

I was wondering where and whether they could have a performance impact
(positive or negative).  We're talking about PostgreSQL, not an
abstract DB or another implementation.  Would you delegate constraint
checking to *any other DB*?

> I'm with David Fetter's perspective of considering the multiple
> applications that can run on top of a database.

Me too.  Postgresql is a mature application; the chances that the most
frequently useful optimisations are not already there are smaller than
the chances of my making mistakes putting optimisations and constraint
checks in my application.  Still, it is better to know than to guess.
That could help in engineering the constraints differently, or in
better exploiting their "performance boost".

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Fri, 20 Feb 2009 20:45:20 +0000
Sam Mason <sam@samason.me.uk> wrote:

> On Fri, Feb 20, 2009 at 04:51:33PM +0100, Ivan Sergio Borgonovo
> wrote:
> > What I find a bit annoying is politely dealing with the error
> > once it is reported back to the application, *and* the connection
> > and *bandwidth* costs of moving clearly wrong data back and
> > forth.
>
> This sounds a bit like premature optimization to me; I don't think

Well... I'd just like to know how things work: not to optimise at the
starting blocks, but rather to avoid cutting off my path to
optimisation later.  I'm glad to learn that not only is postgresql not
seriously affected by constraints, but that it may even take advantage
of them, as suggested by Ron Mayer.

> many people worry about optimizing the failure code paths.  I know
> I prefer to make sure that things go quickly when they're working.
> If you're worried about someone performing a DOS attack on a
> failure then you'd want to optimize it, but surely you'd want the
> checks early in the application code.

There may be several reasons to "duplicate" checks in the application
too.  Sometimes the failure path is more frequent than the success
path, sometimes you need quick feedback, sometimes it is a matter of
bandwidth, etc.

> > If you've a good mapping between pg types and the application
> > language/library types it becomes easier to keep those checks in
> > sync; otherwise it is a really boring job and DB checks become
> > just one more security net to maintain.
>
> It does, but constraints like that aren't going to be changing too
> regularly, are they?

Actually, if I were omniscient I wouldn't be so deeply involved with
programming... but even if I were, an application may serve different
needs during its lifespan.  And still, having to write constraints in
the application and in the DB is twice the work.  Furthermore, a DB
reports errors in a way that may not be useful to the user.
    create table test.zau (a int, b int);
    insert into test.zau values ('z', 'z');
    ERROR:  invalid input syntax for integer: "z"

    create table test.zau (a int check (a > 0), b int);
    insert into test.zau values (-1, 5);
    ERROR:  new row for relation "zau" violates check constraint "zau_a_check"

And in a less than ideal world you may be tempted to put constraints
just in the client.

Once upon a time, long ago, I had a look at RoR, and I vaguely
remember you could define tables with constraints in ruby and somehow
you automatically got constraints in the DB and some primitive checks
on the client too.  But maybe I was daydreaming.  I wonder how all
this magic works once you have to refactor.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
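[Editorial note: one small mitigation for the unfriendly error text is
a named DOMAIN, which at least gets a meaningful name into the
message, making it easier for the application to map the failure back
to something user-facing.  The domain name below is illustrative:]

```sql
-- Naming the rule makes the eventual error message self-describing.
CREATE DOMAIN pos_int AS int CHECK (VALUE > 0);
CREATE TABLE zau (a pos_int, b int);

INSERT INTO zau VALUES (-1, 5);
-- ERROR:  value for domain pos_int violates check constraint "pos_int_check"
```

The application still has to translate the error, but matching on a
domain name is less brittle than parsing an auto-generated constraint
name like "zau_a_check".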