Re: multimaster - Mailing list pgsql-general

From Jeff Davis
Subject Re: multimaster
Date
Msg-id 1180923685.3955.116.camel@jdavis
Whole thread Raw
In response to Re: multimaster  ("Alexander Staubo" <alex@purefiction.net>)
List pgsql-general
On Mon, 2007-06-04 at 00:51 +0200, Alexander Staubo wrote:
> True, and when they overlap you tend to want to describe the
> validation errors in one place, not two -- either the database or the
> app, not both. Relational databases have traditionally argued that
> these rules should be in the former, so that there's one layer through
> which every single change has to go.
>

I disagree here. You often _do_ want to describe some types of errors
twice. You check the same thing in different ways at different points in
the code, and that isolates errors and assures developers that certain
assumptions are safe.

In the database world, it's particularly important to use declarative
constraints. If developer A inserts bad data and developer B uses that
bad data, it could compound the problem and yet remain invisible until
the problem is almost impossible to debug. Constraints assure the
developers that they are starting with some known state.

Applications should check for inconsistencies when it makes sense, as
well. Every important API that I can think of checks the input, and
reports some kind of useful error when the assumptions are violated.
Every system call has all sorts of useful error codes. For example,
read() can return EBADF. Nobody thinks "Hey, I'll send a random integer
for the file descriptor", but I'd be willing to bet that the error
condition has been reached by accident before, and probably saved
someone a lot of time versus just filling the target buffer with random
bytes and returning success.

I would argue it's more important in a database, because the error
conditions can persist for a longer period of time and cause more damage
in the process, but the idea is the same.

> I agree. In my experience, however, the best kind of data model is the
> one that is immediately mappable to user-level concepts -- to human
> concepts. A "user" relation has attributes like "name", "birth_date",
> etc. If you manage to keep the model flat and friendly enough, you can
> map the attributes to forms and translate attribute-level errors
> directly to form error messages.
>

I think you're oversimplifying. What you say works when user input is a
separate, contained, isolated chunk of data. In that case, any error is
only related to the self-consistency of the input, and can easily be
mapped back to a user-level error.

However, it breaks down when you have constraints on the
interrelationships between pieces of data. These interrelationships are
what can be broken from multiple points in the application code, and
there is no way to map backwards from the constraint violation to a
specific user error. Hence, the application must translate.

Try to imagine some of the complexities in a scheduling system, and what
kind of constraints that might involve. Then, think about how some of
the same constraints might be broken in very different ways. Time
conflicts could come about either by participants overscheduling
themselves, or by the event itself shifting in time such that some
participants are overscheduled. Perhaps someone tries to sign up for an
event that's already full, or perhaps the venue moves to a new location
with a lower capacity. I can't think of any way to map backwards from
the constraint violation to the user level error.

You could probably imagine similar problems with an inventory system.

> > The application has much more information about the user and the context
> > of the error that the database shouldn't have. For instance, the
> > language that the user speaks might affect the error message.
>
> Localization is easily accomplished by piping the error message through gettext.
>

And what about the two-column unique index that can be violated from two
different aspects of the UI? You only get one database error, but you
really should present two different errors to the user.

Any time that a constraint can be violated through two completely
different paths, your one-to-one constraint-to-application-error breaks
down.

The root of the issue is that the database knows that an inconsistent
state has been reached, it does not know _how_, nor should it. The how
might change as new code is added or perhaps as new bugs are introduced.
Constraints in an RDBMS should be declarative which is very important
(you don't need to prove that a column is always in a unique state, you
look, and it says it is unique). You can add procedural code to an
RDBMS, but you can end up making your database your application that
way.

User-level errors are heavily dependent on _how_ the error occurred. The
database doesn't know this, so the constraint violation message
shouldn't presume to know how it happened.

I'll use the analogy to read() again: who passes EBADF back to the user
directly? Does that mean it's a bad API? No, it just means it had no
idea why you passed it a bad file descriptor, but it knows it's bad, and
it tells the caller. Similarly with exceptions in java/ruby/python: you
shouldn't pass those exceptions back to the user.

Regards,
    Jeff Davis


pgsql-general by date:

Previous
From: DiegoF
Date:
Subject: Re: monitor stats
Next
From: Tom Lane
Date:
Subject: Re: why postgresql over other RDBMS