Thread: data integrity

data integrity

From
Jelle Ouwerkerk
Date:
Hi

I'm using postgresql to manage a web site.

Problem: How are people dealing with data integrity issues such as stale
data when writing web-based apps? For instance:

The value of X is 7.

Step (1) User A selects X.
Step (2) User B selects X.
Step (3) User A updates X to 8.
Step (4) User B updates X to 10 under the assumption that X is still 7.

In some scenarios, user B should be prevented from updating.

Solutions (that I'm considering):

(1) Timestamp: Have a timestamp for every row in every table, which
automatically changes when a row is inserted or updated, or

(2) Cache: Similarly, cache the user's selected data on the server side
(every tuple from the original select) and compare when the user attempts
to update, or

(3) Squash: modify only those fields in the row that need updating
(instead of the entire row) and whatever happens happens.

Has anyone solved this 'stale data' problem before?

Does this fall into a family of problems that has an existing (trivial)
solution? Also, is it the responsibility of the application server to deal
with the scenario above or should the application itself be responsible?

Thanks in advance

...Jelle



Re: data integrity

From
Martín Marqués
Date:
On Lun 05 Nov 2001 18:06, you wrote:
> Hi
>
> I'm using postgresql to manage a web site.
>
> Problem: How are people dealing with data integrity issues such as stale
> data when writing web-based apps? For instance:
>
> The value of X is 7.
>
> Step (1) User A selects X.
> Step (2) User B selects X.
> Step (3) User A updates X to 8.
> Step (4) User B updates X to 10 under the assumption that X is still 7.

Lock it with a SELECT ...... FOR UPDATE. The rows that are going to be
updated will be locked, but not the entire table. :-)

> In some scenarios, user B should be prevented from updating.
>
> Solutions (that I'm considering):
>
> (1) Timestamp: Have a timestamp for every row in every table, which
> automatically changes when a row is inserted or updated, or
>
> (2) Cache: Similarly, cache the user's selected data on the server side
> (every tuple from the original select) and compare when the user attempts
> to update, or
>
> (3) Squash: modify only those fields in the row that need updating
> (instead of the entire row) and whatever happens happens.
>
> Has anyone solved this 'stale data' problem before?
>
> Does this fall into a family of problems that has an existing (trivial)
> solution? Also, is it the responsibility of the application server to deal
> with the scenario above or should the application itself be responsible?

saludos... ;-)

--
Porqué usar una base de datos relacional cualquiera,
si podés usar PostgreSQL?
-----------------------------------------------------------------
Martín Marqués                  |        mmarques@unl.edu.ar
Programador, Administrador, DBA |       Centro de Telematica
                       Universidad Nacional
                            del Litoral
-----------------------------------------------------------------

Re: data integrity

From
Martín Marqués
Date:
On Mar 06 Nov 2001 08:24, Janning Vygen wrote:
> Am Dienstag,  6. November 2001 00:17 schrieb Martín Marqués:
> > On Lun 05 Nov 2001 18:06, you wrote:
> > > Hi
> > >
> > > I'm using postgresql to manage a web site.
> > >
> > > Problem: How are people dealing with data integrity issues such
> > > as stale data when writing web-based apps? For instance:
> > >
> > > The value of X is 7.
> > >
> > > Step (1) User A selects X.
> > > Step (2) User B selects X.
> > > Step (3) User A updates X to 8.
> > > Step (4) User B updates X to 10 under the assumption that X is
> > > still 7.
> >
> > Lock it with a SELECT ...... FOR UPDATE. The rows that are going to
> > be updated will be locked, but not the entire table. :-)
>
> I dont think that FOR UPDATE helps in this situation, but it might be
> a misunderstanding.
>
> FOR UPDATE locks the rows of the table for an update. right. but it
> removes the lock if the transaction commits.
>
> In a web based application you cant enforce a transaction between the
> sending of a form and the exceuting of a submit.
>
> Its an application problem: the data i see in my frontend are never
> realtime. they might have change in the last seconds. But who cares.
> If i want to have a value of 8 and another one wants a value of 10
> its a question of whom granting write access.

Web based? Great! PHP?
If you already have 7 saved, then make a CASE statment in your query, and
depending on the IF you update or not.
You may want to think about using plpgsql.

Simple. :-)

--
Porqué usar una base de datos relacional cualquiera,
si podés usar PostgreSQL?
-----------------------------------------------------------------
Martín Marqués                  |        mmarques@unl.edu.ar
Programador, Administrador, DBA |       Centro de Telematica
                       Universidad Nacional
                            del Litoral
-----------------------------------------------------------------

Re: data integrity

From
Janning Vygen
Date:
Am Dienstag,  6. November 2001 00:17 schrieb Martín Marqués:
> On Lun 05 Nov 2001 18:06, you wrote:
> > Hi
> >
> > I'm using postgresql to manage a web site.
> >
> > Problem: How are people dealing with data integrity issues such
> > as stale data when writing web-based apps? For instance:
> >
> > The value of X is 7.
> >
> > Step (1) User A selects X.
> > Step (2) User B selects X.
> > Step (3) User A updates X to 8.
> > Step (4) User B updates X to 10 under the assumption that X is
> > still 7.
>
> Lock it with a SELECT ...... FOR UPDATE. The rows that are going to
> be updated will be locked, but not the entire table. :-)

I dont think that FOR UPDATE helps in this situation, but it might be
a misunderstanding.

FOR UPDATE locks the rows of the table for an update. right. but it
removes the lock if the transaction commits.

In a web based application you cant enforce a transaction between the
sending of a form and the exceuting of a submit.

Its an application problem: the data i see in my frontend are never
realtime. they might have change in the last seconds. But who cares.
If i want to have a value of 8 and another one wants a value of 10
its a question of whom granting write access.

janning


> > In some scenarios, user B should be prevented from updating.
> >
> > Solutions (that I'm considering):
> >
> > (1) Timestamp: Have a timestamp for every row in every table,
> > which automatically changes when a row is inserted or updated, or
> >
> > (2) Cache: Similarly, cache the user's selected data on the
> > server side (every tuple from the original select) and compare
> > when the user attempts to update, or
> >
> > (3) Squash: modify only those fields in the row that need
> > updating (instead of the entire row) and whatever happens
> > happens.
> >
> > Has anyone solved this 'stale data' problem before?
> >
> > Does this fall into a family of problems that has an existing
> > (trivial) solution? Also, is it the responsibility of the
> > application server to deal with the scenario above or should the
> > application itself be responsible?
>
> saludos... ;-)

--
Planwerk 6 /websolutions
Herzogstraße 86
40215 Düsseldorf

fon 0211-6015919
fax 0211-6015917
http://www.planwerk6.de

Re: data integrity

From
Lincoln Yeoh
Date:
I deal with it in an application specific manner.

In one case- row contains FAQ style item - I mark the row as locked by a
particular user/session. If other users try to update the same row, a copy
is made. So the end users have to combine their work.

In another case - rows are given an application level transaction ID, only
rows from committed transactions are visible (did this in MySQL some years
back). The transactions are only for a very limited controlled scope.

Trying to fully implement application level MVCC/transaction might be
possible but difficult coping with cases where a user opens multiple
browser windows or jumps from state to state (history/back button), or goes
for lunch...

Can an application server really deal with this problem? If anyone can show
how, I'd like to know!

Keeping a database connection open between webpages won't work well
(multiple windows = self conflicting transaction) or scale (many
users/sessions per timeout period= run out of DB connections) .

As a kludge you can hold open a web connection to keep track of stuff - a
download that takes a very long time- displaying very little data at a
time. But it's ugly and sometimes certain browsers just randomly disconnect
after a while..

Cheerio,
Link.

At 04:06 PM 11/5/01 -0500, Jelle Ouwerkerk wrote:
>Problem: How are people dealing with data integrity issues such as stale
>data when writing web-based apps? For instance:
>
>The value of X is 7.
>
>Step (1) User A selects X.
>Step (2) User B selects X.
>Step (3) User A updates X to 8.
>Step (4) User B updates X to 10 under the assumption that X is still 7.
>
>In some scenarios, user B should be prevented from updating.
>
>Solutions (that I'm considering):
>
>(1) Timestamp: Have a timestamp for every row in every table, which
>automatically changes when a row is inserted or updated, or
>
>(2) Cache: Similarly, cache the user's selected data on the server side
>(every tuple from the original select) and compare when the user attempts
>to update, or
>
>(3) Squash: modify only those fields in the row that need updating
>(instead of the entire row) and whatever happens happens.
>
>Has anyone solved this 'stale data' problem before?
>
>Does this fall into a family of problems that has an existing (trivial)
>solution? Also, is it the responsibility of the application server to deal
>with the scenario above or should the application itself be responsible?



Re: data integrity

From
Alvaro Herrera
Date:
On Mon, 5 Nov 2001, Jelle Ouwerkerk wrote:

> Problem: How are people dealing with data integrity issues such as stale
> data when writing web-based apps? For instance:
[...]

> In some scenarios, user B should be prevented from updating.

Maybe you can do something like this:

user A SELECTs the data, and the web app caches the fields.
user B does the same thing.
user B changes some fields and commits.
user A changes some fields and tries to commit. His webapp SELECTs the
data again and checks against its cache.
Since the data is different, the webapp presents him the three versions
and lets him choose.

The idea is to check every time you are going to commit the data to see
if it changed while you weren't looking. If it did, tell him so and let
him decide.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Ni aun el genio muy grande llegaria muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)


Re: data integrity

From
Jason Earl
Date:
Keep reading, I have commented below.

--- Janning Vygen <vygen@gmx.de> wrote:
> Am Dienstag,  6. November 2001 00:17 schrieb Mart�n
> Marqu�s:
> > On Lun 05 Nov 2001 18:06, you wrote:
> > > Hi
> > >
> > > I'm using postgresql to manage a web site.
> > >
> > > Problem: How are people dealing with data
> integrity issues such
> > > as stale data when writing web-based apps? For
> instance:
> > >
> > > The value of X is 7.
> > >
> > > Step (1) User A selects X.
> > > Step (2) User B selects X.
> > > Step (3) User A updates X to 8.
> > > Step (4) User B updates X to 10 under the
> assumption that X is
> > > still 7.
> >
> > Lock it with a SELECT ...... FOR UPDATE. The rows
> that are going to
> > be updated will be locked, but not the entire
> table. :-)
>
> I dont think that FOR UPDATE helps in this
> situation, but it might be
> a misunderstanding.
>
> FOR UPDATE locks the rows of the table for an
> update. right. but it
> removes the lock if the transaction commits.
>
> In a web based application you cant enforce a
> transaction between the
> sending of a form and the exceuting of a submit.

And even if you could, you probably wouldn't want to.
If you did, it would allow folks to lock up rows in
your database indefinitely simply by requestin a form
and closing their web browser.  Not only that, but it
would require that each database user get their own
PostgreSQL connection, and your middle ware layer
would have to map requests to the correct connection.

In the end it is *much* easier to simply make each
request a transaction.  If an error is raised, then
everything is rolled back, if not then everything is
committed and the page is served up.

If you *really* need something like this it is better
not to use PostgreSQL's transaction mechanism and
instead create an extra table that holds the primary
key of the records you want locked and a timestamp
(and probably the user id or session id of the person
who has created the lock).  That way you can easily
check this table for which records are being updated
and you can also easily time out locks.

This is especially useful in those cases where
updating a record takes a lot of time.  If all you are
doing is updating a simple integer then a locking
table is not likely to help.  Each individual will
hold the lock for such a short period of time that the
chances of collision are low.

> Its an application problem: the data i see in my
> frontend are never
> realtime. they might have change in the last
> seconds. But who cares.
> If i want to have a value of 8 and another one wants
> a value of 10
> its a question of whom granting write access.

This sort of a setup is certainly easiest.  However,
sometimes it is useful to alert the user to the fact
that someone else is also working on this particular
row.  Creating a separate table to track which rows
are being accessed is fairly straightforward and not
particular expensive.  Trying to use PostgreSQL's
locking mechanisms (on the other hand) is a short road
to disaster.

Janning is certainly correct, however, when he says
that ultimately it comes down to who has write access.
 Even using a locking table there is nothing to stop
someone changing that row right after I finish
changing it.  They won't be able to change it *while*
I am working on it, but they'll be able to change it
the second I hit "submit."

Jason

__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com