Thread: Allowing update of column only from trigger

Allowing update of column only from trigger

From
Shawn Harrison
Date:
I have a table like this:

create table objects (
    id     serial primary key,
    name     varchar not null,
    parent     integer references objects(id) default 1 not null,
    uri    varchar not null
) without oids;

The uri column is a denormalization for performance, storing a "path" to
the object in the hierarchy, consisting of a sequence of names.

# select id, name, parent, uri from objects;

  id | name | parent |   uri
----+------+--------+----------
   1 |      |      1 | /
   2 | foo  |      1 | /foo
   3 | bar  |      2 | /foo/bar
(3 rows)

The uri is calculated by a trigger before update on objects.

The original version of the trigger function would re-calculate the uri
for an object and its immediate children if the name, parent, or uri
changed. It would apply the uri change to the children, which would
cascade down the hierarchy. This generally worked, but (1) I was having
data visibility voodoo, and (2) it was calculating every child's uri
twice, which would be inefficient for large hierarchies.

So I changed the trigger function so that it would only fire if name or
parent had changed. I created a recursive helper function that changes
the uri for all descendants of an object, if the object's name or parent
has changed. There is no cascade of changes (the trigger fires for all
the descendants, of course, but doesn't do anything). Works great, is
more efficient, and I can manage the max_stack_size to fit the size of
the hierarchy.

The PROBLEM with this is that anyone can now

# update objects set uri='/ha/ha/your/screwed' where id=2;

I want the trigger function and its helper alone to be able to update
the uri.

What is the best way to do this? Should I put the uri column in a
separate table, and play with permissions? (Yuck.) Do I need to bite the
bullet, go back to cascading triggers, and work out the data visibility
voodoo and the efficiency issue? Or is there a better way that I haven't
thought of?

Thanks,

Shawn Harrison
--
________________
harrison@tbc.net

Re: Allowing update of column only from trigger

From
PFC
Date:
    First you should use a ltree type for the uri field :
- you write it foo.bar instead of /foo/bar
- there are operators on ltree types to express "is parent of", "is
children of"
- these operators are indexed

    Check the readme :
http://www.sai.msu.su/~megera/postgres/gist/ltree/

    If you have this type of path, I guess you'll often make tree traversal
operations, and that you'll find the ltree operators extremely useful.
    You can update it with a trigger just like before.

    Now about your update problem, when you rename foo.bar into foo.crum.bar
you could, in a single update, replace all foo.bar by foo.crum.bar in all
your table with the ltree operators and special functions.

    And for your checks, you can add a CHECK on the url field to be sure it's
equal to the url of the parent + the name of the current row. It'll make
one more SELECT request, though.

> I have a table like this:
>
> create table objects (
>     id     serial primary key,
>     name     varchar not null,
>     parent     integer references objects(id) default 1 not null,
>     uri    varchar not null
> ) without oids;
>
> The uri column is a denormalization for performance, storing a "path" to
> the object in the hierarchy, consisting of a sequence of names.
>
> # select id, name, parent, uri from objects;
>
>   id | name | parent |   uri
> ----+------+--------+----------
>    1 |      |      1 | /
>    2 | foo  |      1 | /foo
>    3 | bar  |      2 | /foo/bar
> (3 rows)
>
> The uri is calculated by a trigger before update on objects.
>
> The original version of the trigger function would re-calculate the uri
> for an object and its immediate children if the name, parent, or uri
> changed. It would apply the uri change to the children, which would
> cascade down the hierarchy. This generally worked, but (1) I was having
> data visibility voodoo, and (2) it was calculating every child's uri
> twice, which would be inefficient for large hierarchies.
>
> So I changed the trigger function so that it would only fire if name or
> parent had changed. I created a recursive helper function that changes
> the uri for all descendants of an object, if the object's name or parent
> has changed. There is no cascade of changes (the trigger fires for all
> the descendants, of course, but doesn't do anything). Works great, is
> more efficient, and I can manage the max_stack_size to fit the size of
> the hierarchy.
>
> The PROBLEM with this is that anyone can now
>
> # update objects set uri='/ha/ha/your/screwed' where id=2;
>
> I want the trigger function and its helper alone to be able to update
> the uri.
>
> What is the best way to do this? Should I put the uri column in a
> separate table, and play with permissions? (Yuck.) Do I need to bite the
> bullet, go back to cascading triggers, and work out the data visibility
> voodoo and the efficiency issue? Or is there a better way that I haven't
> thought of?
>
> Thanks,
>
> Shawn Harrison



Re: Allowing update of column only from trigger

From
"Andrey V. Semyonov"
Date:
isn't it possible to restrict UPDATE by access rights based on the DB's
user?

Create table with owner set to the administrator of the database (NOT
PostgreSQL SERVER!!!) and grant only the needed rights (or none of them)
to the user from which the usual processing of the database will be
performed. Then, create a trigger function with SECURITY DEFINER set and
own it by the owner of the database (or other user, who's granted to
UPDATE the table). So, if no one else is granted UPDATE on the table,
the only UPDATE-modifiers of the table will be the owner and the trigger
function's owner (if differs from owner).

Best regards,
    Andrey V. Semyonov

Re: Allowing update of column only from trigger

From
Shawn Harrison
Date:
PFC wrote [01/28/05 7:08 PM]:
>
>     First you should use a ltree type for the uri field :

Yes, it would be very good if I could use ltree rather than rolling my
own, but ltree doesn't meet my use requirements.

> - you write it foo.bar instead of /foo/bar

That is the problem: The point is to make look-up based on uri or
filesystem path very efficient.

> - there are operators on ltree types to express "is parent of", "is
> children of"
> - these operators are indexed

I have written similar pure-SQL functions for my parent/uri system,
though without the syntax shortcuts of the ltree operators.

> [...]
>     Now about your update problem, when you rename foo.bar into
> foo.crum.bar  you could, in a single update, replace all foo.bar by
> foo.crum.bar in all  your table with the ltree operators and special
> functions.

That's an approach I hadn't considered. Hmmm, I'd have to read the ltree
C sources to see how it does this....

>     And for your checks, you can add a CHECK on the url field to be sure
> it's  equal to the url of the parent + the name of the current row.
> It'll make  one more SELECT request, though.

Yes, I'd like to avoid extra selects as much as possible.

Thank you very much for your interaction -- very helpful.
--
________________
harrison@tbc.net

Re: Allowing update of column only from trigger

From
Shawn Harrison
Date:
Andrey V. Semyonov wrote [01/29/05 12:45 PM]:
> isn't it possible to restrict UPDATE by access rights based on the DB's
> user?
>
> Create table with owner set to the administrator of the database (NOT
> PostgreSQL SERVER!!!) and grant only the needed rights (or none of them)
> to the user from which the usual processing of the database will be
> performed. Then, create a trigger function with SECURITY DEFINER set and
> own it by the owner of the database (or other user, who's granted to
> UPDATE the table). So, if no one else is granted UPDATE on the table,
> the only UPDATE-modifiers of the table will be the owner and the trigger
> function's owner (if differs from owner).

Thank you for explaining this. I haven't done much with rights within
the database, but it seems you have explained how to do exactly what I
had been considering as the "rights"-oriented solution to my problem.

> Best regards,
>    Andrey V. Semyonov

Take care,
Shawn Harrison
--
________________
harrison@tbc.net

Re: Allowing update of column only from trigger

From
Shawn Harrison
Date:
Shawn Harrison wrote [01/31/05 12:56 PM]:
> Shawn Harrison wrote [01/28/05 3:53 PM]:
> create or replace rule objects__update as on update to objects
> do instead (
>     update objects_data set
>         name = new.name,
>         typename = new.typename,
>         parent = new.parent,
                            ^^^^
This is a simplified version of the rule; the real one didn't have this
syntax error.

>     where id = new.id
> );
> ...
--
________________
harrison@tbc.net

Re: Allowing update of column only from trigger

From
Shawn Harrison
Date:
Shawn Harrison wrote [01/28/05 3:53 PM]:
> I have a table like this:
>
> create table objects (
>     id     serial primary key,
>     name     varchar not null,
>     parent     integer references objects(id) default 1 not null,
>     uri    varchar not null
> ) without oids;
>
> The uri column is a denormalization for performance, storing a "path" to
> the object in the hierarchy, consisting of a sequence of names.
> [...]
> I want the trigger function and its helper alone to be able to update
> the uri.
>
> What is the best way to do this? [...]

Thank you, PFC and Andrey V. Semyonov, for your help in characterizing
the problem and its solution. Here's what I did.

"version 1" is the speedy, recursive version of the trigger function
that I wrote last week, which has the problem of letting uri be updated
directly.

My "version 2" solution was to (1) add auto-update when the uri had been
changed in addition to the parent and name, (2) make the trigger fire
_after_ update (to overcome data visibility problems), and (3) to add a
SQL update statement to the trigger rather than relying on changes to
new.uri (since changing new.uri won't work in a trigger after update).
This worked. But was very inefficient, because it was calculating the
same thing multiple times for every child object. To wit:

- loading a 300-item hierarchy (inserting all objects and updating parents):
    version 1: 1.8 seconds
    version 2: 7.9 seconds
- Updating the name of the root object in the hierarchy (which requires
updating the uri of 300 objects)
    version 1: 0.4 seconds
    version 2: 4.5 seconds

One can see how unscalable version 2 would be.

My "version 3" solution was to keep my "fast" version 1 trigger code,
but to change the data model a bit: The table is now named
"objects_data". I then created a view named "objects" which is just

create view objects as (select * from objects_data);

To protect uri, I created a rules on insert and update to objects that
doesn't pass to objects_data the changed uri value, like this:

create or replace rule objects__update as on update to objects
do instead (
     update objects_data set
         name = new.name,
         typename = new.typename,
         parent = new.parent,
     where id = new.id
);

The trigger function to update the uri then operates directly on the
objects_data table.

This solution provides as much security as I need -- to protect against
stupidity, mainly. If I wanted more security, I could change the
security on the objects_data table, as Andrey suggested.

So now all of my test cases pass, and its speedy to boot. :-)

Thanks, guys,
Shawn Harrison
--
Peace and joy,

Shawn Harrison
Tyndale House Publishers
_______________
sah@tyndale.com