Thread: storing access rights in a postgres database
Hi, we are developping a web application in which we need to manage access to several types of objects, the most important objects are 'company', 'projects', 'subproject', 'module' (and several others but that's not important for now). In general these objects constitute a tree, as for example each company owns several projects and each project belongs to exactly one company. So it's relatively easy to store these objects and relations between them as it's a simple 1:M relationship. The funny part begins with the access rights on these objects - we want to store them in a database in such a way to get: (a) good performance - there will be several hundreds of users and objects etc. (b) easy administering - all of that will be administered by humans, so it should be as easy as possible There'll be two basic types of questions: (1) Does the user X have an access to the object Y? (i.e. Does the user have access to the 'project X'?) (2) To which objects at the level X can the user Y access? (i.e. 'To which projects does the user have an access?') The (b) in general means some kind of 'inheritance' is used, that is each node in the tree inherits the access right from the node above him in case there's no access right set directly on it. So the first step when deciding 'Does the user X have an access to 'project Y?' would be to determine whether there's an access right right on the project, and if not then the same question ('Does he hava an access?' would be asked for the node above project (a 'firm' for example). I came up with a table CREATE TABLE rights ( user_id INT NOT NULL, allowed BOOLEAN NOT NULL, firm_id INT, project_id INT, subproject_id INT, module_id INT ); Where all the columns are references to the related tables (not important here). The table is filled from 'left to right' that is if a column is NULL then all the columns to right from it are NULL as well, thus each row has a meaning of a path in the tree. For example INSERT INTO rights(allowed,user_id,firm_id,project_id,subproject_id,module_id) VALUES ('t',1,4,33,12,24); INSERT INTO rights(allowed,user_id,firm_id,project_id,subproject_id,module_id) VALUES ('t',1,4,NULL,NULL,NULL); are valid rows, while INSERT INTO rights(allowed,user_id,firm_id,project_id,subproject_id,module_id) VALUES ('t',1,4,NULL,34,NULL); is not valid as there's a 'gap' between '4' and '34'. The question 'Does the user X have an access to object Y?' is then realized by an SQL query (let the object be a project with id 3, belonging to firm with id 4, and let the user have id 1): SELECT allowed FROM rights WHERE user_id = 1 AND ( (firm_id = 4 AND project_id = 3 AND subproject_id IS NULL) OR (firm_id = 4 AND project_id IS NULL) ) ORDER BY firm_id, project_id, subproject_id, module_id, allowed DESC LIMIT 1; where the 'ORDER BY' clause sorts the results the rows so the most specific are 'at the top' and then choose 't' in prior to 'f'. The problem is with the second type of queries (all objects the user has access rights to) as all the ways to find that using SQL are very slow. For example to get a list of all such projects for user with id 1 we use this: SELECT id, ( SELECT allowed FROM rights WHERE user_id = 1 AND ( (firm_id = projects.firm_id AND project_id = projects.idAND subproject_id IS NULL) OR (firm_id = projects.firm_id AND project_id IS NULL) ) ORDER BY firm_id, project_id, subproject_id,module_id, allowed DESC LIMIT 1 ) as allowed FROM projects; The problem is in the 'LIMIT 1' clause - that's the reason I can't write that as a join. Does someone else has an idea how to solve this? If needed I can send more complex examples and some testing data, explain plans, etc. I've been thinking about some 'intermediate table' with results of the subselect, updated by a set of triggers, but maybe there's some better solution. thanks for all your advices Tomas PS: We're not granting right directly to users of course - we are using roles, but it's not necessary here. Just imagine role_id instead of user_id in all the text.
Hello tv, I think in your case the PostgreSQL array column type really fits well. I would just add an array of type integer (or whatever your primary key in your role table is) to your company, project, module, ... tables. Then you can easy check if a role has access to the project row by checking if the roles primary key is contained in the role array. And you can easily select over the entire project table and matching any values in the role array. Here are some (untested) SQL statements to clarify my suggestion: create table role ( id serial primary key, name text ); create table project ( id serial primary key, name text, roles int[] ); create table company ( id serial primary key, name text, roles int[] ); insert into role values(1,'you'); insert into role values(2,'me'); insert into project values(1,'a',{1,2}); insert into project values(2,'b',{2}); -- check if I can access a project select id from project where name='a' and 2=ANY(roles); -- 2 is 'my' role id -- get all companies I have access to select id,name from company where 2=ANY(roles); I think the foreign key constraints can not be enforced with the standard foreign key triggers/functions so you would have to write your own plpgsql triggers if this is a mandatory requirement. -- ---> Dirk Jagdmann ----> http://cubic.org/~doj -----> http://llg.cubic.org
On 10/10/06, tv@fuzzy.cz <tv@fuzzy.cz> wrote:
I think your problem is NOT the LIMIT, it is the fact that you are putting a select inside the select block (your correlated subquery). You should try left outer joins instead:
SELECT
projects.id,
rights.allowed
FROM projects
LEFT OUTER JOIN rights ON (
(
rights.firm_id = projects.firm_id
AND rights.project_id = projects.id
AND rights.subproject_id IS NULL
) OR (
rights.firm_id = projects.firm_id
AND rights.project_id IS NULL
)
)
GROUP BY
projects.id,
rights.allowed
Using correlated subqueries is really bad (IMHO) because it causes your query to perform a select for each row returned. I have never seen a correlated subquery that cannot be refactored into a join (this is a challenge for any of you who disagree - I would love to see a reason to use a correlated subquery).
You can get more information about correlated subqueries and performance at http://www.bcarter.com/sap29.htm .
Also, I would consider putting a conditional unique constraint to enforce your rule on which fields must be null/not null to help preserve your data integrity.
Hope this helps!
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
SELECT id, (
SELECT allowed FROM rights WHERE user_id = 1 AND (
(firm_id = projects.firm_id AND project_id = projects.id AND
subproject_id IS NULL)
OR (firm_id = projects.firm_id AND project_id IS NULL)
)
ORDER BY firm_id, project_id, subproject_id, module_id, allowed DESC LIMIT 1
) as allowed
FROM projects;
The problem is in the 'LIMIT 1' clause - that's the reason I can't write that as
a join.
Does someone else has an idea how to solve this? If needed I can send more
complex examples and some testing data, explain plans, etc.
I've been thinking about some 'intermediate table' with results of the
subselect, updated by a set of triggers, but maybe there's some better
solution.
I think your problem is NOT the LIMIT, it is the fact that you are putting a select inside the select block (your correlated subquery). You should try left outer joins instead:
SELECT
projects.id,
rights.allowed
FROM projects
LEFT OUTER JOIN rights ON (
(
rights.firm_id = projects.firm_id
AND rights.project_id = projects.id
AND rights.subproject_id IS NULL
) OR (
rights.firm_id = projects.firm_id
AND rights.project_id IS NULL
)
)
GROUP BY
projects.id,
rights.allowed
Using correlated subqueries is really bad (IMHO) because it causes your query to perform a select for each row returned. I have never seen a correlated subquery that cannot be refactored into a join (this is a challenge for any of you who disagree - I would love to see a reason to use a correlated subquery).
You can get more information about correlated subqueries and performance at http://www.bcarter.com/sap29.htm .
Also, I would consider putting a conditional unique constraint to enforce your rule on which fields must be null/not null to help preserve your data integrity.
Hope this helps!
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================