Efficiency of stored procedure vs large join - Mailing list pgsql-novice

From Malcolm Hutty
Subject Efficiency of stored procedure vs large join
Date
Msg-id 3DCBFAD1.3030502@hutty.com
Whole thread Raw
Responses Re: Efficiency of stored procedure vs large join
List pgsql-novice
Which is more efficient, a stored procedure that performs a query
followed by a second, where the choice of the second query depends on
the truth/falsity of the first, or a single large join?

My previous experience was with relatively simple applications built
with Apache+PHP+MySQL. My current project is considerably more
complex, which is why I've switched to PostgreSQL.

The application I'm building has a complex roles-based permissions
system, and it is necessary to check the users permissions on every
application-level operation, which is to say on a very high proportion
of database queries. Because this security check occurs so frequently,
efficiency here is extremely important.

Details of my particular situation are as follows:

User, roles, permissions, role-permission relationships for particular
records, and role-permission defaults for groups of records (those
built from a particular "template") are all stored in relevent tables.

The conceptual logic is essentially like this:
For a given user, database record (in certain tables) and operation:
* What are the roles this user has?
* What template was this database record built from? (this is a field
within the record that is a key to another record)
* Are any of the user’s roles permitted to perform the operation by
the template?
* If so, are any of the user’s roles prohibited from performing the
operation by the individual record? If prohibited, return FALSE;
otherwise, return TRUE.
* If not, any of the user’s roles permitted to perform the operation
by the record? If permitted, return TRUE; otherwise, return NULL.

Returning TRUE means the operation is permitted; FALSE or NULL means
that it is not.

My choices seem to be: (1) implement the above logic in PL/pgSQL, so
all my database queries will have this form:

SELECT foo FROM bar WHERE thingy='wibble' AND
SecurityCheck(<userid>,bar.id,<operation>);


(2) do it in SQL:
SELECT foo FROM bar B WHERE thingy='wibble' AND

             ((B.template IN (
          SELECT DISTINCT D.template
              FROM
              securityprivssets S,
              securitytemplatedefaults D,
              roledata R
          WHERE
                  R.person=<userid>
                  AND D.role=R.role
                  AND D.privset=S.id
                  AND D.template = B.template
                  AND S.operation = <operation>
                  AND S.value = TRUE  )
          ) OR

          (
              B.id IN (
              SELECT DISTINCT O.objectid
                  FROM
                  securityprivssets S,
                  securityobjectACL O,
                  roledata R
                      WHERE
                      R.person=$userid
                      AND O.role=R.role
                      AND O.privset=S.id
                      AND O.objectid = Bid
                      AND S.operation = <operation>
                      AND S.value = TRUE
              )
          ))
          AND
          (
              B.id NOT IN (
              SELECT DISTINCT O.objectid
                  FROM
                  securityprivssets S,
                  securityobjectACL O,
                  roledata R
                      WHERE
                      R.person=$userid
                      AND O.role=R.role
                      AND O.privset=S.id
                      AND O.objectid = B.id
                      AND S.operation = <operation>
                      AND S.value = FALSE
              )
          );

On the one hand (with pure SQL), I'm sending a much larger length of
query text across the network from PHP to the database, and I might be
constructing a very large joined resultset before it gets trimmed down
(I don't know how the optimisation works). On the other hand (with a
stored procedure) I'm performing several queries, taking the
resultsets out of the Postgres optimiser and iteritively querying each
result from PL/pgSQL. Is that a bad thing? Or is it no worse than the
joined subselects in the big query? Does it make a performance
difference at all?

I don't really care about query legibility; the whole query is being
constructed programmatically anyway, so appending the large fragment
above as a string (with variable substitution) to each principal query
fragment doesn't harm application-level legibility: it'll look like
this in my PHP code:
...
$querystring = "SELECT foo FROM bar WHERE thingy='wibble'";
$querystring .= getSecurityCheckString($userid,$operation);
$query->Execute($querystring)
...


Thanks in advance for your advice,

Malcolm.



pgsql-novice by date:

Previous
From: Rory Campbell-Lange
Date:
Subject: Re: Return serial from insert
Next
From: Charlie Clark
Date:
Subject: Question on locale settings