Thread: Efficiency of stored procedure vs large join


From: Malcolm Hutty
Date: Fri, 08 Nov 2002 17:56:33 +0000
Which is more efficient: a stored procedure that performs one query
followed by a second, where the choice of the second query depends on
the truth or falsity of the first, or a single large join?

My previous experience was with relatively simple applications built
with Apache+PHP+MySQL. My current project is considerably more
complex, which is why I've switched to PostgreSQL.

The application I'm building has a complex roles-based permissions
system, and it is necessary to check the user's permissions on every
application-level operation, which is to say on a very high proportion
of database queries. Because this security check occurs so frequently,
efficiency here is extremely important.

Details of my particular situation are as follows:

User, roles, permissions, role-permission relationships for particular
records, and role-permission defaults for groups of records (those
built from a particular "template") are all stored in relevant tables.

The conceptual logic is essentially like this:
For a given user, database record (in certain tables) and operation:
* What are the roles this user has?
* What template was this database record built from? (this is a field
within the record that is a key to another record)
* Are any of the user’s roles permitted to perform the operation by
the template?
* If so, are any of the user’s roles prohibited from performing the
operation by the individual record? If prohibited, return FALSE;
otherwise, return TRUE.
* If not, are any of the user’s roles permitted to perform the operation
by the record? If permitted, return TRUE; otherwise, return NULL.

Returning TRUE means the operation is permitted; FALSE or NULL means
that it is not.

My choices seem to be: (1) implement the above logic in PL/pgSQL, so
all my database queries will have this form:

SELECT foo FROM bar WHERE thingy='wibble' AND
SecurityCheck(<userid>,bar.id,<operation>);
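For concreteness, option (1) might look like the following sketch,
written for the 7.2/7.3-era PL/pgSQL syntax (quoted body, ALIAS for
parameters). The table and column names are borrowed from the SQL
version below; the integer argument types, and the fact that the
function looks the template up in "bar" directly (so each protected
table would need its own variant, or the template passed in), are
assumptions:

```sql
CREATE OR REPLACE FUNCTION SecurityCheck(integer, integer, integer)
RETURNS boolean AS '
DECLARE
    p_userid    ALIAS FOR $1;
    p_objectid  ALIAS FOR $2;
    p_operation ALIAS FOR $3;
BEGIN
    -- Does the template this record was built from permit the
    -- operation for any of the roles held by this user?
    PERFORM 1
       FROM bar B, securitytemplatedefaults D,
            securityprivssets S, roledata R
      WHERE B.id = p_objectid
        AND R.person = p_userid
        AND D.role = R.role
        AND D.template = B.template
        AND D.privset = S.id
        AND S.operation = p_operation
        AND S.value = TRUE;

    IF FOUND THEN
        -- Template permits: is it prohibited on the individual record?
        PERFORM 1
           FROM securityobjectACL O, securityprivssets S, roledata R
          WHERE R.person = p_userid
            AND O.role = R.role
            AND O.objectid = p_objectid
            AND O.privset = S.id
            AND S.operation = p_operation
            AND S.value = FALSE;
        IF FOUND THEN
            RETURN FALSE;
        END IF;
        RETURN TRUE;
    END IF;

    -- Template does not permit: permitted on the individual record?
    PERFORM 1
       FROM securityobjectACL O, securityprivssets S, roledata R
      WHERE R.person = p_userid
        AND O.role = R.role
        AND O.objectid = p_objectid
        AND O.privset = S.id
        AND S.operation = p_operation
        AND S.value = TRUE;
    IF FOUND THEN
        RETURN TRUE;
    END IF;
    RETURN NULL;
END;
' LANGUAGE 'plpgsql';
```

Each permission check is a PERFORM (an existence test), so the function
runs at most two of the three queries per call.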


(2) do it in SQL:
SELECT foo FROM bar B WHERE thingy='wibble' AND
    (
        (B.template IN (
            SELECT DISTINCT D.template
              FROM securityprivssets S,
                   securitytemplatedefaults D,
                   roledata R
             WHERE R.person = <userid>
               AND D.role = R.role
               AND D.privset = S.id
               AND D.template = B.template
               AND S.operation = <operation>
               AND S.value = TRUE
        ))
        OR
        (B.id IN (
            SELECT DISTINCT O.objectid
              FROM securityprivssets S,
                   securityobjectACL O,
                   roledata R
             WHERE R.person = <userid>
               AND O.role = R.role
               AND O.privset = S.id
               AND O.objectid = B.id
               AND S.operation = <operation>
               AND S.value = TRUE
        ))
    )
    AND
    (B.id NOT IN (
        SELECT DISTINCT O.objectid
          FROM securityprivssets S,
               securityobjectACL O,
               roledata R
         WHERE R.person = <userid>
           AND O.role = R.role
           AND O.privset = S.id
           AND O.objectid = B.id
           AND S.operation = <operation>
           AND S.value = FALSE
    ));

On the one hand (with pure SQL), I'm sending much more query text
across the network from PHP to the database, and I might be
constructing a very large joined result set before it gets trimmed
down (I don't know how the optimisation works). On the other hand
(with a stored procedure) I'm performing several queries, taking the
result sets out of the Postgres optimiser's hands and iteratively
querying each result from PL/pgSQL. Is that a bad thing? Or is it no
worse than the joined subselects in the big query? Does it make a
performance difference at all?
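For what it's worth, the question can also be answered empirically:
PostgreSQL 7.2 and later can time a query with EXPLAIN ANALYZE. A
sketch, reusing the placeholders above (run the same command on the
big-join form and compare the plans and total runtimes):

```sql
-- Plan plus actual execution times for the stored-procedure variant:
EXPLAIN ANALYZE
SELECT foo FROM bar
 WHERE thingy='wibble'
   AND SecurityCheck(<userid>, bar.id, <operation>);
```

Note that the planner cannot see inside the function, so the big-join
form is the one where indexes on the security tables will show up in
the plan.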

I don't really care about query legibility; the whole query is being
constructed programmatically anyway, so appending the large fragment
above as a string (with variable substitution) to each principal query
fragment doesn't harm application-level legibility: it'll look like
this in my PHP code:
...
$querystring = "SELECT foo FROM bar WHERE thingy='wibble'";
$querystring .= getSecurityCheckString($userid,$operation);
$query->Execute($querystring);
...


Thanks in advance for your advice,

Malcolm.



Re: Efficiency of stored procedure vs large join

From: Bruno Wolff III
Date:
On Fri, Nov 08, 2002 at 17:56:33 +0000,
  Malcolm Hutty <msah-postgres@hutty.com> wrote:
>
> On the one hand (with pure SQL), I'm sending much more query text
> across the network from PHP to the database, and I might be
> constructing a very large joined result set before it gets trimmed
> down (I don't know how the optimisation works). On the other hand
> (with a stored procedure) I'm performing several queries, taking the
> result sets out of the Postgres optimiser's hands and iteratively
> querying each result from PL/pgSQL. Is that a bad thing? Or is it no
> worse than the joined subselects in the big query? Does it make a
> performance difference at all?

What about using views? There you could use the SQL method without
having to pass the extra SQL around. This also gives you another
option: if the application authenticates to the database as the end
user, you can enforce the security in the database instead of in the
application (by giving normal users access only to the views).
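A sketch of what that could look like, building one view per
operation. The "persons" table mapping database user names to the
person ids used in roledata is hypothetical, as is using "1" for the
id of the operation the view covers; SecurityCheck is the function
from option (1):

```sql
-- Each end user sees only the rows they may read:
CREATE VIEW bar_readable AS
    SELECT B.*
      FROM bar B, persons P
     WHERE P.username = CURRENT_USER
       AND SecurityCheck(P.id, B.id, 1);

-- Normal users query the view, never the base table:
REVOKE ALL ON bar FROM PUBLIC;
GRANT SELECT ON bar_readable TO PUBLIC;
```

The application's queries then shrink to plain SELECTs against the
view, with no security fragment appended in PHP at all.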