Thread: A mechanism securing web applications in DBMS

A mechanism securing web applications in DBMS

From
Zhaomo Yang
Date:
Hi all,

I am a graduate student from UC San Diego. My adviser, Dr. Kirill
Levchenko, and I have been working on a web/DB security project for
the last few months. Since fine-grained access control in DBMS is part
of our project and the PostgreSQL community is also working on it now,
we would like to exchange some ideas on the issue with you.

1. Background
Nowadays people entrust more and more sensitive data to web
applications, while security vulnerabilities are common in these
applications. The normal structure of web applications consists of
three tiers: client-side code, server-side code and a database. In
server-side code a database user representing all the
application-level users interacts with the database with full
privileges. Since currently database built-in access control is too
coarse, no access control mechanism is used in most of web
applications. This situation is not ideal since a malicious
application user can tamper with other users’ data by exploiting
vulnerabilities in the application, and if the application is
completely under the malicious user’s control so is the database.

2. Our Mechanism.
In order to fix the problem, we believe there are two underlying
problems to be addressed: authenticating application-level users to
DBMS and fine-grained access control in the DBMS. Note that these two
problems are related. Without authenticating application-level users
to the DBMS, can the DBMS not know which application-level user it is
actually interacting with, thus the DBMS cannot apply fine-grained
access control policy correctly.

2.1 The authentication problem
We address the authentication problem by requiring developers to
define an authentication function in the DBMS. This function is
invoked whenever an application-level user logs in. An authentication
function contains the authentication logic in the server-side code.
Besides, it also caches some attributes of the current
application-level user, which will be checked by the fine-grained
access control mechanism.

In our implementation, we added a new type of function called
authentication function. Developers are supposed to use CREATE
AUTHENTICATION FUNCTION command to define authentication function,
which contains the application authentication logic and must be
defined to return table rows. When the authentication function is
invoked, the return values are cached in a temp table, called the
authentication table,  with the same name as the authentication
function. The authentication table is available to access control
statements (GRANT-WHERE statements).

2.2 The fined-grained access control problem

We address the fined-grained access control problem by using
GRANT-WHERE statements from [1]. Concretely, the syntax of GRANT-WHERE
statements is
     GRANT <privilege> [, ...] ON <relname> TO <user> [, ...]     USING <referenced relname> [, ...]     WHERE
<predicate>;

where <privilege> can be SELECT, INSERT, UPDATE or DELETE, the USING
clause lists the tables of which should be available to the predicate
and the WHERE clause specifies the predicate.

3. Integrate our mechanism to PostgreSQL.

3.1 Authentication Function

The major challenge of this part is how to cache users’ authentication
information. In our prototype implementation, an authentication table
is actually a temporary table. But it makes our GRANT-WHERE statements
also temporary since they refer to the authentication table. Since
every session has its own authentication table and all the
authentication tables share the same schema, the global temporary
table [2] would be a perfect fit for caching  users’ authentication
information. Also, it will not make GRANT-WHERE statements temporary
anymore.

3.2 Fined-Grained Access Control

We noticed that the PostgreSQL community is currently working on
fine-grained access control. Access control policies can be defined in
form of “ALTER TABLE … SET ROW SECURITY” statement. Although
functionally “ALTER TABLE … SET ROW SECURITY” statements and
GRANT-WHERE statements are equivalent, however, there are some
advantages in the latter one:

1) GRANT-WHERE statement is very similar to the SQL GRANT statement,
which should be already familiar to most of the developers, thus the
developers may feel more comfortable to use it.

2) In GRANT-WHERE syntax, a DB user can be specified as the recipient
of the GRANT. In the current proposed row security syntax, one needs
to write a big statement containing all the predicates for all the
possible recipients per table per operation.

For example, suppose we have the following two GRANT-WHERE statements on table1

GRANT UPDATE ON table1 TO Alice
WHERE p1;

GRANT UPDATE ON table1 TO Bob
WHERE p2;

The equivalent statement in the “ALTER TABLE … SET ROW SECURITY” syntax is

ALTER TABLE table1 SET ROW SECURITY FOR UPDATE

TO WHERE (current_user = ‘Alice’ AND p1)
       OR (current_user = ‘Bob’ AND p2);.

As you can see, this syntax requires us to combine two logically
separate statement into one statement. Also, it can be hard for
developers to debug.

4. Collaboration.

We are looking forward to joining forces with PostgreSQL community on
this problem. More specifically, we have the following suggestions
which we hope the community would consider:

1)  Using GRANT-WHERE syntax for the row-level access control. If for
any reasons it is impossible, we hope at least a clause will be added
to the “ALTER TABLE … SET ROW SECURITY” syntax such that a recipient
can be specified;

2) Restarting Global Temporary Table project;

3) Accepting “CREATE AUTHENTICATION FUNCTION” as a variant of “CREATE
FUNCTION” command.

We’d like to discuss the mechanisms above with the community further,
and if any of the mechanisms are accepted by the community, we also
would like to take part in implementing it. Please feel free to
contact us if you have any questions.


Zhaomo Yang      zhy001@ucsd.edu
Kirill Levchenko   klevchen@ucsd.edu


References:
[1] S. Chaudhuri, T. Dutta, and S. Sudarashan. Fine Grained
Authorization Through Predicated Grants. In Proceedings of the 23rd
IEEE International Conference on Data Engineering (ICDE), pages
1174–1183, April 2007.

[2] idea: global temp tables
http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc7806f@mail.gmail.com



Re: A mechanism securing web applications in DBMS

From
Stephen Frost
Date:
Zhaomo,

* Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
> I am a graduate student from UC San Diego. My adviser, Dr. Kirill
> Levchenko, and I have been working on a web/DB security project for
> the last few months. Since fine-grained access control in DBMS is part
> of our project and the PostgreSQL community is also working on it now,
> we would like to exchange some ideas on the issue with you.

Fantastic!  Very interested.

> 1. Background
> Nowadays people entrust more and more sensitive data to web
> applications, while security vulnerabilities are common in these
> applications. The normal structure of web applications consists of
> three tiers: client-side code, server-side code and a database. In
> server-side code a database user representing all the
> application-level users interacts with the database with full
> privileges. Since currently database built-in access control is too
> coarse, no access control mechanism is used in most of web
> applications. This situation is not ideal since a malicious
> application user can tamper with other users’ data by exploiting
> vulnerabilities in the application, and if the application is
> completely under the malicious user’s control so is the database.

Agreed- we are certainly working to improve that situation, though
consideration must also be given to how our catalogs are structured and
that they are unlikely to support "web-scale" numbers of individual
roles as currently implemented.  We're in a better spot than we were a
few years ago (we used to have flat files for roles...), but we lack any
partitioning capability in the catalogs today.

It's entirely possible that, in some situations, the inability of PG to
support the number of roles has dwarfed the permission granularity
concern.  Even today there are mechanisms available to limit what an
individual role can view, add, update, or delete (though they are less
than ideal, rather cludgy to work with and could be better).

> 2. Our Mechanism.
> In order to fix the problem, we believe there are two underlying
> problems to be addressed: authenticating application-level users to
> DBMS and fine-grained access control in the DBMS. Note that these two
> problems are related. Without authenticating application-level users
> to the DBMS, can the DBMS not know which application-level user it is
> actually interacting with, thus the DBMS cannot apply fine-grained
> access control policy correctly.

Agreed- PG needs to know the user to be able to get to the level of
returning only what that individual user is allowed to see.

> 2.1 The authentication problem
> We address the authentication problem by requiring developers to
> define an authentication function in the DBMS. This function is
> invoked whenever an application-level user logs in. An authentication
> function contains the authentication logic in the server-side code.
> Besides, it also caches some attributes of the current
> application-level user, which will be checked by the fine-grained
> access control mechanism.

This is one approach which can be used, though PG has a reasonably
complex authentication system which applications can leverage (consider
Kerberos proxying and client-side certificates), rather than having the
application independently develop an authentication system.  Still, if
the application is to be responsible for the authentication, then a
simple 'SET ROLE' can be done in PG to switch to the context of an
individual user.

> In our implementation, we added a new type of function called
> authentication function. Developers are supposed to use CREATE
> AUTHENTICATION FUNCTION command to define authentication function,
> which contains the application authentication logic and must be
> defined to return table rows. When the authentication function is
> invoked, the return values are cached in a temp table, called the
> authentication table,  with the same name as the authentication
> function. The authentication table is available to access control
> statements (GRANT-WHERE statements).

It'd be great if you could explain the actual difference, in PG, between
these AUTHENTICATION functions and regular functions (or perhaps
security definer functions), which can also create tables and return
rows.

> 2.2 The fined-grained access control problem
>
> We address the fined-grained access control problem by using
> GRANT-WHERE statements from [1]. Concretely, the syntax of GRANT-WHERE
> statements is
>
>       GRANT <privilege> [, ...] ON <relname> TO <user> [, ...]
>       USING <referenced relname> [, ...]
>       WHERE <predicate>;
>
> where <privilege> can be SELECT, INSERT, UPDATE or DELETE, the USING
> clause lists the tables of which should be available to the predicate
> and the WHERE clause specifies the predicate.

Certainly an interesting alternative, though you run the very serious
risk that the SQL standard will come up with an extention to the GRANT
syntax that will end up causing problems long-term (this is part of the
reason for the CREATE POLICY syntax that we're currently working
towards).

> 3. Integrate our mechanism to PostgreSQL.
>
> 3.1 Authentication Function
>
> The major challenge of this part is how to cache users’ authentication
> information. In our prototype implementation, an authentication table
> is actually a temporary table. But it makes our GRANT-WHERE statements
> also temporary since they refer to the authentication table. Since
> every session has its own authentication table and all the
> authentication tables share the same schema, the global temporary
> table [2] would be a perfect fit for caching  users’ authentication
> information. Also, it will not make GRANT-WHERE statements temporary
> anymore.

This certainly deserves further discussion.  What is the reason that a
cacheing mechanism into a temporary table is necessary..?  Are these
session identifiers to keep track of already-authenticated users?  How
does that play with a connection pooler (which is a very real
consideration when thinking about anything web-scale).

> 3.2 Fined-Grained Access Control
>
> We noticed that the PostgreSQL community is currently working on
> fine-grained access control. Access control policies can be defined in
> form of “ALTER TABLE … SET ROW SECURITY” statement. Although
> functionally “ALTER TABLE … SET ROW SECURITY” statements and
> GRANT-WHERE statements are equivalent, however, there are some
> advantages in the latter one:
>
> 1) GRANT-WHERE statement is very similar to the SQL GRANT statement,
> which should be already familiar to most of the developers, thus the
> developers may feel more comfortable to use it.

To be frank, I don't really buy this argument.  There are certainly
arguments to be had for why GRANT would make sense instead of ALTER
TABLE, but "developer comfort" isn't one of them, and the concern
regarding the SQL specification for the GRANT command trumps it.

> 2) In GRANT-WHERE syntax, a DB user can be specified as the recipient
> of the GRANT. In the current proposed row security syntax, one needs
> to write a big statement containing all the predicates for all the
> possible recipients per table per operation.

This isn't actually accurate- the latest CREATE POLICY syntax includes
the ability to specify roles to which the policy is to be applied.
Further, the next patch will include multiple expressions to address
what happens during UPDATE (where you have to consider what rows are
allowed to be selected, and what rows are allowed to be added back into
the table).  That would certainly further complicate the already quite
complicated GRANT syntax, and not to much advantage that I can see.

> For example, suppose we have the following two GRANT-WHERE statements on table1

[...]

> As you can see, this syntax requires us to combine two logically
> separate statement into one statement. Also, it can be hard for
> developers to debug.

We've moved off of ALTER TABLE for this approach for a number of quite
reasonable reasons and have moved to an independent CREATE POLICY
approach, which addresses the above concerns.

That said, I do feel that in many situations the reality is that users
will simply create a stored procedure (in plpgsql or similar) which then
evaluates what is being requested against another table or possibly even
an external policy system (eg: SELinux) to determine if the record
should be returned.  This runs the risk of not being as performant,
which is absolutely a concern we are considering, but supporting the
per-role policies (which are already in the latest patch) will hopefully
address some of that concern, and we may be able to do even better in
the future.

> 4. Collaboration.
>
> We are looking forward to joining forces with PostgreSQL community on
> this problem. More specifically, we have the following suggestions
> which we hope the community would consider:
>
> 1)  Using GRANT-WHERE syntax for the row-level access control. If for
> any reasons it is impossible, we hope at least a clause will be added
> to the “ALTER TABLE … SET ROW SECURITY” syntax such that a recipient
> can be specified;

Right, already done and integrated into the row-security patch (though
the specific syntax is CREATE POLICY, as discussed previously).  We
certainly welcome feedback on that patch and would love to discuss
further- feel free to contact me directly and we'll set up time to
discuss.

> 2) Restarting Global Temporary Table project;

I'm certainly for this.

> 3) Accepting “CREATE AUTHENTICATION FUNCTION” as a variant of “CREATE
> FUNCTION” command.

More information about how it's different from CREATE FUNCTION would be
very helpful for this.

> We’d like to discuss the mechanisms above with the community further,
> and if any of the mechanisms are accepted by the community, we also
> would like to take part in implementing it. Please feel free to
> contact us if you have any questions.

Consider it done- I'd like to collaborate with you on the row-security
discussion primairly, but also about how to manage very many users in
PG and the authentication challenges.
Thanks!
    Stephen

Re: A mechanism securing web applications in DBMS

From
Laurence Rowe
Date:

> 2.1 The authentication problem
> We address the authentication problem by requiring developers to
> define an authentication function in the DBMS. This function is
> invoked whenever an application-level user logs in. An authentication
> function contains the authentication logic in the server-side code.
> Besides, it also caches some attributes of the current
> application-level user, which will be checked by the fine-grained
> access control mechanism.

This is one approach which can be used, though PG has a reasonably
complex authentication system which applications can leverage (consider
Kerberos proxying and client-side certificates), rather than having the
application independently develop an authentication system.  Still, if
the application is to be responsible for the authentication, then a
simple 'SET ROLE' can be done in PG to switch to the context of an
individual user.
[...]
> 3.1 Authentication Function
>
> The major challenge of this part is how to cache users’ authentication
> information. In our prototype implementation, an authentication table
> is actually a temporary table. But it makes our GRANT-WHERE statements
> also temporary since they refer to the authentication table. Since
> every session has its own authentication table and all the
> authentication tables share the same schema, the global temporary
> table [2] would be a perfect fit for caching  users’ authentication
> information. Also, it will not make GRANT-WHERE statements temporary
> anymore.

This certainly deserves further discussion.  What is the reason that a
cacheing mechanism into a temporary table is necessary..?  Are these
session identifiers to keep track of already-authenticated users?  How
does that play with a connection pooler (which is a very real
consideration when thinking about anything web-scale).

Absolutely, support for connection pooling is vital for performant web applications. The problem I see with Kerberos and client side certificates is that they are only authenticated on the connection level. It would be helpful if some mechanism existed by which an application could pass a signed ticket to the database so that the web application user does not need to be granted privileges to become any user. While Kerberos offers ultimate flexibility it is rather intimidating. Supporting something a little simpler would be very helpful.

The scenario I would like to see supported is the following.

1. Web authentication is centralized with users seeing a single log-in screen. The log-in server issues signed tickets, setting it as a cookie. A ticket asserts the identity of the user along with a number of additional principals /groups / roles which are currently authorized (e.g. two-factor authentication might be required to be issued a delete-permanently principal.)

2. Web application servers verify a user's ticket by checking the signature against the login server's public key or in simpler setups with a shared secret. The ticket has a timeout (or maybe each assertion has a timeout) and a web application checks it is still valid.

3. In making a database request, the web application passes the ticket down to the database. Only with this ticket is it able to identify with this user's id, and only with the group / role / principal assertions is it granted the additional user roles.

For Apache, mod_auth_tkt and mod_auth_pubtkt are perhaps the simplest implementations of this concept (Pubcookie/cosign/webauth are more complex.) As verifying a ticket is cheap, complications around caching can usually be avoided.

For Postgres, I could imagine the web application interacting with it like so during the request:

    BEGIN;
    SELECT my_auth_function('userid=foo;groups=bar1,bar2;timeout=123456789;signature=abcd1234abcd1234abcd1234abcd1234');
    ...
    COMMIT;

Here my_auth_function would validate the ticket and would need to be able to do two things not currently possible with a SECURITY DEFINER function:

1. Call SET SESSION AUTHORIZATION / SET ROLE to become a user the connection user is not otherwise allowed to become.

2. Dynamically set which roles are 'inherited' by the user it is becoming.


Laurence

Re: A mechanism securing web applications in DBMS

From
Geoff Montee
Date:

On Fri, Sep 5, 2014 at 12:21 AM, Laurence Rowe <l@lrowe.co.uk> wrote:


Here my_auth_function would validate the ticket and would need to be able to do two things not currently possible with a SECURITY DEFINER function:

1. Call SET SESSION AUTHORIZATION / SET ROLE to become a user the connection user is not otherwise allowed to become.

2. Dynamically set which roles are 'inherited' by the user it is becoming.


Laurence

I've been testing a similar setup using security barrier views for RLS.

It would be a stretch to call my initialization function an "authentication" function. My application server actually does the authentication using client certificate validation. My database initialization function is primarily for authorization (i.e. it determines what the user should be able to see at a fine-grained level, and it performs the work to grant the right accesses).

Anyway, I also encountered the two issues you mention, and I designed my initialization function to get around them:

1.) My initialization function dynamically creates a role for the user. It returns the role name (as the special "name" text type). The application checks the returned row, and it does the SET ROLE. (And later does a RESET ROLE, since we are using connection pooling.)

2.) My initialization function dynamically grants roles to the user accounts as needed.

Good authentication between the application server and the database server is extra important in my case, since the application is trusted to do the authentication and it is granted access to security definer functions that dynamically create and grant roles.

I just thought I'd share. As a DBA, it's interesting to read ideas about how to push more of this work to the database server.

Geoff Montee

Re: A mechanism securing web applications in DBMS

From
Stephen Frost
Date:
Zhaomo,
 As an FYI- we generally prefer inline responses rather than top-posting on the PostgreSQL mailing lists.  Thanks.

* Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
> (1) Two philosophies important to our design
> - Try to not force web application developers to make too many changes
> to their apps if they wa.

That's certainly fair.

> - Try to make our mechanism as simple as possible.
> Web application developers have all kinds of backgrounds. If the
> security mechanism is too alien to them, they wouldn't use it.

I'm surprised to hear this and a suggestion to used stored procedures in
the same email- SPs are generally considered 'foreign' to the web
developers that I've talked to. :)  That said, I'll grant that there are
generally two camps: those who expect a database to only have BerkleyDB
level key/value capabilities, and those who know what they're doing and
what relational databases and SQL are all about.  The latter (and clear
minority) group will take advantage of these capabilites, certainly,
regardless of how they are expressed and are likely already comfortable
using stored procedures and database-level roles.

> (2) Why we need to cache application-level users' identifiers
> We want to differentiate application-level users in DBMS, but not by
> creating a DB user (or role in PG's terminology ) for every
> application-level user, otherwise there will be all sorts of problems
> when the number of application-level users is greater than a threshold
> (e.g. catalog, as you mentioned).

While I agree that this can be an issue when things scale up, you *can*
address it by sharding the database based on user.  Even so though, I
agree that PG would do well to improve the situation around this.

> Instead, we still use one DB user
> representing all the application-level users, just as how web apps
> work now. Besides the identifiers (attributes) of a application-level
> user are stored in some private place of the corresponding session
> (e.g. temp table) when the application-level user authenticates so
> that the DBMS can differentiate application-level users. (Connection
> pooling should be fine as long as an application session doesn't
> return its connection until it finishes. )

Fair enough, and the RLS capabilities which are being added to PG will
support this approach.  If a temp table is being used then dynamic SQL
may be required and therefore a plpgsql function will be involved to
handle looking up the current user, as you won't be using PG roles.

> Normally, a web application authenticates an application-level user by
> making a SELECT query with the user provided user id and password on
> the password table to see if there is a match (Of course this is an
> over simplified version of how authentication works. ).  Using our
> mechanism, the web application instead calls the authentication
> function, which does a SELECT on the table first, and store the
> identifiers of that application-level user somewhere if a match found.
> The identifiers of the current application-level user are referenced
> by the policies so that fine-grained access control can be enforced.

That 'somewhere' is certainly something that PG could improve upon- we
don't have SQL-level variable capability today and this means that temp
tables have to be used, which is certainly unfortunate.  I'd love to see
work done to improve this situation.

> (3) CREATE AUTHENTICATION FUNCTION
> In our mechanism, we ask web application developers provide an
> authentication function which normally takes user id and password as
> inputs and returns a row containing all the identifiers (attributes)
> of the corresponding application-level user. Let us call the place
> storing the current application-level user's identifiers as
> "identifier store".

I would *strongly* advocate *against* passing the password to the
database in any (non-hashed) form.  You are much better off using a
one-way hash as early as possible in the stack (ideally, in whatever
system initially receives the password on the server side) and then
comparing that one-way hash.  Of course, passwords in general are not
considered secure and one-time passwords, hardware tokens, or PIV /
HSPD12 / CAC cards with client-side certificates.

> The whole point of this CREATE AUTHENTICATION FUNCTION syntax is to
> reduce developers' work.  By giving developers very specific
> instructions on how to write an authentication function, we hope they
> would find it easy to write one. Admittedly, however, what CREATE
> AUTHENTICATION FUNCTION does can be achieved by CREATE FUNCTION.

I don't see how this is particularly better than simply providing a
function-creating-function (if there is really a concern that creating
two functions instead of just the one is a serious complication..) or,
better yet, creating an extension which creates all the functions,
tables, etc necessary for this system.
Thanks!
    Stephen

Re: A mechanism securing web applications in DBMS

From
Zhaomo Yang
Date:
Stephen,

There are several things I want to explain:

(1) Two philosophies important to our design
- Try to not force web application developers to make too many changes
to their apps if they wa.
- Try to make our mechanism as simple as possible.
Web application developers have all kinds of backgrounds. If the
security mechanism is too alien to them, they wouldn't use it.

(2) Why we need to cache application-level users' identifiers
We want to differentiate application-level users in DBMS, but not by
creating a DB user (or role in PG's terminology ) for every
application-level user, otherwise there will be all sorts of problems
when the number of application-level users is greater than a threshold
(e.g. catalog, as you mentioned). Instead, we still use one DB user
representing all the application-level users, just as how web apps
work now. Besides the identifiers (attributes) of a application-level
user are stored in some private place of the corresponding session
(e.g. temp table) when the application-level user authenticates so
that the DBMS can differentiate application-level users. (Connection
pooling should be fine as long as an application session doesn't
return its connection until it finishes. )

Normally, a web application authenticates an application-level user by
making a SELECT query with the user provided user id and password on
the password table to see if there is a match (Of course this is an
over simplified version of how authentication works. ).  Using our
mechanism, the web application instead calls the authentication
function, which does a SELECT on the table first, and store the
identifiers of that application-level user somewhere if a match found.
The identifiers of the current application-level user are referenced
by the policies so that fine-grained access control can be enforced.

(3) CREATE AUTHENTICATION FUNCTION
In our mechanism, we ask web application developers provide an
authentication function which normally takes user id and password as
inputs and returns a row containing all the identifiers (attributes)
of the corresponding application-level user. Let us call the place
storing the current application-level user's identifiers as
"identifier store".

For example,

This is an authentication function provided by a app developer.

CREATE AUTHENTICATION FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
...
$$ LANGUAGE plpgsql SECURITY DEFINER;

Under the hood, two regular functions will be defined

# the same function with a different name
CREATE FUNCTION _auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$ # copy the function body from the CREATE AUTHENTICATION FUNCTION
above
$$ LANGUAGE plpgsql SECURITY DEFINER;

# the function which is actually called in the server code
CREATE FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$ # clear the identifier store # execute function _auth and insert
theresult into the identifier store. # return the row in the identifier store 
$$ LANGUAGE plpgsql SECURITY DEFINER;

The whole point of this CREATE AUTHENTICATION FUNCTION syntax is to
reduce developers' work.  By giving developers very specific
instructions on how to write an authentication function, we hope they
would find it easy to write one. Admittedly, however, what CREATE
AUTHENTICATION FUNCTION does can be achieved by CREATE FUNCTION.

Please let me know if you have any other questions.

Zhaomo

On Thu, Sep 4, 2014 at 6:53 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Zhaomo,
>
> * Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
>> I am a graduate student from UC San Diego. My adviser, Dr. Kirill
>> Levchenko, and I have been working on a web/DB security project for
>> the last few months. Since fine-grained access control in DBMS is part
>> of our project and the PostgreSQL community is also working on it now,
>> we would like to exchange some ideas on the issue with you.
>
> Fantastic!  Very interested.
>
>> 1. Background
>> Nowadays people entrust more and more sensitive data to web
>> applications, while security vulnerabilities are common in these
>> applications. The normal structure of web applications consists of
>> three tiers: client-side code, server-side code and a database. In
>> server-side code a database user representing all the
>> application-level users interacts with the database with full
>> privileges. Since currently database built-in access control is too
>> coarse, no access control mechanism is used in most of web
>> applications. This situation is not ideal since a malicious
>> application user can tamper with other users’ data by exploiting
>> vulnerabilities in the application, and if the application is
>> completely under the malicious user’s control so is the database.
>
> Agreed- we are certainly working to improve that situation, though
> consideration must also be given to how our catalogs are structured and
> that they are unlikely to support "web-scale" numbers of individual
> roles as currently implemented.  We're in a better spot than we were a
> few years ago (we used to have flat files for roles...), but we lack any
> partitioning capability in the catalogs today.
>
> It's entirely possible that, in some situations, the inability of PG to
> support the number of roles has dwarfed the permission granularity
> concern.  Even today there are mechanisms available to limit what an
> individual role can view, add, update, or delete (though they are less
> than ideal, rather cludgy to work with and could be better).
>
>> 2. Our Mechanism.
>> In order to fix the problem, we believe there are two underlying
>> problems to be addressed: authenticating application-level users to
>> DBMS and fine-grained access control in the DBMS. Note that these two
>> problems are related. Without authenticating application-level users
>> to the DBMS, can the DBMS not know which application-level user it is
>> actually interacting with, thus the DBMS cannot apply fine-grained
>> access control policy correctly.
>
> Agreed- PG needs to know the user to be able to get to the level of
> returning only what that individual user is allowed to see.
>
>> 2.1 The authentication problem
>> We address the authentication problem by requiring developers to
>> define an authentication function in the DBMS. This function is
>> invoked whenever an application-level user logs in. An authentication
>> function contains the authentication logic in the server-side code.
>> Besides, it also caches some attributes of the current
>> application-level user, which will be checked by the fine-grained
>> access control mechanism.
>
> This is one approach which can be used, though PG has a reasonably
> complex authentication system which applications can leverage (consider
> Kerberos proxying and client-side certificates), rather than having the
> application independently develop an authentication system.  Still, if
> the application is to be responsible for the authentication, then a
> simple 'SET ROLE' can be done in PG to switch to the context of an
> individual user.
>
>> In our implementation, we added a new type of function called
>> authentication function. Developers are supposed to use CREATE
>> AUTHENTICATION FUNCTION command to define authentication function,
>> which contains the application authentication logic and must be
>> defined to return table rows. When the authentication function is
>> invoked, the return values are cached in a temp table, called the
>> authentication table,  with the same name as the authentication
>> function. The authentication table is available to access control
>> statements (GRANT-WHERE statements).
>
> It'd be great if you could explain the actual difference, in PG, between
> these AUTHENTICATION functions and regular functions (or perhaps
> security definer functions), which can also create tables and return
> rows.
>
>> 2.2 The fined-grained access control problem
>>
>> We address the fined-grained access control problem by using
>> GRANT-WHERE statements from [1]. Concretely, the syntax of GRANT-WHERE
>> statements is
>>
>>       GRANT <privilege> [, ...] ON <relname> TO <user> [, ...]
>>       USING <referenced relname> [, ...]
>>       WHERE <predicate>;
>>
>> where <privilege> can be SELECT, INSERT, UPDATE or DELETE, the USING
>> clause lists the tables of which should be available to the predicate
>> and the WHERE clause specifies the predicate.
>
> Certainly an interesting alternative, though you run the very serious
> risk that the SQL standard will come up with an extention to the GRANT
> syntax that will end up causing problems long-term (this is part of the
> reason for the CREATE POLICY syntax that we're currently working
> towards).
>
>> 3. Integrate our mechanism to PostgreSQL.
>>
>> 3.1 Authentication Function
>>
>> The major challenge of this part is how to cache users’ authentication
>> information. In our prototype implementation, an authentication table
>> is actually a temporary table. But it makes our GRANT-WHERE statements
>> also temporary since they refer to the authentication table. Since
>> every session has its own authentication table and all the
>> authentication tables share the same schema, the global temporary
>> table [2] would be a perfect fit for caching  users’ authentication
>> information. Also, it will not make GRANT-WHERE statements temporary
>> anymore.
>
> This certainly deserves further discussion.  What is the reason that a
> cacheing mechanism into a temporary table is necessary..?  Are these
> session identifiers to keep track of already-authenticated users?  How
> does that play with a connection pooler (which is a very real
> consideration when thinking about anything web-scale).
>
>> 3.2 Fined-Grained Access Control
>>
>> We noticed that the PostgreSQL community is currently working on
>> fine-grained access control. Access control policies can be defined in
>> form of “ALTER TABLE … SET ROW SECURITY” statement. Although
>> functionally “ALTER TABLE … SET ROW SECURITY” statements and
>> GRANT-WHERE statements are equivalent, however, there are some
>> advantages in the latter one:
>>
>> 1) GRANT-WHERE statement is very similar to the SQL GRANT statement,
>> which should be already familiar to most of the developers, thus the
>> developers may feel more comfortable to use it.
>
> To be frank, I don't really buy this argument.  There are certainly
> arguments to be had for why GRANT would make sense instead of ALTER
> TABLE, but "developer comfort" isn't one of them, and the concern
> regarding the SQL specification for the GRANT command trumps it.
>
>> 2) In GRANT-WHERE syntax, a DB user can be specified as the recipient
>> of the GRANT. In the current proposed row security syntax, one needs
>> to write a big statement containing all the predicates for all the
>> possible recipients per table per operation.
>
> This isn't actually accurate- the latest CREATE POLICY syntax includes
> the ability to specify roles to which the policy is to be applied.
> Further, the next patch will include multiple expressions to address
> what happens during UPDATE (where you have to consider what rows are
> allowed to be selected, and what rows are allowed to be added back into
> the table).  That would certainly further complicate the already quite
> complicated GRANT syntax, and not to much advantage that I can see.
>
>> For example, suppose we have the following two GRANT-WHERE statements on table1
>
> [...]
>
>> As you can see, this syntax requires us to combine two logically
>> separate statement into one statement. Also, it can be hard for
>> developers to debug.
>
> We've moved off of ALTER TABLE for this approach for a number of quite
> reasonable reasons and have moved to an independent CREATE POLICY
> approach, which addresses the above concerns.
>
> That said, I do feel that in many situations the reality is that users
> will simply create a stored procedure (in plpgsql or similar) which then
> evaluates what is being requested against another table or possibly even
> an external policy system (eg: SELinux) to determine if the record
> should be returned.  This runs the risk of not being as performant,
> which is absolutely a concern we are considering, but supporting the
> per-role policies (which are already in the latest patch) will hopefully
> address some of that concern, and we may be able to do even better in
> the future.
>
>> 4. Collaboration.
>>
>> We are looking forward to joining forces with PostgreSQL community on
>> this problem. More specifically, we have the following suggestions
>> which we hope the community would consider:
>>
>> 1)  Using GRANT-WHERE syntax for the row-level access control. If for
>> any reasons it is impossible, we hope at least a clause will be added
>> to the “ALTER TABLE … SET ROW SECURITY” syntax such that a recipient
>> can be specified;
>
> Right, already done and integrated into the row-security patch (though
> the specific syntax is CREATE POLICY, as discussed previously).  We
> certainly welcome feedback on that patch and would love to discuss
> further- feel free to contact me directly and we'll set up time to
> discuss.
>
>> 2) Restarting Global Temporary Table project;
>
> I'm certainly for this.
>
>> 3) Accepting “CREATE AUTHENTICATION FUNCTION” as a variant of “CREATE
>> FUNCTION” command.
>
> More information about how it's different from CREATE FUNCTION would be
> very helpful for this.
>
>> We’d like to discuss the mechanisms above with the community further,
>> and if any of the mechanisms are accepted by the community, we also
>> would like to take part in implementing it. Please feel free to
>> contact us if you have any questions.
>
> Consider it done- I'd like to collaborate with you on the row-security
> discussion primairly, but also about how to manage very many users in
> PG and the authentication challenges.
>
>         Thanks!
>
>                 Stephen



Re: A mechanism securing web applications in DBMS

From
Stephen Frost
Date:
* Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
> I am surprised to hear this too. :) We haven't talked to many web
> developers yet and this is one of the things we need to do in the
> future.

Certainly an excellent idea to talk to your target audience. :)

> The goal of this mechanism is to add another layer of protection
> inside DBMS so that even if the application server is compromised the
> users' data is under protection*. This requires DBMS to be able to
> authenticate application-level users  (know which application-level
> user it is communicating with). That it, we need to move the
> authentication logic of application-level users into DBMS. For this
> purpose, using store procedures (or something similar) is a must. I
> think even if a security mechanism is designed to be easy to use, it
> will still require some expertise.

I agree that good security does require expertise to get right.

> * this mechanism can't help if the attackers control the app server
> completely and the users are not aware of that and keep using the app.
> In that case the attackers will be able to collect all the credentials
> of the users who log in while they are in charge.

This is really the crux of the problem you're trying to solve- what is
the attack vector?  Based on the discussion so far, I imagine you're
considering the "attacker can submit arbitrary SQL" situation, where the
attacker doesn't have full access to the application server.  That's
certainly a worthwhile thing to consider, but I'm not entirely sure the
approach you've outlined will work out well..

> > If a temp table is being used then dynamic SQL may be required and therefore
> > a plpgsql function will be involved to handle looking up the current user, as you
> > won't be using PG roles.
>
> This is why I'd like to have global temp table in PG. With that we can
> probably get around of dynamic SQL.

Have you considered just using a regular, but unlogged, table?  That
would also avoid any risk that the application manages to drop or shadow
the temp table somehow with a "fake" table that changes who is currently
authenticated, and avoids having to figure out how to deal with the temp
table vanishing due to the connections going away.

> You are absolutely right. I should've explained it better. I just
> wanted to show how authentication works and skipped all the hashing
> part.

Ah, ok.
Thanks,
    Stephen

Re: A mechanism securing web applications in DBMS

From
Zhaomo Yang
Date:
Stephen,

> As an FYI- we generally prefer inline responses rather than top-posting on the PostgreSQL mailing lists.  Thanks.

Sorry for that.

> > - Try to make our mechanism as simple as possible.
> > Web application developers have all kinds of backgrounds. If the
> > security mechanism is too alien to them, they wouldn't use it.
> I'm surprised to hear this and a suggestion to used stored procedures in
> the same email- SPs are generally considered 'foreign' to the web
> developers that I've talked to. :)  That said, I'll grant that there are
> generally two camps: those who expect a database to only have BerkleyDB
> level key/value capabilities, and those who know what they're doing and
> what relational databases and SQL are all about.  The latter (and clear
> minority) group will take advantage of these capabilites, certainly,
> regardless of how they are expressed and are likely already comfortable
> using stored procedures and database-level roles.

I am surprised to hear this too. :) We haven't talked to many web
developers yet and this is one of the things we need to do in the
future.

The goal of this mechanism is to add another layer of protection
inside DBMS so that even if the application server is compromised the
users' data is under protection*. This requires DBMS to be able to
authenticate application-level users  (know which application-level
user it is communicating with). That it, we need to move the
authentication logic of application-level users into DBMS. For this
purpose, using store procedures (or something similar) is a must. I
think even if a security mechanism is designed to be easy to use, it
will still require some expertise.

* this mechanism can't help if the attackers control the app server
completely and the users are not aware of that and keep using the app.
In that case the attackers will be able to collect all the credentials
of the users who log in while they are in charge.

> If a temp table is being used then dynamic SQL may be required and therefore
> a plpgsql function will be involved to handle looking up the current user, as you
> won't be using PG roles.

This is why I'd like to have global temp table in PG. With that we can
probably get around of dynamic SQL.

> > (3) CREATE AUTHENTICATION FUNCTION
> > In our mechanism, we ask web application developers provide an
> > authentication function which normally takes user id and password as
> > inputs and returns a row containing all the identifiers (attributes)
> > of the corresponding application-level user. Let us call the place
> > storing the current application-level user's identifiers as
> > "identifier store".
> I would *strongly* advocate *against* passing the password to the
> database in any (non-hashed) form.  You are much better off using a
> one-way hash as early as possible in the stack (ideally, in whatever
> system initially receives the password on the server side) and then
> comparing that one-way hash.  Of course, passwords in general are not
> considered secure and one-time passwords, hardware tokens, or PIV /
> HSPD12 / CAC cards with client-side certificates.

You are absolutely right. I should've explained it better. I just
wanted to show how authentication works and skipped all the hashing
part.

Thanks,
Zhaomo

On Fri, Sep 5, 2014 at 5:52 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Zhaomo,
>
>   As an FYI- we generally prefer inline responses rather than
>   top-posting on the PostgreSQL mailing lists.  Thanks.
>
> * Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
>> (1) Two philosophies important to our design
>> - Try to not force web application developers to make too many changes
>> to their apps if they wa.
>
> That's certainly fair.
>
>> - Try to make our mechanism as simple as possible.
>> Web application developers have all kinds of backgrounds. If the
>> security mechanism is too alien to them, they wouldn't use it.
>
> I'm surprised to hear this and a suggestion to used stored procedures in
> the same email- SPs are generally considered 'foreign' to the web
> developers that I've talked to. :)  That said, I'll grant that there are
> generally two camps: those who expect a database to only have BerkleyDB
> level key/value capabilities, and those who know what they're doing and
> what relational databases and SQL are all about.  The latter (and clear
> minority) group will take advantage of these capabilites, certainly,
> regardless of how they are expressed and are likely already comfortable
> using stored procedures and database-level roles.
>
>> (2) Why we need to cache application-level users' identifiers
>> We want to differentiate application-level users in DBMS, but not by
>> creating a DB user (or role in PG's terminology ) for every
>> application-level user, otherwise there will be all sorts of problems
>> when the number of application-level users is greater than a threshold
>> (e.g. catalog, as you mentioned).
>
> While I agree that this can be an issue when things scale up, you *can*
> address it by sharding the database based on user.  Even so though, I
> agree that PG would do well to improve the situation around this.
>
>> Instead, we still use one DB user
>> representing all the application-level users, just as how web apps
>> work now. Besides the identifiers (attributes) of a application-level
>> user are stored in some private place of the corresponding session
>> (e.g. temp table) when the application-level user authenticates so
>> that the DBMS can differentiate application-level users. (Connection
>> pooling should be fine as long as an application session doesn't
>> return its connection until it finishes. )
>
> Fair enough, and the RLS capabilities which are being added to PG will
> support this approach.  If a temp table is being used then dynamic SQL
> may be required and therefore a plpgsql function will be involved to
> handle looking up the current user, as you won't be using PG roles.
>
>> Normally, a web application authenticates an application-level user by
>> making a SELECT query with the user provided user id and password on
>> the password table to see if there is a match (Of course this is an
>> over simplified version of how authentication works. ).  Using our
>> mechanism, the web application instead calls the authentication
>> function, which does a SELECT on the table first, and store the
>> identifiers of that application-level user somewhere if a match found.
>> The identifiers of the current application-level user are referenced
>> by the policies so that fine-grained access control can be enforced.
>
> That 'somewhere' is certainly something that PG could improve upon- we
> don't have SQL-level variable capability today and this means that temp
> tables have to be used, which is certainly unfortunate.  I'd love to see
> work done to improve this situation.
>
>> (3) CREATE AUTHENTICATION FUNCTION
>> In our mechanism, we ask web application developers provide an
>> authentication function which normally takes user id and password as
>> inputs and returns a row containing all the identifiers (attributes)
>> of the corresponding application-level user. Let us call the place
>> storing the current application-level user's identifiers as
>> "identifier store".
>
> I would *strongly* advocate *against* passing the password to the
> database in any (non-hashed) form.  You are much better off using a
> one-way hash as early as possible in the stack (ideally, in whatever
> system initially receives the password on the server side) and then
> comparing that one-way hash.  Of course, passwords in general are not
> considered secure and one-time passwords, hardware tokens, or PIV /
> HSPD12 / CAC cards with client-side certificates.
>
>> The whole point of this CREATE AUTHENTICATION FUNCTION syntax is to
>> reduce developers' work.  By giving developers very specific
>> instructions on how to write an authentication function, we hope they
>> would find it easy to write one. Admittedly, however, what CREATE
>> AUTHENTICATION FUNCTION does can be achieved by CREATE FUNCTION.
>
> I don't see how this is particularly better than simply providing a
> function-creating-function (if there is really a concern that creating
> two functions instead of just the one is a serious complication..) or,
> better yet, creating an extension which creates all the functions,
> tables, etc necessary for this system.
>
>         Thanks!
>
>                 Stephen



Re: A mechanism securing web applications in DBMS

From
Stephen Frost
Date:
Zhaomo,

* Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
> > Have you considered just using a regular, but unlogged, table?  That
> > would also avoid any risk that the application manages to drop or shadow
> > the temp table somehow with a "fake" table that changes who is currently
> > authenticated, and avoids having to figure out how to deal with the temp
> > table vanishing due to the connections going away.
>
> So then all the currently logged in users will be stored in the same
> table, which means we also need to make sure that the correct row in
> that table is used when the row-level security policy refers to the
> current application-level user.

Yes- but that's pretty trivially done, given that you've stipulated that
a single connection DB connection must be used from authentication until
de-authentication.  All that is needed is an additional column in the
auth table which is populated with a pseudo-random value which is
guaranteed to be unique and constant for the duration of the
authenticated time- and the database backend PID is perfect for that.
The auth function can call the pg_backend_pid() function directly and
then the policies can include a 'pid = pg_backend_pid()' as part of the
join to the auth table.  The auth function can also complain loudly if
an entry in the pid table is found with the current PID during auth (and
similar- the de-auth function can complain if an entry with the current
PID is *not* found).  This would eliminate the need for the on-connect
triggers, I believe (though those are interesting for other reasons..).

> Let me send you a copy of our paper in a separate email which is a
> thorough description of the mechanism (including background, threat
> model, how it works, etc), which should give you an better idea on
> every aspect of the mechanism. Please do not distribute it because it
> has been accepted for publication. Note that the implementation we
> show in the paper is just a prototype (we made the changes so that we
> could implement it quickly). Our goal always is to integrate our
> mechanism into open source DBMS's like PG and MySQL cleanly.

It'd be very interesting to see this done with the unlogged table,
security definer functions, and the row-level policies patch which we're
working on.  I'd further suggest that the application also use multiple
roles which are set noinherit and 'set role' based on the operation
which it's currently being used for- this would add another level of
protection.  Using stored procedures (for more than just the auth and
de-auth functions as suggested here) can also be a good idea.
Thanks,
    Stephen

Re: A mechanism securing web applications in DBMS

From
Zhaomo Yang
Date:
Stephen,

> Yes- but that's pretty trivially done, given that you've stipulated that
> a single connection DB connection must be used from authentication until
> de-authentication.  All that is needed is an additional column in the
> auth table which is populated with a pseudo-random value which is
> guaranteed to be unique and constant for the duration of the
> authenticated time- and the database backend PID is perfect for that.
> The auth function can call the pg_backend_pid() function directly and
> then the policies can include a 'pid = pg_backend_pid()' as part of the
> join to the auth table.  The auth function can also complain loudly if
> an entry in the pid table is found with the current PID during auth (and
> similar- the de-auth function can complain if an entry with the current
> PID is *not* found).  This would eliminate the need for the on-connect
> triggers, I believe (though those are interesting for other reasons..).


You are right. Using unlogged table is a good idea. I'll try it out.
Thanks for your advice!

>  It'd be very interesting to see this done with the unlogged table,
> security definer functions, and the row-level policies patch which we're
> working on.  I'd further suggest that the application also use multiple
> roles which are set noinherit and 'set role' based on the operation
> which it's currently being used for- this would add another level of
> protection.  Using stored procedures (for more than just the auth and
> de-auth functions as suggested here) can also be a good idea.


Currently auth functions are security definer functions. I'm gonna try
to create a patch using unlogged table + RLS and put it online (e.g.
this mail list) so that people can try it.

Thanks,
Zhaomo

On Sat, Sep 13, 2014 at 4:00 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Zhaomo,
>
> * Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
>> > Have you considered just using a regular, but unlogged, table?  That
>> > would also avoid any risk that the application manages to drop or shadow
>> > the temp table somehow with a "fake" table that changes who is currently
>> > authenticated, and avoids having to figure out how to deal with the temp
>> > table vanishing due to the connections going away.
>>
>> So then all the currently logged in users will be stored in the same
>> table, which means we also need to make sure that the correct row in
>> that table is used when the row-level security policy refers to the
>> current application-level user.
>
> Yes- but that's pretty trivially done, given that you've stipulated that
> a single connection DB connection must be used from authentication until
> de-authentication.  All that is needed is an additional column in the
> auth table which is populated with a pseudo-random value which is
> guaranteed to be unique and constant for the duration of the
> authenticated time- and the database backend PID is perfect for that.
> The auth function can call the pg_backend_pid() function directly and
> then the policies can include a 'pid = pg_backend_pid()' as part of the
> join to the auth table.  The auth function can also complain loudly if
> an entry in the pid table is found with the current PID during auth (and
> similar- the de-auth function can complain if an entry with the current
> PID is *not* found).  This would eliminate the need for the on-connect
> triggers, I believe (though those are interesting for other reasons..).
>
>> Let me send you a copy of our paper in a separate email which is a
>> thorough description of the mechanism (including background, threat
>> model, how it works, etc), which should give you an better idea on
>> every aspect of the mechanism. Please do not distribute it because it
>> has been accepted for publication. Note that the implementation we
>> show in the paper is just a prototype (we made the changes so that we
>> could implement it quickly). Our goal always is to integrate our
>> mechanism into open source DBMS's like PG and MySQL cleanly.
>
> It'd be very interesting to see this done with the unlogged table,
> security definer functions, and the row-level policies patch which we're
> working on.  I'd further suggest that the application also use multiple
> roles which are set noinherit and 'set role' based on the operation
> which it's currently being used for- this would add another level of
> protection.  Using stored procedures (for more than just the auth and
> de-auth functions as suggested here) can also be a good idea.
>
>         Thanks,
>
>                 Stephen



Re: A mechanism securing web applications in DBMS

From
Stephen Frost
Date:
Zhaomo,

* Zhaomo Yang (zhy001@cs.ucsd.edu) wrote:
> You are right. Using unlogged table is a good idea. I'll try it out.
> Thanks for your advice!

Happy to help.  Another option would be to have a custom GUC for this
information.  The issue we have with that currently is that it can be
set by anyone..  Your extension could create one and register functions
which are called when it's set though, and only allow it to be set when
the auth/deauth functions are used.  This would get rid of the need for
any kind of table.

> Currently auth functions are security definer functions. I'm gonna try
> to create a patch using unlogged table + RLS and put it online (e.g.
> this mail list) so that people can try it.

I'd strongly suggest that you look into creating PostgreSQL extensions
and using that mechanism as a way to distribute your security definer
functions and other components of this solution as a single, complete,
package which users can install with just "CREATE EXTENSION ...".  That
might help with both getting others to test and play with your solution.
Thanks!
    Stephen