Thread: Arguable RLS security bug, EvalPlanQual() paranoia

Arguable RLS security bug, EvalPlanQual() paranoia

From
Peter Geoghegan
Date:
Very minor concern about RLS
========================

This existing ExecUpdate() comment seems a little inaccurate:

/*
* Check any RLS UPDATE WITH CHECK policies
*
* If we generate a new candidate tuple after EvalPlanQual testing, we
* must loop back here and recheck any RLS policies and constraints.
* (We don't need to redo triggers, however.  If there are any BEFORE
* triggers then trigger.c will have done heap_lock_tuple to lock the
* correct tuple, so there's no need to do them again.)

Because ExecBRUpdateTriggers() does an UPSERT-style heap_lock_tuple()
call, restarting within ExecUpdate() should be impossible once
ExecBRUpdateTriggers() returns and returns a value != NULL (this is
*essential* for UPSERT, but happens to be true here, too).

Aside from this comment understating what can be relied on in a way
that seems a bit inaccurate, I worry about RLS leaks made possible by
EPQ, which is a hazard that other MVCC systems naturally don't have,
because unlike Postgres they have statement-level rollback to deal
with READ COMMITTED mode concurrent UPDATEs. Excuse me if I missed it,
but was this something that was considered?

What I'm imagining is a maliciously constructed UPDATE. It's run in
such a way that it could be allowed to UPDATE a row based on that not
being limited by the security barrier quals added by RLS. However,
having followed the UPDATE chain (i.e. just before "goto lreplace"),
the quals now don't pass and ExecUpdate() returns NULL (because of the
security barrier quals not passing EvalPlanQual() recheck on the new
tuple version -- the only possible reason, we'll say).  The UPDATE is
a no-op, in the sense that it generates an identical new tuple
(identical to the old/existing one). By enabling log_lock_waits, the
attacker can observe that the new row version does not pass.

Why does this matter, given what is already obviously possible --
repeatedly selecting (or maybe "no-op updating") from the target table
in a READ COMMITTED xact, and seeing what goes away? Well, for one
thing, log_lock_waits will leak the XID on the concurrent (more
privileged) updater. It might then be trivial to reconstruct the
identity of the privileged updater based on some other row in some
other table, where xmin matches that privileged XID. Maybe that's a
hazard not peculiar to Postgres, though -- one can imagine a system
with statement-level rollback also leaking this information, and
perhaps that's just the way things are for every system with RLS, sort
of like constraint-related leaks. What I'm more concerned about is the
way things may not be fully consistent with the command's MVCC
snapshot, and that really is a thing peculiar to Postgres.

Arguable RLS security bug and EvalPlanQual()
====================================

It is certainly conceivable that an UPDATE's USING security barrier
qual contains a subquery referencing a relation that is not the
UPDATE's target relation -- CREATE POLICY is very flexible in
accepting such USING security barrier quals. Does this not introduce
security hazards around other relations being queried using the
command's MVCC snapshot, while the target has an EPQ-installed (only
dirty snapshot visible) tuple? See attached patch, an example
illustrating this arguable security problem using isolationtester. It
could certainly be argued that EvalPlanQual() allows an unacceptable
information leak, and I'd say that at the very least we need to
document this.

If you're using another well known MVCC database system that has RLS,
I imagine when this happens the attacker similarly waits on the
conflicting (privileged) xact to finish (in my example in the patch,
Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
Mallory would then have her malicious UPDATE statement entirely rolled
back, and her statement would acquire an entirely new MVCC snapshot,
to be used by the USING security barrier qual (and everything else)
from scratch. This other system would then re-run her UPDATE with the
new MVCC snapshot. This would repeat until Mallory's UPDATE statement
completes without encountering any concurrent UPDATEs/DELETEs to her
would-be affected rows.

In general, with this other database system, an UPDATE must run to
completion without violating MVCC, even in READ COMMITTED mode. For
that reason, I think we can take no comfort from the presumption that
this flexibility in USING security barrier quals (allowing subqueries,
etc) works securely in this other system. (I actually didn't check
this out, but I imagine it's true).

I'm sorry if I just missed the discussion around this and this sounds
a bit alarmist. I want to satisfy myself that we have this right.

--
Peter Geoghegan

Attachment

Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Peter Geoghegan
Date:
On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan <pg@heroku.com> wrote:
> If you're using another well known MVCC database system that has RLS,
> I imagine when this happens the attacker similarly waits on the
> conflicting (privileged) xact to finish (in my example in the patch,
> Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
> Mallory would then have her malicious UPDATE statement entirely rolled
> back, and her statement would acquire an entirely new MVCC snapshot,
> to be used by the USING security barrier qual (and everything else)
> from scratch. This other system would then re-run her UPDATE with the
> new MVCC snapshot. This would repeat until Mallory's UPDATE statement
> completes without encountering any concurrent UPDATEs/DELETEs to her
> would-be affected rows.
>
> In general, with this other database system, an UPDATE must run to
> completion without violating MVCC, even in READ COMMITTED mode. For
> that reason, I think we can take no comfort from the presumption that
> this flexibility in USING security barrier quals (allowing subqueries,
> etc) works securely in this other system. (I actually didn't check
> this out, but I imagine it's true).

Where are we on this?

Discussion during pgCon with Heikki and Andres led me to believe that
the issue is acceptable. The issue can be documented to help ensure
that user expectation is in line with actual user-visible behavior.
Unfortunately, I think that that will be a clunky documentation patch.

-- 
Peter Geoghegan



Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Robert Haas
Date:
On Sun, Jul 19, 2015 at 8:56 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> If you're using another well known MVCC database system that has RLS,
>> I imagine when this happens the attacker similarly waits on the
>> conflicting (privileged) xact to finish (in my example in the patch,
>> Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
>> Mallory would then have her malicious UPDATE statement entirely rolled
>> back, and her statement would acquire an entirely new MVCC snapshot,
>> to be used by the USING security barrier qual (and everything else)
>> from scratch. This other system would then re-run her UPDATE with the
>> new MVCC snapshot. This would repeat until Mallory's UPDATE statement
>> completes without encountering any concurrent UPDATEs/DELETEs to her
>> would-be affected rows.
>>
>> In general, with this other database system, an UPDATE must run to
>> completion without violating MVCC, even in READ COMMITTED mode. For
>> that reason, I think we can take no comfort from the presumption that
>> this flexibility in USING security barrier quals (allowing subqueries,
>> etc) works securely in this other system. (I actually didn't check
>> this out, but I imagine it's true).
>
> Where are we on this?
>
> Discussion during pgCon with Heikki and Andres led me to believe that
> the issue is acceptable. The issue can be documented to help ensure
> that user expectation is in line with actual user-visible behavior.
> Unfortunately, I think that that will be a clunky documentation patch.

Perhaps I'm missing something, but it looks to me like Stephen has
done absolutely nothing about the many issues reported with the RLS
patch.  I organized the open items list by topic on June 26th; almost
a month later, four more issues have been added to the section on RLS,
and none have been removed.

I think it is right that we should be concerned about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Stephen Frost
Date:
Robert,

As I mentioned up thread, I'm out until the 27th. I have posted a patch which I will push to fix the copy.c issue, and I have already stated that I'll address the statistics issue. Further, Joe has also been working on issues but he was out of pocket last week attending a conference. 

I'm happy to work up a documentation patch for this when I get back. 

Thanks!

Stephen

On Tuesday, July 21, 2015, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jul 19, 2015 at 8:56 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> If you're using another well known MVCC database system that has RLS,
>> I imagine when this happens the attacker similarly waits on the
>> conflicting (privileged) xact to finish (in my example in the patch,
>> Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
>> Mallory would then have her malicious UPDATE statement entirely rolled
>> back, and her statement would acquire an entirely new MVCC snapshot,
>> to be used by the USING security barrier qual (and everything else)
>> from scratch. This other system would then re-run her UPDATE with the
>> new MVCC snapshot. This would repeat until Mallory's UPDATE statement
>> completes without encountering any concurrent UPDATEs/DELETEs to her
>> would-be affected rows.
>>
>> In general, with this other database system, an UPDATE must run to
>> completion without violating MVCC, even in READ COMMITTED mode. For
>> that reason, I think we can take no comfort from the presumption that
>> this flexibility in USING security barrier quals (allowing subqueries,
>> etc) works securely in this other system. (I actually didn't check
>> this out, but I imagine it's true).
>
> Where are we on this?
>
> Discussion during pgCon with Heikki and Andres led me to believe that
> the issue is acceptable. The issue can be documented to help ensure
> that user expectation is in line with actual user-visible behavior.
> Unfortunately, I think that that will be a clunky documentation patch.

Perhaps I'm missing something, but it looks to me like Stephen has
done absolutely nothing about the many issues reported with the RLS
patch.  I organized the open items list by topic on June 26th; almost
a month later, four more issues have been added to the section on RLS,
and none have been removed.

I think it is right that we should be concerned about this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Stephen Frost
Date:
* Peter Geoghegan (pg@heroku.com) wrote:
> On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan <pg@heroku.com> wrote:
> > If you're using another well known MVCC database system that has RLS,
> > I imagine when this happens the attacker similarly waits on the
> > conflicting (privileged) xact to finish (in my example in the patch,
> > Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
> > Mallory would then have her malicious UPDATE statement entirely rolled
> > back, and her statement would acquire an entirely new MVCC snapshot,
> > to be used by the USING security barrier qual (and everything else)
> > from scratch. This other system would then re-run her UPDATE with the
> > new MVCC snapshot. This would repeat until Mallory's UPDATE statement
> > completes without encountering any concurrent UPDATEs/DELETEs to her
> > would-be affected rows.
> >
> > In general, with this other database system, an UPDATE must run to
> > completion without violating MVCC, even in READ COMMITTED mode. For
> > that reason, I think we can take no comfort from the presumption that
> > this flexibility in USING security barrier quals (allowing subqueries,
> > etc) works securely in this other system. (I actually didn't check
> > this out, but I imagine it's true).
>
> Where are we on this?
>
> Discussion during pgCon with Heikki and Andres led me to believe that
> the issue is acceptable. The issue can be documented to help ensure
> that user expectation is in line with actual user-visible behavior.
> Unfortunately, I think that that will be a clunky documentation patch.

It's important to realize that this is an issue beyond RLS and that it
impacts Security Barrier Views also.

One idea which I have for making the documentation patch a bit less
clunky is to provide a simple way for users to address the issue.  Along
those lines, here's what I'd suggest (certainly open for discussion):

-----------------------
When reducing the set of rows which a user has access to, through
modifications to relations referenced by Row-Level Security Policies or
Security Barrier Views, be aware that users with a currently open
transaction might have a lock on an existing row and be able to see that
row after the change.  The best approach to avoid any possible leak of
information during a reduction of a user's rights is to ensure that the
user does not have any open transactions, perhaps by ensuring they have
no concurrent sessions running.
-----------------------

Thoughts?  Trying to keep it straight-forward and provide a simple
solution for users to be able to address the issue, if they're worried
about it.  Perhaps this, plus an additional paragraph which goes into
more detail about exactly what's going on?
Thanks!
    Stephen

Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Peter Geoghegan
Date:
On Mon, Aug 3, 2015 at 3:07 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Thoughts?  Trying to keep it straight-forward and provide a simple
> solution for users to be able to address the issue, if they're worried
> about it.  Perhaps this, plus an additional paragraph which goes into
> more detail about exactly what's going on?

I'm still thinking about it, but I think you have the right idea here.

However, as the docs put it when talking about conventional access
controls and SELECT: "The use of FOR UPDATE or FOR SHARE requires
UPDATE privilege as well [as SELECT privilege] (for at least one
column of each table so selected)". I wonder if RLS needs to consider
this, too.

If you can just say that you don't have to worry about this when the
user has no right to UPDATE or DELETE the rows in the first place,
that makes it more practical to manage the issue. ISTM that as things
stand, you can't say that because RLS does not consider SELECT FOR
UPDATE special in any way.

-- 
Peter Geoghegan



Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Adam Brightwell
Date:
On Mon, Aug 3, 2015 at 6:21 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Aug 3, 2015 at 3:07 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> Thoughts?  Trying to keep it straight-forward and provide a simple
>> solution for users to be able to address the issue, if they're worried
>> about it.  Perhaps this, plus an additional paragraph which goes into
>> more detail about exactly what's going on?
>
> I'm still thinking about it, but I think you have the right idea here.

I have attached a patch for review that I believe addresses the
documentation side of this issue.

Thoughts or comments?

Thanks,
Adam

--
Adam Brightwell - adam.brightwell@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com

Attachment

Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Stephen Frost
Date:
* Adam Brightwell (adam.brightwell@crunchydatasolutions.com) wrote:
> On Mon, Aug 3, 2015 at 6:21 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > On Mon, Aug 3, 2015 at 3:07 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >> Thoughts?  Trying to keep it straight-forward and provide a simple
> >> solution for users to be able to address the issue, if they're worried
> >> about it.  Perhaps this, plus an additional paragraph which goes into
> >> more detail about exactly what's going on?
> >
> > I'm still thinking about it, but I think you have the right idea here.
>
> I have attached a patch for review that I believe addresses the
> documentation side of this issue.
>
> Thoughts or comments?

I'm not convinced this is the right place, but at a minimum it should be
referenced from the RLS documentation.  Further, it should be noted that
users who have direct SQL access can control what the isolation level
is for their transaction.

Also, isn't it possible to avoid this by locking the records?  If the
locking fails or blocks then you know another user has those records
locked and you don't update or you wait until you hold the lock.
Assuming that works (I don't immediately see why it wouldn't..), we
should provide an example.

Thanks!

Stephen

Re: Arguable RLS security bug, EvalPlanQual() paranoia

From
Adam Brightwell
Date:
> I'm not convinced this is the right place, but at a minimum it should be
> referenced from the RLS documentation.  Further, it should be noted that
> users who have direct SQL access can control what the isolation level
> is for their transaction.

I agree that it should be referenced by the RLS docs.  However, I'm
not sure I can think of a better place for it.  My reasons for
choosing this location was that the behavior seems specific to the
READ COMMITTED isolation level and was an accepted MVCC anomaly; not
necessarily specific only to RLS and SBV.  But, again, I'd agree that
referencing it in the other locations would be desired.  Of course,
I'm willing to accept that I may be making the wrong assumptions.

> Also, isn't it possible to avoid this by locking the records?  If the
> locking fails or blocks then you know another user has those records
> locked and you don't update or you wait until you hold the lock.
> Assuming that works (I don't immediately see why it wouldn't..), we
> should provide an example.

I haven't found that to work, at least not with the specific case
presented by Peter.  Based on the following (output from Peter's
isolation test), I would understand that the 'malicious' UPDATE is
waiting for the previous update to be committed before it continues,
even without the FOR UPDATE lock on the rows.

step no_trust_mallory: update users set group_id = 1 where user_name =
'mallory';
step update_hc: update information set info = 'secret' where group_id = 2;
step updatem: update information set info = info where group_id = 2
returning 'mallory update: ' m, *; <waiting ...>
step commit_hc: commit;
step updatem: <... completed>

As well, due to this, as described by the READ COMMITTED documentation:

"it is possible for an updating command to see an inconsistent
snapshot: it can see the effects of concurrent updating commands on
the same rows it is trying to update"

I'm not convinced that this is something that FOR UPDATE can address
for this specific case.  If inconsistencies in the 'snapshot' can be
expected and are accepted at this isolation level, then I'm not sure
how we can reasonably expect locking the rows to have any affect.
Though, again, I'm willing to accept that I am not fully understanding
this behavior and that I am completely wrong.

-Adam

-- 
Adam Brightwell - adam.brightwell@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com