Re: Row-Level Security - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Row-Level Security
Msg-id 603c8f070912140318i6bbc1d63o9e81d409d0b342b6@mail.gmail.com
In response to Re: Row-Level Security  (KaiGai Kohei <kaigai@ak.jp.nec.com>)
Responses Re: Row-Level Security
List pgsql-hackers
2009/12/14 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> Robert Haas wrote:
>>>>> One point. MAC is "mandatory", so the table owner should not be able to
>>>>> control whether row-level checks are applied, or not.
>>>>> So, I used a special-purpose system column to represent the security label.
>>>>> It is generated for each table, and consumes no additional storage
>>>>> when the MAC feature is disabled.
>>>> My current feeling is that a special-purpose system column is not the
>>>> best approach.  I don't see what we gain by doing it that way.  Even
>>>> in an SE-PostgreSQL environment, row-level security might not be
>>>> desired on every table - after all, we've been told that SE-PostgreSQL
>>>> is useful without any row-level security AT ALL, so it's not hard to
>>>> think there could be environments where only some tables need to
>>>> be protected.  So I think we want to have a way to turn it on and off on
>>>> a per-table basis.
>>>>
>>>> Of course, as you point out, we have to make sure that anyone who
>>>> tries to turn RLS on or off for a particular table is authorized to
>>>> perform that operation.  But that's a separate problem which I
>>>> don't think has much to do with row-level security.
>>> Yes, it is a separate problem not to be concluded at the moment.
>>> (Perhaps, it depends on security model. In DAC, per-table basis is preferable.)
>>
>> Even for MAC, it might be desirable to turn it off on codes tables or
>> the like, to minimize the performance hit.  But we can defer this
>> question to another day.
>
> Yes, I provide a sepgsql_row_level GUC in my local branch to turn its
> row-level controls on and off. It allows us to reduce the performance penalty
> related to RLS and the storage consumed by security labels. (They require
> an additional sizeof(Oid) bytes for each tuple.)
> The point is that this GUC option is configurable only by an administrator
> who can edit $PGDATA/postgresql.conf.
>
> But it is an implementation detail not to be concluded at the moment.

Well, that would be a global switch, not per table.

>>> If we set up database cluster without any label-based MAC, all the tuple
>>> shall not have any security label. If the security label is stored within
>>> regular column, we have to modify schema for any tables at first.
>>> If system column provides a security label of tuple, we can dynamically
>>> generate an appropriate security label. In SELinux case, it assumes any
>>> unlabeled objects performs as if it has a pseudo security label:
>>>  system_u:object_r:unlabeled_t:s0
>>>
>>> Needless to say, we need to assign appropriate security labels for
>>> meaningful access controls later, but it does not require any schema
>>> changes, even if we repeat to turn on/off the label-based MAC feature.
>>>
>>> When label-based MAC feature is disabled, this system column can return
>>> a pseudo value such as NULL or empty string.
>>
>> I think you are wrong about all of this.  To add security labels to
>> existing tuples, you're going to need to rewrite the table, period.
>> Whether you're adding a column in the process or just populating the
>> contents of a previously-omitted column doesn't seem particularly
>> relevant.  Similarly you can insert a pseudo security label when the
>> column is missing just as well as you can when it's present but
>> unpopulated.
>
> For system catalogs, we cannot touch its schema with a light heart,
> even if active enhanced security provider is switched or turned on/off.
> If we define a common system column for all the label-based MAC,
> it can be available for both of user tables and system catalogs
> without any table-rewrite process.
>
> But it is an implementation detail not to be concluded at the moment.

Err... well, as I said upthread: "None of this addresses the issue of
doing RLS on system catalogs, which seems like a much harder problem,
possibly one that we should just ignore for the first phase of this
project."  So yeah, I agree: it won't work for system catalogs.

[snip]

>>>>> * Foreign Key constraint(2)
>>>>>
>>>>> FK is implemented as a trigger which internally uses SELECT/UPDATE/DELETE.
>>>>> If associated tuples are filtered out, it breaks reference integrity.
>>>>> So, we have to apply special care. In SE-PgSQL case, it raises an error
>>>>> instead of filtering during FK checks. And, row-level security hook is
>>>>> called at the last for each tuples, unlike normal cases.
>>>> Perfecting referential integrity here seems like a pretty tough
>>>> problem, but it's likely not necessary to solve it in order to get an
>>>> implementation of row-level security that is useful for some purposes.
>>> Is the approach in SE-PgSQL suitable for the issue?
>>> It can prevent to update/delete tuple referenced by invisible tuples.
>>>
>>> We have two modes in row-level security.
>>> The first is filtering-mode. It applies security policy function prior
>>> to any other user given conditions, and filters out violated tuples from
>>> the result set.
>>> The second is aborting-mode. It is only used by internal stuff which does
>>> not provide any malicious function in the condition. It applies security
>>> policy function next to all the WHERE clause, and raises an error if the
>>> query tries to refer violated tuples.
>>
>> Hmm... the idea of having two modes doesn't sound right off the top of
>> my head.  But I think we have a long time before we need worry about
>> this.  We have neither SE-PostgreSQL nor RLS in core, nor are either
>> one anywhere close to being merged.  So worrying about how the two
>> will interact when we have both is putting the cart before the horse.
>> A lot can change between now and then.
>
> IIRC, I've not gotten any opposition about this two-modes design.
> Most of arguments about RLS were information leaks via covert-channels
> which allows us to estimate an existence of invisible PK/FK.
> But we don't define it as a problem to be resolved.

I know that was one of Tom's concerns.  Personally, my concerns are:

1. I want to implement row-level security in a way that is useful for
people who don't care about SE-PostgreSQL.  I think there are lots of
people who would be interested in that.  In fact, as Josh said, there
are probably MORE people who are interested in the constraint-based
approach than there are who want label-based security a la
SE-PostgreSQL.

2. I want to implement row-level security in a way that is very
flexible and allows for a wide range of access control policies.  The
core row-level security mechanism should not care about or prejudge a
particular policy - it should just be a mechanism for enforcing
row-filtering.

3. I want to implement row-level security in a way that allows the
planner maximum flexibility in implementing the row filtering that is
needed in a particular case.  SE-PostgreSQL RLS presumes what is
essentially an additional join against the security table ID for every
table in the query - doing this in a way that allows joins to be
reordered or implemented in multiple ways (straight nestloop, nestloop
with inner indexscan, hash join) will drastically improve performance.
The original implementation didn't actually implement it as a join,
but rather with special-case code that performed the security ID
lookups as part of the heap scan.  That's not going to work for any
kind of row-level security other than SE-PostgreSQL (so, see points 1
and 2) and it's also going to make the performance much worse than it
needs to be.  Granted, the performance is never going to be GOOD, but
we should try to at least make it not ATROCIOUS.
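
To make points 2 and 3 a bit more concrete, here is a minimal sketch in
Python. It is not PostgreSQL code, and every name in it (filter_rows,
owner_policy, label_policy, the sample rows and labels) is hypothetical.
It only illustrates the shape of the idea: one policy-agnostic filtering
mechanism, with a constraint-based policy and a label-based policy both
plugged into it, and the label lookup written as a join-like probe
rather than as special-case code inside the scan.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]
Policy = Callable[[Row], bool]


def filter_rows(rows: Iterable[Row], policy: Policy) -> List[Row]:
    """Policy-agnostic mechanism: keep only the rows the policy allows."""
    return [row for row in rows if policy(row)]


def owner_policy(current_user: str) -> Policy:
    """Constraint-based policy: a row is visible only to its owner."""
    return lambda row: row["owner"] == current_user


def label_policy(labels: Dict[int, str], readable: Set[str]) -> Policy:
    """Label-based policy: look up the row's security ID in a label table.

    The dict lookup stands in for probing the inner side of a hash join.
    This is an assumption made for illustration; a real planner could
    pick a different join strategy for the same policy.
    """
    return lambda row: labels.get(row["security_id"]) in readable


# Hypothetical sample data, for illustration only.
ROWS: List[Row] = [
    {"id": 1, "owner": "alice", "security_id": 10},
    {"id": 2, "owner": "bob", "security_id": 20},
    {"id": 3, "owner": "alice", "security_id": 20},
]
LABELS = {10: "public", 20: "secret"}

# The same mechanism enforces two unrelated policies.
visible_to_alice = filter_rows(ROWS, owner_policy("alice"))
visible_public = filter_rows(ROWS, label_policy(LABELS, {"public"}))
```

Note that filter_rows knows nothing about owners or labels. Whether the
label lookup is best done as a hash probe, a nested loop, or something
else is exactly the kind of choice I'd like to leave to the planner.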

...Robert

