Thread: [PATCH] [v8.5] Security checks on largeobjects

[PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
This patch adds support DAC security checks on largeobjects.

Enhanced GRANT/REVOKE statement allows to set SELECT (read) and
UPDATE (write) permission on individual largeobjects.

At the creation time, it checks ACL_CREATE on the schema object.
Currently, a largeobject does not have any human readable name and
qualified namespace, we assume "public" namespace here.

At the deletion time, it checks ownership of the largeobject.
Only resource owner and superuser can drop largeobjects.

The ownership and schema can be set using:
  ALTER LARGE OBJECT <lobj> OWNER TO <role>;
  ALTER LARGE OBJECT <lobj> SET SCHEMA <schema>;

The current pg_largeobject system catalog cannot store metadata
of the largeobejcts, its data structure is modified.

  CATALOG(pg_largeobject,2613)
  {
      Oid       lonsp;      /* OID of the namespace */
      Oid       loowner;    /* OID of the owner */
      aclitem   loacl[1];   /* access permissions */
  } FormData_pg_largeobject;

The current pg_largeobejct is renamed to pg_largeobject_data.
One or multiple tuples within pg_largeobject_data points to
a record within pg_largeobject which has a metadata of a
certain largeobject.

  CATALOG(pg_largeobject_data,2966) BKI_WITHOUT_OIDS
  {
      Oid       loid;       /* Identifier of large object */
      int4      pageno;     /* Page number (starting from 0) */
      bytea     data;       /* Data for page (may be zero-length) */
  } FormData_pg_largeobject_data;

Issues:
 * Is ALTER LARGE OBJECT interface suitable?
 * How we should consider the namespace (schema) and the ownership
   of the largeobejct?
 * Is the named large object (including fully qualified one) worth?
   It will enables us to specify a largeobject with human readable
   identifier string.
 * Is the data structure appropriate?
   - As an aside, the pg_largeobject_data has an identical definition
     with TOAST tables. It may be possible to store them within TOAST
     table.
 * If so, it may also resolve other Todo item.
   - Allow read/write into TOAST values like large objects

Memo:
  http://wiki.postgresql.org/wiki/Largeobject_Enhancement

Example:
  postgres=# REVOKE ALL ON LARGE OBJECT 1234 FROM PUBLIC;
  REVOKE
  postgres=# GRANT SELECT ON LARGE OBJECT 1234 TO ymj;
  GRANT
  postgres=# GRANT SELECT,UPDATE ON LARGE OBJECT 1234 TO tak;
  GRANT

  postgres=# \c - ymj
  psql (8.4rc2)
  You are now connected to database "postgres" as user "ymj".
  postgres=> SELECT loread(lo_open(1234, x'40000'::int), 10);
     loread
  ------------
   1234567890
  (1 row)

  postgres=> SELECT lowrite(lo_open(1234, x'20000'::int), 'abcdefg');
  ERROR:  permission denied for largeobject largeobject:1234

  postgres=> \c - tak
  psql (8.4rc2)
  You are now connected to database "postgres" as user "tak".
  postgres=> SELECT lowrite(lo_open(1234, x'20000'::int), 'abcdefg');
   lowrite
  ---------
         7
  (1 row)


Scale of the patch:
  [kaigai@saba ]$ diffstat pgsql-lobj-perms-8.4rc2-r2080.patch
   backend/catalog/Makefile               |    6
   backend/catalog/aclchk.c               |  243 ++++++++++++++++++++++++++++++
   backend/catalog/dependency.c           |   15 +
   backend/catalog/pg_largeobject.c       |  265 ++++++++++++!!!!!!!!!!!!!!!!!!!!
   backend/commands/alter.c               |    9 +
   backend/libpq/be-fsstubs.c             |   25 +++
   backend/parser/gram.y                  |   28 +++
   backend/storage/large_object/inv_api.c |  140 +++------!!!!!!
   backend/tcop/utility.c                 |    6
   backend/utils/adt/acl.c                |    4
   backend/utils/cache/syscache.c         |   13 +
   include/catalog/dependency.h           |    1
   include/catalog/indexing.h             |    7
   include/catalog/pg_largeobject.h       |   21 !!
   include/catalog/pg_largeobject_data.h  |   54 ++++++
   include/catalog/toasting.h             |    1
   include/nodes/parsenodes.h             |    1
   include/utils/acl.h                    |    7
   include/utils/syscache.h               |    1
   test/regress/expected/sanity_check.out |    3
   test/regress/input/largeobject.source  |   95 +++++++++++
   test/regress/output/largeobject.source |  175 +++++++++++++++++++++
   22 files changed, 803 insertions(+), 46 deletions(-), 271 modifications(!)

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: [PATCH] [v8.5] Security checks on largeobjects

From
Bernd Helmle
Date:

--On 26. Juni 2009 13:08:37 +0900 KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

>  * Is the named large object (including fully qualified one) worth?
>    It will enables us to specify a largeobject with human readable
>    identifier string.

I don't understand the notion of this. Does this mean you can create a LO 
with an identifier string, generated from (e.g.) your application?

-- 
Thanks
Bernd


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
Bernd Helmle wrote:
> --On 26. Juni 2009 13:08:37 +0900 KaiGai Kohei <kaigai@ak.jp.nec.com> 
> wrote:
> 
>>  * Is the named large object (including fully qualified one) worth?
>>    It will enables us to specify a largeobject with human readable
>>    identifier string.
> 
> I don't understand the notion of this. Does this mean you can create a 
> LO with an identifier string, generated from (e.g.) your application?

Yes, it intends to assign an identifier string not only numeric
large object identifier. The identifier string can be qualified
with a certain namespace as follows.

E.g)SELECT lo_open('my_picture01', x'40000'::int);SELECT lo_create('pg_temp.my_musid02');

In the later case, the new largeobject will be reclaimed after
the session closed due to the temporary namespace.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
Bernd Helmle
Date:

--On 29. Juni 2009 08:32:29 +0900 KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> Yes, it intends to assign an identifier string not only numeric
> large object identifier. The identifier string can be qualified
> with a certain namespace as follows.
>
> E.g)
>  SELECT lo_open('my_picture01', x'40000'::int);
>  SELECT lo_create('pg_temp.my_musid02');
>
> In the later case, the new largeobject will be reclaimed after
> the session closed due to the temporary namespace.

I'm not sure about the usefulness of this. While having an identifier for a 
LO is nice, i believe most users store additional metadata about objects 
within their own tables anyways, linking the LO there. Also i doubt there 
is much need for temporary large objects (at least, i have no idea about 
this....).

It might be interesting to dig into your proposal deeper in conjunction 
with TOAST (you've already mentioned this TODO). Having serial access with 
a nice interface into TOAST would be eliminating the need for 
pg_largeobject completely (i'm not a big fan of this one-big-system-table 
approach the old LO interface currently is).

-- 
Thanks
Bernd


Re: [PATCH] [v8.5] Security checks on largeobjects

From
Tom Lane
Date:
Bernd Helmle <mailings@oopsware.de> writes:
> It might be interesting to dig into your proposal deeper in conjunction 
> with TOAST (you've already mentioned this TODO). Having serial access with 
> a nice interface into TOAST would be eliminating the need for 
> pg_largeobject completely (i'm not a big fan of this one-big-system-table 
> approach the old LO interface currently is).

Yeah, it would be more useful probably to fix that than to add
decoration to the LO facility.  Making LO more usable is just going to
encourage people to bump into its other limitations (32-bit OIDs,
32-bit object size, finite maximum size of pg_largeobject, lack of
dead-object cleanup, etc etc).
        regards, tom lane


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
Tom Lane wrote:
> Bernd Helmle <mailings@oopsware.de> writes:
>> It might be interesting to dig into your proposal deeper in conjunction 
>> with TOAST (you've already mentioned this TODO). Having serial access with 
>> a nice interface into TOAST would be eliminating the need for 
>> pg_largeobject completely (i'm not a big fan of this one-big-system-table 
>> approach the old LO interface currently is).
> 
> Yeah, it would be more useful probably to fix that than to add
> decoration to the LO facility.  Making LO more usable is just going to
> encourage people to bump into its other limitations (32-bit OIDs,
> 32-bit object size, finite maximum size of pg_largeobject, lack of
> dead-object cleanup, etc etc).

The reason why I tried to mention the named largeobject feature is
that dac security checks on largeobject require them to belong to
a certain schema, so I thought it is quite natural to have a string
name. However, obviously, it is not a significant theme for me.

I can also agree your opinion that largeobject interfaces should be
redefined to access partial stuff of TOAST'ed verlena data structure,
not only pg_largeobject.

In this case, we will need a new pg_type.typstorage option which
force to put the given verlena data on external relation without
compression, because we cannot estimate the data offset in inlined
or compressed external verlena data.

I'll try to submit a design within a few days.
Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
I concluded that the following issues should be solved when we apply
largeobject-like interfaces on the big toasted data within general
relations, not only pg_largeobject system catalog.

At first, we need to add a new strategy to store the given varlena data
on the external toast relation.
If we try to seek and fetch a certain data chunk, it is necessary to be
computable what chunk stores the required data specified by offset and
length. So, the external chunks should be uncompressed always. It is a
common requirement for both of read and write operations.
If we try to update a part of the toasted data chunks, it should not be
inlined independent from length of the datum, because we need to update
whole the tuple which contains inlined toasted chunks in this case.
If we open the toasted varlena with read-only mode, inlined one does not
prevent anything. It is an issue for only write operation.

I would like to add a new strategy on pg_type.typstorage with the following
characteristics:1. It always stores the given varlena data without any compression.   So, the given data is stored as a
setof fixed-length chunks.2. It always stores the given varlena data on external toast relation.
 

I suggest a new built-in type named BLOB which has an identical definition
to BYTEA type, expect for its attstorage.

Next, a different version of lo_open() should be provided to accept
BLOB type as follows:
 SELECT pictname, lo_open(pictdata, x'20000'::int) FROM my_picture;

It will allocate a largeobject descriptor for the given BLOB data,
and user can read and write using loread() and lowrite() interfaces.

issue: In this case, should it hold the relation handler and locks on the "my_picture" relation, not only its toast
relation?
issue: Should the lo_open() with read-only mode be available on the existing TEXT or BYTEA types? I could not find any
reasonto deny them.
 

Next, pg_largeobject system catalog can be redefined using the BLOB
type as follows:
 CATALOG(pg_largeobject,2613) {     Oid         loowner;        /* OID of the largeobject owner */     Oid
lonsp;         /* OID of the largeobject namespace */     aclitem     loacl[1];       /* access permissions */     blob
      lodata;         /* contents of the largeobject */ } FormData_pg_largeobject;
 

The existing largeobject interfaces perform on pg_largeobject.lodata
specified by largeobject identifier.
Rest of metadata can be used for access control purpose.

Thanks,

KaiGai Kohei wrote:
> Tom Lane wrote:
>> Bernd Helmle <mailings@oopsware.de> writes:
>>> It might be interesting to dig into your proposal deeper in conjunction 
>>> with TOAST (you've already mentioned this TODO). Having serial access with 
>>> a nice interface into TOAST would be eliminating the need for 
>>> pg_largeobject completely (i'm not a big fan of this one-big-system-table 
>>> approach the old LO interface currently is).
>> Yeah, it would be more useful probably to fix that than to add
>> decoration to the LO facility.  Making LO more usable is just going to
>> encourage people to bump into its other limitations (32-bit OIDs,
>> 32-bit object size, finite maximum size of pg_largeobject, lack of
>> dead-object cleanup, etc etc).
> 
> The reason why I tried to mention the named largeobject feature is
> that dac security checks on largeobject require them to belong to
> a certain schema, so I thought it is quite natural to have a string
> name. However, obviously, it is not a significant theme for me.
> 
> I can also agree your opinion that largeobject interfaces should be
> redefined to access partial stuff of TOAST'ed verlena data structure,
> not only pg_largeobject.
> 
> In this case, we will need a new pg_type.typstorage option which
> force to put the given verlena data on external relation without
> compression, because we cannot estimate the data offset in inlined
> or compressed external verlena data.
> 
> I'll try to submit a design within a few days.
> Thanks,


-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
I could find one more issue when we apply largeobject-style interfaces
on generic toasted varlena data.

When we fetch a toasted datum, it scans the pg_toast_%u with SnapshotToast,
because it assumes any toasted chunks don't have multiple versions, and
visibility of the toast pointer always means visibility of the toast chunks.

However, if we provide largeobject-style interfaces which allow partial
updates on toasted varlena, it seems to me this assumption will get being
incorrect.

Is there any good idea?

KaiGai Kohei wrote:
> I concluded that the following issues should be solved when we apply
> largeobject-like interfaces on the big toasted data within general
> relations, not only pg_largeobject system catalog.
> 
> At first, we need to add a new strategy to store the given varlena data
> on the external toast relation.
> If we try to seek and fetch a certain data chunk, it is necessary to be
> computable what chunk stores the required data specified by offset and
> length. So, the external chunks should be uncompressed always. It is a
> common requirement for both of read and write operations.
> If we try to update a part of the toasted data chunks, it should not be
> inlined independent from length of the datum, because we need to update
> whole the tuple which contains inlined toasted chunks in this case.
> If we open the toasted varlena with read-only mode, inlined one does not
> prevent anything. It is an issue for only write operation.
> 
> I would like to add a new strategy on pg_type.typstorage with the following
> characteristics:
>  1. It always stores the given varlena data without any compression.
>     So, the given data is stored as a set of fixed-length chunks.
>  2. It always stores the given varlena data on external toast relation.
> 
> I suggest a new built-in type named BLOB which has an identical definition
> to BYTEA type, expect for its attstorage.
> 
> Next, a different version of lo_open() should be provided to accept
> BLOB type as follows:
> 
>   SELECT pictname, lo_open(pictdata, x'20000'::int) FROM my_picture;
> 
> It will allocate a largeobject descriptor for the given BLOB data,
> and user can read and write using loread() and lowrite() interfaces.
> 
> issue:
>   In this case, should it hold the relation handler and locks on the
>   "my_picture" relation, not only its toast relation?
> issue:
>   Should the lo_open() with read-only mode be available on the existing
>   TEXT or BYTEA types? I could not find any reason to deny them.
> 
> Next, pg_largeobject system catalog can be redefined using the BLOB
> type as follows:
> 
>   CATALOG(pg_largeobject,2613)
>   {
>       Oid         loowner;        /* OID of the largeobject owner */
>       Oid         lonsp;          /* OID of the largeobject namespace */
>       aclitem     loacl[1];       /* access permissions */
>       blob        lodata;         /* contents of the largeobject */
>   } FormData_pg_largeobject;
> 
> The existing largeobject interfaces perform on pg_largeobject.lodata
> specified by largeobject identifier.
> Rest of metadata can be used for access control purpose.
> 
> Thanks,
> 
> KaiGai Kohei wrote:
>> Tom Lane wrote:
>>> Bernd Helmle <mailings@oopsware.de> writes:
>>>> It might be interesting to dig into your proposal deeper in conjunction 
>>>> with TOAST (you've already mentioned this TODO). Having serial access with 
>>>> a nice interface into TOAST would be eliminating the need for 
>>>> pg_largeobject completely (i'm not a big fan of this one-big-system-table 
>>>> approach the old LO interface currently is).
>>> Yeah, it would be more useful probably to fix that than to add
>>> decoration to the LO facility.  Making LO more usable is just going to
>>> encourage people to bump into its other limitations (32-bit OIDs,
>>> 32-bit object size, finite maximum size of pg_largeobject, lack of
>>> dead-object cleanup, etc etc).
>> The reason why I tried to mention the named largeobject feature is
>> that dac security checks on largeobject require them to belong to
>> a certain schema, so I thought it is quite natural to have a string
>> name. However, obviously, it is not a significant theme for me.
>>
>> I can also agree your opinion that largeobject interfaces should be
>> redefined to access partial stuff of TOAST'ed verlena data structure,
>> not only pg_largeobject.
>>
>> In this case, we will need a new pg_type.typstorage option which
>> force to put the given verlena data on external relation without
>> compression, because we cannot estimate the data offset in inlined
>> or compressed external verlena data.
>>
>> I'll try to submit a design within a few days.
>> Thanks,
> 
> 


-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
Joshua,

I found your name as a reviewer at the commitfest.postgresql.org.

However, I don't think the initial proposal of the largeobject
security is now on the state to be reviewed seriously.

In my preference, it should be reworked based on the TOASTing
approach as discussed in this thread previously, if we can
find a reasonable solution for the issue I raised before.

KaiGai Kohei wrote:
> I could find one more issue when we apply largeobject-style interfaces
> on generic toasted varlena data.
> 
> When we fetch a toasted datum, it scans the pg_toast_%u with SnapshotToast,
> because it assumes any toasted chunks don't have multiple versions, and
> visibility of the toast pointer always means visibility of the toast chunks.
> 
> However, if we provide largeobject-style interfaces which allow partial
> updates on toasted varlena, it seems to me this assumption will get being
> incorrect.
> 
> Is there any good idea?

The largeobject interface uses SnapshotNow for writable accesses, and
GetActiveSnapshot() for read-only accesses, but toast_fetch_datum()
uses SnapshotToast to scan the toast relation.

It seems to me SnapshotToast depends on an assumption that tuples
within TOAST relation does not have any multiple versions.
When we update a toast value, TOAST mechanism inserts whole of
variable length datum with a new chunk_id, and older chunks are
removed at toast_delete_datum().
The TOAST pointer is updated to the new chunk_id, and its visibility
is under MVCC controls.

The source code comments at HeapTupleSatisfiesToast() says as follows:

/** HeapTupleSatisfiesToast*      True iff heap tuple is valid as a TOAST row.** This is a simplified version that only
checksfor VACUUM moving conditions.* It's appropriate for TOAST usage because TOAST really doesn't want to do* its own
timequal checks; if you can see the main table row that contains* a TOAST reference, you should be able to see the
TOASTedvalue.  However,* vacuuming a TOAST table is independent of the main table, and in case such* a vacuum fails
partwaythrough, we'd better do this much checking.** Among other things, this means you can't do UPDATEs of rows in a
TOAST*table.*/
 

If largeobjects-like interface is available to update a part of TOAST
values, we cannot keep the assumption.

I would like to conclude the issue before reworking it.

Is there any good idea?
If we follow the current largeobject manner, what problems can be found?

Thanks,

> KaiGai Kohei wrote:
>> I concluded that the following issues should be solved when we apply
>> largeobject-like interfaces on the big toasted data within general
>> relations, not only pg_largeobject system catalog.
>>
>> At first, we need to add a new strategy to store the given varlena data
>> on the external toast relation.
>> If we try to seek and fetch a certain data chunk, it is necessary to be
>> computable what chunk stores the required data specified by offset and
>> length. So, the external chunks should be uncompressed always. It is a
>> common requirement for both of read and write operations.
>> If we try to update a part of the toasted data chunks, it should not be
>> inlined independent from length of the datum, because we need to update
>> whole the tuple which contains inlined toasted chunks in this case.
>> If we open the toasted varlena with read-only mode, inlined one does not
>> prevent anything. It is an issue for only write operation.
>>
>> I would like to add a new strategy on pg_type.typstorage with the following
>> characteristics:
>>  1. It always stores the given varlena data without any compression.
>>     So, the given data is stored as a set of fixed-length chunks.
>>  2. It always stores the given varlena data on external toast relation.
>>
>> I suggest a new built-in type named BLOB which has an identical definition
>> to BYTEA type, expect for its attstorage.
>>
>> Next, a different version of lo_open() should be provided to accept
>> BLOB type as follows:
>>
>>   SELECT pictname, lo_open(pictdata, x'20000'::int) FROM my_picture;
>>
>> It will allocate a largeobject descriptor for the given BLOB data,
>> and user can read and write using loread() and lowrite() interfaces.
>>
>> issue:
>>   In this case, should it hold the relation handler and locks on the
>>   "my_picture" relation, not only its toast relation?
>> issue:
>>   Should the lo_open() with read-only mode be available on the existing
>>   TEXT or BYTEA types? I could not find any reason to deny them.
>>
>> Next, pg_largeobject system catalog can be redefined using the BLOB
>> type as follows:
>>
>>   CATALOG(pg_largeobject,2613)
>>   {
>>       Oid         loowner;        /* OID of the largeobject owner */
>>       Oid         lonsp;          /* OID of the largeobject namespace */
>>       aclitem     loacl[1];       /* access permissions */
>>       blob        lodata;         /* contents of the largeobject */
>>   } FormData_pg_largeobject;
>>
>> The existing largeobject interfaces perform on pg_largeobject.lodata
>> specified by largeobject identifier.
>> Rest of metadata can be used for access control purpose.
>>
>> Thanks,
>>
>> KaiGai Kohei wrote:
>>> Tom Lane wrote:
>>>> Bernd Helmle <mailings@oopsware.de> writes:
>>>>> It might be interesting to dig into your proposal deeper in conjunction 
>>>>> with TOAST (you've already mentioned this TODO). Having serial access with 
>>>>> a nice interface into TOAST would be eliminating the need for 
>>>>> pg_largeobject completely (i'm not a big fan of this one-big-system-table 
>>>>> approach the old LO interface currently is).
>>>> Yeah, it would be more useful probably to fix that than to add
>>>> decoration to the LO facility.  Making LO more usable is just going to
>>>> encourage people to bump into its other limitations (32-bit OIDs,
>>>> 32-bit object size, finite maximum size of pg_largeobject, lack of
>>>> dead-object cleanup, etc etc).
>>> The reason why I tried to mention the named largeobject feature is
>>> that dac security checks on largeobject require them to belong to
>>> a certain schema, so I thought it is quite natural to have a string
>>> name. However, obviously, it is not a significant theme for me.
>>>
>>> I can also agree your opinion that largeobject interfaces should be
>>> redefined to access partial stuff of TOAST'ed verlena data structure,
>>> not only pg_largeobject.
>>>
>>> In this case, we will need a new pg_type.typstorage option which
>>> force to put the given verlena data on external relation without
>>> compression, because we cannot estimate the data offset in inlined
>>> or compressed external verlena data.
>>>
>>> I'll try to submit a design within a few days.
>>> Thanks,
>>
> 
> 


-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
Joshua Tolley
Date:
On Thu, Jul 16, 2009 at 01:49:14PM +0900, KaiGai Kohei wrote:
> Joshua,
>
> I found your name as a reviewer at the commitfest.postgresql.org.
>
> However, I don't think the initial proposal of the largeobject
> security is now on the state to be reviewed seriously.
>
> In my preference, it should be reworked based on the TOASTing
> approach as discussed in this thread previously, if we can
> find a reasonable solution for the issue I raised before.

For whatever it's worth, I consider my capability limited to making sure the
patch applies and coming up with a few interesting ways of testing it out, but
not seriously studying the code and knowing when there might be a competitive
alternative implementation. I don't yet understand the issues that have been
raised, but will quit working to review the patch if you feel it's not ready.
Thanks for letting me know; I hope a solution to the problems you've brought
up is forthcoming.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: [PATCH] [v8.5] Security checks on largeobjects

From
Robert Haas
Date:
2009/7/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> However, I don't think the initial proposal of the largeobject
> security is now on the state to be reviewed seriously.

OK, I am moving this patch to returned with feedback.

...Robert


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
Robert Haas wrote:
> 2009/7/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> However, I don't think the initial proposal of the largeobject
>> security is now on the state to be reviewed seriously.
> 
> OK, I am moving this patch to returned with feedback.

If possible, I would like to have a discussion to make consensus
about its design and interfaces before submitting my patch to
the next commit fest.

I'll submit its design proposal (including a few issues) again.
Is there anyone interested in?

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
I summarized the design proposal and issues currently we have.

I would like to see any comments corresponding to the proposition.
Especially, selection of the snapshot is a headach issue for me.

----------------
This project tries to solve two items listed at: http://wiki.postgresql.org/wiki/Todo#Binary_Data
* Add security checks for large objects* Allow read/write into TOAST values like large objects

= Introduction =

We need to associate a metadata for a certain largeobject to
implement security checks for largeobjects. However, the data
structure of largeobjects are not suitable to manage its
metadata (such as owner identifier, database acls ...) on
a certain largeobject, because a largeobject is stored as
separated page frames in the pg_largeobject system catalog.
Thus, we need to revise the data structure to manage a certain
largeobject.

An interesting fact is a similarity of data structure between
TOAST table and pg_lageobject.
A TOAST relation is declared as follows: pg_toast_%u (     chunk_id        oid,     chunk_seq       int4,
chunk_data     bytea,     unique(chunk_id, chunk_seq) )
 

Definition of the pg_largeobject is as follows: pg_largeobject(     loid            oid,     pageno          int4,
data           bytea,     unique(loid, pageno) )
 

They have an identical data structure, so it is quite natural
to utilize TOAST mechanism to store pagef rames of largeobject.

= Design =

In my plan, role of the pg_largeobject will be changed to
manage metadata of largeobjects, and it will be redefined
as follows:
 CATALOG(pg_largeobject,2613) {     Oid        loowner;     /* OID of the owner */     Oid        lonsp;       /* OID
ofthe namespace */     aclitem    loacl[1];    /* ACL of the largeobject */     Blob       lodata;      /* Contents of
thelargeobject */ } FormData_pg_largeobejct;
 

For access controls purpose, its ownership, namespace and ACLs
are necessary. In addition, the Blob is a new type which support
to read/write a its partial TOAST value.

The current lo_xxx() interfaces will perform as a wrapper function
to access a certain pg_largeobject.lodata identified by a largeobject
handler. The loread(), lowrite() or similar interfaces will support
partial accesses on the Blob type. It enables user defined relation
to contain large data using TOAST mechanism, with reasonable resource
comsumption. (Note that TOAST replaces whole of the chunks with same
identifier, even if it changes just a single byte.)

= Interfaces =

== New type ==

We need a new variable length type that has the following feature,
to allow users partial accesses.

* It always use external TOAST table, independent from its size.If toasted data is stored as inline, we cannot update
itindependent
 
from the main table.It does not prevent partial read, but meaningless because inlined
data is enough small.

* It always store the data without any compression.We cannot easily compute required data offset on the compresseddata.
Allthe toasted data need to be uncompressed, for both ofreader and writer access.
 

== lo_xxx() interfaces ==

A new version of loread() and lowrite() are necessary to access
a part of toasted data within user defined tables. It can be defined
as follows:
 loread(Blob data, int32 offset, int32 length) lowrite(Blob data, int32 offset, Bytea data)

== GRANT/REVOKE ==

When we access traditional largeobjects, reader permission (SELECT)
or writer permission (UPDATE) should be checked on accesses.

The GRANT/REVOKE statements are enhanced as follows:
 GRANT SELECT ON LARGE OBJECT 1234 TO kaigai;

It allows "kaigai" to read the largeobject: 1234.

= Issues =

== New pg_type.typstorage ==

The variable length data is always necessary to be stored in external
storage and uncompressed. The existing typstorage does not satisfies
the requirement, so we need to add a new pg_type.typstorage strategy.

The new typstorage strategy forces:- It always stores the given varlena data on external toast relation.- It always
storesthe given varlena data without any compression.
 

It will give us performance loss, so existing Text or Bytea will be
more suitable to store variable length data being not very large.

== Snapshot ==

The largeobject interface uses SnapshotNow for writable accesses, and
GetActiveSnapshot() for read-only accesses, but toast_fetch_datum()
uses SnapshotToast to scan the toast relation.

It seems to me SnapshotToast depends on an assumption that tuples
within TOAST relation does not have any multiple versions.
When we update a toast value, TOAST mechanism inserts whole of
variable length datum with a new chunk_id, and older chunks are
removed at toast_delete_datum().
The TOAST pointer is updated to the new chunk_id, and its visibility
is under MVCC controls.

The source code comments at HeapTupleSatisfiesToast() says as follows:

/** HeapTupleSatisfiesToast*      True iff heap tuple is valid as a TOAST row.** This is a simplified version that only
checksfor VACUUM moving conditions.* It's appropriate for TOAST usage because TOAST really doesn't want to do* its own
timequal checks; if you can see the main table row that contains* a TOAST reference, you should be able to see the
TOASTedvalue.  However,* vacuuming a TOAST table is independent of the main table, and in case such* a vacuum fails
partwaythrough, we'd better do this much checking.** Among other things, this means you can't do UPDATEs of rows in a
TOAST*table.*/
 

If largeobjects-like interface is available to update a part of TOAST
values, we cannot keep the assumption.

At the beginning, I have a plan to apply the result od GetActiveSnapshot()
to fetch toasted value. Is there any matter which can be expected?

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] [v8.5] Security checks on largeobjects

From
KaiGai Kohei
Date:
Nobody may remember, I also proposed a patch to support access
controls on largeobject.

It was suggested that largeobject stores its contents on the TOAST
relations and provides interfaces to read/write them partially.

For example:
> == lo_xxx() interfaces ==
>
> A new version of loread() and lowrite() are necessary to access
> a part of toasted data within user defined tables. It can be defined
> as follows:
>
>   loread(Blob data, int32 offset, int32 length)
>   lowrite(Blob data, int32 offset, Bytea data)

But, I found an another problem when we tried to update TOAST chunks
partially.

Because the toast pointer contains the total size of a series of
chunks, not only chunk_id and toastrelid, we cannot expand the size
of TOAST'ed data.

See below, toast pointer is defined as follows.
 struct varatt_external {     int32       va_rawsize;     /* Original data size (includes header) */     int32
va_extsize;    /* External saved size (doesn't) */     Oid         va_valueid;     /* Unique ID of value within TOAST
table*/     Oid         va_toastrelid;  /* RelID of TOAST table containing it */ };
 

These fields are inlined within the main tuple, and the contents
of TOAST datum is stored in the toast relation in separation.

If we can update the toast data partially with expanding its total size
(e.g, write 10,000 bytes from the offset 6,000 of the 12,000 bytes verlena),
it is also necessary to update the main tuple, because its va_rawsize and
va_extsize should be also updates, although va_valueid and va_toastrelid
can keep as is.

I wonder why the va_extsize and va_rawsize should be included within the
toast pointer. It is fundamentally computable using a pair of va_valueid
and va_toastrelid.
(I guess it is due to the performance reason. If we need to check whether
it is compressed or not, a flag varible can provide enough hint.)

So, we need to consider more how to implement partial writable verlena
on the user tables.

However, as far as existing largeobject interfaces, I believe it is possible
to implement them using partial writable verlena.
Because largeobject identifier is obvious when we user lo_*() interfaces,
so we can update the main tuple within pg_largeobject also, if the total
length of the verlena was changed on partial writes.

Thus, I would like to implement the feature with focusing on the existing
largeobject interfaces at the first step.
Partial writable TOAST in the user relation will come on the next.
(It also enables to keep the patch size smaller than all-in-a-patch.)

In summary, I try to implement it on the next commit fest.
- GRANT / REVOKE on largeobjects
- DAC access controls on largeobejcts
- It puts the contents of largeobject on TOAST relation. (including hole support)
- It provides largeobejct interfaces to write it partially.

Thanks,

KaiGai Kohei wrote:
> I summarized the design proposal and issues currently we have.
> 
> I would like to see any comments corresponding to the proposition.
> Especially, selection of the snapshot is a headach issue for me.
> 
> ----------------
> This project tries to solve two items listed at:
>   http://wiki.postgresql.org/wiki/Todo#Binary_Data
> 
>  * Add security checks for large objects
>  * Allow read/write into TOAST values like large objects
> 
> = Introduction =
> 
> We need to associate a metadata for a certain largeobject to
> implement security checks for largeobjects. However, the data
> structure of largeobjects are not suitable to manage its
> metadata (such as owner identifier, database acls ...) on
> a certain largeobject, because a largeobject is stored as
> separated page frames in the pg_largeobject system catalog.
> Thus, we need to revise the data structure to manage a certain
> largeobject.
> 
> An interesting fact is a similarity of data structure between
> TOAST table and pg_lageobject.
> A TOAST relation is declared as follows:
>   pg_toast_%u (
>       chunk_id        oid,
>       chunk_seq       int4,
>       chunk_data      bytea,
>       unique(chunk_id, chunk_seq)
>   )
> 
> Definition of the pg_largeobject is as follows:
>   pg_largeobject(
>       loid            oid,
>       pageno          int4,
>       data            bytea,
>       unique(loid, pageno)
>   )
> 
> They have an identical data structure, so it is quite natural
> to utilize TOAST mechanism to store pagef rames of largeobject.
> 
> = Design =
> 
> In my plan, role of the pg_largeobject will be changed to
> manage metadata of largeobjects, and it will be redefined
> as follows:
> 
>   CATALOG(pg_largeobject,2613)
>   {
>       Oid        loowner;     /* OID of the owner */
>       Oid        lonsp;       /* OID of the namespace */
>       aclitem    loacl[1];    /* ACL of the largeobject */
>       Blob       lodata;      /* Contents of the largeobject */
>   } FormData_pg_largeobejct;
> 
> For access controls purpose, its ownership, namespace and ACLs
> are necessary. In addition, the Blob is a new type which support
> to read/write a its partial TOAST value.
> 
> The current lo_xxx() interfaces will perform as a wrapper function
> to access a certain pg_largeobject.lodata identified by a largeobject
> handler. The loread(), lowrite() or similar interfaces will support
> partial accesses on the Blob type. It enables user defined relation
> to contain large data using TOAST mechanism, with reasonable resource
> comsumption. (Note that TOAST replaces whole of the chunks with same
> identifier, even if it changes just a single byte.)
> 
> = Interfaces =
> 
> == New type ==
> 
> We need a new variable length type that has the following feature,
> to allow users partial accesses.
> 
> * It always use external TOAST table, independent from its size.
>  If toasted data is stored as inline, we cannot update it independent
> from the main table.
>  It does not prevent partial read, but meaningless because inlined
> data is enough small.
> 
> * It always store the data without any compression.
>  We cannot easily compute required data offset on the compressed
>  data. All the toasted data need to be uncompressed, for both of
>  reader and writer access.
> 
> == lo_xxx() interfaces ==
> 
> A new version of loread() and lowrite() are necessary to access
> a part of toasted data within user defined tables. It can be defined
> as follows:
> 
>   loread(Blob data, int32 offset, int32 length)
>   lowrite(Blob data, int32 offset, Bytea data)
> 
> == GRANT/REVOKE ==
> 
> When we access traditional largeobjects, reader permission (SELECT)
> or writer permission (UPDATE) should be checked on accesses.
> 
> The GRANT/REVOKE statements are enhanced as follows:
> 
>   GRANT SELECT ON LARGE OBJECT 1234 TO kaigai;
> 
> It allows "kaigai" to read the largeobject: 1234.
> 
> = Issues =
> 
> == New pg_type.typstorage ==
> 
> The variable length data is always necessary to be stored in external
> storage and uncompressed. The existing typstorage does not satisfies
> the requirement, so we need to add a new pg_type.typstorage strategy.
> 
> The new typstorage strategy forces:
>  - It always stores the given varlena data on external toast relation.
>  - It always stores the given varlena data without any compression.
> 
> It will give us performance loss, so existing Text or Bytea will be
> more suitable to store variable length data being not very large.
> 
> == Snapshot ==
> 
> The largeobject interface uses SnapshotNow for writable accesses, and
> GetActiveSnapshot() for read-only accesses, but toast_fetch_datum()
> uses SnapshotToast to scan the toast relation.
> 
> It seems to me SnapshotToast depends on an assumption that tuples
> within TOAST relation does not have any multiple versions.
> When we update a toast value, TOAST mechanism inserts whole of
> variable length datum with a new chunk_id, and older chunks are
> removed at toast_delete_datum().
> The TOAST pointer is updated to the new chunk_id, and its visibility
> is under MVCC controls.
> 
> The source code comments at HeapTupleSatisfiesToast() says as follows:
> 
> /*
>  * HeapTupleSatisfiesToast
>  *      True iff heap tuple is valid as a TOAST row.
>  *
>  * This is a simplified version that only checks for VACUUM moving conditions.
>  * It's appropriate for TOAST usage because TOAST really doesn't want to do
>  * its own time qual checks; if you can see the main table row that contains
>  * a TOAST reference, you should be able to see the TOASTed value.  However,
>  * vacuuming a TOAST table is independent of the main table, and in case such
>  * a vacuum fails partway through, we'd better do this much checking.
>  *
>  * Among other things, this means you can't do UPDATEs of rows in a TOAST
>  * table.
>  */
> 
> If largeobjects-like interface is available to update a part of TOAST
> values, we cannot keep the assumption.
> 
> At the beginning, I have a plan to apply the result od GetActiveSnapshot()
> to fetch toasted value. Is there any matter which can be expected?
> 


-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>