Thread: [PATCH] Largeobject Access Controls (r2432)

[PATCH] Largeobject Access Controls (r2432)

From
KaiGai Kohei
Date:
The attached patch is a revised version of large object privileges
based on the feedback from the last commit fest.

List of updates:
* Rebased to the latest CVS HEAD
* Add pg_largeobject_aclcheck_snapshot() interface.
  In read-only access mode, the large object feature uses the
  query's snapshot, not only SnapshotNow. This behavior should
  also apply to accesses to its metadata.
  When it makes its access control decision, it has to fetch
  the ACLs from the system catalog. Here, we scan
  pg_largeobject_metadata with the snapshot of the large object
  descriptor. (Note that it is SnapshotNow in read-write mode.)
  This resolves the problem of access rights being changed while
  large objects are open.
* Add pg_dump support.
* Replace all occurrences of "largeobject" with "large object"
  in user-visible messages and documentation.
* Remove ac_largeobject_*() routines because we decided
  not to share the same entry points between DAC and MAC.
* Add a description of large object privileges in SGML.
* Add a definition of pg_largeobject_metadata in SGML.
* The \lo_list command supports both v8.5 and prior versions.
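
The snapshot rule above has a user-visible effect, sketched below (the OID,
role, and rights are hypothetical; GRANT/REVOKE ... ON LARGE OBJECT is the
syntax this patch series introduces):

```sql
-- Session 1 (as alice): open large object 16385 read-only in a transaction
BEGIN;
SELECT lo_open(16385, 262144);    -- 262144 = INV_READ (0x40000)
-- Session 2, meanwhile:
--   REVOKE SELECT ON LARGE OBJECT 16385 FROM alice;
-- Session 1: reads through the already-open descriptor still succeed,
-- because the ACL is checked under the descriptor's snapshot, which
-- predates the REVOKE.  A newly opened descriptor would be rejected.
SELECT loread(0, 1024);           -- 0 = descriptor returned by lo_open
COMMIT;
```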

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: [PATCH] Largeobject Access Controls (r2432)

From
Jaime Casanova
Date:
2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> The attached patch is a revised version of large object privileges
> based on the feedbacks at the last commit fest.
>

Please update the patch; it's giving an error when 'make check' is
trying to "create template1" in initdb:

creating template1 database in
/home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted


Meanwhile, I will make some comments:

This manual will be specific to 8.5, so I think all mentions of the
version should be removed. For example:

+    In this version, a large object has OID of its owner, access permissions
+    and OID of the largeobject itself.

+         Prior to the version 8.5.x release does not have any
privilege checks on
+   large objects.

The parameter name (large_object_privilege_checks) is confusing enough
that we have to make these statements to clarify it... let's think of a
better, less confusing name.
+         Please note that it is not equivalent to disable all the security
+         checks corresponding to large objects.
+         For example, the <literal>lo_import()</literal> and
+         <literal>lo_export</literal> need superuser privileges independent
+         from this setting as prior versions were doing.

Will this not be off by default? It should be, for compatibility
reasons... I remember there was a discussion about that but I can't
remember the conclusion.

Mmm... One of them? the first?
+     The one is <literal>SELECT</literal>.

+     Even if a transaction modified access rights and commit it, it is
+     not invisible from other transaction which already opened the large
+     object.

The other one, the second
+     The other is <literal>UPDATE</literal>.


it seems there is an "are" that should not be there :)
+
+     These functions are originally requires database superuser privilege,
+     and it allows to bypass the default database privilege checks, so
+     we don't need to check an obvious test twice.

a typo, obviously
+        For largeo bjects, this privilege also allows to read from
+        the target large object.


We have two versions of these functions: one that receives a Snapshot
parameter and one that doesn't...
What is the rationale for this? AFAIU, the one that doesn't receive a
Snapshot calls the other one with SnapshotNow; can't we simply call it
that way and drop the version of the functions that doesn't have that
parameter?
+ pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
+                      AclMode mask, AclMaskHow how)

+ pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


Re: [PATCH] Largeobject Access Controls (r2432)

From
Robert Haas
Date:
On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> This manual will be specific for 8.5 so i think all mentions to the
> version should be removed

Not sure I agree on this point.  We have similar mentions elsewhere.

...Robert


Re: [PATCH] Largeobject Access Controls (r2432)

From
Greg Smith
Date:
Robert Haas wrote:
> On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
>> This manual will be specific for 8.5 so i think all mentions to the
>> version should be removed
>
> Not sure I agree on this point.  We have similar mentions elsewhere.

In this particular example, it's bad form because it's even possible
that 8.5 will actually be 9.0.  You don't want to refer to a version
number that doesn't even exist for sure yet, lest it leave a loose end
that needs to be cleaned up later if that number is changed before
release.

Rewriting in terms like "in earlier versions..." instead is one
approach.  Then people will have to manually scan earlier docs to sort
that out; I know I end up doing that all the time.  If you want to
keep the note specific, saying "in 8.4 and earlier versions [old
behavior]" is better than "before 8.5 [old behavior]" because it only
mentions version numbers that are historical rather than future.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com

Re: [PATCH] Largeobject Access Controls (r2432)

From
Robert Haas
Date:
On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Robert Haas wrote:
> On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
>
> This manual will be specific for 8.5 so i think all mentions to the
> version should be removed
>
> Not sure I agree on this point.  We have similar mentions elsewhere.
>
> In this particular example, it's bad form because it's even possible that
> 8.5 will actually be 9.0.  You don't want to refer to a version number that
> doesn't even exist for sure yet, lest it leave a loose end that needs to be
> cleaned up later if that number is changed before release.
>
> Rewriting in terms like "in earlier versions..." instead is one approach.
> Then people will have to manually scan earlier docs to sort that out, I know
> I end up doing that all the time.  If you want to keep the note specific,
> saying "in 8.4 and earlier versions [old behavior]" is better than "before
> 8.5 [old behavior]" because it only mentions version numbers that are
> historical rather than future.

Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
releases".  Compare:

http://www.postgresql.org/docs/8.4/static/sql-copy.html#AEN55855
http://www.postgresql.org/docs/8.4/static/runtime-config-logging.html#GUC-LOG-FILENAME

...Robert


Re: [PATCH] Largeobject Access Controls (r2432)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> In this particular example, it's bad form because it's even possible that
>> 8.5 will actually be 9.0.  You don't want to refer to a version number that
>> doesn't even exist for sure yet, lest it leave a loose end that needs to be
>> cleaned up later if that number is changed before release.

> Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
> releases".  Compare:

Please do *not* resort to awkward constructions just to avoid one
mention of the current version number.  If we did decide to call the
next version 9.0, the search-and-replace effort involved is not going
to be measurably affected by any one usage.  There are plenty already.

(I did the work when we decided to call 7.5 8.0, so I know whereof
I speak.)
        regards, tom lane


Re: [PATCH] Largeobject Access Controls (r2432)

From
Robert Haas
Date:
On Thu, Dec 3, 2009 at 2:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>>> In this particular example, it's bad form because it's even possible that
>>> 8.5 will actually be 9.0.  You don't want to refer to a version number that
>>> doesn't even exist for sure yet, lest it leave a loose end that needs to be
>>> cleaned up later if that number is changed before release.
>
>> Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
>> releases".  Compare:
>
> Please do *not* resort to awkward constructions just to avoid one
> mention of the current version number.  If we did decide to call the
> next version 9.0, the search-and-replace effort involved is not going
> to be measurably affected by any one usage.  There are plenty already.
>
> (I did the work when we decided to call 7.5 8.0, so I know whereof
> I speak.)

I agree that search and replace isn't that hard, but I don't find the
proposed construction awkward, and we have various uses of it in the
docs already.  Actually the COPY one is not quite clear whether it
means <= 7.3 or < 7.3.  I think we're just aiming for consistency here
as much as anything.

...Robert


Re: [PATCH] Largeobject Access Controls (r2432)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I agree that search and replace isn't that hard, but I don't find the
> proposed construction awkward, and we have various uses of it in the
> docs already.  Actually the COPY one is not quite clear whether it
> means <= 7.3 or < 7.3.  I think we're just aiming for consistency here
> as much as anything.

Well, the problem is that "<= 8.4" is confusing as to whether it
includes 8.4.n.  You and I know that it does because we know we
don't make feature changes in minor releases, but this is not
necessarily obvious to everyone.  "< 8.5" is much less ambiguous.
        regards, tom lane


Re: [PATCH] Largeobject Access Controls (r2432)

From
Robert Haas
Date:
On Thu, Dec 3, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I agree that search and replace isn't that hard, but I don't find the
>> proposed construction awkward, and we have various uses of it in the
>> docs already.  Actually the COPY one is not quite clear whether it
>> means <= 7.3 or < 7.3.  I think we're just aiming for consistency here
>> as much as anything.
>
> Well, the problem is that "<= 8.4" is confusing as to whether it
> includes 8.4.n.  You and I know that it does because we know we
> don't make feature changes in minor releases, but this is not
> necessarily obvious to everyone.  "< 8.5" is much less ambiguous.

Ah.  I would not have considered that, but it does make sense.

...Robert


Re: [PATCH] Largeobject Access Controls (r2432)

From
Greg Smith
Date:
Robert Haas wrote:
> I agree that search and replace isn't that hard, but I don't find the
> proposed construction awkward, and we have various uses of it in the
> docs already.  Actually the COPY one is not quite clear whether it
> means <= 7.3 or < 7.3. 
>   
Yeah, I wouldn't have suggested it if it made the wording particularly 
difficult in the process.  I don't know what your issue with the COPY 
one is:

"The following syntax was used before PostgreSQL version 7.3 and is 
still supported"

I can't parse that as anything other than "<7.3"; not sure how someone 
can read that as "<="?

In any case, the two examples you gave are certainly good for showing 
the standard practices used here.  Specific version numbers are strewn 
all about, and if there are commits mentioning 8.5 already in there, one 
more won't hurt.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: [PATCH] Largeobject Access Controls (r2432)

From
KaiGai Kohei
Date:
Jaime Casanova wrote:
> 2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> The attached patch is a revised version of large object privileges
>> based on the feedbacks at the last commit fest.
>>
> 
> please update the patch, it's giving an error when 'make check' is
> trying to "create template1" in initdb:
> 
> creating template1 database in
> /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
> ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
> File: "md.c", Line: 254)
> child process was terminated by signal 6: Aborted

I could not reproduce it.
Could you run "make clean", then "make check"?
Various kinds of patches are merged during the commit fest, and some of
them change the definitions of structures. If *.o files were built
against older definitions, they may refer to incorrect addresses.


> Meanwhile, i will make some comments:
> 
> This manual will be specific for 8.5 so i think all mentions to the
> version should be removed
> for example;
> 
> +    In this version, a large object has OID of its owner, access permissions
> +    and OID of the largeobject itself.
> 
> +         Prior to the version 8.5.x release does not have any
> privilege checks on
> +   large objects.

The conclusion is unclear to me.

Is "In the 8.4.x and prior release, ..." an ambiguous expression?

> the parameter name (large_object_privilege_checks) is confusing enough
> that we have to make this statements to clarify... let's think in a
> better less confuse name
> +         Please note that it is not equivalent to disable all the security
> +         checks corresponding to large objects.
> +         For example, the <literal>lo_import()</literal> and
> +         <literal>lo_export</literal> need superuser privileges independent
> +         from this setting as prior versions were doing.

In the last commit fest, it was named "largeobject_compat_acl",
but Tom Lane did not find it preferable, so he suggested renaming it
to "large_object_privilege_checks".

Other candidates:
- lo_compat_privileges  (<- my preference of these four)
- large_object_compat_privs
- lo_compat_access_control
- large_object_compat_ac

I think "_compat_" should be included to emphasize that it is a
compatibility option.


> this will not be off by default? it should be for compatibility
> reasons... i remember there was a discussion about that but can't
> remember the conclusion

IIRC, we had no discussion about its default value, although similar
topics were discussed:

* What should be checked on creation of a large object?
  -> No need to check permission on its creation. It allows everyone to
     create a new large object, like the current implementation.
     (Also note that this behavior may be changed in the future.)

* Should DELETE be checked on deletion of a large object?
  -> No. PgSQL checks ownership of database objects on their deletion,
     such as DROP TABLE. The DELETE permission is checked when we delete
     the contents of a certain database object, not the database object
     itself.

> Mmm... One of them? the first?
> +     The one is <literal>SELECT</literal>.
> 
> +     Even if a transaction modified access rights and commit it, it is
> +     not invisible from other transaction which already opened the large
> +     object.
> 
> The other one, the second
> +     The other is <literal>UPDATE</literal>.

I have no argument about the English expression.

BTW, "The one is ..., the other is ..." was a textbook pattern for
introducing two things. :-)

> it seems there is an "are" that should not be there :)
> +
> +     These functions are originally requires database superuser privilege,
> +     and it allows to bypass the default database privilege checks, so
> +     we don't need to check an obvious test twice.
> 
> a typo, obviously
> +        For largeo bjects, this privilege also allows to read from
> +        the target large object.

Thanks, I see.

> We have two versions of these functions one that a recieve an SnapShot
> parameter and other that don't...
> what is the rationale of this? AFAIU, the one that doesn't receive
> SnapShot is calling the other one with SnapShotNow, can't we simply
> call it that way and drop the version of the functions that doesn't
> have that parameter?
> + pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
> +                      AclMode mask, AclMaskHow how)
> 
> + pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

We have no reason other than cosmetics.

In the current implementation, all callers of pg_largeobject_aclcheck_*()
need to provide the correct snapshot, including SnapshotNow when the
object is opened read-write.
When pg_aclmask() calls pg_largeobject_aclmask(), that is the only case
where the caller assumes SnapshotNow is applied implicitly.

On the other hand, in every case where pg_largeobject_ownercheck() is
called, the caller assumes SnapshotNow is applied, so we don't have
multiple versions of it.

So, I'll reorganize these APIs as follows:
- pg_largeobject_aclmask_snapshot()
- pg_largeobject_aclcheck_snapshot()
- pg_largeobject_ownercheck()

Thanks; please wait for the revised version.
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] Largeobject Access Controls (r2432)

From
Itagaki Takahiro
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > creating template1 database in
> > /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
> > ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
> > File: "md.c", Line: 254)
> > child process was terminated by signal 6: Aborted
> 
> I could not reproduce it.

I had the same trap before, when I mistakenly used duplicated OIDs.
Did you perhaps add a new catalog with already-used OIDs?
src/include/catalog/duplicate_oids might help.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [PATCH] Largeobject Access Controls (r2432)

From
KaiGai Kohei
Date:
Itagaki Takahiro wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
>>> creating template1 database in
>>> /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
>>> ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
>>> File: "md.c", Line: 254)
>>> child process was terminated by signal 6: Aborted
>> I could not reproduce it.
> 
> I had the same trap before when I mistakenly used duplicated oids.
> Don't you add a new catalog with existing oids?
> src/include/catalog/duplicate_oids might be a help.

Thanks, Bingo!

toasting.h:DECLARE_TOAST(pg_trigger, 2336, 2337);

pg_largeobject_metadata.h:CATALOG(pg_largeobject_metadata,2336)

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


[PATCH] Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
The attached patch is an updated revision of Largeobject Access Controls.

List of updates:
* rebased to the latest CVS HEAD

* SGML documentation fixes:
  - The future version number was replaced with:
    "In the 8.4.x series and earlier release, ..."
  - Other strange English expressions and typos were fixed.

* Fixed OID conflicts in system catalog definition.
  The new TOAST relation for pg_trigger used same OID number with
  pg_largeobject_metadata.

* Fixed incorrect error code in pg_largeobject_ownercheck().
  It raised _UNDEFINED_FUNCTION, but should be _UNDEFINED_OBJECT.

* Renamed GUC parameter to "lo_compat_privileges" from
  "large_object_privilege_checks".

* pg_largeobject_aclmask() and pg_largeobject_aclcheck(), which did
  not take a snapshot argument, were removed.
  Now the callers provide an appropriate snapshot themselves.

Thanks,

Jaime Casanova wrote:
> 2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> The attached patch is a revised version of large object privileges
>> based on the feedbacks at the last commit fest.
>>
>
> please update the patch, it's giving an error when 'make check' is
> trying to "create template1" in initdb:
>
> creating template1 database in
> /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
> ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
> File: "md.c", Line: 254)
> child process was terminated by signal 6: Aborted
>
>
> Meanwhile, i will make some comments:
>
> This manual will be specific for 8.5 so i think all mentions to the
> version should be removed
> for example;
>
> +    In this version, a large object has OID of its owner, access permissions
> +    and OID of the largeobject itself.
>
> +         Prior to the version 8.5.x release does not have any
> privilege checks on
> +   large objects.
>
> the parameter name (large_object_privilege_checks) is confusing enough
> that we have to make this statements to clarify... let's think in a
> better less confuse name
> +         Please note that it is not equivalent to disable all the security
> +         checks corresponding to large objects.
> +         For example, the <literal>lo_import()</literal> and
> +         <literal>lo_export</literal> need superuser privileges independent
> +         from this setting as prior versions were doing.
>
> this will not be off by default? it should be for compatibility
> reasons... i remember there was a discussion about that but can't
> remember the conclusion
>
> Mmm... One of them? the first?
> +     The one is <literal>SELECT</literal>.
>
> +     Even if a transaction modified access rights and commit it, it is
> +     not invisible from other transaction which already opened the large
> +     object.
>
> The other one, the second
> +     The other is <literal>UPDATE</literal>.
>
>
> it seems there is an "are" that should not be there :)
> +
> +     These functions are originally requires database superuser privilege,
> +     and it allows to bypass the default database privilege checks, so
> +     we don't need to check an obvious test twice.
>
> a typo, obviously
> +        For largeo bjects, this privilege also allows to read from
> +        the target large object.
>
>
> We have two versions of these functions one that a recieve an SnapShot
> parameter and other that don't...
> what is the rationale of this? AFAIU, the one that doesn't receive
> SnapShot is calling the other one with SnapShotNow, can't we simply
> call it that way and drop the version of the functions that doesn't
> have that parameter?
> + pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
> +                      AclMode mask, AclMaskHow how)
>
> + pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)
>


--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: [PATCH] Largeobject Access Controls (r2460)

From
Greg Smith
Date:
I just looked over the latest version of this patch and it seems to 
address all the issues raised in the initial review.  This looks like 
it's ready for a committer from a quality perspective and I'm going to 
mark it as such.

I have a guess what some of the first points of discussion are going to 
be though, so might as well raise them here.  This patch is 2.8K lines 
of code that's in a lot of places:  a mix of full new functions, tweaks 
to existing ones, docs, regression tests, it's a well structured but 
somewhat heavy bit of work.  One obvious question is whether there's 
enough demand for access controls on large objects to justify adding the 
complexity involved to do so.  A second thing I'm concerned about is 
what implications this change would have for in-place upgrades.  If 
there's demand and it's not going to cause upgrade issues, then we just 
need to find a committer willing to chew on it.  I think those are the 
main hurdles left for this patch.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: [PATCH] Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Greg Smith wrote:
> I just looked over the latest version of this patch and it seems to 
> satisfy all the issues suggested by the initial review.  This looks like 
> it's ready for a committer from a quality perspective and I'm going to 
> mark it as such.

Thanks for your efforts.

> I have a guess what some of the first points of discussion are going to 
> be though, so might as well raise them here.  This patch is 2.8K lines 
> of code that's in a lot of places:  a mix of full new functions, tweaks 
> to existing ones, docs, regression tests, it's a well structured but 
> somewhat heavy bit of work.  One obvious questions is whether there's 
> enough demand for access controls on large objects to justify adding the 
> complexity involved to do so.

At least, it is a TODO item in the community:
http://wiki.postgresql.org/wiki/Todo#Binary_Data

Apart from SELinux, it is quite natural to apply access controls to
binary data. If we cannot offer valid access controls, users will not
want to store their sensitive information, such as confidential PDF
files, as large objects.

> A second thing I'm concerned about is 
> what implications this change would have for in-place upgrades.  If 
> there's demand and it's not going to cause upgrade issues, then we just 
> need to find a committer willing to chew on it.  I think those are the 
> main hurdles left for this patch.

I guess we need to create an empty entry with the given OID in
pg_largeobject_metadata for each large object when we upgrade in place
from an 8.4.x or earlier release to the upcoming release.
However, there are no format changes in the pg_largeobject catalog
itself (including for an empty large object), so I guess we only need
a small amount of additional support in pg_dump to create the empty
metadata.

Any suggestions here are welcome.
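
One possible shape for that pg_dump support, sketched purely as an
assumption (the OID and role names are hypothetical; lo_create(oid),
ALTER LARGE OBJECT ... OWNER TO, and GRANT ... ON LARGE OBJECT are the
interfaces this patch series provides):

```sql
-- For each pre-8.5 large object, emit statements that recreate its
-- metadata without touching the data pages themselves:
SELECT lo_create(16385);                      -- create the metadata entry
ALTER LARGE OBJECT 16385 OWNER TO alice;      -- restore ownership
GRANT SELECT, UPDATE ON LARGE OBJECT 16385 TO bob;  -- restore any ACLs
```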

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [PATCH] Largeobject Access Controls (r2460)

From
Jaime Casanova
Date:
On Sun, Dec 6, 2009 at 11:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> I just looked over the latest version of this patch and it seems to satisfy
> all the issues suggested by the initial review.  This looks like it's ready
> for a committer from a quality perspective and I'm going to mark it as such.
>

Yes. I have just finished my tests and it seems like the patch is
working just fine...

BTW, it seems KaiGai missed this comment in
src/backend/catalog/pg_largeobject.c when renaming the parameter:
* large_object_privilege_checks is not refered here,

I still don't like the name, but we have changed it a lot of times, so
if anyone has a better idea, now is the time to speak.

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


Re: [PATCH] Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Jaime Casanova wrote:
> On Sun, Dec 6, 2009 at 11:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> I just looked over the latest version of this patch and it seems to satisfy
>> all the issues suggested by the initial review.  This looks like it's ready
>> for a committer from a quality perspective and I'm going to mark it as such.
>>
>
> yes. i have just finished my tests and seems like the patch is working
> just fine...
>
> BTW, seems like KaiGai miss this comment in
> src/backend/catalog/pg_largeobject.c when renaming the parameter
> * large_object_privilege_checks is not refered here,
>
> i still doesn't like the name but we have changed it a lot of times so
> if anyone has a better idea now is when you have to speak

Oops, it should be fixed to "lo_compat_privileges".
This comment also has the version-number issue, so I fixed it as follows:

BEFORE:
    /*
     * large_object_privilege_checks is not refered here,
     * because it is a compatibility option, but we don't
     * have ALTER LARGE OBJECT prior to the v8.5.0.
     */

AFTER:
     /*
      * The 'lo_compat_privileges' is not checked here, because we
      * don't have any access control features in the 8.4.x series
      * or earlier release.
      * So, it is not a place we can define a compatible behavior.
      */

Nothing is changed elsewhere, including anything corresponding to
in-place upgrading. I'm waiting for suggestions.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: [PATCH] Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> Apart from SELinux, it is quite natural to apply any access
> controls on binary data. If we could not have any valid access
> controls, users will not want to store their sensitive
> information, such as confidential PDF files, as a large object.
Absolutely.  The historical security issues for large objects
immediately eliminated them as a possible place to store PDF files.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
Hi, I'm reviewing LO-AC patch.

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> Nothing are changed in other codes, including something corresponding to
> in-place upgrading. I'm waiting for suggestion.

I have a question about the behavior -- the patch adds ownership
management of large objects. Non-privileged users cannot read, write,
or drop others' LOs, but they can read the contents of a large object
by reading pg_catalog.pg_largeobject directly. Even if the patch
is applied, we still allow "SELECT * FROM pg_largeobject" ...right?

This issue might be solved by the core SE-PgSQL patch,
but what should we do for now?


Other changes in the patch seem to be reasonable.

"GRANT/REVOKE ON LARGE OBJECT <number>" might be hard to use if used alone,
but we can use the commands as dynamic SQLs in DO statements if we want to
grant or revoke privileges in bulk.

"SELECT oid FROM pg_largeobject_metadata" is used in some places instead of
"SELECT DISTINCT loid FROM pg_largeobject". They return the same result,
but the former will be faster because we don't use DISTINCT. pg_dump will
be slightly accelerated by the new query.
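
The bulk-grant idea above can be made concrete with a DO statement; a
sketch assuming a role named "reader" exists (the loop builds each GRANT
dynamically because GRANT does not accept a subquery):

```sql
-- Grant read access on every existing large object to role "reader".
DO $$
DECLARE
    lo oid;
BEGIN
    FOR lo IN SELECT oid FROM pg_largeobject_metadata LOOP
        EXECUTE 'GRANT SELECT ON LARGE OBJECT ' || lo || ' TO reader';
    END LOOP;
END
$$;
```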

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Takahiro Itagaki wrote:
> Hi, I'm reviewing LO-AC patch.
>
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>> Nothing are changed in other codes, including something corresponding to
>> in-place upgrading. I'm waiting for suggestion.
>
> I have a question about the behavior -- the patch adds ownership
> management of large objects. Non-privileged users cannot read, write,
> or drop othere's LOs. But they can read the contents of large object
> when they read pg_catalog.pg_largeobject directly. Even if the patch
> is applied, we still allow "SELECT * FROM pg_largeobject" ...right?
>
> This issue might be solved by the core SE-PgSQL patch,
> but what should we do fo now?

Oops, I forgot to fix it.

It is a misconfiguration in database initialization, and not an issue
related to the SE-PgSQL feature.

It can be solved by revoking all privileges from everybody in the initdb
phase. So, we should inject the following statement into setup_privileges():

  REVOKE ALL ON pg_largeobject FROM PUBLIC;

In the default PG model, the database superuser is an exception to
access controls, so he can bypass the checks anyway; there is no
difference even if he can see pg_largeobject.
For unprivileged users, this configuration restricts access to large
objects to the lo_*() functions, so we can capture all their accesses
and apply permission checks comprehensively.

If the database superuser intends something malicious, such as
exposing every large object to the public, we cannot prevent it anyway.
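The intended effect can be sketched as a psql session (the role name is hypothetical, and 262144 is the numeric value of INV_READ):

```sql
-- After "REVOKE ALL ON pg_largeobject FROM PUBLIC;" at initdb time,
-- an unprivileged user can no longer read other users' data directly:
SET SESSION AUTHORIZATION some_user;
SELECT * FROM pg_catalog.pg_largeobject;  -- ERROR: permission denied
-- Access must instead go through the lo_*() functions, which apply the
-- per-object permission checks:
SELECT lo_open(1234, 262144);             -- checked against the object's ACL
```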

> Other changes in the patch seem to be reasonable.
>
> "GRANT/REVOKE ON LARGE OBJECT <number>" might be hard to use if used alone,
> but we can use the commands as dynamic SQLs in DO statements if we want to
> grant or revoke privileges in bulk.

We already have COMMENT ON LARGE OBJECT <number> IS <comment>; statement :-)

> "SELECT oid FROM pg_largeobject_metadata" is used in some places instead of
> "SELECT DISTINCT loid FROM pg_largeobject". They return the same result,
> but the former will be faster because we don't use DISTINCT. pg_dump will
> be slightly accelerated by the new query.
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>
>


--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
*** base/src/bin/initdb/initdb.c    2009-11-21 05:52:12.000000000 +0900
--- blob/src/bin/initdb/initdb.c    2009-12-13 06:33:55.000000000 +0900
*************** setup_privileges(void)
*** 1783,1788 ****
--- 1783,1789 ----
          "  WHERE relkind IN ('r', 'v', 'S') AND relacl IS NULL;\n",
          "GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
          "GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
+         "REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
          NULL
      };


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > we still allow "SELECT * FROM pg_largeobject" ...right?
> 
> It can be solved with revoking any privileges from anybody in the initdb
> phase. So, we should inject the following statement for setup_privileges().
> 
>   REVOKE ALL ON pg_largeobject FROM PUBLIC;

OK, I'll add the following description in the documentation of pg_largeobject.
   <structname>pg_largeobject</structname> should not be readable by the
   public, since the catalog contains data in large objects of all users.
   <structname>pg_largeobject_metadata</> is a publicly readable catalog
   that only contains identifiers of large objects.

> {"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
>     gettext_noop("Turn on/off privilege checks on large objects."),

The description is true, but it is confusing because
"lo_compat_privileges = on" means "privilege checks are turned off".

  short desc: Enables backward compatibility in privilege checks on large objects
  long desc: When turned on, privilege checks on large objects are disabled.

Are those descriptions appropriate?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
>>> we still allow "SELECT * FROM pg_largeobject" ...right?
>> It can be solved with revoking any privileges from anybody in the initdb
>> phase. So, we should inject the following statement for setup_privileges().
>>
>>   REVOKE ALL ON pg_largeobject FROM PUBLIC;
> 
> OK, I'll add the following description in the documentation of pg_largeobject.
> 
>    <structname>pg_largeobject</structname> should not be readable by the
>    public, since the catalog contains data in large objects of all users.
>    <structname>pg_largeobject_metadata</> is a publicly readable catalog
>    that only contains identifiers of large objects.

It makes sense to me.

>> {"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
>>     gettext_noop("Turn on/off privilege checks on large objects."),
> 
> The description is true, but gives a confusion because
> "lo_compat_privileges = on" means "privilege checks are turned off".
> 
>   short desc: Enables backward compatibility in privilege checks on large objects
>   long desc: When turned on, privilege checks on large objects are disabled.
> 
> Are those descriptions appropriate?

The long description is a bit confusing, because the setting does not disable
all privilege checks: lo_export/lo_import already checked for superuser
privilege in earlier releases, and those checks remain.

What's your opinion about:

  long desc: When turned on, privilege checks on large objects perform with
             backward compatibility as 8.4.x or earlier releases.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> What's your opinion about:
>   long desc: When turned on, privilege checks on large objects perform with
>              backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested.

Applied with minor editorialization,
mainly around tab-completion support in psql.

# This is my first commit :)

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> OK, I'll add the following description in the documentation of pg_largeobject.

>    <structname>pg_largeobject</structname> should not be readable by the
>    public, since the catalog contains data in large objects of all users.

This is going to be a problem, because it will break applications that
expect to be able to read pg_largeobject.  Like, say, pg_dump.
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Tom Lane wrote:
> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>> OK, I'll add the following description in the documentation of pg_largeobject.
> 
>>    <structname>pg_largeobject</structname> should not be readable by the
>>    public, since the catalog contains data in large objects of all users.
> 
> This is going to be a problem, because it will break applications that
> expect to be able to read pg_largeobject.  Like, say, pg_dump.

Is it a right behavior, even if we have permission checks on large objects?

If so, we can inject a hardwired rule that prevents selecting from
pg_largeobject when lo_compat_privileges is turned off, instead of the
REVOKE ALL FROM PUBLIC.

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Jaime Casanova
Date:
2009/12/10 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>
> If so, we can inject a hardwired rule to prevent to select pg_largeobject
> when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.
>

It doesn't seem like a good idea to make that GUC act like a GRANT or
REVOKE in the case of pg_largeobject.
Besides, if a normal user can read from pg_class, why should we deny
pg_largeobject?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> Tom Lane wrote:
> > Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> >>    <structname>pg_largeobject</structname> should not be readable by the
> >>    public, since the catalog contains data in large objects of all users.
> > 
> > This is going to be a problem, because it will break applications that
> > expect to be able to read pg_largeobject.  Like, say, pg_dump.
> 
> Is it a right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?
  REVOKE ALL ON pg_largeobject FROM PUBLIC;
=> GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
replace pg_largeobject_metadata instead if we try to fix only pg_dump,
but it's no wonder that any other user applications use such queries.
I think to allow reading loid is a balanced solution.
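With the column-level approach, the resulting behavior for a non-superuser would look roughly like this (a sketch, mirroring the proposed initdb setup):

```sql
-- Setup performed at initdb time:
--   REVOKE ALL ON pg_largeobject FROM PUBLIC;
--   GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;
-- Then, connected as an unprivileged user:
SELECT loid FROM pg_largeobject LIMIT 1;  -- allowed: only the loid column
SELECT *    FROM pg_largeobject LIMIT 1;  -- ERROR: permission denied
```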

> If so, we can inject a hardwired rule to prevent to select pg_largeobject
> when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

Is it enough to run "GRANT SELECT ON pg_largeobject TO PUBLIC" ?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
>> Tom Lane wrote:
>>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>>>    <structname>pg_largeobject</structname> should not be readable by the
>>>>    public, since the catalog contains data in large objects of all users.
>>> This is going to be a problem, because it will break applications that
>>> expect to be able to read pg_largeobject.  Like, say, pg_dump.
>> Is it a right behavior, even if we have permission checks on large objects?
> 
> Can we use column-level access control here?
> 
>    REVOKE ALL ON pg_largeobject FROM PUBLIC;
> => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

Indeed, that seems reasonable to me.

> We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
> replace pg_largeobject_metadata instead if we try to fix only pg_dump,
> but it's no wonder that any other user applications use such queries.
> I think to allow reading loid is a balanced solution.

Right. I should also note that this query needs to be fixed for another
reason: if all the large objects are empty, it can return nothing, even
though large object entries exist in pg_largeobject_metadata.

Please wait for a while.

>> If so, we can inject a hardwired rule to prevent to select pg_largeobject
>> when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.
> 
> Is it enough to run "GRANT SELECT ON pg_largeobject TO PUBLIC" ?

Agreed.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
Jaime Casanova <jcasanov@systemguards.com.ec> wrote:

> besides if a normal user can read from pg_class why we deny pg_largeobject

pg_class and pg_largeobject_metadata contain only the metadata of objects.
Tables and pg_largeobject contain the actual data of the objects. A normal user
can read pg_class, but cannot read the contents of a table without privileges.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
KaiGai Kohei wrote:
> Takahiro Itagaki wrote:
>> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>>
>>> Tom Lane wrote:
>>>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>>>>    <structname>pg_largeobject</structname> should not be readable by the
>>>>>    public, since the catalog contains data in large objects of all users.
>>>> This is going to be a problem, because it will break applications that
>>>> expect to be able to read pg_largeobject.  Like, say, pg_dump.
>>> Is it a right behavior, even if we have permission checks on large objects?
>> Can we use column-level access control here?
>>
>>    REVOKE ALL ON pg_largeobject FROM PUBLIC;
>> => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;
>
> Indeed, it seems to me reasonable.
>
>> We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
>> replace pg_largeobject_metadata instead if we try to fix only pg_dump,
>> but it's no wonder that any other user applications use such queries.
>> I think to allow reading loid is a balanced solution.
>
> Right, I also remind this query has to be fixed up by other reason right now.
> If all the large objects are empty, this query can return nothing, even if
> large object entries are in pg_largeobject_metadata.
>
> Please wait for a while.

The attached patch fixes these matters.

* It adds "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;" during initdb
  phase to resolve the matter pointed out.

* A few queries in pg_dump were fixed to select pg_largeobject_metadata
  instead of pg_largeobject. If a dumpable large obejct is empty (it means
  no page frames are on pg_largeobject), pg_dump misunderstand no such
  large object is here.
  We have to reference pg_largeobject_metadata to check whether a certain
  large objct exists, or not.

Thanks,

$ diffstat ~/pgsql-blob-priv-fix.patch
 doc/src/sgml/catalogs.sgml               |    3 !!!
 src/bin/initdb/initdb.c                  |    1 +
 src/bin/pg_dump/pg_dump.c                |    8 !!!!!!!!
 src/test/regress/expected/privileges.out |   15 +++++++++++++++
 src/test/regress/sql/privileges.sql      |    8 ++++++++
 5 files changed, 24 insertions(+), 11 modifications(!)
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
*** base/doc/src/sgml/catalogs.sgml    (revision 2467)
--- base/doc/src/sgml/catalogs.sgml    (working copy)
***************
*** 3136,3142 ****

    <para>
     <structname>pg_largeobject</structname> should not be readable by the
!    public, since the catalog contains data in large objects of all users.
     <structname>pg_largeobject_metadata</> is a publicly readable catalog
     that only contains identifiers of large objects.
    </para>
--- 3136,3143 ----

    <para>
     <structname>pg_largeobject</structname> should not be readable by the
!    public (except for <structfield>loid</structfield>), since the catalog
!    contains data in large objects of all users.
     <structname>pg_largeobject_metadata</> is a publicly readable catalog
     that only contains identifiers of large objects.
    </para>
*** base/src/test/regress/sql/privileges.sql    (revision 2467)
--- base/src/test/regress/sql/privileges.sql    (working copy)
***************
*** 565,570 ****
--- 565,578 ----
  SELECT lo_unlink(1002);
  SELECT lo_export(1001, '/dev/null');            -- to be denied

+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;            -- to be denied
+ SELECT loid FROM pg_largeobject LIMIT 0;
+
  -- test default ACLs
  \c -

*** base/src/test/regress/expected/privileges.out    (revision 2467)
--- base/src/test/regress/expected/privileges.out    (working copy)
***************
*** 1041,1046 ****
--- 1041,1061 ----
  SELECT lo_export(1001, '/dev/null');            -- to be denied
  ERROR:  must be superuser to use server-side lo_export()
  HINT:  Anyone can use the client-side lo_export() provided by libpq.
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+  loid | pageno | data
+ ------+--------+------
+ (0 rows)
+
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;            -- to be denied
+ ERROR:  permission denied for relation pg_largeobject
+ SELECT loid FROM pg_largeobject LIMIT 0;
+  loid
+ ------
+ (0 rows)
+
  -- test default ACLs
  \c -
  CREATE SCHEMA testns;
*** base/src/bin/initdb/initdb.c    (revision 2467)
--- base/src/bin/initdb/initdb.c    (working copy)
***************
*** 1784,1789 ****
--- 1784,1790 ----
          "GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
          "GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
          "REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
+         "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;\n",
          NULL
      };

*** base/src/bin/pg_dump/pg_dump.c    (revision 2467)
--- base/src/bin/pg_dump/pg_dump.c    (working copy)
***************
*** 1945,1951 ****
      selectSourceSchema("pg_catalog");

      /* Check for BLOB OIDs */
!     if (AH->remoteVersion >= 70100)
          blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
      else
          blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
--- 1945,1953 ----
      selectSourceSchema("pg_catalog");

      /* Check for BLOB OIDs */
!     if (AH->remoteVersion >= 80500)
!         blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
!     else if (AH->remoteVersion >= 70100)
          blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
      else
          blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
***************
*** 1981,1987 ****
      selectSourceSchema("pg_catalog");

      /* Cursor to get all BLOB OIDs */
!     if (AH->remoteVersion >= 70100)
          blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
      else
          blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
--- 1983,1991 ----
      selectSourceSchema("pg_catalog");

      /* Cursor to get all BLOB OIDs */
!     if (AH->remoteVersion >= 80500)
!         blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
!     else if (AH->remoteVersion >= 70100)
          blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
      else
          blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> The attached patch fixes these matters.

I'll start to check it.

>   We have to reference pg_largeobject_metadata to check whether a certain
>   large objct exists, or not.

What is the situation where there is a row in pg_largeobject_metadata
and no corresponding rows in pg_largeobject? Do we have a method to
delete all rows in pg_largeobject but leave some metadata?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
>> The attached patch fixes these matters.
> 
> I'll start to check it.

Thanks,

>>   We have to reference pg_largeobject_metadata to check whether a certain
>>   large objct exists, or not.
> 
> What is the situation where there is a row in pg_largeobject_metadata
> and no corresponding rows in pg_largeobject? Do we have a method to
> delete all rows in pg_largeobject but leave some metadata?

It is the case when we create a new large object but write nothing:

  postgres=# SELECT lo_create(1234);
   lo_create
  -----------
        1234
  (1 row)

  postgres=# SELECT * FROM pg_largeobject_metadata WHERE oid = 1234;
   lomowner | lomacl
  ----------+--------
         10 |
  (1 row)

  postgres=# SELECT * FROM pg_largeobject WHERE loid = 1234;
   loid | pageno | data
  ------+--------+------
  (0 rows)

In this case, the following two queries are not equivalent:

  * SELECT oid FROM pg_largeobject_metadata
  * SELECT DISTINCT loid FROM pg_largeobject

The second query does not return the loid of empty large objects.

The prior implementation inserted a zero-length page to indicate that a
large object with this loid exists, but that became unnecessary with this
enhancement.
If we need compatibility at this level, we can insert a zero-length
page into pg_largeobject in LargeObjectCreate().
It would be harmless, but its value is uncertain.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> >>   We have to reference pg_largeobject_metadata to check whether a certain
> >>   large objct exists, or not.
> It is a case when we create a new large object, but write nothing.

OK, that makes sense.

In addition to the patch, we also need to fix pg_restore's
--clean option. I added DropBlobIfExists() in pg_backup_db.c.

A revised patch is attached. Please check for further mistakes.


BTW, we can optimize lo_truncate because we allow metadata-only large
objects. inv_truncate() doesn't have to update the first data tuple to
be zero length. It only has to delete all corresponding tuples, like:
    DELETE FROM pg_largeobject WHERE loid = {obj_desc->id}

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


Attachment

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote:

> In addition of the patch, we also need to fix pg_restore with
> --clean option. I added DropBlobIfExists() in pg_backup_db.c.
>
> A revised patch attached. Please check further mistakes.

...and here is an additional fix for contrib modules.


Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


Attachment

Re: Largeobject Access Controls (r2460)

From
Bruce Momjian
Date:
KaiGai Kohei wrote:
> Takahiro Itagaki wrote:
> > KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> > 
> >> Tom Lane wrote:
> >>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> >>>>    <structname>pg_largeobject</structname> should not be readable by the
> >>>>    public, since the catalog contains data in large objects of all users.
> >>> This is going to be a problem, because it will break applications that
> >>> expect to be able to read pg_largeobject.  Like, say, pg_dump.
> >> Is it a right behavior, even if we have permission checks on large objects?
> > 
> > Can we use column-level access control here?
> > 
> >    REVOKE ALL ON pg_largeobject FROM PUBLIC;
> > => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;
> 
> Indeed, it seems to me reasonable.
> 
> > We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
> > replace pg_largeobject_metadata instead if we try to fix only pg_dump,
> > but it's no wonder that any other user applications use such queries.
> > I think to allow reading loid is a balanced solution.
> 
> Right, I also remind this query has to be fixed up by other reason right now.
> If all the large objects are empty, this query can return nothing, even if
> large object entries are in pg_largeobject_metadata.

"metadata" seems very vague.  Can't we come up with a more descriptive
name?

Also, how will this affect pg_migrator?  pg_migrator copies
pg_largeobject and its index from the old to the new server.  Is the
format inside pg_largeobject changed by this patch?  What happens when
there is no entry in pg_largeobject_metadata for a specific row?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Bruce Momjian wrote:
> KaiGai Kohei wrote:
>> Takahiro Itagaki wrote:
>>> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>>>
>>>> Tom Lane wrote:
>>>>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>>>>>    <structname>pg_largeobject</structname> should not be readable by the
>>>>>>    public, since the catalog contains data in large objects of all users.
>>>>> This is going to be a problem, because it will break applications that
>>>>> expect to be able to read pg_largeobject.  Like, say, pg_dump.
>>>> Is it a right behavior, even if we have permission checks on large objects?
>>> Can we use column-level access control here?
>>>
>>>    REVOKE ALL ON pg_largeobject FROM PUBLIC;
>>> => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;
>> Indeed, it seems to me reasonable.
>>
>>> We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
>>> replace pg_largeobject_metadata instead if we try to fix only pg_dump,
>>> but it's no wonder that any other user applications use such queries.
>>> I think to allow reading loid is a balanced solution.
>> Right, I also remind this query has to be fixed up by other reason right now.
>> If all the large objects are empty, this query can return nothing, even if
>> large object entries are in pg_largeobject_metadata.
> 
> "metadata" seems very vague.  Can't we come up with a more descriptive
> name?

What about "property"?
The "metadata" was the suggested name from Robert Haas at the last
commit fest, because we may store any other properties of a large
object in this catalog future.

> Also, how will this affect pg_migrator?  pg_migrator copies
> pg_largeobject and its index from the old to the new server.  Is the
> format inside pg_largeobject changed by this patch?

The format of pg_largeobject was not touched.

> What happens when
> there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans.
So I think pg_migrator needs to create an empty large object with the
same LOID; that makes an entry in pg_largeobject_metadata without
writing anything to pg_largeobject.
I guess the rest of the migration is no different. Correct?
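A sketch of what the migration step could run on the new cluster for each loid found in the copied pg_largeobject (the loid here is hypothetical; the actual pg_migrator code may differ):

```sql
-- Create the metadata entry on the new cluster without touching the data:
SELECT lo_create(16385);  -- hypothetical loid taken from the old cluster
-- lo_create() inserts a row into pg_largeobject_metadata only; the data
-- pages copied into pg_largeobject keep their original loid.
```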

Thanks,



Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
>>>>   We have to reference pg_largeobject_metadata to check whether a certain
>>>>   large objct exists, or not.
>> It is a case when we create a new large object, but write nothing.
> 
> OK, that makes sense.
> 
> In addition of the patch, we also need to fix pg_restore with
> --clean option. I added DropBlobIfExists() in pg_backup_db.c.
> 
> A revised patch attached. Please check further mistakes.

+ void
+ DropBlobIfExists(ArchiveHandle *AH, Oid oid)
+ {
+   const char *lo_relname;
+   const char *lo_colname;
+
+   if (PQserverVersion(AH->connection) >= 80500)
+   {
+       lo_relname = "pg_largeobject_metadata";
+       lo_colname = "oid";
+   }
+   else
+   {
+       lo_relname = "pg_largeobject";
+       lo_colname = "loid";
+   }
+
+   /* Call lo_unlink only if exists to avoid not-found error. */
+   ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.%s WHERE %s = '%u') THEN pg_catalog.lo_unlink('%u')
END;\n",
+            lo_relname, lo_colname, oid, oid);
+ }

I think the following approach is more reasonable for the current design:

  if (PQserverVersion(AH->connection) >= 80500)
  {
      /* newer query */
      ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
                   "FROM pg_catalog.pg_largeobject_metadata "
                   "WHERE oid = %u;\n", oid);
  }
  else
  {
      /* original query */
      ahprintf(AH, "SELECT CASE WHEN EXISTS("
                   "SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') "
                   "THEN pg_catalog.lo_unlink('%u') END;\n", oid, oid);
  }

We don't have any reason to keep the CASE ... WHEN and subquery for the given
LOID in the newer case. Right?

The fix-lo-contrib.patch looks good to me.

> BTW, we can optimize lo_truncate because we allow metadata-only large
> objects. inv_truncate() doesn't have to update the first data tuple to
> be zero length. It only has to delete all corresponding tuples like as:
>     DELETE FROM pg_largeobject WHERE loid = {obj_desc->id}

Right, as long as inv_truncate() is given a length aligned to LOBLKSIZE.

I'll also submit a small patch on CF-Jan, OK?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: Largeobject Access Controls (r2460)

From
Bruce Momjian
Date:
KaiGai Kohei wrote:
> > What happens when
> > there is no entry in pg_largeobject_metadata for a specific row?
>
> In this case, these rows become orphan.
> So, I think we need to create an empty large object with same LOID on
> pg_migrator. It makes an entry on pg_largeobject_metadata without
> writing anything to the pg_largeobject.
> I guess rest of migrations are not difference. Correct?

Agreed.  I have modified pg_migrator with the attached patch which
creates a script that adds default permissions for all large object
tables.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
? tools
? log
? src/pg_migrator
Index: src/info.c
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/info.c,v
retrieving revision 1.25
diff -c -r1.25 info.c
*** src/info.c    10 Dec 2009 23:14:25 -0000    1.25
--- src/info.c    13 Dec 2009 01:17:37 -0000
***************
*** 480,486 ****
                                          "SELECT DISTINCT probin "
                                          "FROM    pg_catalog.pg_proc "
                                          "WHERE    prolang = 13 /* C */ AND "
!                                         "        probin IS NOT NULL");
          totaltups += PQntuples(ress[dbnum]);

          PQfinish(conn);
--- 480,488 ----
                                          "SELECT DISTINCT probin "
                                          "FROM    pg_catalog.pg_proc "
                                          "WHERE    prolang = 13 /* C */ AND "
!                                         "        probin IS NOT NULL AND "
!                                         "        oid >= "
!                                         STRINGIFY(FirstNormalObjectId) ";");
          totaltups += PQntuples(ress[dbnum]);

          PQfinish(conn);
Index: src/pg_migrator.c
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/pg_migrator.c,v
retrieving revision 1.69
diff -c -r1.69 pg_migrator.c
*** src/pg_migrator.c    10 Dec 2009 14:34:19 -0000    1.69
--- src/pg_migrator.c    13 Dec 2009 01:17:37 -0000
***************
*** 92,97 ****
--- 92,100 ----
              sequence_script_file_name =
                  v8_3_create_sequence_script(&ctx, CLUSTER_OLD);
      }
+     if (GET_MAJOR_VERSION(ctx.old.pg_version) <= 804 &&
+         GET_MAJOR_VERSION(ctx.new.pg_version) >= 805)
+         v8_4_populate_pg_largeobject_metadata(&ctx, true, CLUSTER_OLD);

      /* Looks okay so far.  Prepare the pg_dump output */
      generate_old_dump(&ctx);
***************
*** 294,299 ****
--- 297,309 ----
          v8_3_invalidate_bpchar_pattern_ops_indexes(&ctx, false, CLUSTER_NEW);
          stop_postmaster(&ctx, false, true);
      }
+     if (GET_MAJOR_VERSION(ctx.old.pg_version) <= 804 &&
+         GET_MAJOR_VERSION(ctx.new.pg_version) >= 805)
+     {
+         start_postmaster(&ctx, CLUSTER_NEW, true);
+         v8_4_populate_pg_largeobject_metadata(&ctx, false, CLUSTER_NEW);
+         stop_postmaster(&ctx, false, true);
+     }

      pg_log(&ctx, PG_REPORT, "\n*Upgrade complete*\n");

***************
*** 416,422 ****
      char        new_clog_path[MAXPGPATH];

      /* copy old commit logs to new data dir */
!     prep_status(ctx, "Deleting old commit clogs");

      snprintf(old_clog_path, sizeof(old_clog_path), "%s/pg_clog", ctx->old.pgdata);
      snprintf(new_clog_path, sizeof(new_clog_path), "%s/pg_clog", ctx->new.pgdata);
--- 426,432 ----
      char        new_clog_path[MAXPGPATH];

      /* copy old commit logs to new data dir */
!     prep_status(ctx, "Deleting new commit clogs");

      snprintf(old_clog_path, sizeof(old_clog_path), "%s/pg_clog", ctx->old.pgdata);
      snprintf(new_clog_path, sizeof(new_clog_path), "%s/pg_clog", ctx->new.pgdata);
***************
*** 424,430 ****
          pg_log(ctx, PG_FATAL, "Unable to delete directory %s\n", new_clog_path);
      check_ok(ctx);

!     prep_status(ctx, "Copying commit clogs");
      /* libpgport's copydir() doesn't work in FRONTEND code */
  #ifndef WIN32
      exec_prog(ctx, true, SYSTEMQUOTE "%s \"%s\" \"%s\"" SYSTEMQUOTE,
--- 434,440 ----
          pg_log(ctx, PG_FATAL, "Unable to delete directory %s\n", new_clog_path);
      check_ok(ctx);

!     prep_status(ctx, "Copying old commit clogs to new server");
      /* libpgport's copydir() doesn't work in FRONTEND code */
  #ifndef WIN32
      exec_prog(ctx, true, SYSTEMQUOTE "%s \"%s\" \"%s\"" SYSTEMQUOTE,
Index: src/pg_migrator.h
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/pg_migrator.h,v
retrieving revision 1.75
diff -c -r1.75 pg_migrator.h
*** src/pg_migrator.h    12 Dec 2009 16:56:23 -0000    1.75
--- src/pg_migrator.h    13 Dec 2009 01:17:37 -0000
***************
*** 395,400 ****
--- 395,402 ----
                              bool check_mode, Cluster whichCluster);
  void        v8_3_invalidate_bpchar_pattern_ops_indexes(migratorContext *ctx,
                              bool check_mode, Cluster whichCluster);
+ void        v8_4_populate_pg_largeobject_metadata(migratorContext *ctx,
+                             bool check_mode, Cluster whichCluster);
  char         *v8_3_create_sequence_script(migratorContext *ctx,
                              Cluster whichCluster);
  void        check_for_composite_types(migratorContext *ctx,
Index: src/version.c
===================================================================
RCS file: /cvsroot/pg-migrator/pg_migrator/src/version.c,v
retrieving revision 1.32
diff -c -r1.32 version.c
*** src/version.c    7 Aug 2009 20:16:12 -0000    1.32
--- src/version.c    13 Dec 2009 01:17:37 -0000
***************
*** 421,427 ****
                      "| between your old and new clusters so the tables\n"
                      "| must be rebuilt.  The file:\n"
                      "| \t%s\n"
!                     "| when executed by psql by the database super-user,\n"
                      "| will rebuild all tables with tsvector columns.\n\n",
                      output_path);
      }
--- 421,427 ----
                      "| between your old and new clusters so the tables\n"
                      "| must be rebuilt.  The file:\n"
                      "| \t%s\n"
!                     "| when executed by psql by the database super-user\n"
                      "| will rebuild all tables with tsvector columns.\n\n",
                      output_path);
      }
***************
*** 535,541 ****
                      "| they must be reindexed with the REINDEX command.\n"
                      "| The file:\n"
                      "| \t%s\n"
!                     "| when executed by psql by the database super-user,\n"
                      "| will recreate all invalid indexes; until then,\n"
                      "| none of these indexes will be used.\n\n",
                      output_path);
--- 535,541 ----
                      "| they must be reindexed with the REINDEX command.\n"
                      "| The file:\n"
                      "| \t%s\n"
!                     "| when executed by psql by the database super-user\n"
                      "| will recreate all invalid indexes; until then,\n"
                      "| none of these indexes will be used.\n\n",
                      output_path);
***************
*** 664,670 ****
                      "| new clusters so they must be reindexed with the\n"
                      "| REINDEX command.  The file:\n"
                      "| \t%s\n"
!                     "| when executed by psql by the database super-user,\n"
                      "| will recreate all invalid indexes; until then,\n"
                      "| none of these indexes will be used.\n\n",
                      output_path);
--- 664,670 ----
                      "| new clusters so they must be reindexed with the\n"
                      "| REINDEX command.  The file:\n"
                      "| \t%s\n"
!                     "| when executed by psql by the database super-user\n"
                      "| will recreate all invalid indexes; until then,\n"
                      "| none of these indexes will be used.\n\n",
                      output_path);
***************
*** 675,680 ****
--- 675,762 ----


  /*
+  * v8_4_populate_pg_largeobject_metadata()
+  *
+  *    8.5 has a new pg_largeobject permission table
+  */
+ void
+ v8_4_populate_pg_largeobject_metadata(migratorContext *ctx, bool check_mode,
+                                       Cluster whichCluster)
+ {
+     ClusterInfo    *active_cluster = (whichCluster == CLUSTER_OLD) ?
+                     &ctx->old : &ctx->new;
+     int            dbnum;
+     FILE        *script = NULL;
+     bool        found = false;
+     char        output_path[MAXPGPATH];
+
+     prep_status(ctx, "Checking for large objects");
+
+     snprintf(output_path, sizeof(output_path), "%s/pg_largeobject.sql",
+             ctx->home_dir);
+
+     for (dbnum = 0; dbnum < active_cluster->dbarr.ndbs; dbnum++)
+     {
+         PGresult   *res;
+         int            i_count;
+         DbInfo       *active_db = &active_cluster->dbarr.dbs[dbnum];
+         PGconn       *conn = connectToServer(ctx, active_db->db_name, whichCluster);
+
+         /* find if there are any large objects */
+         res = executeQueryOrDie(ctx, conn,
+                                 "SELECT count(*) "
+                                 "FROM    pg_catalog.pg_largeobject ");
+
+         i_count = PQfnumber(res, "count");
+         if (atoi(PQgetvalue(res, 0, i_count)) != 0)
+         {
+             found = true;
+             if (!check_mode)
+             {
+                 if (script == NULL && (script = fopen(output_path, "w")) == NULL)
+                         pg_log(ctx, PG_FATAL, "Could not create necessary file:  %s\n", output_path);
+                 fprintf(script, "\\connect %s\n",
+                         quote_identifier(ctx, active_db->db_name));
+                 fprintf(script,
+                     "INSERT INTO pg_catalog.pg_largeobject_metadata (lomowner)\n"
+                                 "SELECT DISTINCT loid\n"
+                                 "FROM pg_catalog.pg_largeobject;\n");
+             }
+         }
+
+         PQclear(res);
+         PQfinish(conn);
+     }
+
+     if (found)
+     {
+         if (!check_mode)
+             fclose(script);
+         report_status(ctx, PG_WARNING, "warning");
+         if (check_mode)
+             pg_log(ctx, PG_WARNING, "\n"
+                     "| Your installation contains large objects.\n"
+                     "| The new database has an additional large object\n"
+                     "| permission table.  After migration, you will be\n"
+                     "| given a command to populate the pg_largeobject\n"
+                     "| permission table with default permissions.\n\n");
+         else
+             pg_log(ctx, PG_WARNING, "\n"
+                     "| Your installation contains large objects.\n"
+                     "| The new database has an additional large object\n"
+                     "| permission table so default permissions must be\n"
+                     "| defined for all large objects.  The file:\n"
+                     "| \t%s\n"
+                     "| when executed by psql by the database super-user\n"
+                     "| will define the default permissions.\n\n",
+                     output_path);
+     }
+     else
+         check_ok(ctx);
+ }
+
+
+ /*
   * v8_3_create_sequence_script()
   *
   *    8.4 added the column "start_value" to all sequences.  For this reason,

Largeobject Access Controls and pg_migrator

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> KaiGai Kohei wrote:
> > > What happens when
> > > there is no entry in pg_largeobject_metadata for a specific row?
> > 
> > In this case, these rows become orphans.
> > So, I think we need to have pg_migrator create an empty large object
> > with the same LOID. That makes an entry in pg_largeobject_metadata without
> > writing anything to pg_largeobject.
> > I guess the rest of the migration is no different. Correct?
> 
> Agreed.  I have modified pg_migrator with the attached patch which
> creates a script that adds default permissions for all large object
> tables.

Oops, seems like I have a problem getting pg_migrator to populate
pg_largeobject_metadata:
	test=> select lo_import('/etc/profile');
	 lo_import
	-----------
	     16385
	(1 row)

	test=> select lo_import('/etc/profile.env');
	 lo_import
	-----------
	     16386
	(1 row)

	test=> select oid,* from pg_largeobject_metadata;
	  oid  | lomowner | lomacl
	-------+----------+--------
	 16385 |       10 |
	 16386 |       10 |
	(2 rows)

You might remember that INSERT cannot set the oid of a row, so I don't
see how pg_migrator can migrate this.  Is there a reason we used 'oid'
in pg_largeobject_metadata but 'loid' in pg_largeobject?  Why didn't we
add the lomowner and lomacl columns to pg_largeobject?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Largeobject Access Controls and pg_migrator

From
KaiGai Kohei
Date:
(2009/12/13 10:39), Bruce Momjian wrote:
> Bruce Momjian wrote:
>> KaiGai Kohei wrote:
>>>> What happens when
>>>> there is no entry in pg_largeobject_metadata for a specific row?
>>>
>>> In this case, these rows become orphans.
>>> So, I think we need to have pg_migrator create an empty large object
>>> with the same LOID. That makes an entry in pg_largeobject_metadata without
>>> writing anything to pg_largeobject.
>>> I guess the rest of the migration is no different. Correct?
>>
>> Agreed.  I have modified pg_migrator with the attached patch which
>> creates a script that adds default permissions for all large object
>> tables.
>
> Oops, seems like I have a problem getting pg_migrator to populate
> pg_largeobject_metadata:
>
>     test=>  select lo_import('/etc/profile');
>      lo_import
>     -----------
>          16385
>     (1 row)
>     
>     test=>  select lo_import('/etc/profile.env');
>      lo_import
>     -----------
>          16386
>     (1 row)
>     
>     test=>  select oid,* from pg_largeobject_metadata;
>       oid  | lomowner | lomacl
>     -------+----------+--------
>      16385 |       10 |
>      16386 |       10 |
>     (2 rows)

lo_import() has another prototype which takes a second argument to
specify the LOID. Couldn't that be used to restore a large object with
the correct LOID? For example, lo_import('/etc/profile', 1234)

Or, if you intend to restore the metadata separately from the data,
ALTER LARGE OBJECT and GRANT ... ON LARGE OBJECT make it possible to
set up the metadata of a given large object.

Or, am I missing the problem?
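
For illustration, the metadata-only restore described above might look
like this (the OID and the role name are hypothetical; only the
ownership and ACL are touched, not the data):

```sql
-- Restore only the metadata of an already-migrated large object.
ALTER LARGE OBJECT 16385 OWNER TO webuser;
GRANT SELECT, UPDATE ON LARGE OBJECT 16385 TO PUBLIC;
```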

> You might remember that INSERT cannot set the oid of a row, so I don't
> see how pg_migrator can migrate this?  Is there a reason we used 'oid'
> in pg_largeobject_metadata but 'loid' in pg_largeobject?  Why didn't we
> add the lomowner and lomacl columns to pg_largeobject?

A large object consists of multiple tuples within pg_largeobject.
If we added lomowner and lomacl into pg_largeobject, we have to manage
all the pages in a large object to keep consistent state.

-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: Largeobject Access Controls and pg_migrator

From
Bruce Momjian
Date:
KaiGai Kohei wrote:
> lo_import() has another prototype which takes a second argument to
> specify LOID. Isn't it available to restore a large object with
> correct LOID? For example, lo_import('/etc/profile', 1234)

I can't use that because the migration has already brought over the
pg_largeobject file which has the data.

> Or, if you intend to restore metadata in the second lo_import(),
> ALTER LARGE OBJECT and GRANT LARGE OBJECT enable to set up metadata
> of a certain large object.

Yes, that will work cleanly.  The file might be large because I need a
GRANT for every large object, but I suppose that is OK.
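
For illustration, the generated file would presumably contain one block
per database along these lines (the database name, OIDs, owners, and
grantees are hypothetical):

```sql
\connect mydb
-- One ALTER/GRANT pair per large object, pg_dump-style metadata restore.
ALTER LARGE OBJECT 16385 OWNER TO alice;
GRANT SELECT, UPDATE ON LARGE OBJECT 16385 TO PUBLIC;
ALTER LARGE OBJECT 16386 OWNER TO alice;
GRANT SELECT ON LARGE OBJECT 16386 TO bob;
```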

> > You might remember that INSERT cannot set the oid of a row, so I don't
> > see how pg_migrator can migrate this?  Is there a reason we used 'oid'
> > in pg_largeobject_metadata but 'loid' in pg_largeobject?  Why didn't we
> > add the lomowner and lomacl columns to pg_largeobject?
> 
> A large object consists of multiple tuples within pg_largeobject.
> If we added lomowner and lomacl into pg_largeobject, we have to manage
> all the pages in a large object to keep consistent state.

Ah, good point.



Re: Largeobject Access Controls and pg_migrator

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> KaiGai Kohei wrote:
> > lo_import() has another prototype which takes a second argument to
> > specify LOID. Isn't it available to restore a large object with
> > correct LOID? For example, lo_import('/etc/profile', 1234)
> 
> I can't use that because the migration has already brought over the
> pg_largeobject file which has the data.
> 
> > Or, if you intend to restore metadata in the second lo_import(),
> > ALTER LARGE OBJECT and GRANT LARGE OBJECT enable to set up metadata
> > of a certain large object.
> 
> Yes, that will work cleanly.  The file might be large because I need a
> GRANT for every large object, but I suppose that is OK.

Uh, I tested pg_migrator and found a problem with this approach:
	test=> select loid from pg_largeobject;
	 loid
	-------
	 16385
	 16385
	 16386
	(3 rows)

	test=> grant all ON LARGE OBJECT 16385 to public;
	ERROR:  large object 16385 does not exist

I am wondering if the missing pg_largeobject_metadata row is causing
this, and again I have no way of creating one with the specified oid.



Re: Largeobject Access Controls and pg_migrator

From
KaiGai Kohei
Date:
(2009/12/13 11:31), Bruce Momjian wrote:
> Bruce Momjian wrote:
>> KaiGai Kohei wrote:
>>> lo_import() has another prototype which takes a second argument to
>>> specify LOID. Isn't it available to restore a large object with
>>> correct LOID? For example, lo_import('/etc/profile', 1234)
>>
>> I can't use that because the migration has already brought over the
>> pg_largeobject file which has the data.
>>
>>> Or, if you intend to restore metadata in the second lo_import(),
>>> ALTER LARGE OBJECT and GRANT LARGE OBJECT enable to set up metadata
>>> of a certain large object.
>>
>> Yes, that will work cleanly.  The file might be large because I need a
>> GRANT for every large object, but I suppose that is OK.
>
> Uh, I tested pg_migrator and found a problem with this approach:
>
>     test=>  select loid from pg_largeobject;
>      loid
>     -------
>      16385
>      16385
>      16386
>     (3 rows)
>     
>     test=>  grant all ON LARGE OBJECT 16385 to public;
>     ERROR:  large object 16385 does not exist
>
> I am wondering if the missing pg_largeobject_metadata row is causing
> this, and again I have no way of creating one with the specified oid.

Can SELECT lo_create(16385); help this situation?

It creates an entry in pg_largeobject_metadata, but does not touch
pg_largeobject, because a large object is empty in its initial state.
But in this case pg_migrator has already brought the data chunks over
into pg_largeobject, so this operation can recombine the orphan chunks
with a metadata entry.

I'm not sure whether lo_create() should also check whether pg_largeobject
already has chunks with the same LOID. In regular operation that should
never happen.

-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: Largeobject Access Controls and pg_migrator

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

> Can SELECT lo_create(16385); help this situation?

SELECT lo_create(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) AS t

would work for pg_migrator.

> I'm not clear whether we also check pg_largeobject has chunks with the same
> LOID on lo_create(). In regular operation, it should never happen.

I think the omission is a reasonable optimization.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

We no longer have any reason to keep the CASE ... WHEN and the subquery
for the given LOID. Right?

Ah, I see. I used your suggestion.

I applied the bug fixes. Our tools and contrib modules will always use
pg_largeobject_metadata instead of pg_largeobject to enumerate large objects.

I removed "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC" from initdb
because users must use pg_largeobject_metadata.oid when they want to check
OIDs of large objects; if not, they could be misled about the existence of
objects.  This is an unavoidable incompatibility unless we always have
corresponding tuples in pg_largeobject even for zero-length large objects.
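
For illustration, an existence check against the new catalog might look
like this (the OID is made up):

```sql
-- pg_largeobject may have no rows for a zero-length large object, so
-- pg_largeobject_metadata is the authoritative catalog to consult.
SELECT EXISTS (
    SELECT 1 FROM pg_catalog.pg_largeobject_metadata WHERE oid = 16385
) AS lo_exists;
```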





Re: Largeobject Access Controls and pg_migrator

From
Bruce Momjian
Date:
Takahiro Itagaki wrote:
> 
> KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:
> 
> > Can SELECT lo_create(16385); help this situation?
> 
> SELECT lo_create(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) AS t
> 
> would work for pg_migrator.
> 
> > I'm not clear whether we also check pg_largeobject has chunks with the same
> > LOID on lo_create(). In regular operation, it should never happen.
> 
> I think the omission is a reasonable optimization.

Thanks, I have updated pg_migrator to use your suggested method.



Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>
>> What's your opinion about:
>>   long desc: When turned on, privilege checks on large objects perform with
>>              backward compatibility as 8.4.x or earlier releases.
>
> I updated the description as your suggest.
>
> Applied with minor editorialization,
> mainly around tab-completion support in psql.

The documentation in this patch needs work.

...Robert


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2009/12/17 7:25), Robert Haas wrote:
> On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
> <itagaki.takahiro@oss.ntt.co.jp>  wrote:
>>
>> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
>>
>>> What's your opinion about:
>>>    long desc: When turned on, privilege checks on large objects perform with
>>>               backward compatibility as 8.4.x or earlier releases.
>>
>> I updated the description as your suggest.
>>
>> Applied with minor editorialization,
>> mainly around tab-completion support in psql.
> 
> The documentation in this patch needs work.

Are you talking about English quality, or am I missing something that
should be documented?

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> (2009/12/17 7:25), Robert Haas wrote:
>> On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
>> <itagaki.takahiro@oss.ntt.co.jp>  wrote:
>>>
>>> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
>>>
>>>> What's your opinion about:
>>>>    long desc: When turned on, privilege checks on large objects perform with
>>>>               backward compatibility as 8.4.x or earlier releases.
>>>
>>> I updated the description as your suggest.
>>>
>>> Applied with minor editorialization,
>>> mainly around tab-completion support in psql.
>>
>> The documentation in this patch needs work.
>
> Are you talking about English quality, or am I missing something that
> should be documented?

Mostly English quality, but there are some other issues too.  Proposed
patch attached.

...Robert

Attachment

Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2009/12/17 13:20), Robert Haas wrote:
> 2009/12/16 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>> (2009/12/17 7:25), Robert Haas wrote:
>>> On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
>>> <itagaki.takahiro@oss.ntt.co.jp>    wrote:
>>>>
>>>> KaiGai Kohei<kaigai@ak.jp.nec.com>    wrote:
>>>>
>>>>> What's your opinion about:
>>>>>     long desc: When turned on, privilege checks on large objects perform with
>>>>>                backward compatibility as 8.4.x or earlier releases.
>>>>
>>>> I updated the description as your suggest.
>>>>
>>>> Applied with minor editorialization,
>>>> mainly around tab-completion support in psql.
>>>
>>> The documentation in this patch needs work.
>>
>> Are you talking about English quality, or am I missing something that
>> should be documented?
> 
> Mostly English quality, but there are some other issues too.  Proposed
> patch attached.

I have no comments on the English quality....

Thanks for your fixups.
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> 2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> >>>>    long desc: When turned on, privilege checks on large objects perform with
> >>>>               backward compatibility as 8.4.x or earlier releases.

> Mostly English quality, but there are some other issues too.  Proposed
> patch attached.

I remember we had discussions about the version number, like
"Don't use '8.5' because it might be released as '9.0'", no?

Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
for the first <structname>pg_largeobject</structname> in each topic.





Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
2009/12/17 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
>
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> 2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> >>>>    long desc: When turned on, privilege checks on large objects perform with
>> >>>>               backward compatibility as 8.4.x or earlier releases.
>
>> Mostly English quality, but there are some other issues too.  Proposed
>> patch attached.
>
> I remember we had discussions about the version number, like
> "Don't use '8.5' because it might be released as '9.0'", no?

I chose to follow the style which Tom indicated that he preferred
here.  We don't use the phrase "8.4.x series" anywhere else in the
documentation, so this doesn't seem like a good time to start.  Tom or
I will go through and renumber everything if we end up renaming the
release to 9.0.

> Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
> for the first <structname>pg_largeobject</structname> in each topic.

Those two things aren't the same.  Perhaps you meant <link
linkend="catalog-pg-largeobject">? I'll tweak the pg_largeobject and
pg_largeobject_metadata sections to make sure each has a link to the
other and commit this.  I also found one more spelling mistake so I
will include that correction as well.

...Robert


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> > Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
> > for the first <structname>pg_largeobject</structname> in each topic.
> 
> Those two things aren't the same.  Perhaps you meant <link
> linkend="catalog-pg-largeobject">?

Oops, yes. Thank you for the correction.

We also have "8.4.x series" in the core code. Do you think we should also
rewrite those messages? One of them is a user-visible message.

LargeObjectAlterOwner() : pg_largeobject.c

        * The 'lo_compat_privileges' is not checked here, because we
        * don't have any access control features in the 8.4.x series
        * or earlier release.
        * So, it is not a place we can define a compatible behavior.

guc.c

    {"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
        gettext_noop("Enables backward compatibility in privilege checks on large objects"),
        gettext_noop("When turned on, privilege checks on large objects perform "
                     "with backward compatibility as 8.4.x or earlier releases.")






Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
On Thu, Dec 17, 2009 at 7:27 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>> > Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
>> > for the first <structname>pg_largeobject</structname> in each topic.
>> Those two things aren't the same.  Perhaps you meant <link
>> linkend="catalog-pg-largeobject">?
> Oops, yes. Thank you for the correction.
>
> We also have "8.4.x series" in the core code. Do you think we also
> rewrite those messages? One of them is an use-visible message.

Yes.  I started going through the comments tonight.  Partial patch
attached.  There were two comments that I was unable to understand and
therefore could not reword - the one at the top of
pg_largeobject_aclmask_snapshot(), and the second part of the comment
at the top of LargeObjectExists():

 * Note that LargeObjectExists always scans the system catalog
 * with SnapshotNow, so it is unavailable to use to check
 * existence in read-only accesses.

In both cases, I'm lost.  Help?

In acldefault(), there is this comment:

  /* Grant SELECT,UPDATE by default, for now */

This doesn't seem to match what the code is doing, so I think we
should remove it.

I also notice that dumpBlobComments() is now misnamed, but it seems
we've chosen to add a comment
mentioning that fact rather than fixing it.  That doesn't seem like
the right approach.

...Robert

Attachment

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> In both cases, I'm lost.  Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), metadata for a large object
should also be visible under MVCC rules.

If I understand them, they say:
  * pg_largeobject_aclmask_snapshot requires a snapshot which will be
    used in loread().
  * Don't use LargeObjectExists if you need MVCC visibility.

> In acldefault(), there is this comment:
>   /* Grant SELECT,UPDATE by default, for now */
> This doesn't seem to match what the code is doing, so I think we
> should remove it.

Ah, ACL_NO_RIGHTS is the default.

> I also notice that dumpBlobComments() is now misnamed, but it seems
> we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, now it dumps not only comments but also ownership of large objects.
Should we rename it dumpBlobMetadata() or so?





Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2009/12/18 15:48), Takahiro Itagaki wrote:
>
> Robert Haas<robertmhaas@gmail.com>  wrote:
>
>> In both cases, I'm lost.  Help?
>
> They might be contrasted with the comments for myLargeObjectExists.
> Since we use MVCC visibility in loread(), metadata for large object
> also should be visible in MVCC rule.
>
> If I understand them, they say:
>    * pg_largeobject_aclmask_snapshot requires a snapshot which will be
>      used in loread().
>    * Don't use LargeObjectExists if you need MVCC visibility.

Yes, correct.

>> In acldefault(), there is this comment:
>>    /* Grant SELECT,UPDATE by default, for now */
>> This doesn't seem to match what the code is doing, so I think we
>> should remove it.
>
> Ah, ACL_NO_RIGHTS is the default.

Oops, it reflects a very early design phase, but it was fixed later.

>> I also notice that dumpBlobComments() is now misnamed, but it seems
>> we've chosen to add a comment mentioning that fact rather than fixing it.
>
> Hmmm, now it dumps not only comments but also ownership of large objects.
> Should we rename it dumpBlobMetadata() or so?

It seems to me quite natural.

The attached patch fixes them.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
2009/12/18 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> (2009/12/18 15:48), Takahiro Itagaki wrote:
>>
>> Robert Haas<robertmhaas@gmail.com>  wrote:
>>
>>> In both cases, I'm lost.  Help?
>>
>> They might be contrasted with the comments for myLargeObjectExists.
>> Since we use MVCC visibility in loread(), metadata for large object
>> also should be visible in MVCC rule.
>>
>> If I understand them, they say:
>>    * pg_largeobject_aclmask_snapshot requires a snapshot which will be
>>      used in loread().
>>    * Don't use LargeObjectExists if you need MVCC visibility.
>
> Yes, correct.
>
>>> In acldefault(), there is this comment:
>>>    /* Grant SELECT,UPDATE by default, for now */
>>> This doesn't seem to match what the code is doing, so I think we
>>> should remove it.
>>
>> Ah, ACL_NO_RIGHTS is the default.
>
> Oops, it reflects a very early design phase, but it was fixed later.
>
>>> I also notice that dumpBlobComments() is now misnamed, but it seems
>>> we've chosen to add a comment mentioning that fact rather than fixing it.
>>
>> Hmmm, now it dumps not only comments but also ownership of large objects.
>> Should we rename it dumpBlobMetadata() or so?
>
> It seems to me quite natural.
>
> The attached patch fixes them.

I think we might want to go with dumpBlobProperties(), because
dumpBlobMetadata() might lead you to think that all of the properties
being dumped are stored in pg_largeobject_metadata, which is not the
case.

I do also wonder why we are calling these blobs in this code rather
than large objects, but that problem predates this patch and I think
we might as well leave it alone for now.

...Robert


Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
On Fri, Dec 18, 2009 at 9:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 2009/12/18 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> (2009/12/18 15:48), Takahiro Itagaki wrote:
>>>
>>> Robert Haas<robertmhaas@gmail.com>  wrote:
>>>
>>>> In both cases, I'm lost.  Help?
>>>
>>> They might be contrasted with the comments for myLargeObjectExists.
>>> Since we use MVCC visibility in loread(), metadata for large object
>>> also should be visible in MVCC rule.
>>>
>>> If I understand them, they say:
>>>    * pg_largeobject_aclmask_snapshot requires a snapshot which will be
>>>      used in loread().
>>>    * Don't use LargeObjectExists if you need MVCC visibility.
>>
>> Yes, correct.
>>
>>>> In acldefault(), there is this comment:
>>>>    /* Grant SELECT,UPDATE by default, for now */
>>>> This doesn't seem to match what the code is doing, so I think we
>>>> should remove it.
>>>
>>> Ah, ACL_NO_RIGHTS is the default.
>>
>> Oops, it reflects a very early design phase, but it was fixed later.
>>
>>>> I also notice that dumpBlobComments() is now misnamed, but it seems
>>>> we've chosen to add a comment mentioning that fact rather than fixing it.
>>>
>>> Hmmm, now it dumps not only comments but also ownership of large objects.
>>> Should we rename it dumpBlobMetadata() or so?
>>
>> It seems to me quite natural.
>>
>> The attached patch fixes them.
>
> I think we might want to go with dumpBlobProperties(), because
> dumpBlobMetadata() might lead you to think that all of the properties
> being dumped are stored in pg_largeobject_metadata, which is not the
> case.

Oh.  This is more complicated than it appeared on the surface.  It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here.  Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
anyone comment on what the Right Thing To Do is here?

...Robert


Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
On Fri, Dec 18, 2009 at 1:48 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>> In both cases, I'm lost.  Help?
>
> They might be contrasted with the comments for myLargeObjectExists.
> Since we use MVCC visibility in loread(), metadata for large object
> also should be visible in MVCC rule.
>
> If I understand them, they say:
>  * pg_largeobject_aclmask_snapshot requires a snapshot which will be
>    used in loread().
>  * Don't use LargeObjectExists if you need MVCC visibility.

Part of what I'm confused about (and what I think should be documented
in a comment somewhere) is why we're using MVCC visibility in some
places but not others.  In particular, there seem to be some bits of
the comment that imply that we do this for read but not for write,
which seems really strange.  It may or may not actually be strange,
but I don't understand it.

...Robert


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Oh.  This is more complicated than it appeared on the surface.  It
> seems that the string "BLOB COMMENTS" actually gets inserted into
> custom dumps somewhere, so I'm not sure whether we can just change it.
>  Was this issue discussed at some point before this was committed?
> Changing it would seem to require inserting some backward
> compatibility code here.  Another option would be to add a separate
> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
> anyone comment on what the Right Thing To Do is here?

The BLOB COMMENTS label is, or was, correct for what it contained.
If this patch has usurped it to contain other things I would argue
that that is seriously wrong.  pg_dump already has a clear notion
of how to handle ACLs for objects.  ACLs for blobs ought to be
made to fit into that structure, not dumped in some random place
because that saved a few lines of code.
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Part of what I'm confused about (and what I think should be documented
> in a comment somewhere) is why we're using MVCC visibility in some
> places but not others.  In particular, there seem to be some bits of
> the comment that imply that we do this for read but not for write,
> which seems really strange.  It may or may not actually be strange,
> but I don't understand it.

It is supposed to depend on whether you opened the blob for read only
or for read write.  Please do not tell me that this patch broke that;
because if it did it broke pg_dump.

This behavior is documented at least here:
http://www.postgresql.org/docs/8.4/static/lo-interfaces.html#AEN36338
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Oh.  This is more complicated than it appeared on the surface.  It
>> seems that the string "BLOB COMMENTS" actually gets inserted into
>> custom dumps somewhere, so I'm not sure whether we can just change it.
>>  Was this issue discussed at some point before this was committed?
>> Changing it would seem to require inserting some backward
>> compatibility code here.  Another option would be to add a separate
>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
>> anyone comment on what the Right Thing To Do is here?
>
> The BLOB COMMENTS label is, or was, correct for what it contained.
> If this patch has usurped it to contain other things

It has.

> I would argue
> that that is seriously wrong.  pg_dump already has a clear notion
> of how to handle ACLs for objects.  ACLs for blobs ought to be
> made to fit into that structure, not dumped in some random place
> because that saved a few lines of code.

OK.  Hopefully KaiGai or Takahiro can suggest a fix.

Thanks,

...Robert


Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
On Fri, Dec 18, 2009 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Part of what I'm confused about (and what I think should be documented
>> in a comment somewhere) is why we're using MVCC visibility in some
>> places but not others.  In particular, there seem to be some bits of
>> the comment that imply that we do this for read but not for write,
>> which seems really strange.  It may or may not actually be strange,
>> but I don't understand it.
>
> It is supposed to depend on whether you opened the blob for read only
> or for read write.  Please do not tell me that this patch broke that;
> because if it did it broke pg_dump.
>
> This behavior is documented at least here:
> http://www.postgresql.org/docs/8.4/static/lo-interfaces.html#AEN36338

Oh, I see.  Thanks for the pointer.  Having read that through, I can
now say that the comments in the patch seem to imply that it attempted
to preserve those semantics, but I can't swear that it did.  I will
take another look at it, but it might bear closer examination by
someone with more MVCC-fu than myself.

...Robert


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2009/12/19 12:05), Robert Haas wrote:
> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Robert Haas<robertmhaas@gmail.com>  writes:
>>> Oh.  This is more complicated than it appeared on the surface.  It
>>> seems that the string "BLOB COMMENTS" actually gets inserted into
>>> custom dumps somewhere, so I'm not sure whether we can just change it.
>>>   Was this issue discussed at some point before this was committed?
>>> Changing it would seem to require inserting some backward
>>> compatibility code here.  Another option would be to add a separate
>>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
>>> anyone comment on what the Right Thing To Do is here?
>>
>> The BLOB COMMENTS label is, or was, correct for what it contained.
>> If this patch has usurped it to contain other things
> 
> It has.
> 
>> I would argue
>> that that is seriously wrong.  pg_dump already has a clear notion
>> of how to handle ACLs for objects.  ACLs for blobs ought to be
>> made to fit into that structure, not dumped in some random place
>> because that saved a few lines of code.
> 
> OK.  Hopefully KaiGai or Takahiro can suggest a fix.

Currently, the BLOBS (and BLOB COMMENTS) section does not record the
owner of the large objects, because doing so could exhaust pg_dump's
local memory when the database contains a massive number of large
objects.
I believe that is the reason why we dump all the large objects in
a single section. Correct?

I don't think it is reasonable to dump each large object in its own
section.
However, we can group them by owner. In general, we can assume the
number of database users is smaller than the number of large objects.
In other words, we can obtain the number of sections to be dumped as
the result of the following query:
 SELECT DISTINCT lomowner FROM pg_largeobject_metadata;

Then, we can dump them per owner.

For earlier server versions, all the large objects will be dumped in
a single section under an anonymous user.
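As a minimal sketch of that grouping step (plain C with hypothetical
names; the real pg_dump code would work from the query result above):

```c
#include <string.h>

/* Sketch: deduplicate large-object owners so that one TOC entry per
 * owner can be registered, instead of one entry per large object.
 * Mirrors SELECT DISTINCT lomowner FROM pg_largeobject_metadata.
 * Returns the number of distinct owners written into "owners". */
static int
collect_blob_owners(const char **lomowner, int nblobs,
                    const char **owners, int max_owners)
{
    int nowners = 0;

    for (int i = 0; i < nblobs; i++)
    {
        int j;

        for (j = 0; j < nowners; j++)
            if (strcmp(owners[j], lomowner[i]) == 0)
                break;          /* owner already seen */
        if (j == nowners && nowners < max_owners)
            owners[nowners++] = lomowner[i];
    }
    return nowners;
}
```

With five large objects owned by three roles, this yields three groups,
so the number of sections stays bounded by the number of roles rather
than the number of large objects.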

What's your opinion?
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2009/12/21 9:39), KaiGai Kohei wrote:
> (2009/12/19 12:05), Robert Haas wrote:
>> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us>   wrote:
>>> Robert Haas<robertmhaas@gmail.com>   writes:
>>>> Oh.  This is more complicated than it appeared on the surface.  It
>>>> seems that the string "BLOB COMMENTS" actually gets inserted into
>>>> custom dumps somewhere, so I'm not sure whether we can just change it.
>>>>    Was this issue discussed at some point before this was committed?
>>>> Changing it would seem to require inserting some backward
>>>> compatibility code here.  Another option would be to add a separate
>>>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
>>>> anyone comment on what the Right Thing To Do is here?
>>>
>>> The BLOB COMMENTS label is, or was, correct for what it contained.
>>> If this patch has usurped it to contain other things
>>
>> It has.
>>
>>> I would argue
>>> that that is seriously wrong.  pg_dump already has a clear notion
>>> of how to handle ACLs for objects.  ACLs for blobs ought to be
>>> made to fit into that structure, not dumped in some random place
>>> because that saved a few lines of code.
>>
>> OK.  Hopefully KaiGai or Takahiro can suggest a fix.

The patch has grown larger than I expected, because large objects are
handled quite differently from any other object class.

Here are three points:

1) The new BLOB ACLS section was added.

It is a single purpose section to describe GRANT/REVOKE statements
on large objects, and BLOB COMMENTS section was reverted to put
only descriptions.

Because we need to assume the database may hold a massive number of
large objects, it is not reasonable to store them using dumpACL(),
which chains each ACL entry onto the list of TOC entries before they
are dumped. That would mean pg_dump might have to register a massive
number of large objects in its local memory space.

Currently, we also store GRANT/REVOKE statements in the BLOB COMMENTS
section, but that is confusing. Even if pg_restore is launched with the
--no-privileges option, it cannot ignore the GRANT/REVOKE statements
on large objects. This fix makes it possible to distinguish ACLs on
large objects from their other properties, and to handle them correctly.

2) The BLOBS section was separated for each database users.

Currently, the BLOBS section does not carry information about the owner
of the large objects to be restored. So we tried to alter their
ownership in the BLOB COMMENTS section, which was incorrect.

The --use-set-session-authorization option requires restoring the
ownership of objects without ALTER ... OWNER TO statements, so we need
to record the correct database username in the section properties.

This patch renames hasBlobs() to getBlobs(), and changes its purpose.
It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for
each large object owner, if necessary.
For example, if there are five users owning large objects, getBlobs()
shall register five TOC entries, one per user, and dumpBlobs(),
dumpBlobComments() and dumpBlobAcls() shall also be invoked five
times, each with the corresponding username.

3) _LoadBlobs()

For regular database object classes, _printTocEntry() can inject an
"ALTER xxx OWNER TO ..." statement for the restored object, based on
the ownership described in the section header.
However, we cannot use this infrastructure for large objects as-is,
because one BLOBS section can restore multiple large objects.

_LoadBlobs() is the routine that restores the large objects within a
given section. This patch modifies it to inject an "ALTER LARGE
OBJECT <loid> OWNER TO <user>" statement for each large object,
based on the ownership of the section
(if --use-set-session-authorization is not given.)
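In effect, the restore path described above emits one ownership
statement per restored blob. A simplified, self-contained sketch (the
function name and flag parameters here are hypothetical; the real
_LoadBlobs() writes through the archive handle):

```c
#include <stdio.h>

/* Sketch: build the ownership statement injected after each lo_create()
 * during restore.  Returns the number of characters written, or 0 when
 * no statement should be emitted (--use-set-session-authorization,
 * --no-owner/-O, or an archive without an Owner: tag). */
static int
format_blob_owner_stmt(char *buf, size_t buflen,
                       unsigned int loid, const char *owner,
                       int use_setsessauth, int no_owner)
{
    if (use_setsessauth || no_owner || owner == NULL || owner[0] == '\0')
        return 0;               /* ownership handled elsewhere, or unknown */
    return snprintf(buf, buflen,
                    "ALTER LARGE OBJECT %u OWNER TO %s;\n", loid, owner);
}
```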


$ diffstat pgsql-fix-pg_dump-blob-privs.patch
 pg_backup_archiver.c |    4
 pg_backup_custom.c   |   11 !
 pg_backup_files.c    |    9 !
 pg_backup_tar.c      |    9 !
 pg_dump.c            |  312 +++++++----!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 pg_dump.h            |    9 !
 pg_dump_sort.c       |    8 !
 7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
2009/12/22 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> (2009/12/21 9:39), KaiGai Kohei wrote:
>> (2009/12/19 12:05), Robert Haas wrote:
>>> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us>   wrote:
>>>> Robert Haas<robertmhaas@gmail.com>   writes:
>>>>> Oh.  This is more complicated than it appeared on the surface.  It
>>>>> seems that the string "BLOB COMMENTS" actually gets inserted into
>>>>> custom dumps somewhere, so I'm not sure whether we can just change it.
>>>>>    Was this issue discussed at some point before this was committed?
>>>>> Changing it would seem to require inserting some backward
>>>>> compatibility code here.  Another option would be to add a separate
>>>>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
>>>>> anyone comment on what the Right Thing To Do is here?
>>>>
>>>> The BLOB COMMENTS label is, or was, correct for what it contained.
>>>> If this patch has usurped it to contain other things
>>>
>>> It has.
>>>
>>>> I would argue
>>>> that that is seriously wrong.  pg_dump already has a clear notion
>>>> of how to handle ACLs for objects.  ACLs for blobs ought to be
>>>> made to fit into that structure, not dumped in some random place
>>>> because that saved a few lines of code.
>>>
>>> OK.  Hopefully KaiGai or Takahiro can suggest a fix.
>
> The patch has grown larger than I expected before, because the way
> to handle large objects are far from any other object classes.
>
> Here are three points:
>
> 1) The new BLOB ACLS section was added.
>
> It is a single purpose section to describe GRANT/REVOKE statements
> on large objects, and BLOB COMMENTS section was reverted to put
> only descriptions.
>
> Because we need to assume a case when the database holds massive
> number of large objects, it is not reasonable to store them using
> dumpACL(). It chains an ACL entry with the list of TOC entries,
> then, these are dumped. It means pg_dump may have to register massive
> number of large objects in the local memory space.
>
> Currently, we also store GRANT/REVOKE statements in BLOB COMMENTS
> section, but confusable. Even if pg_restore is launched with
> --no-privileges options, it cannot ignore GRANT/REVOKE statements
> on large objects. This fix enables to distinguish ACLs on large
> objects from other properties, and to handle them correctly.
>
> 2) The BLOBS section was separated for each database users.
>
> Currently, the BLOBS section does not have information about owner
> of the large objects to be restored. So, we tried to alter its
> ownership in the BLOB COMMENTS section, but incorrect.
>
> The --use-set-session-authorization option requires to restore
> ownership of objects without ALTER ... OWNER TO statements.
> So, we need to set up correct database username on the section
> properties.
>
> This patch renamed the hasBlobs() by getBlobs(), and changed its
> purpose. It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS
> for each large objects owners, if necessary.
> For example, if here are five users owning large objects, getBlobs()
> shall register five TOC entries for each users, and dumpBlobs(),
> dumpBlobComments() and dumpBlobAcls() shall be also invoked five
> times with the username.
>
> 3) _LoadBlobs()
>
> For regular database object classes, _printTocEntry() can inject
> "ALTER xxx OWNER TO ..." statement on the restored object based on
> the ownership described in the section header.
> However, we cannot use this infrastructure for large objects as-is,
> because one BLOBS section can restore multiple large objects.
>
> _LoadBlobs() is a routine to restore large objects within a certain
> section. This patch modifies this routine to inject "ALTER LARGE
> OBJECT <loid> OWNER TO <user>" statement for each large objects
> based on the ownership of the section.
> (if --use-set-session-authorization is not given.)
>
>
> $ diffstat pgsql-fix-pg_dump-blob-privs.patch
>  pg_backup_archiver.c |    4
>  pg_backup_custom.c   |   11 !
>  pg_backup_files.c    |    9 !
>  pg_backup_tar.c      |    9 !
>  pg_dump.c            |  312 +++++++----!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  pg_dump.h            |    9 !
>  pg_dump_sort.c       |    8 !
>  7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)

I will review this sooner if I have time, but please make sure it gets
added to the next CommitFest so we don't lose it.  I think it also
needs to be added here, since AFAICS this is a must-fix for 8.5.

http://wiki.postgresql.org/wiki/PostgreSQL_8.5_Open_Items

Thanks,

...Robert


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> This patch renamed the hasBlobs() by getBlobs(), and changed its
> purpose. It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS
> for each large objects owners, if necessary.

This patch adds DumpableObjectType DO_BLOB_ACLS and struct BlobsInfo. We
use three BlobsInfo objects (for DO_BLOBS, DO_BLOB_COMMENTS, and
DO_BLOB_ACLS) _for each distinct owner_ of large objects. So, even if we
have many large objects in the database, we keep at most
(3 * number-of-roles) BlobsInfo structs in memory. For older server
versions, we assume that all blobs are owned by a single user with an
empty name. We have no BlobsInfo entries if there are no large objects.


I'm not sure whether we need to group them by owner of the large objects.
If I remember right, the primary issue was separating the routines that
dump BLOB ACLS from the routines that dump BLOB COMMENTS, right? Why did
you make this change?


Another concern is the confusing identifier names: getBlobs()
returns a BlobsInfo for each owner. Could we rename them to something
like getBlobOwners() and BlobOwnerInfo?

Also, DumpableObject.name is not used in BlobsInfo. We could reuse
DumpableObject.name instead of the "rolname" field in BlobsInfo.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/01/21 16:52), Takahiro Itagaki wrote:
> 
> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
> 
>> This patch renamed the hasBlobs() by getBlobs(), and changed its
>> purpose. It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS
>> for each large objects owners, if necessary.
> 
> This patch adds DumpableObjectType DO_BLOB_ACLS and struct BlobsInfo. We
> use three BlobsInfo objects for DO_BLOBS, DO_BLOB_COMMENTS, and DO_BLOB_ACLS
> _for each distinct owners_ of large objects. So, even if we have many large
> objects in the database, we just keep at most (3 * num-of-roles) BlobsInfo
> in memory. For older versions of server, we assume that blobs are owned by
> only one user with an empty name. We have no BlobsInfo if no large objects.
> 
> 
> I'm not sure whether we need to make groups for each owner of large objects.
> If I remember right, the primary issue was separating routines for dump
> BLOB ACLS from routines for BLOB COMMENTS, right? Why did you make the change?

When --use-set-session-authorization is specified, pg_restore tries to
change the database role of the current session just before creating
the database objects to be restored.

The ownership of the database objects is recorded in the section header,
which tells pg_restore who should own the objects restored in that
section.

Then, at runtime, pg_restore can generate either ALTER xxx OWNER TO
after creation, or SET SESSION AUTHORIZATION before creation.
So, we cannot put the creation of large objects with different
ownership in the same section.

That is the reason why we have to group large objects by database user.
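The distinction between the two modes can be sketched like this
(hypothetical helper; in pg_restore the real logic lives around
_printTocEntry()):

```c
#include <stdio.h>

/* Sketch: the two ways a restore can establish object ownership.
 * With --use-set-session-authorization the role must be switched
 * *before* the CREATE statements run, so all objects in one section
 * must share a single owner; otherwise ownership can be fixed up
 * afterwards with ALTER ... OWNER TO. */
static int
emit_ownership(char *buf, size_t buflen,
               const char *object, const char *owner, int use_setsessauth)
{
    if (use_setsessauth)
        return snprintf(buf, buflen,
                        "SET SESSION AUTHORIZATION %s;\n", owner);
    return snprintf(buf, buflen,
                    "ALTER %s OWNER TO %s;\n", object, owner);
}
```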

> Another concern is their confusable identifier names -- getBlobs()
> returns BlobsInfo for each owners. Could we rename them something
> like getBlobOwners() and BlobOwnerInfo?

OK, I'll do.

> Also, DumpableObject.name is not used in BlobsInfo. We could reuse
> DumpableObject.name instead of the "rolname" field in BlobsInfo.

Wouldn't that be confusing for future hackers?
The current form follows the convention used in other database object classes.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > I'm not sure whether we need to make groups for each owner of large objects.
> > If I remember right, the primary issue was separating routines for dump
> > BLOB ACLS from routines for BLOB COMMENTS, right? Why did you make the change?
> 
> When --use-set-session-authorization is specified, pg_restore tries to
> change database role of the current session just before creation of
> database objects to be restored.
> 
> Ownership of the database objects are recorded in the section header,
> and it informs pg_restore who should be owner of the objects to be
> restored in this section.
> 
> Then, pg_restore can generate ALTER xxx OWNER TO after creation, or
> SET SESSION AUTHORIZATION before creation in runtime.
> So, we cannot put creation of largeobjects with different ownership
> in same section.
> 
> It is the reason why we have to group largeobjects by database user.

Ah, I see.

Then... What happens if we drop or rename roles that own large objects
while pg_dump is running? Does the patch still work? It uses pg_get_userbyid().

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/01/21 19:42), Takahiro Itagaki wrote:
> 
> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
> 
>>> I'm not sure whether we need to make groups for each owner of large objects.
>>> If I remember right, the primary issue was separating routines for dump
>>> BLOB ACLS from routines for BLOB COMMENTS, right? Why did you make the change?
>>
>> When --use-set-session-authorization is specified, pg_restore tries to
>> change database role of the current session just before creation of
>> database objects to be restored.
>>
>> Ownership of the database objects are recorded in the section header,
>> and it informs pg_restore who should be owner of the objects to be
>> restored in this section.
>>
>> Then, pg_restore can generate ALTER xxx OWNER TO after creation, or
>> SET SESSION AUTHORIZATION before creation in runtime.
>> So, we cannot put creation of largeobjects with different ownership
>> in same section.
>>
>> It is the reason why we have to group largeobjects by database user.
> 
> Ah, I see.
> 
> Then... What happen if we drop or rename roles who have large objects
> during pg_dump? Does the patch still work? It uses pg_get_userbyid().

Indeed, pg_get_userbyid() always uses SnapshotNow internally, so it
can return "unknown (OID=%u)" in this case.

We should use "username_subquery" here instead.
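A sketch of the idea (hypothetical helper name; the real
username_subquery in pg_dump is a version-dependent string, but the
point is that it resolves the role name inside the main query so the
lookup shares its snapshot):

```c
#include <stdio.h>

/* Sketch: assemble a per-row subquery that resolves an owner OID to a
 * role name inside the main query, so the lookup runs under the query's
 * snapshot rather than the SnapshotNow lookup of pg_get_userbyid(). */
static int
build_owner_subquery(char *buf, size_t buflen, const char *oid_expr)
{
    return snprintf(buf, buflen,
                    "(SELECT rolname FROM pg_catalog.pg_roles WHERE oid = %s)",
                    oid_expr);
}
```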

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
The attached patch is a revised version.

List of updates:
- cleanup: getBlobs() was renamed to getBlobOwners()
- cleanup: BlobsInfo was renamed to BlobOwnerInfo
- bugfix: pg_get_userbyid() in the SQL was replaced by username_subquery,
          which contains the right subquery to obtain a username for the
          given id. This allows pg_dump to run correctly even under
          concurrent role deletion.
- bugfix: Even if we give the -a (--data-only) or -O (--no-owner) options,
          or the archive does not contain owner information, it tried to
          write out an "ALTER LARGE OBJECT xxx OWNER TO ..." statement.
- bugfix: Even if we give the -a (--data-only) or -x (--no-privileges)
          options, it tried to write out the "BLOB ACLS" section.

The last two are problems I noticed recently; let me explain them.

The BLOBS section can contain definitions of multiple large objects,
unlike any other object class. That is another reason why I had to group
large objects by database user.
The Owner tag of the BLOBS section is used to identify the owner of the
large objects to be restored, and is also used in
--use-set-session-authorization mode.
However, we need to inject an "ALTER LARGE OBJECT xxx OWNER TO ..."
statement for each lo_create() in _LoadBlobs(), because we cannot know
how many large objects are in the section before reading the archive.
But the last patch didn't take the -a/-O options, or archives without
an Owner: tag, into account.

The BLOB ACLS section is categorized as SECTION_DATA, following the
existing BLOB COMMENTS behavior. For the same reason, it has to handle
the -a/-x options by itself, but the last patch didn't handle them well.

BTW, here is a known issue. When we run pg_dump with -s (--schema-only),
it writes out the descriptions of regular object classes, but does not
write out the descriptions of large objects.
It seems the descriptions of large objects are considered part of the
data, not properties. However, that might be inconsistent.

The reason for this behavior is that all the BLOB dumps are categorized
as SECTION_DATA, so the -s option tells pg_backup_archiver.c to skip
the routines related to large objects.

However, it may be time to split this code into two steps.
In the schema section stage,
 - It creates empty large objects with lo_create()
 - It restores the descriptions of the large objects.
 - It restores the ownership/privileges of the large objects.

In the data section stage,
 - It loads the actual data contents into the empty large objects with lowrite().
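Under that proposal, the blob-related restore work would be classified
roughly like this (a sketch only; the action names and enum are
hypothetical):

```c
#include <string.h>

typedef enum
{
    BLOB_SCHEMA_WORK,   /* lo_create(), COMMENT ON, owner/ACL setup */
    BLOB_DATA_WORK      /* lowrite() of the actual contents */
} BlobWorkKind;

/* Sketch: which stage each kind of blob-related restore action would
 * belong to under the proposed two-step split. */
static BlobWorkKind
classify_blob_action(const char *action)
{
    if (strcmp(action, "load-contents") == 0)
        return BLOB_DATA_WORK;
    /* "create", "comment", "owner" and "acl" all become schema work */
    return BLOB_SCHEMA_WORK;
}
```

With this split, -s (--schema-only) would naturally carry the empty
blobs plus their descriptions and privileges, while -a (--data-only)
would only fill in the contents.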

Thanks,

(2010/01/21 19:42), Takahiro Itagaki wrote:
>
> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
>
>>> I'm not sure whether we need to make groups for each owner of large objects.
>>> If I remember right, the primary issue was separating routines for dump
>>> BLOB ACLS from routines for BLOB COMMENTS, right? Why did you make the change?
>>
>> When --use-set-session-authorization is specified, pg_restore tries to
>> change database role of the current session just before creation of
>> database objects to be restored.
>>
>> Ownership of the database objects are recorded in the section header,
>> and it informs pg_restore who should be owner of the objects to be
>> restored in this section.
>>
>> Then, pg_restore can generate ALTER xxx OWNER TO after creation, or
>> SET SESSION AUTHORIZATION before creation in runtime.
>> So, we cannot put creation of largeobjects with different ownership
>> in same section.
>>
>> It is the reason why we have to group largeobjects by database user.
>
> Ah, I see.
>
> Then... What happen if we drop or rename roles who have large objects
> during pg_dump? Does the patch still work? It uses pg_get_userbyid().
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>
>


--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> writes:
> The attached patch is a revised version.

I'm inclined to wonder whether this patch doesn't prove that we've
reached the end of the line for the current representation of blobs
in pg_dump archives.  The alternative that I'm thinking about is to
treat each blob as an independent object (hence, with its own TOC
entry).  If we did that, then the standard pg_dump mechanisms for
ownership, ACLs, and comments would apply, and we could get rid of
the messy hacks that this patch is just adding to.  That would also
open the door to future improvements like being able to selectively
restore blobs.  (Actually you could do it immediately if you didn't
mind editing a -l file...)  And it would for instance allow loading
of blobs to be parallelized.

Now the argument against that is that it won't scale terribly well
to situations with very large numbers of blobs.  However, I'm not
convinced that the current approach of cramming them all into one
TOC entry scales so well either.  If your large objects are actually
large, there's not going to be an enormous number of them.  We've
heard of people with many tens of thousands of tables, and pg_dump
speed didn't seem to be a huge bottleneck for them (at least not
in recent versions).  So I'm feeling we should not dismiss the
idea of one TOC entry per blob.

Thoughts?
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs.  However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either.  If your large objects are
> actually large, there's not going to be an enormous number of
> them.  We've heard of people with many tens of thousands of
> tables, and pg_dump speed didn't seem to be a huge bottleneck for
> them (at least not in recent versions).  So I'm feeling we should
> not dismiss the idea of one TOC entry per blob.
> 
> Thoughts?
We've got a "DocImage" table with about 7 million rows storing PDF
documents in a bytea column, approaching 1 TB of data.  (We don't
want to give up ACID guarantees, replication, etc. by storing them
on the file system with filenames in the database.)  This works
pretty well, except that client software occasionally has a tendency
to run out of RAM.  The interface could arguably be cleaner if we
used BLOBs, but the security issues have precluded that in
PostgreSQL.
I suspect that 7 million BLOBs (and growing fast) would be a problem
for this approach.  Of course, if we're atypical, we could stay with
bytea if this changed.  Just a data point.
-Kevin
cir=> select count(*) from "DocImage";
  count
---------
 6891626
(1 row)

cir=> select pg_size_pretty(pg_total_relation_size('"DocImage"'));
 pg_size_pretty
----------------
 956 GB
(1 row)



Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We've heard of people with many tens of thousands of
>> tables, and pg_dump speed didn't seem to be a huge bottleneck for
>> them (at least not in recent versions).  So I'm feeling we should
>> not dismiss the idea of one TOC entry per blob.
>> 
>> Thoughts?

> I suspect that 7 million BLOBs (and growing fast) would be a problem
> for this approach.  Of course, if we're atypical, we could stay with
> bytea if this changed.  Just a data point.

Do you have the opportunity to try an experiment on hardware similar to
what you're running that on?  Create a database with 7 million tables
and see what the dump/restore times are like, and whether
pg_dump/pg_restore appear to be CPU-bound or memory-limited when doing
it.  If they aren't, we could conclude that millions of TOC entries
isn't a problem.

A compromise we could consider is some sort of sub-TOC-entry scheme that
gets the per-BLOB entries out of the main speed bottlenecks, while still
letting us share most of the logic.  For instance, I suspect that the
first bottleneck in pg_dump would be the dependency sorting, but we
don't really need to sort all the blobs individually for that.
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Do you have the opportunity to try an experiment on hardware
> similar to what you're running that on?  Create a database with 7
> million tables and see what the dump/restore times are like, and
> whether pg_dump/pg_restore appear to be CPU-bound or
> memory-limited when doing it.
If these can be empty (or nearly empty) tables, I can probably swing
it as a background task.  You didn't need to match the current 1.3
TB database size I assume?
> If they aren't, we could conclude that millions of TOC entries
> isn't a problem.
I'd actually be rather more concerned about the effects on normal
query plan times, or are you confident that won't be an issue?
> A compromise we could consider is some sort of sub-TOC-entry
> scheme that gets the per-BLOB entries out of the main speed
> bottlenecks, while still letting us share most of the logic.  For
> instance, I suspect that the first bottleneck in pg_dump would be
> the dependency sorting, but we don't really need to sort all the
> blobs individually for that.
That might also address the plan time issue, if it actually exists.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do you have the opportunity to try an experiment on hardware
>> similar to what you're running that on?  Create a database with 7
>> million tables and see what the dump/restore times are like, and
>> whether pg_dump/pg_restore appear to be CPU-bound or
>> memory-limited when doing it.
> If these can be empty (or nearly empty) tables, I can probably swing
> it as a background task.  You didn't need to match the current 1.3
> TB database size I assume?

Empty is fine.

>> If they aren't, we could conclude that millions of TOC entries
>> isn't a problem.
> I'd actually be rather more concerned about the effects on normal
> query plan times, or are you confident that won't be an issue?

This is only a question of what happens internally in pg_dump and
pg_restore --- I'm not suggesting we change anything on the database
side.
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Empty is fine.
I'll get started.
-Kevin


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> I'll get started.
After a couple false starts, the creation of the millions of tables
is underway.  At the rate it's going, it won't finish for 8.2 hours,
so I'll have to come in and test the dump tomorrow morning.
-Kevin


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/01/23 5:12), Tom Lane wrote:
> KaiGai Kohei<kaigai@ak.jp.nec.com>  writes:
>> The attached patch is a revised version.
> 
> I'm inclined to wonder whether this patch doesn't prove that we've
> reached the end of the line for the current representation of blobs
> in pg_dump archives.  The alternative that I'm thinking about is to
> treat each blob as an independent object (hence, with its own TOC
> entry).  If we did that, then the standard pg_dump mechanisms for
> ownership, ACLs, and comments would apply, and we could get rid of
> the messy hacks that this patch is just adding to.  That would also
> open the door to future improvements like being able to selectively
> restore blobs.  (Actually you could do it immediately if you didn't
> mind editing a -l file...)  And it would for instance allow loading
> of blobs to be parallelized.

I also think it is better approach than the current blob representation.

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs.  However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either.  If your large objects are actually
> large, there's not going to be an enormous number of them.  We've
> heard of people with many tens of thousands of tables, and pg_dump
> speed didn't seem to be a huge bottleneck for them (at least not
> in recent versions).  So I'm feeling we should not dismiss the
> idea of one TOC entry per blob.

Even if the database contains a massive number of large objects, all that
pg_dump has to manage in RAM is the metadata, not the data contents.
If we have one TOC entry per blob, the total amount of I/O between
server and pg_dump is no different from the current approach.

If we assume one TOC entry consumes 64 bytes of RAM, it needs roughly 450MB
of RAM for 7 million BLOBs.
On recent computers, is that an unacceptable pain?
If you are trying to dump a TB-class database, I'd assume the machine
where pg_dump runs has adequate storage and RAM.
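
The arithmetic behind that figure, as a quick sanity check (the 64-byte
per-entry size is the email's assumption, not a measured value):

```python
# Back-of-envelope check of the RAM estimate above.  The 64-byte
# per-TOC-entry figure is the email's assumption, not a measurement.
bytes_per_toc_entry = 64
n_blobs = 7_000_000

total_mb = bytes_per_toc_entry * n_blobs / 1_000_000   # decimal megabytes
print(f"{total_mb:.0f} MB")   # 448 MB, i.e. roughly the 450 MB quoted
```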

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Do you have the opportunity to try an experiment on hardware
>>> similar to what you're running that on?  Create a database with
>>> 7 million tables and see what the dump/restore times are like,
>>> and whether pg_dump/pg_restore appear to be CPU-bound or
>>> memory-limited when doing it.
>  
>> If these can be empty (or nearly empty) tables, I can probably
>> swing it as a background task.  You didn't need to match the
>> current 1.3 TB database size I assume?
> 
> Empty is fine.
After about 15 hours of run time it was around 5.5 million tables;
the rate of creation had slowed rather dramatically.  I did create
them with primary keys (out of habit) which was probably the wrong
thing.  I canceled the table creation process and started a VACUUM
ANALYZE, figuring that we didn't want any hint-bit writing or bad
statistics confusing the results.  That has been running for 30
minutes with 65 MB to 140 MB per second disk activity, mixed read
and write.  After a few minutes that left me curious just how big
the database was, so I tried:
select pg_size_pretty(pg_database_size('test'));
I did a Ctrl+C after about five minutes and got:
Cancel request sent
but it didn't return for 15 or 20 minutes.  Any attempt to query
pg_locks stalls.  Tab completion stalls.  (By the way, this is not
related to the false alarm on that yesterday, which was a result of
my attempting tab completion from within a failed transaction, which
just found nothing rather than stalling.)
So I'm not sure whether I can get to a state suitable for starting
the desired test, but I'll stay with it for a while.
-Kevin


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> So I'm not sure whether I can get to a state suitable for starting
> the desired test, but I'll stay with it for a while.
I have other commitments today, so I'm going to leave the VACUUM
ANALYZE running and come back tomorrow morning to try the pg_dump.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> ...  After a few minutes that left me curious just how big
> the database was, so I tried:
> select pg_size_pretty(pg_database_size('test'));
> I did a Ctrl+C after about five minutes and got:
> Cancel request sent
> but it didn't return for 15 or 20 minutes.

Hm, we probably are lacking CHECK_FOR_INTERRUPTS in the inner loops in
dbsize.c ...
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
KaiGai Kohei <kaigai@kaigai.gr.jp> writes:
> (2010/01/23 5:12), Tom Lane wrote:
>> Now the argument against that is that it won't scale terribly well
>> to situations with very large numbers of blobs.

> Even if the database contains a massive number of large objects, all that
> pg_dump has to manage in RAM is the metadata, not the data contents.

I'm not so worried about the amount of RAM needed as whether pg_dump's
internal algorithms will scale to large numbers of TOC entries.  Any
O(N^2) behavior would be pretty painful, for example.  No doubt we could
fix any such problems, but it might take more work than we want to do
right now.
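
The flavor of O(N^2) hazard at stake can be illustrated with a simplified
TOC-dependency lookup (a hypothetical illustration, not pg_dump's actual
algorithm):

```python
# Simplified illustration of the scaling hazard: resolving each TOC
# entry's dependency by linear search is O(N^2) overall, while a one-time
# id->entry index keeps the whole pass O(N).
def resolve_deps_quadratic(entries):
    resolved = []
    for e in entries:
        # linear scan per entry -> O(N^2) in total
        dep = next((x for x in entries if x["id"] == e["dep"]), None)
        resolved.append((e["id"], dep["id"] if dep else None))
    return resolved

def resolve_deps_indexed(entries):
    by_id = {e["id"]: e for e in entries}   # built once, O(N)
    return [(e["id"], e["dep"] if e["dep"] in by_id else None)
            for e in entries]
```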
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not so worried about the amount of RAM needed as whether
> pg_dump's internal algorithms will scale to large numbers of TOC
> entries.  Any O(N^2) behavior would be pretty painful, for
> example.  No doubt we could fix any such problems, but it might
> take more work than we want to do right now.
I'm afraid pg_dump didn't get very far with this before:
pg_dump: WARNING:  out of shared memory
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
pg_dump: The command was: LOCK TABLE public.test2672 IN ACCESS SHARE
MODE
Given how fast it happened, I suspect that it was 2672 tables into
the dump, versus 26% of the way through 5.5 million tables.
A sampling of the vmstat 1 output lines in "baseline" state --
before the dump started:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 1  0 319804 583656  23372 124473248    0    0 17224    10 1742 18995  9  1 88  2  0
 3  1 319804 595840  23368 124458856    0    0 17016    10 2014 22965  9  1 89  1  0
 1  0 319804 586912  23376 124469128    0    0 16808   158 1807 19181  8  1 89  2  0
 2  0 319804 576304  23368 124479416    0    0 16840     5 1764 19136  8  1 90  1  0
 0  1 319804 590480  23364 124459888    0    0  1488   130 3449 13844  2  1 93  3  0
 0  1 319804 589476  23364 124460912    0    0  1456   115 3328 11800  2  1 94  4  0
 1  0 319804 588468  23364 124461944    0    0  1376   146 3156 11770  2  1 95  2  0
 1  1 319804 587836  23364 124465024    0    0  1576   133 3599 14797  3  1 94  3  0
While it was running:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 2  1 429080 886244  23308 111242464    0    0 25684    38 2920 18847  7  3 85  5  0
 2  1 429080 798172  23308 111297976    0    0 40024    26 1342 16967 13  2 82  4  0
 2  1 429080 707708  23308 111357600    0    0 42520    34 1588 19148 13  2 81  4  0
 0  5 429080 620700  23308 111414144    0    0 40272 73863 1434 18077 12  2 80  6  0
 1  5 429080 605616  23308 111425448    0    0  6920 131232  729  5187  3  1 66 31  0
 0  6 429080 582852  23316 111442912    0    0 10840 131248  665  4987  3  1 66 30  0
 2  4 429080 584976  23308 111433672    0    0  9776 139416  693  7890  4  1 66 29  0
 0  5 429080 575752  23308 111436752    0    0 10776 131217  647  6157  3  1 66 30  0
 1  3 429080 583768  23308 111420304    0    0 13616 90352 1043 13047  6  1 68 25  0
 4  0 429080 578888  23300 111397696    0    0 40000    44 1347 25329 12  2 79  6  0
 2  1 429080 582368  23292 111367896    0    0 40320    76 1517 28628 13  2 80  5  0
 2  0 429080 584960  23276 111338096    0    0 40240   163 1374 26988 13  2 80  5  0
 6  0 429080 576176  23268 111319600    0    0 40328   170 1465 27229 13  2 80  5  0
 4  0 429080 583212  23212 111288816    0    0 39568   138 1418 27296 13  2 80  5  0
This box has 16 CPUs, so the jump from 3% user CPU to 13% with an
increase of I/O wait from 3% to 5% suggests that pg_dump was
primarily CPU bound in user code before the crash.
I can leave this database around for a while if there are other
things you would like me to try.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> I'm afraid pg_dump didn't get very far with this before:
> pg_dump: WARNING:  out of shared memory
> pg_dump: SQL command failed
> Given how fast it happened, I suspect that it was 2672 tables into
> the dump, versus 26% of the way through 5.5 million tables.

Yeah, I didn't think about that.  You'd have to bump
max_locks_per_transaction up awfully far to get to where pg_dump
could dump millions of tables, because it wants to lock each one.

It might be better to try a test case with lighter-weight objects,
say 5 million simple functions.
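
For reference, a test case like the one Tom suggests could be generated
along these lines (a hypothetical helper; the thread does not show the
script Kevin actually used):

```python
# Emit DDL for n trivial SQL functions, one statement per line, suitable
# for piping into psql.  Hypothetical generator, not Kevin's actual script.
def make_function_ddl(n):
    for i in range(1, n + 1):
        yield (f"CREATE FUNCTION testfn{i}() RETURNS integer"
               f" AS 'SELECT {i};' LANGUAGE sql;")

# For the real test n would be 5_000_000; keep it tiny here.
for stmt in make_function_ddl(3):
    print(stmt)
```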
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It might be better to try a test case with lighter-weight objects,
> say 5 million simple functions.
So the current database is expendable?  I'd just as soon delete it
before creating the other one, if you're fairly confident the other
one will do it.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It might be better to try a test case with lighter-weight objects,
>> say 5 million simple functions.
> So the current database is expendable?

Yeah, I think it was a bad experimental design anyway...
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It might be better to try a test case with lighter-weight objects,
> say 5 million simple functions.
A dump of that quickly settled into running a series of these:
SELECT proretset, prosrc, probin,
  pg_catalog.pg_get_function_arguments(oid) AS funcargs,
  pg_catalog.pg_get_function_identity_arguments(oid) AS funciargs,
  pg_catalog.pg_get_function_result(oid) AS funcresult,
  proiswindow, provolatile, proisstrict, prosecdef, proconfig,
  procost, prorows,
  (SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) AS lanname
FROM pg_catalog.pg_proc WHERE oid = '1404528'::pg_catalog.oid
(with different oid values, of course).
Is this before or after the point you were worried about?  Anything
in particular for which I should be alert?
-Kevin


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> It might be better to try a test case with lighter-weight objects,
> say 5 million simple functions.
Said dump ran in about 45 minutes with no obvious stalls or
problems.  The 2.2 GB database dumped to a 1.1 GB text file, which
was a little bit of a surprise.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>> It might be better to try a test case with lighter-weight objects,
>> say 5 million simple functions.
> Said dump ran in about 45 minutes with no obvious stalls or
> problems.  The 2.2 GB database dumped to a 1.1 GB text file, which
> was a little bit of a surprise.

Did you happen to notice anything about pg_dump's memory consumption?
For an all-DDL case like this, I'd sort of expect the memory usage to
be comparable to the output file size.

Anyway this seems to suggest that we don't have any huge problem with
large numbers of archive TOC objects, so the next step probably is to
look at how big a code change it would be to switch over to
TOC-per-blob.
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Did you happen to notice anything about pg_dump's memory
> consumption?
Not directly, but I was running 'vmstat 1' throughout.  Cache space
dropped about 2.1 GB while it was running and popped back up to the
previous level at the end.
-Kevin


Re: Largeobject Access Controls (r2460)

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  
>> Did you happen to notice anything about pg_dump's memory
>> consumption?
>  
> Not directly, but I was running 'vmstat 1' throughout.  Cache
> space dropped about 2.1 GB while it was running and popped back up
> to the previous level at the end.
I took a closer look, and there's some bad news, I think.  The above
numbers were from the ends of the range.  I've gone back over and
found that while it dropped about 2.1 GB almost immediately, cache
usage slowly dropped throughout the dump, and bottomed at about 6.9
GB below baseline.
-Kevin


Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Did you happen to notice anything about pg_dump's memory
>>> consumption?
> I took a closer look, and there's some bad news, I think.  The above
> numbers were from the ends of the range.  I've gone back over and
> found that while it dropped about 2.1 GB almost immediately, cache
> usage slowly dropped throughout the dump, and bottomed at about 6.9
> GB below baseline.

OK, that's still not very scary --- it just means my off-the-cuff
estimate of 1:1 space usage was bad.  3:1 isn't that surprising either
given padding and other issues.  The representation of ArchiveEntries
could probably be made a bit more compact ...
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
The attached patch uses one TOC entry for each blob object.

It adds two new section types.

* "BLOB ITEM"

This section provides properties of a certain large object.
It contains a query to create an empty large object and, if necessary,
one to restore the ownership of the large object.

|  --
|  -- Name: 16406; Type: BLOB ITEM; Schema: -; Owner: ymj
|  --
|
|  SELECT lo_create(16406);
|
|  ALTER LARGE OBJECT "16406" OWNER TO ymj;

The comment descriptions were moved to the COMMENT section, as for any
other object class.

| --
| -- Name: LARGE OBJECT 16406; Type: COMMENT; Schema: -; Owner: ymj
| --
|
| COMMENT ON LARGE OBJECT 16406 IS 'This is a small large object.';

Also, access privileges were moved to the ACL section.

| --
| -- Name: 16405; Type: ACL; Schema: -; Owner: kaigai
| --
|
| REVOKE ALL ON LARGE OBJECT 16405 FROM PUBLIC;
| REVOKE ALL ON LARGE OBJECT 16405 FROM kaigai;
| GRANT ALL ON LARGE OBJECT 16405 TO kaigai;
| GRANT ALL ON LARGE OBJECT 16405 TO PUBLIC;


* "BLOB DATA"

This section is the same as the existing "BLOBS" section, except that
_LoadBlobs() does not create a new large object before opening it with
INV_WRITE, and lo_truncate() will be used instead of lo_unlink() when
--clean is given.

The legacy sections ("BLOBS" and "BLOB COMMENTS") can still be read
for compatibility, but newer pg_dump never creates these sections.


Internally, getBlobs() scans all the blobs and makes a DO_BLOB_ITEM
entry for each blob, plus a DO_BLOB_DATA entry if the database contains
at least one large object.

_PrintTocData() handles both the "BLOBS" and "BLOB DATA" sections.
If the given section is "BLOB DATA", it calls the specified format's
_LoadBlobs() with compat = false, because this section is new style.

In this case, _LoadBlobs() does not create a large object before opening
it with INV_WRITE, because the "BLOB ITEM" section has already created an
empty large object.

And DropBlobIfExists() was renamed to CleanupBlobIfExists(), because it
is modified to apply lo_truncate() when restoring a "BLOB DATA" section.
When --clean is given, SELECT lo_unlink(xxx) will be injected at the head
of the restore queries, instead of in the middle of loading blobs.

One remaining issue is how to decide whether blobs are to be dumped or not.
Right now, --schema-only eliminates all blob dumps.
However, I think it should follow the same manner as any other object class.

 -a, --data-only    ... only "BLOB DATA" sections, not "BLOB ITEM"
 -s, --schema-only  ... only "BLOB ITEM" sections, not "BLOB DATA"
 -b, --blobs        ... both of "BLOB ITEM" and "BLOB DATA" independently
                        from --data-only and --schema-only?
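
The proposed matrix can be written as a small decision function (a sketch of
the proposal above, not pg_dump's actual option handling):

```python
# Which blob sections each option would emit under the proposal above
# (a sketch of the proposal, not pg_dump's actual code).
def blob_sections(data_only=False, schema_only=False, blobs=False):
    if blobs:                       # -b: both, independently of -a/-s
        return {"BLOB ITEM", "BLOB DATA"}
    if data_only:                   # -a: data contents only
        return {"BLOB DATA"}
    if schema_only:                 # -s: properties only
        return {"BLOB ITEM"}
    return {"BLOB ITEM", "BLOB DATA"}
```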

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> The attached patch uses one TOC entry for each blob object.

When I'm testing the new patch, I found "ALTER LARGE OBJECT" command
returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
instead?  As I remember, we had decided not to use LARGEOBJECT
(without a space) in user-visible messages, right?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
Tom Lane
Date:
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> When I'm testing the new patch, I found "ALTER LARGE OBJECT" command
> returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
> instead?  As I remember, we had decided not to use LARGEOBJECT
> (without a space) in user-visible messages, right?

The command tag should match the actual command.  If the command name
is "ALTER LARGE OBJECT", the command tag should be too.  This is
independent of phraseology we might choose in error messages (though
I agree I don't like "largeobject" in those either).
        regards, tom lane


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/01/28 18:21), Takahiro Itagaki wrote:
>
> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
>
>> The attached patch uses one TOC entry for each blob object.
>
> When I'm testing the new patch, I found "ALTER LARGE OBJECT" command
> returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
> instead?  As I remember, we had decided not to use LARGEOBJECT
> (without a space) in user-visible messages, right?

Sorry, I missed fixing this tag when it was pointed out that LARGEOBJECT
should be LARGE(space)OBJECT.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > When I'm testing the new patch, I found "ALTER LARGE OBJECT" command
> > returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT"
> > instead?
> 
> Sorry, I missed fixing this tag when it was pointed out that LARGEOBJECT
> should be LARGE(space)OBJECT.

Committed. Thanks.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> The attached patch uses one TOC entry for each blob object.

This patch not only fixes the existing bugs, but also refactors the dump
format of large objects in pg_dump. The new format is more similar to the
format of tables:

Section    <Tables>       <New LO>      <Old LO>
---------------------------------------------------------
Schema     "TABLE"        "BLOB ITEM"   "BLOBS"
Data       "TABLE DATA"   "BLOB DATA"   "BLOBS"
Comments   "COMMENT"      "COMMENT"     "BLOB COMMENTS"

We will allocate a BlobInfo in memory for each large object. It might
consume much more memory than former versions if we have many large
objects, but we discussed that this is an acceptable change.

As far as I read, the patch is almost ready to commit
except the following issue about backward compatibility:

> * "BLOB DATA"
> This section is the same as the existing "BLOBS" section, except that
> _LoadBlobs() does not create a new large object before opening it with
> INV_WRITE, and lo_truncate() will be used instead of lo_unlink() when
> --clean is given.

> The legacy sections ("BLOBS" and "BLOB COMMENTS") can still be read
> for compatibility, but newer pg_dump never creates these sections.

I wonder whether we need to support older versions in pg_restore. You add a
check "PQserverVersion >= 80500" in CleanupBlobIfExists(), but our
documentation says we cannot use pg_restore 9.0 for 8.4 or older servers:

http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove such path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.


> One remaining issue is how to decide whether blobs are to be dumped or not.
> Right now, --schema-only eliminates all blob dumps.
> However, I think it should follow the same manner as any other object class.
> 
>  -a, --data-only    ... only "BLOB DATA" sections, not "BLOB ITEM"
>  -s, --schema-only  ... only "BLOB ITEM" sections, not "BLOB DATA"
>  -b, --blobs        ... both of "BLOB ITEM" and "BLOB DATA" independently
>                         from --data-only and --schema-only?

I cannot imagine situations that require data-only dumps -- for example,
a database being restored that has a filled pg_largeobject_metadata and
an empty or broken pg_largeobject -- it seems unnatural.

I'd prefer to keep the existing behavior:
  * default or data-only : dump all attributes and data of blobs
  * schema-only          : don't dump any blobs
and have independent options to control blob dumps:
  * -b, --blobs    : dump all blobs even if schema-only
  * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only
 

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/01 14:19), Takahiro Itagaki wrote:
> As far as I read, the patch is almost ready to commit
> except the following issue about backward compatibility:
> 
>> * "BLOB DATA"
>> This section is the same as the existing "BLOBS" section, except that
>> _LoadBlobs() does not create a new large object before opening it with
>> INV_WRITE, and lo_truncate() will be used instead of lo_unlink() when
>> --clean is given.
> 
>> The legacy sections ("BLOBS" and "BLOB COMMENTS") can still be read
>> for compatibility, but newer pg_dump never creates these sections.
> 
> I wonder whether we need to support older versions in pg_restore. You add a
> check "PQserverVersion >= 80500" in CleanupBlobIfExists(), but our
> documentation says we cannot use pg_restore 9.0 for 8.4 or older servers:
> 
> http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
> | it is not guaranteed that pg_dump's output can be loaded into
> | a server of an older major version
> 
> Can we remove such path and raise an error instead?
> Also, even if we support the older servers in the routine,
> the new bytea format will be another problem anyway.

OK, I'll fix it.

>> One remaining issue is how to decide whether blobs are to be dumped or not.
>> Right now, --schema-only eliminates all blob dumps.
>> However, I think it should follow the same manner as any other object class.
>>
>>   -a, --data-only    ... only "BLOB DATA" sections, not "BLOB ITEM"
>>   -s, --schema-only  ... only "BLOB ITEM" sections, not "BLOB DATA"
>>   -b, --blobs        ... both of "BLOB ITEM" and "BLOB DATA" independently
>>                          from --data-only and --schema-only?
> 
> I cannot imagine situations that require data-only dumps -- for example,
> a database being restored that has a filled pg_largeobject_metadata and
> an empty or broken pg_largeobject -- it seems unnatural.

Indeed, it might not be a sane situation.

However, we can assume a situation where the user wants to back up a database
into two separate files (one for schema definitions; one for data contents).
Just after restoring the schema definitions, all the large objects are created
as empty blobs. Then, we can restore their data contents.

I wonder if the behavior is easily understandable for end users.
The "BLOB ITEM" section contains the properties of a certain large object,
such as its identifier (loid), comment, ownership, and access privileges.
These are categorized as schema definitions in other object classes,
but we still need special treatment for blobs.

The --schema-only with large objects might be unnatural, but the
--data-only with properties of large objects are also unnatural.
Which behavior is more unnatural?

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > Can we remove such path and raise an error instead?
> > Also, even if we support the older servers in the routine,
> > the new bytea format will be another problem anyway.
> 
> OK, I'll fix it.

I think we might need to discuss explicit version checks in pg_restore.
It is not related to large objects, but to pg_restore's capability.
We've never supported restoring a dump to older servers, but we don't have
any version checks, right? Should we do the checks at the beginning of
restoring? If we do so, the LO patch could be simplified further.


> The --schema-only with large objects might be unnatural, but the
> --data-only with properties of large objects are also unnatural.
> Which behavior is more unnatural?

I think large object metadata is a kind of row-based access control.
How would we dump and restore per-row ACLs if we supported them for
normal tables? We should treat LO metadata the same as row-based ACLs.
In my opinion, I'd like to treat them as part of the data (not of the schema).

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/02 9:33), Takahiro Itagaki wrote:
> 
> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
> 
>>> Can we remove such path and raise an error instead?
>>> Also, even if we support the older servers in the routine,
>>> the new bytea format will be another problem anyway.
>>
>> OK, I'll fix it.
> 
> I think we might need to discuss explicit version checks in pg_restore.
> It is not related to large objects, but to pg_restore's capability.
> We've never supported restoring a dump to older servers, but we don't have
> any version checks, right? Should we do the checks at the beginning of
> restoring? If we do so, the LO patch could be simplified further.

I agree it is a good idea.

>> The --schema-only with large objects might be unnatural, but the
>> --data-only with properties of large objects are also unnatural.
>> Which behavior is more unnatural?
> 
> I think large object metadata is a kind of row-based access control.
> How would we dump and restore per-row ACLs if we supported them for
> normal tables? We should treat LO metadata the same as row-based ACLs.
> In my opinion, I'd like to treat them as part of the data (not of the schema).

OK, I'll update the patch according to the behavior you suggested.
| I'd prefer to keep the existing behavior:
|   * default or data-only : dump all attributes and data of blobs
|   * schema-only          : don't dump any blobs
| and have independent options to control blob dumps:
|   * -b, --blobs    : dump all blobs even if schema-only
|   * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only
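
The behavior quoted above reduces to a small predicate (a sketch of the
proposal only; -B/--no-blobs is the proposed new switch, not an existing one):

```python
# Whether blobs get dumped at all under the behavior quoted above
# (a sketch of the proposal, not pg_dump's actual option handling).
def dump_blobs(data_only=False, schema_only=False,
               blobs=False, no_blobs=False):
    if no_blobs:              # -B: never dump blobs
        return False
    if blobs:                 # -b: always dump blobs, even with -s
        return True
    return not schema_only    # default and -a dump them; -s does not
```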

Please wait for a while. Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
>>> The --schema-only with large objects might be unnatural, but the
>>> --data-only with properties of large objects are also unnatural.
>>> Which behavior is more unnatural?
>>
>> I think large object metadata is a kind of row-based access control.
>> How would we dump and restore per-row ACLs if we supported them for
>> normal tables? We should treat LO metadata the same as row-based ACLs.
>> In my opinion, I'd like to treat them as part of the data (not of the schema).
> 
> OK, I'll update the patch according to the behavior you suggested.
> | I'd prefer to keep the existing behavior:
> |   * default or data-only : dump all attributes and data of blobs
> |   * schema-only          : don't dump any blobs
> | and have independent options to control blob dumps:
> |   * -b, --blobs    : dump all blobs even if schema-only
> |   * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only

I found out that it needs special treatment to dump comments and access
privileges of blobs. See dumpACL() and dumpComment():

| static void
| dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
|         const char *type, const char *name, const char *subname,
|         const char *tag, const char *nspname, const char *owner,
|         const char *acls)
| {
|     PQExpBuffer sql;
|
|     /* Do nothing if ACL dump is not enabled */
|     if (dataOnly || aclsSkip)
|         return;
|     :

| static void
| dumpComment(Archive *fout, const char *target,
|             const char *namespace, const char *owner,
|             CatalogId catalogId, int subid, DumpId dumpId)
| {
|     CommentItem *comments;
|     int         ncomments;
|
|     /* Comments are SCHEMA not data */
|     if (dataOnly)
|         return;
|     :

In addition, _printTocEntry() is not called with acl_pass = true,
when --data-only is given.

I again wonder whether we are on the right direction.

Originally, the reason we decided to use a per-blob TOC entry was
that the "BLOB ACLS" entry needed a few exceptional treatments in the code.
But if we deal with the "BLOB ITEM" entry as data contents, it will also
need additional exceptional treatment.

Indeed, even if we had row-level ACLs, they would be dumped in the data
section without being separated into properties and data contents, because
of an implementation restriction, not a data-modeling reason.

A large number of empty large objects may not be a sane situation, but at
least it fits the existing manner of pg_dump/pg_restore.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
2010/2/1 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> I again wonder whether we are on the right direction.

I believe the proposed approach is to dump blob metadata if and only
if you are also dumping blob contents, and to do all of this for data
dumps but not schema dumps.  That seems about right to me.

> Originally, the reason we decided to use a per-blob TOC entry was
> that the "BLOB ACLS" entry needed a few exceptional treatments in the code.
> But if we deal with the "BLOB ITEM" entry as data contents, it will also
> need additional exceptional treatment.

But the new ones are less objectionable, maybe.

...Robert


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/04 0:20), Robert Haas wrote:
> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>> I again wonder whether we are on the right direction.
> 
> I believe the proposed approach is to dump blob metadata if and only
> if you are also dumping blob contents, and to do all of this for data
> dumps but not schema dumps.  That seems about right to me.

In other words:
  <default>     -> blob contents and metadata (owner, acl, comments) shall
                   be dumped
  --data-only   -> only blob contents shall be dumped
  --schema-only -> neither blob contents and metadata are dumped.

Can I understand correctly?

>> Originally, the reason why we decide to use per blob toc entry was
>> that "BLOB ACLS" entry needs a few exceptional treatments in the code.
>> But, if we deal with "BLOB ITEM" entry as data contents, it will also
>> need additional exceptional treatments.
> 
> But the new ones are less objectionable, maybe.
> 
> ...Robert
> 


-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/04 17:30), KaiGai Kohei wrote:
> (2010/02/04 0:20), Robert Haas wrote:
>> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>> I again wonder whether we are on the right direction.
>>
>> I believe the proposed approach is to dump blob metadata if and only
>> if you are also dumping blob contents, and to do all of this for data
>> dumps but not schema dumps.  That seems about right to me.
>
> In other words:
>
>   <default>      ->  blob contents and metadata (owner, acl, comments) shall
>                    be dumped
>   --data-only   ->  only blob contents shall be dumped
>   --schema-only ->  neither blob contents and metadata are dumped.
>
> Can I understand correctly?

The attached patch makes pg_dump skip the "BLOB ITEM" sections and the
corresponding metadata when --data-only is specified. In addition, it
outputs neither the "BLOB DATA" nor the "BLOB ITEM" sections when
--schema-only is specified.

When --data-only is given to pg_dump, it does not construct any DO_BLOB_ITEM
entries in getBlobs(), so none of the metadata (owner, acls, comment) is
dumped. It also writes the legacy "BLOBS" section instead of the new
"BLOB DATA" section, to tell pg_restore that this archive does not create
large objects in a "BLOB ITEM" section.
If --schema-only is given, getBlobs() is simply skipped.

When --data-only is given to pg_restore, it skips all the "BLOB ITEM"
sections. Large objects are then created in _LoadBlobs() instead, as we
have done until now.
_LoadBlobs() takes a third argument that specifies whether we should
create the large objects there or not. Its condition is slightly modified
from the previous patch:

  if (strcmp(te->desc, "BLOBS") == 0 || ropt->dataOnly)
      _LoadBlobs(AH, ropt, true);
                           ^^^^
  else if (strcmp(te->desc, "BLOB DATA") == 0)
      _LoadBlobs(AH, ropt, false);

When --data-only is given to pg_restore, the "BLOB ITEM" sections are
skipped, so we need to create the large objects at the _LoadBlobs() stage,
even if the archive has a "BLOB DATA" section.

In addition, --schema-only kills all the "BLOB ITEM" sections using a
special condition that was added to _tocEntryRequired().

It might be a bit different from what Itagaki-san suggested, because the
"BLOB ITEM" entries are still placed in SECTION_PRE_DATA.
However, it minimizes special treatment in the code, and makes no
difference from the end-user's point of view.

Or, is it necessary to pack them into SECTION_DATA section anyway?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Robert Haas
Date:
2010/2/4 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> (2010/02/04 0:20), Robert Haas wrote:
>> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>> I again wonder whether we are on the right direction.
>>
>> I believe the proposed approach is to dump blob metadata if and only
>> if you are also dumping blob contents, and to do all of this for data
>> dumps but not schema dumps.  That seems about right to me.
>
> In other words:
>
>  <default>     -> blob contents and metadata (owner, acl, comments) shall
>                  be dumped
>  --data-only   -> only blob contents shall be dumped
>  --schema-only -> neither blob contents and metadata are dumped.
>
> Can I understand correctly?

No, that's not what I said.  Please reread.  I don't think you should
ever dump blob contents without the metadata, or the other way around.

...Robert


Re: Largeobject Access Controls (r2460)

From
Alvaro Herrera
Date:
Robert Haas wrote:
> 2010/2/4 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> > (2010/02/04 0:20), Robert Haas wrote:
> >> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
> >>> I again wonder whether we are on the right direction.
> >>
> >> I believe the proposed approach is to dump blob metadata if and only
> >> if you are also dumping blob contents, and to do all of this for data
> >> dumps but not schema dumps.  That seems about right to me.
> >
> > In other words:
> >
> >  <default>     -> blob contents and metadata (owner, acl, comments) shall
> >                  be dumped
> >  --data-only   -> only blob contents shall be dumped
> >  --schema-only -> neither blob contents and metadata are dumped.
> >
> > Can I understand correctly?
> 
> No, that's not what I said.  Please reread.  I don't think you should
> ever dump blob contents without the metadata, or the other way around.

So:
    default:    both contents and metadata
    --data-only:    same
    --schema-only:    neither

Seems reasonable.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/05 3:27), Alvaro Herrera wrote:
> Robert Haas wrote:
>> 2010/2/4 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>> (2010/02/04 0:20), Robert Haas wrote:
>>>> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
>>>>> I again wonder whether we are on the right direction.
>>>>
>>>> I believe the proposed approach is to dump blob metadata if and only
>>>> if you are also dumping blob contents, and to do all of this for data
>>>> dumps but not schema dumps.  That seems about right to me.
>>>
>>> In other words:
>>>
>>>   <default>       ->  blob contents and metadata (owner, acl, comments) shall
>>>                   be dumped
>>>   --data-only   ->  only blob contents shall be dumped
>>>   --schema-only ->  neither blob contents and metadata are dumped.
>>>
>>> Can I understand correctly?
>>
>> No, that's not what I said.  Please reread.  I don't think you should
>> ever dump blob contents without the metadata, or the other way around.
>
> So:
>     default:    both contents and metadata
>     --data-only:    same
>     --schema-only:    neither
>
> Seems reasonable.

OK... I'll try to update the patch, anyway.

However, it means that large objects become the only object class whose
owner, acl and comment are dumped even when --data-only is given.
Is that really what you suggested?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

> >     default:    both contents and metadata
> >     --data-only:    same
> >     --schema-only:    neither
> 
> However, it means only large object performs an exceptional object class
> that dumps its owner, acl and comment even if --data-only is given.
> Is it really what you suggested, isn't it?

I wonder whether we still need both "BLOB ITEM" and "BLOB DATA"
even if we take the all-or-nothing behavior. Can we handle a
BLOB's owner, acl, comment and data with one entry kind?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/05 13:53), Takahiro Itagaki wrote:
> 
> KaiGai Kohei<kaigai@kaigai.gr.jp>  wrote:
> 
>>>     default:    both contents and metadata
>>>     --data-only:    same
>>>     --schema-only:    neither
>>
>> However, it means only large object performs an exceptional object class
>> that dumps its owner, acl and comment even if --data-only is given.
>> Is it really what you suggested, isn't it?
> 
> I wonder we still need to have both "BLOB ITEM" and "BLOB DATA"
> even if we will take the all-or-nothing behavior. Can we handle
> BLOB's owner, acl, comment and data with one entry kind?

Is it possible to fetch a particular blob from a tar/custom archive
when pg_restore finds that blob's TOC entry?

Currently, when pg_restore finds a "BLOB DATA" or "BLOBS" entry,
it opens the archive and restores all the blob objects sequentially.
It seems to me we would also have to rework the custom format...

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/05 13:53), Takahiro Itagaki wrote:
> 
> KaiGai Kohei<kaigai@kaigai.gr.jp>  wrote:
> 
>>>     default:    both contents and metadata
>>>     --data-only:    same
>>>     --schema-only:    neither
>>
>> However, it means only large object performs an exceptional object class
>> that dumps its owner, acl and comment even if --data-only is given.
>> Is it really what you suggested, isn't it?
> 
> I wonder we still need to have both "BLOB ITEM" and "BLOB DATA"
> even if we will take the all-or-nothing behavior. Can we handle
> BLOB's owner, acl, comment and data with one entry kind?

I looked at the corresponding code.

Currently, we have three _LoadBlobs() variations in pg_backup_tar.c,
pg_backup_files.c and pg_backup_custom.c.

In the _tar.c and _files.c cases, we can reasonably fetch the data contents
of the blob to be restored. All we need to do is provide an explicit
filename to the tarOpen() function, and a blob need not be restored
sequentially.
It means pg_restore can restore an arbitrary blob when it finds a new
unified blob entry.

In the _custom.c case, its _LoadBlobs() is called from _PrintTocData()
when the given TocEntry is "BLOBS", and it tries to load the following
multiple blobs. However, I could not find any restriction that a custom
format archive cannot have multiple "BLOBS" sections. In other words, we
can write out multiple sections, one blob each, for every new unified blob entry.

Right now, it seems to me it is feasible to implement what you suggested.

The matter is whether we should do it, or not.
At least, it seems to me better than some of exceptional treatments in
pg_dump and pg_restore from the perspective of design.

What is your opinion?

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/05 13:53), Takahiro Itagaki wrote:
>
> KaiGai Kohei<kaigai@kaigai.gr.jp>  wrote:
>
>>>     default:    both contents and metadata
>>>     --data-only:    same
>>>     --schema-only:    neither
>>
>> However, it means only large object performs an exceptional object class
>> that dumps its owner, acl and comment even if --data-only is given.
>> Is it really what you suggested, isn't it?
>
> I wonder we still need to have both "BLOB ITEM" and "BLOB DATA"
> even if we will take the all-or-nothing behavior. Can we handle
> BLOB's owner, acl, comment and data with one entry kind?

The attached patch was a refactored one according to the suggestion.

By default or with --data-only, it dumps the data contents of large objects
and their properties (owner, comment and access privileges), but it dumps
nothing when --schema-only is given.

    default:    both contents and metadata
    --data-only:    same
    --schema-only:    neither

It replaces the existing "BLOBS" and "BLOB COMMENTS" sections with a new
"LARGE OBJECT" section, each associated with one particular large object.
The section header contains the OID of the large object to be restored, so
pg_restore tries to load the specified large object from the given
archive.

The _PrintTocData() handlers were modified to support the "LARGE OBJECT"
section, which loads only the specified large object, not the whole set of
archived ones as "BLOBS" does. They still support reading the "BLOBS" and
"BLOB COMMENTS" sections, but never write out these legacy sections any more.

The archive file no longer contains "blobs.toc", because the OID of the
large objects to be restored can be found in the section header, without
any special-purpose files. It also allows omitting the _StartBlobs() and
_EndBlobs() methods in the tar and files formats.

Basically, I like this approach more than the previous combination of
"BLOB ITEM" and "BLOB DATA".

However, we have a known issue here.
The ACL section is categorized as REQ_SCHEMA in _tocEntryRequired(),
so we cannot dump ACLs when --data-only is given, even if the large
object itself is dumped. Of course, we can solve it by adding a few more
exceptional treatments, although that is not graceful.
However, the matter seems to come from the fact that _tocEntryRequired()
can only return a mask of REQ_SCHEMA and REQ_DATA. Right now, it is
not easy to categorize the ACL/COMMENT sections into either of them.
I think we should add REQ_ACL and REQ_COMMENT to inform the caller
whether the current section should be dumped now, or not.

Any idea?

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Alvaro Herrera
Date:
Takahiro Itagaki wrote:
> 
> KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:
> 
> > >     default:    both contents and metadata
> > >     --data-only:    same
> > >     --schema-only:    neither
> > 
> > However, it means only large object performs an exceptional object class
> > that dumps its owner, acl and comment even if --data-only is given.
> > Is it really what you suggested, isn't it?
> 
> I wonder we still need to have both "BLOB ITEM" and "BLOB DATA"
> even if we will take the all-or-nothing behavior. Can we handle
> BLOB's owner, acl, comment and data with one entry kind?

I don't think this is necessarily a good idea.  We might decide to treat
both things separately in the future, and having them represented
separately in the dump would prove useful.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/08 22:23), Alvaro Herrera wrote:
> Takahiro Itagaki wrote:
>>
>> KaiGai Kohei<kaigai@kaigai.gr.jp>  wrote:
>>
>>>>     default:    both contents and metadata
>>>>     --data-only:    same
>>>>     --schema-only:    neither
>>>
>>> However, it means only large object performs an exceptional object class
>>> that dumps its owner, acl and comment even if --data-only is given.
>>> Is it really what you suggested, isn't it?
>>
>> I wonder we still need to have both "BLOB ITEM" and "BLOB DATA"
>> even if we will take the all-or-nothing behavior. Can we handle
>> BLOB's owner, acl, comment and data with one entry kind?
>
> I don't think this is necessarily a good idea.  We might decide to treat
> both things separately in the future and it having them represented
> separately in the dump would prove useful.

I agree. From a design perspective, the single-section approach is
simpler than the dual-section one, but its change set is larger.

The attached patch revises the previous implementation, which has the
two types of sections, to handle the options correctly, as follows:

    * default:        both contents and metadata
    * --data-only:        same
    * --schema-only:    neither

Below are the points introduced.

_tocEntryRequired() decides whether the given TocEntry should be dumped
here, or not, based on the given context.
Previously, none of the sections that need cleanup code and access
privileges belonged to SECTION_DATA, so data sections were ignored even
if their cleanup code and access privileges needed to be restored.

The chunk at pg_backup_archiver.c:329 checks whether we need to clean
up the existing object specified by the TocEntry. If the entry is "BLOB
ITEM", _tocEntryRequired() returns REQ_DATA (if --data-only is given), so
it does not write out the cleanup code.
(We have to unlink the existing large objects to be restored prior to
creating them, so the cleanup can no longer be done in _StartBlob().)

The chunk at pg_backup_archiver.c:381 checks whether we need to restore
access privileges, using the given "ACL" TocEntry. Similarly, the caller
does not expect access privileges to be restored when --data-only is
given.

_tocEntryRequired() was also modified to handle large objects correctly.
Previously, when a TocEntry did not have its own dumper (except for the
"SEQUENCE SET" section), it was handled as SECTION_SCHEMA.
If the 16th argument of ArchiveEntry() is NULL, the entry does not have
its own dumper function, even if the section is SECTION_DATA.
dumpBlobItem() calls ArchiveEntry() without a dumper, so it was
misidentified as a schema section. The "ACL" sections of large objects
were also misidentified. So I had to add these special treatments.

dumpACL() is a utility function that writes out GRANT/REVOKE statements
for the given acl string. When --data-only was given, it returned
immediately without doing any work, which prevented dumping the access
privileges of large objects.
However, all the callers already check the "if (dataOnly)" case prior to
invoking it, so I removed this check from dumpACL().

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > I don't think this is necessarily a good idea.  We might decide to treat
> > both things separately in the future and it having them represented
> > separately in the dump would prove useful.
> 
> I agree. From design perspective, the single section approach is more
> simple than dual section, but its change set is larger than the dual.

OK.


When I tested a custom dump with pg_restore, --clean & --single-transaction
fail with the new dump format because it always calls lo_unlink()
even if the large object doesn't exist. It comes from dumpBlobItem:

! dumpBlobItem(Archive *AH, BlobInfo *binfo)
!     appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);

The query in DropBlobIfExists() could avoid errors -- should we use it here?
| SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;


BTW, --clean option is ambiguous if combined with --data-only. Restoring
large objects fails for the above reason if previous objects don't exist,
but table data are restored *without* truncation of existing data. Will
normal users expect TRUNCATE-before-load for --clean & --data-only cases?
   Present behaviors are;       Table data    - Appended. (--clean is ignored)       Large objects - End with an error
ifobject doesn't exist.   IMO, ideal behaviors are:       Table data    - Truncate existing data and load new ones.
 Large objects - Work like as MERGE (or REPLACE, UPSERT).
 

Comments?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/09 20:16), Takahiro Itagaki wrote:
>
> KaiGai Kohei<kaigai@ak.jp.nec.com>  wrote:
>
>>> I don't think this is necessarily a good idea.  We might decide to treat
>>> both things separately in the future and it having them represented
>>> separately in the dump would prove useful.
>>
>> I agree. From design perspective, the single section approach is more
>> simple than dual section, but its change set is larger than the dual.
>
> OK.
>
>
> When I tested a custom dump with pg_restore, --clean&  --single-transaction
> will fail with the new dump format because it always call lo_unlink()
> even if the large object doesn't exist. It comes from dumpBlobItem:
>
> ! dumpBlobItem(Archive *AH, BlobInfo *binfo)
> !     appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);
>
> The query in DropBlobIfExists() could avoid errors -- should we use it here?
> | SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;

Yes, we can use this query to handle --clean option.
I'll fix it soon.

> BTW, --clean option is ambiguous if combined with --data-only. Restoring
> large objects fails for the above reason if previous objects don't exist,
> but table data are restored *without* truncation of existing data. Will
> normal users expect TRUNCATE-before-load for --clean&  --data-only cases?
>
>      Present behaviors are;
>          Table data    - Appended. (--clean is ignored)
>          Large objects - End with an error if object doesn't exist.
>      IMO, ideal behaviors are:
>          Table data    - Truncate existing data and load new ones.
>          Large objects - Work like as MERGE (or REPLACE, UPSERT).
>
> Comments?

In the existing "BLOBS" section, it creates and restores large objects
at the same time. It also unlinks an existing large object (if any)
just before restoring it, when --clean is given.

In my opinion, when --clean is given, it should also truncate the table
before restoring, even if --data-only is given as well.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: Largeobject Access Controls (r2460)

From
KaiGai Kohei
Date:
(2010/02/09 21:18), KaiGai Kohei wrote:
> (2010/02/09 20:16), Takahiro Itagaki wrote:
>>
>> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:
>>
>>>> I don't think this is necessarily a good idea. We might decide to treat
>>>> both things separately in the future and it having them represented
>>>> separately in the dump would prove useful.
>>>
>>> I agree. From design perspective, the single section approach is more
>>> simple than dual section, but its change set is larger than the dual.
>>
>> OK.
>>
>>
>> When I tested a custom dump with pg_restore, --clean&
>> --single-transaction
>> will fail with the new dump format because it always call lo_unlink()
>> even if the large object doesn't exist. It comes from dumpBlobItem:
>>
>> ! dumpBlobItem(Archive *AH, BlobInfo *binfo)
>> ! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);
>>
>> The query in DropBlobIfExists() could avoid errors -- should we use it
>> here?
>> | SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;
>
> Yes, we can use this query to handle --clean option.
> I'll fix it soon.

The attached patch fixed up the cleanup query as follows:

+   appendPQExpBuffer(dquery,
+                     "SELECT pg_catalog.lo_unlink(oid) "
+                     "FROM pg_catalog.pg_largeobject_metadata "
+                     "WHERE oid = %s;\n", binfo->dobj.name);

And I also noticed that lo_create() was not prefixed by "pg_catalog.",
so I added that as well.

Rest of parts were not changed.

Thanks,


>> BTW, --clean option is ambiguous if combined with --data-only. Restoring
>> large objects fails for the above reason if previous objects don't exist,
>> but table data are restored *without* truncation of existing data. Will
>> normal users expect TRUNCATE-before-load for --clean& --data-only cases?
>>
>> Present behaviors are;
>> Table data - Appended. (--clean is ignored)
>> Large objects - End with an error if object doesn't exist.
>> IMO, ideal behaviors are:
>> Table data - Truncate existing data and load new ones.
>> Large objects - Work like as MERGE (or REPLACE, UPSERT).
>>
>> Comments?
>
> In the existing "BLOBS" section, it creates and restores large objects
> in same time. And, it also unlink existing large object (if exists)
> just before restoring them, when --clean is given.
>
> In my opinion, when --clean is given, it also should truncate the table
> before restoring, even if --data-only is co-given.
>
> Thanks,


--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: Largeobject Access Controls (r2460)

From
Takahiro Itagaki
Date:
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> The attached patch fixed up the cleanup query as follows:
> +   appendPQExpBuffer(dquery,
> +                     "SELECT pg_catalog.lo_unlink(oid) "
> +                     "FROM pg_catalog.pg_largeobject_metadata "
> +                     "WHERE oid = %s;\n", binfo->dobj.name);
>
> And, I also noticed that lo_create() was not prefixed by "pg_catalog.",
> so I also add it.

Thanks. Now the patch is ready to commit.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center