Thread: [PATCH] Largeobject Access Controls (r2432)
The attached patch is a revised version of large object privileges based on the feedback at the last commit fest. List of updates:

* Rebased to the latest CVS HEAD.
* Added a pg_largeobject_aclcheck_snapshot() interface. In read-only access mode, the large object feature uses the query's snapshot, not only SnapshotNow. This behavior should also apply to accesses to the metadata: when making an access control decision, we have to fetch the ACLs from the system catalog, and we now scan pg_largeobject_metadata with the snapshot of the large object descriptor. (Note that it is still SnapshotNow in read-write mode.) This resolves the problem of access rights being changed while large objects are open.
* Added pg_dump support.
* Replaced all occurrences of "largeobject" with "large object" in user-visible messages and documentation.
* Removed the ac_largeobject_*() routines, because we decided not to share the same entry points between DAC and MAC.
* Added a description of large object privileges in SGML.
* Added a definition of pg_largeobject_metadata in SGML.
* The \lo_list command supports both v8.5 and prior versions.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
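An illustrative two-session sketch of the snapshot behavior described in the patch notes above; the LOID, role name, descriptor number, and the per-object REVOKE syntax are assumed for illustration only, not output from the patch:

  -- Session A: open a large object for reading inside a transaction
  -- (262144 = 0x40000 = INV_READ; descriptor 0 assumes this is the first lo_open)
  BEGIN;
  SELECT lo_open(12345, 262144);

  -- Session B: revoke the privilege and commit while A's descriptor is still open
  REVOKE SELECT ON LARGE OBJECT 12345 FROM reader_role;

  -- Session A: reads through the already-open descriptor still succeed, because
  -- the permission check uses the descriptor's snapshot rather than SnapshotNow
  SELECT loread(0, 1024);
  COMMIT;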
2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> The attached patch is a revised version of large object privileges
> based on the feedbacks at the last commit fest.

Please update the patch; it's giving an error when 'make check' is trying to "create template1" in initdb:

creating template1 database in /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1 ...
TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))", File: "md.c", Line: 254)
child process was terminated by signal 6: Aborted

Meanwhile, I will make some comments:

This manual will be specific to 8.5, so I think all mentions of the version should be removed. For example:

+ In this version, a large object has OID of its owner, access permissions
+ and OID of the largeobject itself.

+ Prior to the version 8.5.x release does not have any privilege checks on
+ large objects.

The parameter name (large_object_privilege_checks) is confusing enough that we have to make these statements to clarify... let's think of a better, less confusing name.

+ Please note that it is not equivalent to disable all the security
+ checks corresponding to large objects.
+ For example, the <literal>lo_import()</literal> and
+ <literal>lo_export</literal> need superuser privileges independent
+ from this setting as prior versions were doing.

This will not be off by default? It should be, for compatibility reasons... I remember there was a discussion about that but can't remember the conclusion.

Mmm... One of them? The first?

+ The one is <literal>SELECT</literal>.

+ Even if a transaction modified access rights and commit it, it is
+ not invisible from other transaction which already opened the large
+ object.

The other one, the second?

+ The other is <literal>UPDATE</literal>.

It seems there is an "are" that should not be there :)

+
+ These functions are originally requires database superuser privilege,
+ and it allows to bypass the default database privilege checks, so
+ we don't need to check an obvious test twice.

A typo, obviously:

+ For largeo bjects, this privilege also allows to read from
+ the target large object.

We have two versions of these functions, one that receives a Snapshot parameter and one that doesn't... what is the rationale for this? AFAIU, the one that doesn't receive a Snapshot calls the other one with SnapshotNow; can't we simply call it that way and drop the version of the functions that doesn't have that parameter?

+ pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
+                        AclMode mask, AclMaskHow how)

+ pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

--
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > This manual will be specific for 8.5 so i think all mentions to the > version should be removed Not sure I agree on this point. We have similar mentions elsewhere. ...Robert
Robert Haas wrote:
> On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
>> This manual will be specific for 8.5 so i think all mentions to the
>> version should be removed
>
> Not sure I agree on this point. We have similar mentions elsewhere.

In this particular example, it's bad form because it's even possible that 8.5 will actually be 9.0. You don't want to refer to a version number that doesn't even exist for sure yet, lest it leave a loose end that needs to be cleaned up later if that number is changed before release.

Rewriting in terms like "in earlier versions..." instead is one approach. Then people will have to manually scan earlier docs to sort that out; I know I end up doing that all the time. If you want to keep the note specific, saying "in 8.4 and earlier versions [old behavior]" is better than "before 8.5 [old behavior]" because it only mentions version numbers that are historical rather than future.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Robert Haas wrote: > On Thu, Dec 3, 2009 at 12:49 PM, Jaime Casanova > <jcasanov@systemguards.com.ec> wrote: > > This manual will be specific for 8.5 so i think all mentions to the > version should be removed > > Not sure I agree on this point. We have similar mentions elsewhere. > > In this particular example, it's bad form because it's even possible that > 8.5 will actually be 9.0. You don't want to refer to a version number that > doesn't even exist for sure yet, lest it leave a loose end that needs to be > cleaned up later if that number is changed before release. > > Rewriting in terms like "in earlier versions..." instead is one approach. > Then people will have to manually scan earlier docs to sort that out, I know > I end up doing that all the time. If you want to keep the note specific, > saying "in 8.4 and earlier versions [old behavior]" is better than "before > 8.5 [old behavior]" because it only mentions version numbers that are > historical rather than future. Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier releases". Compare: http://www.postgresql.org/docs/8.4/static/sql-copy.html#AEN55855 http://www.postgresql.org/docs/8.4/static/runtime-config-logging.html#GUC-LOG-FILENAME ...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> In this particular example, it's bad form because it's even possible that
>> 8.5 will actually be 9.0. You don't want to refer to a version number that
>> doesn't even exist for sure yet, lest it leave a loose end that needs to be
>> cleaned up later if that number is changed before release.

> Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier
> releases". Compare:

Please do *not* resort to awkward constructions just to avoid one mention of the current version number. If we did decide to call the next version 9.0, the search-and-replace effort involved is not going to be measurably affected by any one usage. There are plenty already.

(I did the work when we decided to call 7.5 8.0, so I know whereof I speak.)

regards, tom lane
On Thu, Dec 3, 2009 at 2:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Dec 3, 2009 at 1:23 PM, Greg Smith <greg@2ndquadrant.com> wrote: >>> In this particular example, it's bad form because it's even possible that >>> 8.5 will actually be 9.0. You don't want to refer to a version number that >>> doesn't even exist for sure yet, lest it leave a loose end that needs to be >>> cleaned up later if that number is changed before release. > >> Ah, yes, I like "In 8.4 and earlier versions", or maybe "earlier >> releases". Compare: > > Please do *not* resort to awkward constructions just to avoid one > mention of the current version number. If we did decide to call the > next version 9.0, the search-and-replace effort involved is not going > to be measurably affected by any one usage. There are plenty already. > > (I did the work when we decided to call 7.5 8.0, so I know whereof > I speak.) I agree that search and replace isn't that hard, but I don't find the proposed construction awkward, and we have various uses of it in the docs already. Actually the COPY one is not quite clear whether it means <= 7.3 or < 7.3. I think we're just aiming for consistency here as much as anything. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I agree that search and replace isn't that hard, but I don't find the > proposed construction awkward, and we have various uses of it in the > docs already. Actually the COPY one is not quite clear whether it > means <= 7.3 or < 7.3. I think we're just aiming for consistency here > as much as anything. Well, the problem is that "<= 8.4" is confusing as to whether it includes 8.4.n. You and I know that it does because we know we don't make feature changes in minor releases, but this is not necessarily obvious to everyone. "< 8.5" is much less ambiguous. regards, tom lane
On Thu, Dec 3, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I agree that search and replace isn't that hard, but I don't find the >> proposed construction awkward, and we have various uses of it in the >> docs already. Actually the COPY one is not quite clear whether it >> means <= 7.3 or < 7.3. I think we're just aiming for consistency here >> as much as anything. > > Well, the problem is that "<= 8.4" is confusing as to whether it > includes 8.4.n. You and I know that it does because we know we > don't make feature changes in minor releases, but this is not > necessarily obvious to everyone. "< 8.5" is much less ambiguous. Ah. I would not have considered that, but it does make sense. ...Robert
Robert Haas wrote:
> I agree that search and replace isn't that hard, but I don't find the
> proposed construction awkward, and we have various uses of it in the
> docs already. Actually the COPY one is not quite clear whether it
> means <= 7.3 or < 7.3.

Yeah, I wouldn't have suggested it if it made the wording particularly difficult in the process. I don't know what your issue with the COPY one is: "The following syntax was used before PostgreSQL version 7.3 and is still supported". I can't parse that as anything other than "<7.3"; not sure how someone can read that to be "<=".

In any case, the two examples you gave are certainly good for showing the standard practice used here. Specific version numbers are strewn all about, and if there are already mentions of 8.5 in there, one more won't hurt.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
Jaime Casanova wrote:
> 2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> The attached patch is a revised version of large object privileges
>> based on the feedbacks at the last commit fest.
>
> please update the patch, it's giving an error when 'make check' is
> trying to "create template1" in initdb:
>
> creating template1 database in
> /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
> ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
> File: "md.c", Line: 254)
> child process was terminated by signal 6: Aborted

I could not reproduce it. Could you run "make clean", then "make check"? Various kinds of patches are merged during the commit fest, and some of them change the definitions of structures. If *.o files were built against older definitions, they may reference incorrect addresses.

> Meanwhile, i will make some comments:
>
> This manual will be specific for 8.5 so i think all mentions to the
> version should be removed
> for example;
>
> + In this version, a large object has OID of its owner, access permissions
> + and OID of the largeobject itself.
>
> + Prior to the version 8.5.x release does not have any
> privilege checks on
> + large objects.

The conclusion is unclear to me. Is "In the 8.4.x and prior release, ..." an ambiguous expression?

> the parameter name (large_object_privilege_checks) is confusing enough
> that we have to make this statements to clarify... let's think in a
> better less confuse name
> + Please note that it is not equivalent to disable all the security
> + checks corresponding to large objects.
> + For example, the <literal>lo_import()</literal> and
> + <literal>lo_export</literal> need superuser privileges independent
> + from this setting as prior versions were doing.

In the last commit fest it was named "largeobject_compat_acl", but Tom Lane did not find that preferable, so he suggested renaming it to "large_object_privilege_checks". Other candidates:

- lo_compat_privileges (<- my preference of these four)
- large_object_compat_privs
- lo_compat_access_control
- large_object_compat_ac

I think "_compat_" should be included to emphasize that it is a compatibility option.

> this will not be off by default? it should be for compatibility
> reasons... i remember there was a discussion about that but can't
> remember the conclusion

IIRC, we had no discussion about its default value, although similar topics were discussed:

* What should be checked on creation of a large object?
  -> No need to check permission on its creation. Everyone is allowed to create a new large object, as in the current implementation. (Also note that this behavior may be changed in the future.)

* Should DELETE be checked on deletion of a large object?
  -> No. PostgreSQL checks ownership of a database object on its deletion, as with DROP TABLE. The DELETE permission is checked when we delete the contents of a database object, not the database object itself.

> Mmm... One of them? the first?
> + The one is <literal>SELECT</literal>.
>
> + Even if a transaction modified access rights and commit it, it is
> + not invisible from other transaction which already opened the large
> + object.
>
> The other one, the second
> + The other is <literal>UPDATE</literal>.

I have no argument about the English wording. BTW, "The one is ..., the other is ..." was a textbook construction for introducing two things.
:-)

> it seems there is an "are" that should not be there :)
> +
> + These functions are originally requires database superuser privilege,
> + and it allows to bypass the default database privilege checks, so
> + we don't need to check an obvious test twice.
>
> a typo, obviously
> + For largeo bjects, this privilege also allows to read from
> + the target large object.

Thanks, I see.

> We have two versions of these functions one that a recieve an SnapShot
> parameter and other that don't...
> what is the rationale of this? AFAIU, the one that doesn't receive
> SnapShot is calling the other one with SnapShotNow, can't we simply
> call it that way and drop the version of the functions that doesn't
> have that parameter?
> + pg_largeobject_aclmask(Oid lobj_oid, Oid roleid,
> +                        AclMode mask, AclMaskHow how)
>
> + pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode)

We have no reason other than cosmetics. In the current implementation, all callers of pg_largeobject_aclcheck_*() need to provide the correct snapshot, including SnapshotNow when read-writable. When pg_aclmask() calls pg_largeobject_aclmask(), that is the only case where the caller assumes SnapshotNow is applied implicitly. On the other hand, in every case where pg_largeobject_ownercheck() is called, the caller assumes SnapshotNow, so we don't need multiple versions there.

So, I'll reorganize these APIs as follows:

- pg_largeobject_aclmask_snapshot()
- pg_largeobject_aclcheck_snapshot()
- pg_largeobject_ownercheck()

Thanks, please wait for the revised version.

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
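An illustrative SQL sketch of the creation/deletion behavior described in the message above (the LOID and the outcomes are assumptions drawn from that description, not output produced by the patch):

  -- Creation is not privilege-checked: any user may create a new large object.
  SELECT lo_create(0);       -- returns a newly assigned LOID

  -- Deletion is an ownership check (as with DROP TABLE), not a DELETE-privilege check:
  SELECT lo_unlink(12345);   -- fails unless the caller owns large object 12345
                             -- (or is a superuser)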
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> > creating template1 database in
> > /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
> > ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
> > File: "md.c", Line: 254)
> > child process was terminated by signal 6: Aborted
>
> I could not reproduce it.

I had the same trap before when I mistakenly used duplicated OIDs. Did you add a new catalog with an existing OID? src/include/catalog/duplicate_oids might help.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Itagaki Takahiro wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>
>>> creating template1 database in
>>> /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
>>> ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
>>> File: "md.c", Line: 254)
>>> child process was terminated by signal 6: Aborted
>> I could not reproduce it.
>
> I had the same trap before when I mistakenly used duplicated oids.
> Don't you add a new catalog with existing oids?
> src/include/catalog/duplicate_oids might be a help.

Thanks, Bingo!

toasting.h:DECLARE_TOAST(pg_trigger, 2336, 2337);
pg_largeobject_metadata.h:CATALOG(pg_largeobject_metadata,2336)

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
The attached patch is an updated revision of Largeobject Access Controls. List of updates:

* Rebased to the latest CVS HEAD.
* SGML documentation fixes:
  - The future version number was replaced with "In the 8.4.x series and earlier release, ...".
  - Other awkward English phrasing and typos were fixed.
* Fixed OID conflicts in the system catalog definitions. The new TOAST relation for pg_trigger used the same OID number as pg_largeobject_metadata.
* Fixed an incorrect error code in pg_largeobject_ownercheck(). It raised _UNDEFINED_FUNCTION, but should be _UNDEFINED_OBJECT.
* Renamed the GUC parameter to "lo_compat_privileges" from "large_object_privilege_checks".
* pg_largeobject_aclmask() and pg_largeobject_aclcheck(), which did not take a snapshot argument, were removed. The callers now provide an appropriate snapshot themselves.

Thanks,

Jaime Casanova wrote:
> 2009/11/12 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> The attached patch is a revised version of large object privileges
>> based on the feedbacks at the last commit fest.
>
> please update the patch, it's giving an error when 'make check' is
> trying to "create template1" in initdb:
>
> creating template1 database in
> /home/postgres/pg_releases/pgsql/src/test/regress/./tmp_check/data/base/1
> ... TRAP: FailedAssertion("!(reln->md_fd[forkNum] == ((void *)0))",
> File: "md.c", Line: 254)
> child process was terminated by signal 6: Aborted
>
> Meanwhile, i will make some comments:
>
> This manual will be specific for 8.5 so i think all mentions to the
> version should be removed
> for example;
>
> + In this version, a large object has OID of its owner, access permissions
> + and OID of the largeobject itself.
>
> + Prior to the version 8.5.x release does not have any
> privilege checks on
> + large objects.
>
> the parameter name (large_object_privilege_checks) is confusing enough
> that we have to make this statements to clarify... let's think in a
> better less confuse name
> + Please note that it is not equivalent to disable all the security
> + checks corresponding to large objects.
> + For example, the <literal>lo_import()</literal> and
> + <literal>lo_export</literal> need superuser privileges independent
> + from this setting as prior versions were doing.
>
> this will not be off by default? it should be for compatibility
> reasons... i remember there was a discussion about that but can't
> remember the conclusion
>
> Mmm... One of them? the first?
> + The one is <literal>SELECT</literal>.
>
> + Even if a transaction modified access rights and commit it, it is
> + not invisible from other transaction which already opened the large
> + object.
>
> The other one, the second
> + The other is <literal>UPDATE</literal>.
>
> it seems there is an "are" that should not be there :)
> +
> + These functions are originally requires database superuser privilege,
> + and it allows to bypass the default database privilege checks, so
> + we don't need to check an obvious test twice.
>
> a typo, obviously
> + For largeo bjects, this privilege also allows to read from
> + the target large object.
>
> We have two versions of these functions one that a recieve an SnapShot
> parameter and other that don't...
> what is the rationale of this? AFAIU, the one that doesn't receive
> SnapShot is calling the other one with SnapShotNow, can't we simply
> call it that way and drop the version of the functions that doesn't
> have that parameter?
> + pg_largeobject_aclmask(Oid lobj_oid, Oid roleid, > + AclMode mask, AclMaskHow how) > > + pg_largeobject_aclcheck(Oid lobj_oid, Oid roleid, AclMode mode) > -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
I just looked over the latest version of this patch and it seems to satisfy all the issues raised by the initial review. This looks like it's ready for a committer from a quality perspective and I'm going to mark it as such.

I have a guess what some of the first points of discussion are going to be though, so might as well raise them here. This patch is 2.8K lines of code that touches a lot of places: a mix of full new functions, tweaks to existing ones, docs, and regression tests. It's a well structured but somewhat heavy bit of work. One obvious question is whether there's enough demand for access controls on large objects to justify adding the complexity involved to do so. A second thing I'm concerned about is what implications this change would have for in-place upgrades. If there's demand and it's not going to cause upgrade issues, then we just need to find a committer willing to chew on it. I think those are the main hurdles left for this patch.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
Greg Smith wrote:
> I just looked over the latest version of this patch and it seems to
> satisfy all the issues suggested by the initial review. This looks like
> it's ready for a committer from a quality perspective and I'm going to
> mark it as such.

Thanks for your efforts.

> I have a guess what some of the first points of discussion are going to
> be though, so might as well raise them here. This patch is 2.8K lines
> of code that's in a lot of places: a mix of full new functions, tweaks
> to existing ones, docs, regression tests, it's a well structured but
> somewhat heavy bit of work. One obvious questions is whether there's
> enough demand for access controls on large objects to justify adding the
> complexity involved to do so.

At least, it is a TODO item in the community:
http://wiki.postgresql.org/wiki/Todo#Binary_Data

Apart from SELinux, it is quite natural to apply access controls to binary data. Without any valid access controls, users will not want to store sensitive information, such as confidential PDF files, as large objects.

> A second thing I'm concerned about is
> what implications this change would have for in-place upgrades. If
> there's demand and it's not going to cause upgrade issues, then we just
> need to find a committer willing to chew on it. I think those are the
> main hurdles left for this patch.

I guess we need to create an empty entry with the given OID in pg_largeobject_metadata for each large object when we upgrade in place from an 8.4.x or earlier release to the upcoming release. However, there are no format changes in the pg_largeobject catalog itself (even for an empty large object), so I guess we only need a small amount of additional support in pg_dump to create the empty metadata entries. I would welcome any suggestions here.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sun, Dec 6, 2009 at 11:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> I just looked over the latest version of this patch and it seems to satisfy
> all the issues suggested by the initial review. This looks like it's ready
> for a committer from a quality perspective and I'm going to mark it as such.

Yes. I have just finished my tests and it seems like the patch is working just fine...

BTW, it seems KaiGai missed this comment in src/backend/catalog/pg_largeobject.c when renaming the parameter:
 * large_object_privilege_checks is not refered here,

I still don't like the name, but we have changed it a lot of times, so if anyone has a better idea now is the time to speak.

--
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
Jaime Casanova wrote:
> On Sun, Dec 6, 2009 at 11:19 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> I just looked over the latest version of this patch and it seems to satisfy
>> all the issues suggested by the initial review. This looks like it's ready
>> for a committer from a quality perspective and I'm going to mark it as such.
>
> yes. i have just finished my tests and seems like the patch is working
> just fine...
>
> BTW, seems like KaiGai miss this comment in
> src/backend/catalog/pg_largeobject.c when renaming the parameter
> * large_object_privilege_checks is not refered here,
>
> i still doesn't like the name but we have changed it a lot of times so
> if anyone has a better idea now is when you have to speak

Oops, it should be fixed to "lo_compat_privileges". This comment also has the version-number issue, so I fixed it as follows:

BEFORE:
  /*
   * large_object_privilege_checks is not refered here,
   * because it is a compatibility option, but we don't
   * have ALTER LARGE OBJECT prior to the v8.5.0.
   */
AFTER:
  /*
   * The 'lo_compat_privileges' is not checked here, because we
   * don't have any access control features in the 8.4.x series
   * or earlier release.
   * So, it is not a place we can define a compatible behavior.
   */

Nothing else is changed in the code, including anything corresponding to in-place upgrading. I'm waiting for suggestions.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > Apart from SELinux, it is quite natural to apply any access > controls on binary data. If we could not have any valid access > controls, users will not want to store their sensitive > information, such as confidential PDF files, as a large object. Absolutely. The historical security issues for large objects immediately eliminated them as a possible place to store PDF files. -Kevin
Hi, I'm reviewing the LO-AC patch.

KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> Nothing are changed in other codes, including something corresponding to
> in-place upgrading. I'm waiting for suggestion.

I have a question about the behavior -- the patch adds ownership management of large objects. Non-privileged users cannot read, write, or drop others' LOs. But they can read the contents of a large object when they read pg_catalog.pg_largeobject directly. Even if the patch is applied, we still allow "SELECT * FROM pg_largeobject" ...right?

This issue might be solved by the core SE-PgSQL patch, but what should we do for now?

Other changes in the patch seem to be reasonable.

"GRANT/REVOKE ON LARGE OBJECT <number>" might be hard to use if used alone, but we can use the commands as dynamic SQL in DO statements if we want to grant or revoke privileges in bulk.

"SELECT oid FROM pg_largeobject_metadata" is used in some places instead of "SELECT DISTINCT loid FROM pg_largeobject". They return the same result, but the former will be faster because we don't use DISTINCT. pg_dump will be slightly accelerated by the new query.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
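A hedged sketch of the kind of bulk grant mentioned above, using a DO block with dynamic SQL (the role name "some_role" is a placeholder, and the per-object GRANT syntax is assumed from the discussion in this thread):

  DO $$
  DECLARE
      lo oid;
  BEGIN
      -- grant read access on every existing large object to some_role
      FOR lo IN SELECT oid FROM pg_catalog.pg_largeobject_metadata LOOP
          EXECUTE 'GRANT SELECT ON LARGE OBJECT ' || lo || ' TO some_role';
      END LOOP;
  END;
  $$;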
Takahiro Itagaki wrote:
> Hi, I'm reviewing LO-AC patch.
>
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>> Nothing are changed in other codes, including something corresponding to
>> in-place upgrading. I'm waiting for suggestion.
>
> I have a question about the behavior -- the patch adds ownership
> management of large objects. Non-privileged users cannot read, write,
> or drop othere's LOs. But they can read the contents of large object
> when they read pg_catalog.pg_largeobject directly. Even if the patch
> is applied, we still allow "SELECT * FROM pg_largeobject" ...right?
>
> This issue might be solved by the core SE-PgSQL patch,
> but what should we do fo now?

Oops, I forgot to fix it. It is a misconfiguration at database initialization, not an issue related to the SE-PgSQL feature. It can be solved by revoking all privileges from everybody in the initdb phase, so we should inject the following statement into setup_privileges():

  REVOKE ALL ON pg_largeobject FROM PUBLIC;

In the default PG model, the database superuser is an exception to access controls, so he can bypass the checks anyway; there is no difference here even if he can see pg_largeobject. For unprivileged users, this configuration restricts access to large objects to the lo_*() functions, so we can capture all of their accesses and apply permission checks comprehensively. If the database superuser intends something malicious, such as exposing all large objects to the public, we cannot prevent it anyway.

> Other changes in the patch seem to be reasonable.
>
> "GRANT/REVOKE ON LARGE OBJECT <number>" might be hard to use if used alone,
> but we can use the commands as dynamic SQLs in DO statements if we want to
> grant or revoke privileges in bulk.

We already have the COMMENT ON LARGE OBJECT <number> IS <comment>; statement :-)

> "SELECT oid FROM pg_largeobject_metadata" is used in some places instead of
> "SELECT DISTINCT loid FROM pg_largeobject". They return the same result,
> but the former will be faster because we don't use DISTINCT. pg_dump will
> be slightly accelerated by the new query.
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

*** base/src/bin/initdb/initdb.c	2009-11-21 05:52:12.000000000 +0900
--- blob/src/bin/initdb/initdb.c	2009-12-13 06:33:55.000000000 +0900
*************** setup_privileges(void)
*** 1783,1788 ****
--- 1783,1789 ----
  		" WHERE relkind IN ('r', 'v', 'S') AND relacl IS NULL;\n",
  		"GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
  		"GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
+ 		"REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
  		NULL
  	};
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>> we still allow "SELECT * FROM pg_largeobject" ...right?
>
> It can be solved with revoking any privileges from anybody in the initdb
> phase. So, we should inject the following statement for setup_privileges().
>
> REVOKE ALL ON pg_largeobject FROM PUBLIC;

OK, I'll add the following description to the documentation of pg_largeobject.

  <structname>pg_largeobject</structname> should not be readable by the
  public, since the catalog contains data in large objects of all users.
  <structname>pg_largeobject_metadata</> is a publicly readable catalog
  that only contains identifiers of large objects.

> {"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
>     gettext_noop("Turn on/off privilege checks on large objects."),

The description is true, but is confusing because "lo_compat_privileges = on" means "privilege checks are turned off".

  short desc: Enables backward compatibility in privilege checks on large objects
  long desc: When turned on, privilege checks on large objects are disabled.

Are those descriptions appropriate?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>
>>> we still allow "SELECT * FROM pg_largeobject" ...right?
>> It can be solved with revoking any privileges from anybody in the initdb
>> phase. So, we should inject the following statement for setup_privileges().
>>
>> REVOKE ALL ON pg_largeobject FROM PUBLIC;
>
> OK, I'll add the following description in the documentation of pg_largeobject.
>
>   <structname>pg_largeobject</structname> should not be readable by the
>   public, since the catalog contains data in large objects of all users.
>   <structname>pg_largeobject_metadata</> is a publicly readable catalog
>   that only contains identifiers of large objects.

It makes sense to me.

>> {"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
>>     gettext_noop("Turn on/off privilege checks on large objects."),
>
> The description is true, but gives a confusion because
> "lo_compat_privileges = on" means "privilege checks are turned off".
>
> short desc: Enables backward compatibility in privilege checks on large objects
> long desc: When turned on, privilege checks on large objects are disabled.
>
> Are those descriptions appropriate?

The long description is a bit confusing, because the setting does not disable all of the privilege checks: lo_export/lo_import, for example, already check for superuser privilege on execution in the earlier releases.

What's your opinion about:

  long desc: When turned on, privilege checks on large objects perform with
             backward compatibility as 8.4.x or earlier releases.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
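For reference, a minimal usage sketch of the compatibility switch being discussed (the GUC name and PGC_SUSET classification come from the patch excerpt above; the session shown is assumed, not output from the patch):

  -- requires superuser, since the GUC is PGC_SUSET
  SET lo_compat_privileges = on;   -- behave like 8.4.x and earlier for this session
  SET lo_compat_privileges = off;  -- enforce the new large object privilege checks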
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> What's your opinion about:
>   long desc: When turned on, privilege checks on large objects perform with
>              backward compatibility as 8.4.x or earlier releases.

I updated the description as you suggested. Applied with minor editorialization, mainly around tab-completion support in psql.

# This is my first commit :)

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes: > OK, I'll add the following description in the documentation of pg_largeobject. > <structname>pg_largeobject</structname> should not be readable by the > public, since the catalog contains data in large objects of all users. This is going to be a problem, because it will break applications that expect to be able to read pg_largeobject. Like, say, pg_dump. regards, tom lane
Tom Lane wrote:
> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>> OK, I'll add the following description in the documentation of pg_largeobject.
>
>> <structname>pg_largeobject</structname> should not be readable by the
>> public, since the catalog contains data in large objects of all users.
>
> This is going to be a problem, because it will break applications that
> expect to be able to read pg_largeobject. Like, say, pg_dump.

Is it the right behavior, even though we have permission checks on large objects?

If so, we could inject a hardwired rule that prevents selecting from pg_largeobject when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
2009/12/10 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>
> If so, we can inject a hardwired rule to prevent to select pg_largeobject
> when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.
>

It doesn't seem like a good idea to make that GUC act like a GRANT or REVOKE in the case of pg_largeobject. Besides, if a normal user can read from pg_class, why would we deny pg_largeobject?

--
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> Tom Lane wrote:
>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>> <structname>pg_largeobject</structname> should not be readable by the
>>> public, since the catalog contains data in large objects of all users.
>>
>> This is going to be a problem, because it will break applications that
>> expect to be able to read pg_largeobject. Like, say, pg_dump.
>
> Is it a right behavior, even if we have permission checks on large objects?

Can we use column-level access control here?

  REVOKE ALL ON pg_largeobject FROM PUBLIC;
  GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could query pg_largeobject_metadata instead if we only wanted to fix pg_dump, but it's no wonder that other user applications use such queries as well. I think allowing loid to be read is a balanced solution.

> If so, we can inject a hardwired rule to prevent to select pg_largeobject
> when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.

Is it enough to run "GRANT SELECT ON pg_largeobject TO PUBLIC" ?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>
>> Tom Lane wrote:
>>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>>> <structname>pg_largeobject</structname> should not be readable by the
>>>> public, since the catalog contains data in large objects of all users.
>>> This is going to be a problem, because it will break applications that
>>> expect to be able to read pg_largeobject. Like, say, pg_dump.
>> Is it a right behavior, even if we have permission checks on large objects?
>
> Can we use column-level access control here?
>
> REVOKE ALL ON pg_largeobject FROM PUBLIC;
> => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;

Indeed, that seems reasonable to me.

> We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
> replace pg_largeobject_metadata instead if we try to fix only pg_dump,
> but it's no wonder that any other user applications use such queries.
> I think to allow reading loid is a balanced solution.

Right, and I realize this query has to be fixed for another reason as well. If all the large objects are empty, this query can return nothing, even though large object entries exist in pg_largeobject_metadata.

Please wait for a while.

>> If so, we can inject a hardwired rule to prevent to select pg_largeobject
>> when lo_compat_privileges is turned off, instead of REVOKE ALL FROM PUBLIC.
>
> Is it enough to run "GRANT SELECT ON pg_largeobject TO PUBLIC" ?

Agreed.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> besides if a normal user can read from pg_class why we deny pg_largeobject

pg_class and pg_largeobject_metadata contain only the metadata of objects. Tables and pg_largeobject contain the actual data of the objects. A normal user can read pg_class, but cannot read the contents of tables without privilege.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
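A hedged illustration of the metadata/data distinction drawn above (the table name is a placeholder, and the denial for pg_largeobject assumes the REVOKE/GRANT setup proposed earlier in the thread):

  -- Catalog metadata is world-readable; table contents are not:
  SELECT relname, relacl FROM pg_catalog.pg_class
   WHERE relname = 'secret_table';                         -- allowed
  SELECT * FROM secret_table;                              -- needs SELECT privilege on the table

  -- Analogously, with the proposed setup:
  SELECT oid, lomowner, lomacl
    FROM pg_catalog.pg_largeobject_metadata;               -- allowed
  SELECT loid, pageno, data
    FROM pg_catalog.pg_largeobject;                        -- denied for normal users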
KaiGai Kohei wrote:
> Takahiro Itagaki wrote:
>> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>>
>>> Tom Lane wrote:
>>>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>>>> <structname>pg_largeobject</structname> should not be readable by the
>>>>> public, since the catalog contains data in large objects of all users.
>>>> This is going to be a problem, because it will break applications that
>>>> expect to be able to read pg_largeobject. Like, say, pg_dump.
>>> Is it a right behavior, even if we have permission checks on large objects?
>> Can we use column-level access control here?
>>
>> REVOKE ALL ON pg_largeobject FROM PUBLIC;
>> => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;
>
> Indeed, it seems to me reasonable.
>
>> We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
>> replace pg_largeobject_metadata instead if we try to fix only pg_dump,
>> but it's no wonder that any other user applications use such queries.
>> I think to allow reading loid is a balanced solution.
>
> Right, I also remind this query has to be fixed up by other reason right now.
> If all the large objects are empty, this query can return nothing, even if
> large object entries are in pg_largeobject_metadata.
>
> Please wait for a while.

The attached patch fixes these matters.

* It adds "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;" during the initdb phase to resolve the matter pointed out above.
* A few queries in pg_dump were fixed to select from pg_largeobject_metadata instead of pg_largeobject. If a dumpable large object is empty (meaning no page frames exist in pg_largeobject), pg_dump would wrongly conclude that no such large object exists. We have to reference pg_largeobject_metadata to check whether a certain large object exists.

Thanks,

$ diffstat ~/pgsql-blob-priv-fix.patch
 doc/src/sgml/catalogs.sgml               |    3 !!!
 src/bin/initdb/initdb.c                  |    1 +
 src/bin/pg_dump/pg_dump.c                |    8 !!!!!!!!
 src/test/regress/expected/privileges.out |   15 +++++++++++++++
 src/test/regress/sql/privileges.sql      |    8 ++++++++
 5 files changed, 24 insertions(+), 11 modifications(!)

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

*** base/doc/src/sgml/catalogs.sgml	(revision 2467)
--- base/doc/src/sgml/catalogs.sgml	(working copy)
***************
*** 3136,3142 ****
   <para>
    <structname>pg_largeobject</structname> should not be readable by the
!   public, since the catalog contains data in large objects of all users.
    <structname>pg_largeobject_metadata</> is a publicly readable catalog
    that only contains identifiers of large objects.
   </para>
--- 3136,3143 ----
   <para>
    <structname>pg_largeobject</structname> should not be readable by the
!   public (except for <structfield>loid</structfield>), since the catalog
!   contains data in large objects of all users.
    <structname>pg_largeobject_metadata</> is a publicly readable catalog
    that only contains identifiers of large objects.
   </para>

*** base/src/test/regress/sql/privileges.sql	(revision 2467)
--- base/src/test/regress/sql/privileges.sql	(working copy)
***************
*** 565,570 ****
--- 565,578 ----
  SELECT lo_unlink(1002);
  SELECT lo_export(1001, '/dev/null');		-- to be denied
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+ 
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;		-- to be denied
+ SELECT loid FROM pg_largeobject LIMIT 0;
+ 
  -- test default ACLs
  \c -

*** base/src/test/regress/expected/privileges.out	(revision 2467)
--- base/src/test/regress/expected/privileges.out	(working copy)
***************
*** 1041,1046 ****
--- 1041,1061 ----
  SELECT lo_export(1001, '/dev/null');		-- to be denied
  ERROR:  must be superuser to use server-side lo_export()
  HINT:  Anyone can use the client-side lo_export() provided by libpq.
+ -- don't allow unpriv users to access pg_largeobject contents
+ \c -
+ SELECT * FROM pg_largeobject LIMIT 0;
+  loid | pageno | data 
+ ------+--------+------
+ (0 rows)
+ 
+ SET SESSION AUTHORIZATION regressuser1;
+ SELECT * FROM pg_largeobject LIMIT 0;		-- to be denied
+ ERROR:  permission denied for relation pg_largeobject
+ SELECT loid FROM pg_largeobject LIMIT 0;
+  loid 
+ ------
+ (0 rows)
+ 
  -- test default ACLs
  \c -
  CREATE SCHEMA testns;

*** base/src/bin/initdb/initdb.c	(revision 2467)
--- base/src/bin/initdb/initdb.c	(working copy)
***************
*** 1784,1789 ****
--- 1784,1790 ----
  		"GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
  		"GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
  		"REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
+ 		"GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;\n",
  		NULL
  	};

*** base/src/bin/pg_dump/pg_dump.c	(revision 2467)
--- base/src/bin/pg_dump/pg_dump.c	(working copy)
***************
*** 1945,1951 ****
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
! 	if (AH->remoteVersion >= 70100)
  		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
  		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
--- 1945,1953 ----
  	selectSourceSchema("pg_catalog");
  
  	/* Check for BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "SELECT oid FROM pg_largeobject_metadata LIMIT 1";
! 	else if (AH->remoteVersion >= 70100)
  		blobQry = "SELECT loid FROM pg_largeobject LIMIT 1";
  	else
  		blobQry = "SELECT oid FROM pg_class WHERE relkind = 'l' LIMIT 1";
***************
*** 1981,1987 ****
  	selectSourceSchema("pg_catalog");
  
  	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 70100)
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
--- 1983,1991 ----
  	selectSourceSchema("pg_catalog");
  
  	/* Cursor to get all BLOB OIDs */
! 	if (AH->remoteVersion >= 80500)
! 		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_largeobject_metadata";
! 	else if (AH->remoteVersion >= 70100)
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT DISTINCT loid FROM pg_largeobject";
  	else
  		blobQry = "DECLARE bloboid CURSOR FOR SELECT oid FROM pg_class WHERE relkind = 'l'";
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > The attached patch fixes these matters. I'll start to check it. > We have to reference pg_largeobject_metadata to check whether a certain > large objct exists, or not. What is the situation where there is a row in pg_largeobject_metadata and no corresponding rows in pg_largeobject? Do we have a method to delete all rows in pg_largeobject but leave some metadata? Regards, --- Takahiro Itagaki NTT Open Source Software Center
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>
>> The attached patch fixes these matters.
>
> I'll start to check it.

Thanks,

>> We have to reference pg_largeobject_metadata to check whether a certain
>> large objct exists, or not.
>
> What is the situation where there is a row in pg_largeobject_metadata
> and no corresponding rows in pg_largeobject? Do we have a method to
> delete all rows in pg_largeobject but leave some metadata?

It is the case where we create a new large object but write nothing:

  postgres=# SELECT lo_create(1234);
   lo_create
  -----------
        1234
  (1 row)

  postgres=# SELECT * FROM pg_largeobject_metadata WHERE oid = 1234;
   lomowner | lomacl
  ----------+--------
         10 |
  (1 row)

  postgres=# SELECT * FROM pg_largeobject WHERE loid = 1234;
   loid | pageno | data
  ------+--------+------
  (0 rows)

In this case, the following two queries are not equivalent:

  * SELECT oid FROM pg_largeobject_metadata
  * SELECT DISTINCT loid FROM pg_largeobject

The second query does not return the loid of empty large objects. The prior implementation inserted a zero-length page to show that a large object with this loid exists, but that became unnecessary with this enhancement.

If we need compatibility at this level, we can insert a zero-length page into pg_largeobject in LargeObjectCreate(). It is harmless, but its value is uncertain.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> >> We have to reference pg_largeobject_metadata to check whether a certain
> >> large objct exists, or not.
> It is a case when we create a new large object, but write nothing.

OK, that makes sense.

In addition to the patch, we also need to fix pg_restore with the --clean option. I added DropBlobIfExists() in pg_backup_db.c.

A revised patch is attached. Please check it for further mistakes.

BTW, we can optimize lo_truncate because we now allow metadata-only large objects. inv_truncate() doesn't have to update the first data tuple to be zero length; it only has to delete all corresponding tuples, like:

  DELETE FROM pg_largeobject WHERE loid = {obj_desc->id}

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Attachment
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: > In addition of the patch, we also need to fix pg_restore with > --clean option. I added DropBlobIfExists() in pg_backup_db.c. > > A revised patch attached. Please check further mistakes. ...and here is an additional fix for contrib modules. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Attachment
KaiGai Kohei wrote: > Takahiro Itagaki wrote: > > KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > > > >> Tom Lane wrote: > >>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes: > >>>> <structname>pg_largeobject</structname> should not be readable by the > >>>> public, since the catalog contains data in large objects of all users. > >>> This is going to be a problem, because it will break applications that > >>> expect to be able to read pg_largeobject. Like, say, pg_dump. > >> Is it a right behavior, even if we have permission checks on large objects? > > > > Can we use column-level access control here? > > > > REVOKE ALL ON pg_largeobject FROM PUBLIC; > > => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC; > > Indeed, it seems to me reasonable. > > > We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could > > replace pg_largeobject_metadata instead if we try to fix only pg_dump, > > but it's no wonder that any other user applications use such queries. > > I think to allow reading loid is a balanced solution. > > Right, I also remind this query has to be fixed up by other reason right now. > If all the large objects are empty, this query can return nothing, even if > large object entries are in pg_largeobject_metadata. "metadata" seems very vague. Can't we come up with a more descriptive name? Also, how will this affect pg_migrator? pg_migrator copies pg_largeobject and its index from the old to the new server. Is the format inside pg_largeobject changed by this patch? What happens when there is no entry in pg_largeobject_metadata for a specific row? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> KaiGai Kohei wrote:
>> Takahiro Itagaki wrote:
>>> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>>>
>>>> Tom Lane wrote:
>>>>> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
>>>>>> <structname>pg_largeobject</structname> should not be readable by the
>>>>>> public, since the catalog contains data in large objects of all users.
>>>>> This is going to be a problem, because it will break applications that
>>>>> expect to be able to read pg_largeobject. Like, say, pg_dump.
>>>> Is it a right behavior, even if we have permission checks on large objects?
>>> Can we use column-level access control here?
>>>
>>> REVOKE ALL ON pg_largeobject FROM PUBLIC;
>>> => GRANT SELECT (loid) ON pg_largeobject TO PUBLIC;
>> Indeed, it seems to me reasonable.
>>
>>> We use "SELECT loid FROM pg_largeobject LIMIT 1" in pg_dump. We could
>>> replace pg_largeobject_metadata instead if we try to fix only pg_dump,
>>> but it's no wonder that any other user applications use such queries.
>>> I think to allow reading loid is a balanced solution.
>> Right, I also remind this query has to be fixed up by other reason right now.
>> If all the large objects are empty, this query can return nothing, even if
>> large object entries are in pg_largeobject_metadata.
>
> "metadata" seems very vague. Can't we come up with a more descriptive
> name?

What about "property"? "metadata" was the name suggested by Robert Haas at the last commit fest, because we may store other properties of a large object in this catalog in the future.

> Also, how will this affect pg_migrator? pg_migrator copies
> pg_largeobject and its index from the old to the new server. Is the
> format inside pg_largeobject changed by this patch?

The format of pg_largeobject was not touched.

> What happens when
> there is no entry in pg_largeobject_metadata for a specific row?

In this case, these rows become orphans. So, I think we need to create an empty large object with the same LOID in pg_migrator. That makes an entry in pg_largeobject_metadata without writing anything to pg_largeobject. I guess the rest of the migration is no different. Correct?

Thanks,
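A hedged sketch of the step described above: create an empty large object for every LOID that has data pages but no metadata row yet. This assumes lo_create() with an explicit OID only inserts the metadata row, as described in the message; pg_migrator's actual implementation may differ (see the patch below):

  SELECT lo_create(loid)
    FROM (SELECT DISTINCT loid FROM pg_catalog.pg_largeobject) AS lo
   WHERE loid NOT IN (SELECT oid FROM pg_catalog.pg_largeobject_metadata);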
Takahiro Itagaki wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
>
>>>> We have to reference pg_largeobject_metadata to check whether a certain
>>>> large objct exists, or not.
>> It is a case when we create a new large object, but write nothing.
>
> OK, that makes sense.
>
> In addition of the patch, we also need to fix pg_restore with
> --clean option. I added DropBlobIfExists() in pg_backup_db.c.
>
> A revised patch attached. Please check further mistakes.

+ void
+ DropBlobIfExists(ArchiveHandle *AH, Oid oid)
+ {
+ 	const char *lo_relname;
+ 	const char *lo_colname;
+ 
+ 	if (PQserverVersion(AH->connection) >= 80500)
+ 	{
+ 		lo_relname = "pg_largeobject_metadata";
+ 		lo_colname = "oid";
+ 	}
+ 	else
+ 	{
+ 		lo_relname = "pg_largeobject";
+ 		lo_colname = "loid";
+ 	}
+ 
+ 	/* Call lo_unlink only if exists to avoid not-found error. */
+ 	ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.%s WHERE %s = '%u') THEN pg_catalog.lo_unlink('%u') END;\n",
+ 			 lo_relname, lo_colname, oid, oid);
+ }

I think the following approach is more reasonable for the current design:

  if (PQserverVersion(AH->connection) >= 80500)
  {
      /* newer query */
      ahprintf(AH, "SELECT pg_catalog.lo_unlink(oid) "
                   "FROM pg_catalog.pg_largeobject_metadata "
                   "WHERE oid = %u;\n", oid);
  }
  else
  {
      /* original query */
      ahprintf(AH, "SELECT CASE WHEN EXISTS(SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = '%u') "
                   "THEN pg_catalog.lo_unlink('%u') END;\n", oid, oid);
  }

We no longer have any reason to use CASE ... WHEN and a subquery for the given LOID in the newer branch. Right?

The fix-lo-contrib.patch looks good to me.

> BTW, we can optimize lo_truncate because we allow metadata-only large
> objects. inv_truncate() doesn't have to update the first data tuple to
> be zero length. It only has to delete all corresponding tuples like as:
> DELETE FROM pg_largeobject WHERE loid = {obj_desc->id}

Right, when inv_truncate() is given a length aligned to LOBLKSIZE. I'll also submit a small patch for CF-Jan, OK?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai Kohei wrote: > > What happens when > > there is no entry in pg_largeobject_metadata for a specific row? > > In this case, these rows become orphan. > So, I think we need to create an empty large object with same LOID on > pg_migrator. It makes an entry on pg_largeobject_metadata without > writing anything to the pg_largeobject. > I guess rest of migrations are not difference. Correct? Agreed. I have modified pg_migrator with the attached patch which creates a script that adds default permissions for all large object tables. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ? tools ? log ? src/pg_migrator Index: src/info.c =================================================================== RCS file: /cvsroot/pg-migrator/pg_migrator/src/info.c,v retrieving revision 1.25 diff -c -r1.25 info.c *** src/info.c 10 Dec 2009 23:14:25 -0000 1.25 --- src/info.c 13 Dec 2009 01:17:37 -0000 *************** *** 480,486 **** "SELECT DISTINCT probin " "FROM pg_catalog.pg_proc " "WHERE prolang = 13 /* C */ AND " ! " probin IS NOT NULL"); totaltups += PQntuples(ress[dbnum]); PQfinish(conn); --- 480,488 ---- "SELECT DISTINCT probin " "FROM pg_catalog.pg_proc " "WHERE prolang = 13 /* C */ AND " ! " probin IS NOT NULL AND " ! " oid >= " ! STRINGIFY(FirstNormalObjectId) ";"); totaltups += PQntuples(ress[dbnum]); PQfinish(conn); Index: src/pg_migrator.c =================================================================== RCS file: /cvsroot/pg-migrator/pg_migrator/src/pg_migrator.c,v retrieving revision 1.69 diff -c -r1.69 pg_migrator.c *** src/pg_migrator.c 10 Dec 2009 14:34:19 -0000 1.69 --- src/pg_migrator.c 13 Dec 2009 01:17:37 -0000 *************** *** 92,97 **** --- 92,100 ---- sequence_script_file_name = v8_3_create_sequence_script(&ctx, CLUSTER_OLD); } + if (GET_MAJOR_VERSION(ctx.old.pg_version) <= 804 && + GET_MAJOR_VERSION(ctx.new.pg_version) >= 805) + v8_4_populate_pg_largeobject_metadata(&ctx, true, CLUSTER_OLD); /* Looks okay so far. Prepare the pg_dump output */ generate_old_dump(&ctx); *************** *** 294,299 **** --- 297,309 ---- v8_3_invalidate_bpchar_pattern_ops_indexes(&ctx, false, CLUSTER_NEW); stop_postmaster(&ctx, false, true); } + if (GET_MAJOR_VERSION(ctx.old.pg_version) <= 804 && + GET_MAJOR_VERSION(ctx.new.pg_version) >= 805) + { + start_postmaster(&ctx, CLUSTER_NEW, true); + v8_4_populate_pg_largeobject_metadata(&ctx, false, CLUSTER_NEW); + stop_postmaster(&ctx, false, true); + } pg_log(&ctx, PG_REPORT, "\n*Upgrade complete*\n"); *************** *** 416,422 **** char new_clog_path[MAXPGPATH]; /* copy old commit logs to new data dir */ ! prep_status(ctx, "Deleting old commit clogs"); snprintf(old_clog_path, sizeof(old_clog_path), "%s/pg_clog", ctx->old.pgdata); snprintf(new_clog_path, sizeof(new_clog_path), "%s/pg_clog", ctx->new.pgdata); --- 426,432 ---- char new_clog_path[MAXPGPATH]; /* copy old commit logs to new data dir */ ! prep_status(ctx, "Deleting new commit clogs"); snprintf(old_clog_path, sizeof(old_clog_path), "%s/pg_clog", ctx->old.pgdata); snprintf(new_clog_path, sizeof(new_clog_path), "%s/pg_clog", ctx->new.pgdata); *************** *** 424,430 **** pg_log(ctx, PG_FATAL, "Unable to delete directory %s\n", new_clog_path); check_ok(ctx); ! 
prep_status(ctx, "Copying commit clogs"); /* libpgport's copydir() doesn't work in FRONTEND code */ #ifndef WIN32 exec_prog(ctx, true, SYSTEMQUOTE "%s \"%s\" \"%s\"" SYSTEMQUOTE, --- 434,440 ---- pg_log(ctx, PG_FATAL, "Unable to delete directory %s\n", new_clog_path); check_ok(ctx); ! prep_status(ctx, "Copying old commit clogs to new server"); /* libpgport's copydir() doesn't work in FRONTEND code */ #ifndef WIN32 exec_prog(ctx, true, SYSTEMQUOTE "%s \"%s\" \"%s\"" SYSTEMQUOTE, Index: src/pg_migrator.h =================================================================== RCS file: /cvsroot/pg-migrator/pg_migrator/src/pg_migrator.h,v retrieving revision 1.75 diff -c -r1.75 pg_migrator.h *** src/pg_migrator.h 12 Dec 2009 16:56:23 -0000 1.75 --- src/pg_migrator.h 13 Dec 2009 01:17:37 -0000 *************** *** 395,400 **** --- 395,402 ---- bool check_mode, Cluster whichCluster); void v8_3_invalidate_bpchar_pattern_ops_indexes(migratorContext *ctx, bool check_mode, Cluster whichCluster); + void v8_4_populate_pg_largeobject_metadata(migratorContext *ctx, + bool check_mode, Cluster whichCluster); char *v8_3_create_sequence_script(migratorContext *ctx, Cluster whichCluster); void check_for_composite_types(migratorContext *ctx, Index: src/version.c =================================================================== RCS file: /cvsroot/pg-migrator/pg_migrator/src/version.c,v retrieving revision 1.32 diff -c -r1.32 version.c *** src/version.c 7 Aug 2009 20:16:12 -0000 1.32 --- src/version.c 13 Dec 2009 01:17:37 -0000 *************** *** 421,427 **** "| between your old and new clusters so the tables\n" "| must be rebuilt. The file:\n" "| \t%s\n" ! "| when executed by psql by the database super-user,\n" "| will rebuild all tables with tsvector columns.\n\n", output_path); } --- 421,427 ---- "| between your old and new clusters so the tables\n" "| must be rebuilt. The file:\n" "| \t%s\n" ! "| when executed by psql by the database super-user\n" "| will rebuild all tables with tsvector columns.\n\n", output_path); } *************** *** 535,541 **** "| they must be reindexed with the REINDEX command.\n" "| The file:\n" "| \t%s\n" ! "| when executed by psql by the database super-user,\n" "| will recreate all invalid indexes; until then,\n" "| none of these indexes will be used.\n\n", output_path); --- 535,541 ---- "| they must be reindexed with the REINDEX command.\n" "| The file:\n" "| \t%s\n" ! "| when executed by psql by the database super-user\n" "| will recreate all invalid indexes; until then,\n" "| none of these indexes will be used.\n\n", output_path); *************** *** 664,670 **** "| new clusters so they must be reindexed with the\n" "| REINDEX command. The file:\n" "| \t%s\n" ! "| when executed by psql by the database super-user,\n" "| will recreate all invalid indexes; until then,\n" "| none of these indexes will be used.\n\n", output_path); --- 664,670 ---- "| new clusters so they must be reindexed with the\n" "| REINDEX command. The file:\n" "| \t%s\n" ! "| when executed by psql by the database super-user\n" "| will recreate all invalid indexes; until then,\n" "| none of these indexes will be used.\n\n", output_path); *************** *** 675,680 **** --- 675,762 ---- /* + * v8_4_populate_pg_largeobject_metadata() + * + * 8.5 has a new pg_largeobject permission table + */ + void + v8_4_populate_pg_largeobject_metadata(migratorContext *ctx, bool check_mode, + Cluster whichCluster) + { + ClusterInfo *active_cluster = (whichCluster == CLUSTER_OLD) ? 
+ &ctx->old : &ctx->new; + int dbnum; + FILE *script = NULL; + bool found = false; + char output_path[MAXPGPATH]; + + prep_status(ctx, "Checking for large objects"); + + snprintf(output_path, sizeof(output_path), "%s/pg_largeobject.sql", + ctx->home_dir); + + for (dbnum = 0; dbnum < active_cluster->dbarr.ndbs; dbnum++) + { + PGresult *res; + int i_count; + DbInfo *active_db = &active_cluster->dbarr.dbs[dbnum]; + PGconn *conn = connectToServer(ctx, active_db->db_name, whichCluster); + + /* find if there are any large objects */ + res = executeQueryOrDie(ctx, conn, + "SELECT count(*) " + "FROM pg_catalog.pg_largeobject "); + + i_count = PQfnumber(res, "count"); + if (atoi(PQgetvalue(res, 0, i_count)) != 0) + { + found = true; + if (!check_mode) + { + if (script == NULL && (script = fopen(output_path, "w")) == NULL) + pg_log(ctx, PG_FATAL, "Could not create necessary file: %s\n", output_path); + fprintf(script, "\\connect %s\n", + quote_identifier(ctx, active_db->db_name)); + fprintf(script, + "INSERT INTO pg_catalog.pg_largeobject_metadata (lomowner)\n" + "SELECT DISTINCT loid\n" + "FROM pg_catalog.pg_largeobject;\n"); + } + } + + PQclear(res); + PQfinish(conn); + } + + if (found) + { + if (!check_mode) + fclose(script); + report_status(ctx, PG_WARNING, "warning"); + if (check_mode) + pg_log(ctx, PG_WARNING, "\n" + "| Your installation contains large objects.\n" + "| The new database has an additional large object\n" + "| permission table. After migration, you will be\n" + "| given a command to populate the pg_largeobject\n" + "| permission table with default permissions.\n\n"); + else + pg_log(ctx, PG_WARNING, "\n" + "| Your installation contains large objects.\n" + "| The new database has an additional large object\n" + "| permission table so default permissions must be\n" + "| defined for all large objects. The file:\n" + "| \t%s\n" + "| when executed by psql by the database super-user\n" + "| will define the default permissions.\n\n", + output_path); + } + else + check_ok(ctx); + } + + + /* * v8_3_create_sequence_script() * * 8.4 added the column "start_value" to all sequences. For this reason,
Bruce Momjian wrote:
> KaiGai Kohei wrote:
> > > What happens when
> > > there is no entry in pg_largeobject_metadata for a specific row?
> >
> > In this case, these rows become orphans.
> > So, I think we need to create an empty large object with the same LOID in
> > pg_migrator. It makes an entry in pg_largeobject_metadata without
> > writing anything to pg_largeobject.
> > I guess the rest of the migration is no different. Correct?
>
> Agreed. I have modified pg_migrator with the attached patch which
> creates a script that adds default permissions for all large object
> tables.

Oops, seems like I have a problem in getting pg_migrator to populate
pg_largeobject_metadata:

test=> select lo_import('/etc/profile');
 lo_import
-----------
     16385
(1 row)

test=> select lo_import('/etc/profile.env');
 lo_import
-----------
     16386
(1 row)

test=> select oid,* from pg_largeobject_metadata;
  oid  | lomowner | lomacl
-------+----------+--------
 16385 |       10 |
 16386 |       10 |
(2 rows)

You might remember that INSERT cannot set the oid of a row, so I don't
see how pg_migrator can migrate this? Is there a reason we used 'oid'
in pg_largeobject_metadata but 'loid' in pg_largeobject? Why didn't we
add the lomowner and lomacl columns to pg_largeobject?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
(2009/12/13 10:39), Bruce Momjian wrote:
> Bruce Momjian wrote:
>> KaiGai Kohei wrote:
>>>> What happens when
>>>> there is no entry in pg_largeobject_metadata for a specific row?
>>>
>>> In this case, these rows become orphans.
>>> So, I think we need to create an empty large object with the same LOID in
>>> pg_migrator. It makes an entry in pg_largeobject_metadata without
>>> writing anything to pg_largeobject.
>>> I guess the rest of the migration is no different. Correct?
>>
>> Agreed. I have modified pg_migrator with the attached patch which
>> creates a script that adds default permissions for all large object
>> tables.
>
> Oops, seems like I have a problem in getting pg_migrator to populate
> pg_largeobject_metadata:
>
> test=> select lo_import('/etc/profile');
>  lo_import
> -----------
>      16385
> (1 row)
>
> test=> select lo_import('/etc/profile.env');
>  lo_import
> -----------
>      16386
> (1 row)
>
> test=> select oid,* from pg_largeobject_metadata;
>   oid  | lomowner | lomacl
> -------+----------+--------
>  16385 |       10 |
>  16386 |       10 |
> (2 rows)

lo_import() has another prototype which takes a second argument to
specify the LOID. Isn't it usable to restore a large object with the
correct LOID? For example, lo_import('/etc/profile', 1234)

Or, if you intend to restore only the metadata, ALTER LARGE OBJECT and
GRANT ... ON LARGE OBJECT make it possible to set up the metadata of a
given large object.

Or, am I missing the problem?

> You might remember that INSERT cannot set the oid of a row, so I don't
> see how pg_migrator can migrate this? Is there a reason we used 'oid'
> in pg_largeobject_metadata but 'loid' in pg_largeobject? Why didn't we
> add the lomowner and lomacl columns to pg_largeobject?

A large object consists of multiple tuples within pg_largeobject.
If we added lomowner and lomacl to pg_largeobject, we would have to
update all the pages of a large object to keep its state consistent.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>
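As a rough illustration of the two alternatives mentioned above (the OID,
file path, and role names are made up for the example; this is only a
sketch, not part of the patch):

    -- alternative 1: import the data again, with an explicit OID
    SELECT lo_import('/etc/profile', 16385);

    -- alternative 2: keep the already-migrated data and only fix up the metadata
    ALTER LARGE OBJECT 16385 OWNER TO migrated_owner;
    GRANT SELECT, UPDATE ON LARGE OBJECT 16385 TO app_user;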
KaiGai Kohei wrote: > lo_import() has an another prototype which takes second argument to > specify LOID. Isn't it available to restore a large object with > correct LOID? For example, lo_import('/etc/profile', 1234) I can't use that because the migration has already brought over the pg_largeobject file which has the data. > Or, if you intend to restore metadata in the second lo_import(), > ALTER LAEGE OBJECT and GRANT LARGE OBJECT enable to set up metadata > of a certain large object. Yes, that will work cleanly. The file might be large because I need a GRANT for every large object, but I suppose that is OK. > > You might remember that INSERT cannot set the oid of a row, so I don't > > see how pg_migrator can migrate this? Is there a reason we used 'oid' > > in pg_largeobject_metadata but 'lobj' in pg_largeobject? Why did't we > > add the lomowner and lomacl columns to pg_largeobject? > > A large object consists of multiple tuples within pg_largeobject. > If we added lomowner and lomacl into pg_largeobject, we have to manage > all the pages in a large object to keep consistent state. Ah, good point. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> KaiGai Kohei wrote:
> > lo_import() has another prototype which takes a second argument to
> > specify the LOID. Isn't it usable to restore a large object with the
> > correct LOID? For example, lo_import('/etc/profile', 1234)
>
> I can't use that because the migration has already brought over the
> pg_largeobject file which has the data.
>
> > Or, if you intend to restore only the metadata, ALTER LARGE OBJECT and
> > GRANT ... ON LARGE OBJECT make it possible to set up the metadata of a
> > given large object.
>
> Yes, that will work cleanly. The file might be large because I need a
> GRANT for every large object, but I suppose that is OK.

Uh, I tested pg_migrator and found a problem with this approach:

test=> select loid from pg_largeobject;
 loid
-------
 16385
 16385
 16386
(3 rows)

test=> grant all ON LARGE OBJECT 16385 to public;
ERROR:  large object 16385 does not exist

I am wondering if the missing pg_largeobject_metadata row is causing
this, and again I have no way of creating one with the specified oid.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
(2009/12/13 11:31), Bruce Momjian wrote:
> Bruce Momjian wrote:
>> KaiGai Kohei wrote:
>>> lo_import() has another prototype which takes a second argument to
>>> specify the LOID. Isn't it usable to restore a large object with the
>>> correct LOID? For example, lo_import('/etc/profile', 1234)
>>
>> I can't use that because the migration has already brought over the
>> pg_largeobject file which has the data.
>>
>>> Or, if you intend to restore only the metadata, ALTER LARGE OBJECT and
>>> GRANT ... ON LARGE OBJECT make it possible to set up the metadata of a
>>> given large object.
>>
>> Yes, that will work cleanly. The file might be large because I need a
>> GRANT for every large object, but I suppose that is OK.
>
> Uh, I tested pg_migrator and found a problem with this approach:
>
> test=> select loid from pg_largeobject;
>  loid
> -------
>  16385
>  16385
>  16386
> (3 rows)
>
> test=> grant all ON LARGE OBJECT 16385 to public;
> ERROR:  large object 16385 does not exist
>
> I am wondering if the missing pg_largeobject_metadata row is causing
> this, and again I have no way of creating one with the specified oid.

Can SELECT lo_create(16385); help in this situation?

It creates an entry in pg_largeobject_metadata, but does not touch
pg_largeobject, because a large object is empty in its initial state.
In this case, however, pg_migrator has already brought over only the
data chunks into pg_largeobject, so this operation would recombine the
orphaned chunks with a metadata entry.

I'm not sure whether lo_create() also checks that pg_largeobject has no
chunks with the same LOID. In regular operation, that can never happen.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote: > Can SELECT lo_create(16385); help this situation? SELECT lo_create(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) AS t would work for pg_migrator. > I'm not clear whether we also check pg_largeobejct has chunks with same > LOID on lo_create(). In the regular operation, it shall never happen. I think the omission is a reasonable optimization. Regards, --- Takahiro Itagaki NTT Open Source Software Center
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote:

> We no longer have any reason to keep the CASE ... WHEN and the subquery for
> the given LOID. Right?

Ah, I see. I used your suggestion. I applied the bug fixes.

Our tools and contrib modules will always use pg_largeobject_metadata
instead of pg_largeobject to enumerate large objects.

I removed "GRANT SELECT (loid) ON pg_largeobject TO PUBLIC" from initdb
because users must use pg_largeobject_metadata.oid when they want to check
the OIDs of large objects; otherwise, they could misjudge whether an object
exists. This is an unavoidable incompatibility unless we always have
corresponding tuples in pg_largeobject even for zero-length large objects.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
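As a sketch of the incompatibility this creates, an existence check that
previously went through pg_largeobject now has to look at
pg_largeobject_metadata (the OID here is just an example):

    -- 8.5 and later
    SELECT EXISTS (SELECT 1 FROM pg_catalog.pg_largeobject_metadata WHERE oid = 16385);

    -- 8.4 and earlier (on 8.5 this misses metadata-only, zero-length large objects)
    SELECT EXISTS (SELECT 1 FROM pg_catalog.pg_largeobject WHERE loid = 16385);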
Takahiro Itagaki wrote: > > KaiGai Kohei <kaigai@kaigai.gr.jp> wrote: > > > Can SELECT lo_create(16385); help this situation? > > SELECT lo_create(loid) FROM (SELECT DISTINCT loid FROM pg_largeobject) AS t > > would work for pg_migrator. > > > I'm not clear whether we also check pg_largeobejct has chunks with same > > LOID on lo_create(). In the regular operation, it shall never happen. > > I think the omission is a reasonable optimization. Thanks, I have updated pg_migrator to use your suggested method. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
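For reference, the per-database section of the script pg_migrator now
generates would presumably look roughly like this (the database name is
illustrative):

    \connect mydb
    SELECT pg_catalog.lo_create(loid)
      FROM (SELECT DISTINCT loid FROM pg_catalog.pg_largeobject) AS t;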
On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: > > KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > >> What's your opinion about: >> long desc: When turned on, privilege checks on large objects perform with >> backward compatibility as 8.4.x or earlier releases. > > I updated the description as your suggest. > > Applied with minor editorialization, > mainly around tab-completion support in psql. The documentation in this patch needs work. ...Robert
(2009/12/17 7:25), Robert Haas wrote:
> On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki
> <itagaki.takahiro@oss.ntt.co.jp> wrote:
>>
>> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:
>>
>>> What's your opinion about:
>>> long desc: When turned on, privilege checks on large objects perform with
>>> backward compatibility as 8.4.x or earlier releases.
>>
>> I updated the description as you suggested.
>>
>> Applied with minor editorialization,
>> mainly around tab-completion support in psql.
>
> The documentation in this patch needs work.

Are you talking about the English quality, or am I missing something that
should be documented?

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2009/12/17 7:25), Robert Haas wrote: >> On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki >> <itagaki.takahiro@oss.ntt.co.jp> wrote: >>> >>> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: >>> >>>> What's your opinion about: >>>> long desc: When turned on, privilege checks on large objects perform with >>>> backward compatibility as 8.4.x or earlier releases. >>> >>> I updated the description as your suggest. >>> >>> Applied with minor editorialization, >>> mainly around tab-completion support in psql. >> >> The documentation in this patch needs work. > > Are you talking about English quality? or Am I missing something to be > documented? Mostly English quality, but there are some other issues too. Proposed patch attached. ...Robert
Attachment
(2009/12/17 13:20), Robert Haas wrote: > 2009/12/16 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> (2009/12/17 7:25), Robert Haas wrote: >>> On Thu, Dec 10, 2009 at 10:41 PM, Takahiro Itagaki >>> <itagaki.takahiro@oss.ntt.co.jp> wrote: >>>> >>>> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: >>>> >>>>> What's your opinion about: >>>>> long desc: When turned on, privilege checks on large objects perform with >>>>> backward compatibility as 8.4.x or earlier releases. >>>> >>>> I updated the description as your suggest. >>>> >>>> Applied with minor editorialization, >>>> mainly around tab-completion support in psql. >>> >>> The documentation in this patch needs work. >> >> Are you talking about English quality? or Am I missing something to be >> documented? > > Mostly English quality, but there are some other issues too. Proposed > patch attached. I have nothing to comment on English quality.... Thanks for your fixups. -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Robert Haas <robertmhaas@gmail.com> wrote: > 2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>: > >>>> ? ?long desc: When turned on, privilege checks on large objects perform with > >>>> ? ? ? ? ? ? ? backward compatibility as 8.4.x or earlier releases. > Mostly English quality, but there are some other issues too. Proposed > patch attached. I remember we had discussions about the version number, like "Don't use '8.5' because it might be released as '9.0'", no? Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata"> for the first <structname>pg_largeobject</structname> in each topic. Regards, --- Takahiro Itagaki NTT Open Source Software Center
2009/12/17 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>: > > Robert Haas <robertmhaas@gmail.com> wrote: > >> 2009/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>: >> >>>> ? ?long desc: When turned on, privilege checks on large objects perform with >> >>>> ? ? ? ? ? ? ? backward compatibility as 8.4.x or earlier releases. > >> Mostly English quality, but there are some other issues too. Proposed >> patch attached. > > I remember we had discussions about the version number, like > "Don't use '8.5' because it might be released as '9.0'", no? I chose to follow the style which Tom indicated that he preferred here. We don't use the phrase "8.4.x series" anywhere else in the documentation, so this doesn't seem like a good time to start. Tom or I will go through and renumber everything if we end up renaming the release to 9.0. > Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata"> > for the first <structname>pg_largeobject</structname> in each topic. Those two things aren't the same. Perhaps you meant <link linkend="catalog-pg-largeobject">? I'll tweak the pg_largeobject and pg_largeobject_metadata sections to make sure each has a link to the other and commit this. I also found one more spelling mistake so I will include that correction as well. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote:

> > Another comment is that I'd like to keep <link linkend="catalog-pg-largeobject-metadata">
> > for the first <structname>pg_largeobject</structname> in each topic.
>
> Those two things aren't the same. Perhaps you meant <link
> linkend="catalog-pg-largeobject">?

Oops, yes. Thank you for the correction.

We also have "8.4.x series" in the core code. Do you think we should also
rewrite those messages? One of them is a user-visible message.

LargeObjectAlterOwner() : pg_largeobject.c

	 * The 'lo_compat_privileges' is not checked here, because we
	 * don't have any access control features in the 8.4.x series
	 * or earlier release.
	 * So, it is not a place we can define a compatible behavior.

guc.c

	{"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
		gettext_noop("Enables backward compatibility in privilege checks on large objects"),
		gettext_noop("When turned on, privilege checks on large objects perform "
					 "with backward compatibility as 8.4.x or earlier releases.")

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
On Thu, Dec 17, 2009 at 7:27 PM, Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: >> > Another comment is I'd like to keep <link linkend="catalog-pg-largeobject-metadata"> >> > for the first <structname>pg_largeobject</structname> in each topic. >> Those two things aren't the same. Perhaps you meant <link >> linkend="catalog-pg-largeobject">? > Oops, yes. Thank you for the correction. > > We also have "8.4.x series" in the core code. Do you think we also > rewrite those messages? One of them is an use-visible message. Yes. I started going through the comments tonight. Partial patch attached. There were two comments that I was unable to understand and therefore could not reword - the one at the top of pg_largeobject_aclmask_snapshot(), and the second part of the comment at the top of LargeObjectExists(): * Note that LargeObjectExists always scans the system catalog * with SnapshotNow, so it is unavailable to use to check * existence in read-only accesses. In both cases, I'm lost. Help? In acldefault(), there is this comment: /* Grant SELECT,UPDATE by default, for now */ This doesn't seem to match what the code is doing, so I think we should remove it. I also notice that dumpBlobComments() is now misnamed, but it seems we've chosen to add a comment mentioning that fact rather than fixing it. That doesn't seem like the right approach. ...Robert
Attachment
Robert Haas <robertmhaas@gmail.com> wrote:

> In both cases, I'm lost. Help?

They might be contrasted with the comments for myLargeObjectExists.
Since we use MVCC visibility in loread(), the metadata for a large object
should also be visible under MVCC rules.

If I understand them correctly, they say:
 * pg_largeobject_aclmask_snapshot requires the snapshot that will be
   used in loread().
 * Don't use LargeObjectExists if you need MVCC visibility.

> In acldefault(), there is this comment:
> /* Grant SELECT,UPDATE by default, for now */
> This doesn't seem to match what the code is doing, so I think we
> should remove it.

Ah, ACL_NO_RIGHTS is the default.

> I also notice that dumpBlobComments() is now misnamed, but it seems
> we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, now it dumps not only comments but also the ownership of large objects.
Should we rename it dumpBlobMetadata() or so?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
(2009/12/18 15:48), Takahiro Itagaki wrote: > > Robert Haas<robertmhaas@gmail.com> wrote: > >> In both cases, I'm lost. Help? > > They might be contrasted with the comments for myLargeObjectExists. > Since we use MVCC visibility in loread(), metadata for large object > also should be visible in MVCC rule. > > If I understand them, they say: > * pg_largeobject_aclmask_snapshot requires a snapshot which will be > used in loread(). > * Don't use LargeObjectExists if you need MVCC visibility. Yes, correct. >> In acldefault(), there is this comment: >> /* Grant SELECT,UPDATE by default, for now */ >> This doesn't seem to match what the code is doing, so I think we >> should remove it. > > Ah, ACL_NO_RIGHTS is the default. Oops, it reflects very early phase design, but fixed later. >> I also notice that dumpBlobComments() is now misnamed, but it seems >> we've chosen to add a comment mentioning that fact rather than fixing it. > > Hmmm, now it dumps not only comments but also ownership of large objects. > Should we rename it dumpBlobMetadata() or so? It seems to me quite natural. The attached patch fixes them. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
2009/12/18 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2009/12/18 15:48), Takahiro Itagaki wrote: >> >> Robert Haas<robertmhaas@gmail.com> wrote: >> >>> In both cases, I'm lost. Help? >> >> They might be contrasted with the comments for myLargeObjectExists. >> Since we use MVCC visibility in loread(), metadata for large object >> also should be visible in MVCC rule. >> >> If I understand them, they say: >> * pg_largeobject_aclmask_snapshot requires a snapshot which will be >> used in loread(). >> * Don't use LargeObjectExists if you need MVCC visibility. > > Yes, correct. > >>> In acldefault(), there is this comment: >>> /* Grant SELECT,UPDATE by default, for now */ >>> This doesn't seem to match what the code is doing, so I think we >>> should remove it. >> >> Ah, ACL_NO_RIGHTS is the default. > > Oops, it reflects very early phase design, but fixed later. > >>> I also notice that dumpBlobComments() is now misnamed, but it seems >>> we've chosen to add a comment mentioning that fact rather than fixing it. >> >> Hmmm, now it dumps not only comments but also ownership of large objects. >> Should we rename it dumpBlobMetadata() or so? > > It seems to me quite natural. > > The attached patch fixes them. I think we might want to go with dumpBlobProperties(), because dumpBlobMetadata() might lead you to think that all of the properties being dumped are stored in pg_largeobject_metadata, which is not the case. I do also wonder why we are calling these blobs in this code rather than large objects, but that problem predates this patch and I think we might as well leave it alone for now. ...Robert
On Fri, Dec 18, 2009 at 9:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 2009/12/18 KaiGai Kohei <kaigai@ak.jp.nec.com>:
>> (2009/12/18 15:48), Takahiro Itagaki wrote:
>>>
>>> Robert Haas<robertmhaas@gmail.com> wrote:
>>>
>>>> In both cases, I'm lost. Help?
>>>
>>> They might be contrasted with the comments for myLargeObjectExists.
>>> Since we use MVCC visibility in loread(), metadata for large object
>>> also should be visible in MVCC rule.
>>>
>>> If I understand them, they say:
>>> * pg_largeobject_aclmask_snapshot requires a snapshot which will be
>>>   used in loread().
>>> * Don't use LargeObjectExists if you need MVCC visibility.
>>
>> Yes, correct.
>>
>>>> In acldefault(), there is this comment:
>>>> /* Grant SELECT,UPDATE by default, for now */
>>>> This doesn't seem to match what the code is doing, so I think we
>>>> should remove it.
>>>
>>> Ah, ACL_NO_RIGHTS is the default.
>>
>> Oops, it reflects very early phase design, but fixed later.
>>
>>>> I also notice that dumpBlobComments() is now misnamed, but it seems
>>>> we've chosen to add a comment mentioning that fact rather than fixing it.
>>>
>>> Hmmm, now it dumps not only comments but also ownership of large objects.
>>> Should we rename it dumpBlobMetadata() or so?
>>
>> It seems to me quite natural.
>>
>> The attached patch fixes them.
>
> I think we might want to go with dumpBlobProperties(), because
> dumpBlobMetadata() might lead you to think that all of the properties
> being dumped are stored in pg_largeobject_metadata, which is not the
> case.

Oh.  This is more complicated than it appeared on the surface.  It
seems that the string "BLOB COMMENTS" actually gets inserted into
custom dumps somewhere, so I'm not sure whether we can just change it.
Was this issue discussed at some point before this was committed?
Changing it would seem to require inserting some backward
compatibility code here.  Another option would be to add a separate
section for "BLOB METADATA", and leave "BLOB COMMENTS" alone.  Can
anyone comment on what the Right Thing To Do is here?

...Robert
On Fri, Dec 18, 2009 at 1:48 AM, Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: >> In both cases, I'm lost. Help? > > They might be contrasted with the comments for myLargeObjectExists. > Since we use MVCC visibility in loread(), metadata for large object > also should be visible in MVCC rule. > > If I understand them, they say: > * pg_largeobject_aclmask_snapshot requires a snapshot which will be > used in loread(). > * Don't use LargeObjectExists if you need MVCC visibility. Part of what I'm confused about (and what I think should be documented in a comment somewhere) is why we're using MVCC visibility in some places but not others. In particular, there seem to be some bits of the comment that imply that we do this for read but not for write, which seems really strange. It may or may not actually be strange, but I don't understand it. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Oh. This is more complicated than it appeared on the surface. It > seems that the string "BLOB COMMENTS" actually gets inserted into > custom dumps somewhere, so I'm not sure whether we can just change it. > Was this issue discussed at some point before this was committed? > Changing it would seem to require inserting some backward > compatibility code here. Another option would be to add a separate > section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can > anyone comment on what the Right Thing To Do is here? The BLOB COMMENTS label is, or was, correct for what it contained. If this patch has usurped it to contain other things I would argue that that is seriously wrong. pg_dump already has a clear notion of how to handle ACLs for objects. ACLs for blobs ought to be made to fit into that structure, not dumped in some random place because that saved a few lines of code. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > Part of what I'm confused about (and what I think should be documented > in a comment somewhere) is why we're using MVCC visibility in some > places but not others. In particular, there seem to be some bits of > the comment that imply that we do this for read but not for write, > which seems really strange. It may or may not actually be strange, > but I don't understand it. It is supposed to depend on whether you opened the blob for read only or for read write. Please do not tell me that this patch broke that; because if it did it broke pg_dump. This behavior is documented at least here: http://www.postgresql.org/docs/8.4/static/lo-interfaces.html#AEN36338 regards, tom lane
On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Oh. This is more complicated than it appeared on the surface. It >> seems that the string "BLOB COMMENTS" actually gets inserted into >> custom dumps somewhere, so I'm not sure whether we can just change it. >> Was this issue discussed at some point before this was committed? >> Changing it would seem to require inserting some backward >> compatibility code here. Another option would be to add a separate >> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can >> anyone comment on what the Right Thing To Do is here? > > The BLOB COMMENTS label is, or was, correct for what it contained. > If this patch has usurped it to contain other things It has. > I would argue > that that is seriously wrong. pg_dump already has a clear notion > of how to handle ACLs for objects. ACLs for blobs ought to be > made to fit into that structure, not dumped in some random place > because that saved a few lines of code. OK. Hopefully KaiGai or Takahiro can suggest a fix. Thanks, ...Robert
On Fri, Dec 18, 2009 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Part of what I'm confused about (and what I think should be documented >> in a comment somewhere) is why we're using MVCC visibility in some >> places but not others. In particular, there seem to be some bits of >> the comment that imply that we do this for read but not for write, >> which seems really strange. It may or may not actually be strange, >> but I don't understand it. > > It is supposed to depend on whether you opened the blob for read only > or for read write. Please do not tell me that this patch broke that; > because if it did it broke pg_dump. > > This behavior is documented at least here: > http://www.postgresql.org/docs/8.4/static/lo-interfaces.html#AEN36338 Oh, I see. Thanks for the pointer. Having read that through, I can now say that the comments in the patch seem to imply that it attempted to preserve those semantics, but I can't swear that it did. I will take another look at it, but it might bear closer examination by someone with more MVCC-fu than myself. ...Robert
(2009/12/19 12:05), Robert Haas wrote:
> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Robert Haas<robertmhaas@gmail.com> writes:
>>> Oh. This is more complicated than it appeared on the surface. It
>>> seems that the string "BLOB COMMENTS" actually gets inserted into
>>> custom dumps somewhere, so I'm not sure whether we can just change it.
>>> Was this issue discussed at some point before this was committed?
>>> Changing it would seem to require inserting some backward
>>> compatibility code here. Another option would be to add a separate
>>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can
>>> anyone comment on what the Right Thing To Do is here?
>>
>> The BLOB COMMENTS label is, or was, correct for what it contained.
>> If this patch has usurped it to contain other things
>
> It has.
>
>> I would argue
>> that that is seriously wrong. pg_dump already has a clear notion
>> of how to handle ACLs for objects. ACLs for blobs ought to be
>> made to fit into that structure, not dumped in some random place
>> because that saved a few lines of code.
>
> OK. Hopefully KaiGai or Takahiro can suggest a fix.

Currently, the BLOBS (and BLOB COMMENTS) section does not record the owners
of the large objects, because doing so per object could exhaust pg_dump's
local memory when the database contains a massive number of large objects.
I believe that is the reason why we dump all the large objects in a single
section. Correct?

I don't think it is reasonable to give every large object its own section.
However, we can group them per owner, as sketched below. In general, we can
assume the number of database users is smaller than the number of large
objects. In other words, we can obtain the number of sections to be dumped
with the following query:

  SELECT DISTINCT lomowner FROM pg_largeobject_metadata;

Then we can dump the large objects per user. For earlier server versions,
all the large objects will be dumped in a single section with an anonymous
user.

What's your opinion?

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
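A rough sketch of the per-owner enumeration described above (the owner OID
10 is just an example):

    -- owners that would each get their own BLOBS section
    SELECT DISTINCT lomowner FROM pg_largeobject_metadata;

    -- the large objects belonging to one such owner
    SELECT oid FROM pg_largeobject_metadata WHERE lomowner = 10;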
(2009/12/21 9:39), KaiGai Kohei wrote:
> (2009/12/19 12:05), Robert Haas wrote:
>> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas<robertmhaas@gmail.com> writes:
>>>> Oh. This is more complicated than it appeared on the surface. It
>>>> seems that the string "BLOB COMMENTS" actually gets inserted into
>>>> custom dumps somewhere, so I'm not sure whether we can just change it.
>>>> Was this issue discussed at some point before this was committed?
>>>> Changing it would seem to require inserting some backward
>>>> compatibility code here. Another option would be to add a separate
>>>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can
>>>> anyone comment on what the Right Thing To Do is here?
>>>
>>> The BLOB COMMENTS label is, or was, correct for what it contained.
>>> If this patch has usurped it to contain other things
>>
>> It has.
>>
>>> I would argue
>>> that that is seriously wrong. pg_dump already has a clear notion
>>> of how to handle ACLs for objects. ACLs for blobs ought to be
>>> made to fit into that structure, not dumped in some random place
>>> because that saved a few lines of code.
>>
>> OK. Hopefully KaiGai or Takahiro can suggest a fix.

The patch has grown larger than I expected, because large objects are
handled quite differently from all the other object classes.

Here are three points:

1) A new BLOB ACLS section was added.

It is a single-purpose section that holds the GRANT/REVOKE statements for
large objects, and the BLOB COMMENTS section was reverted to holding only
descriptions.

Because we have to assume the database may hold a massive number of large
objects, it is not reasonable to store these ACLs using dumpACL(). That
routine chains each ACL entry onto the list of TOC entries, which are then
dumped; it would mean pg_dump might have to register a massive number of
large objects in its local memory.

Currently, we also store GRANT/REVOKE statements in the BLOB COMMENTS
section, which is confusing. Even if pg_restore is launched with the
--no-privileges option, it cannot ignore GRANT/REVOKE statements on large
objects. This fix makes it possible to distinguish ACLs on large objects
from their other properties, and to handle them correctly.

2) The BLOBS section was split up per database user.

Currently, the BLOBS section carries no information about the owner of the
large objects to be restored, so we tried to fix up their ownership in the
BLOB COMMENTS section, which was incorrect.

The --use-set-session-authorization option requires restoring the ownership
of objects without ALTER ... OWNER TO statements, so we need to record the
correct database username in the section properties.

This patch renamed hasBlobs() to getBlobs() and changed its purpose. It
registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for each owner
of large objects, as necessary. For example, if there are five users owning
large objects, getBlobs() registers five TOC entries of each kind, and
dumpBlobs(), dumpBlobComments() and dumpBlobAcls() are each invoked five
times with the corresponding username.

3) _LoadBlobs()

For regular database object classes, _printTocEntry() can inject an
"ALTER xxx OWNER TO ..." statement for the restored object based on the
ownership described in the section header. However, we cannot use this
infrastructure for large objects as-is, because one BLOBS section can
restore multiple large objects.

_LoadBlobs() is the routine that restores the large objects within a given
section. This patch modifies it to inject an "ALTER LARGE OBJECT <loid>
OWNER TO <user>" statement for each large object, based on the ownership of
the section (if --use-set-session-authorization is not given).

$ diffstat pgsql-fix-pg_dump-blob-privs.patch
 pg_backup_archiver.c |    4
 pg_backup_custom.c   |   11 !
 pg_backup_files.c    |    9 !
 pg_backup_tar.c      |    9 !
 pg_dump.c            |  312 +++++++----!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 pg_dump.h            |    9 !
 pg_dump_sort.c       |    8 !
 7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
2009/12/22 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2009/12/21 9:39), KaiGai Kohei wrote: >> (2009/12/19 12:05), Robert Haas wrote: >>> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> Robert Haas<robertmhaas@gmail.com> writes: >>>>> Oh. This is more complicated than it appeared on the surface. It >>>>> seems that the string "BLOB COMMENTS" actually gets inserted into >>>>> custom dumps somewhere, so I'm not sure whether we can just change it. >>>>> Was this issue discussed at some point before this was committed? >>>>> Changing it would seem to require inserting some backward >>>>> compatibility code here. Another option would be to add a separate >>>>> section for "BLOB METADATA", and leave "BLOB COMMENTS" alone. Can >>>>> anyone comment on what the Right Thing To Do is here? >>>> >>>> The BLOB COMMENTS label is, or was, correct for what it contained. >>>> If this patch has usurped it to contain other things >>> >>> It has. >>> >>>> I would argue >>>> that that is seriously wrong. pg_dump already has a clear notion >>>> of how to handle ACLs for objects. ACLs for blobs ought to be >>>> made to fit into that structure, not dumped in some random place >>>> because that saved a few lines of code. >>> >>> OK. Hopefully KaiGai or Takahiro can suggest a fix. > > The patch has grown larger than I expected before, because the way > to handle large objects are far from any other object classes. > > Here are three points: > > 1) The new BLOB ACLS section was added. > > It is a single purpose section to describe GRANT/REVOKE statements > on large objects, and BLOB COMMENTS section was reverted to put > only descriptions. > > Because we need to assume a case when the database holds massive > number of large objects, it is not reasonable to store them using > dumpACL(). It chains an ACL entry with the list of TOC entries, > then, these are dumped. It means pg_dump may have to register massive > number of large objects in the local memory space. > > Currently, we also store GRANT/REVOKE statements in BLOB COMMENTS > section, but confusable. Even if pg_restore is launched with > --no-privileges options, it cannot ignore GRANT/REVOKE statements > on large objects. This fix enables to distinguish ACLs on large > objects from other properties, and to handle them correctly. > > 2) The BLOBS section was separated for each database users. > > Currently, the BLOBS section does not have information about owner > of the large objects to be restored. So, we tried to alter its > ownership in the BLOB COMMENTS section, but incorrect. > > The --use-set-session-authorization option requires to restore > ownership of objects without ALTER ... OWNER TO statements. > So, we need to set up correct database username on the section > properties. > > This patch renamed the hasBlobs() by getBlobs(), and changed its > purpose. It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS > for each large objects owners, if necessary. > For example, if here are five users owning large objects, getBlobs() > shall register five TOC entries for each users, and dumpBlobs(), > dumpBlobComments() and dumpBlobAcls() shall be also invoked five > times with the username. > > 3) _LoadBlobs() > > For regular database object classes, _printTocEntry() can inject > "ALTER xxx OWNER TO ..." statement on the restored object based on > the ownership described in the section header. > However, we cannot use this infrastructure for large objects as-is, > because one BLOBS section can restore multiple large objects. 
> > _LoadBlobs() is a routine to restore large objects within a certain > section. This patch modifies this routine to inject "ALTER LARGE > OBJECT <loid> OWNER TO <user>" statement for each large objects > based on the ownership of the section. > (if --use-set-session-authorization is not given.) > > > $ diffstat pgsql-fix-pg_dump-blob-privs.patch > pg_backup_archiver.c | 4 > pg_backup_custom.c | 11 ! > pg_backup_files.c | 9 ! > pg_backup_tar.c | 9 ! > pg_dump.c | 312 +++++++----!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > pg_dump.h | 9 ! > pg_dump_sort.c | 8 ! > 7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!) I will review this sooner if I have time, but please make sure it gets added to the next CommitFest so we don't lose it. I think it also needs to be added here, since AFAICS this is a must-fix for 8.5. http://wiki.postgresql.org/wiki/PostgreSQL_8.5_Open_Items Thanks, ...Robert
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> This patch renamed hasBlobs() to getBlobs() and changed its purpose. It
> registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for each
> owner of large objects, as necessary.

This patch adds the DumpableObjectType DO_BLOB_ACLS and a struct BlobsInfo.
We use three BlobsInfo objects -- for DO_BLOBS, DO_BLOB_COMMENTS, and
DO_BLOB_ACLS -- _for each distinct owner_ of large objects. So, even if we
have many large objects in the database, we keep at most
(3 * number-of-roles) BlobsInfo structs in memory. For older server
versions, we assume that all blobs are owned by a single user with an empty
name. We have no BlobsInfo at all if there are no large objects.

I'm not sure whether we need to make groups for each owner of large objects.
If I remember right, the primary issue was separating the routines that dump
BLOB ACLS from the routines for BLOB COMMENTS, right? Why did you make the
change?

Another concern is the confusing identifier names -- getBlobs() returns a
BlobsInfo for each owner. Could we rename them to something like
getBlobOwners() and BlobOwnerInfo?

Also, DumpableObject.name is not used in BlobsInfo. We could reuse
DumpableObject.name instead of the "rolname" field in BlobsInfo.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
(2010/01/21 16:52), Takahiro Itagaki wrote:
>
> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:
>
>> This patch renamed hasBlobs() to getBlobs() and changed its purpose. It
>> registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for each
>> owner of large objects, as necessary.
>
> This patch adds the DumpableObjectType DO_BLOB_ACLS and a struct BlobsInfo.
> We use three BlobsInfo objects for DO_BLOBS, DO_BLOB_COMMENTS, and
> DO_BLOB_ACLS _for each distinct owner_ of large objects. So, even if we have
> many large objects in the database, we keep at most (3 * num-of-roles)
> BlobsInfo in memory. For older server versions, we assume that blobs are
> owned by only one user with an empty name. We have no BlobsInfo if there
> are no large objects.
>
>
> I'm not sure whether we need to make groups for each owner of large objects.
> If I remember right, the primary issue was separating the routines that dump
> BLOB ACLS from the routines for BLOB COMMENTS, right? Why did you make the change?

When --use-set-session-authorization is specified, pg_restore tries to
change the database role of the current session just before creating the
database objects to be restored.

The ownership of the database objects is recorded in the section header,
and it tells pg_restore who should own the objects restored in that
section.

pg_restore can then generate either ALTER xxx OWNER TO after creation, or
SET SESSION AUTHORIZATION before creation, at runtime.
So we cannot put the creation of large objects with different ownership
into the same section.

That is the reason why we have to group large objects by database user.

> Another concern is the confusing identifier names -- getBlobs()
> returns a BlobsInfo for each owner. Could we rename them to something
> like getBlobOwners() and BlobOwnerInfo?

OK, I'll do that.

> Also, DumpableObject.name is not used in BlobsInfo. We could reuse
> DumpableObject.name instead of the "rolname" field in BlobsInfo.

Isn't that confusing for future hackers? The current form follows the
convention of the other database object classes.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
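As a sketch of the two restore styles this grouping has to support (the OID
and role name below are hypothetical):

    -- default style: create the object as the restoring user, then reassign ownership
    SELECT pg_catalog.lo_create(16385);
    ALTER LARGE OBJECT 16385 OWNER TO alice;

    -- --use-set-session-authorization style: switch identity before creating the object
    SET SESSION AUTHORIZATION alice;
    SELECT pg_catalog.lo_create(16385);
    RESET SESSION AUTHORIZATION;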
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> > I'm not sure whether we need to make groups for each owner of large objects.
> > If I remember right, the primary issue was separating the routines that dump
> > BLOB ACLS from the routines for BLOB COMMENTS, right? Why did you make the change?
>
> When --use-set-session-authorization is specified, pg_restore tries to
> change the database role of the current session just before creating the
> database objects to be restored.
>
> The ownership of the database objects is recorded in the section header,
> and it tells pg_restore who should own the objects restored in that
> section.
>
> pg_restore can then generate either ALTER xxx OWNER TO after creation, or
> SET SESSION AUTHORIZATION before creation, at runtime.
> So we cannot put the creation of large objects with different ownership
> into the same section.
>
> That is the reason why we have to group large objects by database user.

Ah, I see.

Then... What happens if we drop or rename roles who own large objects while
pg_dump is running? Does the patch still work? It uses pg_get_userbyid().

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
(2010/01/21 19:42), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: > >>> I'm not sure whether we need to make groups for each owner of large objects. >>> If I remember right, the primary issue was separating routines for dump >>> BLOB ACLS from routines for BLOB COMMENTS, right? Why did you make the change? >> >> When --use-set-session-authorization is specified, pg_restore tries to >> change database role of the current session just before creation of >> database objects to be restored. >> >> Ownership of the database objects are recorded in the section header, >> and it informs pg_restore who should be owner of the objects to be >> restored in this section. >> >> Then, pg_restore can generate ALTER xxx OWNER TO after creation, or >> SET SESSION AUTHORIZATION before creation in runtime. >> So, we cannot put creation of largeobjects with different ownership >> in same section. >> >> It is the reason why we have to group largeobjects by database user. > > Ah, I see. > > Then... What happen if we drop or rename roles who have large objects > during pg_dump? Does the patch still work? It uses pg_get_userbyid(). Indeed, pg_get_userbyid() internally uses SnapshotNow always, then it can return "unknown (OID=%u)" in this case. The "username_subquery" should be here, instead. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
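For illustration, the difference is roughly the following (a sketch only;
the actual username_subquery used by pg_dump may differ):

    -- pg_get_userbyid() always uses SnapshotNow, so a concurrently dropped
    -- role can come back as "unknown (OID=nnnnn)"
    SELECT oid, pg_get_userbyid(lomowner) FROM pg_catalog.pg_largeobject_metadata;

    -- resolving the name with a subquery keeps the lookup within the dump's own snapshot
    SELECT m.oid,
           (SELECT rolname FROM pg_catalog.pg_roles r WHERE r.oid = m.lomowner) AS rolname
      FROM pg_catalog.pg_largeobject_metadata m;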
The attached patch is a revised version.

List of updates:
- cleanup: getBlobs() was renamed to getBlobOwners()
- cleanup: BlobsInfo was renamed to BlobOwnerInfo
- bugfix: pg_get_userbyid() in the SQL queries was replaced by
  username_subquery, which contains the right subquery to obtain a username
  for the given id. This allows pg_dump to run even while a role is being
  dropped concurrently.
- bugfix: Even if we give the -a (--data-only) or -O (--no-owner) options,
  or the archive does not contain owner information, it tried to write out
  "ALTER LARGE OBJECT xxx OWNER TO ..." statements.
- bugfix: Even if we give the -a (--data-only) or -x (--no-privileges)
  options, it tried to write out the "BLOB ACLS" section.

The last two are problems I noticed newly, so they need some explanation.

The BLOB section can contain multiple definitions of large objects, unlike
any other object class. That is also a reason why I had to group large
objects by database user. The Owner tag of a BLOB section is used to
identify the owner of the large objects to be restored, and is also used in
--use-set-session-authorization mode. However, we need to inject an
"ALTER LARGE OBJECT xxx OWNER TO ..." statement for each lo_create() in
_LoadBlobs(), because we cannot know how many large objects are in the
section before reading the archive. The last patch did not take into
account the -a/-O options or an archive without an Owner: tag.

The BLOB ACLS section is categorized as SECTION_DATA, following the
existing BLOB COMMENTS behavior. For the same reason it has to handle the
-a/-x options by itself, but the last patch did not handle that well.

BTW, here is a known issue. When we run pg_dump with -s (--schema-only), it
writes out the descriptions of regular object classes, but does not write
out the descriptions of large objects. It seems the descriptions of large
objects are considered part of the data, not properties. However, that
might be inconsistent. The reason for this behavior is that all the BLOB
dumps are categorized as SECTION_DATA, so the -s option tells
pg_backup_archiver.c to skip the routines related to large objects.

However, it may be time to split this code into two steps.
In the schema section stage:
- It creates empty large objects with lo_create().
- It restores the descriptions of the large objects.
- It restores the ownership/privileges of the large objects.
In the data section stage:
- It loads the actual data contents into the empty large objects with
  lowrite().

Thanks,

(2010/01/21 19:42), Takahiro Itagaki wrote:
>
> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote:
>
>>> I'm not sure whether we need to make groups for each owner of large objects.
>>> If I remember right, the primary issue was separating the routines that dump
>>> BLOB ACLS from the routines for BLOB COMMENTS, right? Why did you make the change?
>>
>> When --use-set-session-authorization is specified, pg_restore tries to
>> change the database role of the current session just before creating the
>> database objects to be restored.
>>
>> The ownership of the database objects is recorded in the section header,
>> and it tells pg_restore who should own the objects restored in that
>> section.
>>
>> pg_restore can then generate either ALTER xxx OWNER TO after creation, or
>> SET SESSION AUTHORIZATION before creation, at runtime.
>> So we cannot put the creation of large objects with different ownership
>> into the same section.
>>
>> That is the reason why we have to group large objects by database user.
>
> Ah, I see.
>
> Then... What happens if we drop or rename roles who own large objects while
> pg_dump is running? Does the patch still work? It uses pg_get_userbyid().
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai Kohei <kaigai@ak.jp.nec.com> writes: > The attached patch is a revised version. I'm inclined to wonder whether this patch doesn't prove that we've reached the end of the line for the current representation of blobs in pg_dump archives. The alternative that I'm thinking about is to treat each blob as an independent object (hence, with its own TOC entry). If we did that, then the standard pg_dump mechanisms for ownership, ACLs, and comments would apply, and we could get rid of the messy hacks that this patch is just adding to. That would also open the door to future improvements like being able to selectively restore blobs. (Actually you could do it immediately if you didn't mind editing a -l file...) And it would for instance allow loading of blobs to be parallelized. Now the argument against that is that it won't scale terribly well to situations with very large numbers of blobs. However, I'm not convinced that the current approach of cramming them all into one TOC entry scales so well either. If your large objects are actually large, there's not going to be an enormous number of them. We've heard of people with many tens of thousands of tables, and pg_dump speed didn't seem to be a huge bottleneck for them (at least not in recent versions). So I'm feeling we should not dismiss the idea of one TOC entry per blob. Thoughts? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs. However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either. If your large objects are
> actually large, there's not going to be an enormous number of
> them. We've heard of people with many tens of thousands of
> tables, and pg_dump speed didn't seem to be a huge bottleneck for
> them (at least not in recent versions). So I'm feeling we should
> not dismiss the idea of one TOC entry per blob.
>
> Thoughts?

We've got a "DocImage" table with about 7 million rows storing PDF
documents in a bytea column, approaching 1 TB of data. (We don't want to
give up ACID guarantees, replication, etc. by storing them on the file
system with filenames in the database.) This works pretty well, except
that client software occasionally has a tendency to run out of RAM. The
interface could arguably be cleaner if we used BLOBs, but the security
issues have precluded that in PostgreSQL.

I suspect that 7 million BLOBs (and growing fast) would be a problem for
this approach. Of course, if we're atypical, we could stay with bytea if
this changed. Just a data point.

-Kevin

cir=> select count(*) from "DocImage";
  count
---------
 6891626
(1 row)

cir=> select pg_size_pretty(pg_total_relation_size('"DocImage"'));
 pg_size_pretty
----------------
 956 GB
(1 row)
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We've heard of people with many tens of thousands of >> tables, and pg_dump speed didn't seem to be a huge bottleneck for >> them (at least not in recent versions). So I'm feeling we should >> not dismiss the idea of one TOC entry per blob. >> >> Thoughts? > I suspect that 7 million BLOBs (and growing fast) would be a problem > for this approach. Of course, if we're atypical, we could stay with > bytea if this changed. Just a data point. Do you have the opportunity to try an experiment on hardware similar to what you're running that on? Create a database with 7 million tables and see what the dump/restore times are like, and whether pg_dump/pg_restore appear to be CPU-bound or memory-limited when doing it. If they aren't, we could conclude that millions of TOC entries isn't a problem. A compromise we could consider is some sort of sub-TOC-entry scheme that gets the per-BLOB entries out of the main speed bottlenecks, while still letting us share most of the logic. For instance, I suspect that the first bottleneck in pg_dump would be the dependency sorting, but we don't really need to sort all the blobs individually for that. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Do you have the opportunity to try an experiment on hardware > similar to what you're running that on? Create a database with 7 > million tables and see what the dump/restore times are like, and > whether pg_dump/pg_restore appear to be CPU-bound or > memory-limited when doing it. If these can be empty (or nearly empty) tables, I can probably swing it as a background task. You didn't need to match the current 1.3 TB database size I assume? > If they aren't, we could conclude that millions of TOC entries > isn't a problem. I'd actually be rather more concerned about the effects on normal query plan times, or are you confident that won't be an issue? > A compromise we could consider is some sort of sub-TOC-entry > scheme that gets the per-BLOB entries out of the main speed > bottlenecks, while still letting us share most of the logic. For > instance, I suspect that the first bottleneck in pg_dump would be > the dependency sorting, but we don't really need to sort all the > blobs individually for that. That might also address the plan time issue, if it actually exists. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Do you have the opportunity to try an experiment on hardware >> similar to what you're running that on? Create a database with 7 >> million tables and see what the dump/restore times are like, and >> whether pg_dump/pg_restore appear to be CPU-bound or >> memory-limited when doing it. > If these can be empty (or nearly empty) tables, I can probably swing > it as a background task. You didn't need to match the current 1.3 > TB database size I assume? Empty is fine. >> If they aren't, we could conclude that millions of TOC entries >> isn't a problem. > I'd actually be rather more concerned about the effects on normal > query plan times, or are you confident that won't be an issue? This is only a question of what happens internally in pg_dump and pg_restore --- I'm not suggesting we change anything on the database side. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Empty is fine. I'll get started. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > I'll get started. After a couple false starts, the creation of the millions of tables is underway. At the rate it's going, it won't finish for 8.2 hours, so I'll have to come in and test the dump tomorrow morning. -Kevin
(2010/01/23 5:12), Tom Lane wrote: > KaiGai Kohei<kaigai@ak.jp.nec.com> writes: >> The attached patch is a revised version. > > I'm inclined to wonder whether this patch doesn't prove that we've > reached the end of the line for the current representation of blobs > in pg_dump archives. The alternative that I'm thinking about is to > treat each blob as an independent object (hence, with its own TOC > entry). If we did that, then the standard pg_dump mechanisms for > ownership, ACLs, and comments would apply, and we could get rid of > the messy hacks that this patch is just adding to. That would also > open the door to future improvements like being able to selectively > restore blobs. (Actually you could do it immediately if you didn't > mind editing a -l file...) And it would for instance allow loading > of blobs to be parallelized. I also think it is a better approach than the current blob representation. > Now the argument against that is that it won't scale terribly well > to situations with very large numbers of blobs. However, I'm not > convinced that the current approach of cramming them all into one > TOC entry scales so well either. If your large objects are actually > large, there's not going to be an enormous number of them. We've > heard of people with many tens of thousands of tables, and pg_dump > speed didn't seem to be a huge bottleneck for them (at least not > in recent versions). So I'm feeling we should not dismiss the > idea of one TOC entry per blob. Even if the database contains a massive number of large objects, all that pg_dump has to manage in RAM is their metadata, not the data contents. If we have one TOC entry per blob, the total amount of I/O between the server and pg_dump is no different from the current approach. If we assume one TOC entry consumes 64 bytes of RAM, it needs about 450MB of RAM for 7 million BLOBs. On recent computers, is that an unacceptable cost? If you are dumping a TB-class database, I think we can assume the machine where pg_dump runs has adequate storage and RAM. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
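A back-of-envelope check of that figure, using a hypothetical, simplified stand-in for the per-blob bookkeeping pg_dump would keep (the real structures are larger and also hold strings):

#include <stdio.h>

/* Simplified, hypothetical per-blob entry; not pg_dump's actual struct. */
typedef struct BlobEntrySketch
{
    unsigned int blob_oid;     /* OID of the large object */
    char        *owner;        /* owner role name */
    char        *acl;          /* ACL string, if any */
    char        *comment;      /* comment, if any */
} BlobEntrySketch;

int
main(void)
{
    unsigned long nblobs = 7UL * 1000 * 1000;
    unsigned long assumed_bytes = 64;   /* per-entry figure assumed in the mail */

    printf("sizeof(BlobEntrySketch) = %zu bytes\n", sizeof(BlobEntrySketch));
    printf("7 million entries at %lu bytes each: ~%lu MB\n",
           assumed_bytes, nblobs * assumed_bytes / (1000 * 1000));   /* ~448 MB */
    return 0;
}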
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Do you have the opportunity to try an experiment on hardware >>> similar to what you're running that on? Create a database with >>> 7 million tables and see what the dump/restore times are like, >>> and whether pg_dump/pg_restore appear to be CPU-bound or >>> memory-limited when doing it. > >> If these can be empty (or nearly empty) tables, I can probably >> swing it as a background task. You didn't need to match the >> current 1.3 TB database size I assume? > > Empty is fine. After about 15 hours of run time it was around 5.5 million tables; the rate of creation had slowed rather dramatically. I did create them with primary keys (out of habit) which was probably the wrong thing. I canceled the table creation process and started a VACUUM ANALYZE, figuring that we didn't want any hint-bit writing or bad statistics confusing the results. That has been running for 30 minutes with 65 MB to 140 MB per second disk activity, mixed read and write. After a few minutes that left me curious just how big the database was, so I tried: select pg_size_pretty(pg_database_size('test')); I did a Ctrl+C after about five minutes and got: Cancel request sent but it didn't return for 15 or 20 minutes. Any attempt to query pg_locks stalls. Tab completion stalls. (By the way, this is not related to the false alarm on that yesterday, which was a result of my attempting tab completion from within a failed transaction, which just found nothing rather than stalling.) So I'm not sure whether I can get to a state suitable for starting the desired test, but I'll stay with a for a while. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > So I'm not sure whether I can get to a state suitable for starting > the desired test, but I'll stay with a for a while. I have other commitments today, so I'm going to leave the VACUUM ANALYZE running and come back tomorrow morning to try the pg_dump. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > ... After a few minutes that left me curious just how big > the database was, so I tried: > select pg_size_pretty(pg_database_size('test')); > I did a Ctrl+C after about five minutes and got: > Cancel request sent > but it didn't return for 15 or 20 minutes. Hm, we probably are lacking CHECK_FOR_INTERRUPTS in the inner loops in dbsize.c ... regards, tom lane
KaiGai Kohei <kaigai@kaigai.gr.jp> writes: > (2010/01/23 5:12), Tom Lane wrote: >> Now the argument against that is that it won't scale terribly well >> to situations with very large numbers of blobs. > Even if the database contains a massive number of large objects, all that > pg_dump has to manage in RAM is their metadata, not the data contents. I'm not so worried about the amount of RAM needed as whether pg_dump's internal algorithms will scale to large numbers of TOC entries. Any O(N^2) behavior would be pretty painful, for example. No doubt we could fix any such problems, but it might take more work than we want to do right now. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not so worried about the amount of RAM needed as whether > pg_dump's internal algorithms will scale to large numbers of TOC > entries. Any O(N^2) behavior would be pretty painful, for > example. No doubt we could fix any such problems, but it might > take more work than we want to do right now. I'm afraid pg_dump didn't get very far with this before:

pg_dump: WARNING: out of shared memory
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
pg_dump: The command was: LOCK TABLE public.test2672 IN ACCESS SHARE MODE

Given how fast it happened, I suspect that it was 2672 tables into the dump, versus 26% of the way through 5.5 million tables. A sampling of the vmstat 1 output lines in "baseline" state -- before the dump started:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 1  0 319804 583656 23372 124473248    0    0 17224    10 1742 18995  9  1 88  2  0
 3  1 319804 595840 23368 124458856    0    0 17016    10 2014 22965  9  1 89  1  0
 1  0 319804 586912 23376 124469128    0    0 16808   158 1807 19181  8  1 89  2  0
 2  0 319804 576304 23368 124479416    0    0 16840     5 1764 19136  8  1 90  1  0
 0  1 319804 590480 23364 124459888    0    0  1488   130 3449 13844  2  1 93  3  0
 0  1 319804 589476 23364 124460912    0    0  1456   115 3328 11800  2  1 94  4  0
 1  0 319804 588468 23364 124461944    0    0  1376   146 3156 11770  2  1 95  2  0
 1  1 319804 587836 23364 124465024    0    0  1576   133 3599 14797  3  1 94  3  0

While it was running:

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 2  1 429080 886244 23308 111242464    0    0 25684    38 2920 18847  7  3 85  5  0
 2  1 429080 798172 23308 111297976    0    0 40024    26 1342 16967 13  2 82  4  0
 2  1 429080 707708 23308 111357600    0    0 42520    34 1588 19148 13  2 81  4  0
 0  5 429080 620700 23308 111414144    0    0 40272 73863 1434 18077 12  2 80  6  0
 1  5 429080 605616 23308 111425448    0    0  6920 131232  729  5187  3  1 66 31  0
 0  6 429080 582852 23316 111442912    0    0 10840 131248  665  4987  3  1 66 30  0
 2  4 429080 584976 23308 111433672    0    0  9776 139416  693  7890  4  1 66 29  0
 0  5 429080 575752 23308 111436752    0    0 10776 131217  647  6157  3  1 66 30  0
 1  3 429080 583768 23308 111420304    0    0 13616 90352 1043 13047  6  1 68 25  0
 4  0 429080 578888 23300 111397696    0    0 40000    44 1347 25329 12  2 79  6  0
 2  1 429080 582368 23292 111367896    0    0 40320    76 1517 28628 13  2 80  5  0
 2  0 429080 584960 23276 111338096    0    0 40240   163 1374 26988 13  2 80  5  0
 6  0 429080 576176 23268 111319600    0    0 40328   170 1465 27229 13  2 80  5  0
 4  0 429080 583212 23212 111288816    0    0 39568   138 1418 27296 13  2 80  5  0

This box has 16 CPUs, so the jump from 3% user CPU to 13% with an increase of I/O wait from 3% to 5% suggests that pg_dump was primarily CPU bound in user code before the crash. I can leave this database around for a while if there are other things you would like me to try. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I'm afraid pg_dump didn't get very far with this before: > pg_dump: WARNING: out of shared memory > pg_dump: SQL command failed > Given how fast it happened, I suspect that it was 2672 tables into > the dump, versus 26% of the way through 5.5 million tables. Yeah, I didn't think about that. You'd have to bump max_locks_per_transaction up awfully far to get to where pg_dump could dump millions of tables, because it wants to lock each one. It might be better to try a test case with lighter-weight objects, say 5 million simple functions. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > It might be better to try a test case with lighter-weight objects, > say 5 million simple functions. So the current database is expendable? I'd just as soon delete it before creating the other one, if you're fairly confident the other one will do it. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It might be better to try a test case with lighter-weight objects, >> say 5 million simple functions. > So the current database is expendable? Yeah, I think it was a bad experimental design anyway... regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > It might be better to try a test case with lighter-weight objects, > say 5 million simple functions. A dump of that quickly settled into running a series of these:

SELECT proretset, prosrc, probin, pg_catalog.pg_get_function_arguments(oid) AS funcargs, pg_catalog.pg_get_function_identity_arguments(oid) AS funciargs, pg_catalog.pg_get_function_result(oid) AS funcresult, proiswindow, provolatile, proisstrict, prosecdef, proconfig, procost, prorows, (SELECT lanname FROM pg_catalog.pg_language WHERE oid = prolang) AS lanname FROM pg_catalog.pg_proc WHERE oid = '1404528'::pg_catalog.oid

(with different oid values, of course). Is this before or after the point you were worried about? Anything in particular for which I should be alert? -Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote: > It might be better to try a test case with lighter-weight objects, > say 5 million simple functions. Said dump ran in about 45 minutes with no obvious stalls or problems. The 2.2 GB database dumped to a 1.1 GB text file, which was a little bit of a surprise. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It might be better to try a test case with lighter-weight objects, >> say 5 million simple functions. > Said dump ran in about 45 minutes with no obvious stalls or > problems. The 2.2 GB database dumped to a 1.1 GB text file, which > was a little bit of a surprise. Did you happen to notice anything about pg_dump's memory consumption? For an all-DDL case like this, I'd sort of expect the memory usage to be comparable to the output file size. Anyway this seems to suggest that we don't have any huge problem with large numbers of archive TOC objects, so the next step probably is to look at how big a code change it would be to switch over to TOC-per-blob. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Did you happen to notice anything about pg_dump's memory > consumption? Not directly, but I was running 'vmstat 1' throughout. Cache space dropped about 2.1 GB while it was running and popped back up to the previous level at the end. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Did you happen to notice anything about pg_dump's memory >> consumption? > > Not directly, but I was running 'vmstat 1' throughout. Cache > space dropped about 2.1 GB while it was running and popped back up > to the previous level at the end. I took a closer look, and there's some bad news, I think. The above numbers were from the ends of the range. I've gone back over and found that while it dropped about 2.1 GB almost immediately, cache usage slowly dropped throughout the dump, and bottomed at about 6.9 GB below baseline. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Did you happen to notice anything about pg_dump's memory >>> consumption? > I took a closer look, and there's some bad news, I think. The above > numbers were from the ends of the range. I've gone back over and > found that while it dropped about 2.1 GB almost immediately, cache > usage slowly dropped throughout the dump, and bottomed at about 6.9 > GB below baseline. OK, that's still not very scary --- it just means my off-the-cuff estimate of 1:1 space usage was bad. 3:1 isn't that surprising either given padding and other issues. The representation of ArchiveEntries could probably be made a bit more compact ... regards, tom lane
The attached patch uses one TOC entry for each blob object. It adds two new section types.

* "BLOB ITEM"
This section provides the properties of a certain large object. It contains a query to create an empty large object, and to restore the ownership of the large object, if necessary.

| --
| -- Name: 16406; Type: BLOB ITEM; Schema: -; Owner: ymj
| --
|
| SELECT lo_create(16406);
|
| ALTER LARGE OBJECT "16406" OWNER TO ymj;

The comment descriptions were moved to the COMMENT section, like any other object class.

| --
| -- Name: LARGE OBJECT 16406; Type: COMMENT; Schema: -; Owner: ymj
| --
|
| COMMENT ON LARGE OBJECT 16406 IS 'This is a small large object.';

Also, access privileges were moved to the ACL section.

| --
| -- Name: 16405; Type: ACL; Schema: -; Owner: kaigai
| --
|
| REVOKE ALL ON LARGE OBJECT 16405 FROM PUBLIC;
| REVOKE ALL ON LARGE OBJECT 16405 FROM kaigai;
| GRANT ALL ON LARGE OBJECT 16405 TO kaigai;
| GRANT ALL ON LARGE OBJECT 16405 TO PUBLIC;

* "BLOB DATA"
This section is the same as the existing "BLOBS" section, except that _LoadBlobs() does not create a new large object before opening it with INV_WRITE, and lo_truncate() will be used instead of lo_unlink() when --clean is given.

The legacy sections ("BLOBS" and "BLOB COMMENTS") can still be read for compatibility, but the newer pg_dump never creates these sections.

Internally, getBlobs() scans all the blobs and makes a DO_BLOB_ITEM entry for each blob, and one DO_BLOB_DATA entry if the database contains at least one large object. _PrintTocData() handles both the "BLOBS" and "BLOB DATA" sections. If the given section is "BLOB DATA", it calls _LoadBlobs() of the specified format with compat = false, because this section is new style. In this case, _LoadBlobs() does not create a large object before opening it with INV_WRITE, because the "BLOB ITEM" section has already created an empty large object. And DropBlobIfExists() was renamed to CleanupBlobIfExists(), because it was modified to apply lo_truncate() when restoring a "BLOB DATA" section. When --clean is given, SELECT lo_unlink(xxx) is injected at the head of the restore queries, instead of in the middle of loading blobs.

One remaining issue is how to decide whether blobs are to be dumped, or not. Right now, --schema-only eliminates all the blob dumps. However, I think it should follow the manner of the other object classes:

-a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM"
-s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA"
-b, --blobs ... both "BLOB ITEM" and "BLOB DATA", independently from --data-only and --schema-only?

Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > The attached patch uses one TOC entry for each blob objects. When I'm testing the new patch, I found "ALTER LARGE OBJECT" command returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT" instead? As I remember, we had decided not to use LARGEOBJECT (without a space) in user-visible messages, right? Regards, --- Takahiro Itagaki NTT Open Source Software Center
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes: > When I'm testing the new patch, I found "ALTER LARGE OBJECT" command > returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT" > instead? As I remember, we had decided not to use LARGEOBJECT > (without a space) in user-visible messages, right? The command tag should match the actual command. If the command name is "ALTER LARGE OBJECT", the command tag should be too. This is independent of phraseology we might choose in error messages (though I agree I don't like "largeobject" in those either). regards, tom lane
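A quick way to see the command tag the server actually sends back is libpq's PQcmdStatus(). A minimal sketch, where the OID and role are just the examples used earlier in the thread and must already exist:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=test");	/* assumed connection */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	res = PQexec(conn, "ALTER LARGE OBJECT 16406 OWNER TO ymj;");
	if (PQresultStatus(res) == PGRES_COMMAND_OK)
		printf("command tag: %s\n", PQcmdStatus(res));	/* expected: ALTER LARGE OBJECT */
	else
		fprintf(stderr, "%s", PQerrorMessage(conn));

	PQclear(res);
	PQfinish(conn);
	return 0;
}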
(2010/01/28 18:21), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: > >> The attached patch uses one TOC entry for each blob objects. > > When I'm testing the new patch, I found "ALTER LARGE OBJECT" command > returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT" > instead? As I remember, we had decided not to use LARGEOBJECT > (without a space) in user-visible messages, right? Sorry, I forgot to fix this tag when it was pointed out that LARGEOBJECT should be LARGE (space) OBJECT. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > > When I'm testing the new patch, I found "ALTER LARGE OBJECT" command > > returns "ALTER LARGEOBJECT" tag. Should it be "ALTER LARGE(space)OBJECT" > > instead? > > Sorry, I left for fix this tag when I was pointed out LARGEOBJECT should > be LARGE(space)OBJECT. Committed. Thanks. Regards, --- Takahiro Itagaki NTT Open Source Software Center
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > The attached patch uses one TOC entry for each blob objects. This patch does not only fix the existing bugs, but also refactors the dump format of large objects in pg_dump. The new format is more similar to the format used for tables:

Section      <Tables>       <New LO>       <Old LO>
-----------------------------------------------------
Schema       "TABLE"        "BLOB ITEM"    "BLOBS"
Data         "TABLE DATA"   "BLOB DATA"    "BLOBS"
Comments     "COMMENT"      "COMMENT"      "BLOB COMMENTS"

We will allocate BlobInfo in memory for each large object. It might consume much more memory compared with former versions if we have many large objects, but we discussed it is an acceptable change. As far as I read, the patch is almost ready to commit except the following issue about backward compatibility: > * "BLOB DATA" > This section is same as existing "BLOBS" section, except for _LoadBlobs() > does not create a new large object before opening it with INV_WRITE, and > lo_truncate() will be used instead of lo_unlink() when --clean is given. > The legacy sections ("BLOBS" and "BLOB COMMENTS") are available to read > for compatibility, but newer pg_dump never create these sections. I wonder whether we need to support older versions in pg_restore. You add a check "PQserverVersion >= 80500" in CleanupBlobIfExists(), but our documentation says we cannot use pg_restore 9.0 for 8.4 or older servers: http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html | it is not guaranteed that pg_dump's output can be loaded into | a server of an older major version Can we remove such path and raise an error instead? Also, even if we support the older servers in the routine, the new bytea format will be another problem anyway. > One remained issue is the way to decide whether blobs to be dumped, or not. > Right now, --schema-only eliminate all the blob dumps. > However, I think it should follow the manner in any other object classes. > > -a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM" > -s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA" > -b, --blobs ... both of "BLOB ITEM" and "BLOB DATA" independently > from --data-only and --schema-only? I cannot imagine situations that require data-only dumps -- for example, restoring database has a filled pg_largeobject_metadata and an empty or broken pg_largeobject -- but it seems to be unnatural. I'd prefer to keep the existing behavior:

* default or data-only : dump all attributes and data of blobs
* schema-only : don't dump any blobs

and have independent options to control blob dumps:

* -b, --blobs : dump all blobs even if schema-only
* -B, --no-blobs: [NEW] don't dump any blobs even if default or data-only

Regards, --- Takahiro Itagaki NTT Open Source Software Center
(2010/02/01 14:19), Takahiro Itagaki wrote: > As far as I read, the patch is almost ready to commit > except the following issue about backward compatibility: > >> * "BLOB DATA" >> This section is same as existing "BLOBS" section, except for _LoadBlobs() >> does not create a new large object before opening it with INV_WRITE, and >> lo_truncate() will be used instead of lo_unlink() when --clean is given. > >> The legacy sections ("BLOBS" and "BLOB COMMENTS") are available to read >> for compatibility, but newer pg_dump never create these sections. > > I wonder whether we need to support older versions in pg_restore. You add a check > "PQserverVersion >= 80500" in CleanupBlobIfExists(), but our documentation > says we cannot use pg_restore 9.0 for 8.4 or older servers: > > http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html > | it is not guaranteed that pg_dump's output can be loaded into > | a server of an older major version > > Can we remove such path and raise an error instead? > Also, even if we support the older servers in the routine, > the new bytea format will be another problem anyway. OK, I'll fix it. >> One remained issue is the way to decide whether blobs to be dumped, or not. >> Right now, --schema-only eliminate all the blob dumps. >> However, I think it should follow the manner in any other object classes. >> >> -a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM" >> -s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA" >> -b, --blobs ... both of "BLOB ITEM" and "BLOB DATA" independently >> from --data-only and --schema-only? > > I cannot imagine situations that require data-only dumps -- for example, > restoring database has a filled pg_largeobject_metadata and an empty > or broken pg_largeobject -- but it seems to be unnatural. Indeed, it might not be a sane situation. However, we can assume a situation where a user wants to back up a database into two separate files (one for the schema definitions; one for the data contents). Just after restoring the schema definitions, all the large objects are created as empty blobs. Then, we can restore their data contents. I wonder if the behavior is easily understandable for end users. The "BLOB ITEM" section contains the properties of a certain large object, such as its identifier (loid), comment, ownership and access privileges. These are categorized as schema definitions in other object classes, but we still need special treatment for blobs. The --schema-only with large objects might be unnatural, but the --data-only with properties of large objects are also unnatural. Which behavior is more unnatural? Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > > Can we remove such path and raise an error instead? > > Also, even if we support the older servers in the routine, > > the new bytea format will be another problem anyway. > > OK, I'll fix it. I think we might need to discuss about explicit version checks in pg_restore. It is not related with large objects, but with pg_restore's capability. We've not supported to restore a dump to older servers, but we don't have any version checks, right? Should we do the checks at the beginning of restoring? If we do so, LO patch could be more simplified. > The --schema-only with large objects might be unnatural, but the > --data-only with properties of large objects are also unnatural. > Which behavior is more unnatural? I think large object metadata is a kind of row-based access controls. How do we dump and restore ACLs per rows when we support them for normal tables? We should treat LO metadata as same as row-based ACLs. In my opinion, I'd like to treat them as part of data (not of schema). Regards, --- Takahiro Itagaki NTT Open Source Software Center
(2010/02/02 9:33), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: > >>> Can we remove such path and raise an error instead? >>> Also, even if we support the older servers in the routine, >>> the new bytea format will be another problem anyway. >> >> OK, I'll fix it. > > I think we might need to discuss about explicit version checks in pg_restore. > It is not related with large objects, but with pg_restore's capability. > We've not supported to restore a dump to older servers, but we don't have any > version checks, right? Should we do the checks at the beginning of restoring? > If we do so, LO patch could be more simplified. I agree it is a good idea. >> The --schema-only with large objects might be unnatural, but the >> --data-only with properties of large objects are also unnatural. >> Which behavior is more unnatural? > > I think large object metadata is a kind of row-based access controls. > How do we dump and restore ACLs per rows when we support them for > normal tables? We should treat LO metadata as same as row-based ACLs. > In my opinion, I'd like to treat them as part of data (not of schema). OK, I'll update the patch according to the behavior you suggested. | I'd prefer to keep the existing behavior: | * default or data-only : dump all attributes and data of blobs | * schema-only : don't dump any blobs | and have independent options to control blob dumps: | * -b, --blobs : dump all blobs even if schema-only | * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only Please wait for a while. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
>>> The --schema-only with large objects might be unnatural, but the >>> --data-only with properties of large objects are also unnatural. >>> Which behavior is more unnatural? >> >> I think large object metadata is a kind of row-based access controls. >> How do we dump and restore ACLs per rows when we support them for >> normal tables? We should treat LO metadata as same as row-based ACLs. >> In my opinion, I'd like to treat them as part of data (not of schema). > > OK, I'll update the patch according to the behavior you suggested. > | I'd prefer to keep the existing behavior: > | * default or data-only : dump all attributes and data of blobs > | * schema-only : don't dump any blobs > | and have independent options to control blob dumps: > | * -b, --blobs : dump all blobs even if schema-only > | * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only I found out that special treatment is needed to dump the comments and access privileges of blobs. See dumpACL() and dumpComment():

| static void
| dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
|         const char *type, const char *name, const char *subname,
|         const char *tag, const char *nspname, const char *owner,
|         const char *acls)
| {
|     PQExpBuffer sql;
|
|     /* Do nothing if ACL dump is not enabled */
|     if (dataOnly || aclsSkip)
|         return;
|     :
| static void
| dumpComment(Archive *fout, const char *target,
|             const char *namespace, const char *owner,
|             CatalogId catalogId, int subid, DumpId dumpId)
| {
|     CommentItem *comments;
|     int          ncomments;
|
|     /* Comments are SCHEMA not data */
|     if (dataOnly)
|         return;
|     :

In addition, _printTocEntry() is not called with acl_pass = true when --data-only is given. I again wonder whether we are on the right direction. Originally, the reason why we decide to use per blob toc entry was that "BLOB ACLS" entry needs a few exceptional treatments in the code. But, if we deal with "BLOB ITEM" entry as data contents, it will also need additional exceptional treatments. Indeed, even if we had row-level ACLs, they would be dumped in the data section without being separated into properties and data contents, but that would be because of an implementation restriction, not a data-modeling reason. A pile of empty large objects may not be a sane situation, but at least it fits the existing manner of pg_dump/pg_restore. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/2/1 KaiGai Kohei <kaigai@ak.jp.nec.com>: > I again wonder whether we are on the right direction. I believe the proposed approach is to dump blob metadata if and only if you are also dumping blob contents, and to do all of this for data dumps but not schema dumps. That seems about right to me. > Originally, the reason why we decide to use per blob toc entry was > that "BLOB ACLS" entry needs a few exceptional treatments in the code. > But, if we deal with "BLOB ITEM" entry as data contents, it will also > need additional exceptional treatments. But the new ones are less objectionable, maybe. ...Robert
(2010/02/04 0:20), Robert Haas wrote: > 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >> I again wonder whether we are on the right direction. > > I believe the proposed approach is to dump blob metadata if and only > if you are also dumping blob contents, and to do all of this for data > dumps but not schema dumps. That seems about right to me. In other words:

<default>     -> blob contents and metadata (owner, acl, comments) shall be dumped
--data-only   -> only blob contents shall be dumped
--schema-only -> neither blob contents and metadata are dumped.

Can I understand correctly? >> Originally, the reason why we decide to use per blob toc entry was >> that "BLOB ACLS" entry needs a few exceptional treatments in the code. >> But, if we deal with "BLOB ITEM" entry as data contents, it will also >> need additional exceptional treatments. > > But the new ones are less objectionable, maybe. > > ...Robert > -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/02/04 17:30), KaiGai Kohei wrote: > (2010/02/04 0:20), Robert Haas wrote: >> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> I again wonder whether we are on the right direction. >> >> I believe the proposed approach is to dump blob metadata if and only >> if you are also dumping blob contents, and to do all of this for data >> dumps but not schema dumps. That seems about right to me. > > In other words: > > <default> -> blob contents and metadata (owner, acl, comments) shall > be dumped > --data-only -> only blob contents shall be dumped > --schema-only -> neither blob contents and metadata are dumped. > > Can I understand correctly? The attached patch makes pg_dump skip the "BLOB ITEM" sections and the corresponding metadata when --data-only is specified. In addition, it outputs neither "BLOB DATA" nor "BLOB ITEM" sections when --schema-only is specified. When --data-only is given to pg_dump, it does not construct any DO_BLOB_ITEM entries in getBlobs(), so none of the metadata (owner, acls, comment) is dumped. And it writes the legacy "BLOBS" section instead of the new "BLOB DATA" section, to inform pg_restore that this archive does not create large objects in a "BLOB ITEM" section. If --schema-only is given, getBlobs() is simply skipped. When --data-only is given to pg_restore, it skips all the "BLOB ITEM" sections. Large objects are created in _LoadBlobs() instead of in that section, as we have done until now. _LoadBlobs() takes a third argument which specifies whether we should create the large object here, or not. Its condition is a bit modified from the previous patch:

if (strcmp(te->desc, "BLOBS") == 0 || ropt->dataOnly)
                                   ^^^^^^^^^^^^^^^^^
    _LoadBlobs(AH, ropt, true);
else if (strcmp(te->desc, "BLOB DATA") == 0)
    _LoadBlobs(AH, ropt, false);

When --data-only is given to pg_restore, the "BLOB ITEM" section is skipped, so we need to create large objects at the _LoadBlobs() stage, even if the archive has a "BLOB DATA" section. In addition, --schema-only kills all the "BLOB ITEM" sections using a special condition that was added to _tocEntryRequired(). It might be a bit different from what Itagaki-san suggested, because the "BLOB ITEM" section is still in the SECTION_PRE_DATA section. However, it minimizes the special treatments in the code, and there is no difference from the viewpoint of end users. Or, is it necessary to pack them into the SECTION_DATA section anyway? Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
2010/2/4 KaiGai Kohei <kaigai@ak.jp.nec.com>: > (2010/02/04 0:20), Robert Haas wrote: >> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> I again wonder whether we are on the right direction. >> >> I believe the proposed approach is to dump blob metadata if and only >> if you are also dumping blob contents, and to do all of this for data >> dumps but not schema dumps. That seems about right to me. > > In other words: > > <default> -> blob contents and metadata (owner, acl, comments) shall > be dumped > --data-only -> only blob contents shall be dumped > --schema-only -> neither blob contents and metadata are dumped. > > Can I understand correctly? No, that's not what I said. Please reread. I don't think you should ever dump blob contents without the metadata, or the other way around. ...Robert
Robert Haas escribió: > 2010/2/4 KaiGai Kohei <kaigai@ak.jp.nec.com>: > > (2010/02/04 0:20), Robert Haas wrote: > >> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: > >>> I again wonder whether we are on the right direction. > >> > >> I believe the proposed approach is to dump blob metadata if and only > >> if you are also dumping blob contents, and to do all of this for data > >> dumps but not schema dumps. That seems about right to me. > > > > In other words: > > > > <default> -> blob contents and metadata (owner, acl, comments) shall > > be dumped > > --data-only -> only blob contents shall be dumped > > --schema-only -> neither blob contents and metadata are dumped. > > > > Can I understand correctly? > > No, that's not what I said. Please reread. I don't think you should > ever dump blob contents without the metadata, or the other way around. So:

default: both contents and metadata
--data-only: same
--schema-only: neither

Seems reasonable. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
(2010/02/05 3:27), Alvaro Herrera wrote: > Robert Haas escribió: >> 2010/2/4 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>> (2010/02/04 0:20), Robert Haas wrote: >>>> 2010/2/1 KaiGai Kohei<kaigai@ak.jp.nec.com>: >>>>> I again wonder whether we are on the right direction. >>>> >>>> I believe the proposed approach is to dump blob metadata if and only >>>> if you are also dumping blob contents, and to do all of this for data >>>> dumps but not schema dumps. That seems about right to me. >>> >>> In other words: >>> >>> <default> -> blob contents and metadata (owner, acl, comments) shall >>> be dumped >>> --data-only -> only blob contents shall be dumped >>> --schema-only -> neither blob contents and metadata are dumped. >>> >>> Can I understand correctly? >> >> No, that's not what I said. Please reread. I don't think you should >> ever dump blob contents without the metadata, or the other way around. > > So: > default: both contents and metadata > --data-only: same > --schema-only: neither > > Seems reasonable. OK... I'll try to update the patch, anyway. However, it means only large object performs an exceptional object class that dumps its owner, acl and comment even if --data-only is given. Is it really what you suggested, isn't it? Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai Kohei <kaigai@kaigai.gr.jp> wrote: > > default: both contents and metadata > > --data-only: same > > --schema-only: neither > > However, it means only large object performs an exceptional object class > that dumps its owner, acl and comment even if --data-only is given. > Is it really what you suggested, isn't it? I wonder we still need to have both "BLOB ITEM" and "BLOB DATA" even if we will take the all-or-nothing behavior. Can we handle BLOB's owner, acl, comment and data with one entry kind? Regards, --- Takahiro Itagaki NTT Open Source Software Center
(2010/02/05 13:53), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@kaigai.gr.jp> wrote: > >>> default: both contents and metadata >>> --data-only: same >>> --schema-only: neither >> >> However, it means only large object performs an exceptional object class >> that dumps its owner, acl and comment even if --data-only is given. >> Is it really what you suggested, isn't it? > > I wonder we still need to have both "BLOB ITEM" and "BLOB DATA" > even if we will take the all-or-nothing behavior. Can we handle > BLOB's owner, acl, comment and data with one entry kind? Is it possible to fetch a certain blob from a tar/custom archive when pg_restore finds a TOC entry for the blob? Currently, when pg_restore finds a "BLOB DATA" or "BLOBS" entry, it opens the archive and restores all the blob objects sequentially. It seems to me we also have to rework the custom format... Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/02/05 13:53), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@kaigai.gr.jp> wrote: > >>> default: both contents and metadata >>> --data-only: same >>> --schema-only: neither >> >> However, it means only large object performs an exceptional object class >> that dumps its owner, acl and comment even if --data-only is given. >> Is it really what you suggested, isn't it? > > I wonder we still need to have both "BLOB ITEM" and "BLOB DATA" > even if we will take the all-or-nothing behavior. Can we handle > BLOB's owner, acl, comment and data with one entry kind? I looked at the corresponding code. Currently, we have three _LoadBlobs() variations, in pg_backup_tar.c, pg_backup_files.c and pg_backup_custom.c. In the _tar.c and _files.c cases, we can reasonably fetch the data contents of the blob to be restored. All we need to do is to provide an explicit filename to the tarOpen() function, and a blob does not have to be restored sequentially. It means pg_restore can restore an arbitrary file when it finds a new unified blob entry. In the _custom.c case, its _LoadBlobs() is called from _PrintTocData() when the given TocEntry is "BLOBS", and it tries to load the following multiple blobs. However, I could not find any restriction that the custom format cannot have multiple "BLOBS" sections. In other words, we can write out multiple such sections, one blob each, for every new unified blob entry. Right now, it seems to me that it is feasible to implement what you suggested. The question is whether we should do it, or not. At least, it seems to me better than having a set of exceptional treatments in pg_dump and pg_restore, from a design perspective. What is your opinion? Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/02/05 13:53), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@kaigai.gr.jp> wrote: > >>> default: both contents and metadata >>> --data-only: same >>> --schema-only: neither >> >> However, it means only large object performs an exceptional object class >> that dumps its owner, acl and comment even if --data-only is given. >> Is it really what you suggested, isn't it? > > I wonder we still need to have both "BLOB ITEM" and "BLOB DATA" > even if we will take the all-or-nothing behavior. Can we handle > BLOB's owner, acl, comment and data with one entry kind? The attached patch is refactored according to the suggestion. By default or with --data-only, it dumps the data contents of large objects and their properties (owner, comment and access privileges), but it dumps nothing when --schema-only is given.

default: both contents and metadata
--data-only: same
--schema-only: neither

It replaces the existing "BLOBS" and "BLOB COMMENTS" sections with a new "LARGE OBJECT" section, which is associated with a certain large object. Its section header contains the OID of the large object to be restored, so pg_restore tries to load the specified large object from the given archive. The _PrintTocData() handlers were modified to support the "LARGE OBJECT" section, which loads only the specified large object, not the whole set of archived ones like "BLOBS". They also support reading the "BLOBS" and "BLOB COMMENTS" sections, but these legacy sections are never written out any more. The archive file will never contain "blobs.toc", because we can find the OID of the large object to be restored in the section header, without any special-purpose files. It also allows the _StartBlobs() and _EndBlobs() methods to be omitted in the tar and file formats. Basically, I like this approach more than the previous combination of "BLOB ITEM" and "BLOB DATA". However, we have a known issue here. The ACL section is categorized as REQ_SCHEMA in _tocEntryRequired(), so we cannot dump ACLs when --data-only is given, even if the large object itself is dumped out. Of course, we can solve it by adding a few more exceptional treatments, although that is not graceful. However, it seems to me the matter comes from the fact that _tocEntryRequired() can only return a mask of REQ_SCHEMA and REQ_DATA. Right now, it is not easy to categorize an ACL/COMMENT section into either of them. I think we should consider REQ_ACL and REQ_COMMENT bits to inform the caller whether the given section is to be dumped now, or not. Any idea? Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
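A minimal sketch of the REQ_ACL/REQ_COMMENT idea mentioned above. REQ_SCHEMA and REQ_DATA exist in pg_backup_archiver.c, but the extra bits, their values and the decision logic here are all hypothetical illustrations, not the actual implementation:

#include <stdio.h>
#include <stdbool.h>

#define REQ_SCHEMA   0x01
#define REQ_DATA     0x02
#define REQ_ACL      0x04	/* hypothetical new bit */
#define REQ_COMMENT  0x08	/* hypothetical new bit */

/* Sketch of a _tocEntryRequired()-style decision: return a bitmask so the
 * caller can test separately whether the schema, data, ACLs and comments of
 * an entry should be restored under the current options. */
static int
toc_entry_required_sketch(bool dataOnly, bool schemaOnly, bool isLargeObject)
{
	int			res = 0;

	if (!dataOnly)
		res |= REQ_SCHEMA | REQ_ACL | REQ_COMMENT;
	if (!schemaOnly)
		res |= REQ_DATA;

	/* large objects carry their metadata along with their data */
	if (isLargeObject && !schemaOnly)
		res |= REQ_ACL | REQ_COMMENT;

	return res;
}

int
main(void)
{
	int			req = toc_entry_required_sketch(true, false, true);

	printf("--data-only large object: ACL %s, COMMENT %s\n",
		   (req & REQ_ACL) ? "yes" : "no",
		   (req & REQ_COMMENT) ? "yes" : "no");
	return 0;
}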
Takahiro Itagaki escribió: > > KaiGai Kohei <kaigai@kaigai.gr.jp> wrote: > > > > default: both contents and metadata > > > --data-only: same > > > --schema-only: neither > > > > However, it means only large object performs an exceptional object class > > that dumps its owner, acl and comment even if --data-only is given. > > Is it really what you suggested, isn't it? > > I wonder we still need to have both "BLOB ITEM" and "BLOB DATA" > even if we will take the all-or-nothing behavior. Can we handle > BLOB's owner, acl, comment and data with one entry kind? I don't think this is necessarily a good idea. We might decide to treat both things separately in the future and it having them represented separately in the dump would prove useful. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
(2010/02/08 22:23), Alvaro Herrera wrote: > Takahiro Itagaki escribió: >> >> KaiGai Kohei<kaigai@kaigai.gr.jp> wrote: >> >>>> default: both contents and metadata >>>> --data-only: same >>>> --schema-only: neither >>> >>> However, it means only large object performs an exceptional object class >>> that dumps its owner, acl and comment even if --data-only is given. >>> Is it really what you suggested, isn't it? >> >> I wonder we still need to have both "BLOB ITEM" and "BLOB DATA" >> even if we will take the all-or-nothing behavior. Can we handle >> BLOB's owner, acl, comment and data with one entry kind? > > I don't think this is necessarily a good idea. We might decide to treat > both things separately in the future and it having them represented > separately in the dump would prove useful. I agree. From design perspective, the single section approach is more simple than dual section, but its change set is larger than the dual. The attached patch revises the previous implementation, which had two types of sections, to handle the options correctly, as follows:

* default: both contents and metadata
* --data-only: same
* --schema-only: neither

Below are the points introduced. _tocEntryRequired() decides whether the given TocEntry is to be dumped here, or not, based on the given context. Previously, none of the sections which need cleanup code and access privileges belonged to SECTION_DATA, so data sections were ignored even when their cleanup code and access privileges needed to be restored. At the pg_backup_archiver.c:329 chunk, it checks whether we need to clean up the existing object specified by the TocEntry. If the entry is "BLOB ITEM", _tocEntryRequired() returns REQ_DATA (if --data-only is given), and then it does not write out the cleanup code. (We have to unlink the existing large objects to be restored prior to creating them, so cleanup is no longer possible at _StartBlob().) At the pg_backup_archiver.c:381 chunk, it checks whether we need to restore access privileges, or not, using the given "ACL" TocEntry. In a similar way, the caller does not expect access privileges to be restored when --data-only is given. _tocEntryRequired() was also modified to handle large objects correctly. Previously, when a TocEntry did not have its own dumper (except for the "SEQUENCE SET" section), it was handled as SECTION_SCHEMA. If the 16th argument of ArchiveEntry() was NULL, the entry does not have its own dumper function, even if the section is SECTION_DATA. Also, dumpBlobItem() calls ArchiveEntry() without a dumper, so it was misidentified as a schema section. The "ACL" sections of large objects were also misidentified. So, I had to add these special treatments. dumpACL() is a utility function to write out the GRANT/REVOKE statements for the given ACL string. When --data-only is given, it returns immediately without doing any work. This prevented dumping the access privileges of large objects. However, all the callers already check the "if (dataOnly)" case prior to invoking it. So, I removed this check from dumpACL(). Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > > I don't think this is necessarily a good idea. We might decide to treat > > both things separately in the future and it having them represented > > separately in the dump would prove useful. > > I agree. From design perspective, the single section approach is more > simple than dual section, but its change set is larger than the dual. OK. When I tested a custom dump with pg_restore, --clean & --single-transaction will fail with the new dump format because it always call lo_unlink() even if the large object doesn't exist. It comes from dumpBlobItem:

! dumpBlobItem(Archive *AH, BlobInfo *binfo)
! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name);

The query in DropBlobIfExists() could avoid errors -- should we use it here?

| SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s;

BTW, --clean option is ambiguous if combined with --data-only. Restoring large objects fails for the above reason if previous objects don't exist, but table data are restored *without* truncation of existing data. Will normal users expect TRUNCATE-before-load for --clean & --data-only cases?

Present behaviors are;
 Table data - Appended. (--clean is ignored)
 Large objects - End with an error if object doesn't exist.
IMO, ideal behaviors are:
 Table data - Truncate existing data and load new ones.
 Large objects - Work like as MERGE (or REPLACE, UPSERT).

Comments? Regards, --- Takahiro Itagaki NTT Open Source Software Center
(2010/02/09 20:16), Takahiro Itagaki wrote: > > KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: > >>> I don't think this is necessarily a good idea. We might decide to treat >>> both things separately in the future and it having them represented >>> separately in the dump would prove useful. >> >> I agree. From design perspective, the single section approach is more >> simple than dual section, but its change set is larger than the dual. > > OK. > > > When I tested a custom dump with pg_restore, --clean& --single-transaction > will fail with the new dump format because it always call lo_unlink() > even if the large object doesn't exist. It comes from dumpBlobItem: > > ! dumpBlobItem(Archive *AH, BlobInfo *binfo) > ! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name); > > The query in DropBlobIfExists() could avoid errors -- should we use it here? > | SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s; Yes, we can use this query to handle --clean option. I'll fix it soon. > BTW, --clean option is ambiguous if combined with --data-only. Restoring > large objects fails for the above reason if previous objects don't exist, > but table data are restored *without* truncation of existing data. Will > normal users expect TRUNCATE-before-load for --clean& --data-only cases? > > Present behaviors are; > Table data - Appended. (--clean is ignored) > Large objects - End with an error if object doesn't exist. > IMO, ideal behaviors are: > Table data - Truncate existing data and load new ones. > Large objects - Work like as MERGE (or REPLACE, UPSERT). > > Comments? In the existing "BLOBS" section, it creates and restores large objects in same time. And, it also unlink existing large object (if exists) just before restoring them, when --clean is given. In my opinion, when --clean is given, it also should truncate the table before restoring, even if --data-only is co-given. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
(2010/02/09 21:18), KaiGai Kohei wrote: > (2010/02/09 20:16), Takahiro Itagaki wrote: >> >> KaiGai Kohei<kaigai@ak.jp.nec.com> wrote: >> >>>> I don't think this is necessarily a good idea. We might decide to treat >>>> both things separately in the future and it having them represented >>>> separately in the dump would prove useful. >>> >>> I agree. From design perspective, the single section approach is more >>> simple than dual section, but its change set is larger than the dual. >> >> OK. >> >> >> When I tested a custom dump with pg_restore, --clean& >> --single-transaction >> will fail with the new dump format because it always call lo_unlink() >> even if the large object doesn't exist. It comes from dumpBlobItem: >> >> ! dumpBlobItem(Archive *AH, BlobInfo *binfo) >> ! appendPQExpBuffer(dquery, "SELECT lo_unlink(%s);\n", binfo->dobj.name); >> >> The query in DropBlobIfExists() could avoid errors -- should we use it >> here? >> | SELECT lo_unlink(oid) FROM pg_largeobject_metadata WHERE oid = %s; > > Yes, we can use this query to handle --clean option. > I'll fix it soon. The attached patch fixed up the cleanup query as follows:

+ appendPQExpBuffer(dquery,
+                   "SELECT pg_catalog.lo_unlink(oid) "
+                   "FROM pg_catalog.pg_largeobject_metadata "
+                   "WHERE oid = %s;\n", binfo->dobj.name);

And, I also noticed that lo_create() was not prefixed by "pg_catalog.", so I also add it. The rest of the patch was not changed. Thanks, >> BTW, --clean option is ambiguous if combined with --data-only. Restoring >> large objects fails for the above reason if previous objects don't exist, >> but table data are restored *without* truncation of existing data. Will >> normal users expect TRUNCATE-before-load for --clean& --data-only cases? >> >> Present behaviors are; >> Table data - Appended. (--clean is ignored) >> Large objects - End with an error if object doesn't exist. >> IMO, ideal behaviors are: >> Table data - Truncate existing data and load new ones. >> Large objects - Work like as MERGE (or REPLACE, UPSERT). >> >> Comments? > > In the existing "BLOBS" section, it creates and restores large objects > in same time. And, it also unlink existing large object (if exists) > just before restoring them, when --clean is given. > > In my opinion, when --clean is given, it also should truncate the table > before restoring, even if --data-only is co-given. > > Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote: > The attached patch fixed up the cleanup query as follows: > + appendPQExpBuffer(dquery, > + "SELECT pg_catalog.lo_unlink(oid) " > + "FROM pg_catalog.pg_largeobject_metadata " > + "WHERE oid = %s;\n", binfo->dobj.name); > > And, I also noticed that lo_create() was not prefixed by "pg_catalog.", > so I also add it. Thanks. Now the patch is ready to commit. Regards, --- Takahiro Itagaki NTT Open Source Software Center