Re: Largeobject Access Controls (r2460) - Mailing list pgsql-hackers

From Takahiro Itagaki
Subject Re: Largeobject Access Controls (r2460)
Date
Msg-id 20100201141915.BA5F.52131E4D@oss.ntt.co.jp
Whole thread Raw
In response to Re: Largeobject Access Controls (r2460)  (KaiGai Kohei <kaigai@ak.jp.nec.com>)
Responses Re: Largeobject Access Controls (r2460)
List pgsql-hackers
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> The attached patch uses one TOC entry for each blob objects.

This patch does not only fix the existing bugs, but also refactor
the dump format of large objects in pg_dump. The new format are
more similar to the format of tables:
Section    <Tables>     <New LO>    <Old LO>
-----------------------------------------------------Schema     "TABLE"      "BLOB ITEM" "BLOBS"Data       "TABLE DATA"
"BLOBDATA" "BLOBS"Comments   "COMMENT"    "COMMENT"   "BLOB COMMENTS"
 

We will allocate BlobInfo in memory for each large object. It might
consume much more memory compared with former versions if we have
many large objects, but we discussed it is an acceptable change.

As far as I read, the patch is almost ready to commit
except the following issue about backward compatibility:

> * "BLOB DATA"
> This section is same as existing "BLOBS" section, except for _LoadBlobs()
> does not create a new large object before opening it with INV_WRITE, and
> lo_truncate() will be used instead of lo_unlink() when --clean is given.

> The legacy sections ("BLOBS" and "BLOB COMMENTS") are available to read
> for compatibility, but newer pg_dump never create these sections.

I wonder we need to support older versions in pg_restore. You add a check
"PQserverVersion >= 80500" in CleanupBlobIfExists(), but out documentation
says we cannot use pg_restore 9.0 for 8.4 or older servers:

http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove such path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.


> One remained issue is the way to decide whether blobs to be dumped, or not.
> Right now, --schema-only eliminate all the blob dumps.
> However, I think it should follow the manner in any other object classes.
> 
>  -a, --data-only    ... only "BLOB DATA" sections, not "BLOB ITEM"
>  -s, --schema-only  ... only "BLOB ITEM" sections, not "BLOB DATA"
>  -b, --blobs        ... both of "BLOB ITEM" and "BLOB DATA" independently
>                         from --data-only and --schema-only?

I cannot image situations that require data-only dumps -- for example,
restoring database has a filled pg_largeobject_metadata and an empty
or broken pg_largeobject -- but it seems to be unnatural.

I'd prefer to keep the existing behavior: * default or data-only : dump all attributes and data of blobs * schema-only
       : don't dump any blobs
 
and have independent options to control blob dumps: * -b, --blobs    : dump all blobs even if schema-only * -B,
--no-blobs: [NEW] don't dump any blobs even if default or data-only
 

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Review: listagg aggregate
Next
From: KaiGai Kohei
Date:
Subject: Re: Largeobject Access Controls (r2460)