Re: Large object dumps vs older pg_restore - Mailing list pgsql-hackers

From KaiGai Kohei
Subject Re: Large object dumps vs older pg_restore
Date
Msg-id 4B7C9EB6.5070503@ak.jp.nec.com
Whole thread Raw
In response to Re: Large object dumps vs older pg_restore  (KaiGai Kohei <kaigai@ak.jp.nec.com>)
List pgsql-hackers
The attached patch is modified to be applied to the latest tree,
and it increments the version number of the archive file.

  kaigai@saba pg_dump]$ ./pg_dump -Ft postgres > ~/hoge.tar

  [kaigai@saba pg_dump]$ /usr/bin/pg_restore --version
  pg_restore (PostgreSQL) 8.4.2
  [kaigai@saba pg_dump]$ /usr/bin/pg_restore -d test ~/hoge.tar
  pg_restore: [archiver] unsupported version (1.12) in file header
 --> unable to restore using older pg_restore

  [kaigai@saba pg_dump]$ ./pg_restore --version
  pg_restore (PostgreSQL) 9.0devel
  [kaigai@saba pg_dump]$ ./pg_restore -d test ~/hoge.tar

  [kaigai@saba pg_dump]$ psql test
  psql (9.0devel)
  Type "help" for help.

  test=# \lo_list
                     Large objects
    ID   | Owner  |          Description
  -------+--------+--------------------------------
   16384 | kaigai |
   16385 | ymj    |
   16386 | tak    | this is a small large object
   16387 | kaigai | this is a small large object 2
  (4 rows)

  test=# select oid, * from pg_largeobject_metadata;
    oid  | lomowner |             lomacl
  -------+----------+---------------------------------
   16387 |       10 |
   16384 |       10 | {kaigai=rw/kaigai,ymj=r/kaigai}
   16385 |    16388 | {ymj=rw/ymj,tak=r/ymj}
   16386 |    16389 | {tak=rw/tak,ymj=r/tak}
  (4 rows)

Thanks,

(2010/02/18 9:12), KaiGai Kohei wrote:
> (2010/02/18 4:54), Tom Lane wrote:
>> I've been looking at the proposed patch for pg_dump with large objects,
>> and I'm afraid it breaks things even worse than they're broken in HEAD.
>>
>> Currently, when pg_restore is processing a BLOBS item, it emits
>> lo_create() followed by lo_write() calls, and conditionally precedes
>> that by lo_unlink() if --clean was specified.  It pretty much has to
>> drive all that off the one archive entry since there are no others
>> (excepting BLOB COMMENTS which is unhelpful since it appears later).
>>
>> The patch wants to emit a separate BLOB item for each blob, upon which
>> we can hang ownership, ACL, and comment information in the same ways
>> that we deal with these things for other sorts of database objects.
>> That's good since it means that switches like --no-owner will work
>> properly.  The trouble is that this means the responsibility to do
>> lo_unlink and lo_create has to be taken out from processing of
>> the BLOBS item.
>>
>> The way the patch does that is that it renames BLOBS to BLOB DATA,
>> and drives the do-we-create decision off the name of the archive
>> entry.  This breaks compatibility of the archive to prior pg_restore
>> versions.  An 8.4 pg_restore will have no idea that it doesn't know
>> how to process the archive, but nonetheless will emit wrong results
>> --- in particular, you get a lo_create() from the BLOB item and then
>> another from BLOB DATA, so the restore fails completely.  There are
>> other bad effects too because 8.4 doesn't really know quite what
>> BLOB DATA is, but it needs to because there are various places that
>> have to treat that specially.
>>
>> Probably the only way we can make this design work is to bump the
>> archive version number so that older pg_restores will fail.  (Whereupon
>> there is no need to rename the entry type BTW.)  This is slightly
>> annoying but it's not like we've not done it multiple times before.
>
> IIRC, Itagaki-san also suggested to abort when we try to restore
> the dumped database image to the older pgsql. So, the checks in
> the DropBlobIfExists() are modified.
> This idea allows to abort restoring at the head of pg_restore.
> It seems to me fair enough.
>
>> If we wanted to keep backwards compatibility, we'd have to leave
>> the lo_create responsibility with the BLOBS item, and have the
>> BLOB metadata items be things that just add ACLs/ownership/comments
>> without doing the actual create, and have to be processed after
>> BLOBS instead of before it.  This is probably workable but it
>> doesn't seem to me that it's accomplishing the goal of making blobs
>> work like normal objects.
>
> It was also an idea that I tried to implement in this approach at first.
> But it is difficult to treat various kind of pg_restore options correctly.
>
> Thanks,


--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Streaming replication on win32, still broken
Next
From: Andrew Dunstan
Date:
Subject: Re: CommitFest Status Summary - 2010-02-14