Re: Large object corruption during 'piped' pg_restore - Mailing list pgsql-general

From: Tom Lane
Subject: Re: Large object corruption during 'piped' pg_restore
Date:
Msg-id: 23205.1295565297@sss.pgh.pa.us
In response to: Re: Large object corruption during 'piped' pg_restore (Bosco Rama <postgres@boscorama.com>)
Responses: Re: Large object corruption during 'piped' pg_restore (Vick Khera <vivek@khera.org>)
           Re: Large object corruption during 'piped' pg_restore (Bosco Rama <postgres@boscorama.com>)
List: pgsql-general
Bosco Rama <postgres@boscorama.com> writes:
>>> If 'standard_conforming_strings = on' is set in our DB (which is required for
>>> our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql)
>>> results in the large objects being corrupted.

> All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 LTS
> with all current updates applied.

I've been able to replicate this in 8.4; it doesn't happen in 9.0
(but probably does in all 8.x versions).

The problem is that pg_dump (or in this case really pg_restore) is
relying on libpq's PQescapeBytea() to format the bytea literal that
will be given as argument to lowrite() during the restore.  When
pg_dump is producing SQL directly, or when pg_restore is connected
to a database, PQescapeBytea() mooches the standard_conforming_strings
value from the active libpq connection and gets the right answer.
In the single case where pg_restore is producing SQL without ever
opening a database connection, PQescapeBytea doesn't know what to do
and defaults to the old non-standard-strings behavior.  Unfortunately
pg_restore set standard_conforming_strings=on earlier in the script
(if it was set in the original source database) so you get the wrong
thing.
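The double-escaping mismatch is easy to see in miniature. The sketch below is plain Python, not libpq or pg_dump code, and the helper names are made up for illustration: it mimics what old-style PQescapeBytea output looks like, then parses the resulting literal both ways the server might, showing that the standard-strings parse corrupts the data.

```python
# Illustrative sketch only -- these helpers approximate the two escaping
# layers (string literal + bytea escape format), not real libpq behavior.

def escape_bytea_old_style(data: bytes) -> str:
    """Mimic pre-standard-strings PQescapeBytea output: each backslash
    the bytea escape format emits is doubled again for the old-style
    string-literal layer."""
    out = []
    for b in data:
        if b == 0x5C:               # backslash -> bytea '\\' -> literal '\\\\'
            out.append('\\\\\\\\')
        elif b < 0x20 or b > 0x7E:  # non-printable -> bytea '\ooo' -> literal '\\ooo'
            out.append('\\\\%03o' % b)
        else:
            out.append(chr(b))
    return ''.join(out)

def server_parse_literal(lit: str, std_conforming: bool) -> str:
    """Collapse the string-literal escaping layer the way the server would,
    yielding the text handed to the bytea input function."""
    if std_conforming:
        return lit                        # backslashes are ordinary characters
    return lit.replace('\\\\', '\\')      # '\\' collapses to a single '\'

def bytea_decode(text: str) -> bytes:
    """Decode bytea escape format: '\\\\' -> one backslash byte,
    '\\ooo' -> the byte with that octal value."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == '\\':
            if text[i + 1] == '\\':
                out.append(0x5C)
                i += 2
            else:
                out.append(int(text[i + 1:i + 4], 8))
                i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

original = b'\x01abc\\def'
lit = escape_bytea_old_style(original)

# Round-trips correctly when the server parses old-style literals:
assert bytea_decode(server_parse_literal(lit, std_conforming=False)) == original

# But the restore script has already set standard_conforming_strings=on,
# so the doubled backslashes survive into the bytea data:
corrupted = bytea_decode(server_parse_literal(lit, std_conforming=True))
assert corrupted != original
```

The corrupted result contains literal backslash and octal-digit characters where single bytes should be, which is exactly the kind of damage a blob restored through the piped script would show.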

The bottom line is that pg_dump can't depend on libpq's PQescapeBytea,
but needs its own copy.  We have in fact done that as of 9.0, which is
what I was vaguely remembering:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_0_BR [b1732111f] 2009-08-04 21:56:09 +0000

    Fix pg_dump to do the right thing when escaping the contents of large objects.

    The previous implementation got it right in most cases but failed in one:
    if you pg_dump into an archive with standard_conforming_strings enabled, then
    pg_restore to a script file (not directly to a database), the script will set
    standard_conforming_strings = on but then emit large object data as
    nonstandardly-escaped strings.

    At the moment the code is made to emit hex-format bytea strings when dumping
    to a script file.  We might want to change to old-style escaping for backwards
    compatibility, but that would be slower and bulkier.  If we do, it's just a
    matter of reimplementing appendByteaLiteral().

    This has been broken for a long time, but given the lack of field complaints
    I'm not going to worry about back-patching.
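For reference, the hex input format that the 9.0 fix emits sidesteps the whole problem: the literal contains only a single backslash, which a standard-conforming-strings parse passes through untouched. A rough sketch (the helper name is hypothetical):

```python
def bytea_hex_literal(data: bytes) -> str:
    """Build a bytea literal in PostgreSQL's hex input format (9.0+):
    a single backslash, 'x', then two hex digits per byte.  With
    standard_conforming_strings=on the lone backslash passes through
    the string-literal layer unchanged."""
    return "'\\x" + data.hex() + "'"

# A backslash byte becomes the hex pair 5c -- no escaping ambiguity.
print(bytea_hex_literal(b'abc\\'))  # prints '\x6162635c'
```

Since every byte is rendered as two hex digits, there is no second escaping layer to get out of sync, which is why reimplementing appendByteaLiteral() would only be needed for an old-style-escaping backport.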

I'm not sure whether this new complaint is enough reason to reconsider
back-patching.  We cannot just backport the 9.0 patch, since it assumes
it can do bytea hex output --- we'd need to emit old style escaped
output instead.  So it's a bit of work, and more to the point would
involve pushing poorly-tested code into stable branches.  I doubt it
would go wrong, but in the worst-case scenario we might create failures
for blob-restore cases that work now.

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches.  Comments?

            regards, tom lane
