Thread: pg_restore silently chokes on object comments/descriptions ending in a backslash

I observe the following issue on PostgreSQL 9.0.4 on at least the
following platforms:

  * FreeBSD 6.3 (amd64)
    `uname -a`:
    FreeBSD <hostname> 6.3-STABLE FreeBSD 6.3-STABLE #1: Fri May 30 18:11:4=
7 PDT 2008
      root@<hostname>:/data/obj/data/home/<username>/symbols/builddir_amd64=
/usr/src/sys/MESSAGING_GATEWAY.amd64_INSTALL  amd64

  * Mac OS X 10.6.8 (i386)
    `uname -a`:
    Darwin joule 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 P=
DT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 i386

  * semi-current Debian testing (amd64)
    `uname -a`:
    Linux gray 2.6.30-2-amd64 #1 SMP Mon Dec 7 05:21:45 UTC 2009 x86_64 GNU=
/Linux

If the comment/description of a database object (table, function, etc.)
ends in a backslash (which generally works fine otherwise), then
pg_restore is unable to completely restore a custom-format dump of the
schema.  pg_restore does not complain, but silently(!) stops issuing DDL
statements to the server starting with the first "COMMENT ON =E2=80=A6" sta=
tement
that would have set an object comment/description ending in a backslash.

Reproduce as follows:

    $ createdb test0
    $ createdb test1
    $ psql -c "CREATE TABLE bar (); COMMENT ON TABLE bar IS 'bar\\';" test0
    COMMENT
    $ psql -c "CREATE TABLE foo (); COMMENT ON TABLE foo IS 'foo';" test0
    COMMENT
    $ pg_dump --format custom --file test0.pg_dump --schema-only test0
    $ pg_restore -d test1 test0.pg_dump=20
    $ psql -c '\dt+' test0
                       List of relations
     Schema | Name | Type  | Owner  |  Size   | Description=20
    --------+------+-------+--------+---------+-------------
     public | bar  | table | julian | 0 bytes | bar\
     public | foo  | table | julian | 0 bytes | foo
    (2 rows)
=20=20=20=20
    $ psql -c '\dt+' test1
                       List of relations
     Schema | Name | Type  | Owner  |  Size   | Description=20
    --------+------+-------+--------+---------+-------------
     public | bar  | table | julian | 0 bytes |=20
    (1 row)

This also happens with PostgreSQL 8.4.

To demonstrate that this is not an academic issue, these are a few
functions I have defined, and their comments:

    List of functions
    -[ RECORD 1 ]-------+--------------------------------------------------=
----------------------
    Schema              | public
    Name                | escape_are
    ...                 : ...
    Description         | escape advanced regexp (ARE) special characters: =
.*+?|[](){}^$\
    -[ RECORD 2 ]-------+--------------------------------------------------=
----------------------
    Schema              | public
    Name                | escape_control
    ...                 : ...
    Description         | escape control characters: \a\b\t\n\v\f\r\e\\
    -[ RECORD 3 ]-------+--------------------------------------------------=
----------------------
    Schema              | public
    Name                | escape_like
    ...                 : ...
    Description         | escape LIKE pattern special characters: %_\

I have worked around the issue by appending a space character to each of
those function descriptions.  What makes the problem really bad is that it
silently renders your custom-format database dumps (which pg_dump creates
just fine) useless, which you notice only after you do a restore (without
an error being thrown) and your restored database being incomplete.

-Julian
Julian Mehnle <julian@mehnle.net> writes:
> If the comment/description of a database object (table, function, etc.)
> ends in a backslash (which generally works fine otherwise), then
> pg_restore is unable to completely restore a custom-format dump of the
> schema.

Reproduced here against HEAD.  The problem seems to be that
pg_backup_db.c's _sendSQLLine() contains a mini SQL lexer that is not
cognizant of standard_conforming_strings.  Not sure about a simple fix,
and I rather wonder if we shouldn't try to remove that code entirely
instead of "fix" it.

As a temporary workaround, the SQL text file that pg_restore produces
by default seems to be valid, so you could pipe that into psql.

            regards, tom lane
I'm subscribed to the pgsql-bugs list, so no need to CC me. :-)

Tom Lane wrote:

> Reproduced here against HEAD.  The problem seems to be that
> pg_backup_db.c's _sendSQLLine() contains a mini SQL lexer that is not
> cognizant of standard_conforming_strings.

Oh, right, I forgot to mention I have standard_conforming_strings enabled.=
=20=20
However, I understand that is to become the new default anyway.

> Not sure about a simple fix,=20
> and I rather wonder if we shouldn't try to remove that code entirely
> instead of "fix" it.

What would "removing that code entirely" mean?

> As a temporary workaround, the SQL text file that pg_restore produces
> by default seems to be valid, so you could pipe that into psql.

Hmm, right.  So at least my existing dumps aren't useless.  Thanks.

-Julian
Julian Mehnle <julian@mehnle.net> writes:
> I'm subscribed to the pgsql-bugs list, so no need to CC me. :-)

cc to people in the thread is the established practice on these lists.
It provides a bit more robustness when the lists are busy or slow.
You can set your subscription so the listserv won't send you an extra
copy, I believe, but I don't know the incantation offhand.

> Tom Lane wrote:
>> Not sure about a simple fix,
>> and I rather wonder if we shouldn't try to remove that code entirely
>> instead of "fix" it.

> What would "removing that code entirely" mean?

I was wondering why it's necessary to parse the entry in the dump file
at all, rather than just spit it out to PQexec as-is.  There's probably
a reason, but maybe we can find another way to solve whatever the
real problem is.

            regards, tom lane
Tom Lane wrote:

> Julian Mehnle <julian@mehnle.net> writes:
> > I'm subscribed to the pgsql-bugs list, so no need to CC me. :-)
>
> cc to people in the thread is the established practice on these lists.
> It provides a bit more robustness when the lists are busy or slow.

Got it.  I'll try to remember that and CC others when posting, and will=20
also not take offense when receiving extra CCs.

> > What would "removing that code entirely" mean?
>
> I was wondering why it's necessary to parse the entry in the dump file
> at all, rather than just spit it out to PQexec as-is.  There's probably
> a reason, but maybe we can find another way to solve whatever the
> real problem is.

I see.  Given that I'm only a Pg user, not a Pg developer, I have no=20
opinion on this.  Thanks for the explanation, though!

-Julian
Excerpts from Julian Mehnle's message of mié jul 27 13:28:21 -0400 2011:
> Tom Lane wrote:
>
> > Julian Mehnle <julian@mehnle.net> writes:
> > > I'm subscribed to the pgsql-bugs list, so no need to CC me. :-)
> >
> > cc to people in the thread is the established practice on these lists.
> > It provides a bit more robustness when the lists are busy or slow.
>
> Got it.  I'll try to remember that and CC others when posting, and will
> also not take offense when receiving extra CCs.

If you're on procmail, you can get very easily use a well-known de-dupe
recipe and it all works wonderfully.

:0 Wh: msgid.lock
| formail -D 65536 $HOME/.msgid.cache

I assume similar tools can use equivalent mechanisms.

Note that CCing others is customary but obviously not mandatory.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:

> Note that CCing others is customary but obviously not mandatory.

For the record, CCing posters who haven't explicitly requested it is=20
frowned upon on the Debian mailing lists , but apparently those have a=20
lower latency than the Pg ones. :-)

I'll shut up now since this is drifting off topic.

-Julian
I wrote:
> Julian Mehnle <julian@mehnle.net> writes:
>> What would "removing that code entirely" mean?

> I was wondering why it's necessary to parse the entry in the dump file
> at all, rather than just spit it out to PQexec as-is.  There's probably
> a reason, but maybe we can find another way to solve whatever the
> real problem is.

I've applied a patch along those lines.  Thanks for the report!

            regards, tom lane