Thread: pg_restore (libpq? parser?) bug in 8

pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
>         CREATE FUNCTION xxx() RETURNS integer
>             AS $$ begin return 1;
>2004-08-12 01:38:48 EST<zzz,birds>: ERROR:  unterminated dollar-quoted 
>string at or near "$$ begin return 1;" at character 115


Just realized the problem; pg_restore uses a trivial parser to work out 
when statements start/end. It knows about quotes but not about dollar-quotes.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/  



Re: pg_restore (libpq? parser?) bug in 8

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Just realized the problem; pg_restore uses a trivial parser to work out 
> when statements start/end. It knows about quotes but not about dollar-quotes.

Blech.  Do you have time to fix it?
        regards, tom lane


Re: pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
At 03:33 AM 12/08/2004, Tom Lane wrote:
>Do you have time to fix it?

Should do; I'll add the die-on-error option as well.

Con someone confirm how dollar quoting works:
    '$[tag]$'

where tag is alpha chars? any chars? \n? \r?

and closing tag must match.

All dollar-quotes inside any kind of quotes can be ignored.

Is there any circumstance where an unquoted '$' is valid?



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_restore (libpq? parser?) bug in 8

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Con someone confirm how dollar quoting works:
>      '$[tag]$'
> where tag is alpha chars? any chars? \n? \r?

IIRC the tag is either empty or anything that looks like a
(dollar-sign-less) identifier.  But check the rules in scan.l to be sure.

> Is there any circumstance where an unquoted '$' is valid?

$n parameter identifiers (which is why the tag cannot start with
a digit).  Also, I believe $ can be embedded in identifiers (ie,
it can be a non-first character).  So a dollar quote can't be
adjacent to a preceding identifier.

IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it
up as a bad job.  You might need to bite the bullet and implement a flex
lexer.

Why exactly does pg_restore need to parse the SQL anyway?
        regards, tom lane


Re: pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
At 12:15 PM 12/08/2004, Tom Lane wrote:
>IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it
>up as a bad job.  You might need to bite the bullet and implement a flex
>lexer.

Looks like the psql side of things is not ideal (see other post).

Any idea how backslashes should handled in and out of the tag?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
At 12:15 PM 12/08/2004, Tom Lane wrote:
>Why exactly does pg_restore need to parse the SQL anyway?

It just looks for complete statements. From memory it relates to the 
possibility that TOC entries can have more than one statement, or it may 
relate to handling COPY statements. I think it has to look for 
PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it 
needs to pass statements one at a time.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_restore (libpq? parser?) bug in 8

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Any idea how backslashes should handled in and out of the tag?

Backslashes aren't legal in the tag, I'm quite sure.  In the body of the
quoted string they are not special in any way.
        regards, tom lane


Re: pg_restore (libpq? parser?) bug in 8

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 12:15 PM 12/08/2004, Tom Lane wrote:
>> Why exactly does pg_restore need to parse the SQL anyway?

> It just looks for complete statements. From memory it relates to the 
> possibility that TOC entries can have more than one statement, or it may 
> relate to handling COPY statements. I think it has to look for 
> PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it 
> needs to pass statements one at a time.

Hm.  But we could assume that a COPY will be all by itself in a TOC
entry, couldn't we?
        regards, tom lane


Re: pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
At 12:42 PM 12/08/2004, Tom Lane wrote:
>Hm.  But we could assume that a COPY will be all by itself in a TOC
>entry, couldn't we?

Maybe. I know I hit a couple of nasty examples in the original code. Isn't 
the COPY combined with the data? If so, we still have to scan for it's end. 
The existing scanner is pretty trivial. The dollar-quoting will not make it 
much worse (I hope). But I'll hold off on the changes if you want to 
experiment -- I used to use my own DBs + the regression DB for testing.

Another possible issue - if I pass two statements in one string to libpq, 
separated by semicolons, will it cope? If so, has that been true since 7.0? 
If the answers are ('no',_), or ('yes', 'no') then that explains why 
pg_restore has to parse statements - the TOC entry can have more than one 
statement.

Sorry to be vague, it's a long time since I wrote the code.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_restore (libpq? parser?) bug in 8

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 12:42 PM 12/08/2004, Tom Lane wrote:
>> Hm.  But we could assume that a COPY will be all by itself in a TOC
>> entry, couldn't we?

> Maybe. I know I hit a couple of nasty examples in the original code. Isn't 
> the COPY combined with the data? If so, we still have to scan for it's end. 
> The existing scanner is pretty trivial.

Agreed.  But we only emit dollar quoting in CREATE FUNCTION entries, and
I don't really see why you need to parse those with any accuracy.  I
think we could do something here with making assumptions based on the
known TOC entry type about what might be in it.

> Another possible issue - if I pass two statements in one string to libpq, 
> separated by semicolons, will it cope? If so, has that been true since 7.0? 

Yes, and yes, except that if the first one gets an error the second will
not be executed.  Per the other thread, that's probably a behavior change
we don't want.
        regards, tom lane


Re: pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
At 12:15 PM 12/08/2004, Tom Lane wrote:
>You might need to bite the bullet and implement a flex
>lexer.

I'd like to avoid this if I can; AFAICT, for statement detection on 
pg_restore, I can require white space before the $tag. Since I also skip 
over all quoted text, the bodies of functions are ignored. The only issues 
will be attribute names with ' $' in them, but they will be quoted as well 
(so ignored).

So to recognize a tag, I look for a '$' after white space, and assume it's 
a tag start. If I subsequently read an invalid tag char, I just go back 
into scan mode on that character and assume the '$...' was some other valid 
sql element.
From other threads, it sounds like removing the statement detection code 
entirely is not an option.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_restore (libpq? parser?) bug in 8

From
"Andrew Dunstan"
Date:
Philip Warner said:
> At 12:15 PM 12/08/2004, Tom Lane wrote:
>>You might need to bite the bullet and implement a flex
>>lexer.
>
> I'd like to avoid this if I can; AFAICT, for statement detection on
> pg_restore, I can require white space before the $tag. Since I also
> skip  over all quoted text, the bodies of functions are ignored. The
> only issues  will be attribute names with ' $' in them, but they will
> be quoted as well  (so ignored).
>
> So to recognize a tag, I look for a '$' after white space, and assume
> it's  a tag start. If I subsequently read an invalid tag char, I just
> go back  into scan mode on that character and assume the '$...' was
> some other valid  sql element.
>
> From other threads, it sounds like removing the statement detection
> code
> entirely is not an option.
>
>


See the discussions that culminated here before Tom bit the bullet and
implemented a flex scanner for psql:

http://archives.postgresql.org/pgsql-patches/2004-02/msg00182.php

It's not as easy as you might think.

I had a thought that might short-circuit this - do we even need pg_dump to
use dollar quoting for non-text output, such as is required by pg_restore?
IIRC, the idea was to have text dumps that didn't undo what the user had
done in using dollar quoting, but I am not sure that consideration applies
to non-text output. Turning it off should be very easy - we already have the
switch.

Thoughts?

cheers

andrew




Re: pg_restore (libpq? parser?) bug in 8

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> I had a thought that might short-circuit this - do we even need pg_dump to
> use dollar quoting for non-text output, such as is required by pg_restore?

That's a possibility, but I'd rather work around it by finding a way for
pg_restore not to need to parse the dollar-quoted literal in the first
place.  I haven't heard a reason why it needs to parse the contents of a
CREATE FUNCTION entry.  If we guarantee not to put dollar-quoted
literals into the TOC types it *does* need to parse, ISTM we're good.
        regards, tom lane


Re: pg_restore (libpq? parser?) bug in 8

From
Philip Warner
Date:
At 11:42 PM 12/08/2004, Tom Lane wrote:
>That's a possibility, but I'd rather work around it by finding a way for
>pg_restore not to need to parse the dollar-quoted literal in the first
>place.

Two possibilities:

1) Parse the tags (I have the code working): it's not that hard, the only 
trick bit being recognizing the tags in the first place. I have assumed 
that any bare unquoted string that is not preceded by valid identifier name 
chars, and which starts with a '$' may be a dollar quote. This seems valid 
to me.

2) We could avoid special coding for TOC entry types (eg. pg_restore 
knowing 'FUNCTION' TOC entries should not be parsed), by changing the TOC 
data to include a flag/counter (set by pg_dump) indicating that the entry 
contains > 1 statements. Then we don't hard code knowledge of TOC entry 
types, and function definitions will not be parsed. Old dump files would be 
treated as multi-statement, and still be parsed.

If my assumption in (1) is valid, then I have a very mild preference for 
it, but am happy with either.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_restore (libpq? parser?) bug in 8

From
Bruce Momjian
Date:
Added to open items list:
* fix dollar quoting problems in pg_restore


---------------------------------------------------------------------------

Philip Warner wrote:
> 
> >         CREATE FUNCTION xxx() RETURNS integer
> >             AS $$ begin return 1;
> >2004-08-12 01:38:48 EST<zzz,birds>: ERROR:  unterminated dollar-quoted 
> >string at or near "$$ begin return 1;" at character 115
> 
> 
> Just realized the problem; pg_restore uses a trivial parser to work out 
> when statements start/end. It knows about quotes but not about dollar-quotes.
> 
> 
> 
> ----------------------------------------------------------------
> Philip Warner                    |     __---_____
> Albatross Consulting Pty. Ltd.   |----/       -  \
> (A.B.N. 75 008 659 498)          |          /(@)   ______---_
> Tel: (+61) 0500 83 82 81         |                 _________  \
> Fax: (+61) 03 5330 3172          |                 ___________ |
> Http://www.rhyme.com.au          |                /           \|
>                                   |    --________--
> PGP key available upon request,  |  /
> and from pgp.mit.edu:11371       |/  
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073