Thread: pg_restore (libpq? parser?) bug in 8
> CREATE FUNCTION xxx() RETURNS integer > AS $$ begin return 1; >2004-08-12 01:38:48 EST<zzz,birds>: ERROR: unterminated dollar-quoted >string at or near "$$ begin return 1;" at character 115 Just realized the problem; pg_restore uses a trivial parser to work out when statements start/end. It knows about quotes but not about dollar-quotes. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Just realized the problem; pg_restore uses a trivial parser to work out > when statements start/end. It knows about quotes but not about dollar-quotes. Blech. Do you have time to fix it? regards, tom lane
At 03:33 AM 12/08/2004, Tom Lane wrote: >Do you have time to fix it? Should do; I'll add the die-on-error option as well. Con someone confirm how dollar quoting works: '$[tag]$' where tag is alpha chars? any chars? \n? \r? and closing tag must match. All dollar-quotes inside any kind of quotes can be ignored. Is there any circumstance where an unquoted '$' is valid? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Con someone confirm how dollar quoting works: > '$[tag]$' > where tag is alpha chars? any chars? \n? \r? IIRC the tag is either empty or anything that looks like a (dollar-sign-less) identifier. But check the rules in scan.l to be sure. > Is there any circumstance where an unquoted '$' is valid? $n parameter identifiers (which is why the tag cannot start with a digit). Also, I believe $ can be embedded in identifiers (ie, it can be a non-first character). So a dollar quote can't be adjacent to a preceding identifier. IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it up as a bad job. You might need to bite the bullet and implement a flex lexer. Why exactly does pg_restore need to parse the SQL anyway? regards, tom lane
At 12:15 PM 12/08/2004, Tom Lane wrote: >IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it >up as a bad job. You might need to bite the bullet and implement a flex >lexer. Looks like the psql side of things is not ideal (see other post). Any idea how backslashes should handled in and out of the tag? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
At 12:15 PM 12/08/2004, Tom Lane wrote: >Why exactly does pg_restore need to parse the SQL anyway? It just looks for complete statements. From memory it relates to the possibility that TOC entries can have more than one statement, or it may relate to handling COPY statements. I think it has to look for PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it needs to pass statements one at a time. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Any idea how backslashes should handled in and out of the tag? Backslashes aren't legal in the tag, I'm quite sure. In the body of the quoted string they are not special in any way. regards, tom lane
Philip Warner <pjw@rhyme.com.au> writes: > At 12:15 PM 12/08/2004, Tom Lane wrote: >> Why exactly does pg_restore need to parse the SQL anyway? > It just looks for complete statements. From memory it relates to the > possibility that TOC entries can have more than one statement, or it may > relate to handling COPY statements. I think it has to look for > PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it > needs to pass statements one at a time. Hm. But we could assume that a COPY will be all by itself in a TOC entry, couldn't we? regards, tom lane
At 12:42 PM 12/08/2004, Tom Lane wrote: >Hm. But we could assume that a COPY will be all by itself in a TOC >entry, couldn't we? Maybe. I know I hit a couple of nasty examples in the original code. Isn't the COPY combined with the data? If so, we still have to scan for it's end. The existing scanner is pretty trivial. The dollar-quoting will not make it much worse (I hope). But I'll hold off on the changes if you want to experiment -- I used to use my own DBs + the regression DB for testing. Another possible issue - if I pass two statements in one string to libpq, separated by semicolons, will it cope? If so, has that been true since 7.0? If the answers are ('no',_), or ('yes', 'no') then that explains why pg_restore has to parse statements - the TOC entry can have more than one statement. Sorry to be vague, it's a long time since I wrote the code. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 12:42 PM 12/08/2004, Tom Lane wrote: >> Hm. But we could assume that a COPY will be all by itself in a TOC >> entry, couldn't we? > Maybe. I know I hit a couple of nasty examples in the original code. Isn't > the COPY combined with the data? If so, we still have to scan for it's end. > The existing scanner is pretty trivial. Agreed. But we only emit dollar quoting in CREATE FUNCTION entries, and I don't really see why you need to parse those with any accuracy. I think we could do something here with making assumptions based on the known TOC entry type about what might be in it. > Another possible issue - if I pass two statements in one string to libpq, > separated by semicolons, will it cope? If so, has that been true since 7.0? Yes, and yes, except that if the first one gets an error the second will not be executed. Per the other thread, that's probably a behavior change we don't want. regards, tom lane
At 12:15 PM 12/08/2004, Tom Lane wrote: >You might need to bite the bullet and implement a flex >lexer. I'd like to avoid this if I can; AFAICT, for statement detection on pg_restore, I can require white space before the $tag. Since I also skip over all quoted text, the bodies of functions are ignored. The only issues will be attribute names with ' $' in them, but they will be quoted as well (so ignored). So to recognize a tag, I look for a '$' after white space, and assume it's a tag start. If I subsequently read an invalid tag char, I just go back into scan mode on that character and assume the '$...' was some other valid sql element. From other threads, it sounds like removing the statement detection code entirely is not an option. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner said: > At 12:15 PM 12/08/2004, Tom Lane wrote: >>You might need to bite the bullet and implement a flex >>lexer. > > I'd like to avoid this if I can; AFAICT, for statement detection on > pg_restore, I can require white space before the $tag. Since I also > skip over all quoted text, the bodies of functions are ignored. The > only issues will be attribute names with ' $' in them, but they will > be quoted as well (so ignored). > > So to recognize a tag, I look for a '$' after white space, and assume > it's a tag start. If I subsequently read an invalid tag char, I just > go back into scan mode on that character and assume the '$...' was > some other valid sql element. > > From other threads, it sounds like removing the statement detection > code > entirely is not an option. > > See the discussions that culminated here before Tom bit the bullet and implemented a flex scanner for psql: http://archives.postgresql.org/pgsql-patches/2004-02/msg00182.php It's not as easy as you might think. I had a thought that might short-circuit this - do we even need pg_dump to use dollar quoting for non-text output, such as is required by pg_restore? IIRC, the idea was to have text dumps that didn't undo what the user had done in using dollar quoting, but I am not sure that consideration applies to non-text output. Turning it off should be very easy - we already have the switch. Thoughts? cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > I had a thought that might short-circuit this - do we even need pg_dump to > use dollar quoting for non-text output, such as is required by pg_restore? That's a possibility, but I'd rather work around it by finding a way for pg_restore not to need to parse the dollar-quoted literal in the first place. I haven't heard a reason why it needs to parse the contents of a CREATE FUNCTION entry. If we guarantee not to put dollar-quoted literals into the TOC types it *does* need to parse, ISTM we're good. regards, tom lane
At 11:42 PM 12/08/2004, Tom Lane wrote: >That's a possibility, but I'd rather work around it by finding a way for >pg_restore not to need to parse the dollar-quoted literal in the first >place. Two possibilities: 1) Parse the tags (I have the code working): it's not that hard, the only trick bit being recognizing the tags in the first place. I have assumed that any bare unquoted string that is not preceded by valid identifier name chars, and which starts with a '$' may be a dollar quote. This seems valid to me. 2) We could avoid special coding for TOC entry types (eg. pg_restore knowing 'FUNCTION' TOC entries should not be parsed), by changing the TOC data to include a flag/counter (set by pg_dump) indicating that the entry contains > 1 statements. Then we don't hard code knowledge of TOC entry types, and function definitions will not be parsed. Old dump files would be treated as multi-statement, and still be parsed. If my assumption in (1) is valid, then I have a very mild preference for it, but am happy with either. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Added to open items list: * fix dollar quoting problems in pg_restore --------------------------------------------------------------------------- Philip Warner wrote: > > > CREATE FUNCTION xxx() RETURNS integer > > AS $$ begin return 1; > >2004-08-12 01:38:48 EST<zzz,birds>: ERROR: unterminated dollar-quoted > >string at or near "$$ begin return 1;" at character 115 > > > Just realized the problem; pg_restore uses a trivial parser to work out > when statements start/end. It knows about quotes but not about dollar-quotes. > > > > ---------------------------------------------------------------- > Philip Warner | __---_____ > Albatross Consulting Pty. Ltd. |----/ - \ > (A.B.N. 75 008 659 498) | /(@) ______---_ > Tel: (+61) 0500 83 82 81 | _________ \ > Fax: (+61) 03 5330 3172 | ___________ | > Http://www.rhyme.com.au | / \| > | --________-- > PGP key available upon request, | / > and from pgp.mit.edu:11371 |/ > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073