Re: [PATCHES] dollar quoting - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: [PATCHES] dollar quoting
Msg-id 402E7F4F.3080300@dunslane.net
In response to Re: [PATCHES] dollar quoting  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCHES] dollar quoting  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>
>>I ended up not using a regex, which seemed to be a little heavy handed,
>>but just writing a small custom recognition function, that should (and I
>>think does) mimic the pattern recognition for these tokens used by the
>>backend lexer.
>>
>>
>
>I looked at this and realized that it still doesn't do very well at
>distinguishing $foo$ from other random uses of $.  The problem is that
>looking back at just the immediately preceding character isn't enough
>context to tell whether a $ is part of an identifier.  Consider the
>input
>    a42$foo$
>This is a legal identifier according to PG 7.4.  But how about
>    42$foo$
>This is a syntax error in 7.4, and we propose to redefine it as an
>integer literal '42' followed by a dollar-quote start symbol.
>

The test in the patch I sent is this:


            else if (!dol_quote && valid_dolquote(line+i) &&
                     (i == 0 ||
                      ! ((line[i-prevlen] & 0x80) != 0 ||
                         isalnum(line[i-prevlen]) ||
                         line[i-prevlen] == '_' ||
                         line[i-prevlen] == '$' )))


The test should not succeed anywhere in the string '42$foo$'.
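
To make that concrete, here is a minimal, self-contained sketch of the
same lookback test. Several things in it are assumptions rather than the
patch itself: valid_dolquote() is a hypothetical stand-in for the
function in the patch (whose body is not shown in this message),
first_dolquote_start() is an illustrative wrapper, and prevlen is fixed
at 1 on the assumption of single-byte characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the patch's valid_dolquote(): true if s
 * begins with "$tag$", where tag is empty or looks like an identifier
 * that contains no '$' and does not start with a digit.
 */
bool
valid_dolquote(const char *s)
{
    size_t      j = 1;

    if (s[0] != '$')
        return false;
    if (isdigit((unsigned char) s[1]))
        return false;           /* "$1" is a parameter, not a tag */
    while ((s[j] & 0x80) != 0 ||
           isalnum((unsigned char) s[j]) ||
           s[j] == '_')
        j++;
    return s[j] == '$';
}

/*
 * Return the index of the first '$' that passes the lookback test, or
 * -1 if no position does.  Assumes single-byte characters (prevlen 1).
 */
int
first_dolquote_start(const char *line)
{
    const size_t prevlen = 1;
    size_t      len;
    size_t      i;

    assert(line != NULL);
    len = strlen(line);
    for (i = 0; i < len; i++)
    {
        if (valid_dolquote(line + i) &&
            (i == 0 ||
             !((line[i - prevlen] & 0x80) != 0 ||
               isalnum((unsigned char) line[i - prevlen]) ||
               line[i - prevlen] == '_' ||
               line[i - prevlen] == '$')))
            return (int) i;
    }
    return -1;
}
```

Under these assumptions, first_dolquote_start("42$foo$") and
first_dolquote_start("a42$foo$") both return -1: the first '$' follows
an alphanumeric character, and the last '$' does not begin a complete
tag. By contrast, first_dolquote_start("select $foo$") returns 7.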

Note that psql does not change any '$foo$' at all; it just passes it to
the backend. psql needs this test only because it has to detect the end
of a statement and prompt correctly, and to do that it needs to know
whether we are in a quote (single, double, dollar) or a comment.
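
As a rough illustration of that requirement, the sketch below scans a
buffer forward and reports the first ';' that lies outside any quote or
comment. It is not psql's code: find_statement_end() and dolquote_len()
are hypothetical names, and the sketch ignores backslash escapes, nested
comments and multibyte encodings. It skips whole identifiers (including
any '$' inside them) so that a '$' within a name is not taken for a
quote start.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of a "$tag$" delimiter at s, or 0 if none (hypothetical helper). */
size_t
dolquote_len(const char *s)
{
    size_t      j = 1;

    if (s[0] != '$' || isdigit((unsigned char) s[1]))
        return 0;
    while ((s[j] & 0x80) != 0 || isalnum((unsigned char) s[j]) || s[j] == '_')
        j++;
    return s[j] == '$' ? j + 1 : 0;
}

/*
 * Index of the first ';' outside quotes and comments, or -1 if the
 * statement is still open when buf ends (so more input is needed).
 */
int
find_statement_end(const char *buf)
{
    size_t      i = 0;
    size_t      n;

    assert(buf != NULL);
    while (buf[i] != '\0')
    {
        char        c = buf[i];

        if (c == '\'' || c == '"')
        {
            /* skip to the matching quote; a doubled quote just reopens */
            const char *end = strchr(buf + i + 1, c);

            if (end == NULL)
                return -1;      /* still inside the quote */
            i = (size_t) (end - buf) + 1;
        }
        else if (c == '-' && buf[i + 1] == '-')
            i += strcspn(buf + i, "\n");    /* line comment */
        else if (c == '/' && buf[i + 1] == '*')
        {
            const char *end = strstr(buf + i + 2, "*/");

            if (end == NULL)
                return -1;      /* still inside the comment */
            i = (size_t) (end - buf) + 2;
        }
        else if (isalpha((unsigned char) c) || c == '_' || (c & 0x80) != 0)
        {
            /* identifier: '$' is allowed after the first character */
            while ((buf[i] & 0x80) != 0 || isalnum((unsigned char) buf[i]) ||
                   buf[i] == '_' || buf[i] == '$')
                i++;
        }
        else if (c == '$' && (n = dolquote_len(buf + i)) > 0)
        {
            /* dollar quote: look for the same delimiter again */
            const char *p = buf + i + n;

            while (*p != '\0' && strncmp(p, buf + i, n) != 0)
                p++;
            if (*p == '\0')
                return -1;      /* still inside the dollar quote */
            i = (size_t) (p - buf) + n;
        }
        else if (c == ';')
            return (int) i;
        else
            i++;
    }
    return -1;
}
```

A caller would keep appending input lines to the buffer, and showing a
continuation prompt, for as long as find_statement_end() returns -1.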

psql does not detect many syntax errors, or even lexical errors. That
is the job of the backend, and rightly so, I believe.

>
>There's no way to tell these apart with a single-character lookback,
>or indeed any fixed number of characters of lookback.
>

I'm still not convinced, although maybe there's something I'm not getting.

>
>I begin to think that we'll really have to bite the bullet and convert
>psql's input parser to use flex.  If we're not scanning with exactly the
>same rules as the backend uses, we're going to get the wrong answers.
>
>
>

Interacting with lexer states would probably be ... unpleasant. Matching
a stream-oriented lexer with a line-oriented CLI would be messy, I
suspect.

cheers

andrew

