Thread: Lexer patch question

Lexer patch question

From: Bruce Momjian
Date: 15 June 2005, 17:26:05

I am confused why the following change Tom made to scan.l works.
Isn't that 'x' required so xqescape doesn't match '\x'?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.123
retrieving revision 1.124
diff -c -c -r1.123 -r1.124
*** scan.l    2 Jun 2005 01:23:08 -0000    1.123
--- scan.l    2 Jun 2005 17:45:17 -0000    1.124
***************
*** 193,199 ****
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7x]
  xqoctesc        [\\][0-7]{1,3}
  xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

--- 193,199 ----
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7]
  xqoctesc        [\\][0-7]{1,3}
  xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

Re: Lexer patch question

From: Tom Lane
Date: 15 June 2005, 17:31:34

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am confused why the following change Tom made to scan.l works.
> Isn't that 'x' required so xqescape doesn't match '\x'?

> *** scan.l    2 Jun 2005 01:23:08 -0000    1.123
> --- scan.l    2 Jun 2005 17:45:17 -0000    1.124
> ***************
> *** 193,199 ****
>   xqstart            {quote}
>   xqdouble        {quote}{quote}
>   xqinside        [^\\']+
> ! xqescape        [\\][^0-7x]
>   xqoctesc        [\\][0-7]{1,3}
>   xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

> --- 193,199 ----
>   xqstart            {quote}
>   xqdouble        {quote}{quote}
>   xqinside        [^\\']+
> ! xqescape        [\\][^0-7]
>   xqoctesc        [\\][0-7]{1,3}
>   xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

No; if a match to xqhexesc is possible, the lexer will prefer that match
because it is longer.  If a match to xqhexesc is not possible --- that
is, we have \x not followed by a hex digit --- then we *want* xqescape
to match.  The original coding forced a backup to the <xq>. rule in this
situation, which is not how we want it to behave.

            regards, tom lane
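
The behavior described above can be sketched outside of flex. The following is a minimal Python simulation, not flex itself: it tries each pattern at the current position and keeps the longest match, with the earlier rule winning ties. The pattern strings are transcribed from the diff; the names `longest_match`, `OLD_RULES`, `NEW_RULES`, and `xq_any` are assumptions made for this illustration, and `xq_any` merely stands in for the `<xq>.` rule mentioned in the reply.

```python
import re

# Patterns transcribed from the scan.l diff (revision 1.123 vs 1.124).
# "xq_any" is a hypothetical stand-in for the <xq>. catch-all rule.
OLD_RULES = [
    ("xqescape", r"\\[^0-7x]"),
    ("xqoctesc", r"\\[0-7]{1,3}"),
    ("xqhexesc", r"\\x[0-9A-Fa-f]{1,2}"),
    ("xq_any", r"."),
]
NEW_RULES = [("xqescape", r"\\[^0-7]")] + OLD_RULES[1:]


def longest_match(rules, text, pos=0):
    """Return (rule_name, lexeme) for the longest match at pos.

    Ties go to the rule listed first, because a later rule only
    replaces the current best when its match is strictly longer.
    Returns None if no rule matches.
    """
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


# \x followed by hex digits: xqhexesc wins because its match is longer.
print(longest_match(NEW_RULES, r"\x41")[0])  # xqhexesc
# \x not followed by a hex digit: the new xqescape takes "\x".
print(longest_match(NEW_RULES, r"\xZ")[0])   # xqescape
# Same input with the old xqescape: only the catch-all can match.
print(longest_match(OLD_RULES, r"\xZ")[0])   # xq_any
```

The last call shows the situation the patch removes: with `x` excluded from xqescape, nothing but the single-character catch-all matches at the backslash.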

Re: Lexer patch question

From: Bruce Momjian
Date: 15 June 2005, 17:36:39

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am confused why the following change Tom made to scan.l works.
> > Isn't that 'x' required so xqescape doesn't match '\x'?
>
> > *** scan.l    2 Jun 2005 01:23:08 -0000    1.123
> > --- scan.l    2 Jun 2005 17:45:17 -0000    1.124
> > ***************
> > *** 193,199 ****
> >   xqstart            {quote}
> >   xqdouble        {quote}{quote}
> >   xqinside        [^\\']+
> > ! xqescape        [\\][^0-7x]
> >   xqoctesc        [\\][0-7]{1,3}
> >   xqhexesc        [\\]x[0-9A-Fa-f]{1,2}
>
> > --- 193,199 ----
> >   xqstart            {quote}
> >   xqdouble        {quote}{quote}
> >   xqinside        [^\\']+
> > ! xqescape        [\\][^0-7]
> >   xqoctesc        [\\][0-7]{1,3}
> >   xqhexesc        [\\]x[0-9A-Fa-f]{1,2}
>
> No; if a match to xqhexesc is possible, the lexer will prefer that match
> because it is longer.  If a match to xqhexesc is not possible --- that
> is, we have \x not followed by a hex digit --- then we *want* xqescape
> to match.  The original coding forced a backup to the <xq>. rule in this
> situation, which is not how we want it to behave.

Oh, I didn't realize lexers would choose the longer token when given
multiple options.


Re: Lexer patch question

From: Tom Lane
Date: 15 June 2005, 17:39:09

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oh, I didn't realize lexers would choose the longer token when given
> multiple options.

See lines 95-100 in scan.l:

 * OK, here is a short description of lex/flex rules behavior.
 * The longest pattern which matches an input string is always chosen.
 * For equal-length patterns, the first occurring in the rules list is chosen.
 * INITIAL is the starting state, to which all non-conditional rules apply.
 * Exclusive states change parsing rules while the state is active.  When in
 * an exclusive state, only those rules defined for that state apply.

            regards, tom lane
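
The first two rules in that comment (longest match wins, then the earliest rule on equal length) can be illustrated with a small Python sketch. This is a simulation under stated assumptions, not how flex is implemented: the rule set (`keyword_select`, `identifier`) and the helper `pick_rule` are hypothetical names invented for the example and do not come from scan.l.

```python
import re

# Hypothetical rules, listed in the order they would appear in a rules section.
RULES = [
    ("keyword_select", r"select"),
    ("identifier", r"[a-z_]+"),
]


def pick_rule(rules, text):
    """Return (rule_name, lexeme) chosen at the start of text.

    The longest match wins; on equal length the rule listed first wins.
    Returns (None, "") when no rule matches.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[:best_len]


# Both rules match all six characters; the first-listed rule is chosen.
print(pick_rule(RULES, "select"))    # ('keyword_select', 'select')
# The identifier rule matches eight characters, so length decides.
print(pick_rule(RULES, "selected"))  # ('identifier', 'selected')
```

Reversing the order of `RULES` changes the result for `"select"` but not for `"selected"`, which is why rule order only matters among matches of equal length.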