Thread: Lexer patch question
I am confused why the following change Tom made to scan.l works.  Isn't
that 'x' required so xqescape doesn't match '\x'?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.123
retrieving revision 1.124
diff -c -c -r1.123 -r1.124
*** scan.l    2 Jun 2005 01:23:08 -0000    1.123
--- scan.l    2 Jun 2005 17:45:17 -0000    1.124
***************
*** 193,199 ****
  xqstart         {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7x]
  xqoctesc        [\\][0-7]{1,3}
  xqhexesc        [\\]x[0-9A-Fa-f]{1,2}
--- 193,199 ----
  xqstart         {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7]
  xqoctesc        [\\][0-7]{1,3}
  xqhexesc        [\\]x[0-9A-Fa-f]{1,2}
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am confused why the following change Tom made to scan.l works.
> Isn't that 'x' required so xqescape doesn't match '\x'?

> *** scan.l    2 Jun 2005 01:23:08 -0000    1.123
> --- scan.l    2 Jun 2005 17:45:17 -0000    1.124
> ***************
> *** 193,199 ****
>   xqstart         {quote}
>   xqdouble        {quote}{quote}
>   xqinside        [^\\']+
> ! xqescape        [\\][^0-7x]
>   xqoctesc        [\\][0-7]{1,3}
>   xqhexesc        [\\]x[0-9A-Fa-f]{1,2}
> --- 193,199 ----
>   xqstart         {quote}
>   xqdouble        {quote}{quote}
>   xqinside        [^\\']+
> ! xqescape        [\\][^0-7]
>   xqoctesc        [\\][0-7]{1,3}
>   xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

No; if a match to xqhexesc is possible, the lexer will prefer that match
because it is longer.  If a match to xqhexesc is not possible --- that
is, we have \x not followed by a hex digit --- then we *want* xqescape
to match.  The original coding forced a backup to the <xq>. rule in this
situation, which is not how we want it to behave.

                        regards, tom lane
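To make the rule-selection argument concrete, here is a minimal Python sketch (not PostgreSQL code) that approximates how flex chooses a rule inside the <xq> state: every rule is tried at the current position, the longest match wins, and a tie goes to the rule listed first. It uses the four xq* patterns from the diff plus a hypothetical catch-all named xq_any standing in for the <xq>. rule; the actions the real scanner runs are not modeled, only which rule wins. Python's backtracking re module returns a greedy match rather than a guaranteed longest one, which coincides with the longest match for these particular patterns but not for regexes in general.

```python
import re

# <xq>-state rules as (name, pattern), in the order used in this sketch.
# "xq_any" is a hypothetical stand-in for the <xq>. catch-all rule.
OLD_RULES = [
    ("xqinside", r"[^\\']+"),
    ("xqescape", r"[\\][^0-7x]"),          # revision 1.123
    ("xqoctesc", r"[\\][0-7]{1,3}"),
    ("xqhexesc", r"[\\]x[0-9A-Fa-f]{1,2}"),
    ("xq_any", r"."),
]

# Revision 1.124: identical except that xqescape no longer excludes 'x'.
NEW_RULES = [
    (name, r"[\\][^0-7]") if name == "xqescape" else (name, pattern)
    for name, pattern in OLD_RULES
]


def pick_rule(rules, text, pos=0):
    """Return (rule_name, lexeme) for the longest match at pos.

    Ties keep the rule listed first, because only a strictly longer
    match replaces the current best.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]


if __name__ == "__main__":
    # "\\x41" is backslash-x-4-1; "\\xg" is backslash-x followed by a non-hex char.
    for text in ["\\x41", "\\xg"]:
        print(repr(text),
              "old:", pick_rule(OLD_RULES, text),
              "new:", pick_rule(NEW_RULES, text))
```

With the old xqescape, "\xg" has no rule longer than one character, so only the catch-all fires on the lone backslash.  With the new pattern, xqescape takes "\x" as a two-character escape, while "\x41" still goes to xqhexesc under both revisions because that match is longer.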
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am confused why the following change Tom made to scan.l works.
> > Isn't that 'x' required so xqescape doesn't match '\x'?
>
> No; if a match to xqhexesc is possible, the lexer will prefer that match
> because it is longer.  If a match to xqhexesc is not possible --- that
> is, we have \x not followed by a hex digit --- then we *want* xqescape
> to match.  The original coding forced a backup to the <xq>. rule in this
> situation, which is not how we want it to behave.

Oh, I didn't realize lexers would choose the longer token when given
multiple options.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oh, I didn't realize lexers would choose the longer token when given
> multiple options.

See lines 95-100 in scan.l:

 * OK, here is a short description of lex/flex rules behavior.
 * The longest pattern which matches an input string is always chosen.
 * For equal-length patterns, the first occurring in the rules list is chosen.
 * INITIAL is the starting state, to which all non-conditional rules apply.
 * Exclusive states change parsing rules while the state is active.  When in
 * an exclusive state, only those rules defined for that state apply.

                        regards, tom lane
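The first two rules in that comment (longest match wins; on equal length, the earlier rule wins) can be checked with a small Python sketch. The rule names and patterns below (kw_if, ident) are hypothetical illustrations and do not come from scan.l, and the selection function only approximates flex, assuming greedy Python regexes yield the longest match for these simple patterns.

```python
import re


def flex_pick(rules, text, pos=0):
    """Return (rule_name, lexeme) as flex would choose at pos.

    The longest match wins; on equal length the earlier rule wins,
    because only a strictly longer match replaces the current best.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]


# Hypothetical rule lists for illustration only; not taken from scan.l.
KEYWORD_FIRST = [("kw_if", r"if"), ("ident", r"[a-z]+")]
IDENT_FIRST = [("ident", r"[a-z]+"), ("kw_if", r"if")]

if __name__ == "__main__":
    # "if" matches both rules with length 2, so list order decides.
    print(flex_pick(KEYWORD_FIRST, "if"), flex_pick(IDENT_FIRST, "if"))
    # ident matches all four characters of "iffy", so it wins on length.
    print(flex_pick(KEYWORD_FIRST, "iffy"))
```

Swapping the two rules changes the winner only in the equal-length case; a strictly longer match wins regardless of where its rule appears.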