Thread: Use of backslash in tsearch2

Use of backslash in tsearch2

From
Bruce Momjian
Date:
Bruce Momjian wrote:
>
> I backed out the patch, attached, and it has fixed the regression
> problem.  What has me confused is that is looks like it is checking for
> ', then putting \, which doesn't make a lot of sense, but the regression
> output is corrected, so I just don't get it.  Here is an example:
>
>     test=> SELECT E'''1 \\''2''';
>      ?column?
>     ----------
>      '1 \'2'
>
> My only guess is that the output is somehow a single-quoted string
> itself, and in fact \' should become ''.  Is that right?  Basically they
> are doing \' in their output, and it should be doing '', but then the
> query above would be wrong and shouldn't be using \'.

As part of the move to support standard-conforming strings and treat
backslash literally, I reviewed the tsearch2 code and found two place
that seemed to use \' rather than '', and generated the attached patch.
('' is standards conforming.)  However, when I fixed the code, the
regression tests failed.

Teodor, are the new attached regression results correct?  If so, I will
apply the patch and update the expected file.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: contrib/tsearch2/query.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v
retrieving revision 1.25
diff -c -c -r1.25 query.c
*** contrib/tsearch2/query.c    19 May 2006 02:38:47 -0000    1.25
--- contrib/tsearch2/query.c    19 May 2006 04:37:35 -0000
***************
*** 748,754 ****
          {
              if ( t_iseq(op, '\'') )
              {
!                 *(in->cur) = '\'';
                  in->cur++;
              }
              COPYCHAR(in->cur,op);
--- 748,754 ----
          {
              if ( t_iseq(op, '\'') )
              {
!                 *(in->cur) = '\\';
                  in->cur++;
              }
              COPYCHAR(in->cur,op);
Index: contrib/tsearch2/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v
retrieving revision 1.18
diff -c -c -r1.18 tsvector.c
*** contrib/tsearch2/tsvector.c    19 May 2006 02:38:47 -0000    1.18
--- contrib/tsearch2/tsvector.c    19 May 2006 04:37:39 -0000
***************
*** 529,535 ****

                  outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
                  curout = outbuf + pos;
!                 *curout++ = '\'';
              }
              while(len--)
                  *curout++ = *curin++;
--- 529,535 ----

                  outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
                  curout = outbuf + pos;
!                 *curout++ = '\\';
              }
              while(len--)
                  *curout++ = *curin++;
*** ./expected/tsearch2.out    Wed May 31 10:05:31 2006
--- ./results/tsearch2.out    Mon Aug 21 20:01:12 2006
***************
*** 59,83 ****
  SELECT E'''1 \\''2'''::tsvector;
   tsvector
  ----------
!  '1 \'2'
  (1 row)

  SELECT E'''1 \\''2''3'::tsvector;
    tsvector
  -------------
!  '3' '1 \'2'
  (1 row)

  SELECT E'''1 \\''2'' 3'::tsvector;
    tsvector
  -------------
!  '3' '1 \'2'
  (1 row)

  SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector;
       tsvector
  ------------------
!  '4' ' 3' '1 \'2'
  (1 row)

  select '''w'':4A,3B,2C,1D,5 a:8';
--- 59,83 ----
  SELECT E'''1 \\''2'''::tsvector;
   tsvector
  ----------
!  '1 ''2'
  (1 row)

  SELECT E'''1 \\''2''3'::tsvector;
    tsvector
  -------------
!  '3' '1 ''2'
  (1 row)

  SELECT E'''1 \\''2'' 3'::tsvector;
    tsvector
  -------------
!  '3' '1 ''2'
  (1 row)

  SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector;
       tsvector
  ------------------
!  '4' ' 3' '1 ''2'
  (1 row)

  select '''w'':4A,3B,2C,1D,5 a:8';
***************
*** 138,144 ****
  SELECT E'''1 \\''2'''::tsquery;
   tsquery
  ---------
!  '1 \'2'
  (1 row)

  SELECT '!1'::tsquery;
--- 138,144 ----
  SELECT E'''1 \\''2'''::tsquery;
   tsquery
  ---------
!  '1 ''2'
  (1 row)

  SELECT '!1'::tsquery;
***************
*** 336,342 ****
  SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery;
                   tsquery
  ------------------------------------------
!  '1' & '2' & ' 4' & ( '|5' | '6 \' !|&' )
  (1 row)

  SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a';
--- 336,342 ----
  SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery;
                   tsquery
  ------------------------------------------
!  '1' & '2' & ' 4' & ( '|5' | '6 '' !|&' )
  (1 row)

  SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a';

======================================================================


Re: Use of backslash in tsearch2

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> As part of the move to support standard-conforming strings and treat
> backslash literally, I reviewed the tsearch2 code and found two place
> that seemed to use \' rather than '', and generated the attached patch.

I thought we had decided that that code should not be changed.  It has
nothing to do with SQL literals, and changing it will create unnecessary
backwards-compatibility problems.

            regards, tom lane

Re: Use of backslash in tsearch2

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > As part of the move to support standard-conforming strings and treat
> > backslash literally, I reviewed the tsearch2 code and found two place
> > that seemed to use \' rather than '', and generated the attached patch.
>
> I thought we had decided that that code should not be changed.  It has
> nothing to do with SQL literals, and changing it will create unnecessary
> backwards-compatibility problems.

I don't remember any comment regarding that.  I think it does relate to
SQL literals because it is creating a literal inside a literal. Also, at
the time this was a core-only discussion and I am hoping from a comment
from the tsearch2 folks.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Use of backslash in tsearch2

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> >
> > I backed out the patch, attached, and it has fixed the regression
> > problem.  What has me confused is that is looks like it is checking for
> > ', then putting \, which doesn't make a lot of sense, but the regression
> > output is corrected, so I just don't get it.  Here is an example:
> >
> >     test=> SELECT E'''1 \\''2''';
> >      ?column?
> >     ----------
> >      '1 \'2'
> >
> > My only guess is that the output is somehow a single-quoted string
> > itself, and in fact \' should become ''.  Is that right?  Basically they
> > are doing \' in their output, and it should be doing '', but then the
> > query above would be wrong and shouldn't be using \'.
>
> As part of the move to support standard-conforming strings and treat
> backslash literally, I reviewed the tsearch2 code and found two place
> that seemed to use \' rather than '', and generated the attached patch.
> ('' is standards conforming.)  However, when I fixed the code, the
> regression tests failed.
>
> Teodor, are the new attached regression results correct?  If so, I will
> apply the patch and update the expected file.

Updated patch attached.  The previous one was reversed.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: contrib/tsearch2/query.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v
retrieving revision 1.26
diff -c -c -r1.26 query.c
*** contrib/tsearch2/query.c    19 May 2006 04:39:47 -0000    1.26
--- contrib/tsearch2/query.c    21 Aug 2006 23:58:20 -0000
***************
*** 748,754 ****
          {
              if ( t_iseq(op, '\'') )
              {
!                 *(in->cur) = '\\';
                  in->cur++;
              }
              COPYCHAR(in->cur,op);
--- 748,754 ----
          {
              if ( t_iseq(op, '\'') )
              {
!                 *(in->cur) = '\'';
                  in->cur++;
              }
              COPYCHAR(in->cur,op);
Index: contrib/tsearch2/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v
retrieving revision 1.20
diff -c -c -r1.20 tsvector.c
*** contrib/tsearch2/tsvector.c    11 Jul 2006 18:26:10 -0000    1.20
--- contrib/tsearch2/tsvector.c    21 Aug 2006 23:58:20 -0000
***************
*** 529,535 ****

                  outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
                  curout = outbuf + pos;
!                 *curout++ = '\\';
              }
              while(len--)
                  *curout++ = *curin++;
--- 529,535 ----

                  outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
                  curout = outbuf + pos;
!                 *curout++ = '\'';
              }
              while(len--)
                  *curout++ = *curin++;

Re: Use of backslash in tsearch2

From
Teodor Sigaev
Date:
>> Teodor, are the new attached regression results correct?  If so, I will
>> apply the patch and update the expected file.

Patch isn't full, simple test (values are took from regression.diffs):
# create table tt (tv tsvector, tq tsquery);
# insert into tt values (E'''1 \\''2''', NULL);
# insert into tt values (E'''1 \\''2''3', NULL);
# insert into tt values ( E'''1 \\''2'' 3', NULL);
# insert into tt values ( E'''1 \\''2'' '' 3'' 4 ', NULL);
# insert into tt values ( NULL, E'''1 \\''2''');
# insert into tt values ( NULL, E'''1 \\''2''');
# insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))');
# insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))');


and try dump table and restore:
ERROR:  syntax error
CONTEXT:  COPY tt, line 5, column tq: "'1 ''2'"

PS I'm not subscribed to -patches, so I post to -hackers

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: [HACKERS] Use of backslash in tsearch2

From
Teodor Sigaev
Date:
> Patch isn't full, simple test (values are took from regression.diffs):
> and try dump table and restore:
> ERROR:  syntax error
> CONTEXT:  COPY tt, line 5, column tq: "'1 ''2'"
>

Attached cumulative patch fixes problem, but I have some doubts, is it really
needed?


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: [HACKERS] Use of backslash in tsearch2

From
Bruce Momjian
Date:
Thanks.  Yes, it is need for two reasons.  In 8.2 you can set
standard_conforming_strings to on, meaning \' is really treated as \ and
', and because some encodings now can't support \' for security reasons,
though I don't think tsearch2 supports those multibyte encodings.
Anyway, applied to 8.2 only, not backpatched.  Thanks.

---------------------------------------------------------------------------

Teodor Sigaev wrote:
> > Patch isn't full, simple test (values are took from regression.diffs):
> > and try dump table and restore:
> > ERROR:  syntax error
> > CONTEXT:  COPY tt, line 5, column tq: "'1 ''2'"
> >
>
> Attached cumulative patch fixes problem, but I have some doubts, is it really
> needed?
>
>
> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                     WWW: http://www.sigaev.ru/

[ application/x-tar is not supported, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +