Thread: Use of backslash in tsearch2
Bruce Momjian wrote: > > I backed out the patch, attached, and it has fixed the regression > problem. What has me confused is that is looks like it is checking for > ', then putting \, which doesn't make a lot of sense, but the regression > output is corrected, so I just don't get it. Here is an example: > > test=> SELECT E'''1 \\''2'''; > ?column? > ---------- > '1 \'2' > > My only guess is that the output is somehow a single-quoted string > itself, and in fact \' should become ''. Is that right? Basically they > are doing \' in their output, and it should be doing '', but then the > query above would be wrong and shouldn't be using \'. As part of the move to support standard-conforming strings and treat backslash literally, I reviewed the tsearch2 code and found two place that seemed to use \' rather than '', and generated the attached patch. ('' is standards conforming.) However, when I fixed the code, the regression tests failed. Teodor, are the new attached regression results correct? If so, I will apply the patch and update the expected file. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: contrib/tsearch2/query.c =================================================================== RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v retrieving revision 1.25 diff -c -c -r1.25 query.c *** contrib/tsearch2/query.c 19 May 2006 02:38:47 -0000 1.25 --- contrib/tsearch2/query.c 19 May 2006 04:37:35 -0000 *************** *** 748,754 **** { if ( t_iseq(op, '\'') ) { ! *(in->cur) = '\''; in->cur++; } COPYCHAR(in->cur,op); --- 748,754 ---- { if ( t_iseq(op, '\'') ) { ! *(in->cur) = '\\'; in->cur++; } COPYCHAR(in->cur,op); Index: contrib/tsearch2/tsvector.c =================================================================== RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v retrieving revision 1.18 diff -c -c -r1.18 tsvector.c *** contrib/tsearch2/tsvector.c 19 May 2006 02:38:47 -0000 1.18 --- contrib/tsearch2/tsvector.c 19 May 2006 04:37:39 -0000 *************** *** 529,535 **** outbuf = (char *) repalloc((void *) outbuf, ++lenbuf); curout = outbuf + pos; ! *curout++ = '\''; } while(len--) *curout++ = *curin++; --- 529,535 ---- outbuf = (char *) repalloc((void *) outbuf, ++lenbuf); curout = outbuf + pos; ! *curout++ = '\\'; } while(len--) *curout++ = *curin++; *** ./expected/tsearch2.out Wed May 31 10:05:31 2006 --- ./results/tsearch2.out Mon Aug 21 20:01:12 2006 *************** *** 59,83 **** SELECT E'''1 \\''2'''::tsvector; tsvector ---------- ! '1 \'2' (1 row) SELECT E'''1 \\''2''3'::tsvector; tsvector ------------- ! '3' '1 \'2' (1 row) SELECT E'''1 \\''2'' 3'::tsvector; tsvector ------------- ! '3' '1 \'2' (1 row) SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector; tsvector ------------------ ! '4' ' 3' '1 \'2' (1 row) select '''w'':4A,3B,2C,1D,5 a:8'; --- 59,83 ---- SELECT E'''1 \\''2'''::tsvector; tsvector ---------- ! '1 ''2' (1 row) SELECT E'''1 \\''2''3'::tsvector; tsvector ------------- ! '3' '1 ''2' (1 row) SELECT E'''1 \\''2'' 3'::tsvector; tsvector ------------- ! '3' '1 ''2' (1 row) SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector; tsvector ------------------ ! '4' ' 3' '1 ''2' (1 row) select '''w'':4A,3B,2C,1D,5 a:8'; *************** *** 138,144 **** SELECT E'''1 \\''2'''::tsquery; tsquery --------- ! '1 \'2' (1 row) SELECT '!1'::tsquery; --- 138,144 ---- SELECT E'''1 \\''2'''::tsquery; tsquery --------- ! '1 ''2' (1 row) SELECT '!1'::tsquery; *************** *** 336,342 **** SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery; tsquery ------------------------------------------ ! '1' & '2' & ' 4' & ( '|5' | '6 \' !|&' ) (1 row) SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a'; --- 336,342 ---- SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery; tsquery ------------------------------------------ ! '1' & '2' & ' 4' & ( '|5' | '6 '' !|&' ) (1 row) SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a'; ======================================================================
Bruce Momjian <bruce@momjian.us> writes: > As part of the move to support standard-conforming strings and treat > backslash literally, I reviewed the tsearch2 code and found two place > that seemed to use \' rather than '', and generated the attached patch. I thought we had decided that that code should not be changed. It has nothing to do with SQL literals, and changing it will create unnecessary backwards-compatibility problems. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > As part of the move to support standard-conforming strings and treat > > backslash literally, I reviewed the tsearch2 code and found two place > > that seemed to use \' rather than '', and generated the attached patch. > > I thought we had decided that that code should not be changed. It has > nothing to do with SQL literals, and changing it will create unnecessary > backwards-compatibility problems. I don't remember any comment regarding that. I think it does relate to SQL literals because it is creating a literal inside a literal. Also, at the time this was a core-only discussion and I am hoping from a comment from the tsearch2 folks. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Bruce Momjian wrote: > > > > I backed out the patch, attached, and it has fixed the regression > > problem. What has me confused is that is looks like it is checking for > > ', then putting \, which doesn't make a lot of sense, but the regression > > output is corrected, so I just don't get it. Here is an example: > > > > test=> SELECT E'''1 \\''2'''; > > ?column? > > ---------- > > '1 \'2' > > > > My only guess is that the output is somehow a single-quoted string > > itself, and in fact \' should become ''. Is that right? Basically they > > are doing \' in their output, and it should be doing '', but then the > > query above would be wrong and shouldn't be using \'. > > As part of the move to support standard-conforming strings and treat > backslash literally, I reviewed the tsearch2 code and found two place > that seemed to use \' rather than '', and generated the attached patch. > ('' is standards conforming.) However, when I fixed the code, the > regression tests failed. > > Teodor, are the new attached regression results correct? If so, I will > apply the patch and update the expected file. Updated patch attached. The previous one was reversed. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: contrib/tsearch2/query.c =================================================================== RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v retrieving revision 1.26 diff -c -c -r1.26 query.c *** contrib/tsearch2/query.c 19 May 2006 04:39:47 -0000 1.26 --- contrib/tsearch2/query.c 21 Aug 2006 23:58:20 -0000 *************** *** 748,754 **** { if ( t_iseq(op, '\'') ) { ! *(in->cur) = '\\'; in->cur++; } COPYCHAR(in->cur,op); --- 748,754 ---- { if ( t_iseq(op, '\'') ) { ! *(in->cur) = '\''; in->cur++; } COPYCHAR(in->cur,op); Index: contrib/tsearch2/tsvector.c =================================================================== RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v retrieving revision 1.20 diff -c -c -r1.20 tsvector.c *** contrib/tsearch2/tsvector.c 11 Jul 2006 18:26:10 -0000 1.20 --- contrib/tsearch2/tsvector.c 21 Aug 2006 23:58:20 -0000 *************** *** 529,535 **** outbuf = (char *) repalloc((void *) outbuf, ++lenbuf); curout = outbuf + pos; ! *curout++ = '\\'; } while(len--) *curout++ = *curin++; --- 529,535 ---- outbuf = (char *) repalloc((void *) outbuf, ++lenbuf); curout = outbuf + pos; ! *curout++ = '\''; } while(len--) *curout++ = *curin++;
>> Teodor, are the new attached regression results correct? If so, I will >> apply the patch and update the expected file. Patch isn't full, simple test (values are took from regression.diffs): # create table tt (tv tsvector, tq tsquery); # insert into tt values (E'''1 \\''2''', NULL); # insert into tt values (E'''1 \\''2''3', NULL); # insert into tt values ( E'''1 \\''2'' 3', NULL); # insert into tt values ( E'''1 \\''2'' '' 3'' 4 ', NULL); # insert into tt values ( NULL, E'''1 \\''2'''); # insert into tt values ( NULL, E'''1 \\''2'''); # insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'); # insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'); and try dump table and restore: ERROR: syntax error CONTEXT: COPY tt, line 5, column tq: "'1 ''2'" PS I'm not subscribed to -patches, so I post to -hackers -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> Patch isn't full, simple test (values are took from regression.diffs): > and try dump table and restore: > ERROR: syntax error > CONTEXT: COPY tt, line 5, column tq: "'1 ''2'" > Attached cumulative patch fixes problem, but I have some doubts, is it really needed? -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
Thanks. Yes, it is need for two reasons. In 8.2 you can set standard_conforming_strings to on, meaning \' is really treated as \ and ', and because some encodings now can't support \' for security reasons, though I don't think tsearch2 supports those multibyte encodings. Anyway, applied to 8.2 only, not backpatched. Thanks. --------------------------------------------------------------------------- Teodor Sigaev wrote: > > Patch isn't full, simple test (values are took from regression.diffs): > > and try dump table and restore: > > ERROR: syntax error > > CONTEXT: COPY tt, line 5, column tq: "'1 ''2'" > > > > Attached cumulative patch fixes problem, but I have some doubts, is it really > needed? > > > -- > Teodor Sigaev E-mail: teodor@sigaev.ru > WWW: http://www.sigaev.ru/ [ application/x-tar is not supported, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +