Thread: tsearch2 regression test failures

tsearch2 regression test failures

From: Magnus Hagander
tsearch2 regression tests are also failing on win32/msvc, with attached
diffs.

Any pointers on where to start? ;)

//Magnus
*** ./expected/tsearch2.out    2006-09-10 19:36:52.000000000 +0200
--- ./results/tsearch2.out    2007-03-24 15:03:01.593750000 +0100
***************
*** 799,806 ****
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
  <i <b> wow  < jqw <> qwerty');



                                                    to_tsvector


                                                                                                                   
!
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  'ad':17 'dw':19 'jf':39 '234':63 '345':1 '4.2':54,55,56,59,62 '455':31 'jqw':66 'qwe':2,18,27,28,35 'wer':36 'wow':65 'asdf':37 'ewr1':43 'qwer':38 'sdjk':40 '5.005':32 'efd.r':3 'ewri2':44 'hjwer':42 'qwqwe':29 'wefjn':48 'gist.c':52 'gist.h':50 'qwerti':67 '234.435':30 'qwe-wer':34 'readlin':53,58,61 'www.com':4 '+4.0e-10':26 'gist.h.c':51 'rewt/ewr':47 '/?ad=qwe&dw':7,10,14,22 '/wqe-324/ewr':49 'aew.werc.ewr':6 'readline-4.2':57,60 '1aew.werc.ewr':9 '2aew.werc.ewr':11 '3aew.werc.ewr':13 '4aew.werc.ewr':15 '/usr/local/fff':45 '/awdf/dwqe/4325':46 'teodor@stack.net':33 '/?ad=qwe&dw=%20%32':25 '5aew.werc.ewr:8100':16 '6aew.werc.ewr:8100':21 '7aew.werc.ewr:8100':24 'aew.werc.ewr/?ad=qwe&dw':5 '1aew.werc.ewr/?ad=qwe&dw':8 '3aew.werc.ewr/?ad=qwe&dw':12 '6aew.werc.ewr:8100/?ad=qwe&dw':20 '7aew.werc.ewr:8100/?ad=qwe&dw=%20%32':23
  (1 row)

  SELECT length(to_tsvector('default', '345 qw'));
--- 799,806 ----
  /usr/local/fff /awdf/dwqe/4325 rewt/ewr wefjn /wqe-324/ewr gist.h gist.h.c gist.c. readline 4.2 4.2. 4.2, readline-4.2 readline-4.2. 234
  <i <b> wow  < jqw <> qwerty');



                                                            to_tsvector



         
!
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  'i':64 'ad':17 'dw':19 'jf':39 'we':41 '234':63 '345':1 '4.2':54,55,56,59,62 '455':31 'jqw':66 'qwe':2,18,27,28,35 'wer':36 'wow':65 'asdf':37 'ewr1':43 'qwer':38 'sdjk':40 '5.005':32 'efd.r':3 'ewri2':44 'hjwer':42 'qwqwe':29 'wefjn':48 'gist.c':52 'gist.h':50 'qwerti':67 '234.435':30 'qwe-wer':34 'readlin':53,58,61 'www.com':4 '+4.0e-10':26 'gist.h.c':51 'rewt/ewr':47 '/?ad=qwe&dw':7,10,14,22 '/wqe-324/ewr':49 'aew.werc.ewr':6 'readline-4.2':57,60 '1aew.werc.ewr':9 '2aew.werc.ewr':11 '3aew.werc.ewr':13 '4aew.werc.ewr':15 '/usr/local/fff':45 '/awdf/dwqe/4325':46 'teodor@stack.net':33 '/?ad=qwe&dw=%20%32':25 '5aew.werc.ewr:8100':16 '6aew.werc.ewr:8100':21 '7aew.werc.ewr:8100':24 'aew.werc.ewr/?ad=qwe&dw':5 '1aew.werc.ewr/?ad=qwe&dw':8 '3aew.werc.ewr/?ad=qwe&dw':12 '6aew.werc.ewr:8100/?ad=qwe&dw':20 '7aew.werc.ewr:8100/?ad=qwe&dw=%20%32':23
  (1 row)

  SELECT length(to_tsvector('default', '345 qw'));
***************
*** 814,820 ****
  <i <b> wow  < jqw <> qwerty'));
   length
  --------
!      51
  (1 row)

  select to_tsquery('default', 'qwe & sKies ');
--- 814,820 ----
  <i <b> wow  < jqw <> qwerty'));
   length
  --------
!      53
  (1 row)

  select to_tsquery('default', 'qwe & sKies ');
***************
*** 831,868 ****

  select to_tsquery('default', '''the wether'':dc & ''           sKies '':BC ');
         to_tsquery
! ------------------------
!  'wether':CD & 'sky':BC
  (1 row)

  select to_tsquery('default', 'asd&(and|fghj)');
     to_tsquery
! ----------------
!  'asd' & 'fghj'
  (1 row)

  select to_tsquery('default', '(asd&and)|fghj');
     to_tsquery
! ----------------
!  'asd' | 'fghj'
  (1 row)

  select to_tsquery('default', '(asd&!and)|fghj');
     to_tsquery
! ----------------
!  'asd' | 'fghj'
  (1 row)

  select to_tsquery('default', '(the|and&(i&1))&fghj');
    to_tsquery
! --------------
!  '1' & 'fghj'
  (1 row)

  select plainto_tsquery('default', 'the and z 1))& fghj');
    plainto_tsquery
! --------------------
!  'z' & '1' & 'fghj'
  (1 row)

  select plainto_tsquery('default', 'foo bar') && plainto_tsquery('default', 'asd');
--- 831,868 ----

  select to_tsquery('default', '''the wether'':dc & ''           sKies '':BC ');
              to_tsquery
! -----------------------------------
!  'the':CD & 'wether':CD & 'sky':BC
  (1 row)

  select to_tsquery('default', 'asd&(and|fghj)');
           to_tsquery
! ----------------------------
!  'asd' & ( 'and' | 'fghj' )
  (1 row)

  select to_tsquery('default', '(asd&and)|fghj');
         to_tsquery
! ------------------------
!  'asd' & 'and' | 'fghj'
  (1 row)

  select to_tsquery('default', '(asd&!and)|fghj');
         to_tsquery
! -------------------------
!  'asd' & !'and' | 'fghj'
  (1 row)

  select to_tsquery('default', '(the|and&(i&1))&fghj');
                 to_tsquery
! ----------------------------------------
!  ( 'the' | 'and' & 'i' & '1' ) & 'fghj'
  (1 row)

  select plainto_tsquery('default', 'the and z 1))& fghj');
            plainto_tsquery
! ------------------------------------
!  'the' & 'and' & 'z' & '1' & 'fghj'
  (1 row)

  select plainto_tsquery('default', 'foo bar') && plainto_tsquery('default', 'asd');
***************
*** 2227,2234 ****
   9h        |    1 |      1
   9r        |    1 |      1
   9w        |    1 |      1
   qwerti    |    1 |      1
! (1146 rows)

  insert into test_tsvector values ('1', 'a:1a,2,3b b:5a,6a,7c,8');
  insert into test_tsvector values ('1', 'a:1a,2,3c b:5a,6b,7c,8b');
--- 2227,2236 ----
   9h        |    1 |      1
   9r        |    1 |      1
   9w        |    1 |      1
+  over      |    1 |      1
   qwerti    |    1 |      1
!  the       |    1 |      1
! (1148 rows)

  insert into test_tsvector values ('1', 'a:1a,2,3b b:5a,6a,7c,8');
  insert into test_tsvector values ('1', 'a:1a,2,3c b:5a,6b,7c,8b');
***************
*** 2262,2270 ****
   bar       |    1 |      2
   345       |    1 |      1
   b         |    1 |      1
   qq        |    1 |      1
   qwerti    |    1 |      1
! (8 rows)

  select * from stat('select a from test_tsvector','ad') order by ndoc desc, nentry desc, word;
     word    | ndoc | nentry
--- 2264,2274 ----
   bar       |    1 |      2
   345       |    1 |      1
   b         |    1 |      1
+  over      |    1 |      1
   qq        |    1 |      1
   qwerti    |    1 |      1
!  the       |    1 |      1
! (10 rows)

  select * from stat('select a from test_tsvector','ad') order by ndoc desc, nentry desc, word;
     word    | ndoc | nentry
***************
*** 2275,2283 ****
   foo       |    1 |      3
   bar       |    1 |      2
   345       |    1 |      1
   qq        |    1 |      1
   qwerti    |    1 |      1
! (8 rows)

  select reset_tsearch();
  NOTICE:  TSearch cache cleaned
--- 2279,2289 ----
   foo       |    1 |      3
   bar       |    1 |      2
   345       |    1 |      1
+  over      |    1 |      1
   qq        |    1 |      1
   qwerti    |    1 |      1
!  the       |    1 |      1
! (10 rows)

  select reset_tsearch();
  NOTICE:  TSearch cache cleaned
***************
*** 2344,2351 ****
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  '), to_tsquery('sea&thousand&years'));
                                                                                               get_covers
                                                                              
!
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  eros took {1 sea thousand year }1 {2 thousand year trace granit featur cliff crag scarp base took sea }2 hour one night hour storm place sculptur granit seam upon woman face e j pratt 1882 1964
  (1 row)

  select get_covers(to_tsvector('Erosion It took the sea a thousand years,
--- 2350,2357 ----
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  '), to_tsquery('sea&thousand&years'));

          get_covers
                              
!
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  eros it took the {1 sea a thousand year }1 a {2 thousand year to trace the granit featur of this cliff in crag and scarp and base it took the sea }2 an hour one night an hour of storm to place the sculptur of these granit seam upon a woman s face e j pratt 1882 1964
  (1 row)

  select get_covers(to_tsvector('Erosion It took the sea a thousand years,
***************
*** 2358,2365 ****
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  '), to_tsquery('granite&sea'));
                                                                                                  get_covers
                                                                                    
!
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  eros took {1 sea thousand year thousand year trace {2 granit }1 featur cliff crag scarp base took {3 sea }2 hour one night hour storm place sculptur granit }3 seam upon woman face e j pratt 1882 1964
  (1 row)

  select get_covers(to_tsvector('Erosion It took the sea a thousand years,
--- 2364,2371 ----
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  '), to_tsquery('granite&sea'));

             get_covers
                                    
!
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  eros it took the {1 sea a thousand year a thousand year to trace the {2 granit }1 featur of this cliff in crag and scarp and base it took the {3 sea }2 an hour one night an hour of storm to place the sculptur of these granit }3 seam upon a woman s face e j pratt 1882 1964
  (1 row)

  select get_covers(to_tsvector('Erosion It took the sea a thousand years,
***************
*** 2372,2379 ****
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  '), to_tsquery('sea'));
                                                                                               get_covers
                                                                              
!
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  eros took {1 sea }1 thousand year thousand year trace granit featur cliff crag scarp base took {2 sea }2 hour one night hour storm place sculptur granit seam upon woman face e j pratt 1882 1964
  (1 row)

  select headline('Erosion It took the sea a thousand years,
--- 2378,2385 ----
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  '), to_tsquery('sea'));

          get_covers
                              
!
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  eros it took the {1 sea }1 a thousand year a thousand year to trace the granit featur of this cliff in crag and scarp and base it took the {2 sea }2 an hour one night an hour of storm to place the sculptur of these granit seam upon a woman s face e j pratt 1882 1964
  (1 row)

  select headline('Erosion It took the sea a thousand years,
***************
*** 2461,2467 ****
  ---------+----------+-------------+------------+-----------+--------------
   default | lword    | Latin word  | Tsearch    | {en_stem} | 'tsearch'
   default | lword    | Latin word  | module     | {en_stem} | 'modul'
!  default | lword    | Latin word  | for        | {en_stem} |
   default | lword    | Latin word  | PostgreSQL | {en_stem} | 'postgresql'
   default | version  | VERSION     | 7.3.3      | {simple}  | '7.3.3'
  (5 rows)
--- 2467,2473 ----
  ---------+----------+-------------+------------+-----------+--------------
   default | lword    | Latin word  | Tsearch    | {en_stem} | 'tsearch'
   default | lword    | Latin word  | module     | {en_stem} | 'modul'
!  default | lword    | Latin word  | for        | {en_stem} | 'for'
   default | lword    | Latin word  | PostgreSQL | {en_stem} | 'postgresql'
   default | version  | VERSION     | 7.3.3      | {simple}  | '7.3.3'
  (5 rows)
***************
*** 2481,2487 ****
   f        |
   f        |
   f        | '345':1 'qwerti':2 'copyright':3
!  f        | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
   f        | 'a':1A,2,3B 'b':5A,6A,7C,8
   f        | 'a':1A,2,3C 'b':5A,6B,7C,8B
   f        | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
--- 2487,2493 ----
   f        |
   f        |
   f        | '345':1 'qwerti':2 'copyright':3
!  f        | 'qq':7 'bar':2,8 'foo':1,3,6 'the':4 'over':5 'copyright':9
   f        | 'a':1A,2,3B 'b':5A,6A,7C,8
   f        | 'a':1A,2,3C 'b':5A,6B,7C,8B
   f        | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'

======================================================================


Re: tsearch2 regression test failures

From: Tom Lane
Magnus Hagander <magnus@hagander.net> writes:
> tsearch2 regression tests are also failing on win32/msvc, with attached
> diffs.
> Any pointers on where to start? ;)

FWIW, it looks like it failed to reject stopwords.  Is it possible you
ran it in an environment that would make it pick the Russian stopword
list?
        regards, tom lane


Re: tsearch2 regression test failures

From: Teodor Sigaev
> FWIW, it looks like it failed to reject stopwords.  Is it possible you
Right.

I suppose the problem is with '\r\n'... Try attached patch.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2/stopword.c.orig    Mon Mar 26 14:25:16 2007
--- ./contrib/tsearch2/stopword.c    Mon Mar 26 14:28:25 2007
***************
*** 47,53 ****

          while (fgets(buf, sizeof(buf), hin))
          {
!             buf[strlen(buf) - 1] = '\0';
              pg_verifymbstr(buf, strlen(buf), false);
              if (*buf == '\0')
                  continue;
--- 47,57 ----

          while (fgets(buf, sizeof(buf), hin))
          {
!             pbuf = buf;
!             while( !isspace( *pbuf ) )
!                 pbuf++;
!             *pbuf = '\0';
!
              pg_verifymbstr(buf, strlen(buf), false);
              if (*buf == '\0')
                  continue;
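
To make the '\r\n' diagnosis concrete: if the stopword file has CR-LF line endings and the CR reaches the buffer, fgets() returns "the\r\n", and removing only the last byte leaves "the\r", which never compares equal to "the". The following is a minimal sketch of that effect, not the actual stopword.c code. The helper names chop_last_byte and matches_stopword are made up for illustration, and the length guard is an addition the original line does not have.

```c
#include <string.h>

/* Hypothetical helper: mimics the pre-patch handling, which drops exactly
 * one trailing byte on the assumption that it is '\n'.  The len > 0 guard
 * is an addition for safety in this sketch. */
static void
chop_last_byte(char *buf)
{
    size_t      len = strlen(buf);

    if (len > 0)
        buf[len - 1] = '\0';
}

/* Hypothetical helper: returns 1 if a raw line, as fgets() would deliver
 * it, equals the given word after chop_last_byte(); 0 otherwise. */
static int
matches_stopword(const char *line, const char *word)
{
    char        buf[64];

    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    chop_last_byte(buf);
    return strcmp(buf, word) == 0;
}
```

With these helpers, matches_stopword("the\n", "the") succeeds, while matches_stopword("the\r\n", "the") fails because the stored entry is "the\r". That is consistent with 'the', 'and' and 'over' showing up in the regression diffs.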

Re: tsearch2 regression test failures

From: Magnus Hagander
On Mon, Mar 26, 2007 at 02:32:26PM +0400, Teodor Sigaev wrote:
> >FWIW, it looks like it failed to reject stopwords.  Is it possible you
> Right.
> 
> I suppose the problem is with '\r\n'... Try attached patch.
> -- 
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                    WWW: http://www.sigaev.ru/

> *** ./contrib/tsearch2/stopword.c.orig    Mon Mar 26 14:25:16 2007
> --- ./contrib/tsearch2/stopword.c    Mon Mar 26 14:28:25 2007
> ***************
> *** 47,53 ****
>   
>           while (fgets(buf, sizeof(buf), hin))
>           {
> !             buf[strlen(buf) - 1] = '\0';
>               pg_verifymbstr(buf, strlen(buf), false);
>               if (*buf == '\0')
>                   continue;
> --- 47,57 ----
>   
>           while (fgets(buf, sizeof(buf), hin))
>           {
> !             pbuf = buf;
> !             while( !isspace( *pbuf ) )
> !                 pbuf++;
> !             *pbuf = '\0';
> ! 
>               pg_verifymbstr(buf, strlen(buf), false);
>               if (*buf == '\0')
>                   continue;


Yup, that solved the problem, thanks.

Wouldn't it be more efficiently written to walk the string backwards until
!isspace instead? Not sure that it matters at all, but then you'll
normally never step over more than two bytes...

//Magnus
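
The backwards walk suggested above could be sketched as follows. This is an illustration only, not code from the tree: the name trim_trailing_space is made up, and the cast to unsigned char is an added precaution, since isspace() is undefined for negative char values.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip trailing whitespace (including "\r\n")
 * in place by walking backwards from the end of the string. */
static void
trim_trailing_space(char *buf)
{
    size_t      len = strlen(buf);

    /* stop at the first non-space byte, or at the start of the buffer */
    while (len > 0 && isspace((unsigned char) buf[len - 1]))
        len--;
    buf[len] = '\0';
}
```

Note one behavioral difference from the forward scan in the patch: this version keeps any whitespace inside the line and removes only what trails it, whereas the forward scan cuts the line at the first whitespace character.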


Re: tsearch2 regression test failures

From: Teodor Sigaev
> Yup, that solved the problem, thanks.
I'll commit an extended patch - there is one more place with the same bug.

> Wouldn't it be more efficiently written to walk the string backwards until
> !isspace instead? Not sure that it matters at all, but then you'll
> normally never step over more than two bytes...
It doesn't matter significantly - the file is read once per backend lifetime.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsearch2 regression test failures

From: Magnus Hagander
> > Yup, that solved the problem, thanks.
> I'll commit an extended patch - there is one more place with the same bug.

Ok, thanks.

> > Wouldn't it be more efficiently written to walk the string backwards until
> > !isspace instead? Not sure that it matters at all, but then you'll
> > normally never step over more than two bytes...
> It doesn't matter significantly - the file is read once per backend lifetime.
> 

ok.

/Magnus



Re: tsearch2 regression test failures

From: Tom Lane
Teodor Sigaev <teodor@sigaev.ru> writes:
> !             pbuf = buf;
> !             while( !isspace( *pbuf ) )
> !                 pbuf++;
> !             *pbuf = '\0';

Surely the loop needs to look like

    while (*pbuf && !isspace(*pbuf))
        pbuf++;

        regards, tom lane
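
For illustration, here is the corrected forward scan wrapped in a standalone function. Without the *pbuf test, a line with no whitespace at all (for example a last line lacking a trailing newline) would send the loop past the terminating NUL. The function name truncate_at_space is hypothetical, and the unsigned char cast is an addition not present in the patch.

```c
#include <ctype.h>

/* Hypothetical helper: cut the string at the first whitespace character,
 * stopping at the terminating NUL if no whitespace is found. */
static void
truncate_at_space(char *buf)
{
    char       *pbuf = buf;

    /* the *pbuf test keeps the scan inside the string */
    while (*pbuf && !isspace((unsigned char) *pbuf))
        pbuf++;
    *pbuf = '\0';
}
```

Applied to "the\r\n" this leaves "the", and applied to "the" with no line ending it leaves the string unchanged instead of reading beyond it.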


Re: tsearch2 regression test failures

From: Andrew Dunstan
Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>   
>> !             pbuf = buf;
>> !             while( !isspace( *pbuf ) )
>> !                 pbuf++;
>> !             *pbuf = '\0';
>>     
>
> Surely the loop needs to look like
>
>     while (*pbuf && !isspace(*pbuf))
>         pbuf++;
>
>   
Yes.

But in any case, I am having difficulty understanding why we are
seeing a CR at all - the file should be opened in text mode, which
should translate CR-LF in the file to a simple LF in the buffer. So
regardless of the odd behavior of CVSNT, which presumably caused this
mess, it's rather strange.

Can someone please explain?

cheers

andrew