Thread: Regexps -- too complex?

Regexps -- too complex?

From
"Emils Klotins"
Date:
Running 7.0.2 on Alpha/RedHat 6.2 256MB RAM

In order to implement a full-text search, I have a function that parses a
list of words and creates a regexp query with things like [[:<:]]( word |
word | ... )[[:>:]]

That query is then passed to the backend...
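For readers following along, here is a minimal sketch of the kind of helper described above. The original function is not shown in this post, so the name build_word_regexp, the upper-casing step, and the escaping rule are all assumptions made for illustration. [[:<:]] and [[:>:]] are the word-boundary markers from the pattern quoted above.

```python
import re

def build_word_regexp(words):
    """Build a pattern like [[:<:]](BLACK|SERIOUS|SAM)[[:>:]] from a word list.

    Hypothetical helper: the poster's real function is not shown in the thread.
    """
    # Drop empty entries and upper-case the rest, since the queries in this
    # thread compare against an upper-cased title.
    cleaned = [w.strip().upper() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("need at least one word")
    # Backslash-escape regexp metacharacters so a search word cannot change
    # the structure of the pattern (an assumed precaution).
    escaped = [re.sub(r'([\\.^$*+?()\[\]{}|])', r'\\\1', w) for w in cleaned]
    return "[[:<:]](" + "|".join(escaped) + ")[[:>:]]"

print(build_word_regexp(["black", "serious", "sam"]))
```

The resulting string would then be used as the right-hand side of the ~ operator in the query.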

Now the strange thing:

gamenet=# SELECT id, title,publishdate,categoryid FROM articles WHERE
translate(title,'abcdefghijklmnopqrstuvwxyzâèçìîíïòðûþõäöü','ABCDEFGHIJKLMNOPQRSTUVWXYZÂÈÇÌÎÍÏÒÐÛÞÕÄÖÜ')~
'(BLACK|SERIOUS|SAM)[[:>:]]'::text;
  id  |                      title                      | publishdate | categoryid
------+-------------------------------------------------+-------------+------------
  600 | Serious Sam ceïâ pie pircçjiem                  | 2001-03-22  |        149
  523 | Black & White gaidîðanas svçtki                 | 2001-03-19  |        155
  241 | Lorgaine: The Black Standard - íeltu varoòeposs | 2001-02-27  |        155
  707 | Lorgaine: The Black Standard beta versija       | 2001-03-23  |        156
 1484 | Black&White tomçr neesot spiegu programma       | 2001-04-18  |        155
 1490 | Black & White FAQ                               | 2001-04-18  |        160
 1496 | Black & White "ïaunais" FAQ                     | 2001-04-18  |        160
 1732 | Black & White - pârdotâkâ spçle ASV             | 2001-04-24  |        155
(8 rows)


gamenet=# SELECT id, title,publishdate,categoryid FROM articles WHERE
translate(title,'abcdefghijklmnopqrstuvwxyzâèçìîíïòðûþõäöü','ABCDEFGHIJKLMNOPQRSTUVWXYZÂÈÇÌÎÍÏÒÐÛÞÕÄÖÜ')~
'(BLACK|SERIOUS|WHITE|SAM)[[:>:]]'::text;
 id | title | publishdate | categoryid
----+-------+-------------+------------
(0 rows)


It seems that if the regexp is too complex (more than 3 |-ed
elements), the query returns no rows.

Any ideas?



Re: Regexps -- too complex?

From
"Emils Klotins"
Date:
> SELECT id, title,publishdate,categoryid FROM articles WHERE
> upper(title) ~ '(BLACK|SERIOUS|SAM)[[:>:]]'::text ;
> 
> I think the problem is in translate, not in regexp
> 
> If you have installed the appropriate character encoding in Postgres,
> 'upper' will work!
> 
>  Vladimir

Thanks for the advice, unfortunately, it does not seem to work that 
way.


CREATE TABLE "test" (
    "title" text
);
COPY "test" FROM stdin;
Serious Sam ceïâ pie pircçjiem
Black & White gaidîðanas svçtki
Lorgaine: The Black Standard - íeltu varoòeposs
Lorgaine: The Black Standard beta versija
Black&White tomçr neesot spiegu programma
Black & White FAQ
Black & White "ïaunais" FAQ
Black & White - pârdotâkâ spçle ASV
\.


SELECT title FROM test WHERE title ~ '(BLACK|WHITE|SAM)';

yields 8 rows.

SELECT title FROM test WHERE title ~ 
'(BLACK|WHITE|blahblah|SAM)'; 

yields 0 rows!

SELECT title FROM test WHERE title ~ '(BLACK|WHITE|SAM) *'; 
also yields 0 rows!


I don't think this is right, no matter what characters I am using
there. At least it shouldn't behave this way, should it?

Emils


Re: Regexps -- too complex?

From
Tom Lane
Date:
"Emils Klotins" <emils@grafton.lv> writes:
> Running 7.0.2 on Alpha/RedHat 6.2 256MB RAM

Update to 7.1.  7.0.* has a lot of portability problems on Alphas,
and one of them is that regexps with between 33 and 64 states don't
work (int vs long problem...)
        regards, tom lane
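
To illustrate the general class of "int vs long" bug mentioned above, here is a hedged sketch. It is not the backend's actual source. It assumes the matcher keeps its set of active states as bits in a 64-bit long, and that a bit is set using a constant that is only 32 bits wide (as an int is on Alpha). The real C behaviour of such a shift is undefined, so the truncation is modelled explicitly here. The function names are hypothetical.

```python
INT_BITS = 32   # assumed width of a C int on Alpha
LONG_BITS = 64  # assumed width of a C long on Alpha

def set_bit_buggy(states, n):
    # Models "states |= 1 << n" where the constant 1 is a 32-bit int:
    # the shifted value is truncated to 32 bits before being OR-ed in.
    return states | ((1 << n) & (2**INT_BITS - 1))

def set_bit_fixed(states, n):
    # Models "states |= 1L << n": the shift is done at full 64-bit width.
    return states | ((1 << n) & (2**LONG_BITS - 1))

def has_state(states, n):
    # True if state number n is present in the bit set.
    return ((states >> n) & 1) == 1

for n in (5, 31, 32, 40, 63):
    buggy = has_state(set_bit_buggy(0, n), n)
    fixed = has_state(set_bit_fixed(0, n), n)
    print(n, "buggy:", buggy, "fixed:", fixed)
```

Under these assumptions, state numbers 0 through 31 are recorded correctly, while 32 through 63 are silently dropped by the buggy variant. A pattern small enough to need at most 32 states would work, and one needing 33 to 64 states could never reach its later states, which is consistent with adding one more alternative turning 8 rows into 0.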