Thread: indices: ~* / text_ops

indices: ~* / text_ops

From
Bernhard Lorenz
Date:
hullo,

ive two serious problems with postgresql (latest version):

1.) i seem to be unable (well, i _am_ ;-) to create an index
    on a text field and then have that index being used if
    i perform a search using ~*, ~~, and related operators.
    i found out that these operators can only successfully
    be implemented if i use box* field types etc. this is a real
    pain, since i have that database with more than 70,000
    entries and it always does a sequential scan.
    a query might look like
       "select * from table where field ~* 'string'" or
       "select * from table where field ~~ '%string%' etc.
    there are indices (hash, btree, (field text_ops)), but
    they wont be used.
    can anybody of you possibly help me any further on this issue?

2.) another thing i noticed is that while "~*" is supposed to
    perform a case insensitive search, it does not. i havent
    checked too much into the bug behind it (like "all ascii
    values > 127 ..."), but the typical austrian and german
    characters like "ae" with two dots (forgot the terminologically
    exact name, forgive me) etc.  will not be searched
    properly, thus, if i have a field entry with
       'AEyadayada'
    and perform a search with
    ... ~* 'aeyadayada',
    it wont find anything, i have to use ~* 'AEyadayada'.
    opinions on that one woul dbe appreciated as well!



best regards,

+bl.

--
Bernhard Lorenz                           Managing Partner
!C Internet Consult                http://www.iconsult.at/
Pacassistrasse 32, A-1130 Wien             +43/1/319 09 90
Aichholzgasse 6/5, A-1120 Wien             +43/1/817 39 23

Re: [HACKERS] indices: ~* / text_ops

From
Bruce Momjian
Date:
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> hullo,
>
> ive two serious problems with postgresql (latest version):
>
> 1.) i seem to be unable (well, i _am_ ;-) to create an index
>     on a text field and then have that index being used if
>     i perform a search using ~*, ~~, and related operators.
>     i found out that these operators can only successfully
>     be implemented if i use box* field types etc. this is a real
>     pain, since i have that database with more than 70,000
>     entries and it always does a sequential scan.
>     a query might look like
>        "select * from table where field ~* 'string'" or
>        "select * from table where field ~~ '%string%' etc.
>     there are indices (hash, btree, (field text_ops)), but
>     they wont be used.
>     can anybody of you possibly help me any further on this issue?

Indexes on strings can only be used if the start of the string is
anchored with ^ because the index only sorts the strings starting with
the first character.

I have added this to the FAQ, which is on the web site.

>
> 2.) another thing i noticed is that while "~*" is supposed to
>     perform a case insensitive search, it does not. i havent
>     checked too much into the bug behind it (like "all ascii
>     values > 127 ..."), but the typical austrian and german
>     characters like "ae" with two dots (forgot the terminologically
>     exact name, forgive me) etc.  will not be searched
>     properly, thus, if i have a field entry with
>        'AEyadayada'
>     and perform a search with
>     ... ~* 'aeyadayada',
>     it wont find anything, i have to use ~* 'AEyadayada'.
>     opinions on that one woul dbe appreciated as well!
>

Not sure how we handle this.  I don't think we handle this, but we
should, but it is hard.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026