Thread: Does pgsql's regex processor optimize Common-Prefix?

From: Kurapica

Hi all.
I am developing an application that searches for city names in a
column. There are a lot of cities, and I have to run a LIKE for every
name, which is not efficient enough. So I want to know whether pgsql's
regex processor can optimize regexes such as:

Nebraska|Nevada|North Carolina
to
N(e(braska|vada)|orth Carolina)

If the processor can do that, like a dictionary tree (trie), it may be
fast enough for me; otherwise I will have to write a matcher myself.
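
The factoring described above can be sketched outside the database. This
is a minimal Python sketch under stated assumptions, not something
PostgreSQL's regex engine is known to do: build_trie and trie_to_regex
are hypothetical helper names, and the generated pattern uses Python's
re syntax (non-capturing groups, escaped spaces).

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Render a trie node as a regex with shared prefixes factored out."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        # A single continuation needs no grouping.
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word ends here, everything after this point is optional.
    return body + "?" if is_end else body


names = ["Nebraska", "Nevada", "North Carolina"]
pattern = trie_to_regex(build_trie(names))
matcher = re.compile(pattern)
```

Calling matcher.fullmatch(name) then tests a value against all names in
one pass. Whether this is faster than a plain alternation depends on the
regex engine, so it is worth measuring before relying on it.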

Any suggestions are appreciated. Thank you, and I apologize for my poor English.

--Xig

Re: Does pgsql's regex processor optimize Common-Prefix?

From: Alvaro Herrera

Kurapica wrote:

> I am developing an application that searches for city names in a
> column. There are a lot of cities, and I have to run a LIKE for every
> name, which is not efficient enough. So I want to know whether pgsql's
> regex processor can optimize regexes such as:
>
> Nebraska|Nevada|North Carolina
> to
> N(e(braska|vada)|orth Carolina)
>
> If the processor can do that, like a dictionary tree (trie), it may be
> fast enough for me; otherwise I will have to write a matcher myself.
>
> Any suggestions are appreciated. Thank you, and I apologize for my poor English.

Compared to the use of indexes to skip whole table scanning, this
optimization is going to have very little impact.  So don't worry about
it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Does pgsql's regex processor optimize Common-Prefix?

From: Oleg Bartunov

Kurapica,

I'd use contrib/pg_trgm for your application.
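
For readers unfamiliar with trigram matching, here is a minimal Python
sketch of the idea behind pg_trgm. It is an approximation under stated
assumptions, not the module's actual code: the helper names trigrams and
similarity are chosen for illustration, words are lowercased and padded
with two leading spaces and one trailing space, and similarity is taken
as shared trigrams divided by total distinct trigrams.

```python
import re


def trigrams(text):
    """Approximate trigram set: lowercase, split into words, pad each word."""
    result = set()
    # [^\W_]+ matches runs of letters and digits (no underscore).
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Under this scheme a misspelled name such as "Nevad" still scores close
to "Nevada", which is what makes the approach useful for name lookup.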

Олег
On Tue, 26 Dec 2006, Alvaro Herrera wrote:

> Kurapica wrote:
>
>> I am developing an application that searches for city names in a
>> column. There are a lot of cities, and I have to run a LIKE for every
>> name, which is not efficient enough. So I want to know whether pgsql's
>> regex processor can optimize regexes such as:
>>
>> Nebraska|Nevada|North Carolina
>> to
>> N(e(braska|vada)|orth Carolina)
>>
>> If the processor can do that, like a dictionary tree (trie), it may be
>> fast enough for me; otherwise I will have to write a matcher myself.
>>
>> Any suggestions are appreciated. Thank you, and I apologize for my poor English.
>
> Compared to the use of indexes to skip whole table scanning, this
> optimization is going to have very little impact.  So don't worry about
> it.
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Does pgsql's regex processor optimize Common-Prefix?

From: Tom Lane

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Kurapica wrote:
>> So I want to know whether pgsql's regex
>> processor can optimize regexes such as:
>> Nebraska|Nevada|North Carolina
>> to
>> N(e(braska|vada)|orth Carolina)

> Compared to the use of indexes to skip whole table scanning, this
> optimization is going to have very little impact.  So don't worry about
> it.

Well, if you were able to extract a long enough common prefix to make an
index optimization possible/useful, then it would have some value.  But
that seems unlikely.  What I think would be considerably more
interesting is a conversion to an OR form:
    state ~ '(^Nebraska)|(^Nevada)|(^North Carolina)'
to
    state ~ '^Nebraska' OR state ~ '^Nevada' OR state ~ '^North Carolina'

which could be planned as three separate, very-selective indexscans ---
unlike the rewritten version proposed above.
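
Until the planner does such a conversion itself, an application can
build the OR form by hand. The following is a minimal Python sketch
under stated assumptions: or_of_prefix_matches is a hypothetical helper,
the %s placeholders assume a DB-API driver that uses that parameter
style, and the column name is interpolated directly, so it must come
from trusted code. Whether each branch actually gets an index scan
depends on the index; in a non-C locale that typically means an index
built with the text_pattern_ops operator class.

```python
import re

# Characters with special meaning in a regex; each is backslash-escaped
# so that every prefix is matched literally.
_REGEX_SPECIALS = re.compile(r"([.^$*+?()\[\]{}|\\])")


def or_of_prefix_matches(column, prefixes):
    """Return (sql_fragment, params) for: col ~ '^p1' OR col ~ '^p2' ..."""
    if not prefixes:
        raise ValueError("need at least one prefix")
    patterns = ["^" + _REGEX_SPECIALS.sub(r"\\\1", p) for p in prefixes]
    clause = " OR ".join(f"{column} ~ %s" for _ in patterns)
    return clause, patterns


clause, params = or_of_prefix_matches(
    "state", ["Nebraska", "Nevada", "North Carolina"]
)
# clause and params can then be appended to a WHERE and passed to the
# driver's execute() call.
```

Each anchored pattern is passed as a separate parameter, so the query
has one simple, left-anchored predicate per name.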

But Oleg's suggestion of using pg_trgm or some other full-text searching
mechanism is probably at least as good, and it requires no new coding.

            regards, tom lane