Re: non-static LIKE patterns - Mailing list pgsql-general

From hamann.w@t-online.de
Subject Re: non-static LIKE patterns
Date
Msg-id wolfgang-1120414084536.A0226706@amadeus3.local
Whole thread Raw
In response to Re: non-static LIKE patterns  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
>>
>> hamann.w@t-online.de writes:
>> > Tom Lane wrote:
>> > If you want it to be bulletproof, what I'd think about is something like
>> >     WHERE second.path LIKE quote_like(first.path)||'%'
>>
>> > Just out of curiosity: wouldn't that (as well as using non-static like)
>> > be an enormous performance problem?
>>
>> Well, it won't be free, but I think you've already doomed yourself to
>> a not-very-bright plan by using LIKE in this way at all.
>>
>> In any case, as a wise man once said, you can make it run arbitrarily
>> fast if it doesn't have to give the right answer.  Correctness trumps
>> any micro-optimization questions, so if you have to have prefix matching
>> of this sort, it's gonna cost ya somehow.
>>

Hi Tom,

I just stumbled across this question because I regularly come across problems that,
at first, look like they should be solved with non-static LIKE or REGEX patterns

I actually have two situations where I would need a better plan. One is, fortunately,
fairly static (mostly lookups, hardly inserts) for name matches. Many famous people
appear in different spellings, say these two musicians
Franz|Ferenc Liszt
Fr(e|y)der(ic|yk) Chopin
So the first plan would be to regex-compare the sought name against the first name (or last name)
regexes. Run-time is astronomical, though
My current approach is to
a) keep the regexes in a separate table/column, so names with a regex entry are handled in
a smaller query
b) reverse the query: for every regex (they are well-behaved in this context) I pre-create a pattern
so that my actual query becomes
where pre-made-pattern ~ searched_name
c) while preparing the pattern, a common initial character (the "F" for Franz and Ferenc) is identified
to build an index. In the rare case that the first letter is already different, there would be
two entries in the table. So the actual query can check for first letter before it does the regex.

The other situation, unfortionately, is ad-hoc queries where I cannot do that kind of preparation
typically, the DB would contain strings like XY4711A, XY271, XY17321AAA, and I want to
check whether an input like XY17321 matches a database entry up to the end of the
numerals. So I add [^0-9]*$ to the end of my candidates, select
  where candidate ~ entry-in-table
and go for a coffee or two

Of course I would prefer to see a pre-built solution do all that mess for me...

Regards
Wolfgang Hamann


pgsql-general by date:

Previous
From: Abel Abraham Camarillo Ojeda
Date:
Subject: Re: Horizontal aggregation?
Next
From: hamann.w@t-online.de
Date:
Subject: Re: Horizontal aggregation?