Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION? - Mailing list pgsql-general

From David Rowley
Subject Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?
Date
Msg-id CAApHDvruPCe3Sz5M=y=zzgT71Xj8chwD-MvXTih9b934V8Wf1w@mail.gmail.com
Whole thread Raw
In response to Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-general
On Thu, 9 Feb 2023 at 14:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Tom's argument seems to think it's impossible, so if you find that
> > it's definitely not impossible, then you can assume he's wrong about
> > that.
>
> My point was that it seems like you'd need a separate BMH engine for
> each %-separated segment of the LIKE pattern.  I'm not quite clear on
> whether BMH can handle '_' (single-char wildcard) conveniently by
> itself, although my gut feel is that you can probably make that part
> work.  Maybe you can even extend the idea to embedded %, but that
> seems more difficult.

Yeah, I think to make it work with more complex patterns like
'%some%string%' or '%some_string%' you'd need to break that into
multiple independent searches for each portion between a wildcard
character.  For the former pattern, you'd need to do some final check
that ensures that the 2nd pattern was found in some position >= the
position of the 1st pattern + its length.  For the latter, the final
check would need to validate that the 2nd pattern was found at a
position of the first pattern + its length + 1. It's probably also
possible to make those patterns work when they don't contain the
leading and trailing % by checking that the first pattern is found at
position 0 and the end of the 2nd pattern is found at the end of the
search string.

However, I imagine going to the trouble of trying to make it work for
more complex patterns initially would be a bad idea.  I imagine there
are just too many cases where we could demonstrate performance
regressions and that would cause us to reject the patch.

David



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?
Next
From: Vladimir Sitnikov
Date:
Subject: Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?