The function
static int32 text_position(text *t1, text *t2, int matchnum)
defined in src/backend/utils/adt/varlena.c uses repeated calls to strncmp (or
pg_wchar_strncmp) to find the location of the pattern in the text. The worst
case runtime for such an approach is O(n*m) where n and m are the lengths of the
pattern and text. The best case would be O(n), I guess, because it only takes n
comparisons to find the pattern at the very start of the text. I'm not sure how
to determine the average case runtime, because it depends what your data looks
like on average.
The fastest worst-case algorithmic solutions (boyer-moore and similar) have an
O(n+m) worst-case runtime.
In practice, the current iterative strncmp solution may be faster than the
boyer-moore solution for average data, because the data may not be constructed
in such a way as to trigger worst-case or nearly worst-case run times. The
current solution also has the advantage of not requiring a palloc of O(n) space
for the pattern preprocessing that boyer-moore requires.
My question is, would the core development team entertain the idea of changing
this function if I supplied the new code?
Thanks,
mark