Home > mailing lists

Re: text_position worst case runtime - Mailing list pgsql-hackers

From	Mark Dilger
Subject	Re: text_position worst case runtime
Date	May 18, 2006 22:49:55
Msg-id	446D2432.3030201@markdilger.com Whole thread Raw
In response to	Re: text_position worst case runtime (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: text_position worst case runtime
List	pgsql-hackers

Tree view

Tom Lane wrote:
> Mark Dilger <pgsql@markdilger.com> writes:
> 
>>The function
>>  static int32 text_position(text *t1, text *t2, int matchnum)
>>defined in src/backend/utils/adt/varlena.c uses repeated calls to
>>strncmp (or pg_wchar_strncmp) to find the location of the pattern in
>>the text.  The worst case runtime for such an approach is O(n*m) where
>>n and m are the lengths of the pattern and text.  The best case would
>>be O(n), I guess, because it only takes n comparisons to find the
>>pattern at the very start of the text.  I'm not sure how to determine
>>the average case runtime, because it depends what your data looks like
>>on average.
> 
> 
> I would think that the worst-case times would be fairly improbable.
> I'm disinclined to push something as complicated as Boyer-Moore matching
> into this function without considerable evidence that it's a performance
> bottleneck for real applications.

A common approach in biological data applications is to store nucleic and amino
acid sequences as text in a relational database.  The smaller alphabet sizes and
the tendency for redundancy in these sequences increases the likelihood of a
performance problem.  I have solved this problem by writing my own data types
with their own functions for sequence comparison and alignment, and I used
boyer-moore for some of that work.  Whether the same technique should be used
for the text and varchar types was unclear to me, hence the question.

> The question that comes to mind for me is why we're not using simple
> strncmp in all cases in that code.  The conversion to pg_wchar format
> looks expensive and unnecessary ...

pgsql-hackers by date:

From: Mark Kirkwood
Date: 18 May 2006, 22:42:06
Subject: Re: [OT] MySQL is bad, but THIS bad?

From: Tom Lane
Date: 18 May 2006, 23:33:06
Subject: TupleDesc refcounting, again

Re: text_position worst case runtime - Mailing list pgsql-hackers

Previous

Next