Re: looking for a faster way to do that - Mailing list pgsql-general

From Eduardo Morras
Subject Re: looking for a faster way to do that
Date
Msg-id 4E5EF2B0006A4305@
Whole thread Raw
In response to Re: looking for a faster way to do that  (hamann.w@t-online.de)
List pgsql-general
At 18:18 26/09/2011, you wrote:

>Eduardo Morras <nec556@retena.com> wrote:
>
> >>
> >> At 08:04 25/09/2011, hamann.w@t-online.de wrote:
> >>
> >> > >> select * from items where regexp_matches(items.code,'(ABC) (DE1)
> >> > >> (any_substring)')<>{};
> >> > >>
> >> >
> >> >Hi Eduardo,
> >> >
> >> >it is clear that scanning the table once with a list of matches will
> >> >outperform
> >> >rescanning the table for every string wanted. Now, my problem is
> >> >that the patterns are
> >> >dynamic as well. So if I could translate a table with one
> >> >column  and a few thousand rows
> >> >into something like
> >> >regexp_matches(code,'string1|string2|.....string2781')
> >> >would ideally be a performant query. Unfortunately I have no idea
> >> >how I could achieve this
> >> >transformation inside the database. Doing it externally fails,
> >> >because any single query cannot
> >> >be more than so few characters.
> >>
> >> You can create a plsql function and pass a setof text that do it.
> >> Sorry but instead saying you What Must You Type, WMYT(c), i prefer
> >> the How Should You Do way, HSYD(c). Note that you can get the same
> >> results using other approachs (f.ex. using FTS described in chapter 12)
> >>
> >> Check this topics:
> >>
> >> Function
> >>
> Creation  http://www.postgresql.org/docs/9.0/static/sql-createfunction.html
> >>
> >> Tutorial about Function
> >> Creation  http://www.adderpit.com/practical-postgresql/x10374.htm
> >>
>
>Hi,
>
>I tried the pl/sql approach to convert the contents of that patterns
>table into a regex.
>Results: 40 seconds runtime for 9500 candidates and 815 patterns
>718 seconds for the same set of 9500 candidates, but using 4000
>patterns instead.
>So it seems that I am reaching limits of pattern match

Perhaps calling the function twice with half the values go faster.
How do you call the function? EXECUTE or SELECT? If you use EXECUTE
then the prepared plan in a previous call is ignored and is usually
faster. Don't know if in your case it run faster but you can try it.

>As for the fulltext index (and the underlying tsquery): this is an
>exact match rather than prefix
>match, so I would need to know match patterns in advance in order to
>build the index
>
>I am thinking about that anyway (because ABC1234 likely should not
>match ABC123 pattern
>in my context), but I would sort of prefer a system where I can
>state the rules when I
>see the data set, rather than having to pre-create an index.
>
>Thanks for the tutorial link :)

It's the one i used time ago. A bit old but very good one.

>It seems that the responses on my post give all sorts of input that
>will help me on other
>tasks

>Regards
>Wolfgang Hamann

pgsql-general by date:

Previous
From: "Edson Carlos Ericksson Richter"
Date:
Subject: RES: Does postgresql 9.0.4 use index on PREFIX%SUFFIX queries?
Next
From: Alan Hodgson
Date:
Subject: Re: : PostgreSQL Online Backup