Home > mailing lists

Re: looking for a faster way to do that - Mailing list pgsql-general

From	hamann.w@t-online.de
Subject	Re: looking for a faster way to do that
Date	September 26, 2011 13:18:21
Msg-id	wolfgang-1110926181807.A0430472@amadeus3.local Whole thread Raw
In response to	looking for a faster way to do that (hamann.w@t-online.de)
Responses	Re: looking for a faster way to do that Re: looking for a faster way to do that
List	pgsql-general

Tree view

Eduardo Morras <nec556@retena.com> wrote:

>>
>> At 08:04 25/09/2011, hamann.w@t-online.de wrote:
>>
>> > >> select * from items where regexp_matches(items.code,'(ABC) (DE1)
>> > >> (any_substring)')<>{};
>> > >>
>> >
>> >Hi Eduardo,
>> >
>> >it is clear that scanning the table once with a list of matches will
>> >outperform
>> >rescanning the table for every string wanted. Now, my problem is
>> >that the patterns are
>> >dynamic as well. So if I could translate a table with one
>> >column  and a few thousand rows
>> >into something like
>> >regexp_matches(code,'string1|string2|.....string2781')
>> >would ideally be a performant query. Unfortunately I have no idea
>> >how I could achieve this
>> >transformation inside the database. Doing it externally fails,
>> >because any single query cannot
>> >be more than so few characters.
>>
>> You can create a plsql function and pass a setof text that do it.
>> Sorry but instead saying you What Must You Type, WMYT(c), i prefer
>> the How Should You Do way, HSYD(c). Note that you can get the same
>> results using other approachs (f.ex. using FTS described in chapter 12)
>>
>> Check this topics:
>>
>> Function
>> Creation  http://www.postgresql.org/docs/9.0/static/sql-createfunction.html
>>
>> Tutorial about Function
>> Creation  http://www.adderpit.com/practical-postgresql/x10374.htm
>>

Hi,

I tried the pl/sql approach to convert the contents of that patterns table into a regex.
Results: 40 seconds runtime for 9500 candidates and 815 patterns
718 seconds for the same set of 9500 candidates, but using 4000 patterns instead.
So it seems that I am reaching limits of pattern match


As for the fulltext index (and the underlying tsquery): this is an exact match rather than prefix
match, so I would need to know match patterns in advance in order to build the index

I am thinking about that anyway (because ABC1234 likely should not match ABC123 pattern
in my context), but I would sort of prefer a system where I can state the rules when I
see the data set, rather than having to pre-create an index.

Thanks for the tutorial link :)
It seems that the responses on my post give all sorts of input that will help me on other
tasks

Regards
Wolfgang Hamann

pgsql-general by date:

From: Andy Chambers
Date: 26 September 2011, 13:05:26
Subject: Batching up data into groups of n rows

From: hamann.w@t-online.de
Date: 26 September 2011, 13:18:25
Subject: Re: looking for a faster way to do that

Re: looking for a faster way to do that - Mailing list pgsql-general

Previous

Next