Re: looking for a faster way to do that - Mailing list pgsql-general

From hamann.w@t-online.de
Subject Re: looking for a faster way to do that
Date
Msg-id wolfgang-1110925080443.A0225425@amadeus3.local
In response to Re: looking for a faster way to do that  (Eduardo Morras <nec556@retena.com>)
Responses Re: looking for a faster way to do that
Re: looking for a faster way to do that
Re: looking for a faster way to do that
List pgsql-general
Eduardo Morras wrote:

>> >
>> >Hi,
>> >
>> >if I understand this right, it does not mean "check if the string
>> >appears at position 0"
>> >which could translate into an index query, but rather "check if the
>> >string appears anywhere
>> >and then check if that is position 0", so the entire table is checked.
>>
>> The second one, yes: since it checks all the patterns you want only
>> once per row, it needs just one table scan. The first one removes the
>> substring 'ABC' from the string; if the lengths of both strings are
>> equal, the substring 'ABC' wasn't in it. If they differ, the trimmed
>> string is shorter.
>>
>> >explain analyze select items.num, wantcode from items, n where
>> >strpos(code, wantcode) = 0;
>> >  Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual
>> > time=0.074..36639.312 rows=7832539 loops=1)
>> >    Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
>> >    ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual
>> > time=0.005..2.212 rows=815 loops=1)
>> >    ->  Materialize  (cost=167.14..263.28 rows=9614 width=42)
>> > (actual time=0.007..13.970 rows=9614 loops=815)
>> >          ->  Seq Scan on items  (cost=0.00..167.14 rows=9614
>> > width=42) (actual time=0.044..14.855 rows=9614 loops=1)
>> >  Total runtime: 46229.836 ms
>> >
>> >
>> >The query ran much faster than the pattern query, however. This
>> >seems to be the performance benefit of just searching for a plain
>> >string vs. initializing the regex engine every time (for 815
>> >queries in a test set).
>>
>> It will do only one table scan, while your original code does one
>> for each substring you want to test. You can add more and more
>> substrings without much extra cost. If you want to use the regex
>> engine instead of the PostgreSQL string functions, check
>> regexp_matches(); it should be faster if you have 3000 substrings.
>>
>> select * from items where regexp_matches(items.code,'(ABC) (DE1)
>> (any_substring)') <> '{}';
>>
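
Eduardo's first suggestion, testing for a substring by deleting it and
comparing lengths, might look like this as a query (a minimal sketch for
a single literal substring 'ABC'; not from the original thread):

select num, code
from items
where length(replace(code, 'ABC', '')) <> length(code);  -- code contains 'ABC'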
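
Note that regexp_matches() is set-returning, so it cannot be compared
against '{}' directly in a WHERE clause. A sketch of the same idea using
the ~ operator instead, with alternation assumed in place of the spaced
groups:

select * from items where code ~ 'ABC|DE1|any_substring';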

Hi Eduardo,

it is clear that scanning the table once with a list of matches will outperform
rescanning the table for every string wanted. Now, my problem is that the patterns
are dynamic as well. So if I could translate a table with one column and a few
thousand rows into something like
regexp_matches(code,'string1|string2|.....string2781')
that would ideally be a performant query. Unfortunately, I have no idea how I
could achieve this transformation inside the database. Doing it externally fails,
because a single query can only be so many characters long.
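
One possible in-database approach (a minimal sketch, assuming the n(wantcode)
pattern table from the EXPLAIN output above and string_agg(), available since
PostgreSQL 9.0; older versions could use array_to_string(array_agg(...), '|')):

-- build 'string1|string2|...|string2781' on the fly and match once per row;
-- regex metacharacters inside wantcode would need escaping beforehand
select items.num, items.code
from items
where items.code ~ (select string_agg(wantcode, '|') from n);

Anchoring the aggregated pattern, e.g. '^(' || ... || ')', would limit it to
matches at the start of code.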

Regards
Wolfgang Hamann

