Hello
> regex is compiled to a finite state machine and then the datanumber
> column is scanned in a single pass (for each row)
>
>> Searches are currently taking to long and we would like to optimize
>> them, but before we dive into our own solution we
>> where wondering if there already common solutions for this...
>
> try regex first if that's too slow you may need to write a
> dictionary function that splits datanuimber into it's components
> and use full text index/search. (this will slow down updates as they will do
> upto 20 inserts into the index)
>
> searches should then be optimally fast
>
I did some tests:
1) I fill test table
insert into test SELECT
array_to_string(array_agg(array_to_string(ARRAY(select
substring('01234567890' from (random()*10)::int + 1 for 1) from
generate_series(1,(random()*10+5)::int + i - i)),'')),',') as b from
generate_series(1,100000) g(i) group by (random()*1000)::int;
2. I tested searching of 5 or 13 values. I did tests on 8.4 and 8.1
8.1
using like 190ms(440ms*)
using regexp 115ms(259ms*)
* for 13 values - so there regexp is faster than like
on 8.4
using like 80ms(151ms)
using regexp 131ms(267ms)
so like is faster then regexp on 8.4.
fulltext test (8.4)
420ms(470ms) -- without index
14ms(26ms) -- with GiST index
1ms(2ms) -- with Gin index
some samples of test queries:
select * from test where to_tsvector('simple',a) @@
to_tsquery('simple','296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323');
select * from test where a ~
'296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323';
select * from test where a like '%296426496%' or a like '%
7707431116555%' or a like '%98173598191%' or a like '%302598%' or a
like '%53174827%' or a like '%02292064629%' or a like '%188631468777%'
or a like '%4756243248%' or a like '%920473%' or a like '%16602317%'
or a like '%76613513%' or a like '%78640%' or a like '%9176323%';
regards
Pavel Stehule