Home > mailing lists

Re: best practise/pattern for large OR / LIKE searches - Mailing list pgsql-general

From	Pavel Stehule
Subject	Re: best practise/pattern for large OR / LIKE searches
Date	August 30, 2009 10:17:40
Msg-id	162867790908300617u794ba134k2dfa0c7ebad32c8b@mail.gmail.com Whole thread Raw
In response to	Re: best practise/pattern for large OR / LIKE searches (Jasen Betts <jasen@xnet.co.nz>)
List	pgsql-general

Tree view

Hello

> regex is compiled to a finite state machine and then the datanumber
> column is scanned in a single pass (for each row)
>
>> Searches are currently taking to long and we would like to optimize
>> them, but before we dive into our own solution we
>> where wondering if there already common solutions for this...
>
> try regex first if that's too slow you may need to write a
> dictionary function that splits datanuimber into it's components
> and use full text index/search. (this will slow down updates as they will do
> upto 20 inserts into the index)
>
> searches should then be optimally fast
>

I did some tests:

1) I fill test table
insert into test SELECT
array_to_string(array_agg(array_to_string(ARRAY(select
substring('01234567890' from (random()*10)::int + 1 for 1) from
generate_series(1,(random()*10+5)::int + i - i)),'')),',') as b from
generate_series(1,100000) g(i) group by (random()*1000)::int;

2. I tested searching of 5 or 13 values. I did tests on 8.4 and 8.1

8.1
using like 190ms(440ms*)
using regexp 115ms(259ms*)

* for 13 values - so there regexp is faster than like

on 8.4
using like 80ms(151ms)
using regexp 131ms(267ms)

so like is faster then regexp on 8.4.

fulltext test (8.4)
420ms(470ms) -- without index
14ms(26ms) -- with GiST index
1ms(2ms) -- with Gin index

some samples of test queries:
select * from test where to_tsvector('simple',a) @@

to_tsquery('simple','296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323');

select * from test where a ~

'296426496|7707431116555|98173598191|302598|53174827|02292064629|188631468777|4756243248|920473|16602317|76613513|78640|9176323';

select * from test where a like '%296426496%' or a like '%
7707431116555%' or a like '%98173598191%' or a like '%302598%' or a
like '%53174827%' or a like '%02292064629%' or a like '%188631468777%'
or a like '%4756243248%' or a like '%920473%' or a like '%16602317%'
or a like '%76613513%' or a like '%78640%' or a like '%9176323%';

regards
Pavel Stehule

pgsql-general by date:

From: Stephen Cuppett
Date: 30 August 2009, 10:12:08
Subject: Re: Trouble using TG_TABLE_NAME in BEFORE INSERT OR UPDATE trigger

From: Pavel Stehule
Date: 30 August 2009, 10:22:29
Subject: Re: Trouble using TG_TABLE_NAME in BEFORE INSERT OR UPDATE trigger

Re: best practise/pattern for large OR / LIKE searches - Mailing list pgsql-general

Previous

Next