Re: Working with huge amount of data. RESULTS! - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: Working with huge amount of data. RESULTS!
Date
Msg-id Pine.LNX.4.64.0802121809410.23796@sn.sai.msu.ru
Whole thread Raw
In response to Re: Working with huge amount of data. RESULTS!  (Mario Lopez <mario@lar3d.com>)
List pgsql-general
On Tue, 12 Feb 2008, Mario Lopez wrote:

> Hi!,
>
> I optimized the LIKE 'keyword%' and LIKE '%keyword' with the following
> results:
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where varchar_reverse(data) like varchar_reverse('%keyword');"
> real    0m0.055s
> user    0m0.011s
> sys     0m0.006s
>
>
> # time /Library/PostgreSQL8/bin/psql -U postgres -d testdb -c "select * from
> table1 where data like 'keyword%';"
> real    0m0.026s
> user    0m0.012s
> sys     0m0.006s
>
> It works flawlesly as you can see by the timings, take in consideration that
> "table1" has 100 million records. The only problem is generating the reversed
> index which takes like 20 minutes, I guess it has to do with the plperl
> function, perhaps a C function for inverting would make it up in less time.
>
> The problem is still with the LIKE '%keyword%', my problem is that I am not
> searching for Words in a dictionary fashion, suppose my "data" is random
> garbage, that it has common consecutive bytes. How could I generate a
> dictionary from this random garbage to make it easier for indexing?

suffix tree (array) would speedup '%keyword%' query, but currently it doesn't
supported by GiST extension architecture (we have it in our TODO), so I see
several ways (not tested):

1. try contrib/pg_trgm to reduce a number of candidate words
2. generate all possible substrings and use your tested approach


>
>> On Mon, Feb 11, 2008 at 04:37:24PM +0100, Mario Lopez wrote:
>>
>>> SELECT * FROM names WHERE name LIKE keyword%
>>> Or
>>> SELECT * FROM names WHERE name LIKE %keyword%
>>>
>>
>> check this:
>> http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/
>> and this:
>> http://www.depesz.com/index.php/2007/09/15/speeding-up-like-xxx/
>>
>> depesz
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>      choose an index scan if your joining column's datatypes do not
>      match
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-general by date:

Previous
From: "Thomas Chille"
Date:
Subject: Some Autovacuum Questions
Next
From: Tom Lane
Date:
Subject: Re: TSearch2 Migration Guide from 8.2 to 8.3