Re: Regex performance issue - Mailing list pgsql-performance

From Alexandru Coseru
Subject Re: Regex performance issue
Date
Msg-id 024d01c71675$b8fa61f0$3cb16956@alex
Whole thread Raw
In response to Re: Regex performance issue  ("Dave Dutcher" <dave@tridecap.com>)
Responses Re: Regex performance issue
List pgsql-performance
Hello..

I have never used tsearch2  , but at a first glance , i would not see any
major improvement , because the main advantage of tsearch is the splitting
in words of a phrase..
But here , i only have one word  (no spaces).


Regards
    Alex
----- Original Message -----
From: "Oleg Bartunov" <oleg@sai.msu.su>
To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Cc: "Dave Dutcher" <dave@tridecap.com>; <pgsql-performance@postgresql.org>
Sent: Saturday, December 02, 2006 10:54 PM
Subject: Re: [PERFORM] Regex performance issue


>I may miss something but I'd use tsearch2. Check
> intdict dictionary for basic idea -
> http://www.sai.msu.su/~megera/wiki/Gendict
>
> Oleg
> On Sat, 2 Dec 2006, Alexandru Coseru wrote:
>
>> Hello...
>>
>> I cannot use LIKE , because the order of the match is reversed.
>> The prefix column is containing telephone destinations.
>> IE:    ^001  - US  , ^0039 Italy , etc..
>>
>> Here is a small sample:
>>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir |   prefix   |   country   |      network       | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>> 1 |     -1 |   0 | (^0093)    | AFGHANISTAN | AFGHANISTAN        |   6
>> 2 |     -1 |   0 | (^00937)   | AFGHANISTAN | AFGHANISTAN Mobile |   5
>> 3 |     -1 |   0 | (^00355)   | ALBANIA     | ALBANIA            |   6
>> 4 |     -1 |   0 | (^0035538) | ALBANIA     | ALBANIA Mobile     |   5
>> 5 |     -1 |   0 | (^0035568) | ALBANIA     | ALBANIA Mobile     |   5
>> 6 |     -1 |   0 | (^0035569) | ALBANIA     | ALBANIA Mobile     |   5
>> 7 |     -1 |   0 | (^00213)   | ALGERIA     | ALGERIA            |   6
>> 8 |     -1 |   0 | (^0021361) | ALGERIA     | ALGERIA Mobile     |   5
>> 9 |     -1 |   0 | (^0021362) | ALGERIA     | ALGERIA Mobile     |   5
>> 10 |     -1 |   0 | (^0021363) | ALGERIA     | ALGERIA Mobile     |   5
>>
>>
>> Now , I have to match a dialednumber   (let's say   00213618833) and find
>> it's destination...(It's algeria mobile).
>> I tried to make with a query of using LIKE , but i was not able to get
>> something..
>>
>>
>> Regards
>>   Alex
>>
>>
>>
>>
>>
>> ----- Original Message ----- From: "Dave Dutcher" <dave@tridecap.com>
>> To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>;
>> <pgsql-performance@postgresql.org>
>> Sent: Saturday, December 02, 2006 10:36 PM
>> Subject: RE: [PERFORM] Regex performance issue
>>
>>
>>> -----Original Message-----
>>> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru
>>> Coseru
>>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>>> '0039051248787' ~
>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>>
>>>
>>> QUERY PLAN
>>> --------------------------------------------------------------
>>> ----------------------------------------------------------------------
>>>  Sort  (cost=7925.07..7925.15 rows=31 width=67) (actual
>>> time=857.715..857.716 rows=2 loops=1)
>>>    Sort Key: length((prefix)::text)
>>>    ->  Bitmap Heap Scan on destlist  (cost=60.16..7924.30
>>> rows=31 width=67)
>>> (actual time=2.156..857.686 rows=2 loops=1)
>>>          Recheck Cond: ((id_ent = -2) AND (dir = 0))
>>>          Filter: ('0039051248787'::text ~ (prefix)::text)
>>>          ->  Bitmap Index Scan on destlist_indx2  (cost=0.00..60.16
>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>>>                Index Cond: ((id_ent = -2) AND (dir = 0))
>>>  Total runtime: 857.804 ms
>>> (8 rows)
>>>
>>>     "mmumu" btree (prefix varchar_pattern_ops)
>>>
>>
>> I'm surpised Postgres isn't using the index on prefix seeing as the index
>> uses the varchar_pattern_ops operator class.  It could be that the index
>> isn't selective enough, or is Postgres not able to use an index with
>> Posix
>> regular expressions?  The docs seem to say that it can, but I'd be
>> curious
>> to see what happens if you use LIKE instead of ~.
>>
>> Dave
>>
>>
>>
>>
>>
>>
>
>  Regards,
>  Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>


pgsql-performance by date:

Previous
From: "Alexandru Coseru"
Date:
Subject: Re: Regex performance issue
Next
From: Scott Marlowe
Date:
Subject: Re: Regex performance issue