Re: Regex performance issue - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: Regex performance issue
Date
Msg-id Pine.LNX.4.64.0612031003400.16338@sn.sai.msu.ru
Whole thread Raw
In response to Re: Regex performance issue  ("Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>)
List pgsql-performance
On Sun, 3 Dec 2006, Alexandru Coseru wrote:

> Hello..
>
> I have never used tsearch2  , but at a first glance , i would not see any
> major improvement , because the main advantage of tsearch is the splitting in
> words of a phrase..
> But here , i only have one word  (no spaces).

Oh, yes, I was confused :) What if you consider you prefix as
1.2.3.4.5.6, then you could try our contrib/ltree module.


Oleg


>
>
> Regards
>   Alex
> ----- Original Message ----- From: "Oleg Bartunov" <oleg@sai.msu.su>
> To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
> Cc: "Dave Dutcher" <dave@tridecap.com>; <pgsql-performance@postgresql.org>
> Sent: Saturday, December 02, 2006 10:54 PM
> Subject: Re: [PERFORM] Regex performance issue
>
>
>> I may miss something but I'd use tsearch2. Check
>> intdict dictionary for basic idea -
>> http://www.sai.msu.su/~megera/wiki/Gendict
>>
>> Oleg
>> On Sat, 2 Dec 2006, Alexandru Coseru wrote:
>>
>>> Hello...
>>>
>>> I cannot use LIKE , because the order of the match is reversed.
>>> The prefix column is containing telephone destinations.
>>> IE:    ^001  - US  , ^0039 Italy , etc..
>>>
>>> Here is a small sample:
>>>
>>> asterisk=> select * from destlist LIMIT 10;
>>> id | id_ent | dir |   prefix   |   country   |      network       | tip
>>> ----+--------+-----+------------+-------------+--------------------+-----
>>> 1 |     -1 |   0 | (^0093)    | AFGHANISTAN | AFGHANISTAN        |   6
>>> 2 |     -1 |   0 | (^00937)   | AFGHANISTAN | AFGHANISTAN Mobile |   5
>>> 3 |     -1 |   0 | (^00355)   | ALBANIA     | ALBANIA            |   6
>>> 4 |     -1 |   0 | (^0035538) | ALBANIA     | ALBANIA Mobile     |   5
>>> 5 |     -1 |   0 | (^0035568) | ALBANIA     | ALBANIA Mobile     |   5
>>> 6 |     -1 |   0 | (^0035569) | ALBANIA     | ALBANIA Mobile     |   5
>>> 7 |     -1 |   0 | (^00213)   | ALGERIA     | ALGERIA            |   6
>>> 8 |     -1 |   0 | (^0021361) | ALGERIA     | ALGERIA Mobile     |   5
>>> 9 |     -1 |   0 | (^0021362) | ALGERIA     | ALGERIA Mobile     |   5
>>> 10 |     -1 |   0 | (^0021363) | ALGERIA     | ALGERIA Mobile     |   5
>>>
>>>
>>> Now , I have to match a dialednumber   (let's say   00213618833) and find
>>> it's destination...(It's algeria mobile).
>>> I tried to make with a query of using LIKE , but i was not able to get
>>> something..
>>>
>>>
>>> Regards
>>>   Alex
>>>
>>>
>>>
>>>
>>>
>>> ----- Original Message ----- From: "Dave Dutcher" <dave@tridecap.com>
>>> To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>;
>>> <pgsql-performance@postgresql.org>
>>> Sent: Saturday, December 02, 2006 10:36 PM
>>> Subject: RE: [PERFORM] Regex performance issue
>>>
>>>
>>>> -----Original Message-----
>>>> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru
>>>> Coseru
>>>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>>>> '0039051248787' ~
>>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>>>
>>>>
>>>> QUERY PLAN
>>>> --------------------------------------------------------------
>>>> ----------------------------------------------------------------------
>>>>  Sort  (cost=7925.07..7925.15 rows=31 width=67) (actual
>>>> time=857.715..857.716 rows=2 loops=1)
>>>>    Sort Key: length((prefix)::text)
>>>>    ->  Bitmap Heap Scan on destlist  (cost=60.16..7924.30
>>>> rows=31 width=67)
>>>> (actual time=2.156..857.686 rows=2 loops=1)
>>>>          Recheck Cond: ((id_ent = -2) AND (dir = 0))
>>>>          Filter: ('0039051248787'::text ~ (prefix)::text)
>>>>          ->  Bitmap Index Scan on destlist_indx2  (cost=0.00..60.16
>>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>>>>                Index Cond: ((id_ent = -2) AND (dir = 0))
>>>>  Total runtime: 857.804 ms
>>>> (8 rows)
>>>>
>>>>     "mmumu" btree (prefix varchar_pattern_ops)
>>>>
>>>
>>> I'm surpised Postgres isn't using the index on prefix seeing as the index
>>> uses the varchar_pattern_ops operator class.  It could be that the index
>>> isn't selective enough, or is Postgres not able to use an index with Posix
>>> regular expressions?  The docs seem to say that it can, but I'd be curious
>>> to see what happens if you use LIKE instead of ~.
>>>
>>> Dave
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>  Regards,
>>  Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>>
>>
>> --
>> No virus found in this incoming message.
>> Checked by AVG Free Edition.
>> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Regex performance issue
Next
From: Joost Kraaijeveld
Date:
Subject: Which query analiser tools are available?