Re: Regex performance issue - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: Regex performance issue |
Date | |
Msg-id | Pine.LNX.4.64.0612031003400.16338@sn.sai.msu.ru Whole thread Raw |
In response to | Re: Regex performance issue ("Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>) |
List | pgsql-performance |
On Sun, 3 Dec 2006, Alexandru Coseru wrote: > Hello.. > > I have never used tsearch2 , but at a first glance , i would not see any > major improvement , because the main advantage of tsearch is the splitting in > words of a phrase.. > But here , i only have one word (no spaces). Oh, yes, I was confused :) What if you consider you prefix as 1.2.3.4.5.6, then you could try our contrib/ltree module. Oleg > > > Regards > Alex > ----- Original Message ----- From: "Oleg Bartunov" <oleg@sai.msu.su> > To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro> > Cc: "Dave Dutcher" <dave@tridecap.com>; <pgsql-performance@postgresql.org> > Sent: Saturday, December 02, 2006 10:54 PM > Subject: Re: [PERFORM] Regex performance issue > > >> I may miss something but I'd use tsearch2. Check >> intdict dictionary for basic idea - >> http://www.sai.msu.su/~megera/wiki/Gendict >> >> Oleg >> On Sat, 2 Dec 2006, Alexandru Coseru wrote: >> >>> Hello... >>> >>> I cannot use LIKE , because the order of the match is reversed. >>> The prefix column is containing telephone destinations. >>> IE: ^001 - US , ^0039 Italy , etc.. >>> >>> Here is a small sample: >>> >>> asterisk=> select * from destlist LIMIT 10; >>> id | id_ent | dir | prefix | country | network | tip >>> ----+--------+-----+------------+-------------+--------------------+----- >>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6 >>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5 >>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6 >>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5 >>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5 >>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5 >>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6 >>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5 >>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5 >>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5 >>> >>> >>> Now , I have to match a dialednumber (let's say 00213618833) and find >>> it's destination...(It's algeria mobile). >>> I tried to make with a query of using LIKE , but i was not able to get >>> something.. >>> >>> >>> Regards >>> Alex >>> >>> >>> >>> >>> >>> ----- Original Message ----- From: "Dave Dutcher" <dave@tridecap.com> >>> To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>; >>> <pgsql-performance@postgresql.org> >>> Sent: Saturday, December 02, 2006 10:36 PM >>> Subject: RE: [PERFORM] Regex performance issue >>> >>> >>>> -----Original Message----- >>>> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru >>>> Coseru >>>> asterisk=> explain analyze SELECT * FROM destlist WHERE >>>> '0039051248787' ~ >>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC; >>>> >>>> >>>> QUERY PLAN >>>> -------------------------------------------------------------- >>>> ---------------------------------------------------------------------- >>>> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual >>>> time=857.715..857.716 rows=2 loops=1) >>>> Sort Key: length((prefix)::text) >>>> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30 >>>> rows=31 width=67) >>>> (actual time=2.156..857.686 rows=2 loops=1) >>>> Recheck Cond: ((id_ent = -2) AND (dir = 0)) >>>> Filter: ('0039051248787'::text ~ (prefix)::text) >>>> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16 >>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1) >>>> Index Cond: ((id_ent = -2) AND (dir = 0)) >>>> Total runtime: 857.804 ms >>>> (8 rows) >>>> >>>> "mmumu" btree (prefix varchar_pattern_ops) >>>> >>> >>> I'm surpised Postgres isn't using the index on prefix seeing as the index >>> uses the varchar_pattern_ops operator class. It could be that the index >>> isn't selective enough, or is Postgres not able to use an index with Posix >>> regular expressions? The docs seem to say that it can, but I'd be curious >>> to see what happens if you use LIKE instead of ~. >>> >>> Dave >>> >>> >>> >>> >>> >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 >> >> >> >> -- >> No virus found in this incoming message. >> Checked by AVG Free Edition. >> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006 >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-performance by date: