Re: Regex performance issue - Mailing list pgsql-performance
From | Alexandru Coseru |
---|---|
Subject | Re: Regex performance issue |
Date | |
Msg-id | 024601c71675$81277560$3cb16956@alex Whole thread Raw |
In response to | Re: Regex performance issue ("Dave Dutcher" <dave@tridecap.com>) |
Responses |
Re: Regex performance issue
Re: Regex performance issue |
List | pgsql-performance |
Hello.. Thanks for the tip , i think i have got the ideea.. I'm too tired too , and i will try it tommorow. Anyway , anybody has a clue why this regex is that CPU intensive ? I did not saw the light on my drives blinking , and also vmstat doesn't yeld any blocks in or out... And how can it be optimized ? Is there a way to trace the system calls ? strace doesn't give me anything else but some lseeks and reads... PS: Tried it with a 8.2 snaphsot and the result is the same.. Regards Alex ----- Original Message ----- From: "Heikki Linnakangas" <heikki@enterprisedb.com> To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro> Cc: <pgsql-performance@postgresql.org> Sent: Sunday, December 03, 2006 12:35 AM Subject: Re: [PERFORM] Regex performance issue > Alexandru Coseru wrote: >> Hello.. >> >> I cannot use the first advice , because i'm not aware of the prefix >> length in the database... >> This is why i'm ordering after length(prefix).. >> >> >> On the 2nd one , i'm not sure that i can follow you.. > > Ok, let me try again :) > >> asterisk=> select * from destlist LIMIT 10; >> id | id_ent | dir | prefix | country | network | tip >> ----+--------+-----+------------+-------------+--------------------+----- >> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6 >> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5 >> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6 >> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5 >> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5 >> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5 >> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6 >> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5 >> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5 >> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5 > > Store the prefix in a character column, without the regex stuff. Like > below. I've removed the columns that are not relevant, in fact it would > make sense to store them in another table, and have just the id and prefix > in this table. > > id | prefix | network > ---+---------+-------------------- > 1 | 0093 | AFGHANISTAN > 2 | 00937 | AFGHANISTAN Mobile > 3 | 00355 | ALBANIA > 4 | 0035538 | ALBANIA Mobile > 5 | 0035568 | ALBANIA Mobile > 6 | 0035569 | ALBANIA Mobile > 7 | 00213 | ALGERIA > 8 | 0021361 | ALGERIA Mobile > 9 | 0021362 | ALGERIA Mobile > 10 | 0021363 | ALGERIA Mobile > > Now, add the rows marked with start below: > > id | prefix | network > ----+---------+-------------------- > 1 | 0093 | AFGHANISTAN > 2 | 00937 | AFGHANISTAN Mobile > * 1 | 00938 | AFGHANISTAN > 3 | 00355 | ALBANIA > 4 | 0035538 | ALBANIA Mobile > * 3 | 0035539 | ALBANIA > 5 | 0035568 | ALBANIA Mobile > 6 | 0035569 | ALBANIA Mobile > * 3 | 003557 | ALBANIA > 7 | 00213 | ALGERIA > 8 | 0021361 | ALGERIA Mobile > 9 | 0021362 | ALGERIA Mobile > 10 | 0021363 | ALGERIA Mobile > * 7 | 0021364 | ALGERIA > > The added rows make it unnecessary to use regex for the searches. You can > use just the >= operator like this: (using the example number you gave) > > SELECT id FROM destlist WHERE '00213618833' >= prefix ORDER BY prefix > LIMIT 1 > > Which would return id 7. As another example, a query of "00213654321" > would match the last row, which again has id 7. > > I'm too tired to figure out the exact algorithm for adding the rows, but > I'm pretty sure it can be automated... The basic idea is that when there's > a row with id A and prefix XXXX, and another row with id B and prefix > XXXXY, we add another row with id A and prefix XXXX(Y+1). > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > > > -- > No virus found in this incoming message. > Checked by AVG Free Edition. > Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006 >
pgsql-performance by date: