Re: Regex performance issue - Mailing list pgsql-performance

From Alexandru Coseru
Subject Re: Regex performance issue
Date
Msg-id 024601c71675$81277560$3cb16956@alex
Whole thread Raw
In response to Re: Regex performance issue  ("Dave Dutcher" <dave@tridecap.com>)
Responses Re: Regex performance issue
Re: Regex performance issue
List pgsql-performance
Hello..

Thanks for the tip , i think i have got the ideea..

I'm too tired too , and i will try it tommorow.


Anyway , anybody has a clue why this regex is that CPU intensive ?  I did
not saw the light on my drives blinking , and also vmstat doesn't yeld any
blocks in or out...
And how can it be optimized ?

Is there a way to trace the system calls ?
strace doesn't give me anything else but some lseeks and reads...


PS: Tried it with a 8.2 snaphsot and the result is the same..

Regards
    Alex
----- Original Message -----
From: "Heikki Linnakangas" <heikki@enterprisedb.com>
To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Cc: <pgsql-performance@postgresql.org>
Sent: Sunday, December 03, 2006 12:35 AM
Subject: Re: [PERFORM] Regex performance issue


> Alexandru Coseru wrote:
>> Hello..
>>
>> I cannot use the first advice , because i'm not aware of the prefix
>> length in the database...
>> This is why i'm ordering after length(prefix)..
>>
>>
>> On the 2nd one , i'm not sure that i can follow you..
>
> Ok, let me try again :)
>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir |   prefix   |   country   |      network       | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>>  1 |     -1 |   0 | (^0093)    | AFGHANISTAN | AFGHANISTAN        |   6
>>  2 |     -1 |   0 | (^00937)   | AFGHANISTAN | AFGHANISTAN Mobile |   5
>>  3 |     -1 |   0 | (^00355)   | ALBANIA     | ALBANIA            |   6
>>  4 |     -1 |   0 | (^0035538) | ALBANIA     | ALBANIA Mobile     |   5
>>  5 |     -1 |   0 | (^0035568) | ALBANIA     | ALBANIA Mobile     |   5
>>  6 |     -1 |   0 | (^0035569) | ALBANIA     | ALBANIA Mobile     |   5
>>  7 |     -1 |   0 | (^00213)   | ALGERIA     | ALGERIA            |   6
>>  8 |     -1 |   0 | (^0021361) | ALGERIA     | ALGERIA Mobile     |   5
>>  9 |     -1 |   0 | (^0021362) | ALGERIA     | ALGERIA Mobile     |   5
>> 10 |     -1 |   0 | (^0021363) | ALGERIA     | ALGERIA Mobile     |   5
>
> Store the prefix in a character column, without the regex stuff. Like
> below. I've removed the columns that are not relevant, in fact it would
> make sense to store them in another table, and have just the id and prefix
> in this table.
>
> id | prefix  | network
> ---+---------+--------------------
>  1 | 0093    | AFGHANISTAN
>  2 | 00937   | AFGHANISTAN Mobile
>  3 | 00355   | ALBANIA
>  4 | 0035538 | ALBANIA Mobile
>  5 | 0035568 | ALBANIA Mobile
>  6 | 0035569 | ALBANIA Mobile
>  7 | 00213   | ALGERIA
>  8 | 0021361 | ALGERIA Mobile
>  9 | 0021362 | ALGERIA Mobile
> 10 | 0021363 | ALGERIA Mobile
>
> Now, add the rows marked with start below:
>
>  id | prefix  | network
> ----+---------+--------------------
>   1 | 0093    | AFGHANISTAN
>   2 | 00937   | AFGHANISTAN Mobile
> * 1 | 00938   | AFGHANISTAN
>   3 | 00355   | ALBANIA
>   4 | 0035538 | ALBANIA Mobile
> * 3 | 0035539 | ALBANIA
>   5 | 0035568 | ALBANIA Mobile
>   6 | 0035569 | ALBANIA Mobile
> * 3 | 003557  | ALBANIA
>   7 | 00213   | ALGERIA
>   8 | 0021361 | ALGERIA Mobile
>   9 | 0021362 | ALGERIA Mobile
>  10 | 0021363 | ALGERIA Mobile
> * 7 | 0021364 | ALGERIA
>
> The added rows make it unnecessary to use regex for the searches. You can
> use just the >= operator like this: (using the example number you gave)
>
> SELECT id FROM destlist WHERE '00213618833' >= prefix ORDER BY prefix
> LIMIT 1
>
> Which would return id 7. As another example, a query of "00213654321"
> would match the last row, which again has id 7.
>
> I'm too tired to figure out the exact algorithm for adding the rows, but
> I'm pretty sure it can be automated... The basic idea is that when there's
> a row with id A and prefix XXXX, and another row with id B and prefix
> XXXXY, we add another row with id A and prefix XXXX(Y+1).
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>


pgsql-performance by date:

Previous
From: "Heikki Linnakangas"
Date:
Subject: Re: Regex performance issue
Next
From: "Alexandru Coseru"
Date:
Subject: Re: Regex performance issue