Re: Regexps - never completing join. - Mailing list pgsql-performance
From | Rusty Conover |
---|---|
Subject | Re: Regexps - never completing join. |
Date | |
Msg-id | DEA15B6C-2C0B-4139-964E-2BC6FFFA0F12@infogears.com Whole thread Raw |
In response to | Regexps - never completing join. (Rusty Conover <rconover@infogears.com>) |
Responses |
Re: Regexps - never completing join.
|
List | pgsql-performance |
On May 13, 2008, at 11:45 PM, Rusty Conover wrote: > Hi Guys, > > I'm using postgresql 8.3.1 and I'm seeing weird behavior between > what I expect and what's happening when the query is executed > > I'm trying to match a table that contains regexps against another > table that is full of the text to match against so my query is: > > select wc_rule.id from classifications, wc_rule where > classifications.classification ~* wc_rule.regexp; > > When I run that the query takes a very very long time (never ending > so far 20 minutes or so) to execute. > > But if I loop through all of the rules and a query for each rule: > > select wc_rule.id from classifications, wc_rule where > classifications.classification ~* wc_rule.regexp and wc_rule.id = ? > > All of the rules when run individually can be matched in a little > under then 3 minutes. I'd assume postgres would be equal to or > faster with the single row execution method. > > The table schema: > > CREATE TABLE wc_rule ( > id integer NOT NULL, > regexp text, > ); > > CREATE TABLE classifications ( > id integer NOT NULL, > classification text NOT NULL > ); > > gb_render_1_db=# explain select wc_rule.id from classifications, > wc_rule where classifications.classification ~* wc_rule.regexp; > QUERY PLAN > ----------------------------------------------------------------------------- > Nested Loop (cost=13.71..891401.71 rows=197843 width=4) > Join Filter: (classifications.classification ~* wc_rule.regexp) > -> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 > width=42) > -> Materialize (cost=13.71..20.72 rows=701 width=22) > -> Seq Scan on wc_rule (cost=0.00..13.01 rows=701 width=22) > (5 rows) > > As a followup I did some digging: by editing: src/backend/utils/adt/regexp.c and increasing the cache size for regular expressions to an arbitrarily large number #define MAX_CACHED_RES 3200 Rather then the default of #define MAX_CACHED_RES 32 I was able to get the query to complete in a respectable amount of time: gb_render_1_db=# explain analyze select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=13.71..891401.71 rows=197843 width=4) (actual time=72.714..366899.913 rows=55052 loops=1) Join Filter: (classifications.classification ~* wc_rule.regexp) -> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42) (actual time=28.820..109.895 rows=56446 loops=1) -> Materialize (cost=13.71..20.72 rows=701 width=22) (actual time=0.000..0.193 rows=701 loops=56446) -> Seq Scan on wc_rule (cost=0.00..13.01 rows=701 width=22) (actual time=0.030..0.593 rows=701 loops=1) Total runtime: 366916.632 ms (6 rows) Which is still > 6 minutes, but at least it completed. I'll keep digging into what is causing this bad performance. Thanks, Rusty -- Rusty Conover InfoGears Inc. http://www.infogears.com
pgsql-performance by date: