Regexps - never completing join. - Mailing list pgsql-performance
From | Rusty Conover |
---|---|
Subject | Regexps - never completing join. |
Date | |
Msg-id | FE4C9EB4-6599-460E-82D3-D040C93F202D@infogears.com Whole thread Raw |
Responses |
Re: Regexps - never completing join.
|
List | pgsql-performance |
Hi Guys, I'm using postgresql 8.3.1 and I'm seeing weird behavior between what I expect and what's happening when the query is executed I'm trying to match a table that contains regexps against another table that is full of the text to match against so my query is: select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp; When I run that the query takes a very very long time (never ending so far 20 minutes or so) to execute. But if I loop through all of the rules and a query for each rule: select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp and wc_rule.id = ? All of the rules when run individually can be matched in a little under then 3 minutes. I'd assume postgres would be equal to or faster with the single row execution method. The table schema: CREATE TABLE wc_rule ( id integer NOT NULL, regexp text, ); CREATE TABLE classifications ( id integer NOT NULL, classification text NOT NULL ); gb_render_1_db=# explain select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp; QUERY PLAN ----------------------------------------------------------------------------- Nested Loop (cost=13.71..891401.71 rows=197843 width=4) Join Filter: (classifications.classification ~* wc_rule.regexp) -> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42) -> Materialize (cost=13.71..20.72 rows=701 width=22) -> Seq Scan on wc_rule (cost=0.00..13.01 rows=701 width=22) (5 rows) gb_render_1_db=# select count(*) from classifications; count ------- 56446 (1 row) gb_render_1_db=# select count(*) from wc_rule; count ------- 701 (1 row) I have exports of the tables up at so you can try it if you'd like. http://rusty.devel.infogears.com/regexp-tables.tar.bz2 Any insight is greatly appreciated, even if it's just showing me how I made a mistake in the query. Thanks, Rusty -- Rusty Conover InfoGears Inc. http://www.infogears.com An example script that shows how each rule was run individually in perl. $dbh->begin_work(); eval { my $all_rules = $dbh->selectall_arrayref("select id from wc_rule"); foreach my $row (@$all_rules) { print "Doing rule: $row->[0]\n"; eval { local $SIG{ALRM} = sub { die("Alarm") }; alarm(5); my $results = $dbh->selectall_arrayref("select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp and wc_rule.id = ?", undef, $row->[0]); alarm(0); }; if ($@) { alarm(0); print "Got bad rule id of : $row->[0]\n"; exit(0); } alarm(0); print "ok rule: $row->[0]\n"; } }; if ($@) { print "Failed to run rules:\n$@\n"; $dbh->rollback(); $dbh->disconnect(); exit(-1); } $dbh->commit(); $dbh->disconnect(); exit(0);
pgsql-performance by date: