Re: poor pefrormance with regexp searches on large tables - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: poor pefrormance with regexp searches on large tables
Date
Msg-id 4E428338020000250003FD5C@gw.wicourts.gov
Whole thread Raw
In response to Re: poor pefrormance with regexp searches on large tables  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> So far I haven't seen any real indication that the time is spent
> in evaluating the regular expressions

Just as a reality check here, I ran some counts against a
moderately-sized table (half a million rows).  Just counting the
rows unconditionally was about five times as fast as having to pick
out even a small column for a compare.  Taking a substring of a
bigger (but normally non-TOASTed) value and doing a compare was only
a little slower.  Using a regular expression anchored to the front
of the string to do the equivalent of the compare to the substring
took about twice as long as the substring approach.  For a
non-anchored regular expression where it would normally need to scan
in a bit, it took twice as long as the anchored regular expression.
These times seem like they might leave some room for improvement,
but it doesn't seem too outrageous.

Each test run three times.

select count(*) from "Case";
 count
--------
 527769
(1 row)

Time: 47.696 ms
Time: 47.858 ms
Time: 47.687 ms

select count(*) from "Case" where "filingCtofcNo" = '0878';
 count
--------
 198645
(1 row)

Time: 219.233 ms
Time: 225.410 ms
Time: 226.723 ms

select count(*) from "Case"
where substring("caption" from 1 for 5) = 'State';
 count
--------
 178142
(1 row)

Time: 238.160 ms
Time: 237.114 ms
Time: 240.388 ms

select count(*) from "Case" where "caption" ~ '^State';
 count
--------
 178142
(1 row)

Time: 532.821 ms
Time: 535.341 ms
Time: 529.121 ms

select count(*) from "Case" where "caption" ~ 'Wisconsin';
 count
--------
 157483
(1 row)

Time: 1167.433 ms
Time: 1172.282 ms
Time: 1170.562 ms

-Kevin

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: poor pefrormance with regexp searches on large tables
Next
From: Alexis Lê-Quôc
Date:
Subject: Re: Autovacuum running out of memory