matching column of regexps - Mailing list pgsql-sql

From James Cloos
Subject matching column of regexps
Date
Msg-id m3wqdi649f.fsf@carbon.jhcloos.org
Whole thread Raw
Responses Re: matching column of regexps  (Vik Fearing <vik.fearing@dalibo.com>)
List pgsql-sql
I have a table with a column of regexps, and need to query whether a
provided string matches any of them.

Eg,
 create table retest ( id serial primary key, active bool not null default true,                       re text unique
notnull, description text );
 

with queries of the form:
 select count(re) > 0 from retest where active is true and ? ~ re;

There also will be occasional not-as-speed-sensitive queries which need
to return the matching descriptions:
 select re, description from retest where active is true and ? ~ re;

(The serial column is there only to make it easier to change or delete
some rows when managing the table in psql.)

I was happy to find that the ~ operator works in both directions, but
querying whether count(re) > 0 was the best I could come up with to
get a bool result.

Is there a better way to answer the question, "Do ANY rows match?"
without having to return the list of matching rows?  I didn't find
anything googling.

Is there a way to index such a table/query?

One of my use cases, on contsrained systems, is likely to have fewer
than fifty rows, few of which will have active=f.  I presume that an
index is unlikely to help any given the small table size.

But another use case may end up with thousands to millions of rows.

I've considerred a single-row view defined via a function which
collapeses a list of regexps into a single regexp.  But I'm concerned
that a single massive regexp may may be too much for pg's re engine?

My tests suggest that the planner is not able to stop iterating though
the rows once one matches the where.  Do I need to write an aggregate to
accomplish that shortcut?

-JimC
--
James Cloos <cloos@jhcloos.com>         OpenPGP: 0x997A9F17ED7DAEA6



pgsql-sql by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: group number
Next
From: Vik Fearing
Date:
Subject: Re: matching column of regexps