Home > mailing lists

Regex matching where text is input and regex stored in column - Mailing list pgsql-sql

From	Achilleas Mantzios - cloud
Subject	Regex matching where text is input and regex stored in column
Date	September 15, 2023 13:52:52
Msg-id	351abe02-57a5-a219-a789-57f3b6079f73@cloud.gatewaynet.com Whole thread Raw
Responses	Re: Regex matching where text is input and regex stored in column
List	pgsql-sql

Tree view

Hello

I have a table like :

where address stores basically email patterns, human / edited text with possible wild cards, meant to be input mainly by our admins. There is an query that checks a particular new address (this runs via an exim4 mail server) which has become quite slow :

SELECT text('news3@newsaccess.in') from mail_global_addressbook where text('news3@newsaccess.in') ~* replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?') UNION SELE
CT text('news3@newsaccess.in') from mail_vessel_addressbook where text('news3@newsaccess.in') ~* replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?') LIMIT 1;

It basically seq scans both tables, and the 2nd table in particular is fairly big : 123966 rows. So the above query tends to take 3.6 secs on average to execute. A pg_trgm index would not help since there is no way (that I know of) to index a regex in pgsql. Most common use is about indexing values to be checked against a given regex, quite the reverse from what we need here.

I got great improvement by materializing replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?'), we are running pgsql 10.23 :( , no generated columns possible, so I used a trigger, and then adding a normal b-tree index on the column. This causes Index only scan which is much better than the seq scan. Also I had to re-write the query as :

with

qry1 as

(SELECT text('watchkeepers@ukmto.org') from mail_vessel_addressbook where

text('watchkeepers@ukmto.org') ~* replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')

LIMIT 1),
qry2 as

(SELECT text('watchkeepers@ukmto.org') from mail_global_addressbook where

text('watchkeepers@ukmto.org') ~* replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')

LIMIT 1)
select text('watchkeepers@ukmto.org') from qry1 UNION select text('watchkeepers@ukmto.org') from qry2 LIMIT 1;

I was wondering if there is some more elegant and better solution, without the need for extra column and trigger.

pgsql-sql by date:

From: Mehmet Sabri KUNT
Date: 14 September 2023, 11:11:30
Subject: Re: Query with conditional statement

From: "Jean-Marc Voillequin (MA)"
Date: 16 September 2023, 12:26:29
Subject: no_data_found oracle vs pg

Regex matching where text is input and regex stored in column - Mailing list pgsql-sql

Previous

Next