Home > mailing lists

Re: Some regular-expression performance hacking - Mailing list pgsql-hackers

From	Joel Jacobson
Subject	Re: Some regular-expression performance hacking
Date	February 18, 2021 23:54:38
Msg-id	87316a6a-2c84-4888-b58f-a53812aca369@www.fastmail.com Whole thread Raw
In response to	Re: Some regular-expression performance hacking (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Thu, Feb 18, 2021, at 21:44, Tom Lane wrote:

>Hmm ... I might be misunderstanding, but I think our engine already

>does a version of this. See the discussion of "colors" in

>src/backend/regex/README.

Thanks, I will read it with great interest.

>Maybe. In practice the actual scanning tends to be tracking more than one

>possible NFA state in parallel, so I'm not sure how often we could expect

>to be able to use this idea. That is, even if we know that state X can

>only succeed by following an arc to Y and then another to Z, we might

>also be interested in what happens if the NFA is in state Q at this point;

>and it seems unlikely that Q would have exactly the same two following

>arc colors.

Right. Actually I don't have a clear idea on how it could be implemented in an NFA engine.

>I do have some ideas about possible future optimizations, and one reason

>I'm grateful for this large set of real regexes is that it can provide a

>concrete basis for deciding that particular optimizations are or are not

>worth pursuing. So thanks again for collecting it!

My pleasure. Thanks for using it!

/Joel

From: Tom Lane
Date: 18 February 2021, 23:44:47
Subject: Re: Some regular-expression performance hacking

From: Tomas Vondra
Date: 19 February 2021, 00:03:56
Subject: Re: Improvements and additions to COPY progress reporting