Re: Some regular-expression performance hacking - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Some regular-expression performance hacking
Date
Msg-id 1662033.1613237745@sss.pgh.pa.us
In response to Re: Some regular-expression performance hacking  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers
"Joel Jacobson" <joel@compiler.org> writes:
> In total, I scraped the first-page of some ~50k websites,
> which produced 45M test rows to import,
> which when GROUP BY pattern and flags was reduced
> down to 235k different regex patterns,
> and 1.5M different text string subjects.

This seems like an incredibly useful test dataset.
I'd definitely like a copy.

> No is_match differences were detected, good!

Cool ...

> However, there were 23 cases where what got captured differed:

I shall take a closer look at that.

Many thanks for doing this work!

            regards, tom lane


