On Thu, Feb 18, 2021, at 11:30, Joel Jacobson wrote:
>SELECT * FROM vdeviations;
>-[ RECORD 1 ]----+-------------------------------------------------------------------------------------------------------
>pattern | \.(ac|com\.ac|edu\.ac|gov\.ac|net\.ac|mi ... 100497 chars ... abs\.org|yolasite\.com|za\.net|za\.org)$
Heh, what a funny coincidence:
The regex I used to shrink the very-long-pattern,
actually happens to run a lot faster with the patches.
I noticed it when trying to read from the vdeviations view in PostgreSQL 13.2.
Here is my little helper-function which I used to shrink patterns/subjects longer than N characters:
CREATE OR REPLACE FUNCTION shrink_text(text,integer) RETURNS text LANGUAGE sql AS $$
SELECT CASE WHEN length($1) < $2 THEN $1 ELSE
format('%s ... %s chars ... %s', m[1], length(m[2]), m[3])
END
FROM (
SELECT regexp_matches($1,format('^(.{1,%1$s})(.*?)(.{1,%1$s})$',$2/2)) AS m
) AS q
$$;
The regex aims to produce three capture groups,
where I wanted the first and third ones to be greedy
and match up to $2 characters (controlled by the second input param to the function),
and the second capture group in the middle to be non-greedy,
but match the remainder to make up a fully anchored match.
It works like expected in both 13.2 and HEAD+patches, but the speed-up it enormous:
PostgreSQL 13.2:
EXPLAIN ANALYZE SELECT regexp_matches(repeat('a',100000),'^(.{1,80})(.*?)(.{1,80})$');
QUERY PLAN
-------------------------------------------------------------------------------------------------
ProjectSet (cost=0.00..0.02 rows=1 width=32) (actual time=23600.816..23600.838 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
Planning Time: 0.432 ms
Execution Time: 23600.859 ms
(4 rows)
HEAD+0001+0002+0003+0004+0005:
EXPLAIN ANALYZE SELECT regexp_matches(repeat('a',100000),'^(.{1,80})(.*?)(.{1,80})$');
QUERY PLAN
-------------------------------------------------------------------------------------------
ProjectSet (cost=0.00..0.02 rows=1 width=32) (actual time=36.656..36.661 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.002 rows=1 loops=1)
Planning Time: 0.575 ms
Execution Time: 36.689 ms
(4 rows)
Cool stuff.
/Joel