Regex with > 32k different chars causes a backend crash - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Regex with > 32k different chars causes a backend crash
Date
Msg-id 515C46A0.3090002@vmware.com
Responses Re: Regex with > 32k different chars causes a backend crash
List pgsql-hackers
While playing with Alexander's pg_trgm regexp patch, I noticed that the
regexp library trips an assertion (if assertions are enabled) or crashes
when it is given a regular expression containing more than 32k different
characters:

select 'foo' ~ (select string_agg(chr(x),'')
                from generate_series(100, 35000) x) as nastyregex;

This is because the regex library uses 'short' as the datatype for color
numbers. When the color counter overflows past 32767, it wraps around to
-32768, which is then used as an index into the colordesc array, and you
get a crash. AFAICS this can't reliably be used for anything more
sinister than crashing the backend.
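To make the failure mode concrete, here is a minimal C sketch of what
happens to a short-typed color counter. The `color` typedef,
`next_color()`, `valid_color_index()` and `desc_lookup()` are
hypothetical stand-ins, not the regex library's actual code. Converting
an out-of-range int back to short is implementation-defined in C; the
-32768 result assumes a two's-complement compiler such as gcc or clang.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in: colors are identified by a 'short'. */
typedef short color;

/*
 * Return the color number that follows 'c' when the counter is a short.
 * c + 1 is computed as an int and then narrowed back to short.  For
 * c == SHRT_MAX that narrowing is implementation-defined; gcc and clang
 * wrap modulo 2^16, giving SHRT_MIN (-32768).
 */
color next_color(color c)
{
    return (color) (c + 1);
}

/* True only if 'c' is a safe index into an array of 'ncolors' entries. */
bool valid_color_index(color c, int ncolors)
{
    return c >= 0 && c < ncolors;
}

/*
 * Hypothetical descriptor lookup guarded by assert().  In a build without
 * assertions (NDEBUG) the check disappears, and a wrapped, negative color
 * such as -32768 would read far outside the array.
 */
int desc_lookup(const int *descs, int ncolors, color c)
{
    assert(valid_color_index(c, ncolors));
    return descs[c];
}
```

A wrapped color fails `valid_color_index()` no matter how large the
array is, because the index has become negative.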

A regex with that many different colors is an extreme case, so I think
it's enough to turn the assertion in newcolor() into a run-time check
that throws a "too many colors in regexp" error. Alternatively, we could
widen 'color' from short to int, but that would double the memory used
to store colors, even for sane regexps with far fewer distinct
characters.
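The proposed run-time check might look roughly like the following C
sketch. It is only an illustration under stated assumptions: `struct
colormap`, `new_color()`, `new_color_assert_only()`, `MAX_COLOR` and
`COLORLESS` are hypothetical simplifications, the standard `assert()`
stands in for PostgreSQL's own `Assert()` macro, and the error is
recorded in a field rather than raised through the backend's real error
reporting.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef short color;                 /* hypothetical: colors stored as short */

#define MAX_COLOR ((int) SHRT_MAX)   /* largest color number a short can hold */
#define COLORLESS ((color) -1)       /* hypothetical "no color" result */

/* Hypothetical, heavily simplified color map. */
struct colormap
{
    int         ncolors;   /* colors handed out so far; the next one is ncolors */
    const char *error;     /* set when allocation fails, else NULL */
};

/* Assertion-only guard: compiled out under NDEBUG, so overflow goes unnoticed. */
color new_color_assert_only(struct colormap *cm)
{
    assert(cm->ncolors <= MAX_COLOR);
    return (color) cm->ncolors++;
}

/* Proposed style: a run-time check that fails cleanly in every build. */
color new_color(struct colormap *cm)
{
    if (cm->ncolors > MAX_COLOR)
    {
        cm->error = "too many colors in regexp";
        return COLORLESS;
    }
    return (color) cm->ncolors++;
}
```

This keeps 'color' as a short, so ordinary patterns pay no extra memory;
the only cost is one comparison each time a new color is allocated.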

Thoughts?

- Heikki


