Re: Undocumented(?) limits on regexp functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Undocumented(?) limits on regexp functions
Msg-id 17622.1534258137@sss.pgh.pa.us
In response to Re: Undocumented(?) limits on regexp functions  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> Doubt it --- we could use the "huge" request variants, maybe, but
>  Tom> I wonder whether the engine could run fast enough that you'd want
>  Tom> to.

> I do wonder (albeit without evidence) whether the quadratic slowdown
> problem I posted a patch for earlier was ignored for so long because
> people just went "meh, regexps are slow" rather than wondering why a
> trivial splitting of a 40kbyte string was taking more than a second.

I have done performance measurements on the regex stuff in the past,
and didn't notice any huge penalty in regexp.c.  I was planning to try
to figure out what test case you were using that was different from
what I'd looked at, but haven't got round to it yet.

In the light of morning I'm reconsidering my initial thought of
not wanting to use MemoryContextAllocHuge.  My reaction was based
on thinking that that would allow people to reach indefinitely
large regexp inputs, but really that's not so; the maximum input
length will be a 1GB text object, hence at most 1G characters.
regexp.c needs to expand that into 4-bytes-each "chr" characters,
so it could be at most 4GB of data.  The fact that inputs between
256M and 1G characters fail could be seen as an implementation
rough edge that we ought to sand down, at least on 64-bit platforms.
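[The arithmetic above can be sketched as follows; the constants are taken from the discussion (1GB varlena limit, 4-byte "chr", and the standard palloc request limit), not from reading the source, so treat them as assumptions for illustration.]

```python
# Sketch of the limits discussed above (assumed constants, for illustration).

MAX_TEXT_BYTES = 1 << 30        # 1GB limit on a text value
CHR_BYTES = 4                   # regexp.c expands each character to a 4-byte "chr"
MAX_ALLOC_SIZE = (1 << 30) - 1  # standard palloc() request limit (just under 1GB)

# With ordinary palloc, the expanded chr buffer must fit under ~1GB,
# so only about 256M characters can be handled:
max_chars_plain = MAX_ALLOC_SIZE // CHR_BYTES

# With MemoryContextAllocHuge, the worst case is the full 1G characters,
# expanding to 4GB of chr data:
worst_case_huge = MAX_TEXT_BYTES * CHR_BYTES

print(max_chars_plain)   # ~256M characters
print(worst_case_huge)   # 4GB
```

This is why inputs between 256M and 1G characters currently fail: the input itself is legal, but its 4-bytes-per-character expansion exceeds the standard allocation limit.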

            regards, tom lane

