On Sat, Jan 14, 2023 at 3:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Jan 5, 2023 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The rcancelrequested API is something that I devised out of whole cloth
> > awhile ago. It's not in Tcl's copy of the code, which AFAIK is the
> > only other project using this regex engine. I do still have vague
> > hopes of someday seeing the engine as a standalone project, which is
> > why I'd prefer to keep a bright line between the engine and Postgres.
> > But there's no very strong reason to think that any hypothetical future
> > external users who need a cancel API would want this API as opposed to
> > one that requires exit() or longjmp() to get out of the engine. So if
> > we're changing the way we use it, I think it's perfectly reasonable to
> > redesign that API to make it simpler and less of an impedance mismatch.
>
> Thanks for that background. Alright then, here's a new iteration
> exploring this direction. It gets rid of CANCEL_REQUESTED() ->
> REG_CANCEL and the associated error and callback function, and instead
> has just "INTERRUPT(re);" at those cancellation points, which is a
> macro that defaults to nothing (for Tcl's benefit). Our regcustom.h
> defines it as CHECK_FOR_INTERRUPTS(). I dunno if it's worth passing
> the "re" argument... I was imagining that someone who wants to free
> memory explicitly and then longjmp would probably need it? (It might
> even be possible to expand to something that sets an error and
> returns, not investigated.) Better name or design very welcome.
I think this experiment worked out pretty well. I think it's a nice
side-effect that you can see what memory the regexp subsystem is
using, and that's likely to lead to more improvements. (Why is it
limited to caching 32 entries? Why is it a linear search, not a hash
table? Why is LRU implemented with memmove() and not a list? Could
we have a GUC regex_cache_memory, so someone who uses a lot of regexes
can opt into a large cache?) On the other hand it also uses a bit
more RAM, like other code using the reparenting trick, which is a
topic for future research.
I vote for proceeding with this approach. I wish we didn't have to
tackle either a regexp interface/management change (done here) or a
CFI() redesign (not done, but probably also a good idea for other
reasons) before getting this signal stuff straightened out, but here
we are. This approach seems good to me. Anyone have a different
take?