Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace - Mailing list pgsql-hackers
From | Gilles Darold |
---|---|
Subject | Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace |
Date | |
Msg-id | d042905c-f0b3-be60-44b8-bd3229cca66e@darold.net Whole thread Raw |
In response to | Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace (Gilles Darold <gilles@darold.net>) |
Responses |
Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
|
List | pgsql-hackers |
Le 02/08/2021 à 23:22, Gilles Darold a écrit : > Le 02/08/2021 à 01:21, Tom Lane a écrit : >> Gilles Darold <gilles@darold.net> writes: >>> [ v5-0001-regexp-foo-functions.patch ] >> I've gone through this whole patch now, and found quite a lot that I did >> not like. In no particular order: >> >> * Wrapping parentheses around the user's regexp doesn't work. It can >> turn an invalid regexp into a valid one: for example 'a)(b' should draw >> a syntax error. With this patch, no error would be thrown, but the >> "outer" parens wouldn't do what you expected. Worse, it can turn a >> valid regexp into an invalid one: the metasyntax options described in >> 9.7.3.4 only work at the start of the regexp. So we have to handle >> whole-regexp cases honestly rather than trying to turn them into an >> instance of the parenthesized-subexpression case. >> >> * You did a lot of things quite inefficiently, apparently to avoid >> touching any existing code. I think it's better to extend >> setup_regexp_matches() and replace_text_regexp() a little bit so that >> they can support the behaviors these new functions need. In both of >> them, it's absolutely trivial to allow a search start position to be >> passed in; and it doesn't take much to teach replace_text_regexp() >> to replace only the N'th match. >> >> * Speaking of N'th, there is not much of anything that I like >> about Oracle's terminology for the function arguments, and I don't >> think we ought to adopt it. If we're documenting the functions as >> processing the "N'th match", it seems to me to be natural to call >> the parameter "N" not "occurrence". Speaking of the "occurrence'th >> occurrence" is just silly, not to mention long and easy to misspell. >> Likewise, "position" is a horribly vague term for the search start >> position; it could be interpreted to mean several other things. >> "start" seems much better. "return_opt" is likewise awfully unclear. >> I went with "endoption" below, though I could be talked into something >> else. The only one of Oracle's choices that I like is "subexpr" for >> subexpression number ... but you went with DB2's rather vague "group" >> instead. I don't want to use their "capture group" terminology, >> because that appears nowhere else in our documentation. Our existing >> terminology is "parenthesized subexpression", which seems fine to me >> (and also agrees with Oracle's docs). >> >> * I spent a lot of time on the docs too. A lot of the syntax specs >> were wrong (where you put the brackets matters), many of the examples >> seemed confusingly overcomplicated, and the text explanations needed >> copy-editing. >> >> * Also, the regression tests seemed misguided. This patch is not >> responsible for testing the regexp engine as such; we have tests >> elsewhere that do that. So I don't think we need complex regexps >> here. We just need to verify that the parameters of these functions >> act properly, and check their error cases. That can be done much >> more quickly and straightforwardly than what you had. >> >> >> So here's a revised version that I like better. I think this >> is pretty nearly committable, aside from the question of whether >> a too-large subexpression number should be an error or not. > > Thanks a lot for the patch improvement and the guidance. I have read the > patch and I agree with your choices I think I was too much trying to > mimic the oraclisms. I don't think we should take care of the too-large > subexpression number, the regexp writer should always test its regular > expression and also this will not prevent him to chose the wrong capture > group number but just a non existing one. Actually I just found that the regexp_like() function doesn't support the start parameter which is something we should support. I saw that Oracle do not support it but DB2 does and I think we should also support it. I will post a new version of the patch once it is done. Best regards, -- Gilles Darold
pgsql-hackers by date: