Re: add function argument names to regex* functions. - Mailing list pgsql-hackers

From jian he
Subject Re: add function argument names to regex* functions.
Date
Msg-id CACJufxHTw22kOmmDBpubmju9bfFhonEekwf0s6LGvysbnWEg9Q@mail.gmail.com
In response to Re: add function argument names to regex* functions.  ("Dian Fay" <di@nmfay.com>)
Responses Re: add function argument names to regex* functions.
List pgsql-hackers
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
>
> On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
> > >
> > > On 1/3/24 5:05 PM, Dian Fay wrote:
> > >
> > > Another possibility is `index`, which is relatively short and not a
> > > reserved keyword ^1. `position` is not as precise but would avoid the
> > > conceptual overloading of ordinary indices.
> > >
> > > I'm not a fan of "index" since that leaves the question of
> > > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > > Jian's suggestion of "occurance" is best.
> > >
> > > We do have precedent for one-based `index` in Postgres: array types are
> > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > > and easy to misspell (I looked it up after typing it just now and it
> > > _still_ feels off).
> > >
> > > How's "instance"?
> > >
> > > Presumably someone referencing arguments by name would have just looked up
> > > the names via \df or whatever, so presumably misspelling wouldn't be a big
> > > issue. But I think "instance" is OK as well.
> > >
> > > --
> > > Jim Nasby, Data Architect, Austin TX
> >
> > regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> > [, N [, endoption [, flags [, subexpr ]]]]])
> > oracle:
> > REGEXP_INSTR (source_char, pattern [, position [, occurrence [,
> > return_opt [, match_param [, subexpr ]]]]] )
> >
> > "string" and "source_char" are about equally descriptive, so maybe
> > there is no need to change.
> > "start" is better than "position", imho.
> > "return_opt" is better than "endoption" (maybe we need to change it; for
> > now I didn't).
> > "flags" cannot be changed to "match_param", given that it is used almost
> > everywhere in functions-matching.html.
> >
> > Similarly, for the function regexp_replace, Oracle uses "replace_string" and
> > we use "replacement" (as mentioned in the doc),
> > so I don't think we need to change to "replace_string".
> >
> > Based on how people google[0], I think `occurrence` is ok, even though
> > it's verbose.
> > To change from `N` to `occurrence`, we also need to change the doc,
> > which is why this patch is larger.
> >
> >
> > [0]: https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8
>
> The `regexp_replace` summary in table 9.10 is mismatched and still
> specifies the first parameter name as `string` instead of `source`.
> Since all the other functions use `string`, should `regexp_replace` do
> the same or is this a case where an established "standard" diverges?
>

Got it. Thanks for pointing it out.

In functions-matching.html, if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable>, then the text contains both the plain
word "string" (no markup) and the marked-up "string", which is slightly
confusing.

So does the following refactored description of regexp_replace make sense:

     The <replaceable>string</replaceable> is returned unchanged if
     there is no match to the <replaceable>pattern</replaceable>.  If there is a
     match, the <replaceable>string</replaceable> is returned with the
     <replaceable>replacement</replaceable> string substituted for the matching
     substring.  The <replaceable>replacement</replaceable> string can contain
     <literal>\</literal><replaceable>n</replaceable>, where
     <replaceable>n</replaceable> is 1 through 9, to indicate that the source
     substring matching the <replaceable>n</replaceable>'th parenthesized
     subexpression of the pattern should be inserted, and it can contain
     <literal>\&</literal> to indicate that the substring matching the entire
     pattern should be inserted.  Write <literal>\\</literal> if you need to
     put a literal backslash in the replacement text.
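
To make the wording above concrete, here is a small sketch of the cases it
describes (no match, \n, \&, and \\). The input strings are only
illustrative, and the results in the comments assume
standard_conforming_strings is on (the default), so the SQL parser leaves
the backslashes in the literals alone.

```sql
-- No match: the input string is returned unchanged.
SELECT regexp_replace('foobarbaz', 'q..', 'X');            -- foobarbaz

-- \1 inserts the substring matched by the first parenthesized subexpression.
SELECT regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g');  -- fooXarYXazY

-- \& inserts the substring matched by the entire pattern.
SELECT regexp_replace('foobarbaz', 'b..', '[\&]');         -- foo[bar]baz

-- \\ puts a single literal backslash into the replacement text.
SELECT regexp_replace('a/b', '/', '\\');                   -- a\b
```

Only the first match is replaced unless the 'g' flag is given, which is why
the \& example touches "bar" but not "baz".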

> I noticed the original documentation for some of these functions is
> rather disorganized; summaries explain `occurrence` without explaining
> the prior `start` parameter, and detailed documentation in 9.7 is
> usually a single paragraph per function running pell-mell through ifs
> and buts without section headings, so entries in table 9.10 have to
> reference the entire section 9.7.3 instead of their specific functions.
> It's out of scope here, but should I bring this up on pgsql-docs?

Got it.
In Table 9.10 (Other String Functions and Operators), it would be great if
each entry could reference its specific function.
For now, in the browser, you need to use Ctrl+F to find the
detailed explanation in 9.7.3.
You can just send your suggestions or a patch to pgsql-hackers@postgresql.org.


