Thread: split_part for the last element

split_part for the last element

From
Nikhil Benesch
Date:
Hi,

Suppose I need to split a string on a delimiter and select one of the
resulting components. If I want a specific component counting from the
start, that's easy:

    split_part('foo bar baz', ' ', 1) -> 'foo'

But if I want the last component, I have several less-than-ideal options:

1. (string_to_array('foo bar baz', '
'))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
3. (regexp_match('foo baz bar', '\S*$'))[1]

Option 1 is probably the most understandable, especially if you are
willing to introduce a temporary parts array:

    select parts[cardinality(parts) - 1] from string_to_array('foo bar
baz', ' ') parts

But if the strings are long, this needlessly builds an array just to
throw it away. Option 2 has similar efficiency problems and is just
kind of silly. Option 3 is probably the best, but it's still a good
bit more complicated than a simple split_part invocation.

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

    split_part('foo bar baz', ' ', -1) -> 'baz'

Or adding a split_part_end function in which positive indices counted
from the end:

    split_part_end('foo bar baz', ' ', 1) -> 'baz'

I'd be happy to prepare a patch if so.

Cheers,
Nikhil



Re: split_part for the last element

From
PALAYRET Jacques
Date:
Hello,

  reverse(split_part(reverse('foo bar baz'), ' ', 1))   -> 'baz'

Regards

----- Mail original -----
De: "Nikhil Benesch" <nikhil.benesch@gmail.com>
À: pgsql-general@lists.postgresql.org
Envoyé: Vendredi 23 Octobre 2020 17:47:16
Objet: split_part for the last element

Hi,

Suppose I need to split a string on a delimiter and select one of the
resulting components. If I want a specific component counting from the
start, that's easy:

    split_part('foo bar baz', ' ', 1) -> 'foo'

But if I want the last component, I have several less-than-ideal options:

1. (string_to_array('foo bar baz', '
'))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
3. (regexp_match('foo baz bar', '\S*$'))[1]

Option 1 is probably the most understandable, especially if you are
willing to introduce a temporary parts array:

    select parts[cardinality(parts) - 1] from string_to_array('foo bar
baz', ' ') parts

But if the strings are long, this needlessly builds an array just to
throw it away. Option 2 has similar efficiency problems and is just
kind of silly. Option 3 is probably the best, but it's still a good
bit more complicated than a simple split_part invocation.

Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

    split_part('foo bar baz', ' ', -1) -> 'baz'

Or adding a split_part_end function in which positive indices counted
from the end:

    split_part_end('foo bar baz', ' ', 1) -> 'baz'

I'd be happy to prepare a patch if so.

Cheers,
Nikhil



Re: split_part for the last element

From
Nikhil Benesch
Date:
Right, that's option 2 in my original mail. There are several
deficiencies with that idiom:

  * It is non-obvious. Sure, it might make sense to you and I, but to
someone just learning SQL, it takes a minute to reason through why it
works. They're also unlikely to invent the trick on their own.
  * It is inefficient. When the strings are large reversing the
strings is a silly waste of compute.

On Fri, Oct 23, 2020 at 12:03 PM PALAYRET Jacques
<jacques.palayret@meteo.fr> wrote:
>
> Hello,
>
>   reverse(split_part(reverse('foo bar baz'), ' ', 1))   -> 'baz'
>
> Regards
>
> ----- Mail original -----
> De: "Nikhil Benesch" <nikhil.benesch@gmail.com>
> À: pgsql-general@lists.postgresql.org
> Envoyé: Vendredi 23 Octobre 2020 17:47:16
> Objet: split_part for the last element
>
> Hi,
>
> Suppose I need to split a string on a delimiter and select one of the
> resulting components. If I want a specific component counting from the
> start, that's easy:
>
>     split_part('foo bar baz', ' ', 1) -> 'foo'
>
> But if I want the last component, I have several less-than-ideal options:
>
> 1. (string_to_array('foo bar baz', '
> '))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
> 2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
> 3. (regexp_match('foo baz bar', '\S*$'))[1]
>
> Option 1 is probably the most understandable, especially if you are
> willing to introduce a temporary parts array:
>
>     select parts[cardinality(parts) - 1] from string_to_array('foo bar
> baz', ' ') parts
>
> But if the strings are long, this needlessly builds an array just to
> throw it away. Option 2 has similar efficiency problems and is just
> kind of silly. Option 3 is probably the best, but it's still a good
> bit more complicated than a simple split_part invocation.
>
> Is there another option I'm missing? Would there be interest in
> extending split part so that negative indices counted from the end, as
> in:
>
>     split_part('foo bar baz', ' ', -1) -> 'baz'
>
> Or adding a split_part_end function in which positive indices counted
> from the end:
>
>     split_part_end('foo bar baz', ' ', 1) -> 'baz'
>
> I'd be happy to prepare a patch if so.
>
> Cheers,
> Nikhil
>
>



Re: split_part for the last element

From
"David G. Johnston"
Date:
On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:

    split_part('foo bar baz', ' ', -1) -> 'baz'

Some thoughts:

I'm torn here because this would be the first usage of this concept in PostgreSQL (I think).

Tangentially, I noticed that we have a "starts_with" function but no corresponding "end_with".

It's been a while but there used to be a systemic inertia working against adding minor useful functions such as these.

With the new documentation layout I would at least consider updating the description for the normal functions with an example on how to formulate an expression that works contra-normally, and in the case where there does exist such a specialized function, naming it.

David J.

Re: split_part for the last element

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com>
> wrote:
>> Is there another option I'm missing? Would there be interest in
>> extending split part so that negative indices counted from the end, as
>> in:
>>     split_part('foo bar baz', ' ', -1) -> 'baz'

> I'm torn here because this would be the first usage of this concept in
> PostgreSQL (I think).

We already have some JSON functions that act like that, not to mention
the left() and right() string functions, so I don't see that much of an
argument against extending split_part to do it.

> Tangentially, I noticed that we have a "starts_with" function but no
> corresponding "end_with".
> It's been a while but there used to be a systemic inertia working
> against adding minor useful functions such as these.

Part of the reason for that bias is that these one-off functions tend
not to be very well thought out or complete :-(.  The point that
could be raised here is why we have split_part but not any corresponding
regex-based splitter.

            regards, tom lane



Re: split_part for the last element

From
Nikhil Benesch
Date:
On Fri, Oct 23, 2020 at 2:21 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

> I'm torn here because this would be the first usage of this concept in
> PostgreSQL (I think).

Yeah, I also have some qualms about this design in the context of Postgres.
Particularly because Postgres allows arrays to begin at negative indices.

> Tangentially, I noticed that we have a "starts_with" function but no
> corresponding "end_with".

Ah, interesting. On the other hand, there are both "left" and "right",
"lpad" and "rpad", and "ltrim" and "rtrim". And at least ends_with has the
fairly elegant alternative of "s LIKE '%suffix'".

> It's been a while but there used to be a systemic inertia working against
> adding minor useful functions such as these.
>
> With the new documentation layout I would at least consider updating the
> description for the normal functions with an example on how to formulate
> an expression that works contra-normally, and in the case where there does
> exist such a specialized function, naming it.

Supposing you go this route, which of the options would you envision
mentioning as the converse of split_part?



Re: split_part for the last element

From
Nikhil Benesch
Date:
On Fri, Oct 23, 2020 at 2:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com>
> > wrote:
> >> Is there another option I'm missing? Would there be interest in
> >> extending split part so that negative indices counted from the end, as
> >> in:
> >>      split_part('foo bar baz', ' ', -1) -> 'baz'
>
> > I'm torn here because this would be the first usage of this concept in
> > PostgreSQL (I think).
>
> We already have some JSON functions that act like that, not to mention
> the left() and right() string functions, so I don't see that much of an
> argument against extending split_part to do it.

Oh, I didn't realize left and right already worked this way. That
makes this design much more compelling, at least to me.

If the consensus is this extension is ok, I'd be happy to prepare a patch.