Thread: split_part for the last element
Hi, Suppose I need to split a string on a delimiter and select one of the resulting components. If I want a specific component counting from the start, that's easy: split_part('foo bar baz', ' ', 1) -> 'foo' But if I want the last component, I have several less-than-ideal options: 1. (string_to_array('foo bar baz', ' '))[cardinality(string_to_array('foo bar baz', ' ')) - 1] 2. reverse(split_part(reverse('foo bar baz'), ' ', 1)) 3. (regexp_match('foo baz bar', '\S*$'))[1] Option 1 is probably the most understandable, especially if you are willing to introduce a temporary parts array: select parts[cardinality(parts) - 1] from string_to_array('foo bar baz', ' ') parts But if the strings are long, this needlessly builds an array just to throw it away. Option 2 has similar efficiency problems and is just kind of silly. Option 3 is probably the best, but it's still a good bit more complicated than a simple split_part invocation. Is there another option I'm missing? Would there be interest in extending split part so that negative indices counted from the end, as in: split_part('foo bar baz', ' ', -1) -> 'baz' Or adding a split_part_end function in which positive indices counted from the end: split_part_end('foo bar baz', ' ', 1) -> 'baz' I'd be happy to prepare a patch if so. Cheers, Nikhil
Hello, reverse(split_part(reverse('foo bar baz'), ' ', 1)) -> 'baz' Regards ----- Mail original ----- De: "Nikhil Benesch" <nikhil.benesch@gmail.com> À: pgsql-general@lists.postgresql.org Envoyé: Vendredi 23 Octobre 2020 17:47:16 Objet: split_part for the last element Hi, Suppose I need to split a string on a delimiter and select one of the resulting components. If I want a specific component counting from the start, that's easy: split_part('foo bar baz', ' ', 1) -> 'foo' But if I want the last component, I have several less-than-ideal options: 1. (string_to_array('foo bar baz', ' '))[cardinality(string_to_array('foo bar baz', ' ')) - 1] 2. reverse(split_part(reverse('foo bar baz'), ' ', 1)) 3. (regexp_match('foo baz bar', '\S*$'))[1] Option 1 is probably the most understandable, especially if you are willing to introduce a temporary parts array: select parts[cardinality(parts) - 1] from string_to_array('foo bar baz', ' ') parts But if the strings are long, this needlessly builds an array just to throw it away. Option 2 has similar efficiency problems and is just kind of silly. Option 3 is probably the best, but it's still a good bit more complicated than a simple split_part invocation. Is there another option I'm missing? Would there be interest in extending split part so that negative indices counted from the end, as in: split_part('foo bar baz', ' ', -1) -> 'baz' Or adding a split_part_end function in which positive indices counted from the end: split_part_end('foo bar baz', ' ', 1) -> 'baz' I'd be happy to prepare a patch if so. Cheers, Nikhil
Right, that's option 2 in my original mail. There are several deficiencies with that idiom: * It is non-obvious. Sure, it might make sense to you and I, but to someone just learning SQL, it takes a minute to reason through why it works. They're also unlikely to invent the trick on their own. * It is inefficient. When the strings are large reversing the strings is a silly waste of compute. On Fri, Oct 23, 2020 at 12:03 PM PALAYRET Jacques <jacques.palayret@meteo.fr> wrote: > > Hello, > > reverse(split_part(reverse('foo bar baz'), ' ', 1)) -> 'baz' > > Regards > > ----- Mail original ----- > De: "Nikhil Benesch" <nikhil.benesch@gmail.com> > À: pgsql-general@lists.postgresql.org > Envoyé: Vendredi 23 Octobre 2020 17:47:16 > Objet: split_part for the last element > > Hi, > > Suppose I need to split a string on a delimiter and select one of the > resulting components. If I want a specific component counting from the > start, that's easy: > > split_part('foo bar baz', ' ', 1) -> 'foo' > > But if I want the last component, I have several less-than-ideal options: > > 1. (string_to_array('foo bar baz', ' > '))[cardinality(string_to_array('foo bar baz', ' ')) - 1] > 2. reverse(split_part(reverse('foo bar baz'), ' ', 1)) > 3. (regexp_match('foo baz bar', '\S*$'))[1] > > Option 1 is probably the most understandable, especially if you are > willing to introduce a temporary parts array: > > select parts[cardinality(parts) - 1] from string_to_array('foo bar > baz', ' ') parts > > But if the strings are long, this needlessly builds an array just to > throw it away. Option 2 has similar efficiency problems and is just > kind of silly. Option 3 is probably the best, but it's still a good > bit more complicated than a simple split_part invocation. > > Is there another option I'm missing? Would there be interest in > extending split part so that negative indices counted from the end, as > in: > > split_part('foo bar baz', ' ', -1) -> 'baz' > > Or adding a split_part_end function in which positive indices counted > from the end: > > split_part_end('foo bar baz', ' ', 1) -> 'baz' > > I'd be happy to prepare a patch if so. > > Cheers, > Nikhil > >
On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
Is there another option I'm missing? Would there be interest in
extending split part so that negative indices counted from the end, as
in:
split_part('foo bar baz', ' ', -1) -> 'baz'
Some thoughts:
I'm torn here because this would be the first usage of this concept in PostgreSQL (I think).
Tangentially, I noticed that we have a "starts_with" function but no corresponding "end_with".
It's been a while but there used to be a systemic inertia working against adding minor useful functions such as these.
With the new documentation layout I would at least consider updating the description for the normal functions with an example on how to formulate an expression that works contra-normally, and in the case where there does exist such a specialized function, naming it.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com> > wrote: >> Is there another option I'm missing? Would there be interest in >> extending split part so that negative indices counted from the end, as >> in: >> split_part('foo bar baz', ' ', -1) -> 'baz' > I'm torn here because this would be the first usage of this concept in > PostgreSQL (I think). We already have some JSON functions that act like that, not to mention the left() and right() string functions, so I don't see that much of an argument against extending split_part to do it. > Tangentially, I noticed that we have a "starts_with" function but no > corresponding "end_with". > It's been a while but there used to be a systemic inertia working > against adding minor useful functions such as these. Part of the reason for that bias is that these one-off functions tend not to be very well thought out or complete :-(. The point that could be raised here is why we have split_part but not any corresponding regex-based splitter. regards, tom lane
On Fri, Oct 23, 2020 at 2:21 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > I'm torn here because this would be the first usage of this concept in > PostgreSQL (I think). Yeah, I also have some qualms about this design in the context of Postgres. Particularly because Postgres allows arrays to begin at negative indices. > Tangentially, I noticed that we have a "starts_with" function but no > corresponding "end_with". Ah, interesting. On the other hand, there are both "left" and "right", "lpad" and "rpad", and "ltrim" and "rtrim". And at least ends_with has the fairly elegant alternative of "s LIKE '%suffix'". > It's been a while but there used to be a systemic inertia working against > adding minor useful functions such as these. > > With the new documentation layout I would at least consider updating the > description for the normal functions with an example on how to formulate > an expression that works contra-normally, and in the case where there does > exist such a specialized function, naming it. Supposing you go this route, which of the options would you envision mentioning as the converse of split_part?
On Fri, Oct 23, 2020 at 2:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > On Fri, Oct 23, 2020 at 8:47 AM Nikhil Benesch <nikhil.benesch@gmail.com> > > wrote: > >> Is there another option I'm missing? Would there be interest in > >> extending split part so that negative indices counted from the end, as > >> in: > >> split_part('foo bar baz', ' ', -1) -> 'baz' > > > I'm torn here because this would be the first usage of this concept in > > PostgreSQL (I think). > > We already have some JSON functions that act like that, not to mention > the left() and right() string functions, so I don't see that much of an > argument against extending split_part to do it. Oh, I didn't realize left and right already worked this way. That makes this design much more compelling, at least to me. If the consensus is this extension is ok, I'd be happy to prepare a patch.