Thread: [PATCH] Support negative indexes in split_part
I posted the idea of allowing negative indexes in split_part to pgsql-general last week, and it seemed there was some interest: http://postgr.es/m/CAPWqQZR%2B-5pAZNSSrnmYczRaX-huemc%3DoO8URvDZvUA-M%3DMOBA%40mail.gmail.com Attached is a patch, based on master, that implements the approach as described in that discussion. The motivation is that the existing idioms for splitting a string and selecting the nth-to-last element are rather complicated and/or inefficient: 1. (string_to_array('foo bar baz', ' '))[cardinality(string_to_array('foo bar baz', ' ')) - 1] 2. reverse(split_part(reverse('foo bar baz'), ' ', 1)) 3. (regexp_match('foo baz bar', '\S*$'))[1] With the patch, split_part(haystack, needle, -1) selects the last field of the string, split_part(haystack, needle, -2) selects the second-to-last field, and so on. Per Tom Lane, there is precedent for this design, where negative indices meaning "count from the end", namely the left and right string functions. The patch includes updates to the docs and regression tests. If the feature is deemed desirable, I believe the patch is "commit quality" (though, fair warning, this is my first contribution to Postgres, so I might have the wrong notion of what a committable patch looks like). Note that the implementation is deliberately a bit inefficient to keep things simple. When presented with a negative index, the implementation does an extra pass over the string to count the total number of fields, in order to convert the negative index to a positive index. Then it proceeds as it normally would. One can imagine adding support for backwards B-M-H, but I'm not sure that could be made to work with all multibyte encodings. We could at least avoid the extra pass over the string by allocating a circular buffer of size |n| when n is negative, but that wasn't clearly worthwhile. (I did implement the optimization for the special case of -1, since its implementation was trivial.) Cheers, Nikhil
Attachment
Nikhil Benesch <nikhil.benesch@gmail.com> writes: > I posted the idea of allowing negative indexes in split_part to pgsql-general > last week, and it seemed there was some interest: > http://postgr.es/m/CAPWqQZR%2B-5pAZNSSrnmYczRaX-huemc%3DoO8URvDZvUA-M%3DMOBA%40mail.gmail.com > Attached is a patch, based on master, that implements the approach as described > in that discussion. Please add an entry to the upcoming commitfest, to make sure we don't lose track of this: https://commitfest.postgresql.org/30/ regards, tom lane
On 11/1/20 9:56 PM, Tom Lane wrote: > Please add an entry to the upcoming commitfest, to make sure we don't > lose track of this: > > https://commitfest.postgresql.org/30/ Done: https://commitfest.postgresql.org/30/2816/
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested Patch looks good to me. Seems like a useful feature, and I agree that the two-pass implementation makes the change very easyto review. Quick note on test coverage: gcov marks the "needle not found" branch (the one marked `/* special case if fldsep not foundat all */`) as being completely uncovered. I don't think that needs to gate this patch; it looks like it was uncoveredbefore this feature was added. Doc builds are currently failing due to what appears to be an xmllint failure: /usr/bin/xmllint --path . --noout --valid postgres.sgml error : Unknown IO error postgres.sgml:21: warning: failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" but that doesn't have anything to do with this patch. Marking Ready for Committer. (I'm a little new to this myself, so someoneplease let me know if I'm jumping the gun.) Thanks! --Jacob The new status of this patch is: Ready for Committer
Jacob Champion <pchampion@pivotal.io> writes: > Patch looks good to me. Seems like a useful feature, and I agree that the two-pass implementation makes the change veryeasy to review. LGTM too. I made a couple of cosmetic improvements and pushed it. > Quick note on test coverage: gcov marks the "needle not found" branch (the one marked `/* special case if fldsep not foundat all */`) as being completely uncovered. I don't think that needs to gate this patch; it looks like it was uncoveredbefore this feature was added. We seem to be trying for full test coverage of this function now, so I added a test case for that branch too. > Doc builds are currently failing due to what appears to be an xmllint failure: Unrelated, but see https://www.postgresql.org/message-id/flat/E2EE6B76-2D96-408A-B961-CAE47D1A86F0%40yesql.se regards, tom lane