Re: [GENERAL] workaround for lack of REPLACE() function - Mailing list pgsql-hackers
From | Joe Conway |
---|---|
Subject | Re: [GENERAL] workaround for lack of REPLACE() function |
Date | |
Msg-id | 3D549ED8.40102@joeconway.com Whole thread Raw |
In response to | Re: [GENERAL] workaround for lack of REPLACE() function (Thomas Lockhart <lockhart@fourpalms.org>) |
Responses |
Re: [GENERAL] workaround for lack of REPLACE() function
(Tatsuo Ishii <t-ishii@sra.co.jp>)
|
List | pgsql-hackers |
Joe Conway wrote: > more work than I had time for when I wrote the current replace(). But as > I said, if there is support for getting this into the backend, I'll add > it to my todo list: > > - Create new backend function replace() > - Either create new backend functions, or merge into existing functions: > to_hex() and extract_tok() > I'm just starting to take a look at this again. While studying the current text_substr() function I found two behaviors which conflict with specific SQL92/SQL99 requirements, and one bug. First the spec compliance -- SQL92 section 6.7/SQL99 section 6.18 say: If <character substring function> is specified, then: a) Let C be the value of the <character value expression>, let LC be the length of C, and let S be the value of the <startposition>. b) If <string length> is specified, then let L be the value of <string length> and let E be S+L. Otherwise, let E be thelarger of LC + 1 and S. c) If either C, S, or L is the null value, then the result of the <character substring function> is the null value. d) If E is less than S, then an exception condition is raised: data exception-substring error. e) Case: i) If S is greater than LC or if E is less than 1, then the result of the <character substring function>is a zero-length string. ii) Otherwise, 1) Let SI be the larger of S and 1. Let El be the smaller of E and LC+l. Let Ll be El-Sl. 2) The result of the <character substring function> is a character stringcontaining the Ll characters of C starting at character number Sl in the same order that the characters appearin C. The only way for d) to be true is when L < 0. Instead of an error, we do: test=# select substr('hello',2,-1); substr -------- ello (1 row) The other spec issue is wrt para e)i). If E (=S+L) < 1, we should return a zero-length string. Currently I get: test=# select substr('hello',-4,3); substr -------- hello (1 row) Neither behavior is documented (unless it's somewhere other than: http://developer.postgresql.org/docs/postgres/functions-string.html ). The bug is this one: test=# create DATABASE testmb with encoding = 'EUC_JP'; CREATE DATABASE test=# \c testmb You are now connected to database testmb. testmb=# select substr('hello',6,2); substr -------- ~ (1 row) testmb=# \c test You are now connected to database test. test=# select substr('hello',6,2); substr -------- (1 row) The multibyte database behavior is the bug. The SQL_ASCII behavior is correct (zero-length string): test=# select substr('hello',6,2) is null; ?column? ---------- f (1 row) Any objection if I rework this function to meet SQL92 and fix the bug? Or is the SQL92 part not desirable because it breaks backward compatability? In any case, can the #ifdef MULTIBYTE's be removed now in favor of a test for encoding max length? Joe
pgsql-hackers by date: