Thread: make_greater_string() does not return a string in some cases

make_greater_string() does not return a string in some cases

From
Tatsuhito Kasahara
Date:
Hi !

make_greater_string() does not return a string when certain UTF8 strings
are passed as str_const.
# Especially UTF8 strings whose last byte is 0xBF.

This is because make_greater_string() only tries incrementing the last byte
of the string and does not try the same for the preceding bytes.

As a result, queries containing "LIKE '<string whose last byte is 0xBF>%'"
cannot use a (btree) index scan.
# Or the scan may degrade to a nearly full index scan.

# See the following example.
===============================================================================
'西' (Japanese Letter) : 0xE8A5BF

[client : UTF8 ⇔ server : EUC_JP]
=# EXPLAIN ANALYZE SELECT * FROM test2 WHERE name LIKE '西%';
                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using test2_name on test2  (cost=0.00..8.28 rows=1 width=3) (actual time=0.077..0.078 rows=1 loops=1)
  Index Cond: ((name >= '西'::text) AND (name < '誠'::text))  <-- Index-scan is chosen
  Filter: (name ~~ '西%'::text)
 Total runtime: 0.110 ms
(4 rows)

[client : UTF8 ⇔ server : UTF8]
=# EXPLAIN ANALYZE SELECT * FROM test2 WHERE name LIKE '西%';
                                            QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1693.01 rows=1 width=4) (actual time=22.598..22.599 rows=1 loops=1)
  Filter: (name ~~ '西%'::text)  <-- Seq-scan is chosen !
 Total runtime: 22.626 ms
(3 rows)
===============================================================================
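
The UTF8 failure can be reproduced outside the server. The following is
only a minimal sketch, not the PostgreSQL code: is_utf8_continuation() and
bump_last_utf8_byte() are hypothetical helpers that check nothing but the
UTF-8 rule that every non-first byte must lie in 0x80..0xBF. For
'西' (0xE8A5BF) the last byte is already at the top of that range, so the
last-byte increment has nowhere to go.

```c
#include <stdbool.h>
#include <stddef.h>

/* A UTF-8 continuation (non-first) byte must lie in 0x80..0xBF. */
static bool
is_utf8_continuation(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/*
 * Hypothetical helper, not PostgreSQL code: try to increment only the
 * last byte of one multibyte UTF-8 character stored in ch[0..len).
 * Returns true on success; returns false and leaves ch untouched when
 * the incremented byte would no longer be a continuation byte.
 */
static bool
bump_last_utf8_byte(unsigned char *ch, size_t len)
{
    if (len < 2 || !is_utf8_continuation(ch[len - 1]))
        return false;
    if (!is_utf8_continuation((unsigned char) (ch[len - 1] + 1)))
        return false;           /* 0xBF + 1 = 0xC0 is out of range */
    ch[len - 1]++;
    return true;
}
```

Calling bump_last_utf8_byte() on the three bytes 0xE8 0xA5 0xBF returns
false, while the same call on 0xE8 0xA5 0xBE succeeds and yields 0xE8 0xA5 0xBF.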

The attached patch solves the above problem.

Best regards,

--
NTT OSS Center
Tatsuhito Kasahara



diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index fc3c5b0..fdf58cf 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** make_greater_string(const Const *str_con
*** 5542,5552 ****
          *lastchar = savelastchar;

          /*
!          * Truncate off the last character, which might be more than 1 byte,
!          * depending on the character encoding.
           */
          if (datatype != BYTEAOID && pg_database_encoding_max_length() > 1)
!             len = pg_mbcliplen(workstr, len, len - 1);
          else
              len -= 1;

--- 5542,5567 ----
          *lastchar = savelastchar;

          /*
!          * Increment the previous character, or truncate off the last character,
!          * which might be more than 1 byte, depending on the character encoding.
           */
          if (datatype != BYTEAOID && pg_database_encoding_max_length() > 1)
!         {
!             int        i;
!             int        cliplen = pg_mbcliplen(workstr, len, len - 1);
!
!             for (i = len - 1; i > cliplen; i--)
!             {
!                 if ((unsigned char) workstr[i] < (unsigned char) 255)
!                 {
!                     workstr[i]++;
!                     memset(workstr + i + 1, 1 /* or 0? */, len - i - 1);
!                     break;
!                 }
!             }
!             if (i <= cliplen)
!                 len = cliplen;
!         }
          else
              len -= 1;


Re: make_greater_string() does not return a string in some cases

From
Tom Lane
Date:
Tatsuhito Kasahara <kasahara.tatsuhito@oss.ntt.co.jp> writes:
> make_greater_string() does not return a string when some UTF8 strings
> set to str_const.
> # Especially UTF8 strings which contains 'BF' in last byte.

The patch you propose for this is really untenable: it will re-introduce
many corner cases that we got rid of years ago, for example cases
wherein pg_verifymbstr and pg_mbcliplen index off the end of the string
because they think the last character occupies more bytes than are
there.  It's intentional that the existing code doesn't mess with the
first byte of a multibyte character (which is the one that determines
the character length, in all encodings of interest).
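
To illustrate the point about the first byte, here is a small sketch using
UTF-8 as the example encoding. utf8_len_from_lead() is a hypothetical
helper written for this illustration, not the server's pg_utf_mblen(). It
shows that bumping a lead byte can change the length the character claims
to have: 0xDF announces 2 bytes, 0xDF + 1 = 0xE0 announces 3, so code that
trusts the lead byte would look one byte past the data actually present.

```c
#include <stddef.h>

/*
 * Hypothetical helper: length of a UTF-8 character as announced by its
 * first byte, or 0 if the byte cannot start a character.
 */
static size_t
utf8_len_from_lead(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;                   /* continuation byte or invalid lead */
}
```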

Another problem is that if the last character is several bytes long,
this coding would cause us to iterate through potentially many millions
of character values before giving up and truncating off the last
character.  In a large number of cases that's just wasted time because
there is no chance of getting a larger string without incrementing some
character further to the left.  So there's a tradeoff that limits how
many values we should consider for each character position --- choosing
to consider at most 255 is a bit arbitrary, but "all of them" isn't
going to work.
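
A rough way to put a number on "many millions", as a sketch only: assume,
as a deliberate simplification that matches no particular encoding, that
every byte after the first may run over all 256 values.
trailing_byte_patterns() is a hypothetical helper that counts the byte
patterns an exhaustive search would then have to visit for one character.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of byte patterns for the non-first bytes
 * of a character that is charlen bytes long, assuming each such byte
 * may take any of 256 values (an upper bound, not a real encoding rule).
 */
static uint64_t
trailing_byte_patterns(int charlen)
{
    uint64_t    n = 1;
    int         i;

    for (i = 1; i < charlen; i++)
        n *= 256;
    return n;
}
```

Under that assumption a 4-byte character gives 256 * 256 * 256 = 16777216
patterns, against 256 for a 2-byte one.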

I don't think that the set of cases that could be improved this way is
large enough to justify trying to find solutions to these problems.

            regards, tom lane

Re: make_greater_string() does not return a string in some cases

From
Tatsuhito Kasahara
Date:
Tom Lane wrote:
> The patch you propose for this is really untenable: it will re-introduce
> many corner cases that we got rid of years ago, for example cases
> wherein pg_verifymbstr and pg_mbcliplen index off the end of the string
> because they think the last character occupies more bytes than are
> there.

> Another problem is that if the last character is several bytes long,
> this coding would cause us to iterate through potentially many millions
> of character values before giving up and truncating off the last
> character.
Hmm...  OK, I see your points.

I have another idea.

1. We provide new comparison operators ( <, <=, >, >=, = ) between text and bytea.
2. In make_greater_string(), if
   a multibyte string was given,
   the locale is C, and
   no greater string could be found,
   it returns a bytea value whose last character has a greater byte code.
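
The byte-code bound in step 2 can be sketched as follows. This is only an
illustration under the assumption that comparison is plain bytewise
(memcmp-style, as in the C locale); bytewise_upper_bound() is a
hypothetical helper, not part of any patch in this thread. It drops
trailing 0xFF bytes and increments the last remaining byte.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrite buf[0..*len) in place into a byte string
 * that compares greater, bytewise, than every string having the
 * original bytes as a prefix.  Trailing 0xFF bytes are dropped and the
 * last remaining byte is incremented.  Returns false, changing nothing,
 * if every byte is 0xFF and no such bound exists.
 */
static bool
bytewise_upper_bound(unsigned char *buf, size_t *len)
{
    size_t      n = *len;

    while (n > 0 && buf[n - 1] == 0xFF)
        n--;
    if (n == 0)
        return false;
    buf[n - 1]++;
    *len = n;
    return true;
}
```

For the bytes 0xE8 0xA5 0xBF this produces 0xE8 0xA5 0xC0, which is not
valid UTF-8 text and therefore has to be carried as bytea.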

The user will get the following result.

=======================================================================================================
-- 西 : 0xe8a5bf
=# EXPLAIN ANALYZE SELECT * FROM test WHERE name LIKE '西%';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Scan using test_name on test  (cost=0.00..8.28 rows=1 width=4) (actual time=0.022..0.024 rows=1 loops=1)
   Index Cond: ((name >= '西'::text) AND (name < '\\xe8a5c0'::bytea))
   Filter: (name ~~ '西%'::text)
 Total runtime: 0.053 ms
(4 rows)
=======================================================================================================

Is this idea reasonable?

Best regards,

--
NTT OSS Center
Tatsuhito Kasahara

Re: make_greater_string() does not return a string in some cases

From
Tom Lane
Date:
Tatsuhito Kasahara <kasahara.tatsuhito@oss.ntt.co.jp> writes:
> I have another idea.

> 1. We prepare new operators ( <,<=,>,>=,= ) for text and bytea.
> 2. In make_greater_string(), if
>    multi-byte-string was set and
>    using locale-C and
>    could not find greater string,
>    returns bytea which has greater byte-code of last-character.

> Is the idea reasonable ?

Maybe, but it only works for text_pattern_ops indexes not normal ones.
Not sure if people will be happy with maintaining a special index just
to cover this corner case.

I'm not convinced that there's enough of a problem here to be worth
sweating over.  If we're not able to generate a "greater" string with
the current rules, the odds are that the pattern is so close to the end
of the index range that a one-sided test is not going to make much
difference compared to a two-sided one.

            regards, tom lane

Re: make_greater_string() does not return a string in some cases

From
Kyotaro HORIGUCHI
Date:
Hello, may I pick up this topic again?

This glitch is hard to ignore for those of us who store CJK (Chinese,
Japanese, and Korean) characters in a database.

For Japanese in ordinary use, about a hundred characters out of seven
thousand make make_greater_string() fail.

That is not very frequent, but it is not rare enough to ignore either.

I think the glitch arises because the way to derive the `next character'
is fundamentally specific to each encoding, whereas make_greater_string()
currently does it for all encodings alike, using a method extended from
the one for single-byte ASCII.
So I think it is reasonable for the encoding info table (struct
pg_wchar_tbl) to hold the function that does this.

How about this idea?


The main points of the implementation are as follows:

- pg_wchar_tbl (pg_wchar.h) gets a new member `charinc' that holds a function to increment a character of that encoding.

- By default, charinc is a `generic' increment function that does what make_greater_string() does in the current implementation.

- make_greater_string() now uses the charinc of the database encoding to increment characters, instead of the code written directly in it.

- UTF-8 gets a special increment function.
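
The shape of the points above can be sketched in a few lines. This is a
simplified stand-in, not the attached patch: the table, the enum and all
function names here are hypothetical, the generic function omits the
encoding verification that the real one performs, and the UTF-8 function
only looks at the last byte.

```c
#include <stdbool.h>

typedef bool (*char_incrementer) (unsigned char *ch, int len);

/* Generic rule: bump the last byte, whatever the encoding. */
static bool
generic_increment(unsigned char *ch, int len)
{
    if (ch[len - 1] == 0xFF)
        return false;
    ch[len - 1]++;
    return true;
}

/* UTF-8 rule (simplified): the last byte must stay a legal one. */
static bool
utf8_last_byte_increment(unsigned char *ch, int len)
{
    unsigned char limit = (len == 1) ? 0x7F : 0xBF;

    if (ch[len - 1] >= limit)
        return false;
    ch[len - 1]++;
    return true;
}

enum { ENC_SQL_ASCII, ENC_UTF8 };

/* Hypothetical per-encoding table holding the incrementer. */
typedef struct
{
    const char *name;
    char_incrementer charinc;
} encoding_entry;

static const encoding_entry encoding_table[] = {
    {"SQL_ASCII", generic_increment},   /* ENC_SQL_ASCII */
    {"UTF8", utf8_last_byte_increment}, /* ENC_UTF8 */
};

/* The caller fetches the function once and no longer touches bare bytes. */
static char_incrementer
lookup_incrementer(int encoding)
{
    return encoding_table[encoding].charinc;
}
```

A caller would fetch the function once with lookup_incrementer(ENC_UTF8)
and then apply it to the last character of the working string.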


As a consequence of this modification, make_greater_string() becomes
somewhat simpler, because the code that handles bare bytes in the
string disappears.  Incrementing a character with knowledge of the
encoding can be straightforward, cheap, and backtrack-free, and it has
fewer glitches than the generic method.
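
One way to see what "backtrack-free" can look like for UTF-8 is the sketch
below. Note that it is a different technique from the attached patch,
which carries between bytes: here the character is decoded to a code
point, incremented once, and re-encoded only if the length stays the
same. All names are hypothetical, and the input is assumed to be one
well-formed UTF-8 character.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode one well-formed UTF-8 character of the given length (1..4). */
static uint32_t
utf8_decode(const unsigned char *s, int len)
{
    switch (len)
    {
        case 1:
            return s[0];
        case 2:
            return ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        case 3:
            return ((uint32_t) (s[0] & 0x0F) << 12) |
                ((uint32_t) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        default:
            return ((uint32_t) (s[0] & 0x07) << 18) |
                ((uint32_t) (s[1] & 0x3F) << 12) |
                ((uint32_t) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    }
}

/* Number of bytes UTF-8 needs for a code point. */
static int
utf8_encoded_len(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Encode cp into exactly len bytes; len must be utf8_encoded_len(cp). */
static void
utf8_encode(uint32_t cp, unsigned char *s, int len)
{
    static const unsigned char lead[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    int         i;

    for (i = len - 1; i > 0; i--)
    {
        s[i] = (unsigned char) (0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    s[0] = (unsigned char) (lead[len] | cp);
}

/*
 * Increment one UTF-8 character in place without changing its length.
 * Returns false, leaving ch untouched, when the next code point would
 * need a different number of bytes or lies beyond U+10FFFF.
 */
static bool
utf8_increment_same_length(unsigned char *ch, int len)
{
    uint32_t    cp;

    if (len < 1 || len > 4)
        return false;
    cp = utf8_decode(ch, len) + 1;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        cp = 0xE000;            /* skip the surrogate range */
    if (cp > 0x10FFFF || utf8_encoded_len(cp) != len)
        return false;
    utf8_encode(cp, ch, len);
    return true;
}
```

With this sketch '西' (0xE8A5BF, U+897F) becomes 0xE8A680 (U+8980), while
a character at the top of its length class, such as 0x7F or 0xDFBF, is
left alone and reported as a failure.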

# Disappointingly, the special handling for BYTEAOID still remains.

Some glitches still remain, but I think it would be overkill to do
conversions that change the length of a character.  Only 5 code points
remain, out of roughly 17 thousand with the current method (counted
over approximately all BMP characters), and none of them is a Japanese
character :-)

The attached patch is a sample implementation of this idea.

What do you think about this patch?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 10b73fb..4151ce2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5502,6 +5502,16 @@ pattern_selectivity(Const *patt, Pattern_Type ptype)
 /*
+ * This function is "character increment" function for bytea used in
+ * make_greater_string() that has same interface with pg_wchar_tbl.charinc.
+ */
+static bool byte_increment(unsigned char *ptr, int len)
+{
+    (*ptr)++;
+    return true;
+}
+
+/*
  * Try to generate a string greater than the given string or any
  * string it is a prefix of.  If successful, return a palloc'd string
  * in the form of a Const node; else return NULL.
@@ -5540,6 +5550,7 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
     int         len;
     Datum       cmpstr;
     text       *cmptxt = NULL;
+    character_incrementer charincfunc;
 
     /*
      * Get a modifiable copy of the prefix string in C-string format, and set
@@ -5601,27 +5612,38 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
         }
     }
 
+    if (datatype != BYTEAOID)
+        charincfunc = pg_database_encoding_character_incrementer();
+    else
+        charincfunc = &byte_increment;
+
     while (len > 0)
     {
-        unsigned char *lastchar = (unsigned char *) (workstr + len - 1);
-        unsigned char savelastchar = *lastchar;
+        int charlen;
+        unsigned char *lastchar;
+        unsigned char savelastbyte;
+        Const       *workstr_const;
+
+        if (datatype == BYTEAOID)
+            charlen = 1;
+        else
+            charlen = len - pg_mbcliplen(workstr, len, len - 1);
+
+        lastchar = (unsigned char *) (workstr + len - charlen);
 
         /*
-         * Try to generate a larger string by incrementing the last byte.
+         * savelastbyte has meaning only for datatype == BYTEAOID
          */
-        while (*lastchar < (unsigned char) 255)
-        {
-            Const       *workstr_const;
+        savelastbyte = *lastchar;
 
-            (*lastchar)++;
+        /*
+         * Try to generate a larger string by incrementing the last byte or
+         * character.
+         */
 
+        if (charincfunc(lastchar, charlen)) {
             if (datatype != BYTEAOID)
-            {
-                /* do not generate invalid encoding sequences */
-                if (!pg_verifymbstr(workstr, len, true))
-                    continue;
                 workstr_const = string_to_const(workstr, datatype);
-            }
             else
                 workstr_const = string_to_bytea_const(workstr, len);
 
@@ -5636,26 +5658,17 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
                 pfree(workstr);
                 return workstr_const;
             }
 
-
+                        /* No good, release unusable value and try again */
             pfree(DatumGetPointer(workstr_const->constvalue));
             pfree(workstr_const);
         }
 
-        /* restore last byte so we don't confuse pg_mbcliplen */
-        *lastchar = savelastchar;
-
         /*
-         * Truncate off the last character, which might be more than 1 byte,
-         * depending on the character encoding.
+         * Truncate off the last character or restore last byte for BYTEA.
          */
-        if (datatype != BYTEAOID && pg_database_encoding_max_length() > 1)
-            len = pg_mbcliplen(workstr, len, len - 1);
-        else
-            len -= 1;
-
-        if (datatype != BYTEAOID)
-            workstr[len] = '\0';
+        len -= charlen;
+        workstr[len] = (datatype != BYTEAOID ? '\0' : savelastbyte);
     }
 
     /* Failed... */
diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
index 5b0cf62..1d6aee0 100644
--- a/src/backend/utils/mb/wchar.c
+++ b/src/backend/utils/mb/wchar.c
@@ -935,6 +935,85 @@ pg_gb18030_dsplen(const unsigned char *s)
 /*
  *-------------------------------------------------------------------
+ * multibyte character incrementer
+ *
+ * These functions accept "charptr", a pointer to the first byte of a
+ * maybe-multibyte character.  They try to `increment' the character and
+ * return true on success.  If they return false, the character is not modified.
+ * -------------------------------------------------------------------
+ */
+
+static bool pg_generic_charinc(unsigned char *charptr, int len)
+{
+    unsigned char *lastchar = (unsigned char *) (charptr + len - 1);
+    unsigned char savelastchar = *lastchar;
+    const char *const_charptr = (const char *)charptr;
+
+    while (*lastchar < (unsigned char) 255)
+    {
+        (*lastchar)++;
+        if (!pg_verifymbstr(const_charptr, len, true))
+            continue;
+        return true;
+    }
+
+    *lastchar = savelastchar;
+    return false;
+}
+
+static bool pg_utf8_increment(unsigned char *charptr, int length)
+{
+    unsigned char a;
+    unsigned char bak[4];
+
+    memcpy(bak, charptr, length);
+    switch (length)
+    {
+        default:
+            /* reject lengths 5 and 6 for now */
+            return false;
+        case 4:
+            a = charptr[3];
+            if (a < 0xBF)
+            {
+                charptr[3]++;
+                break;
+            }
+            charptr[3] = 0x80;
+            /* FALL THRU */
+        case 3:
+            a = charptr[2];
+            if (a < 0xBF)
+            {
+                charptr[2]++;
+                break;
+            }
+            charptr[2] = 0x80;
+            /* FALL THRU */
+        case 2:
+            a = charptr[1];
+            if ((*charptr == 0xed && a < 0x9F) || (*charptr != 0xed && a < 0xBF))
+            {
+                charptr[1]++;
+                break;
+            }
+            charptr[1] = 0x80;
+            /* FALL THRU */
+        case 1:
+            a = *charptr;
+            if (a == 0x7F || a == 0xDF || a == 0xEF || a == 0xF7) {
+                memcpy(charptr, bak, length);
+                return false;
+            }
+            charptr[0]++;
+            break;
+    }
+
+    return pg_utf8_islegal(charptr, length);
+}
+
+/*
+ *-------------------------------------------------------------------
  * multibyte sequence validators
  *
  * These functions accept "s", a pointer to the first byte of a string,
@@ -1341,48 +1420,48 @@ pg_utf8_islegal(const unsigned char *source, int length)
  *-------------------------------------------------------------------
  */
 pg_wchar_tbl pg_wchar_table[] = {
-    {pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1},    /* PG_SQL_ASCII */
-    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JP */
-    {pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2},    /* PG_EUC_CN */
-    {pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3},    /* PG_EUC_KR */
-    {pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4},    /* PG_EUC_TW */
-    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JIS_2004 */
-    {pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4},    /* PG_UTF8 */
-    {pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4},        /* PG_MULE_INTERNAL */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN1 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN2 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN3 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN4 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN5 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN6 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN7 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN8 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN9 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_LATIN10 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1256 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1258 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN866 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN874 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_KOI8R */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1251 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1252 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* ISO-8859-5 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* ISO-8859-6 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* ISO-8859-7 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* ISO-8859-8 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1250 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1253 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1254 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1255 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_WIN1257 */
-    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1},        /* PG_KOI8U */
-    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2},    /* PG_SJIS */
-    {0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2},    /* PG_BIG5 */
-    {0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2},        /* PG_GBK */
-    {0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2},        /* PG_UHC */
-    {0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4},    /* PG_GB18030 */
-    {0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3}, /* PG_JOHAB */
-    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}        /* PG_SHIFT_JIS_2004 */
+    {pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_generic_charinc, pg_ascii_verifier, 1},    /* PG_SQL_ASCII */
+    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_generic_charinc, pg_eucjp_verifier, 3},    /* PG_EUC_JP */
+    {pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_generic_charinc, pg_euccn_verifier, 2},    /* PG_EUC_CN */
+    {pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_generic_charinc, pg_euckr_verifier, 3},    /* PG_EUC_KR */
+    {pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_generic_charinc, pg_euctw_verifier, 4},    /* PG_EUC_TW */
+    {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_generic_charinc, pg_eucjp_verifier, 3},    /* PG_EUC_JIS_2004 */
+    {pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_increment, pg_utf8_verifier, 4},    /* PG_UTF8 */
+    {pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_generic_charinc, pg_mule_verifier, 4},        /* PG_MULE_INTERNAL */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN1 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN2 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN3 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN4 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN5 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN6 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN7 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN8 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN9 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_LATIN10 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1256 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1258 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN866 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN874 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_KOI8R */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1251 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1252 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* ISO-8859-5 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* ISO-8859-6 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* ISO-8859-7 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* ISO-8859-8 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1250 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1253 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1254 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1255 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_WIN1257 */
+    {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_generic_charinc, pg_latin1_verifier, 1},        /* PG_KOI8U */
+    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_generic_charinc, pg_sjis_verifier, 2},    /* PG_SJIS */
+    {0, pg_big5_mblen, pg_big5_dsplen, pg_generic_charinc, pg_big5_verifier, 2},    /* PG_BIG5 */
+    {0, pg_gbk_mblen, pg_gbk_dsplen, pg_generic_charinc, pg_gbk_verifier, 2},        /* PG_GBK */
+    {0, pg_uhc_mblen, pg_uhc_dsplen, pg_generic_charinc, pg_uhc_verifier, 2},        /* PG_UHC */
+    {0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_generic_charinc, pg_gb18030_verifier, 4},    /* PG_GB18030 */
+    {0, pg_johab_mblen, pg_johab_dsplen, pg_generic_charinc, pg_johab_verifier, 3}, /* PG_JOHAB */
+    {0, pg_sjis_mblen, pg_sjis_dsplen, pg_generic_charinc, pg_sjis_verifier, 2}        /* PG_SHIFT_JIS_2004 */
 };
 
 /* returns the byte length of a word for mule internal code */
@@ -1459,6 +1538,15 @@ pg_database_encoding_max_length(void)
 }
 
 /*
+ * fetch the character incrementer of the encoding for the current database
+ */
+character_incrementer
+pg_database_encoding_character_incrementer(void)
+{
+    return pg_wchar_table[GetDatabaseEncoding()].charinc;
+}
+
+/*
  * Verify mbstr to make sure that it is validly encoded in the current
  * database encoding.  Otherwise same as pg_verify_mbstr().
  */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 826c7af..356703a 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -284,6 +284,8 @@ typedef int (*mblen_converter) (const unsigned char *mbstr);
 
 typedef int (*mbdisplaylen_converter) (const unsigned char *mbstr);
 
+typedef bool (*character_incrementer) (unsigned char *mbstr, int len);
+
 typedef int (*mbverifier) (const unsigned char *mbstr, int len);
 
 typedef struct
@@ -292,6 +294,7 @@ typedef struct
                                                          * string to a wchar */
     mblen_converter mblen;        /* get byte length of a char */
     mbdisplaylen_converter dsplen;        /* get display width of a char */
+    character_incrementer charinc;  /* Character code incrementer if not null */
     mbverifier    mbverify;        /* verify multibyte sequence */
     int            maxmblen;        /* max bytes for a char in this encoding */
 } pg_wchar_tbl;
@@ -389,6 +392,7 @@ extern int pg_encoding_mbcliplen(int encoding, const char *mbstr,
 extern int pg_mbcharcliplen(const char *mbstr, int len, int imit);
 extern int    pg_encoding_max_length(int encoding);
 extern int    pg_database_encoding_max_length(void);
+extern character_incrementer pg_database_encoding_character_incrementer(void);
 extern int    PrepareClientEncoding(int encoding);
 extern int    SetClientEncoding(int encoding);

Re: make_greater_string() does not return a string in some cases

From
Robert Haas
Date:
On Fri, Jul 8, 2011 at 5:21 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:
> Points to realize this follows,

Please add your patch to the next CommitFest.

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] make_greater_string() does not return a string in some cases

From
Kyotaro HORIGUCHI
Date:
Thanks for your suggestion, I'll do so.

At Fri, 8 Jul 2011 23:28:32 -0400, Robert Haas <robertmhaas@gmail.com> wrote:
> Please add your patch to the next CommitFest.
> 
> https://commitfest.postgresql.org/action/commitfest_view/open
-- 
Kyotaro Horiguchi
NTT Open Source Software Center