Re: making the backend's json parser work in frontend code - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: making the backend's json parser work in frontend code
Date	January 16, 2020 20:11:15
Msg-id	15738.1579205475@sss.pgh.pa.us
In response to	making the backend's json parser work in frontend code (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: making the backend's json parser work in frontend code
List	pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> missing something, this seems like an overdue cleanup.

Here's a reviewed version of 0001.  You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Also, it seemed to me that if we are going to move wchar.c, we should
also move encnames.c, so that libpq can get fully out of the
symlinking-source-files business.  It makes initdb less weird too.
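
For readers without the tree at hand, the lookup that encnames.c provides
can be sketched roughly as follows.  This is only an illustration, not the
real code: the demo_ names, the three-value enum and DEMO_NAMELEN are made
up for the example (the real file uses pg_enc, pg_encname_tbl and
NAMEDATALEN).  It strips non-alphanumeric characters, lower-cases ASCII
letters, and binary-searches a sorted alias table, none of which needs
anything backend-specific.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few pg_enc values; not the real enum. */
enum demo_enc { DEMO_BIG5, DEMO_LATIN1, DEMO_UTF8 };

typedef struct demo_encname
{
    const char *name;
    int         encoding;
} demo_encname;

/* Must stay sorted in strcmp() order; names are lower case, alnum only. */
static const demo_encname demo_encname_tbl[] = {
    {"big5", DEMO_BIG5},
    {"iso88591", DEMO_LATIN1},
    {"latin1", DEMO_LATIN1},
    {"unicode", DEMO_UTF8},
    {"utf8", DEMO_UTF8},
};

#define DEMO_NAMELEN 64         /* assumed limit, plays the role of NAMEDATALEN */

/* Copy key into newkey, keeping only alphanumerics and folding A-Z to a-z. */
static char *
demo_clean_name(const char *key, char *newkey)
{
    const char *p;
    char       *np = newkey;

    for (p = key; *p != '\0'; p++)
    {
        if (isalnum((unsigned char) *p))
        {
            if (*p >= 'A' && *p <= 'Z')
                *np++ = (char) (*p + 'a' - 'A');
            else
                *np++ = *p;
        }
    }
    *np = '\0';
    return newkey;
}

/* Return the encoding id for name, or -1 if it is unknown or too long. */
int
demo_char_to_encoding(const char *name)
{
    char        buf[DEMO_NAMELEN];
    size_t      lo = 0;
    size_t      hi = sizeof(demo_encname_tbl) / sizeof(demo_encname_tbl[0]);

    if (name == NULL || *name == '\0' || strlen(name) >= DEMO_NAMELEN)
        return -1;

    demo_clean_name(name, buf);

    /* Plain binary search over the half-open range [lo, hi). */
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;
        int         cmp = strcmp(buf, demo_encname_tbl[mid].name);

        if (cmp == 0)
            return demo_encname_tbl[mid].encoding;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

With this sketch, "ISO-8859-1", "iso_8859-1" and "Iso8859_1" all reduce to
"iso88591" before the search, so they map to the same id.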

I took the liberty of sticking proper copyright headers onto these
two files, too.  (This makes the diff a lot more bulky :-(.  Would
it help to add the headers in a separate commit?)

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb.  But that
could be a separate commit, too.
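
To make the question concrete, here is a hedged sketch of the kind of
FRONTEND-conditional code at issue.  It is modelled loosely on the
name-length check in pg_char_to_encoding(); demo_check_name_length and
DEMO_NAMELEN are invented for the example, and the backend branch is
stubbed with abort() because ereport() is not available outside the
server.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Defined here to mimic a client-side build; normally set by the makefile. */
#define FRONTEND 1

#define DEMO_NAMELEN 64         /* assumed limit for the example */

/* Return 0 if name fits, -1 (after complaining) if it is too long. */
int
demo_check_name_length(const char *name)
{
    if (strlen(name) >= DEMO_NAMELEN)
    {
#ifdef FRONTEND
        /* Frontend code can only print a message and report failure. */
        fprintf(stderr, "encoding name too long\n");
        return -1;
#else
        /* The backend would ereport(ERROR, ...) here and not return. */
        abort();
#endif
    }
    return 0;
}
```

Code that is wrapped entirely in #ifndef FRONTEND, rather than merely
branching like this, is what could plausibly live in the backend directory.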

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common.  But that'd be far more invasive
to other source files, so I've not touched the issue here.

            regards, tom lane

diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index cd4a016..b19a125 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -14,10 +14,8 @@ include $(top_builddir)/src/Makefile.global

 OBJS = \
     conv.o \
-    encnames.o \
     mbutils.o \
     stringinfo_mb.o \
-    wchar.o \
     wstrcmp.o \
     wstrncmp.o

diff --git a/src/backend/utils/mb/README b/src/backend/utils/mb/README
index 7495ca5..ef36626 100644
--- a/src/backend/utils/mb/README
+++ b/src/backend/utils/mb/README
@@ -3,12 +3,8 @@ src/backend/utils/mb/README
 Encodings
 =========

-encnames.c:    public functions for both the backend and the frontend.
 conv.c:        static functions and a public table for code conversion
-wchar.c:    mostly static functions and a public table for mb string and
-        multibyte conversion
 mbutils.c:    public functions for the backend only.
-        requires conv.c and wchar.c
 stringinfo_mb.c: public backend-only multibyte-aware stringinfo functions
 wstrcmp.c:    strcmp for mb
 wstrncmp.c:    strncmp for mb
@@ -16,6 +12,12 @@ win866.c:    a tool to generate KOI8 <--> CP866 conversion table
 iso.c:        a tool to generate KOI8 <--> ISO8859-5 conversion table
 win1251.c:    a tool to generate KOI8 <--> CP1251 conversion table

+See also in src/common/:
+
+encnames.c:    public functions for encoding names
+wchar.c:    mostly static functions and a public table for mb string and
+        multibyte conversion
+
 Introduction
 ------------
     http://www.cprogramming.com/tutorial/unicode.html
diff --git a/src/backend/utils/mb/encnames.c b/src/backend/utils/mb/encnames.c
deleted file mode 100644
index 12b61cd..0000000
--- a/src/backend/utils/mb/encnames.c
+++ /dev/null
@@ -1,629 +0,0 @@
-/*
- * Encoding names and routines for work with it. All
- * in this file is shared between FE and BE.
- *
- * src/backend/utils/mb/encnames.c
- */
-#ifdef FRONTEND
-#include "postgres_fe.h"
-#else
-#include "postgres.h"
-#include "utils/builtins.h"
-#endif
-
-#include <ctype.h>
-#include <unistd.h>
-
-#include "mb/pg_wchar.h"
-
-
-/* ----------
- * All encoding names, sorted:         *** A L P H A B E T I C ***
- *
- * All names must be without irrelevant chars, search routines use
- * isalnum() chars only. It means ISO-8859-1, iso_8859-1 and Iso8859_1
- * are always converted to 'iso88591'. All must be lower case.
- *
- * The table doesn't contain 'cs' aliases (like csISOLatin1). It's needed?
- *
- * Karel Zak, Aug 2001
- * ----------
- */
-typedef struct pg_encname
-{
-    const char *name;
-    pg_enc        encoding;
-} pg_encname;
-
-static const pg_encname pg_encname_tbl[] =
-{
-    {
-        "abc", PG_WIN1258
-    },                            /* alias for WIN1258 */
-    {
-        "alt", PG_WIN866
-    },                            /* IBM866 */
-    {
-        "big5", PG_BIG5
-    },                            /* Big5; Chinese for Taiwan multibyte set */
-    {
-        "euccn", PG_EUC_CN
-    },                            /* EUC-CN; Extended Unix Code for simplified
-                                 * Chinese */
-    {
-        "eucjis2004", PG_EUC_JIS_2004
-    },                            /* EUC-JIS-2004; Extended UNIX Code fixed
-                                 * Width for Japanese, standard JIS X 0213 */
-    {
-        "eucjp", PG_EUC_JP
-    },                            /* EUC-JP; Extended UNIX Code fixed Width for
-                                 * Japanese, standard OSF */
-    {
-        "euckr", PG_EUC_KR
-    },                            /* EUC-KR; Extended Unix Code for Korean , KS
-                                 * X 1001 standard */
-    {
-        "euctw", PG_EUC_TW
-    },                            /* EUC-TW; Extended Unix Code for
-                                 *
-                                 * traditional Chinese */
-    {
-        "gb18030", PG_GB18030
-    },                            /* GB18030;GB18030 */
-    {
-        "gbk", PG_GBK
-    },                            /* GBK; Chinese Windows CodePage 936
-                                 * simplified Chinese */
-    {
-        "iso88591", PG_LATIN1
-    },                            /* ISO-8859-1; RFC1345,KXS2 */
-    {
-        "iso885910", PG_LATIN6
-    },                            /* ISO-8859-10; RFC1345,KXS2 */
-    {
-        "iso885913", PG_LATIN7
-    },                            /* ISO-8859-13; RFC1345,KXS2 */
-    {
-        "iso885914", PG_LATIN8
-    },                            /* ISO-8859-14; RFC1345,KXS2 */
-    {
-        "iso885915", PG_LATIN9
-    },                            /* ISO-8859-15; RFC1345,KXS2 */
-    {
-        "iso885916", PG_LATIN10
-    },                            /* ISO-8859-16; RFC1345,KXS2 */
-    {
-        "iso88592", PG_LATIN2
-    },                            /* ISO-8859-2; RFC1345,KXS2 */
-    {
-        "iso88593", PG_LATIN3
-    },                            /* ISO-8859-3; RFC1345,KXS2 */
-    {
-        "iso88594", PG_LATIN4
-    },                            /* ISO-8859-4; RFC1345,KXS2 */
-    {
-        "iso88595", PG_ISO_8859_5
-    },                            /* ISO-8859-5; RFC1345,KXS2 */
-    {
-        "iso88596", PG_ISO_8859_6
-    },                            /* ISO-8859-6; RFC1345,KXS2 */
-    {
-        "iso88597", PG_ISO_8859_7
-    },                            /* ISO-8859-7; RFC1345,KXS2 */
-    {
-        "iso88598", PG_ISO_8859_8
-    },                            /* ISO-8859-8; RFC1345,KXS2 */
-    {
-        "iso88599", PG_LATIN5
-    },                            /* ISO-8859-9; RFC1345,KXS2 */
-    {
-        "johab", PG_JOHAB
-    },                            /* JOHAB; Extended Unix Code for simplified
-                                 * Chinese */
-    {
-        "koi8", PG_KOI8R
-    },                            /* _dirty_ alias for KOI8-R (backward
-                                 * compatibility) */
-    {
-        "koi8r", PG_KOI8R
-    },                            /* KOI8-R; RFC1489 */
-    {
-        "koi8u", PG_KOI8U
-    },                            /* KOI8-U; RFC2319 */
-    {
-        "latin1", PG_LATIN1
-    },                            /* alias for ISO-8859-1 */
-    {
-        "latin10", PG_LATIN10
-    },                            /* alias for ISO-8859-16 */
-    {
-        "latin2", PG_LATIN2
-    },                            /* alias for ISO-8859-2 */
-    {
-        "latin3", PG_LATIN3
-    },                            /* alias for ISO-8859-3 */
-    {
-        "latin4", PG_LATIN4
-    },                            /* alias for ISO-8859-4 */
-    {
-        "latin5", PG_LATIN5
-    },                            /* alias for ISO-8859-9 */
-    {
-        "latin6", PG_LATIN6
-    },                            /* alias for ISO-8859-10 */
-    {
-        "latin7", PG_LATIN7
-    },                            /* alias for ISO-8859-13 */
-    {
-        "latin8", PG_LATIN8
-    },                            /* alias for ISO-8859-14 */
-    {
-        "latin9", PG_LATIN9
-    },                            /* alias for ISO-8859-15 */
-    {
-        "mskanji", PG_SJIS
-    },                            /* alias for Shift_JIS */
-    {
-        "muleinternal", PG_MULE_INTERNAL
-    },
-    {
-        "shiftjis", PG_SJIS
-    },                            /* Shift_JIS; JIS X 0202-1991 */
-
-    {
-        "shiftjis2004", PG_SHIFT_JIS_2004
-    },                            /* SHIFT-JIS-2004; Shift JIS for Japanese,
-                                 * standard JIS X 0213 */
-    {
-        "sjis", PG_SJIS
-    },                            /* alias for Shift_JIS */
-    {
-        "sqlascii", PG_SQL_ASCII
-    },
-    {
-        "tcvn", PG_WIN1258
-    },                            /* alias for WIN1258 */
-    {
-        "tcvn5712", PG_WIN1258
-    },                            /* alias for WIN1258 */
-    {
-        "uhc", PG_UHC
-    },                            /* UHC; Korean Windows CodePage 949 */
-    {
-        "unicode", PG_UTF8
-    },                            /* alias for UTF8 */
-    {
-        "utf8", PG_UTF8
-    },                            /* alias for UTF8 */
-    {
-        "vscii", PG_WIN1258
-    },                            /* alias for WIN1258 */
-    {
-        "win", PG_WIN1251
-    },                            /* _dirty_ alias for windows-1251 (backward
-                                 * compatibility) */
-    {
-        "win1250", PG_WIN1250
-    },                            /* alias for Windows-1250 */
-    {
-        "win1251", PG_WIN1251
-    },                            /* alias for Windows-1251 */
-    {
-        "win1252", PG_WIN1252
-    },                            /* alias for Windows-1252 */
-    {
-        "win1253", PG_WIN1253
-    },                            /* alias for Windows-1253 */
-    {
-        "win1254", PG_WIN1254
-    },                            /* alias for Windows-1254 */
-    {
-        "win1255", PG_WIN1255
-    },                            /* alias for Windows-1255 */
-    {
-        "win1256", PG_WIN1256
-    },                            /* alias for Windows-1256 */
-    {
-        "win1257", PG_WIN1257
-    },                            /* alias for Windows-1257 */
-    {
-        "win1258", PG_WIN1258
-    },                            /* alias for Windows-1258 */
-    {
-        "win866", PG_WIN866
-    },                            /* IBM866 */
-    {
-        "win874", PG_WIN874
-    },                            /* alias for Windows-874 */
-    {
-        "win932", PG_SJIS
-    },                            /* alias for Shift_JIS */
-    {
-        "win936", PG_GBK
-    },                            /* alias for GBK */
-    {
-        "win949", PG_UHC
-    },                            /* alias for UHC */
-    {
-        "win950", PG_BIG5
-    },                            /* alias for BIG5 */
-    {
-        "windows1250", PG_WIN1250
-    },                            /* Windows-1251; Microsoft */
-    {
-        "windows1251", PG_WIN1251
-    },                            /* Windows-1251; Microsoft */
-    {
-        "windows1252", PG_WIN1252
-    },                            /* Windows-1252; Microsoft */
-    {
-        "windows1253", PG_WIN1253
-    },                            /* Windows-1253; Microsoft */
-    {
-        "windows1254", PG_WIN1254
-    },                            /* Windows-1254; Microsoft */
-    {
-        "windows1255", PG_WIN1255
-    },                            /* Windows-1255; Microsoft */
-    {
-        "windows1256", PG_WIN1256
-    },                            /* Windows-1256; Microsoft */
-    {
-        "windows1257", PG_WIN1257
-    },                            /* Windows-1257; Microsoft */
-    {
-        "windows1258", PG_WIN1258
-    },                            /* Windows-1258; Microsoft */
-    {
-        "windows866", PG_WIN866
-    },                            /* IBM866 */
-    {
-        "windows874", PG_WIN874
-    },                            /* Windows-874; Microsoft */
-    {
-        "windows932", PG_SJIS
-    },                            /* alias for Shift_JIS */
-    {
-        "windows936", PG_GBK
-    },                            /* alias for GBK */
-    {
-        "windows949", PG_UHC
-    },                            /* alias for UHC */
-    {
-        "windows950", PG_BIG5
-    }                            /* alias for BIG5 */
-};
-
-/* ----------
- * These are "official" encoding names.
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
- * ----------
- */
-#ifndef WIN32
-#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
-#else
-#define DEF_ENC2NAME(name, codepage) { #name, PG_##name, codepage }
-#endif
-const pg_enc2name pg_enc2name_tbl[] =
-{
-    DEF_ENC2NAME(SQL_ASCII, 0),
-    DEF_ENC2NAME(EUC_JP, 20932),
-    DEF_ENC2NAME(EUC_CN, 20936),
-    DEF_ENC2NAME(EUC_KR, 51949),
-    DEF_ENC2NAME(EUC_TW, 0),
-    DEF_ENC2NAME(EUC_JIS_2004, 20932),
-    DEF_ENC2NAME(UTF8, 65001),
-    DEF_ENC2NAME(MULE_INTERNAL, 0),
-    DEF_ENC2NAME(LATIN1, 28591),
-    DEF_ENC2NAME(LATIN2, 28592),
-    DEF_ENC2NAME(LATIN3, 28593),
-    DEF_ENC2NAME(LATIN4, 28594),
-    DEF_ENC2NAME(LATIN5, 28599),
-    DEF_ENC2NAME(LATIN6, 0),
-    DEF_ENC2NAME(LATIN7, 0),
-    DEF_ENC2NAME(LATIN8, 0),
-    DEF_ENC2NAME(LATIN9, 28605),
-    DEF_ENC2NAME(LATIN10, 0),
-    DEF_ENC2NAME(WIN1256, 1256),
-    DEF_ENC2NAME(WIN1258, 1258),
-    DEF_ENC2NAME(WIN866, 866),
-    DEF_ENC2NAME(WIN874, 874),
-    DEF_ENC2NAME(KOI8R, 20866),
-    DEF_ENC2NAME(WIN1251, 1251),
-    DEF_ENC2NAME(WIN1252, 1252),
-    DEF_ENC2NAME(ISO_8859_5, 28595),
-    DEF_ENC2NAME(ISO_8859_6, 28596),
-    DEF_ENC2NAME(ISO_8859_7, 28597),
-    DEF_ENC2NAME(ISO_8859_8, 28598),
-    DEF_ENC2NAME(WIN1250, 1250),
-    DEF_ENC2NAME(WIN1253, 1253),
-    DEF_ENC2NAME(WIN1254, 1254),
-    DEF_ENC2NAME(WIN1255, 1255),
-    DEF_ENC2NAME(WIN1257, 1257),
-    DEF_ENC2NAME(KOI8U, 21866),
-    DEF_ENC2NAME(SJIS, 932),
-    DEF_ENC2NAME(BIG5, 950),
-    DEF_ENC2NAME(GBK, 936),
-    DEF_ENC2NAME(UHC, 949),
-    DEF_ENC2NAME(GB18030, 54936),
-    DEF_ENC2NAME(JOHAB, 0),
-    DEF_ENC2NAME(SHIFT_JIS_2004, 932)
-};
-
-/* ----------
- * These are encoding names for gettext.
- *
- * This covers all encodings except MULE_INTERNAL, which is alien to gettext.
- * ----------
- */
-const pg_enc2gettext pg_enc2gettext_tbl[] =
-{
-    {PG_SQL_ASCII, "US-ASCII"},
-    {PG_UTF8, "UTF-8"},
-    {PG_LATIN1, "LATIN1"},
-    {PG_LATIN2, "LATIN2"},
-    {PG_LATIN3, "LATIN3"},
-    {PG_LATIN4, "LATIN4"},
-    {PG_ISO_8859_5, "ISO-8859-5"},
-    {PG_ISO_8859_6, "ISO_8859-6"},
-    {PG_ISO_8859_7, "ISO-8859-7"},
-    {PG_ISO_8859_8, "ISO-8859-8"},
-    {PG_LATIN5, "LATIN5"},
-    {PG_LATIN6, "LATIN6"},
-    {PG_LATIN7, "LATIN7"},
-    {PG_LATIN8, "LATIN8"},
-    {PG_LATIN9, "LATIN-9"},
-    {PG_LATIN10, "LATIN10"},
-    {PG_KOI8R, "KOI8-R"},
-    {PG_KOI8U, "KOI8-U"},
-    {PG_WIN1250, "CP1250"},
-    {PG_WIN1251, "CP1251"},
-    {PG_WIN1252, "CP1252"},
-    {PG_WIN1253, "CP1253"},
-    {PG_WIN1254, "CP1254"},
-    {PG_WIN1255, "CP1255"},
-    {PG_WIN1256, "CP1256"},
-    {PG_WIN1257, "CP1257"},
-    {PG_WIN1258, "CP1258"},
-    {PG_WIN866, "CP866"},
-    {PG_WIN874, "CP874"},
-    {PG_EUC_CN, "EUC-CN"},
-    {PG_EUC_JP, "EUC-JP"},
-    {PG_EUC_KR, "EUC-KR"},
-    {PG_EUC_TW, "EUC-TW"},
-    {PG_EUC_JIS_2004, "EUC-JP"},
-    {PG_SJIS, "SHIFT-JIS"},
-    {PG_BIG5, "BIG5"},
-    {PG_GBK, "GBK"},
-    {PG_UHC, "UHC"},
-    {PG_GB18030, "GB18030"},
-    {PG_JOHAB, "JOHAB"},
-    {PG_SHIFT_JIS_2004, "SHIFT_JISX0213"},
-    {0, NULL}
-};
-
-
-#ifndef FRONTEND
-
-/*
- * Table of encoding names for ICU
- *
- * Reference: <https://ssl.icu-project.org/icu-bin/convexp>
- *
- * NULL entries are not supported by ICU, or their mapping is unclear.
- */
-static const char *const pg_enc2icu_tbl[] =
-{
-    NULL,                        /* PG_SQL_ASCII */
-    "EUC-JP",                    /* PG_EUC_JP */
-    "EUC-CN",                    /* PG_EUC_CN */
-    "EUC-KR",                    /* PG_EUC_KR */
-    "EUC-TW",                    /* PG_EUC_TW */
-    NULL,                        /* PG_EUC_JIS_2004 */
-    "UTF-8",                    /* PG_UTF8 */
-    NULL,                        /* PG_MULE_INTERNAL */
-    "ISO-8859-1",                /* PG_LATIN1 */
-    "ISO-8859-2",                /* PG_LATIN2 */
-    "ISO-8859-3",                /* PG_LATIN3 */
-    "ISO-8859-4",                /* PG_LATIN4 */
-    "ISO-8859-9",                /* PG_LATIN5 */
-    "ISO-8859-10",                /* PG_LATIN6 */
-    "ISO-8859-13",                /* PG_LATIN7 */
-    "ISO-8859-14",                /* PG_LATIN8 */
-    "ISO-8859-15",                /* PG_LATIN9 */
-    NULL,                        /* PG_LATIN10 */
-    "CP1256",                    /* PG_WIN1256 */
-    "CP1258",                    /* PG_WIN1258 */
-    "CP866",                    /* PG_WIN866 */
-    NULL,                        /* PG_WIN874 */
-    "KOI8-R",                    /* PG_KOI8R */
-    "CP1251",                    /* PG_WIN1251 */
-    "CP1252",                    /* PG_WIN1252 */
-    "ISO-8859-5",                /* PG_ISO_8859_5 */
-    "ISO-8859-6",                /* PG_ISO_8859_6 */
-    "ISO-8859-7",                /* PG_ISO_8859_7 */
-    "ISO-8859-8",                /* PG_ISO_8859_8 */
-    "CP1250",                    /* PG_WIN1250 */
-    "CP1253",                    /* PG_WIN1253 */
-    "CP1254",                    /* PG_WIN1254 */
-    "CP1255",                    /* PG_WIN1255 */
-    "CP1257",                    /* PG_WIN1257 */
-    "KOI8-U",                    /* PG_KOI8U */
-};
-
-bool
-is_encoding_supported_by_icu(int encoding)
-{
-    return (pg_enc2icu_tbl[encoding] != NULL);
-}
-
-const char *
-get_encoding_name_for_icu(int encoding)
-{
-    const char *icu_encoding_name;
-
-    StaticAssertStmt(lengthof(pg_enc2icu_tbl) == PG_ENCODING_BE_LAST + 1,
-                     "pg_enc2icu_tbl incomplete");
-
-    icu_encoding_name = pg_enc2icu_tbl[encoding];
-
-    if (!icu_encoding_name)
-        ereport(ERROR,
-                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                 errmsg("encoding \"%s\" not supported by ICU",
-                        pg_encoding_to_char(encoding))));
-
-    return icu_encoding_name;
-}
-
-#endif                            /* not FRONTEND */
-
-
-/* ----------
- * Encoding checks, for error returns -1 else encoding id
- * ----------
- */
-int
-pg_valid_client_encoding(const char *name)
-{
-    int            enc;
-
-    if ((enc = pg_char_to_encoding(name)) < 0)
-        return -1;
-
-    if (!PG_VALID_FE_ENCODING(enc))
-        return -1;
-
-    return enc;
-}
-
-int
-pg_valid_server_encoding(const char *name)
-{
-    int            enc;
-
-    if ((enc = pg_char_to_encoding(name)) < 0)
-        return -1;
-
-    if (!PG_VALID_BE_ENCODING(enc))
-        return -1;
-
-    return enc;
-}
-
-int
-pg_valid_server_encoding_id(int encoding)
-{
-    return PG_VALID_BE_ENCODING(encoding);
-}
-
-/* ----------
- * Remove irrelevant chars from encoding name
- * ----------
- */
-static char *
-clean_encoding_name(const char *key, char *newkey)
-{
-    const char *p;
-    char       *np;
-
-    for (p = key, np = newkey; *p != '\0'; p++)
-    {
-        if (isalnum((unsigned char) *p))
-        {
-            if (*p >= 'A' && *p <= 'Z')
-                *np++ = *p + 'a' - 'A';
-            else
-                *np++ = *p;
-        }
-    }
-    *np = '\0';
-    return newkey;
-}
-
-/* ----------
- * Search encoding by encoding name
- *
- * Returns encoding ID, or -1 for error
- * ----------
- */
-int
-pg_char_to_encoding(const char *name)
-{
-    unsigned int nel = lengthof(pg_encname_tbl);
-    const pg_encname *base = pg_encname_tbl,
-               *last = base + nel - 1,
-               *position;
-    int            result;
-    char        buff[NAMEDATALEN],
-               *key;
-
-    if (name == NULL || *name == '\0')
-        return -1;
-
-    if (strlen(name) >= NAMEDATALEN)
-    {
-#ifdef FRONTEND
-        fprintf(stderr, "encoding name too long\n");
-        return -1;
-#else
-        ereport(ERROR,
-                (errcode(ERRCODE_NAME_TOO_LONG),
-                 errmsg("encoding name too long")));
-#endif
-    }
-    key = clean_encoding_name(name, buff);
-
-    while (last >= base)
-    {
-        position = base + ((last - base) >> 1);
-        result = key[0] - position->name[0];
-
-        if (result == 0)
-        {
-            result = strcmp(key, position->name);
-            if (result == 0)
-                return position->encoding;
-        }
-        if (result < 0)
-            last = position - 1;
-        else
-            base = position + 1;
-    }
-    return -1;
-}
-
-#ifndef FRONTEND
-Datum
-PG_char_to_encoding(PG_FUNCTION_ARGS)
-{
-    Name        s = PG_GETARG_NAME(0);
-
-    PG_RETURN_INT32(pg_char_to_encoding(NameStr(*s)));
-}
-#endif
-
-const char *
-pg_encoding_to_char(int encoding)
-{
-    if (PG_VALID_ENCODING(encoding))
-    {
-        const pg_enc2name *p = &pg_enc2name_tbl[encoding];
-
-        Assert(encoding == p->encoding);
-        return p->name;
-    }
-    return "";
-}
-
-#ifndef FRONTEND
-Datum
-PG_encoding_to_char(PG_FUNCTION_ARGS)
-{
-    int32        encoding = PG_GETARG_INT32(0);
-    const char *encoding_name = pg_encoding_to_char(encoding);
-
-    return DirectFunctionCall1(namein, CStringGetDatum(encoding_name));
-}
-
-#endif
diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
deleted file mode 100644
index 02e2588..0000000
--- a/src/backend/utils/mb/wchar.c
+++ /dev/null
@@ -1,2036 +0,0 @@
-/*
- * conversion functions between pg_wchar and multibyte streams.
- * Tatsuo Ishii
- * src/backend/utils/mb/wchar.c
- *
- */
-/* can be used in either frontend or backend */
-#ifdef FRONTEND
-#include "postgres_fe.h"
-#else
-#include "postgres.h"
-#endif
-
-#include "mb/pg_wchar.h"
-
-
-/*
- * Operations on multi-byte encodings are driven by a table of helper
- * functions.
- *
- * To add an encoding support, define mblen(), dsplen() and verifier() for
- * the encoding.  For server-encodings, also define mb2wchar() and wchar2mb()
- * conversion functions.
- *
- * These functions generally assume that their input is validly formed.
- * The "verifier" functions, further down in the file, have to be more
- * paranoid.
- *
- * We expect that mblen() does not need to examine more than the first byte
- * of the character to discover the correct length.  GB18030 is an exception
- * to that rule, though, as it also looks at second byte.  But even that
- * behaves in a predictable way, if you only pass the first byte: it will
- * treat 4-byte encoded characters as two 2-byte encoded characters, which is
- * good enough for all current uses.
- *
- * Note: for the display output of psql to work properly, the return values
- * of the dsplen functions must conform to the Unicode standard. In particular
- * the NUL character is zero width and control characters are generally
- * width -1. It is recommended that non-ASCII encodings refer their ASCII
- * subset to the ASCII routines to ensure consistency.
- */
-
-/*
- * SQL/ASCII
- */
-static int
-pg_ascii2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        *to++ = *from++;
-        len--;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-static int
-pg_ascii_mblen(const unsigned char *s)
-{
-    return 1;
-}
-
-static int
-pg_ascii_dsplen(const unsigned char *s)
-{
-    if (*s == '\0')
-        return 0;
-    if (*s < 0x20 || *s == 0x7f)
-        return -1;
-
-    return 1;
-}
-
-/*
- * EUC
- */
-static int
-pg_euc2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        if (*from == SS2 && len >= 2)    /* JIS X 0201 (so called "1 byte
-                                         * KANA") */
-        {
-            from++;
-            *to = (SS2 << 8) | *from++;
-            len -= 2;
-        }
-        else if (*from == SS3 && len >= 3)    /* JIS X 0212 KANJI */
-        {
-            from++;
-            *to = (SS3 << 16) | (*from++ << 8);
-            *to |= *from++;
-            len -= 3;
-        }
-        else if (IS_HIGHBIT_SET(*from) && len >= 2) /* JIS X 0208 KANJI */
-        {
-            *to = *from++ << 8;
-            *to |= *from++;
-            len -= 2;
-        }
-        else                    /* must be ASCII */
-        {
-            *to = *from++;
-            len--;
-        }
-        to++;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-static inline int
-pg_euc_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s == SS2)
-        len = 2;
-    else if (*s == SS3)
-        len = 3;
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = 1;
-    return len;
-}
-
-static inline int
-pg_euc_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s == SS2)
-        len = 2;
-    else if (*s == SS3)
-        len = 2;
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = pg_ascii_dsplen(s);
-    return len;
-}
-
-/*
- * EUC_JP
- */
-static int
-pg_eucjp2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    return pg_euc2wchar_with_len(from, to, len);
-}
-
-static int
-pg_eucjp_mblen(const unsigned char *s)
-{
-    return pg_euc_mblen(s);
-}
-
-static int
-pg_eucjp_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s == SS2)
-        len = 1;
-    else if (*s == SS3)
-        len = 2;
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = pg_ascii_dsplen(s);
-    return len;
-}
-
-/*
- * EUC_KR
- */
-static int
-pg_euckr2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    return pg_euc2wchar_with_len(from, to, len);
-}
-
-static int
-pg_euckr_mblen(const unsigned char *s)
-{
-    return pg_euc_mblen(s);
-}
-
-static int
-pg_euckr_dsplen(const unsigned char *s)
-{
-    return pg_euc_dsplen(s);
-}
-
-/*
- * EUC_CN
- *
- */
-static int
-pg_euccn2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        if (*from == SS2 && len >= 3)    /* code set 2 (unused?) */
-        {
-            from++;
-            *to = (SS2 << 16) | (*from++ << 8);
-            *to |= *from++;
-            len -= 3;
-        }
-        else if (*from == SS3 && len >= 3)    /* code set 3 (unused ?) */
-        {
-            from++;
-            *to = (SS3 << 16) | (*from++ << 8);
-            *to |= *from++;
-            len -= 3;
-        }
-        else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 1 */
-        {
-            *to = *from++ << 8;
-            *to |= *from++;
-            len -= 2;
-        }
-        else
-        {
-            *to = *from++;
-            len--;
-        }
-        to++;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-static int
-pg_euccn_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = 1;
-    return len;
-}
-
-static int
-pg_euccn_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = pg_ascii_dsplen(s);
-    return len;
-}
-
-/*
- * EUC_TW
- *
- */
-static int
-pg_euctw2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        if (*from == SS2 && len >= 4)    /* code set 2 */
-        {
-            from++;
-            *to = (((uint32) SS2) << 24) | (*from++ << 16);
-            *to |= *from++ << 8;
-            *to |= *from++;
-            len -= 4;
-        }
-        else if (*from == SS3 && len >= 3)    /* code set 3 (unused?) */
-        {
-            from++;
-            *to = (SS3 << 16) | (*from++ << 8);
-            *to |= *from++;
-            len -= 3;
-        }
-        else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 2 */
-        {
-            *to = *from++ << 8;
-            *to |= *from++;
-            len -= 2;
-        }
-        else
-        {
-            *to = *from++;
-            len--;
-        }
-        to++;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-static int
-pg_euctw_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s == SS2)
-        len = 4;
-    else if (*s == SS3)
-        len = 3;
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = 1;
-    return len;
-}
-
-static int
-pg_euctw_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s == SS2)
-        len = 2;
-    else if (*s == SS3)
-        len = 2;
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = pg_ascii_dsplen(s);
-    return len;
-}
-
-/*
- * Convert pg_wchar to EUC_* encoding.
- * caller must allocate enough space for "to", including a trailing zero!
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2euc_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        unsigned char c;
-
-        if ((c = (*from >> 24)))
-        {
-            *to++ = c;
-            *to++ = (*from >> 16) & 0xff;
-            *to++ = (*from >> 8) & 0xff;
-            *to++ = *from & 0xff;
-            cnt += 4;
-        }
-        else if ((c = (*from >> 16)))
-        {
-            *to++ = c;
-            *to++ = (*from >> 8) & 0xff;
-            *to++ = *from & 0xff;
-            cnt += 3;
-        }
-        else if ((c = (*from >> 8)))
-        {
-            *to++ = c;
-            *to++ = *from & 0xff;
-            cnt += 2;
-        }
-        else
-        {
-            *to++ = *from;
-            cnt++;
-        }
-        from++;
-        len--;
-    }
-    *to = 0;
-    return cnt;
-}
-
-
-/*
- * JOHAB
- */
-static int
-pg_johab_mblen(const unsigned char *s)
-{
-    return pg_euc_mblen(s);
-}
-
-static int
-pg_johab_dsplen(const unsigned char *s)
-{
-    return pg_euc_dsplen(s);
-}
-
-/*
- * convert UTF8 string to pg_wchar (UCS-4)
- * caller must allocate enough space for "to", including a trailing zero!
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_utf2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-    uint32        c1,
-                c2,
-                c3,
-                c4;
-
-    while (len > 0 && *from)
-    {
-        if ((*from & 0x80) == 0)
-        {
-            *to = *from++;
-            len--;
-        }
-        else if ((*from & 0xe0) == 0xc0)
-        {
-            if (len < 2)
-                break;            /* drop trailing incomplete char */
-            c1 = *from++ & 0x1f;
-            c2 = *from++ & 0x3f;
-            *to = (c1 << 6) | c2;
-            len -= 2;
-        }
-        else if ((*from & 0xf0) == 0xe0)
-        {
-            if (len < 3)
-                break;            /* drop trailing incomplete char */
-            c1 = *from++ & 0x0f;
-            c2 = *from++ & 0x3f;
-            c3 = *from++ & 0x3f;
-            *to = (c1 << 12) | (c2 << 6) | c3;
-            len -= 3;
-        }
-        else if ((*from & 0xf8) == 0xf0)
-        {
-            if (len < 4)
-                break;            /* drop trailing incomplete char */
-            c1 = *from++ & 0x07;
-            c2 = *from++ & 0x3f;
-            c3 = *from++ & 0x3f;
-            c4 = *from++ & 0x3f;
-            *to = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
-            len -= 4;
-        }
-        else
-        {
-            /* treat a bogus char as length 1; not ours to raise error */
-            *to = *from++;
-            len--;
-        }
-        to++;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-
-/*
- * Map a Unicode code point to UTF-8.  utf8string must have 4 bytes of
- * space allocated.
- */
-unsigned char *
-unicode_to_utf8(pg_wchar c, unsigned char *utf8string)
-{
-    if (c <= 0x7F)
-    {
-        utf8string[0] = c;
-    }
-    else if (c <= 0x7FF)
-    {
-        utf8string[0] = 0xC0 | ((c >> 6) & 0x1F);
-        utf8string[1] = 0x80 | (c & 0x3F);
-    }
-    else if (c <= 0xFFFF)
-    {
-        utf8string[0] = 0xE0 | ((c >> 12) & 0x0F);
-        utf8string[1] = 0x80 | ((c >> 6) & 0x3F);
-        utf8string[2] = 0x80 | (c & 0x3F);
-    }
-    else
-    {
-        utf8string[0] = 0xF0 | ((c >> 18) & 0x07);
-        utf8string[1] = 0x80 | ((c >> 12) & 0x3F);
-        utf8string[2] = 0x80 | ((c >> 6) & 0x3F);
-        utf8string[3] = 0x80 | (c & 0x3F);
-    }
-
-    return utf8string;
-}
-
-/*
- * Trivial conversion from pg_wchar to UTF-8.
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2utf_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        int            char_len;
-
-        unicode_to_utf8(*from, to);
-        char_len = pg_utf_mblen(to);
-        cnt += char_len;
-        to += char_len;
-        from++;
-        len--;
-    }
-    *to = 0;
-    return cnt;
-}
-
-/*
- * Return the byte length of a UTF8 character pointed to by s
- *
- * Note: in the current implementation we do not support UTF8 sequences
- * of more than 4 bytes; hence do NOT return a value larger than 4.
- * We return "1" for any leading byte that is either flat-out illegal or
- * indicates a length larger than we support.
- *
- * pg_utf2wchar_with_len(), utf8_to_unicode(), pg_utf8_islegal(), and perhaps
- * other places would need to be fixed to change this.
- */
-int
-pg_utf_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if ((*s & 0x80) == 0)
-        len = 1;
-    else if ((*s & 0xe0) == 0xc0)
-        len = 2;
-    else if ((*s & 0xf0) == 0xe0)
-        len = 3;
-    else if ((*s & 0xf8) == 0xf0)
-        len = 4;
-#ifdef NOT_USED
-    else if ((*s & 0xfc) == 0xf8)
-        len = 5;
-    else if ((*s & 0xfe) == 0xfc)
-        len = 6;
-#endif
-    else
-        len = 1;
-    return len;
-}
-
-/*
- * This is an implementation of wcwidth() and wcswidth() as defined in
- * "The Single UNIX Specification, Version 2, The Open Group, 1997"
- * <http://www.unix.org/online.html>
- *
- * Markus Kuhn -- 2001-09-08 -- public domain
- *
- * customised for PostgreSQL
- *
- * original available at : http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- */
-
-struct mbinterval
-{
-    unsigned short first;
-    unsigned short last;
-};
-
-/* auxiliary function for binary search in interval table */
-static int
-mbbisearch(pg_wchar ucs, const struct mbinterval *table, int max)
-{
-    int            min = 0;
-    int            mid;
-
-    if (ucs < table[0].first || ucs > table[max].last)
-        return 0;
-    while (max >= min)
-    {
-        mid = (min + max) / 2;
-        if (ucs > table[mid].last)
-            min = mid + 1;
-        else if (ucs < table[mid].first)
-            max = mid - 1;
-        else
-            return 1;
-    }
-
-    return 0;
-}
-
-
-/* The following functions define the column width of an ISO 10646
- * character as follows:
- *
- *      - The null character (U+0000) has a column width of 0.
- *
- *      - Other C0/C1 control characters and DEL will lead to a return
- *        value of -1.
- *
- *      - Non-spacing and enclosing combining characters (general
- *        category code Mn or Me in the Unicode database) have a
- *        column width of 0.
- *
- *      - Other format characters (general category code Cf in the Unicode
- *        database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
- *
- *      - Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
- *        have a column width of 0.
- *
- *      - Spacing characters in the East Asian Wide (W) or East Asian
- *        FullWidth (F) category as defined in Unicode Technical
- *        Report #11 have a column width of 2.
- *
- *      - All remaining characters (including all printable
- *        ISO 8859-1 and WGL4 characters, Unicode control characters,
- *        etc.) have a column width of 1.
- *
- * This implementation assumes that wchar_t characters are encoded
- * in ISO 10646.
- */
-
-static int
-ucs_wcwidth(pg_wchar ucs)
-{
-#include "common/unicode_combining_table.h"
-
-    /* test for 8-bit control characters */
-    if (ucs == 0)
-        return 0;
-
-    if (ucs < 0x20 || (ucs >= 0x7f && ucs < 0xa0) || ucs > 0x0010ffff)
-        return -1;
-
-    /* binary search in table of non-spacing characters */
-    if (mbbisearch(ucs, combining,
-                   sizeof(combining) / sizeof(struct mbinterval) - 1))
-        return 0;
-
-    /*
-     * if we arrive here, ucs is not a combining or C0/C1 control character
-     */
-
-    return 1 +
-        (ucs >= 0x1100 &&
-         (ucs <= 0x115f ||        /* Hangul Jamo init. consonants */
-          (ucs >= 0x2e80 && ucs <= 0xa4cf && (ucs & ~0x0011) != 0x300a &&
-           ucs != 0x303f) ||    /* CJK ... Yi */
-          (ucs >= 0xac00 && ucs <= 0xd7a3) ||    /* Hangul Syllables */
-          (ucs >= 0xf900 && ucs <= 0xfaff) ||    /* CJK Compatibility
-                                                 * Ideographs */
-          (ucs >= 0xfe30 && ucs <= 0xfe6f) ||    /* CJK Compatibility Forms */
-          (ucs >= 0xff00 && ucs <= 0xff5f) ||    /* Fullwidth Forms */
-          (ucs >= 0xffe0 && ucs <= 0xffe6) ||
-          (ucs >= 0x20000 && ucs <= 0x2ffff)));
-}
-
-/*
- * Convert a UTF-8 character to a Unicode code point.
- * This is a one-character version of pg_utf2wchar_with_len.
- *
- * No error checks here, c must point to a long-enough string.
- */
-pg_wchar
-utf8_to_unicode(const unsigned char *c)
-{
-    if ((*c & 0x80) == 0)
-        return (pg_wchar) c[0];
-    else if ((*c & 0xe0) == 0xc0)
-        return (pg_wchar) (((c[0] & 0x1f) << 6) |
-                           (c[1] & 0x3f));
-    else if ((*c & 0xf0) == 0xe0)
-        return (pg_wchar) (((c[0] & 0x0f) << 12) |
-                           ((c[1] & 0x3f) << 6) |
-                           (c[2] & 0x3f));
-    else if ((*c & 0xf8) == 0xf0)
-        return (pg_wchar) (((c[0] & 0x07) << 18) |
-                           ((c[1] & 0x3f) << 12) |
-                           ((c[2] & 0x3f) << 6) |
-                           (c[3] & 0x3f));
-    else
-        /* that is an invalid code on purpose */
-        return 0xffffffff;
-}
-
-static int
-pg_utf_dsplen(const unsigned char *s)
-{
-    return ucs_wcwidth(utf8_to_unicode(s));
-}
-
-/*
- * convert mule internal code to pg_wchar
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        if (IS_LC1(*from) && len >= 2)
-        {
-            *to = *from++ << 16;
-            *to |= *from++;
-            len -= 2;
-        }
-        else if (IS_LCPRV1(*from) && len >= 3)
-        {
-            from++;
-            *to = *from++ << 16;
-            *to |= *from++;
-            len -= 3;
-        }
-        else if (IS_LC2(*from) && len >= 3)
-        {
-            *to = *from++ << 16;
-            *to |= *from++ << 8;
-            *to |= *from++;
-            len -= 3;
-        }
-        else if (IS_LCPRV2(*from) && len >= 4)
-        {
-            from++;
-            *to = *from++ << 16;
-            *to |= *from++ << 8;
-            *to |= *from++;
-            len -= 4;
-        }
-        else
-        {                        /* assume ASCII */
-            *to = (unsigned char) *from++;
-            len--;
-        }
-        to++;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-/*
- * convert pg_wchar to mule internal code
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2mule_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        unsigned char lb;
-
-        lb = (*from >> 16) & 0xff;
-        if (IS_LC1(lb))
-        {
-            *to++ = lb;
-            *to++ = *from & 0xff;
-            cnt += 2;
-        }
-        else if (IS_LC2(lb))
-        {
-            *to++ = lb;
-            *to++ = (*from >> 8) & 0xff;
-            *to++ = *from & 0xff;
-            cnt += 3;
-        }
-        else if (IS_LCPRV1_A_RANGE(lb))
-        {
-            *to++ = LCPRV1_A;
-            *to++ = lb;
-            *to++ = *from & 0xff;
-            cnt += 3;
-        }
-        else if (IS_LCPRV1_B_RANGE(lb))
-        {
-            *to++ = LCPRV1_B;
-            *to++ = lb;
-            *to++ = *from & 0xff;
-            cnt += 3;
-        }
-        else if (IS_LCPRV2_A_RANGE(lb))
-        {
-            *to++ = LCPRV2_A;
-            *to++ = lb;
-            *to++ = (*from >> 8) & 0xff;
-            *to++ = *from & 0xff;
-            cnt += 4;
-        }
-        else if (IS_LCPRV2_B_RANGE(lb))
-        {
-            *to++ = LCPRV2_B;
-            *to++ = lb;
-            *to++ = (*from >> 8) & 0xff;
-            *to++ = *from & 0xff;
-            cnt += 4;
-        }
-        else
-        {
-            *to++ = *from & 0xff;
-            cnt += 1;
-        }
-        from++;
-        len--;
-    }
-    *to = 0;
-    return cnt;
-}
-
-int
-pg_mule_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_LC1(*s))
-        len = 2;
-    else if (IS_LCPRV1(*s))
-        len = 3;
-    else if (IS_LC2(*s))
-        len = 3;
-    else if (IS_LCPRV2(*s))
-        len = 4;
-    else
-        len = 1;                /* assume ASCII */
-    return len;
-}
-
-static int
-pg_mule_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    /*
-     * Note: it's not really appropriate to assume that all multibyte charsets
-     * are double-wide on screen.  But this seems an okay approximation for
-     * the MULE charsets we currently support.
-     */
-
-    if (IS_LC1(*s))
-        len = 1;
-    else if (IS_LCPRV1(*s))
-        len = 1;
-    else if (IS_LC2(*s))
-        len = 2;
-    else if (IS_LCPRV2(*s))
-        len = 2;
-    else
-        len = 1;                /* assume ASCII */
-
-    return len;
-}
-
-/*
- * ISO8859-1
- */
-static int
-pg_latin12wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        *to++ = *from++;
-        len--;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-/*
- * Trivial conversion from pg_wchar to single byte encoding. Just ignores
- * high bits.
- * caller should allocate enough space for "to"
- * len: length of from.
- * "from" not necessarily null terminated.
- */
-static int
-pg_wchar2single_with_len(const pg_wchar *from, unsigned char *to, int len)
-{
-    int            cnt = 0;
-
-    while (len > 0 && *from)
-    {
-        *to++ = *from++;
-        len--;
-        cnt++;
-    }
-    *to = 0;
-    return cnt;
-}
-
-static int
-pg_latin1_mblen(const unsigned char *s)
-{
-    return 1;
-}
-
-static int
-pg_latin1_dsplen(const unsigned char *s)
-{
-    return pg_ascii_dsplen(s);
-}
-
-/*
- * SJIS
- */
-static int
-pg_sjis_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s >= 0xa1 && *s <= 0xdf)
-        len = 1;                /* 1 byte kana? */
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* kanji? */
-    else
-        len = 1;                /* should be ASCII */
-    return len;
-}
-
-static int
-pg_sjis_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (*s >= 0xa1 && *s <= 0xdf)
-        len = 1;                /* 1 byte kana? */
-    else if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* kanji? */
-    else
-        len = pg_ascii_dsplen(s);    /* should be ASCII */
-    return len;
-}
-
-/*
- * Big5
- */
-static int
-pg_big5_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* kanji? */
-    else
-        len = 1;                /* should be ASCII */
-    return len;
-}
-
-static int
-pg_big5_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* kanji? */
-    else
-        len = pg_ascii_dsplen(s);    /* should be ASCII */
-    return len;
-}
-
-/*
- * GBK
- */
-static int
-pg_gbk_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* kanji? */
-    else
-        len = 1;                /* should be ASCII */
-    return len;
-}
-
-static int
-pg_gbk_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* kanji? */
-    else
-        len = pg_ascii_dsplen(s);    /* should be ASCII */
-    return len;
-}
-
-/*
- * UHC
- */
-static int
-pg_uhc_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* 2byte? */
-    else
-        len = 1;                /* should be ASCII */
-    return len;
-}
-
-static int
-pg_uhc_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;                /* 2byte? */
-    else
-        len = pg_ascii_dsplen(s);    /* should be ASCII */
-    return len;
-}
-
-/*
- * GB18030
- *    Added by Bill Huang <bhuang@redhat.com>,<bill_huanghb@ybb.ne.jp>
- */
-
-/*
- * Unlike all other mblen() functions, this also looks at the second byte of
- * the input.  However, if you only pass the first byte of a multi-byte
- * string, and \0 as the second byte, this still works in a predictable way:
- * a 4-byte character will be reported as two 2-byte characters.  That's
- * enough for all current uses, as a client-only encoding.  It works that
- * way, because in any valid 4-byte GB18030-encoded character, the third and
- * fourth byte look like a 2-byte encoded character, when looked at
- * separately.
- */
-static int
-pg_gb18030_mblen(const unsigned char *s)
-{
-    int            len;
-
-    if (!IS_HIGHBIT_SET(*s))
-        len = 1;                /* ASCII */
-    else if (*(s + 1) >= 0x30 && *(s + 1) <= 0x39)
-        len = 4;
-    else
-        len = 2;
-    return len;
-}
-
-static int
-pg_gb18030_dsplen(const unsigned char *s)
-{
-    int            len;
-
-    if (IS_HIGHBIT_SET(*s))
-        len = 2;
-    else
-        len = pg_ascii_dsplen(s);    /* ASCII */
-    return len;
-}
-
-/*
- *-------------------------------------------------------------------
- * multibyte sequence validators
- *
- * These functions accept "s", a pointer to the first byte of a string,
- * and "len", the remaining length of the string.  If there is a validly
- * encoded character beginning at *s, return its length in bytes; else
- * return -1.
- *
- * The functions can assume that len > 0 and that *s != '\0', but they must
- * test for and reject zeroes in any additional bytes of a multibyte character.
- *
- * Note that this definition allows the function for a single-byte
- * encoding to be just "return 1".
- *-------------------------------------------------------------------
- */
-
-static int
-pg_ascii_verifier(const unsigned char *s, int len)
-{
-    return 1;
-}
-
-#define IS_EUC_RANGE_VALID(c)    ((c) >= 0xa1 && (c) <= 0xfe)
-
-static int
-pg_eucjp_verifier(const unsigned char *s, int len)
-{
-    int            l;
-    unsigned char c1,
-                c2;
-
-    c1 = *s++;
-
-    switch (c1)
-    {
-        case SS2:                /* JIS X 0201 */
-            l = 2;
-            if (l > len)
-                return -1;
-            c2 = *s++;
-            if (c2 < 0xa1 || c2 > 0xdf)
-                return -1;
-            break;
-
-        case SS3:                /* JIS X 0212 */
-            l = 3;
-            if (l > len)
-                return -1;
-            c2 = *s++;
-            if (!IS_EUC_RANGE_VALID(c2))
-                return -1;
-            c2 = *s++;
-            if (!IS_EUC_RANGE_VALID(c2))
-                return -1;
-            break;
-
-        default:
-            if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
-            {
-                l = 2;
-                if (l > len)
-                    return -1;
-                if (!IS_EUC_RANGE_VALID(c1))
-                    return -1;
-                c2 = *s++;
-                if (!IS_EUC_RANGE_VALID(c2))
-                    return -1;
-            }
-            else
-                /* must be ASCII */
-            {
-                l = 1;
-            }
-            break;
-    }
-
-    return l;
-}
-
-static int
-pg_euckr_verifier(const unsigned char *s, int len)
-{
-    int            l;
-    unsigned char c1,
-                c2;
-
-    c1 = *s++;
-
-    if (IS_HIGHBIT_SET(c1))
-    {
-        l = 2;
-        if (l > len)
-            return -1;
-        if (!IS_EUC_RANGE_VALID(c1))
-            return -1;
-        c2 = *s++;
-        if (!IS_EUC_RANGE_VALID(c2))
-            return -1;
-    }
-    else
-        /* must be ASCII */
-    {
-        l = 1;
-    }
-
-    return l;
-}
-
-/* EUC-CN byte sequences are exactly same as EUC-KR */
-#define pg_euccn_verifier    pg_euckr_verifier
-
-static int
-pg_euctw_verifier(const unsigned char *s, int len)
-{
-    int            l;
-    unsigned char c1,
-                c2;
-
-    c1 = *s++;
-
-    switch (c1)
-    {
-        case SS2:                /* CNS 11643 Plane 1-7 */
-            l = 4;
-            if (l > len)
-                return -1;
-            c2 = *s++;
-            if (c2 < 0xa1 || c2 > 0xa7)
-                return -1;
-            c2 = *s++;
-            if (!IS_EUC_RANGE_VALID(c2))
-                return -1;
-            c2 = *s++;
-            if (!IS_EUC_RANGE_VALID(c2))
-                return -1;
-            break;
-
-        case SS3:                /* unused */
-            return -1;
-
-        default:
-            if (IS_HIGHBIT_SET(c1)) /* CNS 11643 Plane 1 */
-            {
-                l = 2;
-                if (l > len)
-                    return -1;
-                /* no further range check on c1? */
-                c2 = *s++;
-                if (!IS_EUC_RANGE_VALID(c2))
-                    return -1;
-            }
-            else
-                /* must be ASCII */
-            {
-                l = 1;
-            }
-            break;
-    }
-    return l;
-}
-
-static int
-pg_johab_verifier(const unsigned char *s, int len)
-{
-    int            l,
-                mbl;
-    unsigned char c;
-
-    l = mbl = pg_johab_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    if (!IS_HIGHBIT_SET(*s))
-        return mbl;
-
-    while (--l > 0)
-    {
-        c = *++s;
-        if (!IS_EUC_RANGE_VALID(c))
-            return -1;
-    }
-    return mbl;
-}
-
-static int
-pg_mule_verifier(const unsigned char *s, int len)
-{
-    int            l,
-                mbl;
-    unsigned char c;
-
-    l = mbl = pg_mule_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    while (--l > 0)
-    {
-        c = *++s;
-        if (!IS_HIGHBIT_SET(c))
-            return -1;
-    }
-    return mbl;
-}
-
-static int
-pg_latin1_verifier(const unsigned char *s, int len)
-{
-    return 1;
-}
-
-static int
-pg_sjis_verifier(const unsigned char *s, int len)
-{
-    int            l,
-                mbl;
-    unsigned char c1,
-                c2;
-
-    l = mbl = pg_sjis_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    if (l == 1)                    /* pg_sjis_mblen already verified it */
-        return mbl;
-
-    c1 = *s++;
-    c2 = *s;
-    if (!ISSJISHEAD(c1) || !ISSJISTAIL(c2))
-        return -1;
-    return mbl;
-}
-
-static int
-pg_big5_verifier(const unsigned char *s, int len)
-{
-    int            l,
-                mbl;
-
-    l = mbl = pg_big5_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    while (--l > 0)
-    {
-        if (*++s == '\0')
-            return -1;
-    }
-
-    return mbl;
-}
-
-static int
-pg_gbk_verifier(const unsigned char *s, int len)
-{
-    int            l,
-                mbl;
-
-    l = mbl = pg_gbk_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    while (--l > 0)
-    {
-        if (*++s == '\0')
-            return -1;
-    }
-
-    return mbl;
-}
-
-static int
-pg_uhc_verifier(const unsigned char *s, int len)
-{
-    int            l,
-                mbl;
-
-    l = mbl = pg_uhc_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    while (--l > 0)
-    {
-        if (*++s == '\0')
-            return -1;
-    }
-
-    return mbl;
-}
-
-static int
-pg_gb18030_verifier(const unsigned char *s, int len)
-{
-    int            l;
-
-    if (!IS_HIGHBIT_SET(*s))
-        l = 1;                    /* ASCII */
-    else if (len >= 4 && *(s + 1) >= 0x30 && *(s + 1) <= 0x39)
-    {
-        /* Should be 4-byte, validate remaining bytes */
-        if (*s >= 0x81 && *s <= 0xfe &&
-            *(s + 2) >= 0x81 && *(s + 2) <= 0xfe &&
-            *(s + 3) >= 0x30 && *(s + 3) <= 0x39)
-            l = 4;
-        else
-            l = -1;
-    }
-    else if (len >= 2 && *s >= 0x81 && *s <= 0xfe)
-    {
-        /* Should be 2-byte, validate */
-        if ((*(s + 1) >= 0x40 && *(s + 1) <= 0x7e) ||
-            (*(s + 1) >= 0x80 && *(s + 1) <= 0xfe))
-            l = 2;
-        else
-            l = -1;
-    }
-    else
-        l = -1;
-    return l;
-}
-
-static int
-pg_utf8_verifier(const unsigned char *s, int len)
-{
-    int            l = pg_utf_mblen(s);
-
-    if (len < l)
-        return -1;
-
-    if (!pg_utf8_islegal(s, l))
-        return -1;
-
-    return l;
-}
-
-/*
- * Check for validity of a single UTF-8 encoded character
- *
- * This directly implements the rules in RFC3629.  The bizarre-looking
- * restrictions on the second byte are meant to ensure that there isn't
- * more than one encoding of a given Unicode character point; that is,
- * you may not use a longer-than-necessary byte sequence with high order
- * zero bits to represent a character that would fit in fewer bytes.
- * To do otherwise is to create security hazards (eg, create an apparent
- * non-ASCII character that decodes to plain ASCII).
- *
- * length is assumed to have been obtained by pg_utf_mblen(), and the
- * caller must have checked that that many bytes are present in the buffer.
- */
-bool
-pg_utf8_islegal(const unsigned char *source, int length)
-{
-    unsigned char a;
-
-    switch (length)
-    {
-        default:
-            /* reject lengths 5 and 6 for now */
-            return false;
-        case 4:
-            a = source[3];
-            if (a < 0x80 || a > 0xBF)
-                return false;
-            /* FALL THRU */
-        case 3:
-            a = source[2];
-            if (a < 0x80 || a > 0xBF)
-                return false;
-            /* FALL THRU */
-        case 2:
-            a = source[1];
-            switch (*source)
-            {
-                case 0xE0:
-                    if (a < 0xA0 || a > 0xBF)
-                        return false;
-                    break;
-                case 0xED:
-                    if (a < 0x80 || a > 0x9F)
-                        return false;
-                    break;
-                case 0xF0:
-                    if (a < 0x90 || a > 0xBF)
-                        return false;
-                    break;
-                case 0xF4:
-                    if (a < 0x80 || a > 0x8F)
-                        return false;
-                    break;
-                default:
-                    if (a < 0x80 || a > 0xBF)
-                        return false;
-                    break;
-            }
-            /* FALL THRU */
-        case 1:
-            a = *source;
-            if (a >= 0x80 && a < 0xC2)
-                return false;
-            if (a > 0xF4)
-                return false;
-            break;
-    }
-    return true;
-}
-
-#ifndef FRONTEND
-
-/*
- * Generic character incrementer function.
- *
- * Not knowing anything about the properties of the encoding in use, we just
- * keep incrementing the last byte until we get a validly-encoded result,
- * or we run out of values to try.  We don't bother to try incrementing
- * higher-order bytes, so there's no growth in runtime for wider characters.
- * (If we did try to do that, we'd need to consider the likelihood that 255
- * is not a valid final byte in the encoding.)
- */
-static bool
-pg_generic_charinc(unsigned char *charptr, int len)
-{
-    unsigned char *lastbyte = charptr + len - 1;
-    mbverifier    mbverify;
-
-    /* We can just invoke the character verifier directly. */
-    mbverify = pg_wchar_table[GetDatabaseEncoding()].mbverify;
-
-    while (*lastbyte < (unsigned char) 255)
-    {
-        (*lastbyte)++;
-        if ((*mbverify) (charptr, len) == len)
-            return true;
-    }
-
-    return false;
-}
-
-/*
- * UTF-8 character incrementer function.
- *
- * For a one-byte character less than 0x7F, we just increment the byte.
- *
- * For a multibyte character, every byte but the first must fall between 0x80
- * and 0xBF; and the first byte must be between 0xC0 and 0xF4.  We increment
- * the last byte that's not already at its maximum value.  If we can't find a
- * byte that's less than the maximum allowable value, we simply fail.  We also
- * need some special-case logic to skip regions used for surrogate pair
- * handling, as those should not occur in valid UTF-8.
- *
- * Note that we don't reset lower-order bytes back to their minimums, since
- * we can't afford to make an exhaustive search (see make_greater_string).
- */
-static bool
-pg_utf8_increment(unsigned char *charptr, int length)
-{
-    unsigned char a;
-    unsigned char limit;
-
-    switch (length)
-    {
-        default:
-            /* reject lengths 5 and 6 for now */
-            return false;
-        case 4:
-            a = charptr[3];
-            if (a < 0xBF)
-            {
-                charptr[3]++;
-                break;
-            }
-            /* FALL THRU */
-        case 3:
-            a = charptr[2];
-            if (a < 0xBF)
-            {
-                charptr[2]++;
-                break;
-            }
-            /* FALL THRU */
-        case 2:
-            a = charptr[1];
-            switch (*charptr)
-            {
-                case 0xED:
-                    limit = 0x9F;
-                    break;
-                case 0xF4:
-                    limit = 0x8F;
-                    break;
-                default:
-                    limit = 0xBF;
-                    break;
-            }
-            if (a < limit)
-            {
-                charptr[1]++;
-                break;
-            }
-            /* FALL THRU */
-        case 1:
-            a = *charptr;
-            if (a == 0x7F || a == 0xDF || a == 0xEF || a == 0xF4)
-                return false;
-            charptr[0]++;
-            break;
-    }
-
-    return true;
-}
-
-/*
- * EUC-JP character incrementer function.
- *
- * If the sequence starts with SS2 (0x8e), it must be a two-byte sequence
- * representing JIS X 0201 characters with the second byte ranging between
- * 0xa1 and 0xdf.  We just increment the last byte if it's less than 0xdf,
- * and otherwise rewrite the whole sequence to 0xa1 0xa1.
- *
- * If the sequence starts with SS3 (0x8f), it must be a three-byte sequence
- * in which the last two bytes range between 0xa1 and 0xfe.  The last byte
- * is incremented if possible, otherwise the second-to-last byte.
- *
- * If the sequence starts with a value other than the above and its MSB
- * is set, it must be a two-byte sequence representing JIS X 0208 characters
- * with both bytes ranging between 0xa1 and 0xfe.  The last byte is
- * incremented if possible, otherwise the second-to-last byte.
- *
- * Otherwise, the sequence is a single-byte ASCII character. It is
- * incremented up to 0x7f.
- */
-static bool
-pg_eucjp_increment(unsigned char *charptr, int length)
-{
-    unsigned char c1,
-                c2;
-    int            i;
-
-    c1 = *charptr;
-
-    switch (c1)
-    {
-        case SS2:                /* JIS X 0201 */
-            if (length != 2)
-                return false;
-
-            c2 = charptr[1];
-
-            if (c2 >= 0xdf)
-                charptr[0] = charptr[1] = 0xa1;
-            else if (c2 < 0xa1)
-                charptr[1] = 0xa1;
-            else
-                charptr[1]++;
-            break;
-
-        case SS3:                /* JIS X 0212 */
-            if (length != 3)
-                return false;
-
-            for (i = 2; i > 0; i--)
-            {
-                c2 = charptr[i];
-                if (c2 < 0xa1)
-                {
-                    charptr[i] = 0xa1;
-                    return true;
-                }
-                else if (c2 < 0xfe)
-                {
-                    charptr[i]++;
-                    return true;
-                }
-            }
-
-            /* Out of 3-byte code region */
-            return false;
-
-        default:
-            if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
-            {
-                if (length != 2)
-                    return false;
-
-                for (i = 1; i >= 0; i--)
-                {
-                    c2 = charptr[i];
-                    if (c2 < 0xa1)
-                    {
-                        charptr[i] = 0xa1;
-                        return true;
-                    }
-                    else if (c2 < 0xfe)
-                    {
-                        charptr[i]++;
-                        return true;
-                    }
-                }
-
-                /* Out of 2 byte code region */
-                return false;
-            }
-            else
-            {                    /* ASCII, single byte */
-                if (c1 > 0x7e)
-                    return false;
-                (*charptr)++;
-            }
-            break;
-    }
-
-    return true;
-}
-#endif                            /* !FRONTEND */
-
-
-/*
- *-------------------------------------------------------------------
- * encoding info table
- * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
- *-------------------------------------------------------------------
- */
-const pg_wchar_tbl pg_wchar_table[] = {
-    {pg_ascii2wchar_with_len, pg_wchar2single_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
-    {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JP */
-    {pg_euccn2wchar_with_len, pg_wchar2euc_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2},    /* PG_EUC_CN */
-    {pg_euckr2wchar_with_len, pg_wchar2euc_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3},    /* PG_EUC_KR */
-    {pg_euctw2wchar_with_len, pg_wchar2euc_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4},    /* PG_EUC_TW */
-    {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JIS_2004 */
-    {pg_utf2wchar_with_len, pg_wchar2utf_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4},    /* PG_UTF8 */
-    {pg_mule2wchar_with_len, pg_wchar2mule_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4},    /*
PG_MULE_INTERNAL*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN1*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN2*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN3*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN4*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN5*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN6*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN7*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN8*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN9*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_LATIN10*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1256*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1258*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN866*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN874*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_KOI8R*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1251*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1252*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
ISO-8859-5*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
ISO-8859-6*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
ISO-8859-7*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
ISO-8859-8*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1250*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1253*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1254*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1255*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_WIN1257*/ 
-    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /*
PG_KOI8U*/ 
-    {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
-    {0, 0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
-    {0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2},    /* PG_GBK */
-    {0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2},    /* PG_UHC */
-    {0, 0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4},    /* PG_GB18030 */
-    {0, 0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3},    /* PG_JOHAB */
-    {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}    /* PG_SHIFT_JIS_2004 */
-};
-
-/* returns the byte length of a word for mule internal code */
-int
-pg_mic_mblen(const unsigned char *mbstr)
-{
-    return pg_mule_mblen(mbstr);
-}
-
-/*
- * Returns the byte length of a multibyte character.
- */
-int
-pg_encoding_mblen(int encoding, const char *mbstr)
-{
-    return (PG_VALID_ENCODING(encoding) ?
-            pg_wchar_table[encoding].mblen((const unsigned char *) mbstr) :
-            pg_wchar_table[PG_SQL_ASCII].mblen((const unsigned char *) mbstr));
-}
-
-/*
- * Returns the display length of a multibyte character.
- */
-int
-pg_encoding_dsplen(int encoding, const char *mbstr)
-{
-    return (PG_VALID_ENCODING(encoding) ?
-            pg_wchar_table[encoding].dsplen((const unsigned char *) mbstr) :
-            pg_wchar_table[PG_SQL_ASCII].dsplen((const unsigned char *) mbstr));
-}
-
-/*
- * Verify the first multibyte character of the given string.
- * Return its byte length if good, -1 if bad.  (See comments above for
- * full details of the mbverify API.)
- */
-int
-pg_encoding_verifymb(int encoding, const char *mbstr, int len)
-{
-    return (PG_VALID_ENCODING(encoding) ?
-            pg_wchar_table[encoding].mbverify((const unsigned char *) mbstr, len) :
-            pg_wchar_table[PG_SQL_ASCII].mbverify((const unsigned char *) mbstr, len));
-}
-
-/*
- * fetch maximum length of a given encoding
- */
-int
-pg_encoding_max_length(int encoding)
-{
-    Assert(PG_VALID_ENCODING(encoding));
-
-    return pg_wchar_table[encoding].maxmblen;
-}
-
-#ifndef FRONTEND
-
-/*
- * fetch maximum length of the encoding for the current database
- */
-int
-pg_database_encoding_max_length(void)
-{
-    return pg_wchar_table[GetDatabaseEncoding()].maxmblen;
-}
-
-/*
- * get the character incrementer for the encoding for the current database
- */
-mbcharacter_incrementer
-pg_database_encoding_character_incrementer(void)
-{
-    /*
-     * Eventually it might be best to add a field to pg_wchar_table[], but for
-     * now we just use a switch.
-     */
-    switch (GetDatabaseEncoding())
-    {
-        case PG_UTF8:
-            return pg_utf8_increment;
-
-        case PG_EUC_JP:
-            return pg_eucjp_increment;
-
-        default:
-            return pg_generic_charinc;
-    }
-}
-
-/*
- * Verify mbstr to make sure that it is validly encoded in the current
- * database encoding.  Otherwise same as pg_verify_mbstr().
- */
-bool
-pg_verifymbstr(const char *mbstr, int len, bool noError)
-{
-    return
-        pg_verify_mbstr_len(GetDatabaseEncoding(), mbstr, len, noError) >= 0;
-}
-
-/*
- * Verify mbstr to make sure that it is validly encoded in the specified
- * encoding.
- */
-bool
-pg_verify_mbstr(int encoding, const char *mbstr, int len, bool noError)
-{
-    return pg_verify_mbstr_len(encoding, mbstr, len, noError) >= 0;
-}
-
-/*
- * Verify mbstr to make sure that it is validly encoded in the specified
- * encoding.
- *
- * mbstr is not necessarily zero terminated; length of mbstr is
- * specified by len.
- *
- * If OK, return length of string in the encoding.
- * If a problem is found, return -1 when noError is
- * true; when noError is false, ereport() a descriptive message.
- */
-int
-pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
-{
-    mbverifier    mbverify;
-    int            mb_len;
-
-    Assert(PG_VALID_ENCODING(encoding));
-
-    /*
-     * In single-byte encodings, we need only reject nulls (\0).
-     */
-    if (pg_encoding_max_length(encoding) <= 1)
-    {
-        const char *nullpos = memchr(mbstr, 0, len);
-
-        if (nullpos == NULL)
-            return len;
-        if (noError)
-            return -1;
-        report_invalid_encoding(encoding, nullpos, 1);
-    }
-
-    /* fetch function pointer just once */
-    mbverify = pg_wchar_table[encoding].mbverify;
-
-    mb_len = 0;
-
-    while (len > 0)
-    {
-        int            l;
-
-        /* fast path for ASCII-subset characters */
-        if (!IS_HIGHBIT_SET(*mbstr))
-        {
-            if (*mbstr != '\0')
-            {
-                mb_len++;
-                mbstr++;
-                len--;
-                continue;
-            }
-            if (noError)
-                return -1;
-            report_invalid_encoding(encoding, mbstr, len);
-        }
-
-        l = (*mbverify) ((const unsigned char *) mbstr, len);
-
-        if (l < 0)
-        {
-            if (noError)
-                return -1;
-            report_invalid_encoding(encoding, mbstr, len);
-        }
-
-        mbstr += l;
-        len -= l;
-        mb_len++;
-    }
-    return mb_len;
-}
-
-/*
- * check_encoding_conversion_args: check arguments of a conversion function
- *
- * "expected" arguments can be either an encoding ID or -1 to indicate that
- * the caller will check whether it accepts the ID.
- *
- * Note: the errors here are not really user-facing, so elog instead of
- * ereport seems sufficient.  Also, we trust that the "expected" encoding
- * arguments are valid encoding IDs, but we don't trust the actuals.
- */
-void
-check_encoding_conversion_args(int src_encoding,
-                               int dest_encoding,
-                               int len,
-                               int expected_src_encoding,
-                               int expected_dest_encoding)
-{
-    if (!PG_VALID_ENCODING(src_encoding))
-        elog(ERROR, "invalid source encoding ID: %d", src_encoding);
-    if (src_encoding != expected_src_encoding && expected_src_encoding >= 0)
-        elog(ERROR, "expected source encoding \"%s\", but got \"%s\"",
-             pg_enc2name_tbl[expected_src_encoding].name,
-             pg_enc2name_tbl[src_encoding].name);
-    if (!PG_VALID_ENCODING(dest_encoding))
-        elog(ERROR, "invalid destination encoding ID: %d", dest_encoding);
-    if (dest_encoding != expected_dest_encoding && expected_dest_encoding >= 0)
-        elog(ERROR, "expected destination encoding \"%s\", but got \"%s\"",
-             pg_enc2name_tbl[expected_dest_encoding].name,
-             pg_enc2name_tbl[dest_encoding].name);
-    if (len < 0)
-        elog(ERROR, "encoding conversion length must not be negative");
-}
-
-/*
- * report_invalid_encoding: complain about invalid multibyte character
- *
- * note: len is remaining length of string, not length of character;
- * len must be greater than zero, as we always examine the first byte.
- */
-void
-report_invalid_encoding(int encoding, const char *mbstr, int len)
-{
-    int            l = pg_encoding_mblen(encoding, mbstr);
-    char        buf[8 * 5 + 1];
-    char       *p = buf;
-    int            j,
-                jlimit;
-
-    jlimit = Min(l, len);
-    jlimit = Min(jlimit, 8);    /* prevent buffer overrun */
-
-    for (j = 0; j < jlimit; j++)
-    {
-        p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
-        if (j < jlimit - 1)
-            p += sprintf(p, " ");
-    }
-
-    ereport(ERROR,
-            (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
-             errmsg("invalid byte sequence for encoding \"%s\": %s",
-                    pg_enc2name_tbl[encoding].name,
-                    buf)));
-}
-
-/*
- * report_untranslatable_char: complain about untranslatable character
- *
- * note: len is remaining length of string, not length of character;
- * len must be greater than zero, as we always examine the first byte.
- */
-void
-report_untranslatable_char(int src_encoding, int dest_encoding,
-                           const char *mbstr, int len)
-{
-    int            l = pg_encoding_mblen(src_encoding, mbstr);
-    char        buf[8 * 5 + 1];
-    char       *p = buf;
-    int            j,
-                jlimit;
-
-    jlimit = Min(l, len);
-    jlimit = Min(jlimit, 8);    /* prevent buffer overrun */
-
-    for (j = 0; j < jlimit; j++)
-    {
-        p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
-        if (j < jlimit - 1)
-            p += sprintf(p, " ");
-    }
-
-    ereport(ERROR,
-            (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
-             errmsg("character with byte sequence %s in encoding \"%s\" has no equivalent in encoding \"%s\"",
-                    buf,
-                    pg_enc2name_tbl[src_encoding].name,
-                    pg_enc2name_tbl[dest_encoding].name)));
-}
-
-#endif                            /* !FRONTEND */
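
(Aside for readers skimming the removed code: the loop in
pg_verify_mbstr_len() above takes a fast path for plain ASCII bytes and
calls the per-encoding verifier only when it meets a high-bit or NUL byte.
Below is a minimal standalone sketch of that shape, under stated
assumptions: toy_verifier, toy_verify_char and toy_verify_len are
hypothetical names, and the two-byte encoding is invented for illustration;
it is not one of the pg_wchar_table[] entries, and errors are reported by
return value only rather than through ereport().)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verifier type: returns the byte length of the first
 * character if it is valid, or -1 if it is not. */
typedef int (*toy_verifier) (const unsigned char *s, int len);

/* Toy two-byte encoding (an assumption for illustration only): a lead
 * byte with the high bit set must be followed by one more high-bit byte.
 * An embedded NUL is always rejected. */
static int
toy_verify_char(const unsigned char *s, int len)
{
    if (!(s[0] & 0x80))
        return (s[0] != '\0') ? 1 : -1;
    if (len < 2 || !(s[1] & 0x80))
        return -1;
    return 2;
}

/* Count the characters in s[0..len), or return -1 at the first bad
 * sequence.  The string need not be zero-terminated. */
static int
toy_verify_len(toy_verifier verify, const char *s, int len)
{
    int         nchars = 0;

    assert(verify != NULL);

    while (len > 0)
    {
        int         l;

        /* fast path: ASCII other than NUL needs no verifier call */
        if (!((unsigned char) *s & 0x80) && *s != '\0')
        {
            s++;
            len--;
            nchars++;
            continue;
        }

        l = verify((const unsigned char *) s, len);
        if (l < 0)
            return -1;

        s += l;
        len -= l;
        nchars++;
    }
    return nchars;
}
```

Passing the verifier as a function pointer mirrors the "fetch function
pointer just once" step in the original, so the table lookup stays out of
the per-character loop.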
diff --git a/src/bin/initdb/.gitignore b/src/bin/initdb/.gitignore
index 71a899f..b3167c4 100644
--- a/src/bin/initdb/.gitignore
+++ b/src/bin/initdb/.gitignore
@@ -1,4 +1,3 @@
-/encnames.c
 /localtime.c

 /initdb
diff --git a/src/bin/initdb/Makefile b/src/bin/initdb/Makefile
index f587a86..7e23754 100644
--- a/src/bin/initdb/Makefile
+++ b/src/bin/initdb/Makefile
@@ -18,7 +18,12 @@ include $(top_builddir)/src/Makefile.global

 override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) -I$(top_srcdir)/src/timezone $(CPPFLAGS)

-# note: we need libpq only because fe_utils does
+# Note: it's important that we link to encnames.o from libpgcommon, not
+# from libpq, else we have risks of version skew if we run with a libpq
+# shared library from a different PG version.  The libpq_pgport macro
+# should ensure that that happens.
+#
+# We need libpq only because fe_utils does.
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)

 # use system timezone data?
@@ -28,7 +33,6 @@ endif

 OBJS = \
     $(WIN32RES) \
-    encnames.o \
     findtimezone.o \
     initdb.o \
     localtime.o
@@ -38,15 +42,7 @@ all: initdb
 initdb: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
     $(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)

-# We used to pull in all of libpq to get encnames.c, but that
-# exposes us to risks of version skew if we link to a shared library.
-# Do it the hard way, instead, so that we're statically linked.
-
-encnames.c: % : $(top_srcdir)/src/backend/utils/mb/%
-    rm -f $@ && $(LN_S) $< .
-
-# Likewise, pull in localtime.c from src/timezones
-
+# We must pull in localtime.c from src/timezone
 localtime.c: % : $(top_srcdir)/src/timezone/%
     rm -f $@ && $(LN_S) $< .

@@ -60,7 +56,7 @@ uninstall:
     rm -f '$(DESTDIR)$(bindir)/initdb$(X)'

 clean distclean maintainer-clean:
-    rm -f initdb$(X) $(OBJS) encnames.c localtime.c
+    rm -f initdb$(X) $(OBJS) localtime.c
     rm -rf tmp_check

 # ensure that changes in datadir propagate into object file
diff --git a/src/common/Makefile b/src/common/Makefile
index ffb0f6e..5b44340 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -51,6 +51,7 @@ OBJS_COMMON = \
     config_info.o \
     controldata_utils.o \
     d2s.o \
+    encnames.o \
     exec.o \
     f2s.o \
     file_perm.o \
@@ -70,7 +71,8 @@ OBJS_COMMON = \
     stringinfo.o \
     unicode_norm.o \
     username.o \
-    wait_error.o
+    wait_error.o \
+    wchar.o

 ifeq ($(with_openssl),yes)
 OBJS_COMMON += sha2_openssl.o
diff --git a/src/common/encnames.c b/src/common/encnames.c
new file mode 100644
index 0000000..2086e00
--- /dev/null
+++ b/src/common/encnames.c
@@ -0,0 +1,635 @@
+/*-------------------------------------------------------------------------
+ *
+ * encnames.c
+ *      Encoding names and routines for working with them.
+ *
+ * Portions Copyright (c) 2001-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *      src/common/encnames.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#include "utils/builtins.h"
+#endif
+
+#include <ctype.h>
+#include <unistd.h>
+
+#include "mb/pg_wchar.h"
+
+
+/* ----------
+ * All encoding names, sorted:         *** A L P H A B E T I C ***
+ *
+ * All names must be without irrelevant chars, search routines use
+ * isalnum() chars only. It means ISO-8859-1, iso_8859-1 and Iso8859_1
+ * are always converted to 'iso88591'. All must be lower case.
+ *
+ * The table doesn't contain 'cs' aliases (like csISOLatin1). It's needed?
+ *
+ * Karel Zak, Aug 2001
+ * ----------
+ */
+typedef struct pg_encname
+{
+    const char *name;
+    pg_enc        encoding;
+} pg_encname;
+
+static const pg_encname pg_encname_tbl[] =
+{
+    {
+        "abc", PG_WIN1258
+    },                            /* alias for WIN1258 */
+    {
+        "alt", PG_WIN866
+    },                            /* IBM866 */
+    {
+        "big5", PG_BIG5
+    },                            /* Big5; Chinese for Taiwan multibyte set */
+    {
+        "euccn", PG_EUC_CN
+    },                            /* EUC-CN; Extended Unix Code for simplified
+                                 * Chinese */
+    {
+        "eucjis2004", PG_EUC_JIS_2004
+    },                            /* EUC-JIS-2004; Extended UNIX Code fixed
+                                 * Width for Japanese, standard JIS X 0213 */
+    {
+        "eucjp", PG_EUC_JP
+    },                            /* EUC-JP; Extended UNIX Code fixed Width for
+                                 * Japanese, standard OSF */
+    {
+        "euckr", PG_EUC_KR
+    },                            /* EUC-KR; Extended Unix Code for Korean , KS
+                                 * X 1001 standard */
+    {
+        "euctw", PG_EUC_TW
+    },                            /* EUC-TW; Extended Unix Code for
+                                 *
+                                 * traditional Chinese */
+    {
+        "gb18030", PG_GB18030
+    },                            /* GB18030;GB18030 */
+    {
+        "gbk", PG_GBK
+    },                            /* GBK; Chinese Windows CodePage 936
+                                 * simplified Chinese */
+    {
+        "iso88591", PG_LATIN1
+    },                            /* ISO-8859-1; RFC1345,KXS2 */
+    {
+        "iso885910", PG_LATIN6
+    },                            /* ISO-8859-10; RFC1345,KXS2 */
+    {
+        "iso885913", PG_LATIN7
+    },                            /* ISO-8859-13; RFC1345,KXS2 */
+    {
+        "iso885914", PG_LATIN8
+    },                            /* ISO-8859-14; RFC1345,KXS2 */
+    {
+        "iso885915", PG_LATIN9
+    },                            /* ISO-8859-15; RFC1345,KXS2 */
+    {
+        "iso885916", PG_LATIN10
+    },                            /* ISO-8859-16; RFC1345,KXS2 */
+    {
+        "iso88592", PG_LATIN2
+    },                            /* ISO-8859-2; RFC1345,KXS2 */
+    {
+        "iso88593", PG_LATIN3
+    },                            /* ISO-8859-3; RFC1345,KXS2 */
+    {
+        "iso88594", PG_LATIN4
+    },                            /* ISO-8859-4; RFC1345,KXS2 */
+    {
+        "iso88595", PG_ISO_8859_5
+    },                            /* ISO-8859-5; RFC1345,KXS2 */
+    {
+        "iso88596", PG_ISO_8859_6
+    },                            /* ISO-8859-6; RFC1345,KXS2 */
+    {
+        "iso88597", PG_ISO_8859_7
+    },                            /* ISO-8859-7; RFC1345,KXS2 */
+    {
+        "iso88598", PG_ISO_8859_8
+    },                            /* ISO-8859-8; RFC1345,KXS2 */
+    {
+        "iso88599", PG_LATIN5
+    },                            /* ISO-8859-9; RFC1345,KXS2 */
+    {
+        "johab", PG_JOHAB
+    },                            /* JOHAB; Extended Unix Code for simplified
+                                 * Chinese */
+    {
+        "koi8", PG_KOI8R
+    },                            /* _dirty_ alias for KOI8-R (backward
+                                 * compatibility) */
+    {
+        "koi8r", PG_KOI8R
+    },                            /* KOI8-R; RFC1489 */
+    {
+        "koi8u", PG_KOI8U
+    },                            /* KOI8-U; RFC2319 */
+    {
+        "latin1", PG_LATIN1
+    },                            /* alias for ISO-8859-1 */
+    {
+        "latin10", PG_LATIN10
+    },                            /* alias for ISO-8859-16 */
+    {
+        "latin2", PG_LATIN2
+    },                            /* alias for ISO-8859-2 */
+    {
+        "latin3", PG_LATIN3
+    },                            /* alias for ISO-8859-3 */
+    {
+        "latin4", PG_LATIN4
+    },                            /* alias for ISO-8859-4 */
+    {
+        "latin5", PG_LATIN5
+    },                            /* alias for ISO-8859-9 */
+    {
+        "latin6", PG_LATIN6
+    },                            /* alias for ISO-8859-10 */
+    {
+        "latin7", PG_LATIN7
+    },                            /* alias for ISO-8859-13 */
+    {
+        "latin8", PG_LATIN8
+    },                            /* alias for ISO-8859-14 */
+    {
+        "latin9", PG_LATIN9
+    },                            /* alias for ISO-8859-15 */
+    {
+        "mskanji", PG_SJIS
+    },                            /* alias for Shift_JIS */
+    {
+        "muleinternal", PG_MULE_INTERNAL
+    },
+    {
+        "shiftjis", PG_SJIS
+    },                            /* Shift_JIS; JIS X 0202-1991 */
+
+    {
+        "shiftjis2004", PG_SHIFT_JIS_2004
+    },                            /* SHIFT-JIS-2004; Shift JIS for Japanese,
+                                 * standard JIS X 0213 */
+    {
+        "sjis", PG_SJIS
+    },                            /* alias for Shift_JIS */
+    {
+        "sqlascii", PG_SQL_ASCII
+    },
+    {
+        "tcvn", PG_WIN1258
+    },                            /* alias for WIN1258 */
+    {
+        "tcvn5712", PG_WIN1258
+    },                            /* alias for WIN1258 */
+    {
+        "uhc", PG_UHC
+    },                            /* UHC; Korean Windows CodePage 949 */
+    {
+        "unicode", PG_UTF8
+    },                            /* alias for UTF8 */
+    {
+        "utf8", PG_UTF8
+    },                            /* alias for UTF8 */
+    {
+        "vscii", PG_WIN1258
+    },                            /* alias for WIN1258 */
+    {
+        "win", PG_WIN1251
+    },                            /* _dirty_ alias for windows-1251 (backward
+                                 * compatibility) */
+    {
+        "win1250", PG_WIN1250
+    },                            /* alias for Windows-1250 */
+    {
+        "win1251", PG_WIN1251
+    },                            /* alias for Windows-1251 */
+    {
+        "win1252", PG_WIN1252
+    },                            /* alias for Windows-1252 */
+    {
+        "win1253", PG_WIN1253
+    },                            /* alias for Windows-1253 */
+    {
+        "win1254", PG_WIN1254
+    },                            /* alias for Windows-1254 */
+    {
+        "win1255", PG_WIN1255
+    },                            /* alias for Windows-1255 */
+    {
+        "win1256", PG_WIN1256
+    },                            /* alias for Windows-1256 */
+    {
+        "win1257", PG_WIN1257
+    },                            /* alias for Windows-1257 */
+    {
+        "win1258", PG_WIN1258
+    },                            /* alias for Windows-1258 */
+    {
+        "win866", PG_WIN866
+    },                            /* IBM866 */
+    {
+        "win874", PG_WIN874
+    },                            /* alias for Windows-874 */
+    {
+        "win932", PG_SJIS
+    },                            /* alias for Shift_JIS */
+    {
+        "win936", PG_GBK
+    },                            /* alias for GBK */
+    {
+        "win949", PG_UHC
+    },                            /* alias for UHC */
+    {
+        "win950", PG_BIG5
+    },                            /* alias for BIG5 */
+    {
+        "windows1250", PG_WIN1250
+    },                            /* Windows-1250; Microsoft */
+    {
+        "windows1251", PG_WIN1251
+    },                            /* Windows-1251; Microsoft */
+    {
+        "windows1252", PG_WIN1252
+    },                            /* Windows-1252; Microsoft */
+    {
+        "windows1253", PG_WIN1253
+    },                            /* Windows-1253; Microsoft */
+    {
+        "windows1254", PG_WIN1254
+    },                            /* Windows-1254; Microsoft */
+    {
+        "windows1255", PG_WIN1255
+    },                            /* Windows-1255; Microsoft */
+    {
+        "windows1256", PG_WIN1256
+    },                            /* Windows-1256; Microsoft */
+    {
+        "windows1257", PG_WIN1257
+    },                            /* Windows-1257; Microsoft */
+    {
+        "windows1258", PG_WIN1258
+    },                            /* Windows-1258; Microsoft */
+    {
+        "windows866", PG_WIN866
+    },                            /* IBM866 */
+    {
+        "windows874", PG_WIN874
+    },                            /* Windows-874; Microsoft */
+    {
+        "windows932", PG_SJIS
+    },                            /* alias for Shift_JIS */
+    {
+        "windows936", PG_GBK
+    },                            /* alias for GBK */
+    {
+        "windows949", PG_UHC
+    },                            /* alias for UHC */
+    {
+        "windows950", PG_BIG5
+    }                            /* alias for BIG5 */
+};
+
+/* ----------
+ * These are "official" encoding names.
+ * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ * ----------
+ */
+#ifndef WIN32
+#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
+#else
+#define DEF_ENC2NAME(name, codepage) { #name, PG_##name, codepage }
+#endif
+const pg_enc2name pg_enc2name_tbl[] =
+{
+    DEF_ENC2NAME(SQL_ASCII, 0),
+    DEF_ENC2NAME(EUC_JP, 20932),
+    DEF_ENC2NAME(EUC_CN, 20936),
+    DEF_ENC2NAME(EUC_KR, 51949),
+    DEF_ENC2NAME(EUC_TW, 0),
+    DEF_ENC2NAME(EUC_JIS_2004, 20932),
+    DEF_ENC2NAME(UTF8, 65001),
+    DEF_ENC2NAME(MULE_INTERNAL, 0),
+    DEF_ENC2NAME(LATIN1, 28591),
+    DEF_ENC2NAME(LATIN2, 28592),
+    DEF_ENC2NAME(LATIN3, 28593),
+    DEF_ENC2NAME(LATIN4, 28594),
+    DEF_ENC2NAME(LATIN5, 28599),
+    DEF_ENC2NAME(LATIN6, 0),
+    DEF_ENC2NAME(LATIN7, 0),
+    DEF_ENC2NAME(LATIN8, 0),
+    DEF_ENC2NAME(LATIN9, 28605),
+    DEF_ENC2NAME(LATIN10, 0),
+    DEF_ENC2NAME(WIN1256, 1256),
+    DEF_ENC2NAME(WIN1258, 1258),
+    DEF_ENC2NAME(WIN866, 866),
+    DEF_ENC2NAME(WIN874, 874),
+    DEF_ENC2NAME(KOI8R, 20866),
+    DEF_ENC2NAME(WIN1251, 1251),
+    DEF_ENC2NAME(WIN1252, 1252),
+    DEF_ENC2NAME(ISO_8859_5, 28595),
+    DEF_ENC2NAME(ISO_8859_6, 28596),
+    DEF_ENC2NAME(ISO_8859_7, 28597),
+    DEF_ENC2NAME(ISO_8859_8, 28598),
+    DEF_ENC2NAME(WIN1250, 1250),
+    DEF_ENC2NAME(WIN1253, 1253),
+    DEF_ENC2NAME(WIN1254, 1254),
+    DEF_ENC2NAME(WIN1255, 1255),
+    DEF_ENC2NAME(WIN1257, 1257),
+    DEF_ENC2NAME(KOI8U, 21866),
+    DEF_ENC2NAME(SJIS, 932),
+    DEF_ENC2NAME(BIG5, 950),
+    DEF_ENC2NAME(GBK, 936),
+    DEF_ENC2NAME(UHC, 949),
+    DEF_ENC2NAME(GB18030, 54936),
+    DEF_ENC2NAME(JOHAB, 0),
+    DEF_ENC2NAME(SHIFT_JIS_2004, 932)
+};
+
+/* ----------
+ * These are encoding names for gettext.
+ *
+ * This covers all encodings except MULE_INTERNAL, which is alien to gettext.
+ * ----------
+ */
+const pg_enc2gettext pg_enc2gettext_tbl[] =
+{
+    {PG_SQL_ASCII, "US-ASCII"},
+    {PG_UTF8, "UTF-8"},
+    {PG_LATIN1, "LATIN1"},
+    {PG_LATIN2, "LATIN2"},
+    {PG_LATIN3, "LATIN3"},
+    {PG_LATIN4, "LATIN4"},
+    {PG_ISO_8859_5, "ISO-8859-5"},
+    {PG_ISO_8859_6, "ISO_8859-6"},
+    {PG_ISO_8859_7, "ISO-8859-7"},
+    {PG_ISO_8859_8, "ISO-8859-8"},
+    {PG_LATIN5, "LATIN5"},
+    {PG_LATIN6, "LATIN6"},
+    {PG_LATIN7, "LATIN7"},
+    {PG_LATIN8, "LATIN8"},
+    {PG_LATIN9, "LATIN-9"},
+    {PG_LATIN10, "LATIN10"},
+    {PG_KOI8R, "KOI8-R"},
+    {PG_KOI8U, "KOI8-U"},
+    {PG_WIN1250, "CP1250"},
+    {PG_WIN1251, "CP1251"},
+    {PG_WIN1252, "CP1252"},
+    {PG_WIN1253, "CP1253"},
+    {PG_WIN1254, "CP1254"},
+    {PG_WIN1255, "CP1255"},
+    {PG_WIN1256, "CP1256"},
+    {PG_WIN1257, "CP1257"},
+    {PG_WIN1258, "CP1258"},
+    {PG_WIN866, "CP866"},
+    {PG_WIN874, "CP874"},
+    {PG_EUC_CN, "EUC-CN"},
+    {PG_EUC_JP, "EUC-JP"},
+    {PG_EUC_KR, "EUC-KR"},
+    {PG_EUC_TW, "EUC-TW"},
+    {PG_EUC_JIS_2004, "EUC-JP"},
+    {PG_SJIS, "SHIFT-JIS"},
+    {PG_BIG5, "BIG5"},
+    {PG_GBK, "GBK"},
+    {PG_UHC, "UHC"},
+    {PG_GB18030, "GB18030"},
+    {PG_JOHAB, "JOHAB"},
+    {PG_SHIFT_JIS_2004, "SHIFT_JISX0213"},
+    {0, NULL}
+};
+
+
+#ifndef FRONTEND
+
+/*
+ * Table of encoding names for ICU
+ *
+ * Reference: <https://ssl.icu-project.org/icu-bin/convexp>
+ *
+ * NULL entries are not supported by ICU, or their mapping is unclear.
+ */
+static const char *const pg_enc2icu_tbl[] =
+{
+    NULL,                        /* PG_SQL_ASCII */
+    "EUC-JP",                    /* PG_EUC_JP */
+    "EUC-CN",                    /* PG_EUC_CN */
+    "EUC-KR",                    /* PG_EUC_KR */
+    "EUC-TW",                    /* PG_EUC_TW */
+    NULL,                        /* PG_EUC_JIS_2004 */
+    "UTF-8",                    /* PG_UTF8 */
+    NULL,                        /* PG_MULE_INTERNAL */
+    "ISO-8859-1",                /* PG_LATIN1 */
+    "ISO-8859-2",                /* PG_LATIN2 */
+    "ISO-8859-3",                /* PG_LATIN3 */
+    "ISO-8859-4",                /* PG_LATIN4 */
+    "ISO-8859-9",                /* PG_LATIN5 */
+    "ISO-8859-10",                /* PG_LATIN6 */
+    "ISO-8859-13",                /* PG_LATIN7 */
+    "ISO-8859-14",                /* PG_LATIN8 */
+    "ISO-8859-15",                /* PG_LATIN9 */
+    NULL,                        /* PG_LATIN10 */
+    "CP1256",                    /* PG_WIN1256 */
+    "CP1258",                    /* PG_WIN1258 */
+    "CP866",                    /* PG_WIN866 */
+    NULL,                        /* PG_WIN874 */
+    "KOI8-R",                    /* PG_KOI8R */
+    "CP1251",                    /* PG_WIN1251 */
+    "CP1252",                    /* PG_WIN1252 */
+    "ISO-8859-5",                /* PG_ISO_8859_5 */
+    "ISO-8859-6",                /* PG_ISO_8859_6 */
+    "ISO-8859-7",                /* PG_ISO_8859_7 */
+    "ISO-8859-8",                /* PG_ISO_8859_8 */
+    "CP1250",                    /* PG_WIN1250 */
+    "CP1253",                    /* PG_WIN1253 */
+    "CP1254",                    /* PG_WIN1254 */
+    "CP1255",                    /* PG_WIN1255 */
+    "CP1257",                    /* PG_WIN1257 */
+    "KOI8-U",                    /* PG_KOI8U */
+};
+
+bool
+is_encoding_supported_by_icu(int encoding)
+{
+    return (pg_enc2icu_tbl[encoding] != NULL);
+}
+
+const char *
+get_encoding_name_for_icu(int encoding)
+{
+    const char *icu_encoding_name;
+
+    StaticAssertStmt(lengthof(pg_enc2icu_tbl) == PG_ENCODING_BE_LAST + 1,
+                     "pg_enc2icu_tbl incomplete");
+
+    icu_encoding_name = pg_enc2icu_tbl[encoding];
+
+    if (!icu_encoding_name)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("encoding \"%s\" not supported by ICU",
+                        pg_encoding_to_char(encoding))));
+
+    return icu_encoding_name;
+}
+
+#endif                            /* not FRONTEND */
+
+
+/* ----------
+ * Encoding checks, for error returns -1 else encoding id
+ * ----------
+ */
+int
+pg_valid_client_encoding(const char *name)
+{
+    int            enc;
+
+    if ((enc = pg_char_to_encoding(name)) < 0)
+        return -1;
+
+    if (!PG_VALID_FE_ENCODING(enc))
+        return -1;
+
+    return enc;
+}
+
+int
+pg_valid_server_encoding(const char *name)
+{
+    int            enc;
+
+    if ((enc = pg_char_to_encoding(name)) < 0)
+        return -1;
+
+    if (!PG_VALID_BE_ENCODING(enc))
+        return -1;
+
+    return enc;
+}
+
+int
+pg_valid_server_encoding_id(int encoding)
+{
+    return PG_VALID_BE_ENCODING(encoding);
+}
+
+/* ----------
+ * Normalize encoding name: keep only alphanumerics, folded to lower case
+ * ----------
+ */
+static char *
+clean_encoding_name(const char *key, char *newkey)
+{
+    const char *p;
+    char       *np;
+
+    for (p = key, np = newkey; *p != '\0'; p++)
+    {
+        if (isalnum((unsigned char) *p))
+        {
+            if (*p >= 'A' && *p <= 'Z')
+                *np++ = *p + 'a' - 'A';
+            else
+                *np++ = *p;
+        }
+    }
+    *np = '\0';
+    return newkey;
+}
+
+/* ----------
+ * Search encoding by encoding name
+ *
+ * Returns encoding ID, or -1 for error
+ * ----------
+ */
+int
+pg_char_to_encoding(const char *name)
+{
+    unsigned int nel = lengthof(pg_encname_tbl);
+    const pg_encname *base = pg_encname_tbl,
+               *last = base + nel - 1,
+               *position;
+    int            result;
+    char        buff[NAMEDATALEN],
+               *key;
+
+    if (name == NULL || *name == '\0')
+        return -1;
+
+    if (strlen(name) >= NAMEDATALEN)
+    {
+#ifdef FRONTEND
+        fprintf(stderr, "encoding name too long\n");
+        return -1;
+#else
+        ereport(ERROR,
+                (errcode(ERRCODE_NAME_TOO_LONG),
+                 errmsg("encoding name too long")));
+#endif
+    }
+    key = clean_encoding_name(name, buff);
+
+    while (last >= base)
+    {
+        position = base + ((last - base) >> 1);
+        result = key[0] - position->name[0];
+
+        if (result == 0)
+        {
+            result = strcmp(key, position->name);
+            if (result == 0)
+                return position->encoding;
+        }
+        if (result < 0)
+            last = position - 1;
+        else
+            base = position + 1;
+    }
+    return -1;
+}
+
+#ifndef FRONTEND
+Datum
+PG_char_to_encoding(PG_FUNCTION_ARGS)
+{
+    Name        s = PG_GETARG_NAME(0);
+
+    PG_RETURN_INT32(pg_char_to_encoding(NameStr(*s)));
+}
+#endif
+
+const char *
+pg_encoding_to_char(int encoding)
+{
+    if (PG_VALID_ENCODING(encoding))
+    {
+        const pg_enc2name *p = &pg_enc2name_tbl[encoding];
+
+        Assert(encoding == p->encoding);
+        return p->name;
+    }
+    return "";
+}
+
+#ifndef FRONTEND
+Datum
+PG_encoding_to_char(PG_FUNCTION_ARGS)
+{
+    int32        encoding = PG_GETARG_INT32(0);
+    const char *encoding_name = pg_encoding_to_char(encoding);
+
+    return DirectFunctionCall1(namein, CStringGetDatum(encoding_name));
+}
+
+#endif
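
(Annotation, not part of the patch.)  The lookup in pg_char_to_encoding()
above first normalizes the name and then bisects a table sorted by
normalized name.  A minimal standalone sketch of that idea follows.  It
assumes a made-up five-entry table with made-up encoding IDs;
demo_encname_tbl, demo_clean_name and demo_char_to_encoding are
hypothetical names, not PostgreSQL symbols, and the bisection uses array
indexes rather than the pointer arithmetic of the real function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAMELEN 64			/* stand-in for NAMEDATALEN (assumption) */

/* Hypothetical miniature of pg_encname_tbl; must stay sorted by name. */
typedef struct
{
	const char *name;			/* normalized: lowercase alphanumerics only */
	int			encoding;		/* made-up IDs, not PostgreSQL's pg_enc values */
} demo_encname;

static const demo_encname demo_encname_tbl[] = {
	{"eucjp", 1},
	{"iso88591", 2},
	{"latin1", 2},
	{"sqlascii", 0},
	{"utf8", 3},
};

/* Keep only alphanumerics and fold ASCII uppercase to lowercase. */
static char *
demo_clean_name(const char *key, char *newkey)
{
	char	   *np = newkey;

	assert(key != NULL && newkey != NULL);
	for (; *key != '\0'; key++)
	{
		unsigned char c = (unsigned char) *key;

		if (isalnum(c))
			*np++ = (c >= 'A' && c <= 'Z') ? (char) (c + 'a' - 'A') : (char) c;
	}
	*np = '\0';
	return newkey;
}

/* Return the table's encoding ID for name, or -1 if it is not found. */
int
demo_char_to_encoding(const char *name)
{
	char		buff[DEMO_NAMELEN];
	int			lo = 0;
	int			hi = (int) (sizeof(demo_encname_tbl) /
							sizeof(demo_encname_tbl[0])) - 1;

	if (name == NULL || *name == '\0' || strlen(name) >= DEMO_NAMELEN)
		return -1;
	demo_clean_name(name, buff);

	/* Plain binary search over the sorted, normalized names. */
	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;
		int			cmp = strcmp(buff, demo_encname_tbl[mid].name);

		if (cmp == 0)
			return demo_encname_tbl[mid].encoding;
		if (cmp < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1;
}
```

With this table, demo_char_to_encoding("UTF-8") and
demo_char_to_encoding("utf8") resolve to the same entry, because
punctuation and case are removed before the comparison; a name that is
not in the table yields -1.
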
diff --git a/src/common/saslprep.c b/src/common/saslprep.c
index 2a2449e..7739b81 100644
--- a/src/common/saslprep.c
+++ b/src/common/saslprep.c
@@ -27,12 +27,6 @@

 #include "common/saslprep.h"
 #include "common/unicode_norm.h"
-
-/*
- * Note: The functions in this file depend on functions from
- * src/backend/utils/mb/wchar.c, so in order to use this in frontend
- * code, you will need to link that in, too.
- */
 #include "mb/pg_wchar.h"

 /*
diff --git a/src/common/wchar.c b/src/common/wchar.c
new file mode 100644
index 0000000..74a8823
--- /dev/null
+++ b/src/common/wchar.c
@@ -0,0 +1,2041 @@
+/*-------------------------------------------------------------------------
+ *
+ * wchar.c
+ *      Functions for working with multibyte characters in various encodings.
+ *
+ * Portions Copyright (c) 1998-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *      src/common/wchar.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#include "mb/pg_wchar.h"
+
+
+/*
+ * Operations on multi-byte encodings are driven by a table of helper
+ * functions.
+ *
+ * To add support for an encoding, define mblen(), dsplen() and verifier() for
+ * the encoding.  For server-encodings, also define mb2wchar() and wchar2mb()
+ * conversion functions.
+ *
+ * These functions generally assume that their input is validly formed.
+ * The "verifier" functions, further down in the file, have to be more
+ * paranoid.
+ *
+ * We expect that mblen() does not need to examine more than the first byte
+ * of the character to discover the correct length.  GB18030 is an exception
+ * to that rule, though, as it also looks at the second byte.  But even that
+ * behaves in a predictable way, if you only pass the first byte: it will
+ * treat 4-byte encoded characters as two 2-byte encoded characters, which is
+ * good enough for all current uses.
+ *
+ * Note: for the display output of psql to work properly, the return values
+ * of the dsplen functions must conform to the Unicode standard. In particular
+ * the NUL character is zero width and control characters are generally
+ * width -1. It is recommended that non-ASCII encodings delegate their ASCII
+ * subset to the ASCII routines to ensure consistency.
+ */
+
+/*
+ * SQL/ASCII
+ */
+static int
+pg_ascii2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        *to++ = *from++;
+        len--;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+static int
+pg_ascii_mblen(const unsigned char *s)
+{
+    return 1;
+}
+
+static int
+pg_ascii_dsplen(const unsigned char *s)
+{
+    if (*s == '\0')
+        return 0;
+    if (*s < 0x20 || *s == 0x7f)
+        return -1;
+
+    return 1;
+}
+
+/*
+ * EUC
+ */
+static int
+pg_euc2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        if (*from == SS2 && len >= 2)    /* JIS X 0201 (so called "1 byte
+                                         * KANA") */
+        {
+            from++;
+            *to = (SS2 << 8) | *from++;
+            len -= 2;
+        }
+        else if (*from == SS3 && len >= 3)    /* JIS X 0212 KANJI */
+        {
+            from++;
+            *to = (SS3 << 16) | (*from++ << 8);
+            *to |= *from++;
+            len -= 3;
+        }
+        else if (IS_HIGHBIT_SET(*from) && len >= 2) /* JIS X 0208 KANJI */
+        {
+            *to = *from++ << 8;
+            *to |= *from++;
+            len -= 2;
+        }
+        else                    /* must be ASCII */
+        {
+            *to = *from++;
+            len--;
+        }
+        to++;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+static inline int
+pg_euc_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s == SS2)
+        len = 2;
+    else if (*s == SS3)
+        len = 3;
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = 1;
+    return len;
+}
+
+static inline int
+pg_euc_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s == SS2)
+        len = 2;
+    else if (*s == SS3)
+        len = 2;
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = pg_ascii_dsplen(s);
+    return len;
+}
+
+/*
+ * EUC_JP
+ */
+static int
+pg_eucjp2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    return pg_euc2wchar_with_len(from, to, len);
+}
+
+static int
+pg_eucjp_mblen(const unsigned char *s)
+{
+    return pg_euc_mblen(s);
+}
+
+static int
+pg_eucjp_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s == SS2)
+        len = 1;
+    else if (*s == SS3)
+        len = 2;
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = pg_ascii_dsplen(s);
+    return len;
+}
+
+/*
+ * EUC_KR
+ */
+static int
+pg_euckr2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    return pg_euc2wchar_with_len(from, to, len);
+}
+
+static int
+pg_euckr_mblen(const unsigned char *s)
+{
+    return pg_euc_mblen(s);
+}
+
+static int
+pg_euckr_dsplen(const unsigned char *s)
+{
+    return pg_euc_dsplen(s);
+}
+
+/*
+ * EUC_CN
+ *
+ */
+static int
+pg_euccn2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        if (*from == SS2 && len >= 3)    /* code set 2 (unused?) */
+        {
+            from++;
+            *to = (SS2 << 16) | (*from++ << 8);
+            *to |= *from++;
+            len -= 3;
+        }
+        else if (*from == SS3 && len >= 3)    /* code set 3 (unused?) */
+        {
+            from++;
+            *to = (SS3 << 16) | (*from++ << 8);
+            *to |= *from++;
+            len -= 3;
+        }
+        else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 1 */
+        {
+            *to = *from++ << 8;
+            *to |= *from++;
+            len -= 2;
+        }
+        else
+        {
+            *to = *from++;
+            len--;
+        }
+        to++;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+static int
+pg_euccn_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = 1;
+    return len;
+}
+
+static int
+pg_euccn_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = pg_ascii_dsplen(s);
+    return len;
+}
+
+/*
+ * EUC_TW
+ *
+ */
+static int
+pg_euctw2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        if (*from == SS2 && len >= 4)    /* code set 2 */
+        {
+            from++;
+            *to = (((uint32) SS2) << 24) | (*from++ << 16);
+            *to |= *from++ << 8;
+            *to |= *from++;
+            len -= 4;
+        }
+        else if (*from == SS3 && len >= 3)    /* code set 3 (unused?) */
+        {
+            from++;
+            *to = (SS3 << 16) | (*from++ << 8);
+            *to |= *from++;
+            len -= 3;
+        }
+        else if (IS_HIGHBIT_SET(*from) && len >= 2) /* code set 1 */
+        {
+            *to = *from++ << 8;
+            *to |= *from++;
+            len -= 2;
+        }
+        else
+        {
+            *to = *from++;
+            len--;
+        }
+        to++;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+static int
+pg_euctw_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s == SS2)
+        len = 4;
+    else if (*s == SS3)
+        len = 3;
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = 1;
+    return len;
+}
+
+static int
+pg_euctw_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s == SS2)
+        len = 2;
+    else if (*s == SS3)
+        len = 2;
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = pg_ascii_dsplen(s);
+    return len;
+}
+
+/*
+ * Convert pg_wchar to EUC_* encoding.
+ * caller must allocate enough space for "to", including a trailing zero!
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2euc_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        unsigned char c;
+
+        if ((c = (*from >> 24)))
+        {
+            *to++ = c;
+            *to++ = (*from >> 16) & 0xff;
+            *to++ = (*from >> 8) & 0xff;
+            *to++ = *from & 0xff;
+            cnt += 4;
+        }
+        else if ((c = (*from >> 16)))
+        {
+            *to++ = c;
+            *to++ = (*from >> 8) & 0xff;
+            *to++ = *from & 0xff;
+            cnt += 3;
+        }
+        else if ((c = (*from >> 8)))
+        {
+            *to++ = c;
+            *to++ = *from & 0xff;
+            cnt += 2;
+        }
+        else
+        {
+            *to++ = *from;
+            cnt++;
+        }
+        from++;
+        len--;
+    }
+    *to = 0;
+    return cnt;
+}
+
+
+/*
+ * JOHAB
+ */
+static int
+pg_johab_mblen(const unsigned char *s)
+{
+    return pg_euc_mblen(s);
+}
+
+static int
+pg_johab_dsplen(const unsigned char *s)
+{
+    return pg_euc_dsplen(s);
+}
+
+/*
+ * convert UTF8 string to pg_wchar (UCS-4)
+ * caller must allocate enough space for "to", including a trailing zero!
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_utf2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+    uint32        c1,
+                c2,
+                c3,
+                c4;
+
+    while (len > 0 && *from)
+    {
+        if ((*from & 0x80) == 0)
+        {
+            *to = *from++;
+            len--;
+        }
+        else if ((*from & 0xe0) == 0xc0)
+        {
+            if (len < 2)
+                break;            /* drop trailing incomplete char */
+            c1 = *from++ & 0x1f;
+            c2 = *from++ & 0x3f;
+            *to = (c1 << 6) | c2;
+            len -= 2;
+        }
+        else if ((*from & 0xf0) == 0xe0)
+        {
+            if (len < 3)
+                break;            /* drop trailing incomplete char */
+            c1 = *from++ & 0x0f;
+            c2 = *from++ & 0x3f;
+            c3 = *from++ & 0x3f;
+            *to = (c1 << 12) | (c2 << 6) | c3;
+            len -= 3;
+        }
+        else if ((*from & 0xf8) == 0xf0)
+        {
+            if (len < 4)
+                break;            /* drop trailing incomplete char */
+            c1 = *from++ & 0x07;
+            c2 = *from++ & 0x3f;
+            c3 = *from++ & 0x3f;
+            c4 = *from++ & 0x3f;
+            *to = (c1 << 18) | (c2 << 12) | (c3 << 6) | c4;
+            len -= 4;
+        }
+        else
+        {
+            /* treat a bogus char as length 1; not ours to raise error */
+            *to = *from++;
+            len--;
+        }
+        to++;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+
+/*
+ * Map a Unicode code point to UTF-8.  utf8string must have 4 bytes of
+ * space allocated.
+ */
+unsigned char *
+unicode_to_utf8(pg_wchar c, unsigned char *utf8string)
+{
+    if (c <= 0x7F)
+    {
+        utf8string[0] = c;
+    }
+    else if (c <= 0x7FF)
+    {
+        utf8string[0] = 0xC0 | ((c >> 6) & 0x1F);
+        utf8string[1] = 0x80 | (c & 0x3F);
+    }
+    else if (c <= 0xFFFF)
+    {
+        utf8string[0] = 0xE0 | ((c >> 12) & 0x0F);
+        utf8string[1] = 0x80 | ((c >> 6) & 0x3F);
+        utf8string[2] = 0x80 | (c & 0x3F);
+    }
+    else
+    {
+        utf8string[0] = 0xF0 | ((c >> 18) & 0x07);
+        utf8string[1] = 0x80 | ((c >> 12) & 0x3F);
+        utf8string[2] = 0x80 | ((c >> 6) & 0x3F);
+        utf8string[3] = 0x80 | (c & 0x3F);
+    }
+
+    return utf8string;
+}
+
+/*
+ * Trivial conversion from pg_wchar to UTF-8.
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2utf_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        int            char_len;
+
+        unicode_to_utf8(*from, to);
+        char_len = pg_utf_mblen(to);
+        cnt += char_len;
+        to += char_len;
+        from++;
+        len--;
+    }
+    *to = 0;
+    return cnt;
+}
+
+/*
+ * Return the byte length of a UTF8 character pointed to by s
+ *
+ * Note: in the current implementation we do not support UTF8 sequences
+ * of more than 4 bytes; hence do NOT return a value larger than 4.
+ * We return "1" for any leading byte that is either flat-out illegal or
+ * indicates a length larger than we support.
+ *
+ * pg_utf2wchar_with_len(), utf8_to_unicode(), pg_utf8_islegal(), and perhaps
+ * other places would need to be fixed to change this.
+ */
+int
+pg_utf_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if ((*s & 0x80) == 0)
+        len = 1;
+    else if ((*s & 0xe0) == 0xc0)
+        len = 2;
+    else if ((*s & 0xf0) == 0xe0)
+        len = 3;
+    else if ((*s & 0xf8) == 0xf0)
+        len = 4;
+#ifdef NOT_USED
+    else if ((*s & 0xfc) == 0xf8)
+        len = 5;
+    else if ((*s & 0xfe) == 0xfc)
+        len = 6;
+#endif
+    else
+        len = 1;
+    return len;
+}
+
+/*
+ * This is an implementation of wcwidth() and wcswidth() as defined in
+ * "The Single UNIX Specification, Version 2, The Open Group, 1997"
+ * <http://www.unix.org/online.html>
+ *
+ * Markus Kuhn -- 2001-09-08 -- public domain
+ *
+ * customised for PostgreSQL
+ *
+ * original available at : http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
+ */
+
+struct mbinterval
+{
+    unsigned short first;
+    unsigned short last;
+};
+
+/* auxiliary function for binary search in interval table */
+static int
+mbbisearch(pg_wchar ucs, const struct mbinterval *table, int max)
+{
+    int            min = 0;
+    int            mid;
+
+    if (ucs < table[0].first || ucs > table[max].last)
+        return 0;
+    while (max >= min)
+    {
+        mid = (min + max) / 2;
+        if (ucs > table[mid].last)
+            min = mid + 1;
+        else if (ucs < table[mid].first)
+            max = mid - 1;
+        else
+            return 1;
+    }
+
+    return 0;
+}
+
+
+/* The following functions define the column width of an ISO 10646
+ * character as follows:
+ *
+ *      - The null character (U+0000) has a column width of 0.
+ *
+ *      - Other C0/C1 control characters and DEL will lead to a return
+ *        value of -1.
+ *
+ *      - Non-spacing and enclosing combining characters (general
+ *        category code Mn or Me in the Unicode database) have a
+ *        column width of 0.
+ *
+ *      - Other format characters (general category code Cf in the Unicode
+ *        database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
+ *
+ *      - Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
+ *        have a column width of 0.
+ *
+ *      - Spacing characters in the East Asian Wide (W) or East Asian
+ *        FullWidth (F) category as defined in Unicode Technical
+ *        Report #11 have a column width of 2.
+ *
+ *      - All remaining characters (including all printable
+ *        ISO 8859-1 and WGL4 characters, Unicode control characters,
+ *        etc.) have a column width of 1.
+ *
+ * This implementation assumes that wchar_t characters are encoded
+ * in ISO 10646.
+ */
+
+static int
+ucs_wcwidth(pg_wchar ucs)
+{
+#include "common/unicode_combining_table.h"
+
+    /* NUL has zero width */
+    if (ucs == 0)
+        return 0;
+    /* test for C0/C1 control characters, DEL, and out-of-range values */
+    if (ucs < 0x20 || (ucs >= 0x7f && ucs < 0xa0) || ucs > 0x0010ffff)
+        return -1;
+
+    /* binary search in table of non-spacing characters */
+    if (mbbisearch(ucs, combining,
+                   sizeof(combining) / sizeof(struct mbinterval) - 1))
+        return 0;
+
+    /*
+     * if we arrive here, ucs is not a combining or C0/C1 control character
+     */
+
+    return 1 +
+        (ucs >= 0x1100 &&
+         (ucs <= 0x115f ||        /* Hangul Jamo init. consonants */
+          (ucs >= 0x2e80 && ucs <= 0xa4cf && (ucs & ~0x0011) != 0x300a &&
+           ucs != 0x303f) ||    /* CJK ... Yi */
+          (ucs >= 0xac00 && ucs <= 0xd7a3) ||    /* Hangul Syllables */
+          (ucs >= 0xf900 && ucs <= 0xfaff) ||    /* CJK Compatibility
+                                                 * Ideographs */
+          (ucs >= 0xfe30 && ucs <= 0xfe6f) ||    /* CJK Compatibility Forms */
+          (ucs >= 0xff00 && ucs <= 0xff5f) ||    /* Fullwidth Forms */
+          (ucs >= 0xffe0 && ucs <= 0xffe6) ||
+          (ucs >= 0x20000 && ucs <= 0x2ffff)));
+}
+
+/*
+ * Convert a UTF-8 character to a Unicode code point.
+ * This is a one-character version of pg_utf2wchar_with_len.
+ *
+ * No error checks here; c must point to a long-enough string.
+ */
+pg_wchar
+utf8_to_unicode(const unsigned char *c)
+{
+    if ((*c & 0x80) == 0)
+        return (pg_wchar) c[0];
+    else if ((*c & 0xe0) == 0xc0)
+        return (pg_wchar) (((c[0] & 0x1f) << 6) |
+                           (c[1] & 0x3f));
+    else if ((*c & 0xf0) == 0xe0)
+        return (pg_wchar) (((c[0] & 0x0f) << 12) |
+                           ((c[1] & 0x3f) << 6) |
+                           (c[2] & 0x3f));
+    else if ((*c & 0xf8) == 0xf0)
+        return (pg_wchar) (((c[0] & 0x07) << 18) |
+                           ((c[1] & 0x3f) << 12) |
+                           ((c[2] & 0x3f) << 6) |
+                           (c[3] & 0x3f));
+    else
+        /* that is an invalid code on purpose */
+        return 0xffffffff;
+}
+
+static int
+pg_utf_dsplen(const unsigned char *s)
+{
+    return ucs_wcwidth(utf8_to_unicode(s));
+}
+
+/*
+ * convert mule internal code to pg_wchar
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        if (IS_LC1(*from) && len >= 2)
+        {
+            *to = *from++ << 16;
+            *to |= *from++;
+            len -= 2;
+        }
+        else if (IS_LCPRV1(*from) && len >= 3)
+        {
+            from++;
+            *to = *from++ << 16;
+            *to |= *from++;
+            len -= 3;
+        }
+        else if (IS_LC2(*from) && len >= 3)
+        {
+            *to = *from++ << 16;
+            *to |= *from++ << 8;
+            *to |= *from++;
+            len -= 3;
+        }
+        else if (IS_LCPRV2(*from) && len >= 4)
+        {
+            from++;
+            *to = *from++ << 16;
+            *to |= *from++ << 8;
+            *to |= *from++;
+            len -= 4;
+        }
+        else
+        {                        /* assume ASCII */
+            *to = (unsigned char) *from++;
+            len--;
+        }
+        to++;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+/*
+ * convert pg_wchar to mule internal code
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2mule_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        unsigned char lb;
+
+        lb = (*from >> 16) & 0xff;
+        if (IS_LC1(lb))
+        {
+            *to++ = lb;
+            *to++ = *from & 0xff;
+            cnt += 2;
+        }
+        else if (IS_LC2(lb))
+        {
+            *to++ = lb;
+            *to++ = (*from >> 8) & 0xff;
+            *to++ = *from & 0xff;
+            cnt += 3;
+        }
+        else if (IS_LCPRV1_A_RANGE(lb))
+        {
+            *to++ = LCPRV1_A;
+            *to++ = lb;
+            *to++ = *from & 0xff;
+            cnt += 3;
+        }
+        else if (IS_LCPRV1_B_RANGE(lb))
+        {
+            *to++ = LCPRV1_B;
+            *to++ = lb;
+            *to++ = *from & 0xff;
+            cnt += 3;
+        }
+        else if (IS_LCPRV2_A_RANGE(lb))
+        {
+            *to++ = LCPRV2_A;
+            *to++ = lb;
+            *to++ = (*from >> 8) & 0xff;
+            *to++ = *from & 0xff;
+            cnt += 4;
+        }
+        else if (IS_LCPRV2_B_RANGE(lb))
+        {
+            *to++ = LCPRV2_B;
+            *to++ = lb;
+            *to++ = (*from >> 8) & 0xff;
+            *to++ = *from & 0xff;
+            cnt += 4;
+        }
+        else
+        {
+            *to++ = *from & 0xff;
+            cnt += 1;
+        }
+        from++;
+        len--;
+    }
+    *to = 0;
+    return cnt;
+}
+
+int
+pg_mule_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_LC1(*s))
+        len = 2;
+    else if (IS_LCPRV1(*s))
+        len = 3;
+    else if (IS_LC2(*s))
+        len = 3;
+    else if (IS_LCPRV2(*s))
+        len = 4;
+    else
+        len = 1;                /* assume ASCII */
+    return len;
+}
+
+static int
+pg_mule_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    /*
+     * Note: it's not really appropriate to assume that all multibyte charsets
+     * are double-wide on screen.  But this seems an okay approximation for
+     * the MULE charsets we currently support.
+     */
+
+    if (IS_LC1(*s))
+        len = 1;
+    else if (IS_LCPRV1(*s))
+        len = 1;
+    else if (IS_LC2(*s))
+        len = 2;
+    else if (IS_LCPRV2(*s))
+        len = 2;
+    else
+        len = 1;                /* assume ASCII */
+
+    return len;
+}
+
+/*
+ * ISO8859-1
+ */
+static int
+pg_latin12wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        *to++ = *from++;
+        len--;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+/*
+ * Trivial conversion from pg_wchar to single byte encoding. Just ignores
+ * high bits.
+ * caller should allocate enough space for "to"
+ * len: length of from.
+ * "from" not necessarily null terminated.
+ */
+static int
+pg_wchar2single_with_len(const pg_wchar *from, unsigned char *to, int len)
+{
+    int            cnt = 0;
+
+    while (len > 0 && *from)
+    {
+        *to++ = *from++;
+        len--;
+        cnt++;
+    }
+    *to = 0;
+    return cnt;
+}
+
+static int
+pg_latin1_mblen(const unsigned char *s)
+{
+    return 1;
+}
+
+static int
+pg_latin1_dsplen(const unsigned char *s)
+{
+    return pg_ascii_dsplen(s);
+}
+
+/*
+ * SJIS
+ */
+static int
+pg_sjis_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s >= 0xa1 && *s <= 0xdf)
+        len = 1;                /* 1 byte kana? */
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* kanji? */
+    else
+        len = 1;                /* should be ASCII */
+    return len;
+}
+
+static int
+pg_sjis_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (*s >= 0xa1 && *s <= 0xdf)
+        len = 1;                /* 1 byte kana? */
+    else if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* kanji? */
+    else
+        len = pg_ascii_dsplen(s);    /* should be ASCII */
+    return len;
+}
+
+/*
+ * Big5
+ */
+static int
+pg_big5_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* kanji? */
+    else
+        len = 1;                /* should be ASCII */
+    return len;
+}
+
+static int
+pg_big5_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* kanji? */
+    else
+        len = pg_ascii_dsplen(s);    /* should be ASCII */
+    return len;
+}
+
+/*
+ * GBK
+ */
+static int
+pg_gbk_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* kanji? */
+    else
+        len = 1;                /* should be ASCII */
+    return len;
+}
+
+static int
+pg_gbk_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* kanji? */
+    else
+        len = pg_ascii_dsplen(s);    /* should be ASCII */
+    return len;
+}
+
+/*
+ * UHC
+ */
+static int
+pg_uhc_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* 2byte? */
+    else
+        len = 1;                /* should be ASCII */
+    return len;
+}
+
+static int
+pg_uhc_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;                /* 2byte? */
+    else
+        len = pg_ascii_dsplen(s);    /* should be ASCII */
+    return len;
+}
+
+/*
+ * GB18030
+ *    Added by Bill Huang <bhuang@redhat.com>,<bill_huanghb@ybb.ne.jp>
+ */
+
+/*
+ * Unlike all other mblen() functions, this also looks at the second byte of
+ * the input.  However, if you only pass the first byte of a multi-byte
+ * string, and \0 as the second byte, this still works in a predictable way:
+ * a 4-byte character will be reported as two 2-byte characters.  That's
+ * enough for all current uses, as a client-only encoding.  It works that
+ * way, because in any valid 4-byte GB18030-encoded character, the third and
+ * fourth bytes look like a 2-byte encoded character, when looked at
+ * separately.
+ */
+static int
+pg_gb18030_mblen(const unsigned char *s)
+{
+    int            len;
+
+    if (!IS_HIGHBIT_SET(*s))
+        len = 1;                /* ASCII */
+    else if (*(s + 1) >= 0x30 && *(s + 1) <= 0x39)
+        len = 4;
+    else
+        len = 2;
+    return len;
+}
+
+static int
+pg_gb18030_dsplen(const unsigned char *s)
+{
+    int            len;
+
+    if (IS_HIGHBIT_SET(*s))
+        len = 2;
+    else
+        len = pg_ascii_dsplen(s);    /* ASCII */
+    return len;
+}
+
+/*
+ *-------------------------------------------------------------------
+ * multibyte sequence validators
+ *
+ * These functions accept "s", a pointer to the first byte of a string,
+ * and "len", the remaining length of the string.  If there is a validly
+ * encoded character beginning at *s, return its length in bytes; else
+ * return -1.
+ *
+ * The functions can assume that len > 0 and that *s != '\0', but they must
+ * test for and reject zeroes in any additional bytes of a multibyte character.
+ *
+ * Note that this definition allows the function for a single-byte
+ * encoding to be just "return 1".
+ *-------------------------------------------------------------------
+ */
+
+static int
+pg_ascii_verifier(const unsigned char *s, int len)
+{
+    return 1;
+}
+
+#define IS_EUC_RANGE_VALID(c)    ((c) >= 0xa1 && (c) <= 0xfe)
+
+static int
+pg_eucjp_verifier(const unsigned char *s, int len)
+{
+    int            l;
+    unsigned char c1,
+                c2;
+
+    c1 = *s++;
+
+    switch (c1)
+    {
+        case SS2:                /* JIS X 0201 */
+            l = 2;
+            if (l > len)
+                return -1;
+            c2 = *s++;
+            if (c2 < 0xa1 || c2 > 0xdf)
+                return -1;
+            break;
+
+        case SS3:                /* JIS X 0212 */
+            l = 3;
+            if (l > len)
+                return -1;
+            c2 = *s++;
+            if (!IS_EUC_RANGE_VALID(c2))
+                return -1;
+            c2 = *s++;
+            if (!IS_EUC_RANGE_VALID(c2))
+                return -1;
+            break;
+
+        default:
+            if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
+            {
+                l = 2;
+                if (l > len)
+                    return -1;
+                if (!IS_EUC_RANGE_VALID(c1))
+                    return -1;
+                c2 = *s++;
+                if (!IS_EUC_RANGE_VALID(c2))
+                    return -1;
+            }
+            else
+                /* must be ASCII */
+            {
+                l = 1;
+            }
+            break;
+    }
+
+    return l;
+}
+
+static int
+pg_euckr_verifier(const unsigned char *s, int len)
+{
+    int            l;
+    unsigned char c1,
+                c2;
+
+    c1 = *s++;
+
+    if (IS_HIGHBIT_SET(c1))
+    {
+        l = 2;
+        if (l > len)
+            return -1;
+        if (!IS_EUC_RANGE_VALID(c1))
+            return -1;
+        c2 = *s++;
+        if (!IS_EUC_RANGE_VALID(c2))
+            return -1;
+    }
+    else
+        /* must be ASCII */
+    {
+        l = 1;
+    }
+
+    return l;
+}
+
+/* EUC-CN byte sequences are exactly the same as EUC-KR */
+#define pg_euccn_verifier    pg_euckr_verifier
+
+static int
+pg_euctw_verifier(const unsigned char *s, int len)
+{
+    int            l;
+    unsigned char c1,
+                c2;
+
+    c1 = *s++;
+
+    switch (c1)
+    {
+        case SS2:                /* CNS 11643 Plane 1-7 */
+            l = 4;
+            if (l > len)
+                return -1;
+            c2 = *s++;
+            if (c2 < 0xa1 || c2 > 0xa7)
+                return -1;
+            c2 = *s++;
+            if (!IS_EUC_RANGE_VALID(c2))
+                return -1;
+            c2 = *s++;
+            if (!IS_EUC_RANGE_VALID(c2))
+                return -1;
+            break;
+
+        case SS3:                /* unused */
+            return -1;
+
+        default:
+            if (IS_HIGHBIT_SET(c1)) /* CNS 11643 Plane 1 */
+            {
+                l = 2;
+                if (l > len)
+                    return -1;
+                /* no further range check on c1? */
+                c2 = *s++;
+                if (!IS_EUC_RANGE_VALID(c2))
+                    return -1;
+            }
+            else
+                /* must be ASCII */
+            {
+                l = 1;
+            }
+            break;
+    }
+    return l;
+}
+
+static int
+pg_johab_verifier(const unsigned char *s, int len)
+{
+    int            l,
+                mbl;
+    unsigned char c;
+
+    l = mbl = pg_johab_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    if (!IS_HIGHBIT_SET(*s))
+        return mbl;
+
+    while (--l > 0)
+    {
+        c = *++s;
+        if (!IS_EUC_RANGE_VALID(c))
+            return -1;
+    }
+    return mbl;
+}
+
+static int
+pg_mule_verifier(const unsigned char *s, int len)
+{
+    int            l,
+                mbl;
+    unsigned char c;
+
+    l = mbl = pg_mule_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    while (--l > 0)
+    {
+        c = *++s;
+        if (!IS_HIGHBIT_SET(c))
+            return -1;
+    }
+    return mbl;
+}
+
+static int
+pg_latin1_verifier(const unsigned char *s, int len)
+{
+    return 1;
+}
+
+static int
+pg_sjis_verifier(const unsigned char *s, int len)
+{
+    int            l,
+                mbl;
+    unsigned char c1,
+                c2;
+
+    l = mbl = pg_sjis_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    if (l == 1)                    /* pg_sjis_mblen already verified it */
+        return mbl;
+
+    c1 = *s++;
+    c2 = *s;
+    if (!ISSJISHEAD(c1) || !ISSJISTAIL(c2))
+        return -1;
+    return mbl;
+}
+
+static int
+pg_big5_verifier(const unsigned char *s, int len)
+{
+    int            l,
+                mbl;
+
+    l = mbl = pg_big5_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    while (--l > 0)
+    {
+        if (*++s == '\0')
+            return -1;
+    }
+
+    return mbl;
+}
+
+static int
+pg_gbk_verifier(const unsigned char *s, int len)
+{
+    int            l,
+                mbl;
+
+    l = mbl = pg_gbk_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    while (--l > 0)
+    {
+        if (*++s == '\0')
+            return -1;
+    }
+
+    return mbl;
+}
+
+static int
+pg_uhc_verifier(const unsigned char *s, int len)
+{
+    int            l,
+                mbl;
+
+    l = mbl = pg_uhc_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    while (--l > 0)
+    {
+        if (*++s == '\0')
+            return -1;
+    }
+
+    return mbl;
+}
+
+static int
+pg_gb18030_verifier(const unsigned char *s, int len)
+{
+    int            l;
+
+    if (!IS_HIGHBIT_SET(*s))
+        l = 1;                    /* ASCII */
+    else if (len >= 4 && *(s + 1) >= 0x30 && *(s + 1) <= 0x39)
+    {
+        /* Should be 4-byte, validate remaining bytes */
+        if (*s >= 0x81 && *s <= 0xfe &&
+            *(s + 2) >= 0x81 && *(s + 2) <= 0xfe &&
+            *(s + 3) >= 0x30 && *(s + 3) <= 0x39)
+            l = 4;
+        else
+            l = -1;
+    }
+    else if (len >= 2 && *s >= 0x81 && *s <= 0xfe)
+    {
+        /* Should be 2-byte, validate */
+        if ((*(s + 1) >= 0x40 && *(s + 1) <= 0x7e) ||
+            (*(s + 1) >= 0x80 && *(s + 1) <= 0xfe))
+            l = 2;
+        else
+            l = -1;
+    }
+    else
+        l = -1;
+    return l;
+}
+
+static int
+pg_utf8_verifier(const unsigned char *s, int len)
+{
+    int            l = pg_utf_mblen(s);
+
+    if (len < l)
+        return -1;
+
+    if (!pg_utf8_islegal(s, l))
+        return -1;
+
+    return l;
+}
+
+/*
+ * Check for validity of a single UTF-8 encoded character
+ *
+ * This directly implements the rules in RFC3629.  The bizarre-looking
+ * restrictions on the second byte are meant to ensure that there isn't
+ * more than one encoding of a given Unicode character point; that is,
+ * you may not use a longer-than-necessary byte sequence with high order
+ * zero bits to represent a character that would fit in fewer bytes.
+ * To do otherwise is to create security hazards (eg, create an apparent
+ * non-ASCII character that decodes to plain ASCII).
+ *
+ * length is assumed to have been obtained by pg_utf_mblen(), and the
+ * caller must have checked that that many bytes are present in the buffer.
+ */
+bool
+pg_utf8_islegal(const unsigned char *source, int length)
+{
+    unsigned char a;
+
+    switch (length)
+    {
+        default:
+            /* reject lengths 5 and 6 for now */
+            return false;
+        case 4:
+            a = source[3];
+            if (a < 0x80 || a > 0xBF)
+                return false;
+            /* FALL THRU */
+        case 3:
+            a = source[2];
+            if (a < 0x80 || a > 0xBF)
+                return false;
+            /* FALL THRU */
+        case 2:
+            a = source[1];
+            switch (*source)
+            {
+                case 0xE0:
+                    if (a < 0xA0 || a > 0xBF)
+                        return false;
+                    break;
+                case 0xED:
+                    if (a < 0x80 || a > 0x9F)
+                        return false;
+                    break;
+                case 0xF0:
+                    if (a < 0x90 || a > 0xBF)
+                        return false;
+                    break;
+                case 0xF4:
+                    if (a < 0x80 || a > 0x8F)
+                        return false;
+                    break;
+                default:
+                    if (a < 0x80 || a > 0xBF)
+                        return false;
+                    break;
+            }
+            /* FALL THRU */
+        case 1:
+            a = *source;
+            if (a >= 0x80 && a < 0xC2)
+                return false;
+            if (a > 0xF4)
+                return false;
+            break;
+    }
+    return true;
+}
+
+#ifndef FRONTEND
+
+/*
+ * Generic character incrementer function.
+ *
+ * Not knowing anything about the properties of the encoding in use, we just
+ * keep incrementing the last byte until we get a validly-encoded result,
+ * or we run out of values to try.  We don't bother to try incrementing
+ * higher-order bytes, so there's no growth in runtime for wider characters.
+ * (If we did try to do that, we'd need to consider the likelihood that 255
+ * is not a valid final byte in the encoding.)
+ */
+static bool
+pg_generic_charinc(unsigned char *charptr, int len)
+{
+    unsigned char *lastbyte = charptr + len - 1;
+    mbverifier    mbverify;
+
+    /* We can just invoke the character verifier directly. */
+    mbverify = pg_wchar_table[GetDatabaseEncoding()].mbverify;
+
+    while (*lastbyte < (unsigned char) 255)
+    {
+        (*lastbyte)++;
+        if ((*mbverify) (charptr, len) == len)
+            return true;
+    }
+
+    return false;
+}
+
+/*
+ * UTF-8 character incrementer function.
+ *
+ * For a one-byte character less than 0x7F, we just increment the byte.
+ *
+ * For a multibyte character, every byte but the first must fall between 0x80
+ * and 0xBF; and the first byte must be between 0xC0 and 0xF4.  We increment
+ * the last byte that's not already at its maximum value.  If we can't find a
+ * byte that's less than the maximum allowable value, we simply fail.  We also
+ * need some special-case logic to skip regions used for surrogate pair
+ * handling, as those should not occur in valid UTF-8.
+ *
+ * Note that we don't reset lower-order bytes back to their minimums, since
+ * we can't afford to make an exhaustive search (see make_greater_string).
+ */
+static bool
+pg_utf8_increment(unsigned char *charptr, int length)
+{
+    unsigned char a;
+    unsigned char limit;
+
+    switch (length)
+    {
+        default:
+            /* reject lengths 5 and 6 for now */
+            return false;
+        case 4:
+            a = charptr[3];
+            if (a < 0xBF)
+            {
+                charptr[3]++;
+                break;
+            }
+            /* FALL THRU */
+        case 3:
+            a = charptr[2];
+            if (a < 0xBF)
+            {
+                charptr[2]++;
+                break;
+            }
+            /* FALL THRU */
+        case 2:
+            a = charptr[1];
+            switch (*charptr)
+            {
+                case 0xED:
+                    limit = 0x9F;
+                    break;
+                case 0xF4:
+                    limit = 0x8F;
+                    break;
+                default:
+                    limit = 0xBF;
+                    break;
+            }
+            if (a < limit)
+            {
+                charptr[1]++;
+                break;
+            }
+            /* FALL THRU */
+        case 1:
+            a = *charptr;
+            if (a == 0x7F || a == 0xDF || a == 0xEF || a == 0xF4)
+                return false;
+            charptr[0]++;
+            break;
+    }
+
+    return true;
+}
+
+/*
+ * EUC-JP character incrementer function.
+ *
+ * If the sequence starts with SS2 (0x8e), it must be a two-byte sequence
+ * representing JIS X 0201 characters with the second byte ranging between
+ * 0xa1 and 0xdf.  We just increment the last byte if it's less than 0xdf,
+ * and otherwise rewrite the whole sequence to 0xa1 0xa1.
+ *
+ * If the sequence starts with SS3 (0x8f), it must be a three-byte sequence
+ * in which the last two bytes range between 0xa1 and 0xfe.  The last byte
+ * is incremented if possible, otherwise the second-to-last byte.
+ *
+ * If the sequence starts with a value other than the above and its MSB
+ * is set, it must be a two-byte sequence representing JIS X 0208 characters
+ * with both bytes ranging between 0xa1 and 0xfe.  The last byte is
+ * incremented if possible, otherwise the second-to-last byte.
+ *
+ * Otherwise, the sequence is a single-byte ASCII character. It is
+ * incremented up to 0x7f.
+ */
+static bool
+pg_eucjp_increment(unsigned char *charptr, int length)
+{
+    unsigned char c1,
+                c2;
+    int            i;
+
+    c1 = *charptr;
+
+    switch (c1)
+    {
+        case SS2:                /* JIS X 0201 */
+            if (length != 2)
+                return false;
+
+            c2 = charptr[1];
+
+            if (c2 >= 0xdf)
+                charptr[0] = charptr[1] = 0xa1;
+            else if (c2 < 0xa1)
+                charptr[1] = 0xa1;
+            else
+                charptr[1]++;
+            break;
+
+        case SS3:                /* JIS X 0212 */
+            if (length != 3)
+                return false;
+
+            for (i = 2; i > 0; i--)
+            {
+                c2 = charptr[i];
+                if (c2 < 0xa1)
+                {
+                    charptr[i] = 0xa1;
+                    return true;
+                }
+                else if (c2 < 0xfe)
+                {
+                    charptr[i]++;
+                    return true;
+                }
+            }
+
+            /* Out of 3-byte code region */
+            return false;
+
+        default:
+            if (IS_HIGHBIT_SET(c1)) /* JIS X 0208? */
+            {
+                if (length != 2)
+                    return false;
+
+                for (i = 1; i >= 0; i--)
+                {
+                    c2 = charptr[i];
+                    if (c2 < 0xa1)
+                    {
+                        charptr[i] = 0xa1;
+                        return true;
+                    }
+                    else if (c2 < 0xfe)
+                    {
+                        charptr[i]++;
+                        return true;
+                    }
+                }
+
+                /* Out of 2 byte code region */
+                return false;
+            }
+            else
+            {                    /* ASCII, single byte */
+                if (c1 > 0x7e)
+                    return false;
+                (*charptr)++;
+            }
+            break;
+    }
+
+    return true;
+}
+#endif                            /* !FRONTEND */
+
+
+/*
+ *-------------------------------------------------------------------
+ * encoding info table
+ * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
+ *-------------------------------------------------------------------
+ */
+const pg_wchar_tbl pg_wchar_table[] = {
+    {pg_ascii2wchar_with_len, pg_wchar2single_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
+    {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JP */
+    {pg_euccn2wchar_with_len, pg_wchar2euc_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2},    /* PG_EUC_CN */
+    {pg_euckr2wchar_with_len, pg_wchar2euc_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3},    /* PG_EUC_KR */
+    {pg_euctw2wchar_with_len, pg_wchar2euc_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4},    /* PG_EUC_TW */
+    {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3},    /* PG_EUC_JIS_2004 */
+    {pg_utf2wchar_with_len, pg_wchar2utf_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4},    /* PG_UTF8 */
+    {pg_mule2wchar_with_len, pg_wchar2mule_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4},    /* PG_MULE_INTERNAL */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN1 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN2 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN3 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN4 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN5 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN6 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN7 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN8 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN9 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN10 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1256 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1258 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN866 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN874 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8R */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1251 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1252 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-5 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-6 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-7 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-8 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1250 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1253 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1254 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1255 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1257 */
+    {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8U */
+    {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
+    {0, 0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
+    {0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2},    /* PG_GBK */
+    {0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2},    /* PG_UHC */
+    {0, 0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4},    /* PG_GB18030 */
+    {0, 0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3},    /* PG_JOHAB */
+    {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}    /* PG_SHIFT_JIS_2004 */
+};
+
+/* returns the byte length of a word for mule internal code */
+int
+pg_mic_mblen(const unsigned char *mbstr)
+{
+    return pg_mule_mblen(mbstr);
+}
+
+/*
+ * Returns the byte length of a multibyte character.
+ */
+int
+pg_encoding_mblen(int encoding, const char *mbstr)
+{
+    return (PG_VALID_ENCODING(encoding) ?
+            pg_wchar_table[encoding].mblen((const unsigned char *) mbstr) :
+            pg_wchar_table[PG_SQL_ASCII].mblen((const unsigned char *) mbstr));
+}
+
+/*
+ * Returns the display length of a multibyte character.
+ */
+int
+pg_encoding_dsplen(int encoding, const char *mbstr)
+{
+    return (PG_VALID_ENCODING(encoding) ?
+            pg_wchar_table[encoding].dsplen((const unsigned char *) mbstr) :
+            pg_wchar_table[PG_SQL_ASCII].dsplen((const unsigned char *) mbstr));
+}
+
+/*
+ * Verify the first multibyte character of the given string.
+ * Return its byte length if good, -1 if bad.  (See comments above for
+ * full details of the mbverify API.)
+ */
+int
+pg_encoding_verifymb(int encoding, const char *mbstr, int len)
+{
+    return (PG_VALID_ENCODING(encoding) ?
+            pg_wchar_table[encoding].mbverify((const unsigned char *) mbstr, len) :
+            pg_wchar_table[PG_SQL_ASCII].mbverify((const unsigned char *) mbstr, len));
+}
+
+/*
+ * fetch maximum length of a given encoding
+ */
+int
+pg_encoding_max_length(int encoding)
+{
+    Assert(PG_VALID_ENCODING(encoding));
+
+    return pg_wchar_table[encoding].maxmblen;
+}
+
+#ifndef FRONTEND
+
+/*
+ * fetch maximum length of the encoding for the current database
+ */
+int
+pg_database_encoding_max_length(void)
+{
+    return pg_wchar_table[GetDatabaseEncoding()].maxmblen;
+}
+
+/*
+ * get the character incrementer for the encoding for the current database
+ */
+mbcharacter_incrementer
+pg_database_encoding_character_incrementer(void)
+{
+    /*
+     * Eventually it might be best to add a field to pg_wchar_table[], but for
+     * now we just use a switch.
+     */
+    switch (GetDatabaseEncoding())
+    {
+        case PG_UTF8:
+            return pg_utf8_increment;
+
+        case PG_EUC_JP:
+            return pg_eucjp_increment;
+
+        default:
+            return pg_generic_charinc;
+    }
+}
+
+/*
+ * Verify mbstr to make sure that it is validly encoded in the current
+ * database encoding.  Otherwise same as pg_verify_mbstr().
+ */
+bool
+pg_verifymbstr(const char *mbstr, int len, bool noError)
+{
+    return
+        pg_verify_mbstr_len(GetDatabaseEncoding(), mbstr, len, noError) >= 0;
+}
+
+/*
+ * Verify mbstr to make sure that it is validly encoded in the specified
+ * encoding.
+ */
+bool
+pg_verify_mbstr(int encoding, const char *mbstr, int len, bool noError)
+{
+    return pg_verify_mbstr_len(encoding, mbstr, len, noError) >= 0;
+}
+
+/*
+ * Verify mbstr to make sure that it is validly encoded in the specified
+ * encoding.
+ *
+ * mbstr is not necessarily zero terminated; length of mbstr is
+ * specified by len.
+ *
+ * If OK, return length of string in the encoding.
+ * If a problem is found, return -1 when noError is
+ * true; when noError is false, ereport() a descriptive message.
+ */
+int
+pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
+{
+    mbverifier    mbverify;
+    int            mb_len;
+
+    Assert(PG_VALID_ENCODING(encoding));
+
+    /*
+     * In single-byte encodings, we need only reject nulls (\0).
+     */
+    if (pg_encoding_max_length(encoding) <= 1)
+    {
+        const char *nullpos = memchr(mbstr, 0, len);
+
+        if (nullpos == NULL)
+            return len;
+        if (noError)
+            return -1;
+        report_invalid_encoding(encoding, nullpos, 1);
+    }
+
+    /* fetch function pointer just once */
+    mbverify = pg_wchar_table[encoding].mbverify;
+
+    mb_len = 0;
+
+    while (len > 0)
+    {
+        int            l;
+
+        /* fast path for ASCII-subset characters */
+        if (!IS_HIGHBIT_SET(*mbstr))
+        {
+            if (*mbstr != '\0')
+            {
+                mb_len++;
+                mbstr++;
+                len--;
+                continue;
+            }
+            if (noError)
+                return -1;
+            report_invalid_encoding(encoding, mbstr, len);
+        }
+
+        l = (*mbverify) ((const unsigned char *) mbstr, len);
+
+        if (l < 0)
+        {
+            if (noError)
+                return -1;
+            report_invalid_encoding(encoding, mbstr, len);
+        }
+
+        mbstr += l;
+        len -= l;
+        mb_len++;
+    }
+    return mb_len;
+}
+
+/*
+ * check_encoding_conversion_args: check arguments of a conversion function
+ *
+ * "expected" arguments can be either an encoding ID or -1 to indicate that
+ * the caller will check whether it accepts the ID.
+ *
+ * Note: the errors here are not really user-facing, so elog instead of
+ * ereport seems sufficient.  Also, we trust that the "expected" encoding
+ * arguments are valid encoding IDs, but we don't trust the actuals.
+ */
+void
+check_encoding_conversion_args(int src_encoding,
+                               int dest_encoding,
+                               int len,
+                               int expected_src_encoding,
+                               int expected_dest_encoding)
+{
+    if (!PG_VALID_ENCODING(src_encoding))
+        elog(ERROR, "invalid source encoding ID: %d", src_encoding);
+    if (src_encoding != expected_src_encoding && expected_src_encoding >= 0)
+        elog(ERROR, "expected source encoding \"%s\", but got \"%s\"",
+             pg_enc2name_tbl[expected_src_encoding].name,
+             pg_enc2name_tbl[src_encoding].name);
+    if (!PG_VALID_ENCODING(dest_encoding))
+        elog(ERROR, "invalid destination encoding ID: %d", dest_encoding);
+    if (dest_encoding != expected_dest_encoding && expected_dest_encoding >= 0)
+        elog(ERROR, "expected destination encoding \"%s\", but got \"%s\"",
+             pg_enc2name_tbl[expected_dest_encoding].name,
+             pg_enc2name_tbl[dest_encoding].name);
+    if (len < 0)
+        elog(ERROR, "encoding conversion length must not be negative");
+}
+
+/*
+ * report_invalid_encoding: complain about invalid multibyte character
+ *
+ * note: len is remaining length of string, not length of character;
+ * len must be greater than zero, as we always examine the first byte.
+ */
+void
+report_invalid_encoding(int encoding, const char *mbstr, int len)
+{
+    int            l = pg_encoding_mblen(encoding, mbstr);
+    char        buf[8 * 5 + 1];
+    char       *p = buf;
+    int            j,
+                jlimit;
+
+    jlimit = Min(l, len);
+    jlimit = Min(jlimit, 8);    /* prevent buffer overrun */
+
+    for (j = 0; j < jlimit; j++)
+    {
+        p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
+        if (j < jlimit - 1)
+            p += sprintf(p, " ");
+    }
+
+    ereport(ERROR,
+            (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
+             errmsg("invalid byte sequence for encoding \"%s\": %s",
+                    pg_enc2name_tbl[encoding].name,
+                    buf)));
+}
+
+/*
+ * report_untranslatable_char: complain about untranslatable character
+ *
+ * note: len is remaining length of string, not length of character;
+ * len must be greater than zero, as we always examine the first byte.
+ */
+void
+report_untranslatable_char(int src_encoding, int dest_encoding,
+                           const char *mbstr, int len)
+{
+    int            l = pg_encoding_mblen(src_encoding, mbstr);
+    char        buf[8 * 5 + 1];
+    char       *p = buf;
+    int            j,
+                jlimit;
+
+    jlimit = Min(l, len);
+    jlimit = Min(jlimit, 8);    /* prevent buffer overrun */
+
+    for (j = 0; j < jlimit; j++)
+    {
+        p += sprintf(p, "0x%02x", (unsigned char) mbstr[j]);
+        if (j < jlimit - 1)
+            p += sprintf(p, " ");
+    }
+
+    ereport(ERROR,
+            (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+             errmsg("character with byte sequence %s in encoding \"%s\" has no equivalent in encoding \"%s\"",
+                    buf,
+                    pg_enc2name_tbl[src_encoding].name,
+                    pg_enc2name_tbl[dest_encoding].name)));
+}
+
+#endif                            /* !FRONTEND */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 7fb5fa4..026f64f 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -222,8 +222,8 @@ typedef unsigned int pg_wchar;
  * PostgreSQL encoding identifiers
  *
  * WARNING: the order of this enum must be same as order of entries
- *            in the pg_enc2name_tbl[] array (in mb/encnames.c), and
- *            in the pg_wchar_table[] array (in mb/wchar.c)!
+ *            in the pg_enc2name_tbl[] array (in src/common/encnames.c), and
+ *            in the pg_wchar_table[] array (in src/common/wchar.c)!
  *
  *            If you add some encoding don't forget to check
  *            PG_ENCODING_BE_LAST macro.
diff --git a/src/interfaces/libpq/.gitignore b/src/interfaces/libpq/.gitignore
index 7b438f3..a4afe7c 100644
--- a/src/interfaces/libpq/.gitignore
+++ b/src/interfaces/libpq/.gitignore
@@ -1,4 +1 @@
 /exports.list
-# .c files that are symlinked in from elsewhere
-/encnames.c
-/wchar.c
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index f5f1c0c..a068826 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -45,11 +45,6 @@ OBJS = \
     pqexpbuffer.o \
     fe-auth.o

-# src/backend/utils/mb
-OBJS += \
-    encnames.o \
-    wchar.o
-
 ifeq ($(with_openssl),yes)
 OBJS += \
     fe-secure-common.o \
@@ -102,17 +97,7 @@ include $(top_srcdir)/src/Makefile.shlib
 backend_src = $(top_srcdir)/src/backend


-# We use a few backend modules verbatim, but since we need
-# to compile with appropriate options to build a shared lib, we can't
-# use the same object files built for the backend.
-# Instead, symlink the source files in here and build our own object files.
-# When you add a file here, remember to add it in the "clean" target below.
-
-encnames.c wchar.c: % : $(backend_src)/utils/mb/%
-    rm -f $@ && $(LN_S) $< .
-
-
-# Make dependencies on pg_config_paths.h visible, too.
+# Make dependencies on pg_config_paths.h visible in all builds.
 fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h
 fe-misc.o: fe-misc.c $(top_builddir)/src/port/pg_config_paths.h

@@ -144,8 +129,6 @@ clean distclean: clean-lib
     rm -f $(OBJS) pthread.h
 # Might be left over from a Win32 client-only build
     rm -f pg_config_paths.h
-# Remove files we (may have) symlinked in from other places
-    rm -f encnames.c wchar.c

 maintainer-clean: distclean
     $(MAKE) -C test $@
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index f6ab0d5..67b9f23 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -120,11 +120,12 @@ sub mkvcbuild
     }

     our @pgcommonallfiles = qw(
-      base64.c config_info.c controldata_utils.c d2s.c exec.c f2s.c file_perm.c ip.c
+      base64.c config_info.c controldata_utils.c d2s.c encnames.c exec.c
+      f2s.c file_perm.c ip.c
       keywords.c kwlookup.c link-canary.c md5.c
       pg_lzcompress.c pgfnames.c psprintf.c relpath.c rmtree.c
       saslprep.c scram-common.c string.c stringinfo.c unicode_norm.c username.c
-      wait_error.c);
+      wait_error.c wchar.c);

     if ($solution->{options}->{openssl})
     {
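
For readers who want to experiment with the UTF-8 rules outside the tree, the following is a minimal standalone sketch, not part of the patch. It mirrors the RFC 3629 checks described in the comment on pg_utf8_islegal() and the per-character loop of pg_verify_mbstr_len(). The names utf8_seq_len, utf8_char_is_legal, and count_utf8_chars are hypothetical stand-ins invented for this illustration; the sketch returns -1 on any error rather than calling ereport().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_utf_mblen(): length implied by the lead byte.
 * Bytes that cannot start a sequence are reported as length 1 and are
 * rejected later by utf8_char_is_legal().
 */
static int
utf8_seq_len(unsigned char c)
{
    if ((c & 0x80) == 0)
        return 1;
    if ((c & 0xe0) == 0xc0)
        return 2;
    if ((c & 0xf0) == 0xe0)
        return 3;
    if ((c & 0xf8) == 0xf0)
        return 4;
    return 1;
}

/*
 * Hypothetical stand-in for pg_utf8_islegal().  len is assumed to come from
 * utf8_seq_len(), and the caller must have checked that len bytes exist.
 */
static bool
utf8_char_is_legal(const unsigned char *s, int len)
{
    unsigned char lo = 0x80;    /* allowed range of the second byte */
    unsigned char hi = 0xbf;
    int         i;

    if (len == 1)
        return s[0] < 0x80;     /* only ASCII is a legal one-byte character */
    if (len < 2 || len > 4)
        return false;
    if (s[0] < 0xc2 || s[0] > 0xf4)
        return false;           /* overlong 2-byte forms, or beyond U+10FFFF */

    /* Narrow the second-byte range for the four special lead bytes. */
    switch (s[0])
    {
        case 0xe0:
            lo = 0xa0;          /* no overlong 3-byte forms */
            break;
        case 0xed:
            hi = 0x9f;          /* no UTF-16 surrogates */
            break;
        case 0xf0:
            lo = 0x90;          /* no overlong 4-byte forms */
            break;
        case 0xf4:
            hi = 0x8f;          /* nothing above U+10FFFF */
            break;
        default:
            break;
    }
    if (s[1] < lo || s[1] > hi)
        return false;

    /* Any remaining bytes are plain continuation bytes. */
    for (i = 2; i < len; i++)
    {
        if (s[i] < 0x80 || s[i] > 0xbf)
            return false;
    }
    return true;
}

/*
 * Count the characters in a buffer that need not be NUL-terminated.
 * Returns -1 on an embedded NUL, a truncated sequence, or an illegal one.
 */
static int
count_utf8_chars(const char *str, int len)
{
    const unsigned char *s = (const unsigned char *) str;
    int         nchars = 0;

    while (len > 0)
    {
        int         l;

        if (*s == '\0')
            return -1;
        l = utf8_seq_len(*s);
        if (l > len || !utf8_char_is_legal(s, l))
            return -1;
        s += l;
        len -= l;
        nchars++;
    }
    return nchars;
}
```

With these definitions, count_utf8_chars("h\xc3\xa9", 3) returns 2, while the overlong pair "\xc0\x80" and the surrogate encoding "\xed\xa0\x80" are both rejected with -1.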