Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding |
Date | |
Msg-id | 4385.1392859333@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #9210: PostgreSQL string store bug? not enforce check with
correct characterSET/encoding
|
List | pgsql-hackers |
I wrote: >> The minimum-refactoring solution to this would be to tweak >> pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but >> the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing. >> I'm not sure if this would break anything we need to have work, >> though. Thoughts? Do we want to back-patch such a change? > I looked through all the callers of pg_do_encoding_conversion(), and > AFAICS this change is a good idea. There are a whole bunch of places > that use pg_do_encoding_conversion() to convert from the database encoding > to encoding X (most usually UTF8), and right now if you do that in a > SQL_ASCII database you have no assurance whatever that what is produced > is actually valid in encoding X. I think we need to close that loophole. The more I looked into mbutils.c, the less happy I got. The attached proposed patch takes care of the missing-verification hole in pg_do_encoding_conversion() and pg_server_to_any(), and also gets rid of what I believe to be obsolete provisions in pg_do_encoding_conversion to "work" if called outside a transaction --- if you consider it working to completely fail to honor its API contract. That should no longer be necessary now that we perform client<->server encoding conversions via perform_default_encoding_conversion rather than here. For testing I inserted an "Assert(IsTransactionState());" at the top of pg_do_encoding_conversion(), and couldn't trigger it, so I'm fairly sure this change is OK. Still, back-patching it might not be a good thing. On the other hand, if there is still any code path that can get here outside a transaction, we probably ought to find out rather than allow completely bogus data to be returned. I also made some more-cosmetic improvements, notably removing a bunch of Asserts that are certainly dead code because the relevant variables are never NULL. I've not done anything yet about simplifying unnecessary calls of pg_do_encoding_conversion into pg_server_to_any/pg_any_to_server. How much of this is back-patch material, do you think? regards, tom lane diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c index 08440e9..7f43cae 100644 *** a/src/backend/utils/mb/mbutils.c --- b/src/backend/utils/mb/mbutils.c *************** *** 1,10 **** ! /* ! * This file contains public functions for conversion between ! * client encoding and server (database) encoding. * ! * Tatsuo Ishii * ! * src/backend/utils/mb/mbutils.c */ #include "postgres.h" --- 1,36 ---- ! /*------------------------------------------------------------------------- * ! * mbutils.c ! * This file contains functions for encoding conversion. * ! * The string-conversion functions in this file share some API quirks. ! * Note the following: ! * ! * The functions return a palloc'd, null-terminated string if conversion ! * is required. However, if no conversion is performed, the given source ! * string pointer is returned as-is. ! * ! * Although the presence of a length argument means that callers can pass ! * non-null-terminated strings, care is required because the same string ! * will be passed back if no conversion occurs. Such callers *must* check ! * whether result == src and handle that case differently. ! * ! * If the source and destination encodings are the same, the source string ! * is returned without any verification; it's assumed to be valid data. ! * If that might not be the case, the caller is responsible for validating ! * the string using a separate call to pg_verify_mbstr(). Whenever the ! * source and destination encodings are different, the functions ensure that ! * the result is validly encoded according to the destination encoding. ! * ! * ! * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group ! * Portions Copyright (c) 1994, Regents of the University of California ! * ! * ! * IDENTIFICATION ! * src/backend/utils/mb/mbutils.c ! * ! *------------------------------------------------------------------------- */ #include "postgres.h" *************** InitializeClientEncoding(void) *** 290,296 **** int pg_get_client_encoding(void) { - Assert(ClientEncoding); return ClientEncoding->encoding; } --- 316,321 ---- *************** pg_get_client_encoding(void) *** 300,328 **** const char * pg_get_client_encoding_name(void) { - Assert(ClientEncoding); return ClientEncoding->name; } /* ! * Apply encoding conversion on src and return it. The encoding ! * conversion function is chosen from the pg_conversion system catalog ! * marked as "default". If it is not found in the schema search path, ! * it's taken from pg_catalog schema. If it even is not in the schema, ! * warn and return src. ! * ! * If conversion occurs, a palloc'd null-terminated string is returned. ! * In the case of no conversion, src is returned. ! * ! * CAUTION: although the presence of a length argument means that callers ! * can pass non-null-terminated strings, care is required because the same ! * string will be passed back if no conversion occurs. Such callers *must* ! * check whether result == src and handle that case differently. * ! * Note: we try to avoid raising error, since that could get us into ! * infinite recursion when this function is invoked during error message ! * sending. It should be OK to raise error for overlength strings though, ! * since the recursion will come with a shorter message. */ unsigned char * pg_do_encoding_conversion(unsigned char *src, int len, --- 325,337 ---- const char * pg_get_client_encoding_name(void) { return ClientEncoding->name; } /* ! * Convert src string to another encoding (general case). * ! * See the notes about string conversion functions at the top of this file. */ unsigned char * pg_do_encoding_conversion(unsigned char *src, int len, *************** pg_do_encoding_conversion(unsigned char *** 331,369 **** unsigned char *result; Oid proc; ! if (!IsTransactionState()) ! return src; if (src_encoding == dest_encoding) ! return src; ! if (src_encoding == PG_SQL_ASCII || dest_encoding == PG_SQL_ASCII) ! return src; ! if (len <= 0) return src; proc = FindDefaultConversionProc(src_encoding, dest_encoding); if (!OidIsValid(proc)) ! { ! ereport(LOG, (errcode(ERRCODE_UNDEFINED_FUNCTION), errmsg("default conversion function for encoding \"%s\" to \"%s\" does not exist", pg_encoding_to_char(src_encoding), pg_encoding_to_char(dest_encoding)))); - return src; - } - - /* - * XXX we should avoid throwing errors in OidFunctionCall. Otherwise we - * are going into infinite loop! So we have to make sure that the - * function exists before calling OidFunctionCall. - */ - if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(proc))) - { - elog(LOG, "cache lookup failed for function %u", proc); - return src; - } /* * Allocate space for conversion result, being wary of integer overflow --- 340,371 ---- unsigned char *result; Oid proc; ! if (len <= 0) ! return src; /* empty string is always valid */ if (src_encoding == dest_encoding) ! return src; /* no conversion required, assume valid */ ! if (dest_encoding == PG_SQL_ASCII) ! return src; /* any string is valid in SQL_ASCII */ ! if (src_encoding == PG_SQL_ASCII) ! { ! /* No conversion is possible, but we must validate the result */ ! (void) pg_verify_mbstr(dest_encoding, (const char *) src, len, false); return src; + } + + if (!IsTransactionState()) /* shouldn't happen */ + elog(ERROR, "cannot perform encoding conversion outside a transaction"); proc = FindDefaultConversionProc(src_encoding, dest_encoding); if (!OidIsValid(proc)) ! ereport(ERROR, (errcode(ERRCODE_UNDEFINED_FUNCTION), errmsg("default conversion function for encoding \"%s\" to \"%s\" does not exist", pg_encoding_to_char(src_encoding), pg_encoding_to_char(dest_encoding)))); /* * Allocate space for conversion result, being wary of integer overflow *************** pg_do_encoding_conversion(unsigned char *** 387,393 **** } /* ! * Convert string using encoding_name. The source * encoding is the DB encoding. * * BYTEA convert_to(TEXT string, NAME encoding_name) */ --- 389,395 ---- } /* ! * Convert string to encoding encoding_name. The source * encoding is the DB encoding. * * BYTEA convert_to(TEXT string, NAME encoding_name) */ *************** pg_convert_to(PG_FUNCTION_ARGS) *** 412,418 **** } /* ! * Convert string using encoding_name. The destination * encoding is the DB encoding. * * TEXT convert_from(BYTEA string, NAME encoding_name) */ --- 414,420 ---- } /* ! * Convert string from encoding encoding_name. The destination * encoding is the DB encoding. * * TEXT convert_from(BYTEA string, NAME encoding_name) */ *************** pg_convert_from(PG_FUNCTION_ARGS) *** 439,445 **** } /* ! * Convert string using encoding_names. * * BYTEA convert(BYTEA string, NAME src_encoding_name, NAME dest_encoding_name) */ --- 441,447 ---- } /* ! * Convert string between two arbitrary encodings. * * BYTEA convert(BYTEA string, NAME src_encoding_name, NAME dest_encoding_name) */ *************** pg_convert(PG_FUNCTION_ARGS) *** 472,479 **** src_str = VARDATA_ANY(string); pg_verify_mbstr_len(src_encoding, src_str, len, false); ! dest_str = (char *) pg_do_encoding_conversion( ! (unsigned char *) src_str, len, src_encoding, dest_encoding); if (dest_str != src_str) len = strlen(dest_str); --- 474,486 ---- src_str = VARDATA_ANY(string); pg_verify_mbstr_len(src_encoding, src_str, len, false); ! /* perform conversion */ ! dest_str = (char *) pg_do_encoding_conversion((unsigned char *) src_str, ! len, ! src_encoding, ! dest_encoding); ! ! /* update len if conversion actually happened */ if (dest_str != src_str) len = strlen(dest_str); *************** pg_convert(PG_FUNCTION_ARGS) *** 503,512 **** Datum length_in_encoding(PG_FUNCTION_ARGS) { ! bytea *string = PG_GETARG_BYTEA_P(0); char *src_encoding_name = NameStr(*PG_GETARG_NAME(1)); int src_encoding = pg_char_to_encoding(src_encoding_name); ! int len = VARSIZE(string) - VARHDRSZ; int retval; if (src_encoding < 0) --- 510,520 ---- Datum length_in_encoding(PG_FUNCTION_ARGS) { ! bytea *string = PG_GETARG_BYTEA_PP(0); char *src_encoding_name = NameStr(*PG_GETARG_NAME(1)); int src_encoding = pg_char_to_encoding(src_encoding_name); ! const char *src_str; ! int len; int retval; if (src_encoding < 0) *************** length_in_encoding(PG_FUNCTION_ARGS) *** 515,525 **** errmsg("invalid encoding name \"%s\"", src_encoding_name))); ! retval = pg_verify_mbstr_len(src_encoding, VARDATA(string), len, false); ! PG_RETURN_INT32(retval); } Datum pg_encoding_max_length_sql(PG_FUNCTION_ARGS) { --- 523,541 ---- errmsg("invalid encoding name \"%s\"", src_encoding_name))); ! len = VARSIZE_ANY_EXHDR(string); ! src_str = VARDATA_ANY(string); + retval = pg_verify_mbstr_len(src_encoding, src_str, len, false); + + PG_RETURN_INT32(retval); } + /* + * Get maximum multibyte character length in the specified encoding. + * + * Note encoding is specified numerically, not by name as above. + */ Datum pg_encoding_max_length_sql(PG_FUNCTION_ARGS) { *************** pg_encoding_max_length_sql(PG_FUNCTION_A *** 532,558 **** } /* ! * convert client encoding to server encoding. */ char * pg_client_to_server(const char *s, int len) { - Assert(ClientEncoding); - return pg_any_to_server(s, len, ClientEncoding->encoding); } /* ! * convert any encoding to server encoding. */ char * pg_any_to_server(const char *s, int len, int encoding) { - Assert(DatabaseEncoding); - Assert(ClientEncoding); - if (len <= 0) ! return (char *) s; if (encoding == DatabaseEncoding->encoding || encoding == PG_SQL_ASCII) --- 548,578 ---- } /* ! * Convert client encoding to server encoding. ! * ! * See the notes about string conversion functions at the top of this file. */ char * pg_client_to_server(const char *s, int len) { return pg_any_to_server(s, len, ClientEncoding->encoding); } /* ! * Convert any encoding to server encoding. ! * ! * See the notes about string conversion functions at the top of this file. ! * ! * Unlike the other string conversion functions, this will apply validation ! * even if encoding == DatabaseEncoding->encoding. This is because this is ! * used to process data coming in from outside the database, and we never ! * want to just assume validity. */ char * pg_any_to_server(const char *s, int len, int encoding) { if (len <= 0) ! return (char *) s; /* empty string is always valid */ if (encoding == DatabaseEncoding->encoding || encoding == PG_SQL_ASCII) *************** pg_any_to_server(const char *s, int len, *** 594,639 **** return (char *) s; } ! if (ClientEncoding->encoding == encoding) return perform_default_encoding_conversion(s, len, true); ! else ! return (char *) pg_do_encoding_conversion( ! (unsigned char *) s, len, encoding, DatabaseEncoding->encoding); } /* ! * convert server encoding to client encoding. */ char * pg_server_to_client(const char *s, int len) { - Assert(ClientEncoding); - return pg_server_to_any(s, len, ClientEncoding->encoding); } /* ! * convert server encoding to any encoding. */ char * pg_server_to_any(const char *s, int len, int encoding) { - Assert(DatabaseEncoding); - Assert(ClientEncoding); - if (len <= 0) ! return (char *) s; if (encoding == DatabaseEncoding->encoding || ! encoding == PG_SQL_ASCII || ! DatabaseEncoding->encoding == PG_SQL_ASCII) return (char *) s; /* assume data is valid */ ! if (ClientEncoding->encoding == encoding) return perform_default_encoding_conversion(s, len, false); ! else ! return (char *) pg_do_encoding_conversion( ! (unsigned char *) s, len, DatabaseEncoding->encoding, encoding); } /* --- 614,672 ---- return (char *) s; } ! /* Fast path if we can use cached conversion function */ ! if (encoding == ClientEncoding->encoding) return perform_default_encoding_conversion(s, len, true); ! ! /* General case ... will not work outside transactions */ ! return (char *) pg_do_encoding_conversion((unsigned char *) s, ! len, ! encoding, ! DatabaseEncoding->encoding); } /* ! * Convert server encoding to client encoding. ! * ! * See the notes about string conversion functions at the top of this file. */ char * pg_server_to_client(const char *s, int len) { return pg_server_to_any(s, len, ClientEncoding->encoding); } /* ! * Convert server encoding to any encoding. ! * ! * See the notes about string conversion functions at the top of this file. */ char * pg_server_to_any(const char *s, int len, int encoding) { if (len <= 0) ! return (char *) s; /* empty string is always valid */ if (encoding == DatabaseEncoding->encoding || ! encoding == PG_SQL_ASCII) return (char *) s; /* assume data is valid */ ! if (DatabaseEncoding->encoding == PG_SQL_ASCII) ! { ! /* No conversion is possible, but we must validate the result */ ! (void) pg_verify_mbstr(encoding, s, len, false); ! return (char *) s; ! } ! ! /* Fast path if we can use cached conversion function */ ! if (encoding == ClientEncoding->encoding) return perform_default_encoding_conversion(s, len, false); ! ! /* General case ... will not work outside transactions */ ! return (char *) pg_do_encoding_conversion((unsigned char *) s, ! len, ! DatabaseEncoding->encoding, ! encoding); } /* *************** pg_server_to_any(const char *s, int len, *** 643,649 **** * SetClientEncoding(), no conversion is performed. */ static char * ! perform_default_encoding_conversion(const char *src, int len, bool is_client_to_server) { char *result; int src_encoding, --- 676,683 ---- * SetClientEncoding(), no conversion is performed. */ static char * ! perform_default_encoding_conversion(const char *src, int len, ! bool is_client_to_server) { char *result; int src_encoding, *************** raw_pg_bind_textdomain_codeset(const cha *** 931,941 **** * On most platforms, gettext defaults to the codeset implied by LC_CTYPE. * When that matches the database encoding, we don't need to do anything. In * CREATE DATABASE, we enforce or trust that the locale's codeset matches the ! * database encoding, except for the C locale. (On Windows, we also permit a * discrepancy under the UTF8 encoding.) For the C locale, explicitly bind * gettext to the right codeset. * ! * On Windows, gettext defaults to the Windows ANSI code page. This is a * convenient departure for software that passes the strings to Windows ANSI * APIs, but we don't do that. Compel gettext to use database encoding or, * failing that, the LC_CTYPE encoding as it would on other platforms. --- 965,975 ---- * On most platforms, gettext defaults to the codeset implied by LC_CTYPE. * When that matches the database encoding, we don't need to do anything. In * CREATE DATABASE, we enforce or trust that the locale's codeset matches the ! * database encoding, except for the C locale. (On Windows, we also permit a * discrepancy under the UTF8 encoding.) For the C locale, explicitly bind * gettext to the right codeset. * ! * On Windows, gettext defaults to the Windows ANSI code page. This is a * convenient departure for software that passes the strings to Windows ANSI * APIs, but we don't do that. Compel gettext to use database encoding or, * failing that, the LC_CTYPE encoding as it would on other platforms. *************** pg_bind_textdomain_codeset(const char *d *** 980,1007 **** int GetDatabaseEncoding(void) { - Assert(DatabaseEncoding); return DatabaseEncoding->encoding; } const char * GetDatabaseEncodingName(void) { - Assert(DatabaseEncoding); return DatabaseEncoding->name; } Datum getdatabaseencoding(PG_FUNCTION_ARGS) { - Assert(DatabaseEncoding); return DirectFunctionCall1(namein, CStringGetDatum(DatabaseEncoding->name)); } Datum pg_client_encoding(PG_FUNCTION_ARGS) { - Assert(ClientEncoding); return DirectFunctionCall1(namein, CStringGetDatum(ClientEncoding->name)); } --- 1014,1037 ---- *************** pg_client_encoding(PG_FUNCTION_ARGS) *** 1014,1020 **** int GetMessageEncoding(void) { - Assert(MessageEncoding); return MessageEncoding->encoding; } --- 1044,1049 ----
pgsql-hackers by date: