Tightening binary receive functions - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Tightening binary receive functions |
Date | |
Msg-id | 4A9B85DD.9020804@enterprisedb.com Whole thread Raw |
Responses |
Re: Tightening binary receive functions
Re: Tightening binary receive functions |
List | pgsql-hackers |
After the thread started by Andrew McNamara a while ago: http://archives.postgresql.org/message-id/E419F08D-B908-446D-9B1E-F3520163CE9C@object-craft.com.au the question was left hanging if other binary recv functions are missing sanity checks that the corresponding text input functions have, and whether they might pose a security issue. Tom Lane went through all the recv functions after that, and found a few missing checks but nothing that would present a security issue, but posted them to the pgsql-security list for a double check. I just went through them again, and found no security issues either. We should nevertheless fix the recv functions so that they don't accept any values that the corresponding text functions reject. While the server itself handles them, such values will throw an error on pg_dump/restore, for example. Attached patch adds the required sanity checks. I'm thinking of applying this to CVS HEAD, but not back-patch, just like Tom did with the original problem reported with time_recv() and timetz_recv(). The most notable of these is the change to "char" datatype. The patch tightens it so that it no longer accepts values >127 with a multi-byte database encoding. Also, the handling of encoding in xml_recv() was bogus. Here's the list of issues found: Tom Lane wrote: > array_recv: > > Allows zero-length dimensions (because ArrayGetNItems doesn't complain); > whereas array_in does not. Not sure if this is an issue. We have had > -hackers discussions about the behavior of zero-length arrays, so I > think there may be other ways to get them into the system anyway. > > Doesn't check lower bounds at all, so it's possible that lowerbound plus > length overflows an int. Not sure if this is a security problem either, > but it seems like something that should be rejected. Yes, that seems dangerous. I note that the handling of large bounds isn't very rigid in the text input function either: postgres=# SELECT '[9999999999999999:9999999999999] = {1}'::integer[]; int4 ----------------------------- [2147483647:2147483647]={1} (1 row) We use plain atoi() to convert the subscripts to integers, which returns INT_MAX for an overflow. I didn't do anything about that now, but the patch adds a check for the upper bound overflow in array_recv(). > charrecv > > Doesn't bother its pretty little head with maintaining encoding validity. > Of course the entire datatype doesn't, so this is hardly the fault of > the recv routine in particular. It might be that the type is fine but we > ought to constrain char_text() to fail on high-bit-set char values unless > DB encoding is single-byte. Constraining char_text() seems like a good idea. The current behavior of char_text() with a high-bit-set byte is not useful, while using the full range of char can be. However, we have to constrain charout() as well, or we're back to square one with charout(c)::text. And if we constrain charout(), then we should constrain charin() as well. Which brings us back to forbidding such values altogether. The patch constrains all the functions that you can use to get a "char" into the system: charin(), charrecv(), i4tochar() and text_char(). They now reject values with high bit set when using a multi-byte database encoding > date_recv > > Fails to do any range check, so the value might cause odd behavior later. > Should probably limit to the values date_in would accept. Yep. > float4recv, float8recv, and geometric and other types depending on > pq_getmsgfloatN > > These all accept any bit pattern whatever for a float. Now I know of no > machine where float doesn't consider all bitpatterns "valid", but > nonetheless there are some issues here: > > * We don't currently allow any other way to inject an IEEE signaling NaN > into the database. As far as I can think, this could only lead to float > traps in places where one might perhaps not have expected one, so I > don't think this is a real problem. Agreed. This is frankly the first time I even hear about signaling NaNs, and after reading up on that a bit, I get the impression that even on platforms that support them, you need a #define or a compiler flag to enable them. > * Some of these types probably aren't expecting NaNs or Infinities at all. > Can anything really bad happen if they get one? Geometric types seem to handle NaNs and Infs gracefully, although I wonder what it means to have e.g a box with the X-coordinate of one corner being NaN. I think it's ok from a security point of view. timestamp_recv (with float timestamps) accepts NaN, while timestamp_in does not. I'm not sure what happens if you pass a NaN timestamp to the system, but we should forbid that anyway. > interval_recv > > Fails to do any range check, but I think it's okay since we don't have any > a-priori restrictions on the range of intervals. Should disallow Infs and NaNs. > numeric_recv > > Allows any weight or dscale value. Can this cause problems? It seems OK to me. make_result() normalizes and checks for overflow of those. > tintervalrecv > > Bogus --- fails to verify consistency of status with values or proper > ordering of the two values. Probably nothing very bad can happen, but > should be fixed. tintervalin doesn't check the ordering of the two values either. Status should indeed be checked. > xml_recv > > Encoding handling seems pretty questionable. In particular I think it's > going to treat a document without explicit encoding marking as being in > the database encoding --- shouldn't it assume client encoding? > Converting the encoding twice seems pretty icky as well. Fixed this so that the input is treated as UTF-8 if no encoding is specified in the XML header. client_encoding does not affect xml_recv() at all. Per Peter's description. Here's two extra ones: oidvectorrecv int2vectorrecv Don't constrain the number of elements to FUNC_MAX_ARGS, like the text input functions do. They also don't force lbound to 0, like the input function. We assume in array_ref() and probably elsewhere too that fixed-size array types are 0-based. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com *** a/src/backend/utils/adt/arrayfuncs.c --- b/src/backend/utils/adt/arrayfuncs.c *************** *** 1207,1214 **** array_recv(PG_FUNCTION_ARGS) --- 1207,1223 ---- for (i = 0; i < ndim; i++) { + int ub; + dim[i] = pq_getmsgint(buf, 4); lBound[i] = pq_getmsgint(buf, 4); + + ub = lBound[i] + dim[i] - 1; + /* overflow? */ + if (lBound[i] > ub) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("integer out of range"))); } /* This checks for overflow of array dimensions */ *** a/src/backend/utils/adt/char.c --- b/src/backend/utils/adt/char.c *************** *** 18,23 **** --- 18,24 ---- #include <limits.h> #include "libpq/pqformat.h" + #include "mb/pg_wchar.h" #include "utils/builtins.h" /***************************************************************************** *************** *** 34,39 **** charin(PG_FUNCTION_ARGS) --- 35,52 ---- { char *ch = PG_GETARG_CSTRING(0); + /* + * Only allow characters that fit in one byte. The "char" data type + * could handle any bit pattern, but it's not clear what charout() would + * return for such a character, so it's better to stop such values from + * entering the database in the first place. + */ + if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(ch[0])) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("\"char\" out of range"), + errdetail("Character must be representable as a single byte in current database encoding"))); + PG_RETURN_CHAR(ch[0]); } *************** *** 66,73 **** Datum charrecv(PG_FUNCTION_ARGS) { StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); ! PG_RETURN_CHAR(pq_getmsgbyte(buf)); } /* --- 79,96 ---- charrecv(PG_FUNCTION_ARGS) { StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); + char ch; + + ch = pq_getmsgbyte(buf); ! /* Only allow characters that fit in one byte, see charin(). */ ! if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(ch)) ! ereport(ERROR, ! (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), ! errmsg("\"char\" out of range"), ! errdetail("Character must be representable as a single byte in current database encoding"))); ! ! PG_RETURN_CHAR(ch); } /* *************** *** 162,174 **** Datum i4tochar(PG_FUNCTION_ARGS) { int32 arg1 = PG_GETARG_INT32(0); if (arg1 < SCHAR_MIN || arg1 > SCHAR_MAX) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"char\" out of range"))); ! PG_RETURN_CHAR((int8) arg1); } --- 185,206 ---- i4tochar(PG_FUNCTION_ARGS) { int32 arg1 = PG_GETARG_INT32(0); + int8 ch; if (arg1 < SCHAR_MIN || arg1 > SCHAR_MAX) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"char\" out of range"))); + ch = (int8) arg1; + + /* Only allow characters that fit in one byte, see charin(). */ + if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(ch)) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("\"char\" out of range"), + errdetail("Character must be representable as a single byte in current database encoding"))); ! PG_RETURN_CHAR(ch); } *************** *** 184,190 **** text_char(PG_FUNCTION_ARGS) --- 216,231 ---- * discarded. */ if (VARSIZE(arg1) > VARHDRSZ) + { result = *(VARDATA(arg1)); + + /* Only allow characters that fit in one byte, see charin(). */ + if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(result)) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("\"char\" out of range"), + errdetail("Character must be representable as a single byte in current database encoding"))); + } else result = '\0'; *** a/src/backend/utils/adt/date.c --- b/src/backend/utils/adt/date.c *************** *** 203,210 **** Datum date_recv(PG_FUNCTION_ARGS) { StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); ! PG_RETURN_DATEADT((DateADT) pq_getmsgint(buf, sizeof(DateADT))); } /* --- 203,219 ---- date_recv(PG_FUNCTION_ARGS) { StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); + DateADT result; ! result = (DateADT) pq_getmsgint(buf, sizeof(DateADT)); ! ! /* Limit to the same range that date_in() accepts. */ ! if (result < 0 || result > JULIAN_MAX) ! ereport(ERROR, ! (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), ! errmsg("date out of range"))); ! ! PG_RETURN_DATEADT(result); } /* *** a/src/backend/utils/adt/int.c --- b/src/backend/utils/adt/int.c *************** *** 225,237 **** int2vectorrecv(PG_FUNCTION_ARGS) Assert(!locfcinfo.isnull); ! /* sanity checks: int2vector must be 1-D, no nulls */ if (ARR_NDIM(result) != 1 || ARR_HASNULL(result) || ! ARR_ELEMTYPE(result) != INT2OID) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid int2vector data"))); PG_RETURN_POINTER(result); } --- 225,245 ---- Assert(!locfcinfo.isnull); ! /* sanity checks: int2vector must be 1-D, 0-based, no nulls */ if (ARR_NDIM(result) != 1 || ARR_HASNULL(result) || ! ARR_ELEMTYPE(result) != INT2OID || ! ARR_LBOUND(result)[0] != 0) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid int2vector data"))); + + /* check length for consistency with int2vectorin() */ + if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("oidvector has too many elements"))); + PG_RETURN_POINTER(result); } *** a/src/backend/utils/adt/nabstime.c --- b/src/backend/utils/adt/nabstime.c *************** *** 786,805 **** tintervalrecv(PG_FUNCTION_ARGS) { StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); TimeInterval tinterval; tinterval = (TimeInterval) palloc(sizeof(TimeIntervalData)); tinterval->status = pq_getmsgint(buf, sizeof(tinterval->status)); ! if (!(tinterval->status == T_INTERVAL_INVAL || ! tinterval->status == T_INTERVAL_VALID)) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid status in external \"tinterval\" value"))); - tinterval->data[0] = pq_getmsgint(buf, sizeof(tinterval->data[0])); - tinterval->data[1] = pq_getmsgint(buf, sizeof(tinterval->data[1])); - PG_RETURN_TIMEINTERVAL(tinterval); } --- 786,810 ---- { StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); TimeInterval tinterval; + int32 status; tinterval = (TimeInterval) palloc(sizeof(TimeIntervalData)); tinterval->status = pq_getmsgint(buf, sizeof(tinterval->status)); + tinterval->data[0] = pq_getmsgint(buf, sizeof(tinterval->data[0])); + tinterval->data[1] = pq_getmsgint(buf, sizeof(tinterval->data[1])); + + if (tinterval->data[0] == INVALID_ABSTIME || + tinterval->data[1] == INVALID_ABSTIME) + status = T_INTERVAL_INVAL; /* undefined */ + else + status = T_INTERVAL_VALID; ! if (status != tinterval->status) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid status in external \"tinterval\" value"))); PG_RETURN_TIMEINTERVAL(tinterval); } *** a/src/backend/utils/adt/oid.c --- b/src/backend/utils/adt/oid.c *************** *** 276,288 **** oidvectorrecv(PG_FUNCTION_ARGS) Assert(!locfcinfo.isnull); ! /* sanity checks: oidvector must be 1-D, no nulls */ if (ARR_NDIM(result) != 1 || ARR_HASNULL(result) || ! ARR_ELEMTYPE(result) != OIDOID) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid oidvector data"))); PG_RETURN_POINTER(result); } --- 276,296 ---- Assert(!locfcinfo.isnull); ! /* sanity checks: oidvector must be 1-D, 0-based, no nulls */ if (ARR_NDIM(result) != 1 || ARR_HASNULL(result) || ! ARR_ELEMTYPE(result) != OIDOID || ! ARR_LBOUND(result)[0] != 0) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid oidvector data"))); + + /* check length for consistency with oidvectorin() */ + if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("oidvector has too many elements"))); + PG_RETURN_POINTER(result); } *** a/src/backend/utils/adt/timestamp.c --- b/src/backend/utils/adt/timestamp.c *************** *** 253,258 **** timestamp_recv(PG_FUNCTION_ARGS) --- 253,263 ---- timestamp = (Timestamp) pq_getmsgint64(buf); #else timestamp = (Timestamp) pq_getmsgfloat8(buf); + + if (isnan(timestamp)) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("timestamp cannot be NaN"))); #endif /* rangecheck: see if timestamp_out would like it */ *** a/src/backend/utils/adt/xml.c --- b/src/backend/utils/adt/xml.c *************** *** 109,115 **** static int parse_xml_decl(const xmlChar *str, size_t *lenp, static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, ! bool preserve_whitespace, xmlChar *encoding); static text *xml_xmlnodetoxmltype(xmlNodePtr cur); #endif /* USE_LIBXML */ --- 109,115 ---- static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, ! bool preserve_whitespace, int encoding); static text *xml_xmlnodetoxmltype(xmlNodePtr cur); #endif /* USE_LIBXML */ *************** *** 183,189 **** xml_in(PG_FUNCTION_ARGS) * Parse the data to check if it is well-formed XML data. Assume that * ERROR occurred if parsing failed. */ ! doc = xml_parse(vardata, xmloption, true, NULL); xmlFreeDoc(doc); PG_RETURN_XML_P(vardata); --- 183,189 ---- * Parse the data to check if it is well-formed XML data. Assume that * ERROR occurred if parsing failed. */ ! doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding()); xmlFreeDoc(doc); PG_RETURN_XML_P(vardata); *************** *** 272,278 **** xml_recv(PG_FUNCTION_ARGS) char *newstr; int nbytes; xmlDocPtr doc; ! xmlChar *encoding = NULL; /* * Read the data in raw format. We don't know yet what the encoding is, as --- 272,279 ---- char *newstr; int nbytes; xmlDocPtr doc; ! xmlChar *encodingStr = NULL; ! int encoding; /* * Read the data in raw format. We don't know yet what the encoding is, as *************** *** 293,299 **** xml_recv(PG_FUNCTION_ARGS) str = VARDATA(result); str[nbytes] = '\0'; ! parse_xml_decl((xmlChar *) str, NULL, NULL, &encoding, NULL); /* * Parse the data to check if it is well-formed XML data. Assume that --- 294,308 ---- str = VARDATA(result); str[nbytes] = '\0'; ! parse_xml_decl((xmlChar *) str, NULL, NULL, &encodingStr, NULL); ! ! /* ! * If encoding was explicitly specified in the XML header, treat it as ! * UTF-8, as that's the default in XML. This is different from xml_in(), ! * where the input has to go through the normal client to server encoding ! * conversion. ! */ ! encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8; /* * Parse the data to check if it is well-formed XML data. Assume that *************** *** 305,313 **** xml_recv(PG_FUNCTION_ARGS) /* Now that we know what we're dealing with, convert to server encoding */ newstr = (char *) pg_do_encoding_conversion((unsigned char *) str, nbytes, ! encoding ? ! xmlChar_to_encoding(encoding) : ! PG_UTF8, GetDatabaseEncoding()); if (newstr != str) --- 314,320 ---- /* Now that we know what we're dealing with, convert to server encoding */ newstr = (char *) pg_do_encoding_conversion((unsigned char *) str, nbytes, ! encoding, GetDatabaseEncoding()); if (newstr != str) *************** *** 659,665 **** xmlparse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace) #ifdef USE_LIBXML xmlDocPtr doc; ! doc = xml_parse(data, xmloption_arg, preserve_whitespace, NULL); xmlFreeDoc(doc); return (xmltype *) data; --- 666,673 ---- #ifdef USE_LIBXML xmlDocPtr doc; ! doc = xml_parse(data, xmloption_arg, preserve_whitespace, ! GetDatabaseEncoding()); xmlFreeDoc(doc); return (xmltype *) data; *************** *** 799,805 **** xml_is_document(xmltype *arg) /* We want to catch ereport(INVALID_XML_DOCUMENT) and return false */ PG_TRY(); { ! doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true, NULL); result = true; } PG_CATCH(); --- 807,814 ---- /* We want to catch ereport(INVALID_XML_DOCUMENT) and return false */ PG_TRY(); { ! doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true, ! GetDatabaseEncoding()); result = true; } PG_CATCH(); *************** *** 1152,1158 **** print_xml_decl(StringInfo buf, const xmlChar *version, */ static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, ! xmlChar *encoding) { int32 len; xmlChar *string; --- 1161,1167 ---- */ static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, ! int encoding) { int32 len; xmlChar *string; *************** *** 1165,1173 **** xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, utf8string = pg_do_encoding_conversion(string, len, ! encoding ? ! xmlChar_to_encoding(encoding) : ! GetDatabaseEncoding(), PG_UTF8); /* Start up libxml and its parser (no-ops if already done) */ --- 1174,1180 ---- utf8string = pg_do_encoding_conversion(string, len, ! encoding, PG_UTF8); /* Start up libxml and its parser (no-ops if already done) */ *** a/src/include/utils/datetime.h --- b/src/include/utils/datetime.h *************** *** 262,267 **** extern const int day_tab[2][13]; --- 262,269 ---- || (((m) == JULIAN_MINMONTH) && ((d) >= JULIAN_MINDAY))))) \ && ((y) < JULIAN_MAXYEAR)) + #define JULIAN_MAX (2145031948) /* == date2j(JULIAN_MAXYEAR, 1 ,1) */ + /* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */ #define UNIX_EPOCH_JDATE 2440588 /* == date2j(1970, 1, 1) */ #define POSTGRES_EPOCH_JDATE 2451545 /* == date2j(2000, 1, 1) */
pgsql-hackers by date: