Re: Decision by Monday: PQescapeString() vs. encoding violation - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Decision by Monday: PQescapeString() vs. encoding violation |
Date | |
Msg-id | 30c84c5fa464f367cbad1d3dc8088342795f96d2.camel@j-davis.com Whole thread Raw |
List | pgsql-hackers |
Expanding on the reasoning a bit: This discussion is only relevant only when the application meets the following conditions: A. Sends invalidly-encoded input to an escaping routine. Many languages protect against this, such as python and rust. But other languages, like C, Go, and Ruby do not. B. Uses PQescapeStringConn() and ignores the error, or an escaping routine that doesn't provide an error at all. (Earlier versions of PQescapeStringConn ignored invalid sequences in the middle of the string, but 5dc1e42b4f fixed that.) C. Does some kind of post-processing to the escaped output to remove invalidly-encoded data before sending it to the server. If the invalid data wasn't removed, the server would throw an immediate error anyway. Given that the situation already involves A, B & C, it's already somewhat of a special case and I don't consider it a clear-and-present security threat beyond what was fixed for the CVE. Instead, this is more about following an accepted practice that is less likely to result in security problems even if the application code is imperfect. On Fri, 2025-02-14 at 17:27 -0800, Noah Misch wrote: > Question for all: would you switch to the "remove fewer bytes" > behavior in > next week's releases, switch later, or switch never? Why so? The reasoning why it's the right change are: 1. It seems reasonable to assume most invalid byte sequences in the middle of a string usually resulted from some kind of byte splicing or concatenation. For instance, an application taking untrusted input bytes, validating only at the byte level, and then concatenating with other text (which might be done by string interpolation or by applying a template). - Given the above assumption, valid initial bytes in a byte sequence are much more likely to be the start of a new valid character than part of the previous invalid character. - If it's the start of a new valid character, it might be an important delimiter, and removing it could have security consequences, perhaps not for Postgres itself, but for whatever the application does with that string (with a missing delimiter) later. 2. The comparable risk on the other side seems substantially lower. Let's say your application considers 0x7C ('|') to be a very important delimiter. If the untrusted input contains, e.g. "\xC2\x7C" (invalid sequence), one could imagine PQescapeStringConn() saving the day by removing both bytes. But that seems more like an accident than anything else: the application needs to validate the untrusted input regardless; otherwise the attacker could just supply a valid '|'. - Caveat: the application's validation could be broken in such a way that it skips over multibyte characters without checking for invalid sequences, i.e. seeing the \xC2 and skipping over the \x7C without escaping it. Unfortunately, that's exactly what PQescapeStringConn() did until recently, so this is a reasonable argument in favor of "remove more bytes". But we shouldn't favor multibyte-half-aware validators at the cost of breaking byte-by-byte validators. 3. Unicode strongly recommends that we take the "remove fewer bytes" option. They are in a good position to have an informed opinion about the security consequences of either choice, so this is partially an appeal to authority. But that also means that it's the way other string handling functions are likley to behave, e.g. Ruby's String#scrub seems to take the "remove fewer bytes" approach. Following convention has its own security benefits. Regards, Jeff Davis
pgsql-hackers by date: