Home > mailing lists

Re: Decision by Monday: PQescapeString() vs. encoding violation - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Decision by Monday: PQescapeString() vs. encoding violation
Date	February 15 23:10:57
Msg-id	30c84c5fa464f367cbad1d3dc8088342795f96d2.camel@j-davis.com Whole thread Raw
List	pgsql-hackers

Tree view

Expanding on the reasoning a bit:

This discussion is only relevant only when the application meets the
following conditions:

  A. Sends invalidly-encoded input to an escaping routine. Many
languages protect against this, such as python and rust. But other
languages, like C, Go, and Ruby do not.

  B. Uses PQescapeStringConn() and ignores the error, or an escaping
routine that doesn't provide an error at all. (Earlier versions of
PQescapeStringConn ignored invalid sequences in the middle of the
string, but 5dc1e42b4f fixed that.)

  C. Does some kind of post-processing to the escaped output to remove
invalidly-encoded data before sending it to the server. If the invalid
data wasn't removed, the server would throw an immediate error anyway.

Given that the situation already involves A, B & C, it's already
somewhat of a special case and I don't consider it a clear-and-present
security threat beyond what was fixed for the CVE. Instead, this is
more about following an accepted practice that is less likely to result
in security problems even if the application code is imperfect.

On Fri, 2025-02-14 at 17:27 -0800, Noah Misch wrote:
> Question for all: would you switch to the "remove fewer bytes"
> behavior in
> next week's releases, switch later, or switch never?  Why so?

The reasoning why it's the right change are:

1. It seems reasonable to assume most invalid byte sequences in the
middle of a string usually resulted from some kind of byte splicing or
concatenation. For instance, an application taking untrusted input
bytes, validating only at the byte level, and then concatenating with
other text (which might be done by string interpolation or by applying
a template).

  - Given the above assumption, valid initial bytes in a byte sequence
are much more likely to be the start of a new valid character than part
of the previous invalid character.

  - If it's the start of a new valid character, it might be an
important delimiter, and removing it could have security consequences,
perhaps not for Postgres itself, but for whatever the application does
with that string (with a missing delimiter) later.

2. The comparable risk on the other side seems substantially lower.
Let's say your application considers 0x7C ('|') to be a very important
delimiter. If the untrusted input contains, e.g. "\xC2\x7C" (invalid
sequence), one could imagine PQescapeStringConn() saving the day by
removing both bytes. But that seems more like an accident than anything
else: the application needs to validate the untrusted input regardless;
otherwise the attacker could just supply a valid '|'.

  - Caveat: the application's validation could be broken in such a way
that it skips over multibyte characters without checking for invalid
sequences, i.e. seeing the \xC2 and skipping over the \x7C without
escaping it. Unfortunately, that's exactly what PQescapeStringConn()
did until recently, so this is a reasonable argument in favor of
"remove more bytes". But we shouldn't favor multibyte-half-aware
validators at the cost of breaking byte-by-byte validators.

3. Unicode strongly recommends that we take the "remove fewer bytes"
option. They are in a good position to have an informed opinion about
the security consequences of either choice, so this is partially an
appeal to authority. But that also means that it's the way other string
handling functions are likley to behave, e.g. Ruby's String#scrub seems
to take the "remove fewer bytes" approach. Following convention has its
own security benefits.

Regards,
    Jeff Davis

pgsql-hackers by date:

From: Andres Freund
Date: 15 February, 23:09:48
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation

From: Jeff Davis
Date: 15 February, 23:35:45
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation

Re: Decision by Monday: PQescapeString() vs. encoding violation - Mailing list pgsql-hackers

Previous

Next