Thread: Unbounded %s in sscanf

Unbounded %s in sscanf

From

Daniel Gustafsson

Date:

28 June 2021, 16:06:37

I happened to spot the below call in src/bin/pg_basebackup/streamutil.c which
has an unbounded %s in the format.

    /* fetch xlog value and unit from the result */
    if (sscanf(PQgetvalue(res, 0, 0), "%d%s", &xlog_val, xlog_unit) != 2)

There is no risk of overflow as the unit is defined to be at most 2 characters,
but that's not explained (like how a similar %s is handled in pg_dump).  The
attached adds a small explanation in the comment to save readers time from
following the bouncing ball to make sure.

--
Daniel Gustafsson        https://vmware.com/

Attachment

pg_basebackup_sscanf.diff

Re: Unbounded %s in sscanf

From

Tom Lane

Date:

28 June 2021, 17:02:46

Daniel Gustafsson <daniel@yesql.se> writes:
> I happened to spot the below call in src/bin/pg_basebackup/streamutil.c which
> has an unbounded %s in the format.

>     /* fetch xlog value and unit from the result */
>     if (sscanf(PQgetvalue(res, 0, 0), "%d%s", &xlog_val, xlog_unit) != 2)

> There is no risk of overflow as the unit is defined to be at most 2 characters,
> but that's not explained (like how a similar %s is handled in pg_dump).

Ugh.  Shouldn't we instead modify the format to read not more than
two characters?  Even if this is safe on non-malicious input, it
doesn't seem like good style.

            regards, tom lane

Re: Unbounded %s in sscanf

From

Daniel Gustafsson

Date:

28 June 2021, 17:45:53

> On 28 Jun 2021, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ugh.  Shouldn't we instead modify the format to read not more than
> two characters?  Even if this is safe on non-malicious input, it
> doesn't seem like good style.

No disagreement, I was only basing it on what is in the tree.  I would propose
that we change the sscanf in _LoadBlobs() too though to eliminate all such
callsites, even though that one is even safer.  I'll prepare a patch once more
caffeine has been ingested.

--
Daniel Gustafsson        https://vmware.com/

Re: Unbounded %s in sscanf

From

Alvaro Herrera

Date:

28 June 2021, 19:45:10

On 2021-Jun-28, Daniel Gustafsson wrote:

> I happened to spot the below call in src/bin/pg_basebackup/streamutil.c which
> has an unbounded %s in the format.
> 
>     /* fetch xlog value and unit from the result */
>     if (sscanf(PQgetvalue(res, 0, 0), "%d%s", &xlog_val, xlog_unit) != 2)
> 
> There is no risk of overflow as the unit is defined to be at most 2 characters,
> but that's not explained (like how a similar %s is handled in pg_dump).  The
> attached adds a small explanation in the comment to save readers time from
> following the bouncing ball to make sure.

Can you attack the system by crafting malicious output from the query?
I think the answer is still no, because the output comes from the query
  SHOW wal_segment_size
which, if the attacker can control, the person running pg_basebackup has
way more serious problems.

But TBH it seems easy enough to limit to the output variable width.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep."                (Robert Davidson)
               http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php

Re: Unbounded %s in sscanf

From

Daniel Gustafsson

Date:

03 July 2021, 23:18:54

> On 28 Jun 2021, at 16:45, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 28 Jun 2021, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Ugh.  Shouldn't we instead modify the format to read not more than
>> two characters?  Even if this is safe on non-malicious input, it
>> doesn't seem like good style.
>
> No disagreement, I was only basing it on what is in the tree.  I would propose
> that we change the sscanf in _LoadBlobs() too though to eliminate all such
> callsites, even though that one is even safer.  I'll prepare a patch once more
> caffeine has been ingested.

Returning to this, attached is a patchset which amends the two sscanf()
callsites with their respective buffersizes for %s format parsing.  In pg_dump
we need to inject the MAXPGPATH limit with the preprocessor and thus the buffer
needs to be increased by one to account for the terminator (which is yet more
hygiene coding since the fname buffer is now larger than the input buffer).

While in here, I noticed that the fname variable is shadowed in the loop
parsing the blobs TOC which yields a broken error message on parse errors.  The
attached 0003 fixes that.

--
Daniel Gustafsson        https://vmware.com/

Attachment

Re: Unbounded %s in sscanf

From

Daniel Gustafsson

Date:

30 July 2021, 18:48:49

I took another look at this today, and propose to push the attached.  The
pg_dump fix goes all the way back to 9.6 whereas the pg_basebackup fix is from
11 and onwards.  The adjacent shadowed variable bug in pg_dump is also present
since 9.6.

Thoughts?

--
Daniel Gustafsson        https://vmware.com/

Attachment

Re: Unbounded %s in sscanf

From

Tom Lane

Date:

30 July 2021, 19:03:59

Daniel Gustafsson <daniel@yesql.se> writes:
> I took another look at this today, and propose to push the attached.  The
> pg_dump fix goes all the way back to 9.6 whereas the pg_basebackup fix is from
> 11 and onwards.  The adjacent shadowed variable bug in pg_dump is also present
> since 9.6.
> Thoughts?

Generally +1, though I wonder if it'd be prudent to deal with the
shadowed-variable bug by renaming *both* variables.  "fname" is
clearly too generic in a function that deals with multiple file
names.

Another thing that is nibbling at the back of my mind is that one
reason we started to use src/port/snprintf.c all the time is that
glibc's *printf functions behave in a very unfriendly fashion when
asked to print text that they think is invalidly encoded, but only
if the format involves an explicit field width spec.  I wonder if
we're opening ourselves to similar problems if we start to use
field widths with *scanf.  In principle, I think the input text
always ought to be ASCII in these cases, so that there's no hazard.
But is there an interesting security aspect here?  That is, if someone
can inject a maliciously-crafted file containing non-ASCII data, what
kind of misbehavior could ensue?  It might be that sscanf would just
report failure and we'd give up, which would be fine.  But if a
stack overrun could be triggered that way, it'd not be fine.

            regards, tom lane

Re: Unbounded %s in sscanf

From

Daniel Gustafsson

Date:

15 October 2021, 14:44:12

> On 30 Jul 2021, at 18:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I took another look at this today, and propose to push the attached.  The
>> pg_dump fix goes all the way back to 9.6 whereas the pg_basebackup fix is from
>> 11 and onwards.  The adjacent shadowed variable bug in pg_dump is also present
>> since 9.6.
>> Thoughts?

Reviving an old thread that had gotten lost in the mists of the INBOX.

> Generally +1, though I wonder if it'd be prudent to deal with the
> shadowed-variable bug by renaming *both* variables.  "fname" is
> clearly too generic in a function that deals with multiple file
> names.

Good point, done in the attached.

> Another thing that is nibbling at the back of my mind is that one
> reason we started to use src/port/snprintf.c all the time is that
> glibc's *printf functions behave in a very unfriendly fashion when
> asked to print text that they think is invalidly encoded, but only
> if the format involves an explicit field width spec.  I wonder if
> we're opening ourselves to similar problems if we start to use
> field widths with *scanf.  In principle, I think the input text
> always ought to be ASCII in these cases, so that there's no hazard.
> But is there an interesting security aspect here?  That is, if someone
> can inject a maliciously-crafted file containing non-ASCII data, what
> kind of misbehavior could ensue?  It might be that sscanf would just
> report failure and we'd give up, which would be fine.  But if a
> stack overrun could be triggered that way, it'd not be fine.

sscanf won't fail in that case.  For multibyte input, %xs will simply stop
after x bytes, ignoring torn characters with a (highly likely) incorrect value
in the result variable.  Using %xls (or %xS) would however count x towards the
number of multibytes, which if stored in a normal char* variable could result
in an overflow.

With a width specifier this isn't really a vector.  If an attacker can inject
multibyte X which after a torn read results in z being parsed and acted upon,
she could also by definition inject z to begin with.

Without a width specifier, If a malicious actor manages to inject multibyte (or
just too many bytes), it could however lead to a stack overflow as sscanf will
keep reading until a whitespace byte.

I propose to apply the attached all the way down (with the basebackup hunk from
11), or down to 10 if we want to be conservative with the final 9.6 re ancient
bugs that haven't seen complaints.

--
Daniel Gustafsson        https://vmware.com/

Attachment

Re: Unbounded %s in sscanf

From

Tom Lane

Date:

15 October 2021, 17:07:21

Daniel Gustafsson <daniel@yesql.se> writes:
> I propose to apply the attached all the way down (with the basebackup hunk from
> 11), or down to 10 if we want to be conservative with the final 9.6 re ancient
> bugs that haven't seen complaints.

LGTM.  No objection to applying this in 9.6.

            regards, tom lane