Thread: Unbounded %s in sscanf
I happened to spot the below call in src/bin/pg_basebackup/streamutil.c which has an unbounded %s in the format. /* fetch xlog value and unit from the result */ if (sscanf(PQgetvalue(res, 0, 0), "%d%s", &xlog_val, xlog_unit) != 2) There is no risk of overflow as the unit is defined to be at most 2 characters, but that's not explained (like how a similar %s is handled in pg_dump). The attached adds a small explanation in the comment to save readers time from following the bouncing ball to make sure. -- Daniel Gustafsson https://vmware.com/
Attachment
Daniel Gustafsson <daniel@yesql.se> writes: > I happened to spot the below call in src/bin/pg_basebackup/streamutil.c which > has an unbounded %s in the format. > /* fetch xlog value and unit from the result */ > if (sscanf(PQgetvalue(res, 0, 0), "%d%s", &xlog_val, xlog_unit) != 2) > There is no risk of overflow as the unit is defined to be at most 2 characters, > but that's not explained (like how a similar %s is handled in pg_dump). Ugh. Shouldn't we instead modify the format to read not more than two characters? Even if this is safe on non-malicious input, it doesn't seem like good style. regards, tom lane
> On 28 Jun 2021, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ugh. Shouldn't we instead modify the format to read not more than > two characters? Even if this is safe on non-malicious input, it > doesn't seem like good style. No disagreement, I was only basing it on what is in the tree. I would propose that we change the sscanf in _LoadBlobs() too though to eliminate all such callsites, even though that one is even safer. I'll prepare a patch once more caffeine has been ingested. -- Daniel Gustafsson https://vmware.com/
On 2021-Jun-28, Daniel Gustafsson wrote: > I happened to spot the below call in src/bin/pg_basebackup/streamutil.c which > has an unbounded %s in the format. > > /* fetch xlog value and unit from the result */ > if (sscanf(PQgetvalue(res, 0, 0), "%d%s", &xlog_val, xlog_unit) != 2) > > There is no risk of overflow as the unit is defined to be at most 2 characters, > but that's not explained (like how a similar %s is handled in pg_dump). The > attached adds a small explanation in the comment to save readers time from > following the bouncing ball to make sure. Can you attack the system by crafting malicious output from the query? I think the answer is still no, because the output comes from the query SHOW wal_segment_size which, if the attacker can control, the person running pg_basebackup has way more serious problems. But TBH it seems easy enough to limit to the output variable width. -- Álvaro Herrera 39°49'30"S 73°17'W "How amazing is that? I call it a night and come back to find that a bug has been identified and patched while I sleep." (Robert Davidson) http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
> On 28 Jun 2021, at 16:45, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 28 Jun 2021, at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Ugh. Shouldn't we instead modify the format to read not more than >> two characters? Even if this is safe on non-malicious input, it >> doesn't seem like good style. > > No disagreement, I was only basing it on what is in the tree. I would propose > that we change the sscanf in _LoadBlobs() too though to eliminate all such > callsites, even though that one is even safer. I'll prepare a patch once more > caffeine has been ingested. Returning to this, attached is a patchset which amends the two sscanf() callsites with their respective buffersizes for %s format parsing. In pg_dump we need to inject the MAXPGPATH limit with the preprocessor and thus the buffer needs to be increased by one to account for the terminator (which is yet more hygiene coding since the fname buffer is now larger than the input buffer). While in here, I noticed that the fname variable is shadowed in the loop parsing the blobs TOC which yields a broken error message on parse errors. The attached 0003 fixes that. -- Daniel Gustafsson https://vmware.com/
Attachment
I took another look at this today, and propose to push the attached. The pg_dump fix goes all the way back to 9.6 whereas the pg_basebackup fix is from 11 and onwards. The adjacent shadowed variable bug in pg_dump is also present since 9.6. Thoughts? -- Daniel Gustafsson https://vmware.com/
Attachment
Daniel Gustafsson <daniel@yesql.se> writes: > I took another look at this today, and propose to push the attached. The > pg_dump fix goes all the way back to 9.6 whereas the pg_basebackup fix is from > 11 and onwards. The adjacent shadowed variable bug in pg_dump is also present > since 9.6. > Thoughts? Generally +1, though I wonder if it'd be prudent to deal with the shadowed-variable bug by renaming *both* variables. "fname" is clearly too generic in a function that deals with multiple file names. Another thing that is nibbling at the back of my mind is that one reason we started to use src/port/snprintf.c all the time is that glibc's *printf functions behave in a very unfriendly fashion when asked to print text that they think is invalidly encoded, but only if the format involves an explicit field width spec. I wonder if we're opening ourselves to similar problems if we start to use field widths with *scanf. In principle, I think the input text always ought to be ASCII in these cases, so that there's no hazard. But is there an interesting security aspect here? That is, if someone can inject a maliciously-crafted file containing non-ASCII data, what kind of misbehavior could ensue? It might be that sscanf would just report failure and we'd give up, which would be fine. But if a stack overrun could be triggered that way, it'd not be fine. regards, tom lane
> On 30 Jul 2021, at 18:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: >> I took another look at this today, and propose to push the attached. The >> pg_dump fix goes all the way back to 9.6 whereas the pg_basebackup fix is from >> 11 and onwards. The adjacent shadowed variable bug in pg_dump is also present >> since 9.6. >> Thoughts? Reviving an old thread that had gotten lost in the mists of the INBOX. > Generally +1, though I wonder if it'd be prudent to deal with the > shadowed-variable bug by renaming *both* variables. "fname" is > clearly too generic in a function that deals with multiple file > names. Good point, done in the attached. > Another thing that is nibbling at the back of my mind is that one > reason we started to use src/port/snprintf.c all the time is that > glibc's *printf functions behave in a very unfriendly fashion when > asked to print text that they think is invalidly encoded, but only > if the format involves an explicit field width spec. I wonder if > we're opening ourselves to similar problems if we start to use > field widths with *scanf. In principle, I think the input text > always ought to be ASCII in these cases, so that there's no hazard. > But is there an interesting security aspect here? That is, if someone > can inject a maliciously-crafted file containing non-ASCII data, what > kind of misbehavior could ensue? It might be that sscanf would just > report failure and we'd give up, which would be fine. But if a > stack overrun could be triggered that way, it'd not be fine. sscanf won't fail in that case. For multibyte input, %xs will simply stop after x bytes, ignoring torn characters with a (highly likely) incorrect value in the result variable. Using %xls (or %xS) would however count x towards the number of multibytes, which if stored in a normal char* variable could result in an overflow. With a width specifier this isn't really a vector. If an attacker can inject multibyte X which after a torn read results in z being parsed and acted upon, she could also by definition inject z to begin with. Without a width specifier, If a malicious actor manages to inject multibyte (or just too many bytes), it could however lead to a stack overflow as sscanf will keep reading until a whitespace byte. I propose to apply the attached all the way down (with the basebackup hunk from 11), or down to 10 if we want to be conservative with the final 9.6 re ancient bugs that haven't seen complaints. -- Daniel Gustafsson https://vmware.com/
Attachment
Daniel Gustafsson <daniel@yesql.se> writes: > I propose to apply the attached all the way down (with the basebackup hunk from > 11), or down to 10 if we want to be conservative with the final 9.6 re ancient > bugs that haven't seen complaints. LGTM. No objection to applying this in 9.6. regards, tom lane