Thread: Refactoring base64 encoding and decoding into a safer interface
Hi all, After the issues behind CVE-2019-10164, it seems that we can do much better with the current interface of decoding and encoding functions for base64 in src/common/. The root issue is that the callers of pg_b64_decode() and pg_b64_encode() provide a buffer where the result gets stored which is allocated using respectively pg_b64_dec_len() and pg_b64_dec_enc() (those routines overestimate the allocation on purpose) but we don't allow callers to provide the length of the buffer allocated and hence those routines lack sanity checks to make sure that what is in input does not cause an overflow within the result buffer. One thing I have noticed is that many projects on the net include this code for their own purpose, and I have suspicions that many other projects link to the code from Postgres and make use of it. So that's rather scary. Attached is a refactoring patch for those interfaces, which introduces a set of overflow checks so as we cannot repeat errors of the past. This adds one argument to pg_b64_decode() and pg_b64_encode() as the size of the result buffer, and we make use of it in the code to make sure that an error is reported in case of an overflow. That's the status code -1 which is used for other errors for simplicity. One thing to note is that the decoding path can already complain on some errors, basically an incorrectly shaped encoded string, but the encoding path does not have any errors yet, so we need to make sure that all the existing callers of pg_b64_encode() complain correctly with the new interface. I am adding that to the next CF for v13. Any thoughts? -- Michael
Attachment
> On 23 Jun 2019, at 15:25, Michael Paquier <michael@paquier.xyz> wrote: > Attached is a refactoring patch for those interfaces, which introduces > a set of overflow checks so as we cannot repeat errors of the past. I’ve done a review of this submission. The patch applies cleanly, and passes make check, ssl/scram tests etc. There is enough documentation I very much agree that functions operating on a buffer like this should have the size of the buffer in order to safeguard against overflow, so +1 on the general concept. > Any thoughts? A few small comments: In src/common/scram-common.c there are a few instances like this. Shouldn’t we also free the result buffer in these cases? +#ifdef FRONTEND + return NULL; +#else In the below passage, we leave the input buffer with a non-complete encoded string. Should we memset the buffer to zero to avoid the risk that code which fails to check the returnvalue believes it has an encoded string? + /* + * Leave if there is an overflow in the area allocated for + * the encoded string. + */ + if ((p - dst + 4) > dstlen) + return -1; cheers ./daniel
On Mon, Jul 01, 2019 at 11:11:43PM +0200, Daniel Gustafsson wrote: > I very much agree that functions operating on a buffer like this should have > the size of the buffer in order to safeguard against overflow, so +1 on the > general concept. Thanks for the review! > A few small comments: > > In src/common/scram-common.c there are a few instances like this. Shouldn’t we > also free the result buffer in these cases? > > +#ifdef FRONTEND > + return NULL; > +#else Fixed. > In the below passage, we leave the input buffer with a non-complete > encoded string. Should we memset the buffer to zero to avoid the > risk that code which fails to check the return value believes it has > an encoded string? Hmm. Good point. I have not thought of that, and your suggestion makes sense. Another question is if we'd want to actually use explicit_bzero() here, but that could be a discussion on this other thread, except if the patch discussed there is merged first: https://www.postgresql.org/message-id/42d26bde-5d5b-c90d-87ae-6cab875f73be@2ndquadrant.com Attached is an updated patch. -- Michael
Attachment
> On 2 Jul 2019, at 07:41, Michael Paquier <michael@paquier.xyz> wrote: >> In the below passage, we leave the input buffer with a non-complete >> encoded string. Should we memset the buffer to zero to avoid the >> risk that code which fails to check the return value believes it has >> an encoded string? > > Hmm. Good point. I have not thought of that, and your suggestion > makes sense. > > Another question is if we'd want to actually use explicit_bzero() > here, but that could be a discussion on this other thread, except if > the patch discussed there is merged first: > https://www.postgresql.org/message-id/42d26bde-5d5b-c90d-87ae-6cab875f73be@2ndquadrant.com I’m not sure we need to go to that length, but I don’t have overly strong opinions. I think of this more like a case of “we’ve changed the API with new errorcases that we didn’t handle before, so we’re being a little defensive to help you avoid subtle bugs”. > Attached is an updated patch. Looks good, passes tests, provides value to the code. Bumping this to ready for committer as I no more comments to add. cheers ./daniel
On Tue, Jul 02, 2019 at 09:56:03AM +0200, Daniel Gustafsson wrote: > I’m not sure we need to go to that length, but I don’t have overly strong > opinions. I think of this more like a case of “we’ve changed the API with new > errorcases that we didn’t handle before, so we’re being a little defensive to > help you avoid subtle bugs”. I quite like this suggestion. > Looks good, passes tests, provides value to the code. Bumping this to ready > for committer as I no more comments to add. Thanks. I'll look at that again in a couple of days, let's see if others have any input to offer. -- Michael
Attachment
On Tue, Jul 02, 2019 at 09:56:03AM +0200, Daniel Gustafsson wrote: > Looks good, passes tests, provides value to the code. Bumping this to ready > for committer as I no more comments to add. Thanks. I have spent more time testing the different error paths and the new checks in base64.c, and committed the thing. -- Michael