Re: Moving other hex functions to /common - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Moving other hex functions to /common
Date
Msg-id 20210105034739.GG7432@momjian.us
Whole thread Raw
In response to Re: Moving other hex functions to /common  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Moving other hex functions to /common  (Michael Paquier <michael@paquier.xyz>)
Re: Moving other hex functions to /common  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Sat, Jan  2, 2021 at 02:25:33PM +0900, Michael Paquier wrote:
> > Let me get my patch building on the cfbot and then I will address each
> > of these.  I am trying to do one stage at a time since I am still
> > learning the process.  Thanks.
> 
> No problem.  On my end, this stuff has been itching me for a couple of
> days and I could not recall why..  Until I remembered that the design
> of the hex APIs in your patch is weak with overflow handling because
> we don't pass down to the function the size of the destination buffer.
> We have finished with a similar set of issues on the past with SCRAM
> and base64, with has led to CVE-2019-10164 and the improvements done
> in cfc40d3.  So I think that we need to improve things in a safer way.
> Mapping with the design for base64, I have finished with the attached
> patch, and the following set:
> +extern int64 pg_hex_decode(const char *src, int64 len, char *dst, int64 dstlen);
> +extern int64 pg_hex_encode(const char *src, int64 len, char *dst, int64 dstlen);
> +extern int64 pg_hex_enc_len(int64 srclen);
> +extern int64 pg_hex_dec_len(int64 srclen);

I can see the value of passing the destination length to the hex
functions, and I think you have to pass the src length to pg_hex_encode
since the input can be binary.  I assume the pg_hex_decode doesn't need
the source length because it is a null-terminated C string, right? 
However, pg_base64_decode(const char *src, size_t len, char *dst) does
pass in the src length, so I can see we should just make it the same.  I
also agree it is unclear if 'len' is the src or dst len, so your patch
fixes that with the names.  Also, is there a reason you use int64
instead of the size_t used by bytea?

> This ensures that the result never overflows, which explains the
> introduction of an error code for the encoding part, and does not 
> use elog()/pg_log() so as external libraries can use them.  ECPG uses
> long variables in a couple of places, explaining why it feels safer to
> use int64.  int should give enough room to any caller of those APIs,
> but there is no drawback in using a 64-bit API either, and I don't
> think it is worth the risk to break ECPG either for long handling,
> even if I don't believe either that folks are going to work on strings
> larger than 2GB.

Might as well just have hex match what bytea and esc does.

> Things get trickier for the bytea input/output because we want more
> generic error messages depending for invalid values or an incorrect
> number of digits, which is why I have left the copies in encode.c.
> This design could be easily extended with more types of error codes,
> though I am not sure if that's worth bothering.

I don't think removing the two error type reporting from src/common, and
then having to add it back into adt/encode.c makes sense.  There are
two, soon three, that want those two error reports, so I am thinking we
are best to just leave ecpg alone since it is just a library that
doesn't want more than one error reporting.  I don't think we will have
many more library call cases.

> Even with that, this leads to much more sanity for hex buffer
> manipulation in backup manifests (I don't think that using  
> PG_SHAXXX_DIGEST_STRING_LENGTH is a good idea either, I'd like to get
> rid of it in the long-term) and ECPG, so that's clearly a gain.
> 
> I don't have a Windows host at hand, though I think that it should
> work there correctly.  What do you think about the ideas in the
> attached patch?

I think your patch has a few mistakes.  First, I think you removed the
hex_decode files from the file system, rather than removing it via git,
so the diff didn't apply in the cfbot.  Second, I don't think ecpg ever
used libpgcommon before.  For non-Windows, libpgcommon got referenced
anyway in the build, so non-Windows compiles worked, but in Windows,
that was not referenced, meaning the cfbot failed on ecpglib.  I have
modified the attached patch with both of these fixed --- let's see how
it likes this version.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: pg_waldump/heapdesc.c and struct field names
Next
From: Tom Lane
Date:
Subject: Re: macOS SIP, next try