Re: Moving other hex functions to /common - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Moving other hex functions to /common
Date
Msg-id X/ADzYzkP0ExrBq1@paquier.xyz
Whole thread Raw
In response to Re: Moving other hex functions to /common  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Moving other hex functions to /common  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Fri, Jan 01, 2021 at 03:06:13PM -0500, Bruce Momjian wrote:
> Thanks.  I had to learn how to squash my commits into a new branch and
> then generate a format-patch on that:
>
>     https://bugsdb.com/_en/debug/8b648ec395b86be32efa9629cb006d74
>
> I wanted to see how the cfbot liked my original patch with the renames,
> and it didn't, so now I know I have to use this method for the
> commitfest.  Patch attached.

There are many ways to do that, indeed.  On my end, I use a local
branch. and then apply a set of git reset --soft before recreating a
single commit.

> Let me get my patch building on the cfbot and then I will address each
> of these.  I am trying to do one stage at a time since I am still
> learning the process.  Thanks.

No problem.  On my end, this stuff has been itching me for a couple of
days and I could not recall why..  Until I remembered that the design
of the hex APIs in your patch is weak with overflow handling because
we don't pass down to the function the size of the destination buffer.
We have finished with a similar set of issues on the past with SCRAM
and base64, with has led to CVE-2019-10164 and the improvements done
in cfc40d3.  So I think that we need to improve things in a safer way.
Mapping with the design for base64, I have finished with the attached
patch, and the following set:
+extern int64 pg_hex_decode(const char *src, int64 len, char *dst, int64 dstlen);
+extern int64 pg_hex_encode(const char *src, int64 len, char *dst, int64 dstlen);
+extern int64 pg_hex_enc_len(int64 srclen);
+extern int64 pg_hex_dec_len(int64 srclen);

This ensures that the result never overflows, which explains the
introduction of an error code for the encoding part, and does not
use elog()/pg_log() so as external libraries can use them.  ECPG uses
long variables in a couple of places, explaining why it feels safer to
use int64.  int should give enough room to any caller of those APIs,
but there is no drawback in using a 64-bit API either, and I don't
think it is worth the risk to break ECPG either for long handling,
even if I don't believe either that folks are going to work on strings
larger than 2GB.

Things get trickier for the bytea input/output because we want more
generic error messages depending for invalid values or an incorrect
number of digits, which is why I have left the copies in encode.c.
This design could be easily extended with more types of error codes,
though I am not sure if that's worth bothering.

Even with that, this leads to much more sanity for hex buffer
manipulation in backup manifests (I don't think that using
PG_SHAXXX_DIGEST_STRING_LENGTH is a good idea either, I'd like to get
rid of it in the long-term) and ECPG, so that's clearly a gain.

I don't have a Windows host at hand, though I think that it should
work there correctly.  What do you think about the ideas in the
attached patch?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: Michael Paquier
Date:
Subject: Re: Move --data-checksums to common options in initdb --help