Re: refactoring - share str2*int64 functions - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: refactoring - share str2*int64 functions
Date
Msg-id alpine.DEB.2.21.1907160735480.8986@lancre
Whole thread Raw
In response to Re: refactoring - share str2*int64 functions  (Michael Paquier <michael@paquier.xyz>)
Responses Re: refactoring - share str2*int64 functions
Re: refactoring - share str2*int64 functions
List pgsql-hackers
Bonjour Michaël,

> FWIW, I was looking forward to putting my hands on this patch and try
> to get it merged so as we can get rid of those duplications.  Here are
> some comments.
>
> +#ifdef FRONTEND
> +       fprintf(stderr,
> +               "invalid input syntax for type %s: \"%s\"\n",
> "bigint", str);
> +#else
> +       ereport(ERROR,
> +               (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
> +                errmsg("invalid input syntax for type %s: \"%s\"",
> +                       "bigint", str)));
> +#endif
> Have you looked at using the wrapper pg_log_error() here?

I have not.

I have simply merged the two implementations (pgbench & backend) as they 
were.

> +extern bool pg_scanint8(const char *str, bool errorOK, int64 *result);
> +extern uint64 pg_strtouint64(const char *str, char **endptr, int base);

> Hmm.  With this patch we have strtoint and pg_strtouint64, which makes
> the whole set inconsistent.

I understand that you mean bits vs bytes? Indeed it can bite!

> +
> #endif                         /* COMMON_STRING_H */
> Noise diff..

Indeed.

> numutils.c also has pg_strtoint16 and pg_strtoint32, so the locations
> become rather inconsistent with inconsistent APIs for the manipulation
> of int2 and int4 fields, and scanint8 is just a derivative of the same
> logic.  We have two categories of routines here:

Yep, but the int2/4 functions are not used elsewhere.

> - The wrappers on top of strtol and strtoul & co, which are named
> respectively strtoint and pg_strtouint64 with the patch.  The naming
> part is inconsistent, and we only handle uint64 and int32.  We don't
> need to worry about int64 and uint32 because they are not used?

Indeed, it seems that they are not needed/used by client code, AFAICT.

> That's fine by me, but at least let's have a consistent naming.

Ok.

> Prefixing the functions with pg_* is a better practice in my opinion
> as we will unlikely run into conflicts this way.

Ok.

> - The str->integer conversion routines, which actually have very
> similar characteristics to the strtol families as they remove trailing
> whitespaces first, check for a sign, etc, except that they work only
> on base 10.  And here we get into a state where pg_scanint8 should be
> actually called pg_strtoint64,

I just removed that:-)

ISTM that the issue is that the error handling of these functions is 
pretty different.

> with an interface inconsistent with its int32/int16 relatives now only 
> in the backend.

We can, but I'm not at ease with changing the error handling approach.

> Could we consider more consolidation here please?  Like moving the whole 
> set to src/common/?

My initial plan was simply to remove direct code duplications between 
front-end and back-end, not to re-engineer the full set of string to int 
conversion functions:-)

On the re-engineering front: Given the various points on the thread, ISTM 
that there should probably be two behaviors for str to signed/unsigned 
int{16,32,64}, and having only one kind of signature for all types would 
be definitely better.

One low-level one that does the conversion or return an error.

Another higher-level one which possibly adds an error message (stderr for 
front-end, log for back-end).

One choice is whether there are two functions (the higher one calling the 
lower one and adding the messages) or just one with a boolean to trigger 
the messages. I do not have a strong opinion. Maybe one function would be 
simpler. Andres boolean-compatible enum return looks like a good option.

Overall, this leads to something like:

enum { STRTOINT_OK, STRTOINT_OVERFLOW_ERROR, STRTOINT_SYNTAX_ERROR }
   pg_strto{,u}int{8?,16,32,64}
     (const char * string, const TYPE * result, const bool verbose);

-- 
Fabien.

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Next
From: Michael Paquier
Date:
Subject: Re: Add parallelism and glibc dependent only options to reindexdb