Re: refactoring - share str2*int64 functions - Mailing list pgsql-hackers

From Andres Freund
Subject Re: refactoring - share str2*int64 functions
Date
Msg-id 20190909101738.3qadlljsucxvf2kd@alap3.anarazel.de
Whole thread Raw
In response to Re: refactoring - share str2*int64 functions  (Michael Paquier <michael@paquier.xyz>)
Responses Re: refactoring - share str2*int64 functions
Re: refactoring - share str2*int64 functions
List pgsql-hackers
Hi,

On 2019-09-09 14:28:14 +0900, Michael Paquier wrote:
> On Thu, Sep 05, 2019 at 03:52:48PM +0900, Michael Paquier wrote:
> > Right, there was this part.  This brings also the point of having one
> > interface for the backend as all the error messages for the backend
> > are actually the same, with the most simple name being that:
> > pg_strtoint(value, size, error_ok).

I *VEHEMENTLY* oppose the introduction of any future pseudo-generic
routines taking the type width as a parameter. They're weakening the
type system and they're unnecessarily inefficient.


I don't really buy that saving a few copies of a strings is worth that
much. But if you really want to do that, the right approach imo would be
to move the error reporting into a separate function. I.e. something

void pg_attribute_noreturn()
pg_strtoint_error(pg_strtoint_status status, const char *input, const char *type)

that you'd call in small wrappers. Something like

static inline int32
pg_strtoint32_check(const char* s)
{
    int32 result;
    pg_strtoint_status status = pg_strtoint32(s, &result);

    if (unlikely(status == PG_STRTOINT_OK))
       pg_strtoint_error(status, s, "int32");
    return result;
}


> So, it seems to me that if we want to have a consolidation of
> everything, we basically need to have the generic error messages for
> the backend directly into common/string.c where the routines are
> refactored, and I think that we should do the following based on the
> patch attached:

> - Just remove pg_strtoint(), and move the error generation logic in
> common/string.c.

I'm not quite sure what you mean by moving the "error generation logic"?


> - Add an error_ok flag for all the pg_strto[u]int{16|32|64} routines,
> which generates errors only for FRONTEND is not defined.

I think this is a bad idea.


> - Use pg_log_error() for the frontend errors.
>
> If those errors are added everywhere, we would basically have no code
> paths in the frontend or the backend (for the unsigned part only)
> which use them yet.  Another possibility would be to ignore the
> error_ok flag in those cases, but that's inconsistent.

Yea, ignoring it would be terrible idea.

> diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
> index a9bd47d937..593a5ef54e 100644
> --- a/src/backend/libpq/pqmq.c
> +++ b/src/backend/libpq/pqmq.c
> @@ -286,10 +286,10 @@ pq_parse_errornotice(StringInfo msg, ErrorData *edata)
>                  edata->hint = pstrdup(value);
>                  break;
>              case PG_DIAG_STATEMENT_POSITION:
> -                edata->cursorpos = pg_strtoint32(value);
> +                edata->cursorpos = pg_strtoint(value, 4);
>                  break;

I'd be really upset if this type of change went in.



>  #include "fmgr.h"
>  #include "miscadmin.h"
> +#include "common/string.h"
>  #include "nodes/extensible.h"
>  #include "nodes/parsenodes.h"
>  #include "nodes/plannodes.h"
> @@ -80,7 +81,7 @@
>  #define READ_UINT64_FIELD(fldname) \
>      token = pg_strtok(&length);        /* skip :fldname */ \
>      token = pg_strtok(&length);        /* get field value */ \
> -    local_node->fldname = pg_strtouint64(token, NULL, 10)
> +    (void) pg_strtouint64(token, &local_node->fldname)

Seems like these actually could just ought to use the error-checked
variants. And I think it ought to change all of
READ_{INT,UINT,LONG,UINT64,OID}_FIELD, rather than just redirecting one
of them to the new routines.


>  static void pcb_error_callback(void *arg);
> @@ -496,7 +496,7 @@ make_const(ParseState *pstate, Value *value, int location)
>
>          case T_Float:
>              /* could be an oversize integer as well as a float ... */
> -            if (scanint8(strVal(value), true, &val64))
> +            if (pg_strtoint64(strVal(value), &val64) == PG_STRTOINT_OK)
>              {
>                  /*
>                   * It might actually fit in int32. Probably only INT_MIN can

Not for this change, but we really ought to move away from this crappy
logic. It's really bonkers to have T_Float represent large integers and
floats.



> +/*
> + * pg_strtoint16
> + *
> + * Convert input string to a signed 16-bit integer.  Allows any number of
> + * leading or trailing whitespace characters.
> + *
> + * NB: Accumulate input as a negative number, to deal with two's complement
> + * representation of the most negative number, which can't be represented as a
> + * positive number.
> + *
> + * The function returns immediately if the conversion failed with a status
> + * value to let the caller handle the error.  On success, the result is
> + * stored in "*result".
> + */
> +pg_strtoint_status
> +pg_strtoint16(const char *s, int16 *result)
> +{
> +    const char *ptr = s;
> +    int16        tmp = 0;
> +    bool        neg = false;
> +
> +    /* skip leading spaces */
> +    while (likely(*ptr) && isspace((unsigned char) *ptr))
> +        ptr++;
> +
> +    /* handle sign */
> +    if (*ptr == '-')
> +    {
> +        ptr++;
> +        neg = true;
> +    }
> +    else if (*ptr == '+')
> +        ptr++;

> +    /* require at least one digit */
> +    if (unlikely(!isdigit((unsigned char) *ptr)))
> +        return PG_STRTOINT_SYNTAX_ERROR;

Wonder if there's an argument for moving this behaviour to someplace
else - in most cases we really don't expect whitespace, and checking for
it is unnecessary overhead.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: fn ln
Date:
Subject: Re: BUG #15977: Inconsistent behavior in chained transactions
Next
From: Antonin Houska
Date:
Subject: Re: Attempt to consolidate reading of XLOG page