Re: refactoring - share str2*int64 functions - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: refactoring - share str2*int64 functions
Date
Msg-id alpine.DEB.2.21.1909041027080.8057@lancre
Whole thread Raw
In response to Re: refactoring - share str2*int64 functions  (Michael Paquier <michael@paquier.xyz>)
Responses Re: refactoring - share str2*int64 functions
List pgsql-hackers
Bonjour Michaël,

>> Attached a rebased version which implements the int64/uint64 stuff. It is
>> basically the previous patch without the overflow inlined functions.
>
> -     if (!strtoint64(yytext, true, &yylval->ival))
> +     if (unlikely(pg_strtoint64(yytext, &yylval->ival) != PG_STRTOINT_OK))

> It seems to me that we should have unlikely() only within
> pg_strtoint64(), pg_strtouint64(), etc.

From a compiler perspective, the (un)likely tip is potentially useful on 
any test. We know when parsing a that it is very unlikely that the string 
conversion would fail, so we can tell that, so that the compiler knows 
which branch it should optimize first.

You can argue against that if the functions are inlined, because maybe the 
compiler would propagate the information, but for distinct functions 
compiled separately the information is useful at each level.

> -   /* skip leading spaces; cast is consistent with strtoint64 */
> -   while (*ptr && isspace((unsigned char) *ptr))

> +   /* skip leading spaces; cast is consistent with pg_strtoint64 */
> +   while (isspace((unsigned char) *ptr))
>        ptr++;

> What do you think about splitting this part in two?  I would suggest
> to do the refactoring in a first patch, and the consider all the
> optimizations for the routines you have in mind afterwards.

I would not bother.

> I think that we don't actually need is_an_int() and str2int64() at all
> in pgbench.c as we could just check for the return code of
> pg_strtoint64() and switch to the double parsing only if we don't have
> PG_STRTOINT_OK.

Yep, you are right, now that the conversion functions does not error out a 
message, its failure can be used as a test.

The version attached changes slightly the semantics, because on int 
overflows a double conversion will be attempted instead of erroring. I do 
not think that it is worth the effort of preserving the previous semantic 
of erroring.

> scanint8() only has one caller in the backend with your patch, and we
> don't check after its return result in int8.c, so I would suggest to
> move the error handling directly in int8in() and to remove scanint8().

Ok.

> I think that we should also introduce the [u]int[16|32] flavors and
> expand them in the code in a single patch, in a way consistent with
> what you have done for int64/uint64 as the state that we reach with
> the patch is kind of weird as there are existing versions numutils.c.

Before dealing with the 16/32 versions, which involve quite a significant 
amount of changes, I would want a clear message that "the 64 bit approach" 
is the model to follow.

Moreover, I'm unsure how to rename the existing pg_strtoint32 and others
which call ereport, if the name is used for the common error returning 
version.

> Have you done some performance testing of the patch?  The COPY
> bulkload is a good example of that:
> https://www.postgresql.org/message-id/20190717181428.krqpmduejbqr4m6g%40alap3.anarazel.de

I have done no such thing for now.

I would not expect any significant performance difference when loading 
int8 things because basically scanint8 has just been renamed pg_strtoint64 
and made global, and that is more or less all. It might be a little bit 
slower because possible the compiler cannot inline the conversion, but on 
the other hand, the *likely hints and removed tests may marginaly help 
performance. I think that the only way to test performance significantly 
would be to write a specific program that loops over a conversion.

-- 
Fabien.
Attachment

pgsql-hackers by date:

Previous
From: Rafia Sabih
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Ibrar Ahmed
Date:
Subject: Re: fix for BUG #3720: wrong results at using ltree