Re: refactoring - share str2*int64 functions - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: refactoring - share str2*int64 functions |
Date | |
Msg-id | 20190909052814.GA26605@paquier.xyz Whole thread Raw |
In response to | Re: refactoring - share str2*int64 functions (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: refactoring - share str2*int64 functions
|
List | pgsql-hackers |
On Thu, Sep 05, 2019 at 03:52:48PM +0900, Michael Paquier wrote: > Right, there was this part. This brings also the point of having one > interface for the backend as all the error messages for the backend > are actually the same, with the most simple name being that: > pg_strtoint(value, size, error_ok). I have been looking at that for the last couple of days. First, I have consolidated all the error strings in a single routine like this one, except that error_ok is not really needed if you take this approach: callers that don't care about failures could just call the set of routines in common/string.c and be done with it. Attached is an update of my little module that I used to check that the refactoring was done correctly (regression tests attached), it also includes a couple of routines to check the performance difference between one approach and the other, with focus on two things: - Is pg_strtouint64 really improved? - How much do we lose by moving to a common interface in the backend with pg_strtoint? The good news is that removing strtol from pg_strtouint64 really improves the performance as already reported, and with one billion calls in a tight loop you see a clear difference: =# select pg_strtouint64_old_check('10000', 1000000000); pg_strtouint64_old_check -------------------------- 10000 (1 row) Time: 15576.539 ms (00:15.577) =# select pg_strtouint64_new_check('10000', 1000000000); pg_strtouint64_new_check -------------------------- 10000 (1 row) Time: 8820.544 ms (00:08.821) So the new implementation is more than 40% faster with this micro-benchmark on my Debian box. The bad news is that a pg_strtoint() wrapper is not a good idea: =# select pg_strtoint_check('10000', 1000000000, 4); pg_strtoint_check ------------------- 10000 (1 row) Time: 11178.101 ms (00:11.178) =# select pg_strtoint32_check('10000', 1000000000); pg_strtoint32_check --------------------- 10000 (1 row) Time: 9252.894 ms (00:09.253) So trying to consolidate all error messages leads to a 15% hit with this test, which sucks. So, it seems to me that if we want to have a consolidation of everything, we basically need to have the generic error messages for the backend directly into common/string.c where the routines are refactored, and I think that we should do the following based on the patch attached: - Just remove pg_strtoint(), and move the error generation logic in common/string.c. - Add an error_ok flag for all the pg_strto[u]int{16|32|64} routines, which generates errors only for FRONTEND is not defined. - Use pg_log_error() for the frontend errors. If those errors are added everywhere, we would basically have no code paths in the frontend or the backend (for the unsigned part only) which use them yet. Another possibility would be to ignore the error_ok flag in those cases, but that's inconsistent. My take would be here to have a more generic error message, like that: "value \"%s\" is out of range for [unsigned] {16|32|64}-bit integer" "invalid input syntax for [unsigned] {16|32|64}-bit integer: \"%s\"\n" I do not suggest to change the messages for the backend for signed entries though, as these are linked to the data types used. Attached is an update of my toy module, and an updated patch with what I have done up to now. This stuff already does a lot, so for now I have not worked on the removal strtoint() in string.c yet. We could just do that with a follow-up patch and make it use the new conversion routines once we are sure that they are in a correct shape. As strtoint() makes use of strtol(), switching to the new routines will be much faster anyway... Fabien, any thoughts? -- Michael
Attachment
pgsql-hackers by date: