Thread: [PATCH] Small optimization across postgres (remove strlen duplicate usage)
Hi,
strlen it is one of the low fruits that can be harvested.
What is your opinion?
regards,
Ranier Vilela
Attachment
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Tomas Vondra
Date:
On Sun, Apr 19, 2020 at 11:24:38AM -0300, Ranier Vilela wrote: >Hi, >strlen it is one of the low fruits that can be harvested. >What is your opinion? > That assumes this actually affects/improves performance, without any measurements proving that. Considering large number of the places you modified are related to DDL (CreateComment, ChooseIndexColumnNames, ...) or stuff that runs only once or infrequently (like the changes in PostmasterMain or libpqrcv_get_senderinfo). Likewise, it seems entirely pointless to worry about strlen() overhead e.g. in fsync_parent_path which is probably dominated by I/O. Maybe there are places where this would help, but I don't see a reason to just throw away all strlen calls and replace them with something clearly less convenient and possibly more error-prone (I'd expect quite a few off-by-one mistakes with this). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Ranier Vilela
Date:
Em dom., 19 de abr. de 2020 às 16:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> escreveu:
On Sun, Apr 19, 2020 at 11:24:38AM -0300, Ranier Vilela wrote:
>Hi,
>strlen it is one of the low fruits that can be harvested.
>What is your opinion?
>
That assumes this actually affects/improves performance, without any
measurements proving that. Considering large number of the places you
modified are related to DDL (CreateComment, ChooseIndexColumnNames, ...)
or stuff that runs only once or infrequently (like the changes in
PostmasterMain or libpqrcv_get_senderinfo). Likewise, it seems entirely
pointless to worry about strlen() overhead e.g. in fsync_parent_path
which is probably dominated by I/O.
With code as interconnected as postgres, it is difficult to say that a function, which calls strlen, repeatedly, would not have any gain.
Regarding the functions, I was just being consistent, trying to remove all occurrences, even where, there is very little gain.
Maybe there are places where this would help, but I don't see a reason
to just throw away all strlen calls and replace them with something
clearly less convenient and possibly more error-prone (I'd expect quite
a few off-by-one mistakes with this).
Yes, always, it is prone to errors, but for the most part, they are safe changes.
It passes all 199 tests, of course it has not been tested in a real production environment.
Perhaps the time is not the best, the end of the cycle, but, once done, I believe it would be a good harvest.
It passes all 199 tests, of course it has not been tested in a real production environment.
Perhaps the time is not the best, the end of the cycle, but, once done, I believe it would be a good harvest.
Thank you for comment.
regards,
Ranier Vilela
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Tomas Vondra
Date:
On Sun, Apr 19, 2020 at 05:29:52PM -0300, Ranier Vilela wrote: >Em dom., 19 de abr. de 2020 às 16:33, Tomas Vondra < >tomas.vondra@2ndquadrant.com> escreveu: > >> On Sun, Apr 19, 2020 at 11:24:38AM -0300, Ranier Vilela wrote: >> >Hi, >> >strlen it is one of the low fruits that can be harvested. >> >What is your opinion? >> > >> >> That assumes this actually affects/improves performance, without any >> measurements proving that. Considering large number of the places you >> modified are related to DDL (CreateComment, ChooseIndexColumnNames, ...) >> or stuff that runs only once or infrequently (like the changes in >> PostmasterMain or libpqrcv_get_senderinfo). Likewise, it seems entirely >> pointless to worry about strlen() overhead e.g. in fsync_parent_path >> which is probably dominated by I/O. >> >With code as interconnected as postgres, it is difficult to say that a >function, which calls strlen, repeatedly, would not have any gain. >Regarding the functions, I was just being consistent, trying to remove all >occurrences, even where, there is very little gain. > That very much depends on the function, I think. For most places modified by this patch it's not that hard, I think. The DDL cases (comments and indexes) seem pretty clear. Similarly for the command parsing, wal receiver, lockfile creation, guc, exec.c, and so on. Perhaps the only places worth changing might be xml.c and spell.c, but I'm not convinced even these are worth it, really. > >> >> Maybe there are places where this would help, but I don't see a reason >> to just throw away all strlen calls and replace them with something >> clearly less convenient and possibly more error-prone (I'd expect quite >> a few off-by-one mistakes with this). >> >Yes, always, it is prone to errors, but for the most part, they are safe >changes. >It passes all 199 tests, of course it has not been tested in a real >production environment. >Perhaps the time is not the best, the end of the cycle, but, once done, I >believe it would be a good harvest. > I wasn't really worried about bugs in this patch, but rather in future changes made to this code. Off-by-one errors are trivial to make. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Andreas Karlsson
Date:
On 4/19/20 10:29 PM, Ranier Vilela wrote: > Em dom., 19 de abr. de 2020 às 16:33, Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> > escreveu: > > On Sun, Apr 19, 2020 at 11:24:38AM -0300, Ranier Vilela wrote: > >Hi, > >strlen it is one of the low fruits that can be harvested. > >What is your opinion? > > > > That assumes this actually affects/improves performance, without any > measurements proving that. Considering large number of the places you > modified are related to DDL (CreateComment, ChooseIndexColumnNames, ...) > or stuff that runs only once or infrequently (like the changes in > PostmasterMain or libpqrcv_get_senderinfo). Likewise, it seems entirely > pointless to worry about strlen() overhead e.g. in fsync_parent_path > which is probably dominated by I/O. > > With code as interconnected as postgres, it is difficult to say that a > function, which calls strlen, repeatedly, would not have any gain. > Regarding the functions, I was just being consistent, trying to remove > all occurrences, even where, there is very little gain. At least gcc 9.3 optimizes "strlen(s) == 0" to "s[0] == '\0'", even at low optimization levels. I tried it out with https://godbolt.org/. Maybe some of the others cases are performance improvements, I have not checked your patch in details, but strlen() == 0 is easily handled by the compiler. C code: int f1(char *str) { return strlen(str) == 0; } int f2(char *str) { return str[0] == '\0'; } Assembly generated with default flags: f1: pushq %rbp movq %rsp, %rbp movq %rdi, -8(%rbp) movq -8(%rbp), %rax movzbl (%rax), %eax testb %al, %al sete %al movzbl %al, %eax popq %rbp ret f2: pushq %rbp movq %rsp, %rbp movq %rdi, -8(%rbp) movq -8(%rbp), %rax movzbl (%rax), %eax testb %al, %al sete %al movzbl %al, %eax popq %rbp ret Assembly generated with -O2. f1: xorl %eax, %eax cmpb $0, (%rdi) sete %al ret f2: xorl %eax, %eax cmpb $0, (%rdi) sete %al ret Andreas
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Sun, Apr 19, 2020 at 11:24:38AM -0300, Ranier Vilela wrote: >> strlen it is one of the low fruits that can be harvested. > Maybe there are places where this would help, but I don't see a reason > to just throw away all strlen calls and replace them with something > clearly less convenient and possibly more error-prone (I'd expect quite > a few off-by-one mistakes with this). I've heard it claimed that modern compilers will optimize strlen('literal') to a constant at compile time. I'm not sure how much I believe that, but to the extent that it's true, replacing such calls would provide exactly no performance benefit. I'm quite -1 on changing these to sizeof(), in any case, because (a) that opens up room for confusion about whether the trailing nul is included, and (b) it makes it very easy, when changing or copy/pasting code, to apply sizeof to something that's not a string literal, with disastrous results. The cases where Ranier proposes to replace strlen(foo) == 0 with a test on foo[0] do seem like wins, though. Asking for the full string length to be computed is more computation than necessary, and it's less clear that the compiler could be expected to save you from that. Anyway there's a coding style proposition that we should be doing this consistently, and certainly lots of places do do this without using strlen(). I can't get excited about the proposed changes to optimize away multiple calls of strlen() either, unless there's performance measurements to support them individually. This again seems like something a compiler might do for you. regards, tom lane
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
David Rowley
Date:
On Mon, 20 Apr 2020 at 09:38, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The cases where Ranier proposes to replace strlen(foo) == 0 > with a test on foo[0] do seem like wins, though. Asking for > the full string length to be computed is more computation than > necessary, and it's less clear that the compiler could be > expected to save you from that. Anyway there's a coding style > proposition that we should be doing this consistently, and > certainly lots of places do do this without using strlen(). Looking at https://godbolt.org/z/6XsjbA it seems like GCC is pretty good at getting rid of the strlen call even at -O0. It takes -O1 for clang to use it and -O2 for icc. David
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Ranier Vilela
Date:
Em dom., 19 de abr. de 2020 às 19:00, David Rowley <dgrowleyml@gmail.com> escreveu:
On Mon, 20 Apr 2020 at 09:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The cases where Ranier proposes to replace strlen(foo) == 0
> with a test on foo[0] do seem like wins, though. Asking for
> the full string length to be computed is more computation than
> necessary, and it's less clear that the compiler could be
> expected to save you from that. Anyway there's a coding style
> proposition that we should be doing this consistently, and
> certainly lots of places do do this without using strlen().
Looking at https://godbolt.org/z/6XsjbA it seems like GCC is pretty
good at getting rid of the strlen call even at -O0. It takes -O1 for
clang to use it and -O2 for icc.
I tried: https://godbolt.org with:
-O2:
f1:
int main (int argv, char **argc)
{
return strlen(argc[0]) == 0;
}
f1: Assembly
main: # @main
mov rcx, qword ptr [rsi]
xor eax, eax
cmp byte ptr [rcx], 0
sete al
ret
f2:
int main (int argv, char **argc)
{
return argc[0] == '\0';
}
f2: Assembly
main: # @main
xor eax, eax
cmp qword ptr [rsi], 0
sete al
ret
xor eax, eax
cmp qword ptr [rsi], 0
sete al
ret
For me clearly str [0] == '\ 0', wins.
regards,
Ranier Vilela
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Ranier Vilela
Date:
Em dom., 19 de abr. de 2020 às 18:38, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Sun, Apr 19, 2020 at 11:24:38AM -0300, Ranier Vilela wrote:
>> strlen it is one of the low fruits that can be harvested.
> Maybe there are places where this would help, but I don't see a reason
> to just throw away all strlen calls and replace them with something
> clearly less convenient and possibly more error-prone (I'd expect quite
> a few off-by-one mistakes with this).
I've heard it claimed that modern compilers will optimize
strlen('literal') to a constant at compile time. I'm not sure how
much I believe that, but to the extent that it's true, replacing such
calls would provide exactly no performance benefit.
Tom, I wouldn't believe it very much, they still have a lot of stupid compilers out there (msvc for example).
Furthermore, optimizations are a complicated business, often the adjacent code does not allow for such optimizations.
When a programmer does, there is no doubt.
Furthermore, optimizations are a complicated business, often the adjacent code does not allow for such optimizations.
When a programmer does, there is no doubt.
I'm quite -1 on changing these to sizeof(), in any case, because
(a) that opens up room for confusion about whether the trailing nul is
included, and (b) it makes it very easy, when changing or copy/pasting
code, to apply sizeof to something that's not a string literal, with
disastrous results.
It may be true, but I have seen a lot of Postgres code, where sizeof is used extensively, even with real chances of what you said happened. So that risk already exists.
The cases where Ranier proposes to replace strlen(foo) == 0
with a test on foo[0] do seem like wins, though. Asking for
the full string length to be computed is more computation than
necessary, and it's less clear that the compiler could be
expected to save you from that. Anyway there's a coding style
proposition that we should be doing this consistently, and
certainly lots of places do do this without using strlen().
Yes, this is the idea.
I can't get excited about the proposed changes to optimize away
multiple calls of strlen() either, unless there's performance
measurements to support them individually. This again seems like
something a compiler might do for you.
Again, the compiler will not always save us.
I have seen many fanstatic solutions in Postgres, but I have also seen a lot of written code, forgive me, without caprice, without much care.
The idea is, little by little, to prevent carefree code, either written or left in Postgres.
I have seen many fanstatic solutions in Postgres, but I have also seen a lot of written code, forgive me, without caprice, without much care.
The idea is, little by little, to prevent carefree code, either written or left in Postgres.
You can see an example in that patch.
After a few calls the programmer validates the entry and leaves if it is bad. When it should be done before anything.
After a few calls the programmer validates the entry and leaves if it is bad. When it should be done before anything.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5bdc02fce2..a00cca2605 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -10699,10 +10699,15 @@ GUCArrayDelete(ArrayType *array, const char *name)
struct config_generic *record;
ArrayType *newarray;
int i;
+ int len;
int index;
Assert(name);
+ /* if array is currently null, then surely nothing to delete */
+ if (!array)
+ return NULL;
+
/* test if the option is valid and we're allowed to set it */
(void) validate_option_array_item(name, NULL, false);
@@ -10711,12 +10716,9 @@ GUCArrayDelete(ArrayType *array, const char *name)
if (record)
name = record->name;
- /* if array is currently null, then surely nothing to delete */
- if (!array)
- return NULL;
-
newarray = NULL;
index = 1;
+ len = strlen(name);
for (i = 1; i <= ARR_DIMS(array)[0]; i++)
{
@@ -10735,8 +10737,8 @@ GUCArrayDelete(ArrayType *array, const char *name)
val = TextDatumGetCString(d);
/* ignore entry if it's what we want to delete */
- if (strncmp(val, name, strlen(name)) == 0
- && val[strlen(name)] == '=')
+ if (strncmp(val, name, len) == 0
+ && val[len] == '=')
continue;
/* else add it to the output array */
index 5bdc02fce2..a00cca2605 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -10699,10 +10699,15 @@ GUCArrayDelete(ArrayType *array, const char *name)
struct config_generic *record;
ArrayType *newarray;
int i;
+ int len;
int index;
Assert(name);
+ /* if array is currently null, then surely nothing to delete */
+ if (!array)
+ return NULL;
+
/* test if the option is valid and we're allowed to set it */
(void) validate_option_array_item(name, NULL, false);
@@ -10711,12 +10716,9 @@ GUCArrayDelete(ArrayType *array, const char *name)
if (record)
name = record->name;
- /* if array is currently null, then surely nothing to delete */
- if (!array)
- return NULL;
-
newarray = NULL;
index = 1;
+ len = strlen(name);
for (i = 1; i <= ARR_DIMS(array)[0]; i++)
{
@@ -10735,8 +10737,8 @@ GUCArrayDelete(ArrayType *array, const char *name)
val = TextDatumGetCString(d);
/* ignore entry if it's what we want to delete */
- if (strncmp(val, name, strlen(name)) == 0
- && val[strlen(name)] == '=')
+ if (strncmp(val, name, len) == 0
+ && val[len] == '=')
continue;
/* else add it to the output array */
Ranier Vilela
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
David Rowley
Date:
On Mon, 20 Apr 2020 at 11:24, Ranier Vilela <ranier.vf@gmail.com> wrote: > I tried: https://godbolt.org with: > > -O2: > > f1: > int main (int argv, char **argc) > { > return strlen(argc[0]) == 0; > } > > f1: Assembly > main: # @main > mov rcx, qword ptr [rsi] > xor eax, eax > cmp byte ptr [rcx], 0 > sete al > ret > > f2: > int main (int argv, char **argc) > { > return argc[0] == '\0'; > } > > f2: Assembly > > main: # @main > xor eax, eax > cmp qword ptr [rsi], 0 > sete al > ret > > For me clearly str [0] == '\ 0', wins. I think you'd want to use argc[0][0] == '\0' or *argc[0] == '\0'. Otherwise you appear just to be checking if the first element in the argc pointer array is set to NULL, which is certainly not the same as an empty string. David
Re: [PATCH] Small optimization across postgres (remove strlenduplicate usage)
From
Ranier Vilela
Date:
Em dom., 19 de abr. de 2020 às 22:00, David Rowley <dgrowleyml@gmail.com> escreveu:
On Mon, 20 Apr 2020 at 11:24, Ranier Vilela <ranier.vf@gmail.com> wrote:
> I tried: https://godbolt.org with:
>
> -O2:
>
> f1:
> int main (int argv, char **argc)
> {
> return strlen(argc[0]) == 0;
> }
>
> f1: Assembly
> main: # @main
> mov rcx, qword ptr [rsi]
> xor eax, eax
> cmp byte ptr [rcx], 0
> sete al
> ret
>
> f2:
> int main (int argv, char **argc)
> {
> return argc[0] == '\0';
> }
>
> f2: Assembly
>
> main: # @main
> xor eax, eax
> cmp qword ptr [rsi], 0
> sete al
> ret
>
> For me clearly str [0] == '\ 0', wins.
I think you'd want to use argc[0][0] == '\0' or *argc[0] == '\0'.
Otherwise you appear just to be checking if the first element in the
argc pointer array is set to NULL, which is certainly not the same as
an empty string.
I guess you're right.
x86-64 clang (trunk) -O2
f1:
int cmp(const char * name)
{
return strlen(name) == 0;
}
cmp: # @cmp
xor eax, eax
cmp byte ptr [rdi], 0
sete al
ret
f2:
int cmp(const char * name)
{
return name[0] == '\0';
}
cmp: # @cmp
xor eax, eax
cmp byte ptr [rdi], 0
sete al
ret
int cmp(const char * name)
{
return strlen(name) == 0;
}
cmp: # @cmp
xor eax, eax
cmp byte ptr [rdi], 0
sete al
ret
f2:
int cmp(const char * name)
{
return name[0] == '\0';
}
cmp: # @cmp
xor eax, eax
cmp byte ptr [rdi], 0
sete al
ret
Is the same result in assembly.
Well, it doesn't matter to me, I will continue to use str[0] == '\0'.
Thanks for take part.
regards,
Ranier VIlela