Thread: Proposed patch to clean up signed-ness warnings
With gcc 4 spreading, it seems like it's past time to do something about all those signed-vs-unsigned-char warnings that it emits. (Translation: now that I have to use gcc 4 regularly, I got annoyed enough to fix it ;-)) I looked into it a little and determined that nearly all the warnings were associated with the multibyte code. Outside the mb subsystem, our code pretty much uses "char *" for strings, but inside mb it's mostly "unsigned char *", which is needed because there are lots of inequality comparisons in there. It seemed to me that the cleanest fix was to change the external API of the mb subsystem to take and return "char *", while still using "unsigned char *" internally. The attached patch eliminates all signed-ness warnings in CVS tip using this approach. It's kinda long and tedious, but straightforward, and quite a lot of the changes simplify existing code by removing casts that aren't needed anymore. Two questions for the list: 1. Can anyone think of a cleaner way to do this? 2. Is there objection to applying this patch now (ie, before beta3)? It's not quite a bug fix, but I think it'll make it easier to find bugs going forward. regards, tom lane
Attachment
> With gcc 4 spreading, it seems like it's past time to do something about > all those signed-vs-unsigned-char warnings that it emits. (Translation: > now that I have to use gcc 4 regularly, I got annoyed enough to fix it > ;-)) > > I looked into it a little and determined that nearly all the warnings > were associated with the multibyte code. Outside the mb subsystem, > our code pretty much uses "char *" for strings, but inside mb it's > mostly "unsigned char *", which is needed because there are lots of > inequality comparisons in there. It seemed to me that the cleanest > fix was to change the external API of the mb subsystem to take and > return "char *", while still using "unsigned char *" internally. > The attached patch eliminates all signed-ness warnings in CVS tip > using this approach. It's kinda long and tedious, but straightforward, > and quite a lot of the changes simplify existing code by removing > casts that aren't needed anymore. > > Two questions for the list: > > 1. Can anyone think of a cleaner way to do this? > > 2. Is there objection to applying this patch now (ie, before beta3)? > It's not quite a bug fix, but I think it'll make it easier to find > bugs going forward. For me, your patche seems to be a retrogression. In my understanding, the reason why PostgreSQL uses "char *" in many places is just it was designed in the old days when ASCII was the only charset in the world. -- SRA OSS, Inc. Japan Tatsuo Ishii
Tatsuo Ishii <ishii@sraoss.co.jp> writes: >> 1. Can anyone think of a cleaner way to do this? > For me, your patche seems to be a retrogression. In my understanding, > the reason why PostgreSQL uses "char *" in many places is just it was > designed in the old days when ASCII was the only charset in the world. Are you proposing that we change all the "char *" to "unsigned char *"? I looked at that briefly but it seems like a huge loss, both in notational ugliness and in the amount of code that would have to be touched. Also, it would force us to add a bunch of explicit casts to avoid warnings with standard library functions like strlen(). To me the bottom line is that 99% of the code only needs to know that a character string is a character string. As this patch demonstrates, there is only a tiny fraction that needs to have the "unsigned" declaration. I don't think we should allow that fraction to dictate a notational burden for all the rest. regards, tom lane
> > For me, your patche seems to be a retrogression. In my understanding, > > the reason why PostgreSQL uses "char *" in many places is just it was > > designed in the old days when ASCII was the only charset in the world. > > Are you proposing that we change all the "char *" to "unsigned char *"? No, I suggest we change all "char *" to "unsigned char *" only where it points a string which could hold non ASCII character strings. I thought we learned the danger of 1) comparing chars with signed bit on, 2) passing chars with sign bit on to functions which expect int etc... > I looked at that briefly but it seems like a huge loss, both in > notational ugliness and in the amount of code that would have to be > touched. If you are just care the amount of effort, why don't you leave as it is and use pre v4 gcc? :-) > Also, it would force us to add a bunch of explicit casts to > avoid warnings with standard library functions like strlen(). Counter examples could be easily found in isalpha(), toupper() etc. > To me the > bottom line is that 99% of the code only needs to know that a character > string is a character string. As this patch demonstrates, there is only > a tiny fraction that needs to have the "unsigned" declaration. I don't > think we should allow that fraction to dictate a notational burden for > all the rest. To support multiple charsets/collataions, I think we need to change the way to represent character strings from the unstructured "char *" to more intelligent structure (I know it's hard to implement that without significant performance loss, but I know we should do it in the future). So "unsigned char*" is not enough for the goal anyway, I'm not against your patches. -- SRA OSS, Inc. Japan Tatsuo Ishii
Tatsuo Ishii <ishii@sraoss.co.jp> writes: >> Are you proposing that we change all the "char *" to "unsigned char *"? > No, I suggest we change all "char *" to "unsigned char *" only where > it points a string which could hold non ASCII character strings. Which is pretty nearly all of them... > To support multiple charsets/collataions, I think we need to change > the way to represent character strings from the unstructured "char *" > to more intelligent structure (I know it's hard to implement that > without significant performance loss, but I know we should do it in > the future). Yeah, it's still not clear where we are going to end up, but in the meantime we've got a lot of warnings cluttering the code and making it hard to spot real problems. > So "unsigned char*" is not enough for the goal anyway, I'm not against > your patches. OK. No one else objected, so I'll go ahead and apply before the code drifts to the point of breaking the patch. regards, tom lane