Thread: Re: Proposed patch to clean up signed-ness warnings
> With gcc 4 spreading, it seems like it's past time to do something about > all those signed-vs-unsigned-char warnings that it emits. (Translation: > now that I have to use gcc 4 regularly, I got annoyed enough to fix it > ;-)) > > I looked into it a little and determined that nearly all the warnings > were associated with the multibyte code. Outside the mb subsystem, > our code pretty much uses "char *" for strings, but inside mb it's > mostly "unsigned char *", which is needed because there are lots of > inequality comparisons in there. It seemed to me that the cleanest > fix was to change the external API of the mb subsystem to take and > return "char *", while still using "unsigned char *" internally. > The attached patch eliminates all signed-ness warnings in CVS tip > using this approach. It's kinda long and tedious, but straightforward, > and quite a lot of the changes simplify existing code by removing > casts that aren't needed anymore. > > Two questions for the list: > > 1. Can anyone think of a cleaner way to do this? > > 2. Is there objection to applying this patch now (ie, before beta3)? > It's not quite a bug fix, but I think it'll make it easier to find > bugs going forward. For me, your patche seems to be a retrogression. In my understanding, the reason why PostgreSQL uses "char *" in many places is just it was designed in the old days when ASCII was the only charset in the world. -- SRA OSS, Inc. Japan Tatsuo Ishii
Tatsuo Ishii <ishii@sraoss.co.jp> writes: >> 1. Can anyone think of a cleaner way to do this? > For me, your patche seems to be a retrogression. In my understanding, > the reason why PostgreSQL uses "char *" in many places is just it was > designed in the old days when ASCII was the only charset in the world. Are you proposing that we change all the "char *" to "unsigned char *"? I looked at that briefly but it seems like a huge loss, both in notational ugliness and in the amount of code that would have to be touched. Also, it would force us to add a bunch of explicit casts to avoid warnings with standard library functions like strlen(). To me the bottom line is that 99% of the code only needs to know that a character string is a character string. As this patch demonstrates, there is only a tiny fraction that needs to have the "unsigned" declaration. I don't think we should allow that fraction to dictate a notational burden for all the rest. regards, tom lane
> > For me, your patche seems to be a retrogression. In my understanding, > > the reason why PostgreSQL uses "char *" in many places is just it was > > designed in the old days when ASCII was the only charset in the world. > > Are you proposing that we change all the "char *" to "unsigned char *"? No, I suggest we change all "char *" to "unsigned char *" only where it points a string which could hold non ASCII character strings. I thought we learned the danger of 1) comparing chars with signed bit on, 2) passing chars with sign bit on to functions which expect int etc... > I looked at that briefly but it seems like a huge loss, both in > notational ugliness and in the amount of code that would have to be > touched. If you are just care the amount of effort, why don't you leave as it is and use pre v4 gcc? :-) > Also, it would force us to add a bunch of explicit casts to > avoid warnings with standard library functions like strlen(). Counter examples could be easily found in isalpha(), toupper() etc. > To me the > bottom line is that 99% of the code only needs to know that a character > string is a character string. As this patch demonstrates, there is only > a tiny fraction that needs to have the "unsigned" declaration. I don't > think we should allow that fraction to dictate a notational burden for > all the rest. To support multiple charsets/collataions, I think we need to change the way to represent character strings from the unstructured "char *" to more intelligent structure (I know it's hard to implement that without significant performance loss, but I know we should do it in the future). So "unsigned char*" is not enough for the goal anyway, I'm not against your patches. -- SRA OSS, Inc. Japan Tatsuo Ishii
Tatsuo Ishii <ishii@sraoss.co.jp> writes: >> Are you proposing that we change all the "char *" to "unsigned char *"? > No, I suggest we change all "char *" to "unsigned char *" only where > it points a string which could hold non ASCII character strings. Which is pretty nearly all of them... > To support multiple charsets/collataions, I think we need to change > the way to represent character strings from the unstructured "char *" > to more intelligent structure (I know it's hard to implement that > without significant performance loss, but I know we should do it in > the future). Yeah, it's still not clear where we are going to end up, but in the meantime we've got a lot of warnings cluttering the code and making it hard to spot real problems. > So "unsigned char*" is not enough for the goal anyway, I'm not against > your patches. OK. No one else objected, so I'll go ahead and apply before the code drifts to the point of breaking the patch. regards, tom lane