Thread: Proposed patch to clean up signed-ness warnings

Proposed patch to clean up signed-ness warnings

From
Tom Lane
Date:
With gcc 4 spreading, it seems like it's past time to do something about
all those signed-vs-unsigned-char warnings that it emits.  (Translation:
now that I have to use gcc 4 regularly, I got annoyed enough to fix it
;-))

I looked into it a little and determined that nearly all the warnings
were associated with the multibyte code.  Outside the mb subsystem,
our code pretty much uses "char *" for strings, but inside mb it's
mostly "unsigned char *", which is needed because there are lots of
inequality comparisons in there.  It seemed to me that the cleanest
fix was to change the external API of the mb subsystem to take and
return "char *", while still using "unsigned char *" internally.
The attached patch eliminates all signed-ness warnings in CVS tip
using this approach.  It's kinda long and tedious, but straightforward,
and quite a lot of the changes simplify existing code by removing
casts that aren't needed anymore.

Two questions for the list:

1. Can anyone think of a cleaner way to do this?

2. Is there objection to applying this patch now (ie, before beta3)?
It's not quite a bug fix, but I think it'll make it easier to find
bugs going forward.

            regards, tom lane


Attachment

Re: Proposed patch to clean up signed-ness warnings

From
Tatsuo Ishii
Date:
> With gcc 4 spreading, it seems like it's past time to do something about
> all those signed-vs-unsigned-char warnings that it emits.  (Translation:
> now that I have to use gcc 4 regularly, I got annoyed enough to fix it
> ;-))
>
> I looked into it a little and determined that nearly all the warnings
> were associated with the multibyte code.  Outside the mb subsystem,
> our code pretty much uses "char *" for strings, but inside mb it's
> mostly "unsigned char *", which is needed because there are lots of
> inequality comparisons in there.  It seemed to me that the cleanest
> fix was to change the external API of the mb subsystem to take and
> return "char *", while still using "unsigned char *" internally.
> The attached patch eliminates all signed-ness warnings in CVS tip
> using this approach.  It's kinda long and tedious, but straightforward,
> and quite a lot of the changes simplify existing code by removing
> casts that aren't needed anymore.
>
> Two questions for the list:
>
> 1. Can anyone think of a cleaner way to do this?
>
> 2. Is there objection to applying this patch now (ie, before beta3)?
> It's not quite a bug fix, but I think it'll make it easier to find
> bugs going forward.

For me, your patche seems to be a retrogression. In my understanding,
the reason why PostgreSQL uses "char *" in many places is just it was
designed in the old days when ASCII was the only charset in the world.
--
SRA OSS, Inc. Japan
Tatsuo Ishii

Re: Proposed patch to clean up signed-ness warnings

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> 1. Can anyone think of a cleaner way to do this?

> For me, your patche seems to be a retrogression. In my understanding,
> the reason why PostgreSQL uses "char *" in many places is just it was
> designed in the old days when ASCII was the only charset in the world.

Are you proposing that we change all the "char *" to "unsigned char *"?
I looked at that briefly but it seems like a huge loss, both in
notational ugliness and in the amount of code that would have to be
touched.  Also, it would force us to add a bunch of explicit casts to
avoid warnings with standard library functions like strlen().  To me the
bottom line is that 99% of the code only needs to know that a character
string is a character string.  As this patch demonstrates, there is only
a tiny fraction that needs to have the "unsigned" declaration.  I don't
think we should allow that fraction to dictate a notational burden for
all the rest.

            regards, tom lane

Re: [HACKERS] Proposed patch to clean up signed-ness warnings

From
Tatsuo Ishii
Date:
> > For me, your patche seems to be a retrogression. In my understanding,
> > the reason why PostgreSQL uses "char *" in many places is just it was
> > designed in the old days when ASCII was the only charset in the world.
>
> Are you proposing that we change all the "char *" to "unsigned char *"?

No, I suggest we change all "char *" to "unsigned char *" only where
it points a string which could hold non ASCII character strings. I
thought we learned the danger of 1) comparing chars with signed bit
on, 2) passing chars with sign bit on to functions which expect int
etc...

> I looked at that briefly but it seems like a huge loss, both in
> notational ugliness and in the amount of code that would have to be
> touched.

If you are just care the amount of effort, why don't you leave as it
is and use pre v4 gcc? :-)

>  Also, it would force us to add a bunch of explicit casts to
> avoid warnings with standard library functions like strlen().

Counter examples could be easily found in isalpha(), toupper() etc.

> To me the
> bottom line is that 99% of the code only needs to know that a character
> string is a character string.  As this patch demonstrates, there is only
> a tiny fraction that needs to have the "unsigned" declaration.  I don't
> think we should allow that fraction to dictate a notational burden for
> all the rest.

To support multiple charsets/collataions, I think we need to change
the way to represent character strings from the unstructured "char *"
to more intelligent structure (I know it's hard to implement that
without significant performance loss, but I know we should do it in
the future).

So "unsigned char*" is not enough for the goal anyway, I'm not against
your patches.
--
SRA OSS, Inc. Japan
Tatsuo Ishii

Re: [HACKERS] Proposed patch to clean up signed-ness warnings

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Are you proposing that we change all the "char *" to "unsigned char *"?

> No, I suggest we change all "char *" to "unsigned char *" only where
> it points a string which could hold non ASCII character strings.

Which is pretty nearly all of them...

> To support multiple charsets/collataions, I think we need to change
> the way to represent character strings from the unstructured "char *"
> to more intelligent structure (I know it's hard to implement that
> without significant performance loss, but I know we should do it in
> the future).

Yeah, it's still not clear where we are going to end up, but in the
meantime we've got a lot of warnings cluttering the code and making
it hard to spot real problems.

> So "unsigned char*" is not enough for the goal anyway, I'm not against
> your patches.

OK.  No one else objected, so I'll go ahead and apply before the code
drifts to the point of breaking the patch.

            regards, tom lane