Thread: Re: Proposed patch to clean up signed-ness warnings

Re: Proposed patch to clean up signed-ness warnings

From
Tatsuo Ishii
Date:
> With gcc 4 spreading, it seems like it's past time to do something about
> all those signed-vs-unsigned-char warnings that it emits.  (Translation:
> now that I have to use gcc 4 regularly, I got annoyed enough to fix it
> ;-))
>
> I looked into it a little and determined that nearly all the warnings
> were associated with the multibyte code.  Outside the mb subsystem,
> our code pretty much uses "char *" for strings, but inside mb it's
> mostly "unsigned char *", which is needed because there are lots of
> inequality comparisons in there.  It seemed to me that the cleanest
> fix was to change the external API of the mb subsystem to take and
> return "char *", while still using "unsigned char *" internally.
> The attached patch eliminates all signed-ness warnings in CVS tip
> using this approach.  It's kinda long and tedious, but straightforward,
> and quite a lot of the changes simplify existing code by removing
> casts that aren't needed anymore.
>
> Two questions for the list:
>
> 1. Can anyone think of a cleaner way to do this?
>
> 2. Is there objection to applying this patch now (ie, before beta3)?
> It's not quite a bug fix, but I think it'll make it easier to find
> bugs going forward.

For me, your patche seems to be a retrogression. In my understanding,
the reason why PostgreSQL uses "char *" in many places is just it was
designed in the old days when ASCII was the only charset in the world.
--
SRA OSS, Inc. Japan
Tatsuo Ishii

Re: Proposed patch to clean up signed-ness warnings

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> 1. Can anyone think of a cleaner way to do this?

> For me, your patche seems to be a retrogression. In my understanding,
> the reason why PostgreSQL uses "char *" in many places is just it was
> designed in the old days when ASCII was the only charset in the world.

Are you proposing that we change all the "char *" to "unsigned char *"?
I looked at that briefly but it seems like a huge loss, both in
notational ugliness and in the amount of code that would have to be
touched.  Also, it would force us to add a bunch of explicit casts to
avoid warnings with standard library functions like strlen().  To me the
bottom line is that 99% of the code only needs to know that a character
string is a character string.  As this patch demonstrates, there is only
a tiny fraction that needs to have the "unsigned" declaration.  I don't
think we should allow that fraction to dictate a notational burden for
all the rest.

            regards, tom lane

Re: Proposed patch to clean up signed-ness warnings

From
Tatsuo Ishii
Date:
> > For me, your patche seems to be a retrogression. In my understanding,
> > the reason why PostgreSQL uses "char *" in many places is just it was
> > designed in the old days when ASCII was the only charset in the world.
>
> Are you proposing that we change all the "char *" to "unsigned char *"?

No, I suggest we change all "char *" to "unsigned char *" only where
it points a string which could hold non ASCII character strings. I
thought we learned the danger of 1) comparing chars with signed bit
on, 2) passing chars with sign bit on to functions which expect int
etc...

> I looked at that briefly but it seems like a huge loss, both in
> notational ugliness and in the amount of code that would have to be
> touched.

If you are just care the amount of effort, why don't you leave as it
is and use pre v4 gcc? :-)

>  Also, it would force us to add a bunch of explicit casts to
> avoid warnings with standard library functions like strlen().

Counter examples could be easily found in isalpha(), toupper() etc.

> To me the
> bottom line is that 99% of the code only needs to know that a character
> string is a character string.  As this patch demonstrates, there is only
> a tiny fraction that needs to have the "unsigned" declaration.  I don't
> think we should allow that fraction to dictate a notational burden for
> all the rest.

To support multiple charsets/collataions, I think we need to change
the way to represent character strings from the unstructured "char *"
to more intelligent structure (I know it's hard to implement that
without significant performance loss, but I know we should do it in
the future).

So "unsigned char*" is not enough for the goal anyway, I'm not against
your patches.
--
SRA OSS, Inc. Japan
Tatsuo Ishii

Re: Proposed patch to clean up signed-ness warnings

From
Tom Lane
Date:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
>> Are you proposing that we change all the "char *" to "unsigned char *"?

> No, I suggest we change all "char *" to "unsigned char *" only where
> it points a string which could hold non ASCII character strings.

Which is pretty nearly all of them...

> To support multiple charsets/collataions, I think we need to change
> the way to represent character strings from the unstructured "char *"
> to more intelligent structure (I know it's hard to implement that
> without significant performance loss, but I know we should do it in
> the future).

Yeah, it's still not clear where we are going to end up, but in the
meantime we've got a lot of warnings cluttering the code and making
it hard to spot real problems.

> So "unsigned char*" is not enough for the goal anyway, I'm not against
> your patches.

OK.  No one else objected, so I'll go ahead and apply before the code
drifts to the point of breaking the patch.

            regards, tom lane