Thread: Portability concerns over pq_sendbyte?
In PG11, pq_sendbyte got changed from taking an int parameter to taking an int8. While that seems to work in general, it does mean that there are now several places in the code that do the equivalent of: unsigned char x = 128; pq_sendbyte(&buf, x); which I believe is not well-defined since pq_sendbyte takes an int8, and conversions of unrepresentable values to _signed_ integer types are (iirc) implementation-dependent. There are also some cases where pq_sendint16 is being called for an unsigned value or a value that might exceed 32767. Would it be better for these to take unsigned values, or have unsigned variants? -- Andrew (irc:RhodiumToad)
On Thu, May 24, 2018 at 06:13:23PM +0100, Andrew Gierth wrote: > In PG11, pq_sendbyte got changed from taking an int parameter to taking > an int8. From 1de09ad8. > While that seems to work in general, it does mean that there are now > several places in the code that do the equivalent of: > > unsigned char x = 128; > pq_sendbyte(&buf, x); > > which I believe is not well-defined since pq_sendbyte takes an int8, and > conversions of unrepresentable values to _signed_ integer types are > (iirc) implementation-dependent. Good point. > There are also some cases where pq_sendint16 is being called for an > unsigned value or a value that might exceed 32767. If kept, some safeguards based on PG_INT*_[MIN|MAX] could be appropriate. > Would it be better for these to take unsigned values, or have unsigned > variants? It seems to me that it is not too late to change those interfaces so as they use unsigned values in input so as they are consistent, it would be dangerous on the contrary to keep those as they are. -- Michael
Attachment
On 2018-May-24, Andrew Gierth wrote: > In PG11, pq_sendbyte got changed from taking an int parameter to taking > an int8. > Would it be better for these to take unsigned values, or have unsigned > variants? Do you have an answer to this question? Does anybody else? (My guts tell me it'd be better to change these routines to take unsigned values, without creating extra variants. But guts frequently misspeak.) Andres, you own this open item, unless one of the other committers who have participated in this thread would like to take it as their own. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 06, 2018 at 12:27:58PM -0400, Alvaro Herrera wrote: > Do you have an answer to this question? Does anybody else? > > (My guts tell me it'd be better to change these routines to take > unsigned values, without creating extra variants. But guts frequently > misspeak.) My guts are telling me as well to not have more variants. On top of that it seems to me that we'd want to rename any new routines to include "uint" in their name instead of "int", and for compatibility with past code pq_sendint should not be touched. -- Michael
Attachment
On Mon, Jun 11, 2018 at 02:25:44PM +0900, Michael Paquier wrote: > On Wed, Jun 06, 2018 at 12:27:58PM -0400, Alvaro Herrera wrote: >> Do you have an answer to this question? Does anybody else? >> >> (My guts tell me it'd be better to change these routines to take >> unsigned values, without creating extra variants. But guts frequently >> misspeak.) > > My guts are telling me as well to not have more variants. On top of > that it seems to me that we'd want to rename any new routines to include > "uint" in their name instead of "int", and for compatibility with past > code pq_sendint should not be touched. And also pq_sendint64 needs to be kept around for compatibility. I have quickly looked at how much code would be involved here and there are quite close to 240 code paths which involve the new routines. Please see attached for reference, I have not put much thoughts into it to be honest, so that's really at an early stage. -- Michael
Attachment
Hi, On 2018-05-24 18:13:23 +0100, Andrew Gierth wrote: > In PG11, pq_sendbyte got changed from taking an int parameter to taking > an int8. > > While that seems to work in general, it does mean that there are now > several places in the code that do the equivalent of: > > unsigned char x = 128; > pq_sendbyte(&buf, x); > > which I believe is not well-defined since pq_sendbyte takes an int8, and > conversions of unrepresentable values to _signed_ integer types are > (iirc) implementation-dependent. It's not implementation defined in postgres' dialect of C - we rely on accurate signed->unsigned conversions in a number of places. But I doin't think we should increase that reliance, so I think you're right we should do something about this. > There are also some cases where pq_sendint16 is being called for an > unsigned value or a value that might exceed 32767. Hm, which case were you thinking of here? The calls usually are exactly the types that the wire protocol expects, no? > Would it be better for these to take unsigned values, or have unsigned > variants? I wonder if we should just take 'int' out of the name. Say, pg_send{8,16,32,64}(unsigned ...). Greetings, Andres Freund
On 2018-06-13 14:10:37 +0900, Michael Paquier wrote: > On Mon, Jun 11, 2018 at 02:25:44PM +0900, Michael Paquier wrote: > > On Wed, Jun 06, 2018 at 12:27:58PM -0400, Alvaro Herrera wrote: > >> Do you have an answer to this question? Does anybody else? > >> > >> (My guts tell me it'd be better to change these routines to take > >> unsigned values, without creating extra variants. But guts frequently > >> misspeak.) > > > > My guts are telling me as well to not have more variants. Agreed. > > On top of that it seems to me that we'd want to rename any new > > routines to include "uint" in their name instead of "int", and for > > compatibility with past code pq_sendint should not be touched. I'm very doubtful about this one, unless you mean that just the signature shouldn't be touched. Otherwise we'll just increase code duplication unnecessarily? > And also pq_sendint64 needs to be kept around for compatibility. :(. Wonder if it's better to just break people's code. Greetings, Andres Freund
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: >> unsigned char x = 128; >> pq_sendbyte(&buf, x); >> >> which I believe is not well-defined since pq_sendbyte takes an int8, >> and conversions of unrepresentable values to _signed_ integer types >> are (iirc) implementation-dependent. Andres> It's not implementation defined in postgres' dialect of C - we Andres> rely on accurate signed->unsigned conversions in a number of Andres> places. Converting signed integer to unsigned is ok as I understand it - what's happening here is the reverse, converting an unrepresentable unsigned value to a signed type. >> There are also some cases where pq_sendint16 is being called for an >> unsigned value or a value that might exceed 32767. Andres> Hm, which case were you thinking of here? The calls usually are Andres> exactly the types that the wire protocol expects, no? There are cases where it's not actually clear what the wire protocol expects - I'm thinking in particular of the number of entries in a list of parameter types/formats. -- Andrew (irc:RhodiumToad)
Hi, On 2018-06-13 22:02:13 +0100, Andrew Gierth wrote: > >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: > > >> unsigned char x = 128; > >> pq_sendbyte(&buf, x); > >> > >> which I believe is not well-defined since pq_sendbyte takes an int8, > >> and conversions of unrepresentable values to _signed_ integer types > >> are (iirc) implementation-dependent. > > Andres> It's not implementation defined in postgres' dialect of C - we > Andres> rely on accurate signed->unsigned conversions in a number of > Andres> places. > > Converting signed integer to unsigned is ok as I understand it - what's > happening here is the reverse, converting an unrepresentable unsigned > value to a signed type. Err, yes, I was thinking about that conversion. Sorry for the confusion. > >> There are also some cases where pq_sendint16 is being called for an > >> unsigned value or a value that might exceed 32767. > > Andres> Hm, which case were you thinking of here? The calls usually are > Andres> exactly the types that the wire protocol expects, no? > > There are cases where it's not actually clear what the wire protocol > expects - I'm thinking in particular of the number of entries in a list > of parameter types/formats. But that didn't change around the pq_send* changes? So I'm not sure I understand how this is related? I mean I'm all for documenting the wire protocol more extensively, but we can't just change the width? Greetings, Andres Freund
On Wed, Jun 13, 2018 at 11:53:21AM -0700, Andres Freund wrote: > On 2018-06-13 14:10:37 +0900, Michael Paquier wrote: >> On Mon, Jun 11, 2018 at 02:25:44PM +0900, Michael Paquier wrote: >>> On top of that it seems to me that we'd want to rename any new >>> routines to include "uint" in their name instead of "int", and for >>> compatibility with past code pq_sendint should not be touched. > > I'm very doubtful about this one, unless you mean that just the > signature shouldn't be touched. Otherwise we'll just increase code > duplication unnecessarily? Yeah, actually that would be assuming that many modules use it, but that does not seem to be much the case, at least from github's point of view. >> And also pq_sendint64 needs to be kept around for compatibility. > > :(. Wonder if it's better to just break people's code. Indeed. At least breaking compilation has the advantage of making people directly aware of the change and think hopefully about them. A research on github shows a bunch of people having copied of pqformat.h as there are a bunch of copies of Postgres so with this much noise it is not easy to find out what would be broken. In-core contrib and test modules don't make use of those interfaces as well, except for hstore. So that could be acceptable. For pq_sendint there are many matches with printsimple.c. -- Michael
Attachment
Hello How about not renaming the functions, but just change argument types? Having to change all callsites to cope with some new naming convention does not strike me as a great idea ... it may collide with any backpatching in the area, for one. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > How about not renaming the functions, but just change argument types? Yeah, I didn't understand why anything else would be on the table. We already changed their arg types for 11, no? This is just second thoughts about what to change them to. regards, tom lane
Hi, On 2018-06-14 16:17:28 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > How about not renaming the functions, but just change argument types? Yea, I'm in favor of this. I don't think the 'u' in there would benefit us, and the cast from signed to unsigned is well defined, so it's safe to call the functions with signed input. > Yeah, I didn't understand why anything else would be on the table. Because it sounds a bit weird for a function with just 'int' in the name to take unsigned ints as a parameter? I don't think that's enough justification to rename everything, but it's not completely crazy either. > We already changed their arg types for 11, no? No, not really. We added new functions. Greetings, Andres Freund
Hi, On 2018-06-14 13:25:30 -0700, Andres Freund wrote: > On 2018-06-14 16:17:28 -0400, Tom Lane wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > > How about not renaming the functions, but just change argument types? > > Yea, I'm in favor of this. I don't think the 'u' in there would benefit > us, and the cast from signed to unsigned is well defined, so it's safe > to call the functions with signed input. Nobody argued against, thus I've pushed a patch doing so. Looking at surrounding code I found a few more oddities, but of older vintage: - pq_sendfloat4 uses an uint32 in the union, but float8 uses a int64. - same with pq_getmsgfloat[48] - pq_getmsgint64 returns a int64, should probably also be uint64 Given they're practially harmless I'm inclined to only fix them in master? Greetings, Andres Freund
On Tue, Jun 26, 2018 at 11:51:49PM -0700, Andres Freund wrote: > Looking at surrounding code I found a few more oddities, but of older > vintage: > - pq_sendfloat4 uses an uint32 in the union, but float8 uses a int64. > - same with pq_getmsgfloat[48] > - pq_getmsgint64 returns a int64, should probably also be uint64 > > Given they're practially harmless I'm inclined to only fix them in > master? Doing those three things on HEAD only looks like a good plan to me. I have not spotted more inconsistencies in pqformat.c. -- Michael