Thread: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Mikko Tiihonen
Date:
Hi, As discussed few days ago it would be beneficial if we could change the v3 backend<->client protocol without breaking backwardscompatibility. Here is a working patch that exports a supported_binary_minor constant and a binary_minor session variable and a that canbe used by clients to enable newer features. I also added an example usage where the array encoding for constant size elements is optimized if binary_minor version isnew enough. I have coded the client side support for binary_minor for jdbc driver and tested that it worked. But lets first discuss ifthis is an acceptable path forward. On 11/25/2011 02:20 AM, Oliver Jowett wrote: > Re list vs. always-incrementing minor version, you could just use an > integer and set bits to represent features, which would keep it simple > but also let clients be more selective about which features they > implement (you could support feature 21 and 23 without supporting 22) I decided not to use a feature flag because when features start to depend on each other we need multiple negotiation roundtrips until the final feature set can be known. If in your example above the feature 23 depends on server side on feature 22, but the client only requests 21,23. The servermust inform back that combination 21,23 is reduced to 21. And if then the client can not support 21 without 23 the final feature set will not contain 21 or23. -Mikko
Attachment
Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Robert Haas
Date:
On Mon, Nov 28, 2011 at 10:18 AM, Mikko Tiihonen <mikko.tiihonen@nitorcreations.com> wrote: > Here is a working patch that exports a supported_binary_minor constant and a > binary_minor session variable and a that can be used by clients to enable > newer features. Can you add this patch here so we don't lose track of it? https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Merlin Moncure
Date:
On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen <mikko.tiihonen@nitorcreations.com> wrote: > Hi, > > As discussed few days ago it would be beneficial if we could change the v3 > backend<->client protocol without breaking backwards compatibility. > > Here is a working patch that exports a supported_binary_minor constant and a > binary_minor session variable and a that can be used by clients to enable > newer features. > > I also added an example usage where the array encoding for constant size > elements is optimized if binary_minor version is new enough. > > I have coded the client side support for binary_minor for jdbc driver and > tested that it worked. But lets first discuss if this is an acceptable path > forward. > > On 11/25/2011 02:20 AM, Oliver Jowett wrote: >> Re list vs. always-incrementing minor version, you could just use an >> integer and set bits to represent features, which would keep it simple >> but also let clients be more selective about which features they >> implement (you could support feature 21 and 23 without supporting 22) > > I decided not to use a feature flag because when features start to depend on > each other we need multiple negotiation round trips until the final feature > set can be known. > If in your example above the feature 23 depends on server side on feature > 22, but the client only requests 21,23. The server must inform back that > combination 21,23 is reduced to 21. And if then the client can not support > 21 without 23 the final feature set will not contain 21 or 23. Regarding your TODO in the code comments about interactions with replication: I think it should be removed. WAL streaming depends on more things being identical than what is guaranteed here which is basically the protocol + data formats. I think all references to 'protocol' should be removed; Binary wire formats != protocol: the protocol could bump to v4 but the wire formats (by happenstance) could all still continue to work -- therefore the version isn't minor (it's not dependent on protocol version but only on itself). Therefore, don't much like the name 'supported_binary_minor'. How about binary_format_version? Also, is a non granular approach really buying us anything here? ISTM *something* is likely to change format on most releases of the server so I'm wondering what's the point (if you don't happen to be on the same x.x release of the server, you might as well assume to not match or at least 'go on your own risk). The value added to the client version query is quite low. merlin
Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Mikko Tiihonen
Date:
On 11/30/2011 06:52 PM, Merlin Moncure wrote: > On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen > <mikko.tiihonen@nitorcreations.com> wrote: >> Hi, >> >> As discussed few days ago it would be beneficial if we could change the v3 >> backend<->client protocol without breaking backwards compatibility. >> >> Here is a working patch that exports a supported_binary_minor constant and a >> binary_minor session variable and a that can be used by clients to enable >> newer features. >> >> I also added an example usage where the array encoding for constant size >> elements is optimized if binary_minor version is new enough. >> >> I have coded the client side support for binary_minor for jdbc driver and >> tested that it worked. But lets first discuss if this is an acceptable path >> forward. > > Regarding your TODO in the code comments about interactions with > replication: I think it should be removed. WAL streaming depends on > more things being identical than what is guaranteed here which is > basically the protocol + data formats. OK. I'll remove the comments about replication. > I think all references to > 'protocol' should be removed; Binary wire formats != protocol: the > protocol could bump to v4 but the wire formats (by happenstance) could > all still continue to work -- therefore the version isn't minor (it's > not dependent on protocol version but only on itself). Therefore, > don't much like the name 'supported_binary_minor'. How about > binary_format_version? I was thinking that it would be possible use the new minor version to introduce also new protocol messages such as streaming of large data into database without knowing it's size beforehand. > Also, is a non granular approach really buying us anything here? ISTM > *something* is likely to change format on most releases of the server > so I'm wondering what's the point (if you don't happen to be on the > same x.x release of the server, you might as well assume to not match > or at least 'go on your own risk). The value added to the client > version query is quite low. You have a very good point about changes in every new postgres version. Either text or the binary encoding is likely to change for some types in each new release. There needs to be decision on official policy about breaking backwards compatibility of postgresql clients. Is it: A) Every x.y postgres release is free to break any parts of the * protocol * text encoding * binary protocol as longas it is documented -> all client libraries should be coded so that they refuse to support version x.y+1 or newer(they might have a option to override this but there are no guarantees that it would work) Good: no random bugswhen using old client library Bad: initial complaints from users before they understand that this is the bestoption for everyone B) Every x.y postgres release must guarantee that no client visible parts of protocol, text encoding or binary encodingwill change from previous release in the v3 protocol. If any changes are needed then a new protocol version mustbe created. -> very high barrier for new features Good: can upgrade server without updating clients Bad: new featuresare only introduced very rarely after enough have accumulated Bad: many feature/change patches will rotwhile waiting for the upcoming new version C) There is effort to try avoiding incompatible changes. Some changes are blocked because it is detected that they canbreak backwards compatibility while others are let through (often with some compatibility option on server side tofall back to previous functionality (f.ex. bytea hex encoding). -> As far as I understand this is the current situation. Good: has worked so far Bad: accumulates compatibility flags in the server D) My proposed compromise where there is one minor_version setting and code in server to support all different minor versions. The client requests the minor version so that all releases default to backwards compatible version. When therecombinations starts to create too much maintenance overhead a new clean v4 version of protocol is specified. Good:Keeps full backwards compatibility Good: Allows introducing changes at any time Bad: Accumulates conditional codeto server and clients until new version of protocol is released I'd like the official policy to be A, otherwise I'll push for D. -Mikko
Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Merlin Moncure
Date:
On Thu, Dec 1, 2011 at 8:42 AM, Mikko Tiihonen <mikko.tiihonen@nitorcreations.com> wrote: > On 11/30/2011 06:52 PM, Merlin Moncure wrote: >> >> On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen >> <mikko.tiihonen@nitorcreations.com> wrote: >>> >>> Hi, >>> >>> As discussed few days ago it would be beneficial if we could change the >>> v3 >>> backend<->client protocol without breaking backwards compatibility. >>> >>> Here is a working patch that exports a supported_binary_minor constant >>> and a >>> binary_minor session variable and a that can be used by clients to enable >>> newer features. >>> >>> I also added an example usage where the array encoding for constant size >>> elements is optimized if binary_minor version is new enough. >>> >>> I have coded the client side support for binary_minor for jdbc driver and >>> tested that it worked. But lets first discuss if this is an acceptable >>> path >>> forward. >> >> >> Regarding your TODO in the code comments about interactions with >> replication: I think it should be removed. WAL streaming depends on >> more things being identical than what is guaranteed here which is >> basically the protocol + data formats. > > > OK. I'll remove the comments about replication. > > >> I think all references to >> 'protocol' should be removed; Binary wire formats != protocol: the >> protocol could bump to v4 but the wire formats (by happenstance) could >> all still continue to work -- therefore the version isn't minor (it's >> not dependent on protocol version but only on itself). Therefore, >> don't much like the name 'supported_binary_minor'. How about >> binary_format_version? > > > I was thinking that it would be possible use the new minor version to > introduce also new protocol messages such as streaming of large data > into database without knowing it's size beforehand. > > >> Also, is a non granular approach really buying us anything here? ISTM >> *something* is likely to change format on most releases of the server >> so I'm wondering what's the point (if you don't happen to be on the >> same x.x release of the server, you might as well assume to not match >> or at least 'go on your own risk). The value added to the client >> version query is quite low. > > > You have a very good point about changes in every new postgres > version. Either text or the binary encoding is likely to change for > some types in each new release. > > There needs to be decision on official policy about breaking backwards > compatibility of postgresql clients. Is it: > > A) Every x.y postgres release is free to break any parts of the > * protocol > * text encoding > * binary protocol > as long as it is documented > -> all client libraries should be coded so that they refuse to > support version x.y+1 or newer (they might have a option to > override this but there are no guarantees that it would work) > Good: no random bugs when using old client library > Bad: initial complaints from users before they understand that > this is the best option for everyone > > B) Every x.y postgres release must guarantee that no client visible > parts of protocol, text encoding or binary encoding will change > from previous release in the v3 protocol. If any changes are > needed then a new protocol version must be created. > -> very high barrier for new features > Good: can upgrade server without updating clients > Bad: new features are only introduced very rarely after enough > have accumulated > Bad: many feature/change patches will rot while waiting for the > upcoming new version > > C) There is effort to try avoiding incompatible changes. Some > changes are blocked because it is detected that they can break > backwards compatibility while others are let through (often with > some compatibility option on server side to fall back to > previous functionality (f.ex. bytea hex encoding). > -> As far as I understand this is the current situation. > Good: has worked so far > Bad: accumulates compatibility flags in the server > > D) My proposed compromise where there is one minor_version setting > and code in server to support all different minor versions. > The client requests the minor version so that all releases > default to backwards compatible version. > When there combinations starts to create too much maintenance > overhead a new clean v4 version of protocol is specified. > Good: Keeps full backwards compatibility > Good: Allows introducing changes at any time > Bad: Accumulates conditional code to server and clients until > new version of protocol is released > > > I'd like the official policy to be A, otherwise I'll push for D. In issue A, you mentioned that client libraries should refuse to support version x+1 or newer. If by client libraries, you mean 'libpq', then this is absolutely not going to fly -- libpq has no such restriction now and adding it isn't doing users any favors IMO. The problem is not libpq/jdbc etc, but application code. I'll say again, wire format compatibility issues are fundamentally different from the protocol issues; decades of user application code are involved and half measures are simply not going to work. The typical scenario is that some hand written C application utilizes the binary wire format for some reason and the database goes EOL and gets upgraded, possibly many years after the application was written. The minor version check provides zero help in dealing with this problem other then to tell you something you already knew was going to be an issue, which is really no help at all. Where a minor version check might find some limited use is when reading data produced by COPY BINARY, but that's not a major problem as I see it today since isn't really a huge stumbling block for users since user application code rarely consumes that format. Something that definitely would help binary format consumers while maintaining the status quo would be to simply document all the wire formats, or at least all the changes to those formats. That way, at least someone considering an upgrade would have a handy reference to check to determine the scope and impact of the changes. As far as I'm concerned, this puts the text and binary formats at parity in terms of standard of support to the user. If you want something richer than a documentation style approach, then I would be more in favor of something that would actually solve compatibility issues from the user's point of view. This means handling wire formats in the client library such that client application authors are protected from wire format (either text or binary) changes. Yes, this means we would permanently have to track old formats forever just like we have to keep the old protocol code laying around (both in the client *and the server*). Unfortunately, this only partially helps jdbc users, since the jdbc doesn't wrap libpq. If the jdbc authors want to get serious about binary support of fancy backend structures like arrays then they would have to get serious about client insulation as well. This is, by the way, a noble goal. The Postgres type system is incredibly rich but many developers, especially non C coders, are unable to fully take advantage of that because of poor support on the client side. merlin
Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Noah Misch
Date:
Hi Mikko, First, for everyone else: this patch provides a more-compact binary output format for arrays. When the array contains no NULL elements and has a fixed-length element type, the new format saves four bytes per array element. We could do more. We could add support for arrays containing NULLs by way of a nulls bitmap. We could reduce the per-array overhead, currently 20 bytes for a 1-D array, in addition to the per-element overhead. Does anyone care about these broader cases? If so, speak now. On Thu, Dec 01, 2011 at 04:42:43PM +0200, Mikko Tiihonen wrote: > On 11/30/2011 06:52 PM, Merlin Moncure wrote: >> On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen >> <mikko.tiihonen@nitorcreations.com> wrote: >>> Hi, >>> >>> As discussed few days ago it would be beneficial if we could change the v3 >>> backend<->client protocol without breaking backwards compatibility. >>> >>> Here is a working patch that exports a supported_binary_minor constant and a >>> binary_minor session variable and a that can be used by clients to enable >>> newer features. >>> >>> I also added an example usage where the array encoding for constant size >>> elements is optimized if binary_minor version is new enough. >>> >>> I have coded the client side support for binary_minor for jdbc driver and >>> tested that it worked. But lets first discuss if this is an acceptable path >>> forward. >> I think all references to >> 'protocol' should be removed; Binary wire formats != protocol: the >> protocol could bump to v4 but the wire formats (by happenstance) could >> all still continue to work -- therefore the version isn't minor (it's >> not dependent on protocol version but only on itself). Therefore, >> don't much like the name 'supported_binary_minor'. How about >> binary_format_version? > I was thinking that it would be possible use the new minor version to > introduce also new protocol messages such as streaming of large data > into database without knowing it's size beforehand. I agree with Merlin; the frontend/backend protocol is logically distinct from the binary send/recv formats of data types. For one key point, the latter is not exclusively core-defined; third-party extensions change their send/recv formats on a different schedule. They can add myext.binary_format_version GUCs of their own to cope in a similar way. Client interfaces that do not interpret individual result column data, such as libpq, require no update for send/recv format changes. In contrast, all client interfaces would need changes for a new protocol message. Most clients doing so may as well then advertise their support unconditionally. In contrast, send/recv format is something individual _users_ of the same client library may set differently. It's reasonable to have an application that still reads its data in send/recv format v0 even after upgrading to a version of libpq that talks frontend/backend protocol v4. I do think this is a great way to handle send/recv format changes. > There needs to be decision on official policy about breaking backwards > compatibility of postgresql clients. Is it: > > A) Every x.y postgres release is free to break any parts of the > * protocol > * text encoding > * binary protocol > as long as it is documented > -> all client libraries should be coded so that they refuse to > support version x.y+1 or newer (they might have a option to > override this but there are no guarantees that it would work) > Good: no random bugs when using old client library > Bad: initial complaints from users before they understand that > this is the best option for everyone The ability to use old client libraries with new servers is more valuable than the combined benefits of all currently-contemplated protocol improvements. > D) My proposed compromise where there is one minor_version setting > and code in server to support all different minor versions. > The client requests the minor version so that all releases > default to backwards compatible version. This is workable. > When there combinations starts to create too much maintenance > overhead a new clean v4 version of protocol is specified. > Good: Keeps full backwards compatibility > Good: Allows introducing changes at any time > Bad: Accumulates conditional code to server and clients until > new version of protocol is released We would retain support for protocol V3 for years following the first release to support protocol V4, so think of the conditional code as lasting forever. The patch works as advertised. After "set binary_minor = 1", the output of this command shrinks from 102 MiB to 63 MiB: \copy (select array[0,1,2,3,4,5,6,7,8,9] from generate_series(1,1000000)) to mynums with binary > --- a/src/backend/utils/misc/guc.c > +++ b/src/backend/utils/misc/guc.c > @@ -477,6 +477,7 @@ static int segment_size; > static int wal_block_size; > static int wal_segment_size; > static bool integer_datetimes; > +static int supported_binary_minor; > static int effective_io_concurrency; > > /* should be static, but commands/variable.c needs to get at this */ > @@ -1976,6 +1977,19 @@ static struct config_int ConfigureNamesInt[] = > }, > > { > + {"supported_binary_minor", PGC_INTERNAL, PRESET_OPTIONS, > + gettext_noop("Maximum supported binary minor version."), > + NULL, > + GUC_REPORT | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE > + }, > + &supported_binary_minor, > + SUPPORTED_BINARY_MINOR_VERSION, > + SUPPORTED_BINARY_MINOR_VERSION, > + SUPPORTED_BINARY_MINOR_VERSION, > + NULL, NULL, NULL > + }, We don't need this GUC; clients can do: select max_val from pg_settings where name = 'binary_minor' > --- a/src/backend/utils/adt/arrayfuncs.c > +++ b/src/backend/utils/adt/arrayfuncs.c > @@ -1535,6 +1537,10 @@ array_send(PG_FUNCTION_ARGS) > typbyval = my_extra->typbyval; > typalign = my_extra->typalign; > > + flags = ARR_HASNULL(v) ? 1 : 0; > + if (binary_minor >= 1 && flags == 0 && typlen > 0) > + flags |= 2; Let's use symbolic constants for these flags. Submit patches in context diff format (filterdiff --format=context). Also, binary_minor.patch is actually two git commits stacked on each other and modifying the same file twice. Please flatten each patch. You could also merge fixed_length_array_protocol.patch and binary_minor.patch into one patch. Neither would be applied in isolation, and they're no easier to understand when separated. Add a description of the new GUC to the SGML documentation. Reference that GUC from the arrays data type page. Our documentation does not generally describe the send/recv format for data types, and array_send() is no exception. Considering that, I do wonder how much software out there is using the current array_send(). Alas, no way to know. This will be odd; we're adding a GUC that switches between two undocumented behaviors. Thanks, nm
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Robert Haas
Date:
On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch <noah@leadboat.com> wrote: > I agree with Merlin; the frontend/backend protocol is logically distinct from > the binary send/recv formats of data types. For one key point, the latter is > not exclusively core-defined; third-party extensions change their send/recv > formats on a different schedule. They can add myext.binary_format_version > GUCs of their own to cope in a similar way. I agree. It occurs to me that we recently changed the default *text* output format for bytea for reasons not dissimilar to those contemplated here. Presumably, that's a much more disruptive change, and yet we've had minimal complaints because anyone who gets bitten can easily set bytea_output='escape' and the problem goes away. The same thing seems like it would work here, only the number of people needing to change the parameter will probably be even smaller, because fewer people use binary than text. Having said that, if we're to follow the precedent set by bytea_format, maybe we ought to just add binary_array_format={huge,ittybitty} and be done with it, rather than inventing a minor protocol version GUC for something that isn't really a protocol version change at all. We could introduce a differently-named general mechanism, but I guess I'm not seeing the point of that either. Just because someone has a backward-compatibility issue with one change of this type doesn't mean they have a similar issue with all of them. So I think adding a special-purpose GUC is more logical and more parallel to what we've done in the past, and it doesn't saddle us with having to be certain that we've designed the mechanism generally enough to handle all the cases that may come later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Noah Misch
Date:
On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote: > On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch <noah@leadboat.com> wrote: > > I agree with Merlin; the frontend/backend protocol is logically distinct from > > the binary send/recv formats of data types. ?For one key point, the latter is > > not exclusively core-defined; third-party extensions change their send/recv > > formats on a different schedule. ?They can add myext.binary_format_version > > GUCs of their own to cope in a similar way. > > I agree. It occurs to me that we recently changed the default *text* > output format for bytea for reasons not dissimilar to those > contemplated here. Presumably, that's a much more disruptive change, > and yet we've had minimal complaints because anyone who gets bitten > can easily set bytea_output='escape' and the problem goes away. The > same thing seems like it would work here, only the number of people > needing to change the parameter will probably be even smaller, because > fewer people use binary than text. > > Having said that, if we're to follow the precedent set by > bytea_format, maybe we ought to just add > binary_array_format={huge,ittybitty} and be done with it, rather than > inventing a minor protocol version GUC for something that isn't really > a protocol version change at all. We could introduce a > differently-named general mechanism, but I guess I'm not seeing the > point of that either. Just because someone has a > backward-compatibility issue with one change of this type doesn't mean > they have a similar issue with all of them. So I think adding a > special-purpose GUC is more logical and more parallel to what we've > done in the past, and it doesn't saddle us with having to be certain > that we've designed the mechanism generally enough to handle all the > cases that may come later. That makes sense. An attraction of a single binary format version was avoiding the "Is this worth a GUC?" conversation for each change. However, adding a GUC should be no more notable than bumping a binary format version.
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Ants Aasma
Date:
<p>On 19 Jan 2012 21:00, "Robert Haas" <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> wrote:<br/> > I agree. It occurs to me that we recently changed the default *text*<br /> > output format for byteafor reasons not dissimilar to those<br /> > contemplated here. Presumably, that's a much more disruptive change,<br/> > and yet we've had minimal complaints because anyone who gets bitten<br /> > can easily set bytea_output='escape'and the problem goes away.<p>I had a run in with this. JDBC driver versions < 9.0 with the defaultconfiguration resulted in silent data corruption. The fix was easy, but not having an useful error was what reallybothered me.<p>--<br /> Ants Aasma
Re: Optimize binary serialization format of arrays with fixed size elements
From
Mikko Tiihonen
Date:
Previous title was: Add minor version to v3 protocol to allow changes without breaking backwards compatibility On 01/20/2012 04:45 AM, Noah Misch wrote: > On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote: >> On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch<noah@leadboat.com> wrote: >>> I agree with Merlin; the frontend/backend protocol is logically distinct from >>> the binary send/recv formats of data types. ?For one key point, the latter is >>> not exclusively core-defined; third-party extensions change their send/recv >>> formats on a different schedule. ?They can add myext.binary_format_version >>> GUCs of their own to cope in a similar way. >> >> I agree. It occurs to me that we recently changed the default *text* >> output format for bytea for reasons not dissimilar to those >> contemplated here. Presumably, that's a much more disruptive change, >> and yet we've had minimal complaints because anyone who gets bitten >> can easily set bytea_output='escape' and the problem goes away. The >> same thing seems like it would work here, only the number of people >> needing to change the parameter will probably be even smaller, because >> fewer people use binary than text. >> >> Having said that, if we're to follow the precedent set by >> bytea_format, maybe we ought to just add >> binary_array_format={huge,ittybitty} and be done with it, rather than >> inventing a minor protocol version GUC for something that isn't really >> a protocol version change at all. We could introduce a >> differently-named general mechanism, but I guess I'm not seeing the >> point of that either. Just because someone has a >> backward-compatibility issue with one change of this type doesn't mean >> they have a similar issue with all of them. So I think adding a >> special-purpose GUC is more logical and more parallel to what we've >> done in the past, and it doesn't saddle us with having to be certain >> that we've designed the mechanism generally enough to handle all the >> cases that may come later. > > That makes sense. An attraction of a single binary format version was avoiding > the "Is this worth a GUC?" conversation for each change. However, adding a GUC > should be no more notable than bumping a binary format version. I see the main difference between the GUC per feature vs minor version being that in versioned changes old clients keep working because the have to explicitly request a specific version. Whereas in separate GUC variables each feature will be enabled by default and users have to either keep up with new client versions or figure out how to explicitly disable the changes. However, due to popular vote I removed the minor version proposal for now. Here is a second version of the patch. The binary array encoding changes stay the same but all code around was rewritten. Changes from previous versions based on received comments: * removed the minor protocol version concept * introduced a new GUC variable array_output copying the current bytea_output type, with values "full" (old value) and "smallfixed" (new default) * added documentation for the new GUC variable * used constants for the array flags variable values -Mikko
Attachment
On Sun, Jan 22, 2012 at 11:47:06PM +0200, Mikko Tiihonen wrote: > On 01/20/2012 04:45 AM, Noah Misch wrote: >> On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote: >>> Having said that, if we're to follow the precedent set by >>> bytea_format, maybe we ought to just add >>> binary_array_format={huge,ittybitty} and be done with it, rather than >>> inventing a minor protocol version GUC for something that isn't really >>> a protocol version change at all. We could introduce a >>> differently-named general mechanism, but I guess I'm not seeing the >>> point of that either. Just because someone has a >>> backward-compatibility issue with one change of this type doesn't mean >>> they have a similar issue with all of them. So I think adding a >>> special-purpose GUC is more logical and more parallel to what we've >>> done in the past, and it doesn't saddle us with having to be certain >>> that we've designed the mechanism generally enough to handle all the >>> cases that may come later. >> >> That makes sense. An attraction of a single binary format version was avoiding >> the "Is this worth a GUC?" conversation for each change. However, adding a GUC >> should be no more notable than bumping a binary format version. > > I see the main difference between the GUC per feature vs minor version being that > in versioned changes old clients keep working because the have to explicitly > request a specific version. Whereas in separate GUC variables each feature will be > enabled by default and users have to either keep up with new client versions or > figure out how to explicitly disable the changes. No, we can decide that anew for each GUC. If you'd propose that array_output = full be the default, that works for me. > Here is a second version of the patch. The binary array encoding changes > stay the same but all code around was rewritten. > > Changes from previous versions based on received comments: > * removed the minor protocol version concept > * introduced a new GUC variable array_output copying the current > bytea_output type, with values "full" (old value) and > "smallfixed" (new default) How about the name "array_binary_output"? > * added documentation for the new GUC variable > * used constants for the array flags variable values > *** a/doc/src/sgml/config.sgml > --- b/doc/src/sgml/config.sgml > *************** COPY postgres_log FROM '/full/path/to/lo > *** 4888,4893 **** > --- 4888,4910 ---- > </listitem> > </varlistentry> > > + <varlistentry id="guc-array-output" xreflabel="array_output"> > + <term><varname>array_output</varname> (<type>enum</type>)</term> > + <indexterm> > + <primary><varname>array_output</> configuration parameter</primary> > + </indexterm> > + <listitem> > + <para> > + Sets the output format for binary encoding of values of > + type <type>array</type>. Valid values are > + <literal>smallfixed</literal> (the default) > + and <literal>full</literal> (the traditional PostgreSQL > + format). The <type>array</type> type always It's not "The array type" but "Array types", a class. > + accepts both formats on input, regardless of this setting. > + </para> > + </listitem> > + </varlistentry> The section "Array Input and Output Syntax" should reference this GUC. > *** a/src/backend/utils/misc/guc.c > --- b/src/backend/utils/misc/guc.c > *************** > *** 64,69 **** > --- 64,70 ---- > #include "storage/predicate.h" > #include "tcop/tcopprot.h" > #include "tsearch/ts_cache.h" > + #include "utils/array.h" > #include "utils/builtins.h" > #include "utils/bytea.h" > #include "utils/guc_tables.h" > *************** static const struct config_enum_entry by > *** 225,230 **** > --- 226,243 ---- > }; > > /* > + * Options for enum values defined in this module. > + * > + * NOTE! Option values may not contain double quotes! > + */ Don't replicate this comment. > + > + static const struct config_enum_entry array_output_options[] = { > + {"full", ARRAY_OUTPUT_FULL, false}, > + {"smallfixed", ARRAY_OUTPUT_SMALLFIXED, false}, > + {NULL, 0, false} > + }; > + > + /* > * We have different sets for client and server message level options because > * they sort slightly different (see "log" level) > */ > *************** static struct config_enum ConfigureNames > *** 3047,3052 **** > --- 3060,3075 ---- > NULL, NULL, NULL > }, > > + { > + {"array_output", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS, You've put the GUC in COMPAT_OPTIONS_PREVIOUS, but you've put its documentation in the section for CLIENT_CONN_STATEMENT. I don't have a strong opinion on which one to use, but be consistent. > + gettext_noop("Sets the binary output format for arrays."), > + NULL > + }, > + &bytea_output, &array_output > + ARRAY_OUTPUT_SMALLFIXED, array_output_options, > + NULL, NULL, NULL > + }, > + > { > {"client_min_messages", PGC_USERSET, LOGGING_WHEN, > gettext_noop("Sets the message levels that are sent to the client."), Thanks, nm
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Florian Weimer
Date:
* Ants Aasma: > I had a run in with this. JDBC driver versions < 9.0 with the default > configuration resulted in silent data corruption. The fix was easy, but not > having an useful error was what really bothered me. Same for the DBD::Pg driver. In this particular case, I knew that the change was coming and could push updated Java and Perl client libraries well before the server-side change hit our internal repository, but I really don't want to have to pay attention to such details. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Robert Haas
Date:
On Mon, Jan 23, 2012 at 4:03 AM, Florian Weimer <fweimer@bfk.de> wrote: > * Ants Aasma: >> I had a run in with this. JDBC driver versions < 9.0 with the default >> configuration resulted in silent data corruption. The fix was easy, but not >> having an useful error was what really bothered me. > > Same for the DBD::Pg driver. > > In this particular case, I knew that the change was coming and could > push updated Java and Perl client libraries well before the server-side > change hit our internal repository, but I really don't want to have to > pay attention to such details. But if we *don't* turn this on by default, then chances are very good that it will get much less use. That doesn't seem good either. If it's important enough to do it at all, then IMHO it's important enough for it to be turned on by default. We have never made any guarantee that the binary format won't change from release to release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Florian Weimer
Date:
* Robert Haas: >> In this particular case, I knew that the change was coming and could >> push updated Java and Perl client libraries well before the server-side >> change hit our internal repository, but I really don't want to have to >> pay attention to such details. > > But if we *don't* turn this on by default, then chances are very good > that it will get much less use. That doesn't seem good either. If > it's important enough to do it at all, then IMHO it's important enough > for it to be turned on by default. We have never made any guarantee > that the binary format won't change from release to release. The problem here are libpq-style drivers which expose the binary format to the application. The Java driver doesn't do that, but the Perl driver does. (However, both transparently decode BYTEA values received in text format, which led to the compatibility issue.) If you've version negotiation and you don't expose the binary format, then all clients can use the most efficient format automatically. If I understand things correctly, this is where the JDBC driver is heading (that is, automatic use of binary format). -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Robert Haas
Date:
On Mon, Jan 23, 2012 at 9:36 AM, Florian Weimer <fweimer@bfk.de> wrote: > * Robert Haas: >>> In this particular case, I knew that the change was coming and could >>> push updated Java and Perl client libraries well before the server-side >>> change hit our internal repository, but I really don't want to have to >>> pay attention to such details. >> >> But if we *don't* turn this on by default, then chances are very good >> that it will get much less use. That doesn't seem good either. If >> it's important enough to do it at all, then IMHO it's important enough >> for it to be turned on by default. We have never made any guarantee >> that the binary format won't change from release to release. > > The problem here are libpq-style drivers which expose the binary format > to the application. The Java driver doesn't do that, but the Perl > driver does. (However, both transparently decode BYTEA values received > in text format, which led to the compatibility issue.) I can see where that could cause some headaches... but it seems to me that if we take that concern seriously, it brings us right back to square one. If breaking the binary format causes too much pain to actually go do it, then we shouldn't change it until we're breaking everything else anyway (i.e. new protocol version, as Tom suggested). Even if you have version negotiation, it doesn't help that much in this scenario. Drivers that know about the new protocol can theoretically negotiate upward, but if they expose the binary format to their users in turn then it's a can of worms: they then need to negotiate with their client applications to see what version the client is OK with, and then turn around and negotiate with the server to that level and not higher. That strikes me as a lot of work for a lot of people given the amount of improvement we can expect out of this one change. I think it's also worth noting that a GUC is not really a good mechanism for negotiating the protocol version. GUCs can be changed by the local administrator on a server-wide (or per-user, or per-database, or per-user-and-database) basis, and that's not what you want for a protocol negotiation. Every client will have to query the server version and then decide whether to try to change it, and handle errors if that fails. All of that is going to add start-up overhead to connections, which will need to make multiple round trips to get everything straightened out. I think if the only way to make this change without excessive pain is by having a good mechanism for negotiating the protocol version, then we need to defer it to a future release. Now is not the time to invent entirely new mechanisms, especially around just one example. I'm OK with breaking it and adding a GUC for backward-compatibility, but so far the other suggestions strike me as being not convincingly well-enough engineered to stand the test of time, and whatever we do here is going to be with us for quite a while. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Noah Misch
Date:
On Mon, Jan 23, 2012 at 10:34:12AM -0500, Robert Haas wrote: > On Mon, Jan 23, 2012 at 9:36 AM, Florian Weimer <fweimer@bfk.de> wrote: > > * Robert Haas: > >>> In this particular case, I knew that the change was coming and could > >>> push updated Java and Perl client libraries well before the server-side > >>> change hit our internal repository, but I really don't want to have to > >>> pay attention to such details. > >> > >> But if we *don't* turn this on by default, then chances are very good > >> that it will get much less use. ?That doesn't seem good either. ?If > >> it's important enough to do it at all, then IMHO it's important enough > >> for it to be turned on by default. ?We have never made any guarantee > >> that the binary format won't change from release to release. > > > > The problem here are libpq-style drivers which expose the binary format > > to the application. ?The Java driver doesn't do that, but the Perl > > driver does. ?(However, both transparently decode BYTEA values received > > in text format, which led to the compatibility issue.) > > I can see where that could cause some headaches... but it seems to me > that if we take that concern seriously, it brings us right back to > square one. If breaking the binary format causes too much pain to > actually go do it, then we shouldn't change it until we're breaking > everything else anyway (i.e. new protocol version, as Tom suggested). As I said upthread, and you appeared to agree, the protocol is independent of individual data type send/recv formats. Even if we were already adding protocol v4 to PostgreSQL 9.2, having array_send() change its behavior in response to the active protocol version would be wrong.
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Marko Kreen
Date:
On Mon, Jan 23, 2012 at 5:34 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jan 23, 2012 at 9:36 AM, Florian Weimer <fweimer@bfk.de> wrote: >> * Robert Haas: >>>> In this particular case, I knew that the change was coming and could >>>> push updated Java and Perl client libraries well before the server-side >>>> change hit our internal repository, but I really don't want to have to >>>> pay attention to such details. >>> >>> But if we *don't* turn this on by default, then chances are very good >>> that it will get much less use. That doesn't seem good either. If >>> it's important enough to do it at all, then IMHO it's important enough >>> for it to be turned on by default. We have never made any guarantee >>> that the binary format won't change from release to release. >> >> The problem here are libpq-style drivers which expose the binary format >> to the application. The Java driver doesn't do that, but the Perl >> driver does. (However, both transparently decode BYTEA values received >> in text format, which led to the compatibility issue.) > > I can see where that could cause some headaches... but it seems to me > that if we take that concern seriously, it brings us right back to > square one. If breaking the binary format causes too much pain to > actually go do it, then we shouldn't change it until we're breaking > everything else anyway (i.e. new protocol version, as Tom suggested). My suggestion - please avoid per-session-protocol. Either something is Postgres version-dependent or it can be toggled/tracked per request. That means any data can either be passed through, or you need to understand formats of Postgres version X.Y. This kind of hints at per-request "gimme-formats-from-version-x.y" flag for ExecuteV4 packet. Or some equivalent of it. Anything that cannot be processed without tracking per-session state over whole stack (poolers/client frameworks) is major pain to maintain. -- marko
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Robert Haas
Date:
On Mon, Jan 23, 2012 at 11:15 AM, Noah Misch <noah@leadboat.com> wrote: > As I said upthread, and you appeared to agree, the protocol is independent of > individual data type send/recv formats. Even if we were already adding > protocol v4 to PostgreSQL 9.2, having array_send() change its behavior in > response to the active protocol version would be wrong. Sure, it's not directly related to the active protocol version, but my point is that we need to decide whether we need an autonegotiation mechanism or some kind, or not. We can reasonably decide that: 1. It's OK to change the binary format incompatibly, and clients must be prepared to deal with that, possibly assisted by a backward-compatibility GUC. -or else- 2. It's not OK to change the binary format incompatibility, and therefore we need some kind of negotiation mechanism to make sure that we give the new and improved format only to clients that can cope with it. Not being responsible from the maintenance of any PostgreSQL drivers whatsoever, I don't have a strong feeling about which of these is the case, and I'd like us to hear from the people who do. What I do think is that we can't look at a GUC (however named) as a poor man's replacement for #2. It's not gonna work, or at least not very well. If the default is off, then clients have to go through a round-trip to turn it on, which means that every client will have to decide whether to pay the price of turning it on (and possibly not recoup the investment) or whether to leave it off (and possibly get hosed if many large arrays that would have met the criteria for the optimization are transferred). Furthermore, if we turn it on by default, drivers and applications that haven't been updated will deliver corrupted results.None of that sounds very good to me. If there areenough dependencies on the details of the binary format that we can't afford to just change it, then we'd better have a cheap and reliable way for clients to negotiate upward - and a GUC is not going to give us that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Merlin Moncure
Date:
On Mon, Jan 23, 2012 at 3:06 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Not being responsible from the maintenance of any PostgreSQL drivers > whatsoever, I don't have a strong feeling about which of these is the > case, and I'd like us to hear from the people who do. I'm just gonna come right out and say that GUC-based solutions are not the way -- bytea encoding change is a perfect example of that. IMSNHO, there's only two plausible paths ahead: 1) document the binary formats and continue with 'go at your own risk' 2) full auto-negotiation for all wire formats (text and binary). merlin
Re: Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
From
Marko Kreen
Date:
On Mon, Jan 23, 2012 at 11:06 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jan 23, 2012 at 11:15 AM, Noah Misch <noah@leadboat.com> wrote: >> As I said upthread, and you appeared to agree, the protocol is independent of >> individual data type send/recv formats. Even if we were already adding >> protocol v4 to PostgreSQL 9.2, having array_send() change its behavior in >> response to the active protocol version would be wrong. > > Sure, it's not directly related to the active protocol version, but my > point is that we need to decide whether we need an autonegotiation > mechanism or some kind, or not. We can reasonably decide that: > > 1. It's OK to change the binary format incompatibly, and clients must > be prepared to deal with that, possibly assisted by a > backward-compatibility GUC. > > -or else- > > 2. It's not OK to change the binary format incompatibility, and > therefore we need some kind of negotiation mechanism to make sure that > we give the new and improved format only to clients that can cope with > it. > > Not being responsible from the maintenance of any PostgreSQL drivers > whatsoever, I don't have a strong feeling about which of these is the > case, and I'd like us to hear from the people who do. What I do think > is that we can't look at a GUC (however named) as a poor man's > replacement for #2. It's not gonna work, or at least not very well. > If the default is off, then clients have to go through a round-trip to > turn it on, which means that every client will have to decide whether > to pay the price of turning it on (and possibly not recoup the > investment) or whether to leave it off (and possibly get hosed if many > large arrays that would have met the criteria for the optimization are > transferred). Furthermore, if we turn it on by default, drivers and > applications that haven't been updated will deliver corrupted results. > None of that sounds very good to me. If there are enough > dependencies on the details of the binary format that we can't afford > to just change it, then we'd better have a cheap and reliable way for > clients to negotiate upward - and a GUC is not going to give us that. Trying to solve it with startup-time negotiation, or some GUC is a dead end, in the sense that it will actively discourage any kind of pass-through protocol processing. If simple protocol processor (~pgbouncer) needs to know about some GUC, and tune it on-the-fly, it's not a payload feature, it's a protocol feature. Instead this should be solved with extending the per-query text/bin flag to include version info and maybe also type groups. Some way of saying "numerics:9.0-bin", "the-rest:8.4-text". The groups would also solve the problem with no way of turning on binary formats on result columns safely. The flags could be text (~http accept), or maybe integers for more efficiency (group code: group format ver). -- marko