Thread: Binary support for pgoutput plugin
Is there a reason why pgoutput sends data in text format? Seems to me that sending data in binary would provide a considerable performance improvement.
Dave Cramer
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> Is there a reason why pgoutput sends data in text format? Seems to
> me that sending data in binary would provide a considerable
> performance improvement.

Are you seeing something that suggests that the text output is taking
a lot of time or other resources?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Dave Cramer
On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> Is there a reason why pgoutput sends data in text format? Seems to
> me that sending data in binary would provide a considerable
> performance improvement.
Are you seeing something that suggests that the text output is taking
a lot of time or other resources?
Actually it's on the other end that there is improvement. Parsing text takes much longer for almost everything except ironically text.
To be more transparent there is some desire to use pgoutput for something other than logical replication. Change Data Capture clients such as Debezium have a requirement for a stable plugin which is shipped with core as this is always available in cloud providers offerings. There's no reason that I am aware of that they cannot use pgoutput for this. There's also no reason that I am aware that binary outputs can't be supported. The protocol would have to change slightly and I am working on a POC patch.
Thing is they aren't all written in C so using binary does provide a pretty substantial win on the decoding end.
Dave
Hi,

On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
> > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > Is there a reason why pgoutput sends data in text format? Seems to
> > > me that sending data in binary would provide a considerable
> > > performance improvement.
> >
> > Are you seeing something that suggests that the text output is taking
> > a lot of time or other resources?
>
> Actually it's on the other end that there is improvement. Parsing text
> takes much longer for almost everything except ironically text.

It's on both sides, I'd say. E.g. float (until v12), timestamp, bytea
are all much more expensive to convert from binary to text.

> To be more transparent there is some desire to use pgoutput for something
> other than logical replication. Change Data Capture clients such as
> Debezium have a requirement for a stable plugin which is shipped with core
> as this is always available in cloud providers offerings. There's no reason
> that I am aware of that they cannot use pgoutput for this.

Except that that's not pgoutput's purpose, and we shouldn't make it
meaningfully more complicated or slower to achieve this. Don't think
there's a conflict in this case though.

> There's also no reason that I am aware that binary outputs can't be
> supported.

Well, it *does* increase version dependencies, and does make replication
more complicated, because type oids etc cannot be relied to be the same
on source and target side.

> The protocol would have to change slightly and I am working
> on a POC patch.

Hm, what would have to be changed protocol wise? IIRC that'd just be a
different datum type? Or is that what you mean?

    pq_sendbyte(out, 't');  /* 'text' data follows */

IIRC there was code for the binary protocol in a predecessor of
pgoutput.

I think if we were to add binary output - and I think we should - we
ought to only accept a patch if it's also used in core.

Greetings,

Andres Freund
Dave Cramer
On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com> wrote:
Hi,
On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
>
> > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > Is there a reason why pgoutput sends data in text format? Seems to
> > > me that sending data in binary would provide a considerable
> > > performance improvement.
> >
> > Are you seeing something that suggests that the text output is taking
> > a lot of time or other resources?
> >
> > Actually it's on the other end that there is improvement. Parsing text
> takes much longer for almost everything except ironically text.
It's on both sides, I'd say. E.g. float (until v12), timestamp, bytea
are all much more expensive to convert from binary to text.
> To be more transparent there is some desire to use pgoutput for something
> other than logical replication. Change Data Capture clients such as
> Debezium have a requirement for a stable plugin which is shipped with core
> as this is always available in cloud providers offerings. There's no reason
> that I am aware of that they cannot use pgoutput for this.
Except that that's not pgoutput's purpose, and we shouldn't make it
meaningfully more complicated or slower to achieve this. Don't think
there's a conflict in this case though.
agreed, my intent was to slightly bend it to my will :)
> There's also no reason that I am aware that binary outputs can't be
> supported.
Well, it *does* increase version dependencies, and does make replication
more complicated, because type oids etc cannot be relied to be the same
on source and target side.
I was about to agree with this but if the type oids change from source to target you
still can't decode the text version properly. Unless I mis-understand something here ?
> The protocol would have to change slightly and I am working
> on a POC patch.
Hm, what would have to be changed protocol wise? IIRC that'd just be a
different datum type? Or is that what you mean?
pq_sendbyte(out, 't'); /* 'text' data follows */
I haven't really thought this through completely but one place JDBC has problems with binary is with
timestamps with timezone as we don't know which timezone to use. Is it safe to assume everything is in UTC
since the server stores in UTC ? Then there are UDF's. My original thought was to use options to send in the
types that I wanted in binary, everything else could be sent as text.
IIRC there was code for the binary protocol in a predecessor of
pgoutput.
Hmmm, that might be a good place to start. I will do some digging through the git history.
I think if we were to add binary output - and I think we should - we
ought to only accept a patch if it's also used in core.
Certainly; as not doing so would make my work completely irrelevant for my purpose.
Thanks,
Dave
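[To make the datum-marker question above concrete: today logicalrep_write_tuple() emits, per column, 'n' (null), 'u' (unchanged toast) or 't' followed by counted text. Below is a minimal sketch of what a binary variant could look like. It is not the actual patch; write_one_column() and the binary_ok flag are purely illustrative.]

/*
 * Sketch only: a per-column writer with an additional 'b' marker for
 * binary data (length word + typsend output), alongside the existing
 * 'n'/'u'/'t' markers used by logicalrep_write_tuple().
 */
#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"
#include "utils/lsyscache.h"

static void
write_one_column(StringInfo out, Oid typid, Datum value, bool isnull,
                 bool binary_ok)    /* hypothetical per-column policy */
{
    if (isnull)
    {
        pq_sendbyte(out, 'n');      /* null column */
        return;
    }

    if (binary_ok)
    {
        Oid         typsend;
        bool        typisvarlena;
        bytea      *outputbytes;

        getTypeBinaryOutputInfo(typid, &typsend, &typisvarlena);
        outputbytes = OidSendFunctionCall(typsend, value);

        pq_sendbyte(out, 'b');      /* binary data follows */
        pq_sendint32(out, VARSIZE(outputbytes) - VARHDRSZ);
        pq_sendbytes(out, VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ);
        pfree(outputbytes);
    }
    else
    {
        Oid         typoutput;
        bool        typisvarlena;
        char       *outputstr;

        getTypeOutputInfo(typid, &typoutput, &typisvarlena);
        outputstr = OidOutputFunctionCall(typoutput, value);

        pq_sendbyte(out, 't');      /* text data follows */
        pq_sendcountedtext(out, outputstr, strlen(outputstr), false);
        pfree(outputstr);
    }
}

[On the receiving side the matching change would be to look up the type's receive function with getTypeBinaryInputInfo() and call OidReceiveFunctionCall() instead of the text input function.]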
Hi,

On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com>
> wrote:
> > > There's also no reason that I am aware that binary outputs can't be
> > > supported.
> >
> > Well, it *does* increase version dependencies, and does make replication
> > more complicated, because type oids etc cannot be relied to be the same
> > on source and target side.
>
> I was about to agree with this but if the type oids change from source
> to target you still can't decode the text version properly. Unless I
> mis-understand something here ?

The text format doesn't care about oids. I don't see how it'd be a
problem? Note that some people *intentionally* use different types from
source to target system when logically replicating. So you can't rely on
the target table's types under any circumstance.

I think you really have to use the textual type which we already write
out (cf logicalrep_write_typ()) to call the binary input functions. And
you can send only data as binary that's from builtin types - otherwise
there's no guarantee at all that the target system has something
compatible. And even if you just assumed that all extensions etc are
present, you can't transport arrays / composite types in binary: For
hard to discern reasons we a) embed type oids in them b) verify them. b)
won't ever work for non-builtin types, because oids are assigned
dynamically.

> > I think if we were to add binary output - and I think we should - we
> > ought to only accept a patch if it's also used in core.
>
> Certainly; as not doing so would make my work completely irrelevant for my
> purpose.

What I mean is that the builtin logical replication would have to use
this on the receiving side too.

Greetings,

Andres Freund
On Tue, 4 Jun 2019 at 16:46, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com>
> wrote:
> > > There's also no reason that I am aware that binary outputs can't be
> > > supported.
> >
> > Well, it *does* increase version dependencies, and does make replication
> > more complicated, because type oids etc cannot be relied to be the same
> > on source and target side.
> >
> I was about to agree with this but if the type oids change from source
> to target you still can't decode the text version properly. Unless I
> mis-understand something here ?
The text format doesn't care about oids. I don't see how it'd be a
problem? Note that some people *intentionally* use different types from
source to target system when logically replicating. So you can't rely on
the target table's types under any circumstance.
I think you really have to use the textual type which we already write
out (cf logicalrep_write_typ()) to call the binary input functions. And
you can send only data as binary that's from builtin types - otherwise
there's no guarantee at all that the target system has something
compatible. And even if you just assumed that all extensions etc are
present, you can't transport arrays / composite types in binary: For
hard to discern reasons we a) embed type oids in them b) verify them. b)
won't ever work for non-builtin types, because oids are assigned
dynamically.
I figured arrays and UDT's would be problematic.
> > I think if we were to add binary output - and I think we should - we
> > ought to only accept a patch if it's also used in core.
> >
>
> Certainly; as not doing so would make my work completely irrelevant for my
> purpose.
What I mean is that the builtin logical replication would have to use
this on the receiving side too.
Got it, thanks for validating that the idea isn't nuts. Now I *have* to produce a POC.
Thanks,
Dave
On 6/4/19 4:39 PM, Dave Cramer wrote:
> I haven't really thought this through completely but one place JDBC has
> problems with binary is with timestamps with timezone as we don't know
> which timezone to use. Is it safe to assume everything is in UTC since
> the server stores in UTC ?

PL/Java, when converting to the Java 8 java.time types (because those are
sane), will turn a timestamp with timezone into an OffsetDateTime with
explicit offset zero (UTC), no matter what timezone may have been used when
the value was input (as you've observed, there's no way to recover that).

In the return direction, if given an OffsetDateTime with any nonzero offset,
it will adjust the value to UTC for postgres.

So, yes, say I.

Regards,
-Chap
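[To spell out what "everything is UTC" means on the wire: with 64-bit integer datetimes, a binary timestamptz is a big-endian signed microsecond count relative to 2000-01-01 00:00:00 UTC, so a consumer only has to shift the epoch. A minimal client-side sketch, assuming integer datetimes; pre-2000 values would need floor division, omitted here:]

#include <stdint.h>
#include <time.h>

/* Seconds between the Unix epoch (1970-01-01) and the PostgreSQL epoch (2000-01-01). */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)

/* Read a big-endian signed 64-bit integer from the wire. */
static int64_t
read_be64(const unsigned char *buf)
{
    uint64_t    v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return (int64_t) v;
}

/* Convert an 8-byte binary timestamptz into Unix seconds (UTC) plus microseconds. */
static void
decode_timestamptz(const unsigned char *wire, time_t *secs, int32_t *usecs)
{
    int64_t     pg_usecs = read_be64(wire); /* microseconds since the PG epoch */

    *secs = (time_t) (pg_usecs / 1000000 + PG_EPOCH_OFFSET_SECS);
    *usecs = (int32_t) (pg_usecs % 1000000);
}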
On Tue, Jun 04, 2019 at 04:55:33PM -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:46, Andres Freund <andres@anarazel.de> wrote:
> > On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> > > On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com> wrote:
> > > > > There's also no reason that I am aware that binary outputs can't be
> > > > > supported.
> > > >
> > > > Well, it *does* increase version dependencies, and does make replication
> > > > more complicated, because type oids etc cannot be relied to be the same
> > > > on source and target side.
> > >
> > > I was about to agree with this but if the type oids change from source
> > > to target you still can't decode the text version properly. Unless I
> > > mis-understand something here ?
> >
> > The text format doesn't care about oids. I don't see how it'd be a
> > problem? Note that some people *intentionally* use different types from
> > source to target system when logically replicating. So you can't rely on
> > the target table's types under any circumstance.
> >
> > I think you really have to use the textual type which we already write
> > out (cf logicalrep_write_typ()) to call the binary input functions. And
> > you can send only data as binary that's from builtin types - otherwise
> > there's no guarantee at all that the target system has something
> > compatible. And even if you just assumed that all extensions etc are
> > present, you can't transport arrays / composite types in binary: For
> > hard to discern reasons we a) embed type oids in them b) verify them. b)
> > won't ever work for non-builtin types, because oids are assigned
> > dynamically.
>
> I figured arrays and UDT's would be problematic.

Would it make sense to work toward a binary format that's not
architecture-specific? I recall from COPY that our binary format is
not standardized across, for example, big- and little-endian machines.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi,

On 2019-06-05 00:05:02 +0200, David Fetter wrote:
> Would it make sense to work toward a binary format that's not
> architecture-specific? I recall from COPY that our binary format is
> not standardized across, for example, big- and little-endian machines.

I think you recall wrongly. It's obviously possible that we have bugs
around this, but output/input routines are supposed to handle a
endianess independent format. That usually means that you have to do
endianess conversions, but that doesn't make it non-standardized.

- Andres
On Tue, 4 Jun 2019 at 18:08, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-05 00:05:02 +0200, David Fetter wrote:
> Would it make sense to work toward a binary format that's not
> architecture-specific? I recall from COPY that our binary format is
> not standardized across, for example, big- and little-endian machines.
I think you recall wrongly. It's obviously possible that we have bugs
around this, but output/input routines are supposed to handle a
endianess independent format. That usually means that you have to do
endianess conversions, but that doesn't make it non-standardized.
Additionally there are a number of drivers that already know how to handle our binary types.
I don't really think there's a win here. I also want to keep the changes small.
Dave
Hi,

On 05/06/2019 00:08, Andres Freund wrote:
> Hi,
>
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle a
> endianess independent format. That usually means that you have to do
> endianess conversions, but that doesn't make it non-standardized.
>

Yeah, there are really 3 formats of data we have, text protocol, binary
network protocol and internal on disk format. The internal on disk
format will not work across big/little-endian but network binary
protocol will.

FWIW I don't think the code for binary format was included in original
logical replication patch (I really tried to keep it as minimal as
possible), but the code and protocol is pretty much ready for adding that.

That said, pglogical has code which handles this (I guess Andres means
that by predecessor of pgoutput) so if you look for example at the
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help you give some ideas on how to approach this.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions
https://www.2ndQuadrant.com/
Hi,
On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
Hi,
On 05/06/2019 00:08, Andres Freund wrote:
> Hi,
>
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle a
> endianess independent format. That usually means that you have to do
> endianess conversions, but that doesn't make it non-standardized.
>
Yeah, there are really 3 formats of data we have, text protocol, binary
network protocol and internal on disk format. The internal on disk
format will not work across big/little-endian but network binary
protocol will.
FWIW I don't think the code for binary format was included in original
logical replication patch (I really tried to keep it as minimal as
possible), but the code and protocol is pretty much ready for adding that.
Yes, I looked through the public history and could not find it. Thanks for confirming.
That said, pglogical has code which handles this (I guess Andres means
that by predecessor of pgoutput) so if you look for example at the
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help you give some ideas on how to approach this.
Thanks for the tip!
Dave Cramer
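[For reference while digging: modeled loosely on pglogical's decide_datum_transfer(), and following the earlier points about non-builtin oids, the per-datum decision a binary-capable pgoutput would presumably have to make boils down to something like the sketch below. The function name, the allow_binary flag and the FirstNormalObjectId cutoff are illustrative choices, not an existing pgoutput API.]

#include "postgres.h"
#include "access/htup_details.h"
#include "access/transam.h"     /* FirstNormalObjectId */
#include "catalog/pg_type.h"
#include "utils/syscache.h"

/* Decide whether a column of type "typid" is safe to ship in binary. */
static bool
use_binary_for_datum(Oid typid, bool allow_binary)
{
    HeapTuple   tup;
    Form_pg_type typform;
    bool        has_send;

    /* Subscriber did not ask for binary transfer at all. */
    if (!allow_binary)
        return false;

    /*
     * Non-builtin type oids are assigned dynamically, so there is no
     * guarantee the receiving side has anything compatible; fall back to
     * the text representation for those.
     */
    if (typid >= FirstNormalObjectId)
        return false;

    tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for type %u", typid);
    typform = (Form_pg_type) GETSTRUCT(tup);

    /* The type must actually have a binary send function. */
    has_send = OidIsValid(typform->typsend);
    ReleaseSysCache(tup);

    return has_send;
}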
On Wed, 5 Jun 2019 at 07:21, Dave Cramer <davecramer@gmail.com> wrote:
Hi,

On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

Hi,
On 05/06/2019 00:08, Andres Freund wrote:
> Hi,
>
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle a
> endianess independent format. That usually means that you have to do
> endianess conversions, but that doesn't make it non-standardized.
>
Yeah, there are really 3 formats of data we have, text protocol, binary
network protocol and internal on disk format. The internal on disk
format will not work across big/little-endian but network binary
protocol will.
FWIW I don't think the code for binary format was included in original
logical replication patch (I really tried to keep it as minimal as
possible), but the code and protocol is pretty much ready for adding that.

Yes, I looked through the public history and could not find it. Thanks for confirming.
That said, pglogical has code which handles this (I guess Andres means
that by predecessor of pgoutput) so if you look for example at the
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help you give some ideas on how to approach this.

Thanks for the tip!
Looking at:
https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
this seems completely ignored. What was the intent?
Dave
Hi

On June 5, 2019 8:51:10 AM PDT, Dave Cramer <davecramer@gmail.com> wrote:
> On Wed, 5 Jun 2019 at 07:21, Dave Cramer <davecramer@gmail.com> wrote:
>> Hi,
>>
>> On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>
>>> Hi,
>>>
>>> On 05/06/2019 00:08, Andres Freund wrote:
>>> > Hi,
>>> >
>>> > On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>>> >> Would it make sense to work toward a binary format that's not
>>> >> architecture-specific? I recall from COPY that our binary format is
>>> >> not standardized across, for example, big- and little-endian machines.
>>> >
>>> > I think you recall wrongly. It's obviously possible that we have bugs
>>> > around this, but output/input routines are supposed to handle a
>>> > endianess independent format. That usually means that you have to do
>>> > endianess conversions, but that doesn't make it non-standardized.
>>>
>>> Yeah, there are really 3 formats of data we have, text protocol, binary
>>> network protocol and internal on disk format. The internal on disk
>>> format will not work across big/little-endian but network binary
>>> protocol will.
>>>
>>> FWIW I don't think the code for binary format was included in original
>>> logical replication patch (I really tried to keep it as minimal as
>>> possible), but the code and protocol is pretty much ready for adding that.
>>
>> Yes, I looked through the public history and could not find it. Thanks for
>> confirming.
>>
>>> That said, pglogical has code which handles this (I guess Andres means
>>> that by predecessor of pgoutput) so if you look for example at the
>>> write_tuple/read_tuple/decide_datum_transfer in
>>> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
>>> that can help you give some ideas on how to approach this.
>>
>> Thanks for the tip!
>
> Looking at:
> https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
>
> this seems completely ignored. What was the intent?

That's about the output of the plugin, not the datatypes. And independent of
text/binary output, the protocol contains non-printable chars.

Andres
Hi,
On Wed, 5 Jun 2019 at 12:01, Andres Freund <andres@anarazel.de> wrote:
Hi
On June 5, 2019 8:51:10 AM PDT, Dave Cramer <davecramer@gmail.com> wrote:
>On Wed, 5 Jun 2019 at 07:21, Dave Cramer <davecramer@gmail.com> wrote:
>
>> Hi,
>>
>>
>> On Wed, 5 Jun 2019 at 07:18, Petr Jelinek
><petr.jelinek@2ndquadrant.com>
>> wrote:
>>
>>> Hi,
>>>
>>> On 05/06/2019 00:08, Andres Freund wrote:
>>> > Hi,
>>> >
>>> > On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>>> >> Would it make sense to work toward a binary format that's not
>>> >> architecture-specific? I recall from COPY that our binary format
>is
>>> >> not standardized across, for example, big- and little-endian
>machines.
>>> >
>>> > I think you recall wrongly. It's obviously possible that we have
>bugs
>>> > around this, but output/input routines are supposed to handle a
>>> > endianess independent format. That usually means that you have to
>do
>>> > endianess conversions, but that doesn't make it non-standardized.
>>> >
>>>
>>> Yeah, there are really 3 formats of data we have, text protocol,
>binary
>>> network protocol and internal on disk format. The internal on disk
>>> format will not work across big/little-endian but network binary
>>> protocol will.
>>>
>>> FWIW I don't think the code for binary format was included in
>original
>>> logical replication patch (I really tried to keep it as minimal as
>>> possible), but the code and protocol is pretty much ready for adding
>that.
>>>
>> Yes, I looked through the public history and could not find it.
>Thanks for
>> confirming.
>>
>>>
>>> That said, pglogical has code which handles this (I guess Andres
>means
>>> that by predecessor of pgoutput) so if you look for example at the
>>> write_tuple/read_tuple/decide_datum_transfer in
>>>
>>>
>https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
>>> that can help you give some ideas on how to approach this.
>>>
>>
>> Thanks for the tip!
>>
>
>Looking at:
>https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
>
>this seems completely ignored. What was the intent?
That's about the output of the plugin, not the datatypes. And independent of text/binary output, the protocol contains non-printable chars.
Andres
So one of the things they would like added is to get not null information in the schema record. This is so they can mark the field Optional in Java. I presume this would also have some uses in other languages. As I understand it this would require a protocol bump. If this were to be accepted, are there any outstanding asks that would be useful to add if we were going to bump the protocol?
Dave
Hi,

On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get not null information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we
> were going to bump the protocol?

I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.

Greetings,

Andres Freund
Hi,
On Wed, 5 Jun 2019 at 18:50, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get not null information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we were
> going to bump the protocol?
I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.
I'm not intimately familiar with their code. I will query them more about the ask.
I am curious why you are "strongly" opposed however. We already have the information. Adding doesn't seem onerous.
Dave
Hi,

On 2019-06-05 19:05:05 -0400, Dave Cramer wrote:
> I am curious why you are "strongly" opposed however. We already have the
> information. Adding doesn't seem onerous.

(thought I'd already replied with this)

The problem is that I don't recognize a limiting principle:

If we want NOT NULL information for clients, why don't we include the
underlying types for arrays, and the fields in composite types? What
about foreign keys? And unique keys?

And then we suddenly need tracking for all these, so we don't always
send out that information when we previously already did - and in some
of the cases there's no infrastructure for that.

I just don't quite buy that the output plugin built for pg's logical
replication needs is a good place to include a continually increasing
amount of metadata that logical replication doesn't need. That's going
to add overhead and make the code more complicated.

Greetings,

Andres Freund
On 06/07/19 19:27, Andres Freund wrote:
> The problem is that I don't recognize a limiting principle:
>
> If we want NOT NULL information for clients, why don't we include the
> underlying types for arrays, and the fields in composite types? What
> about foreign keys? And unique keys?

This reminds me of an idea I had for a future fe/be protocol version, right
after a talk by Alyssa Ritchie and Henrietta Dombrovskaya at the last
2Q PGConf. [1]

It seems they had ended up designing a whole 'nother "protocol level"
involving queries wrapping their results as JSON and an app layer that
unwraps again, after trying a simpler first approach that was foiled by the
inability to see into arrays and anonymous record types in the 'describe'
response.

I thought, in a new protocol rev, why not let the driver send additional
'describe' messages after the first one, to drill into structure of
individual columns mentioned in the first response, before sending the
'execute' message?

If it doesn't want the further detail, it doesn't have to ask.

> And then we suddenly need tracking for all these, so we don't always
> send out that information when we previously already did

If it's up to the client driver, it can track what it needs or already has.

I haven't looked too deeply into the replication protocol ... it happens
under a kind of copy-both, right?, so maybe there's a way for the receiver
to send some inquiries back, but maybe in a windowed, full-duplex way where
it might have to buffer some incoming messages before getting the response
to an inquiry message it sent.

Would those be thinkable thoughts for a future protocol rev?

Regards,
-Chap

[1] https://www.2qpgconf.com/schedule/information-exchange-techniques-for-javapostgresql-applications/
Hi,

On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
> It seems they had ended up designing a whole 'nother "protocol level"
> involving queries wrapping their results as JSON and an app layer that
> unwraps again, after trying a simpler first approach that was foiled by the
> inability to see into arrays and anonymous record types in the 'describe'
> response.

I suspect quite a few people would have to have left the project before
this would happen.

> I thought, in a new protocol rev, why not let the driver send additional
> 'describe' messages after the first one, to drill into structure of
> individual columns mentioned in the first response, before sending the
> 'execute' message?
>
> If it doesn't want the further detail, it doesn't have to ask.
>
> > And then we suddenly need tracking for all these, so we don't always
> > send out that information when we previously already did
>
> If it's up to the client driver, it can track what it needs or already has.
>
> I haven't looked too deeply into the replication protocol ... it happens
> under a kind of copy-both, right?, so maybe there's a way for the receiver
> to send some inquiries back, but maybe in a windowed, full-duplex way where
> it might have to buffer some incoming messages before getting the response
> to an inquiry message it sent.

That'd be a *lot* of additional complexity, and pretty much prohibitive
from a performance POV. We'd have to not continue decoding on the server
side *all* the time to give the client a chance to inquire additional
information.

Greetings,

Andres Freund
On 06/07/19 21:01, Andres Freund wrote:
> On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
>> It seems they had ended up designing a whole 'nother "protocol level"
>> involving queries wrapping their results as JSON and an app layer that
>> unwraps again, after trying a simpler first approach that was foiled by the
>> inability to see into arrays and anonymous record types in the 'describe'
>> response.
>
> I suspect quite a few people would have to have left the project before
> this would happen.

I'm not sure I understand what you're getting at. The "whole 'nother
protocol" was something they actually implemented, at the application level,
by rewriting their queries to produce JSON and their client to unwrap it.
It wasn't proposed to go into postgres ... but it was a workaround they were
forced into by the current protocol's inability to tell them what they were
getting.

> That'd be a *lot* of additional complexity, and pretty much prohibitive
> from a performance POV. We'd have to not continue decoding on the server
> side *all* the time to give the client a chance to inquire additional
> information.

Does anything travel in the client->server direction during replication?
I thought I saw CopyBoth mentioned. Is there a select()/poll() being done
so those messages can be received?

It does seem that the replication protocol would be the tougher problem.
For the regular extended-query protocol, it seems like allowing an extra
Describe or two before Execute might not be awfully hard.

Regards,
-Chap
Hi,

On 2019-06-07 21:16:12 -0400, Chapman Flack wrote:
> On 06/07/19 21:01, Andres Freund wrote:
> > That'd be a *lot* of additional complexity, and pretty much prohibitive
> > from a performance POV. We'd have to not continue decoding on the server
> > side *all* the time to give the client a chance to inquire additional
> > information.
>
> Does anything travel in the client->server direction during replication?
> I thought I saw CopyBoth mentioned. Is there a select()/poll() being done
> so those messages can be received?

Yes, acknowledgements of how far data has been received (and how far
processed), which is then used to release resources (WAL, xid horizon) and
allow synchronous replication to block until something has been received.

- Andres
On Fri, Jun 07, 2019 at 06:01:12PM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
>> It seems they had ended up designing a whole 'nother "protocol level"
>> involving queries wrapping their results as JSON and an app layer that
>> unwraps again, after trying a simpler first approach that was foiled by the
>> inability to see into arrays and anonymous record types in the 'describe'
>> response.
>
> I suspect quite a few people would have to have left the project before
> this would happen.
>
>> I thought, in a new protocol rev, why not let the driver send additional
>> 'describe' messages after the first one, to drill into structure of
>> individual columns mentioned in the first response, before sending the
>> 'execute' message?
>>
>> If it doesn't want the further detail, it doesn't have to ask.
>>
>> > And then we suddenly need tracking for all these, so we don't always
>> > send out that information when we previously already did
>>
>> If it's up to the client driver, it can track what it needs or already has.
>>
>> I haven't looked too deeply into the replication protocol ... it happens
>> under a kind of copy-both, right?, so maybe there's a way for the receiver
>> to send some inquiries back, but maybe in a windowed, full-duplex way where
>> it might have to buffer some incoming messages before getting the response
>> to an inquiry message it sent.
>
> That'd be a *lot* of additional complexity, and pretty much prohibitive
> from a performance POV. We'd have to not continue decoding on the server
> side *all* the time to give the client a chance to inquire additional
> information.
>

I kinda agree with this, and I think it's an argument why replication
solutions that need such additional metadata (e.g. because they have no
database to query) should not rely on pgoutput but should invent their own
decoding plugin. Which is why it's a plugin.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
This should have gone to hackers as well
---------- Forwarded message ---------
From: Dave Cramer <davecramer@gmail.com>
Date: Sat, Jun 8, 2019, 6:41 PM
Subject: Re: Binary support for pgoutput plugin
To: Tomas Vondra <tomas.vondra@2ndquadrant.com>
On Sat, Jun 8, 2019, 6:27 PM Tomas Vondra, <tomas.vondra@2ndquadrant.com> wrote:
On Fri, Jun 07, 2019 at 06:01:12PM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
>> It seems they had ended up designing a whole 'nother "protocol level"
>> involving queries wrapping their results as JSON and an app layer that
>> unwraps again, after trying a simpler first approach that was foiled by the
>> inability to see into arrays and anonymous record types in the 'describe'
>> response.
>
>I suspect quite a few people would have to have left the projectbefore
>this would happen.
>
>
>> I thought, in a new protocol rev, why not let the driver send additional
>> 'describe' messages after the first one, to drill into structure of
>> individual columns mentioned in the first response, before sending the
>> 'execute' message?
>>
>> If it doesn't want the further detail, it doesn't have to ask.
>>
>> > And then we suddenly need tracking for all these, so we don't always
>> > send out that information when we previously already did
>>
>> If it's up to the client driver, it can track what it needs or already has.
>
>> I haven't looked too deeply into the replication protocol ... it happens
>> under a kind of copy-both, right?, so maybe there's a way for the receiver
>> to send some inquiries back, but maybe in a windowed, full-duplex way where
>> it might have to buffer some incoming messages before getting the response
>> to an inquiry message it sent.
>
>That'd be a *lot* of additional complexity, and pretty much prohibitive
>from a performance POV. We'd have to not continue decoding on the server
>side *all* the time to give the client a chance to inquire additional
>information.
>
I kinda agree with this, and I think it's an argument why replication
solutions that need such additional metadata (e.g. because they have no
database to query) should not rely on pgoutput but should invent their own
decoding plugin. Which is why it's a plugin.
So the reason we are discussing using pgoutput plugin is because it is part of core and guaranteed to be in cloud providers solutions. I'm trying to find a balance here between using what we have and burdening core with additional code to look after. Not sending the metadata is not a deal breaker, but I can see some value in it.
Dave
Hi,

On 2019-06-08 19:41:34 -0400, Dave Cramer wrote:
> So the reason we are discussing using pgoutput plugin is because it is part
> of core and guaranteed to be in cloud providers solutions.

IMO people needing this should then band together and write one that's
suitable, rather than trying to coerce test_decoding and now pgoutput
into something they're not made for.

Greetings,

Andres Freund
On Sat, 8 Jun 2019 at 20:09, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-08 19:41:34 -0400, Dave Cramer wrote:
> So the reason we are discussing using pgoutput plugin is because it is part
> of core and guaranteed to be in cloud providers solutions.
IMO people needing this should then band together and write one that's
suitable, rather than trying to coerce test_decoding and now pgoutput
into something they're not made for.
At the moment it would look a lot like pgoutput. For now I'm fine with no changes to pgoutput other than binary
Once we have some real use cases we can look at writing a new one. I would imagine we would actually start with
pgoutput and just add to it.
Thanks,
Dave
On Sat, Jun 08, 2019 at 08:40:43PM -0400, Dave Cramer wrote:
> On Sat, 8 Jun 2019 at 20:09, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2019-06-08 19:41:34 -0400, Dave Cramer wrote:
> > > So the reason we are discussing using pgoutput plugin is because it is part
> > > of core and guaranteed to be in cloud providers solutions.
> >
> > IMO people needing this should then band together and write one that's
> > suitable, rather than trying to coerce test_decoding and now pgoutput
> > into something they're not made for.
>
> At the moment it would look a lot like pgoutput. For now I'm fine with no
> changes to pgoutput other than binary. Once we have some real use cases we
> can look at writing a new one. I would imagine we would actually start with
> pgoutput and just add to it.

I understand the desire to make this work for managed cloud environments,
we support quite a few customers who would benefit from it. But pgoutput is
meant specifically for built-in replication, and adding complexity that is
useless for that use case does not seem like a good tradeoff. From this POV
the binary mode is fine, because it'd benefit pgoutput, but the various
other stuff mentioned here (e.g. nullability) is not.

And if we implement a new plugin for use by out-of-core stuff, I guess we'd
probably done it in an extension. But even having it in contrib would not
make it automatically installed on managed systems, because AFAIK the
various providers only allow whitelisted extensions. At which point there's
little difference compared to external extensions.

I think the best party to implement such extension is whoever implements
such replication system (say Debezium), because they are the ones who know
which format / behavior would work for them. And they can also show the
benefit to their users, who can then push the cloud providers to install
the extension. Of course, that'll take a long time (but it's unclear how
long), and until then they'll have to provide some fallback.

This is a bit of a chicken-egg problem, with three parties - our project,
projects building on logical replication and cloud providers. And no matter
how you slice it, the party implementing it has only limited (if any)
control over what the other parties allow.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
So back to binary output.
From what I can tell the place to specify binary options would be in the create publication and or in replication slots?
The challenge as I see it is that the subscriber would have to be able to decode binary output.
Any thoughts on how to handle this? At the moment I'm assuming that this would only work for subscribers that knew how to handle binary.
Regards,
Dave
Hi,

On 10/06/2019 13:27, Dave Cramer wrote:
> So back to binary output.
>
> From what I can tell the place to specify binary options would be in the
> create publication and or in replication slots?
>
> The challenge as I see it is that the subscriber would have to be able
> to decode binary output.
>
> Any thoughts on how to handle this? At the moment I'm assuming that this
> would only work for subscribers that knew how to handle binary.
>

Given that we don't need to write anything extra to WAL on upstream to
support binary output, why not just have the request for binary data as
an option for the pgoutput and have it chosen dynamically? Then it's the
subscriber who asks for binary output via option(s) to START_REPLICATION.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions
https://www.2ndQuadrant.com/
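[A rough sketch of what "an option for pgoutput" could mean in code: the subscriber passes e.g. binary 'true' among the START_REPLICATION options, and pgoutput's startup callback walks its DefElem list and flips a flag that the write path then consults. The option name and helper are illustrative; at this point pgoutput only understands proto_version and publication_names.]

#include "postgres.h"
#include "commands/defrem.h"    /* defGetBoolean */
#include "nodes/parsenodes.h"   /* DefElem */

/* Scan the output-plugin options for a hypothetical "binary" flag. */
static bool
parse_binary_option(List *options)
{
    ListCell   *lc;
    bool        binary = false; /* default stays text */

    foreach(lc, options)
    {
        DefElem    *defel = (DefElem *) lfirst(lc);

        if (strcmp(defel->defname, "binary") == 0)
            binary = defGetBoolean(defel);
    }

    return binary;
}

[The subscriber side would then thread a matching CREATE/ALTER SUBSCRIPTION setting through to the options it sends with START_REPLICATION, which is where the catalog discussion below comes in.]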
OK, before I go too much further down this rabbit hole I'd like feedback on the current code. See attached patch
There is one obvious hack where in binary mode I reset the input cursor to allow the binary input to be re-read
From what I can tell the alternative is to convert the data in logicalrep_read_tuple but that would require moving a lot of the logic currently in worker.c to proto.c. This seems minimally invasive.
and thanks Petr for the tip to use pglogical for ideas.
Thanks,
Dave Cramer
Attachment
On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
Hi,
On 10/06/2019 13:27, Dave Cramer wrote:
> So back to binary output.
>
> From what I can tell the place to specify binary options would be in the
> create publication and or in replication slots?
>
> The challenge as I see it is that the subscriber would have to be able
> to decode binary output.
>
> Any thoughts on how to handle this? At the moment I'm assuming that this
> would only work for subscribers that knew how to handle binary.
>
Given that we don't need to write anything extra to WAL on upstream to
support binary output, why not just have the request for binary data as
an option for the pgoutput and have it chosen dynamically? Then it's the
subscriber who asks for binary output via option(s) to START_REPLICATION.
If I understand this correctly this would add something to the CREATE/ALTER SUBSCRIPTION commands in the WITH clause.
Additionally another column would be required for pg_subscription for the binary option.
Does it make sense to add an options column which would just be a comma separated string?
Not that I have future options in mind but seems like something that might come up in the future.
Dave Cramer
On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
> On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com>
> wrote:
> > Hi,
> >
> > On 10/06/2019 13:27, Dave Cramer wrote:
> > > So back to binary output.
> > >
> > > From what I can tell the place to specify binary options would be in the
> > > create publication and or in replication slots?
> > >
> > > The challenge as I see it is that the subscriber would have to be able
> > > to decode binary output.
> > >
> > > Any thoughts on how to handle this? At the moment I'm assuming that this
> > > would only work for subscribers that knew how to handle binary.
> >
> > Given that we don't need to write anything extra to WAL on upstream to
> > support binary output, why not just have the request for binary data as
> > an option for the pgoutput and have it chosen dynamically? Then it's the
> > subscriber who asks for binary output via option(s) to START_REPLICATION.
>
> If I understand this correctly this would add something to the CREATE/ALTER
> SUBSCRIPTION commands in the WITH clause.
> Additionally another column would be required for pg_subscription for the
> binary option.
> Does it make sense to add an options column which would just be a comma
> separated string?
> Not that I have future options in mind but seems like something that might
> come up in the future.

I'd just add a boolean column to the catalog. That's what I did in the
patch adding support for streaming in-progress transactions. I don't think
we expect many additional parameters, so it makes little sense to optimize
for that case.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Dave Cramer
On Fri, 14 Jun 2019 at 14:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
>On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com>
>wrote:
>
>> Hi,
>>
>> On 10/06/2019 13:27, Dave Cramer wrote:
>> > So back to binary output.
>> >
>> > From what I can tell the place to specify binary options would be in the
>> > create publication and or in replication slots?
>> >
>> > The challenge as I see it is that the subscriber would have to be able
>> > to decode binary output.
>> >
>> > Any thoughts on how to handle this? At the moment I'm assuming that this
>> > would only work for subscribers that knew how to handle binary.
>> >
>>
>> Given that we don't need to write anything extra to WAL on upstream to
>> support binary output, why not just have the request for binary data as
>> an option for the pgoutput and have it chosen dynamically? Then it's the
>> subscriber who asks for binary output via option(s) to START_REPLICATION.
>>
>
>If I understand this correctly this would add something to the CREATE/ALTER
>SUBSCRIPTION commands in the WITH clause.
>Additionally another column would be required for pg_subscription for the
>binary option.
>Does it make sense to add an options column which would just be a comma
>separated string?
>Not that I have future options in mind but seems like something that might
>come up in the future.
>
I'd just add a boolean column to the catalog. That's what I did in the
patch adding support for streaming in-progress transactions. I don't think
we expect many additional parameters, so it makes little sense to optimize
for that case.
Which is what I have done. Thanks
I've attached both patches for comments.
I still have to add documentation.
Regards,
Dave
Attachment
On Fri, 14 Jun 2019 at 15:42, Dave Cramer <davecramer@gmail.com> wrote:
Dave Cramer

On Fri, 14 Jun 2019 at 14:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
>On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com>
>wrote:
>
>> Hi,
>>
>> On 10/06/2019 13:27, Dave Cramer wrote:
>> > So back to binary output.
>> >
>> > From what I can tell the place to specify binary options would be in the
>> > create publication and or in replication slots?
>> >
>> > The challenge as I see it is that the subscriber would have to be able
>> > to decode binary output.
>> >
>> > Any thoughts on how to handle this? At the moment I'm assuming that this
>> > would only work for subscribers that knew how to handle binary.
>> >
>>
>> Given that we don't need to write anything extra to WAL on upstream to
>> support binary output, why not just have the request for binary data as
>> an option for the pgoutput and have it chosen dynamically? Then it's the
>> subscriber who asks for binary output via option(s) to START_REPLICATION.
>>
>
>If I understand this correctly this would add something to the CREATE/ALTER
>SUBSCRIPTION commands in the WITH clause.
>Additionally another column would be required for pg_subscription for the
>binary option.
>Does it make sense to add an options column which would just be a comma
>separated string?
>Not that I have future options in mind but seems like something that might
>come up in the future.
>
I'd just add a boolean column to the catalog. That's what I did in the
patch adding support for streaming in-progress transactions. I don't think
we expect many additional parameters, so it makes little sense to optimize
for that case.

Which is what I have done. Thanks

I've attached both patches for comments.
I still have to add documentation.

Regards,
Dave
Additional patch for documentation.
Dave Cramer
Attachment
On Wed, 5 Jun 2019 at 18:50, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get not null information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we were
> going to bump the protocol?
I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.
So, talking some more to the guys who want to use this for Change Data Capture, they pointed out that if the schema changes while they are offline, there is no way to query the catalog information.
Dave
> On Mon, Jun 17, 2019 at 10:29:26AM -0400, Dave Cramer wrote:
> > Which is what I have done. Thanks
> >
> > I've attached both patches for comments.
> > I still have to add documentation.
>
> Additional patch for documentation.

Thank you for the patch! Unfortunately 0002 has some conflicts, could
you please send a rebased version? In the meantime I have few questions:

-LogicalRepRelId
+void
 logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
 {
    char action;
-   LogicalRepRelId relid;
-
-   /* read the relation id */
-   relid = pq_getmsgint(in, 4);

    action = pq_getmsgbyte(in);
    if (action != 'N')
@@ -175,7 +173,6 @@ logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)

    logicalrep_read_tuple(in, newtup);
-
-   return relid;
 }

....

-   relid = logicalrep_read_insert(s, &newtup);
+   /* read the relation id */
+   relid = pq_getmsgint(s, 4);
    rel = logicalrep_rel_open(relid, RowExclusiveLock);
+
+   logicalrep_read_insert(s, &newtup);

Maybe I'm missing something, what is the reason of moving pq_getmsgint
out of logicalrep_read_*? Just from the patch it seems that the code is
equivalent.

> There is one obvious hack where in binary mode I reset the input
> cursor to allow the binary input to be re-read
> From what I can tell the alternative is to convert the data in
> logicalrep_read_tuple but that would require moving a lot of the logic
> currently in worker.c to proto.c. This seems minimally invasive.

Which logic has to be moved for example? Actually it sounds more natural
to me, if this functionality would be in e.g. logicalrep_read_tuple
rather than slot_store_data, since it has something to do with reading
data. And it seems that in pglogical it's also located in
pglogical_read_tuple.
On Sun, 27 Oct 2019 at 11:00, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Mon, Jun 17, 2019 at 10:29:26AM -0400, Dave Cramer wrote:
> > Which is what I have done. Thanks
> >
> > I've attached both patches for comments.
> > I still have to add documentation.
>
> Additional patch for documentation.
Thank you for the patch! Unfortunately 0002 has some conflicts, could
you please send a rebased version? In the meantime I have few questions:
-LogicalRepRelId
+void
logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
{
char action;
- LogicalRepRelId relid;
-
- /* read the relation id */
- relid = pq_getmsgint(in, 4);
action = pq_getmsgbyte(in);
if (action != 'N')
@@ -175,7 +173,6 @@ logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
logicalrep_read_tuple(in, newtup);
- return relid;
}
....
- relid = logicalrep_read_insert(s, &newtup);
+ /* read the relation id */
+ relid = pq_getmsgint(s, 4);
rel = logicalrep_rel_open(relid, RowExclusiveLock);
+
+ logicalrep_read_insert(s, &newtup);
Maybe I'm missing something, what is the reason of moving pq_getmsgint
out of logicalrep_read_*? Just from the patch it seems that the code is
equivalent.
> There is one obvious hack where in binary mode I reset the input
> cursor to allow the binary input to be re-read From what I can tell
> the alternative is to convert the data in logicalrep_read_tuple but
> that would require moving a lot of the logic currently in worker.c to
> proto.c. This seems minimally invasive.
Which logic has to be moved for example? Actually it sounds more natural
to me, if this functionality would be in e.g. logicalrep_read_tuple
rather than slot_store_data, since it has something to do with reading
data. And it seems that in pglogical it's also located in
pglogical_read_tuple.
Ok, I've rebased and reverted logicalrep_read_insert
As to the last comment, honestly it's been so long I can't remember why I put that comment in there.
Thanks for reviewing
Dave
Attachment
On Thu, Oct 31, 2019 at 3:03 AM Dave Cramer <davecramer@gmail.com> wrote:
> Ok, I've rebased and reverted logicalrep_read_insert

Hi Dave,

From the code style police (actually just from cfbot, which is set up
to complain about declarations after statements, a bit of C99 we
aren't ready for):

proto.c:557:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  int len = pq_getmsgint(in, 4); /* read length */
      ^
proto.c:573:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  int len = pq_getmsgint(in, 4); /* read length */
      ^
On Sun, 3 Nov 2019 at 21:47, Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Oct 31, 2019 at 3:03 AM Dave Cramer <davecramer@gmail.com> wrote:
> Ok, I've rebased and reverted logicalrep_read_insert
Hi Dave,
From the code style police (actually just from cfbot, which is set up
to complain about declarations after statements, a bit of C99 we
aren't ready for):
proto.c:557:6: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
int len = pq_getmsgint(in, 4); /* read length */
^
proto.c:573:6: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
int len = pq_getmsgint(in, 4); /* read length */
^
Thomas,
Thanks for the review.
See attached
Attachment
> On Tue, Nov 05, 2019 at 07:16:10AM -0500, Dave Cramer wrote:
>
> See attached

--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1779,6 +1779,7 @@ ApplyWorkerMain(Datum main_arg)
    options.slotname = myslotname;
    options.proto.logical.proto_version = LOGICALREP_PROTO_VERSION_NUM;
    options.proto.logical.publication_names = MySubscription->publications;
+   options.proto.logical.binary = MySubscription->binary;

I'm a bit confused, shouldn't be there also

--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
    sub->name = pstrdup(NameStr(subform->subname));
    sub->owner = subform->subowner;
    sub->enabled = subform->subenabled;
+   sub->binary = subform->subbinary;

in the GetSubscription?
On Fri, 8 Nov 2019 at 11:20, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Tue, Nov 05, 2019 at 07:16:10AM -0500, Dave Cramer wrote:
>
> See attached
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1779,6 +1779,7 @@ ApplyWorkerMain(Datum main_arg)
options.slotname = myslotname;
options.proto.logical.proto_version = LOGICALREP_PROTO_VERSION_NUM;
options.proto.logical.publication_names = MySubscription->publications;
+ options.proto.logical.binary = MySubscription->binary;
I'm a bit confused, shouldn't be there also
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
sub->name = pstrdup(NameStr(subform->subname));
sub->owner = subform->subowner;
sub->enabled = subform->subenabled;
+ sub->binary = subform->subbinary;
in the GetSubscription?
yes, I have added this. I will supply an updated patch later.
Now a bigger question(s).
Previously someone mentioned that we need to confirm whether the two servers are compatible for binary or not.
Checking to make sure the two servers have the same endianness is obvious.
Sizeof int, long, float, double, timestamp (float/int) at a minimum.
This could be done in libpqrcv_startstreaming. The remaining question is: do we fall back to text mode if needed, or simply fail?
Dave
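For context, the sort of check being discussed might look roughly like the sketch below, assuming a hypothetical bigendian flag obtained from the publisher's reported options. (As discussed further down the thread, such checks were ultimately judged unnecessary for the network binary format.)

	/*
	 * Sketch only: reject the connection if the publisher's reported byte
	 * order does not match this build.  "bigendian" is a hypothetical flag
	 * taken from the startup options; not the committed implementation.
	 */
#ifdef WORDS_BIGENDIAN
	if (!bigendian)
#else
	if (bigendian)
#endif
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("publisher and subscriber byte order do not match")));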
On 2019-Nov-11, Dave Cramer wrote:
> Previously someone mentioned that we need to confirm whether the two
> servers are compatible for binary or not.
>
> Checking to make sure the two servers have the same endianness is obvious.
> Sizeof int, long, float, double, timestamp (float/int) at a minimum.
>
> this could be done in libpqrcv_startstreaming. The question I have
> remaining is do we fall back to text mode if needed or simply fail ?
I think it makes more sense to have it fail. If the user wants to retry in text mode, they can do that easily enough; but if we make it fall back automatically and they set up the receiver wrongly by mistake, they would pay the performance penalty without noticing.
-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 11 Nov 2019 at 12:04, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-11, Dave Cramer wrote:
> Previously someone mentioned that we need to confirm whether the two
> servers are compatible for binary or not.
>
> Checking to make sure the two servers have the same endianness is obvious.
> Sizeof int, long, float, double, timestamp (float/int) at a minimum.
>
> this could be done in libpqrcv_startstreaming. The question I have
> remaining is do we fall back to text mode if needed or simply fail ?
I think it makes more sense to have it fail. If the user wants to retry
in text mode, they can do that easily enough; but if we make it
fall-back automatically and they set up the receiver wrongly by mistake,
they would pay the performance penalty without noticing.
Alvaro,
Thanks; after sending this I pretty much came to the same conclusion.
Dave
On Mon, 11 Nov 2019 at 12:07, Dave Cramer <davecramer@gmail.com> wrote:
On Mon, 11 Nov 2019 at 12:04, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-11, Dave Cramer wrote:
> Previously someone mentioned that we need to confirm whether the two
> servers are compatible for binary or not.
>
> Checking to make sure the two servers have the same endianness is obvious.
> Sizeof int, long, float, double, timestamp (float/int) at a minimum.
>
> this could be done in libpqrcv_startstreaming. The question I have
> remaining is do we fall back to text mode if needed or simply fail ?
I think it makes more sense to have it fail. If the user wants to retry
in text mode, they can do that easily enough; but if we make it
fall-back automatically and they set up the receiver wrongly by mistake,
they would pay the performance penalty without noticing.
Alvaro,
Thanks; after sending this I pretty much came to the same conclusion.
Dave
Following 2 patches address Dmitry's concern and check for compatibility.
Attachment
> On Mon, Nov 11, 2019 at 11:15:45AM -0500, Dave Cramer wrote:
> On Fri, 8 Nov 2019 at 11:20, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > On Tue, Nov 05, 2019 at 07:16:10AM -0500, Dave Cramer wrote:
> > >
> > > See attached
> >
> > --- a/src/backend/replication/logical/worker.c
> > +++ b/src/backend/replication/logical/worker.c
> > @@ -1779,6 +1779,7 @@ ApplyWorkerMain(Datum main_arg)
> > options.slotname = myslotname;
> > options.proto.logical.proto_version = LOGICALREP_PROTO_VERSION_NUM;
> > options.proto.logical.publication_names = MySubscription->publications;
> > + options.proto.logical.binary = MySubscription->binary;
> >
> > I'm a bit confused, shouldn't be there also
> >
> > --- a/src/backend/catalog/pg_subscription.c
> > +++ b/src/backend/catalog/pg_subscription.c
> > @@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
> > sub->name = pstrdup(NameStr(subform->subname));
> > sub->owner = subform->subowner;
> > sub->enabled = subform->subenabled;
> > + sub->binary = subform->subbinary;
> >
> > in the GetSubscription?
> >
> > yes, I have added this. I will supply an updated patch later.
> > Now a bigger question(s).
Well, without this change it wasn't working for me at all. Other than that yes, it was a small question :)
On 2019-Nov-11, Dave Cramer wrote:
> Following 2 patches address Dmitry's concern and check for compatibility.
Please resend the whole patchset, so that the patch tester can verify the series. (Doing it helps humans, too).
-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 11 Nov 2019 at 15:17, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-11, Dave Cramer wrote:
> Following 2 patches address Dmitry's concern and check for compatibility.
Please resend the whole patchset, so that the patch tester can verify
the series. (Doing it helps humans, too).
Attached,
Thanks,
Dave
Attachment
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Hi,
On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote:
> Attached,
The latest patch set does not apply correctly. Could you send a rebase please? I am moving the patch to next CF, waiting on author.
--
Michael
Attachment
Rebased against head
Dave Cramer
On Sat, 30 Nov 2019 at 20:48, Michael Paquier <michael@paquier.xyz> wrote:
Hi,
On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote:
> Attached,
The latest patch set does not apply correctly. Could you send a
rebase please? I am moving the patch to next CF, waiting on author.
--
Michael
Attachment
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0002-add-binary-column-to-pg_subscription.patch
On Mon, 2 Dec 2019 at 14:35, Dave Cramer <davecramer@gmail.com> wrote:
Rebased against head
Dave Cramer
On Sat, 30 Nov 2019 at 20:48, Michael Paquier <michael@paquier.xyz> wrote:
Hi,
On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote:
> Attached,
The latest patch set does not apply correctly. Could you send a
rebase please? I am moving the patch to next CF, waiting on author.
--
Michael
Dave Cramer
Dave Cramer <davecramer@gmail.com> writes:
> Rebased against head
The cfbot's failing to apply this [1]. It looks like the reason is only that you included a catversion bump in what you submitted. Protocol is to *not* do that in submitted patches, but rely on the committer to add it at the last minute --- otherwise you'll waste a lot of time rebasing the patch, which is what it needs now.
regards, tom lane
[1] http://cfbot.cputube.org/patch_27_2152.log
On Fri, 28 Feb 2020 at 18:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Rebased against head
The cfbot's failing to apply this [1]. It looks like the reason is only
that you included a catversion bump in what you submitted. Protocol is to
*not* do that in submitted patches, but rely on the committer to add it at
the last minute --- otherwise you'll waste a lot of time rebasing the
patch, which is what it needs now.
regards, tom lane
[1] http://cfbot.cputube.org/patch_27_2152.log
rebased and removed the catversion bump.
Attachment
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
Hi Dave, On 29/02/2020 18:44, Dave Cramer wrote: > > > rebased and removed the catversion bump. Looked into this and it generally seems okay, but I do have one gripe here: > + tuple->values[i].data = palloc(len + 1); > + /* and data */ > + > + pq_copymsgbytes(in, tuple->values[i].data, len); > + tuple->values[i].len = len; > + tuple->values[i].cursor = 0; > + tuple->values[i].maxlen = len; > + /* not strictly necessary but the docs say it is required */ > + tuple->values[i].data[len] = '\0'; > + break; > + } > + case 't': /* text formatted value */ > + { > + tuple->changed[i] = true; > + int len = pq_getmsgint(in, 4); /* read length */ > > /* and data */ > - tuple->values[i] = palloc(len + 1); > - pq_copymsgbytes(in, tuple->values[i], len); > - tuple->values[i][len] = '\0'; > + tuple->values[i].data = palloc(len + 1); > + pq_copymsgbytes(in, tuple->values[i].data, len); > + tuple->values[i].data[len] = '\0'; > + tuple->values[i].len = len; The cursor should be set to 0 in the text formatted case too if this is how we chose to encode data. However I am not quite convinced I like the StringInfoData usage here. Why not just change the struct to include additional array of lengths rather than replacing the existing values array with StringInfoData array, that seems generally both simpler and should have smaller memory footprint too, no? We could also merge the binary and changed arrays into single char array named something like format (as data can be either unchanged, binary or text) and just reuse same identifiers we have in protocol. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi Dave,
On 29/02/2020 18:44, Dave Cramer wrote:
>
>
> rebased and removed the catversion bump.
Looked into this and it generally seems okay, but I do have one gripe here:
> + tuple->values[i].data = palloc(len + 1);
> + /* and data */
> +
> + pq_copymsgbytes(in, tuple->values[i].data, len);
> + tuple->values[i].len = len;
> + tuple->values[i].cursor = 0;
> + tuple->values[i].maxlen = len;
> + /* not strictly necessary but the docs say it is required */
> + tuple->values[i].data[len] = '\0';
> + break;
> + }
> + case 't': /* text formatted value */
> + {
> + tuple->changed[i] = true;
> + int len = pq_getmsgint(in, 4); /* read length */
>
> /* and data */
> - tuple->values[i] = palloc(len + 1);
> - pq_copymsgbytes(in, tuple->values[i], len);
> - tuple->values[i][len] = '\0';
> + tuple->values[i].data = palloc(len + 1);
> + pq_copymsgbytes(in, tuple->values[i].data, len);
> + tuple->values[i].data[len] = '\0';
> + tuple->values[i].len = len;
The cursor should be set to 0 in the text formatted case too if this is
how we chose to encode data.
However I am not quite convinced I like the StringInfoData usage here.
Why not just change the struct to include additional array of lengths
rather than replacing the existing values array with StringInfoData
array, that seems generally both simpler and should have smaller memory
footprint too, no?
Can you explain this a bit more? I don't really see a huge difference in memory usage.
We still need length and the data copied into LogicalRepTupleData when sending the data in binary, no?
We could also merge the binary and changed arrays into single char array
named something like format (as data can be either unchanged, binary or
text) and just reuse same identifiers we have in protocol.
This seems like a good idea.
Dave Cramer
--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
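A rough sketch of the struct shape Petr is suggesting, with per-column lengths kept separately and the changed/binary flags collapsed into a single format array that reuses the protocol identifiers. Field names are illustrative, not the committed definition:

typedef struct LogicalRepTupleData
{
	/* Illustrative only; not the committed definition. */
	char	   *values[MaxTupleAttributeNumber];	/* column values, text or binary */
	int			lengths[MaxTupleAttributeNumber];	/* length of each value in bytes */
	char		format[MaxTupleAttributeNumber];	/* 'n', 'u', 't' or 'b' per column */
} LogicalRepTupleData;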
Hi, On 08/03/2020 00:18, Dave Cramer wrote: > On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com > <mailto:petr@2ndquadrant.com>> wrote: > > Hi Dave, > > On 29/02/2020 18:44, Dave Cramer wrote: > > > > > > rebased and removed the catversion bump. > > Looked into this and it generally seems okay, but I do have one > gripe here: > > > + tuple->values[i].data = > palloc(len + 1); > > + /* and data */ > > + > > + pq_copymsgbytes(in, > tuple->values[i].data, len); > > + tuple->values[i].len = len; > > + tuple->values[i].cursor = 0; > > + tuple->values[i].maxlen = len; > > + /* not strictly necessary > but the docs say it is required */ > > + tuple->values[i].data[len] > = '\0'; > > + break; > > + } > > + case 't': /* text > formatted value */ > > + { > > + tuple->changed[i] = true; > > + int len = pq_getmsgint(in, > 4); /* read length */ > > > > /* and data */ > > - tuple->values[i] = > palloc(len + 1); > > - pq_copymsgbytes(in, > tuple->values[i], len); > > - tuple->values[i][len] = '\0'; > > + tuple->values[i].data = > palloc(len + 1); > > + pq_copymsgbytes(in, > tuple->values[i].data, len); > > + tuple->values[i].data[len] > = '\0'; > > + tuple->values[i].len = len; > > The cursor should be set to 0 in the text formatted case too if this is > how we chose to encode data. > > However I am not quite convinced I like the StringInfoData usage here. > Why not just change the struct to include additional array of lengths > rather than replacing the existing values array with StringInfoData > array, that seems generally both simpler and should have smaller memory > footprint too, no? > > > Can you explain this a bit more? I don't really see a huge difference in > memory usage. > We still need length and the data copied into LogicalRepTupleData when > sending the data in binary, no? > Yes but we don't need to have fixed sized array of 1600 elements that contains maxlen and cursor positions of the StringInfoData struct which we don't use for anything AFAICS. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Fri, 3 Apr 2020 at 03:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
On 08/03/2020 00:18, Dave Cramer wrote:
> On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
> Hi Dave,
>
> On 29/02/2020 18:44, Dave Cramer wrote:
> >
> >
> > rebased and removed the catversion bump.
>
> Looked into this and it generally seems okay, but I do have one
> gripe here:
>
> > + tuple->values[i].data =
> palloc(len + 1);
> > + /* and data */
> > +
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].len = len;
> > + tuple->values[i].cursor = 0;
> > + tuple->values[i].maxlen = len;
> > + /* not strictly necessary
> but the docs say it is required */
> > + tuple->values[i].data[len]
> = '\0';
> > + break;
> > + }
> > + case 't': /* text
> formatted value */
> > + {
> > + tuple->changed[i] = true;
> > + int len = pq_getmsgint(in,
> 4); /* read length */
> >
> > /* and data */
> > - tuple->values[i] =
> palloc(len + 1);
> > - pq_copymsgbytes(in,
> tuple->values[i], len);
> > - tuple->values[i][len] = '\0';
> > + tuple->values[i].data =
> palloc(len + 1);
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].data[len]
> = '\0';
> > + tuple->values[i].len = len;
>
> The cursor should be set to 0 in the text formatted case too if this is
> how we chose to encode data.
>
> However I am not quite convinced I like the StringInfoData usage here.
> Why not just change the struct to include additional array of lengths
> rather than replacing the existing values array with StringInfoData
> array, that seems generally both simpler and should have smaller memory
> footprint too, no?
>
>
> Can you explain this a bit more? I don't really see a huge difference in
> memory usage.
> We still need length and the data copied into LogicalRepTupleData when
> sending the data in binary, no?
>
Yes but we don't need to have fixed sized array of 1600 elements that
contains maxlen and cursor positions of the StringInfoData struct which
we don't use for anything AFAICS.
OK, I can see an easy way to allocate only the number of elements required, but since OidReceiveFunctionCall takes
StringInfo as one of its arguments, this seems like the easy path, unless there is something I am missing?
Dave
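To illustrate the trade-off: OidReceiveFunctionCall does take a StringInfo, but that does not force the tuple struct itself to carry full StringInfoData entries; a throwaway StringInfoData can be assembled on the stack just before the call. A minimal sketch, assuming hypothetical values/lengths arrays and omitting the typreceive/typioparam lookups:

	StringInfoData buf;

	/* Wrap the stored bytes in a temporary StringInfo for the receive call. */
	buf.data = tupleData->values[remoteattnum];		/* raw bytes (hypothetical layout) */
	buf.len = tupleData->lengths[remoteattnum];		/* hypothetical length array */
	buf.maxlen = buf.len;
	buf.cursor = 0;

	values[i] = OidReceiveFunctionCall(typreceive, &buf,
									   typioparam, att->atttypmod);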
On Fri, 3 Apr 2020 at 16:44, Dave Cramer <davecramer@gmail.com> wrote:
On Fri, 3 Apr 2020 at 03:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
On 08/03/2020 00:18, Dave Cramer wrote:
> On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
> Hi Dave,
>
> On 29/02/2020 18:44, Dave Cramer wrote:
> >
> >
> > rebased and removed the catversion bump.
>
> Looked into this and it generally seems okay, but I do have one
> gripe here:
>
> > + tuple->values[i].data =
> palloc(len + 1);
> > + /* and data */
> > +
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].len = len;
> > + tuple->values[i].cursor = 0;
> > + tuple->values[i].maxlen = len;
> > + /* not strictly necessary
> but the docs say it is required */
> > + tuple->values[i].data[len]
> = '\0';
> > + break;
> > + }
> > + case 't': /* text
> formatted value */
> > + {
> > + tuple->changed[i] = true;
> > + int len = pq_getmsgint(in,
> 4); /* read length */
> >
> > /* and data */
> > - tuple->values[i] =
> palloc(len + 1);
> > - pq_copymsgbytes(in,
> tuple->values[i], len);
> > - tuple->values[i][len] = '\0';
> > + tuple->values[i].data =
> palloc(len + 1);
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].data[len]
> = '\0';
> > + tuple->values[i].len = len;
>
> The cursor should be set to 0 in the text formatted case too if this is
> how we chose to encode data.
>
> However I am not quite convinced I like the StringInfoData usage here.
> Why not just change the struct to include additional array of lengths
> rather than replacing the existing values array with StringInfoData
> array, that seems generally both simpler and should have smaller memory
> footprint too, no?
>
>
> Can you explain this a bit more? I don't really see a huge difference in
> memory usage.
> We still need length and the data copied into LogicalRepTupleData when
> sending the data in binary, no?
>
Yes but we don't need to have fixed sized array of 1600 elements that
contains maxlen and cursor positions of the StringInfoData struct which
we don't use for anything AFAICS.
New patch that fixes a number of errors in the validity check, reduces memory usage by allocating the changed data dynamically, and
collapses the changed and binary arrays into a single format array.
Dave Cramer
Attachment
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0008-Changed-binary-and-changed-to-format-and-use-existin.patch
> On 7 Apr 2020, at 21:45, Dave Cramer <davecramer@gmail.com> wrote:
> New patch that fixes a number of errors in the check for validity as well as reduces the memory usage by
> dynamically allocating the data changed as well as collapsing the changed and binary arrays into a format array.
The 0001 patch fails to apply, and possibly others in the series. Please submit a rebased version. Marking the CF entry as Waiting for Author in the meantime.
cheers ./daniel
Honestly I'm getting a little weary of fixing this up only to have the patch not get reviewed.
Apparently it has no value so unless someone is willing to review it and get it committed I'm just going to let it go.
Thanks,
Dave Cramer
On Wed, 1 Jul 2020 at 04:53, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 7 Apr 2020, at 21:45, Dave Cramer <davecramer@gmail.com> wrote:
> New patch that fixes a number of errors in the check for validity as well as reduces the memory usage by
> dynamically allocating the data changed as well as collapsing the changed and binary arrays into a format array.
The 0001 patch fails to apply, and possibly other in the series. Please submit
a rebased version. Marking the CF entry as Waiting for Author in the meantime.
cheers ./daniel
rebased
Thanks,
Dave Cramer
On Wed, 1 Jul 2020 at 06:43, Dave Cramer <davecramer@gmail.com> wrote:
Honestly I'm getting a little weary of fixing this up only to have the patch not get reviewed.
Apparently it has no value so unless someone is willing to review it and get it committed I'm just going to let it go.
Thanks,
Dave Cramer
On Wed, 1 Jul 2020 at 04:53, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 7 Apr 2020, at 21:45, Dave Cramer <davecramer@gmail.com> wrote:
> New patch that fixes a number of errors in the check for validity as well as reduces the memory usage by
> dynamically allocating the data changed as well as collapsing the changed and binary arrays into a format array.
The 0001 patch fails to apply, and possibly other in the series. Please submit
a rebased version. Marking the CF entry as Waiting for Author in the meantime.
cheers ./daniel
Attachment
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0008-Changed-binary-and-changed-to-format-and-use-existin.patch
> On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
>
> rebased
Thanks! The new version of the 0001 patch has a compiler warning due to mixed declarations and code:
worker.c: In function ‘slot_store_data’:
worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 int cursor = tupleData->values[remoteattnum]->cursor;
 ^
worker.c: In function ‘slot_modify_data’:
worker.c:485:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 int cursor = tupleData->values[remoteattnum]->cursor;
 ^
I didn't investigate to see if it was new, but Travis is running with Werror which fails this build.
cheers ./daniel
On 2020-Jul-05, Daniel Gustafsson wrote:
> > On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
> >
> > rebased
>
> Thanks! The new version of 0001 patch has a compiler warning due to mixed
> declarations and code:
>
> worker.c: In function ‘slot_store_data’:
> worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
AFAICS this is fixed in 0005.
I'm going to suggest to use "git rebase -i" so that fixes for bugs that earlier patches introduce are applied as fix-ups in those patches; we don't need or want to see the submitter's intermediate versions. Ideally, each submitted patch should be free of such problems, so that we can consider each individual patch in the series in isolation. Indeed, evidently the cfbot considers things that way.
-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 5 Jul 2020, at 23:11, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Jul-05, Daniel Gustafsson wrote:
>
>>> On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
>>>
>>> rebased
>>
>> Thanks! The new version of 0001 patch has a compiler warning due to mixed
>> declarations and code:
>>
>> worker.c: In function ‘slot_store_data’:
>> worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
>
> AFAICS this is fixed in 0005.
Yes and no, 0005 fixes one such instance but the one failing the build is another one in worker.c (the below being from 0008 which in turn change the row in question from previous patches):
+ int cursor = tupleData->values[remoteattnum]->cursor;
> I'm going to suggest to use "git rebase -i"
+1
cheers ./daniel
On Sun, 5 Jul 2020 at 17:28, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 5 Jul 2020, at 23:11, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Jul-05, Daniel Gustafsson wrote:
>
>>> On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
>>>
>>> rebased
>>
>> Thanks! The new version of 0001 patch has a compiler warning due to mixed
>> declarations and code:
>>
>> worker.c: In function ‘slot_store_data’:
>> worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
>
> AFAICS this is fixed in 0005.
Yes and no, 0005 fixes one such instance but the one failing the build is
another one in worker.c (the below being from 0008 which in turn change the row
in question from previous patches):
+ int cursor = tupleData->values[remoteattnum]->cursor;
> I'm going to suggest to use "git rebase -i"
+1
Strangely I don't see those errors when I build on my machine, but I will fix them.
As for rebase -i, I'll do what is advised here for squashing them. Just one patch now?
Thanks,
> On 6 Jul 2020, at 14:58, Dave Cramer <davecramer@gmail.com> wrote:
> as far as rebase -i do what is advised here for squashing them. Just one patch now ?
One patch per logical change, if there are two disjoint changes in the patchset where one builds on top of the other then multiple patches are of course fine. My personal rule-of-thumb is to split it the way I envision it committed.
cheers ./daniel
On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 6 Jul 2020, at 14:58, Dave Cramer <davecramer@gmail.com> wrote:
> as far as rebase -i do what is advised here for squashing them. Just one patch now ?
One patch per logical change, if there are two disjoint changes in the patchset
where one builds on top of the other then multiple patches are of course fine.
My personal rule-of-thumb is to split it the way I envision it committed.
At this point it is the result of 3 rebases. I guess I'll have to break it up differently..
Thanks,
Dave
On Mon, 6 Jul 2020 at 09:35, Dave Cramer <davecramer@gmail.com> wrote:
On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 6 Jul 2020, at 14:58, Dave Cramer <davecramer@gmail.com> wrote:
> as far as rebase -i do what is advised here for squashing them. Just one patch now ?
One patch per logical change, if there are two disjoint changes in the patchset
where one builds on top of the other then multiple patches are of course fine.
My personal rule-of-thumb is to split it the way I envision it committed.
At this point it is the result of 3 rebases. I guess I'll have to break it up differently..
OK, rebased it down to 2 patches, attached.
Thanks,
Dave
Attachment
> On 7 Jul 2020, at 02:16, Dave Cramer <davecramer@gmail.com> wrote: > OK, rebased it down to 2 patches, attached. I took a look at this patchset today. The feature clearly seems like something which we'd benefit from having, especially if it allows for the kind of extensions that were discussed at the beginning of this thread. In general I think it's in pretty good shape, there are however a few comments: The patch lacks any kind of test, which I think is required for it to be considered for committing. It also doesn't update the \dRs view in psql to include the subbinary column which IMO it should. I took the liberty to write this as well as tests as I was playing with the patch, the attached 0003 contains this, while 0001 and 0002 are your patches included to ensure the CFBot can do it's thing. This was kind of thrown together to have something while testing, so it definately need a once-over or two. The comment here implies that unchanged is the default value for format, but isn't this actually setting it to text formatted value? + /* default is unchanged */ + tuple->format = palloc(natts * sizeof(char)); + memset(tuple->format, 't', natts * sizeof(char)); Also, since the values member isn't memset() with a default, this seems a bit misleading at best no? For the rest of the backend we aren't including the defname in the errmsg like what is done here. Maybe we should, but I think that should be done consistently if so, and we should stick to just "conflicting or redundant options" for now. At the very least, this needs a comma between "options" and the defname and ideally the defname wrapped in \". - errmsg("conflicting or redundant options"))); + errmsg("conflicting or redundant options %s already provided", defel->defname))); These types of constructs are IMHO quite hard to read: + if( + #ifdef WORDS_BIGENDIAN + true + #else + false + #endif + != bigendian) How about spelling out the statement completely for both cases, or perhaps encapsulating the logic in a macro? Something like the below perhaps? + #ifdef WORDS_BIGENDIAN + if (bigendian != true) + #else + if (bigendian != false) + #endif This change needs to be removed before a commit, just highlighting that here to avoid it going unnoticed. -/* #define WAL_DEBUG */ +#define WAL_DEBUG Reading this I'm left wondering if we shoulnd't introduce macros for the kinds, since we now compare with 'u', 't' etc in quite a few places and add comments explaining the types everywhere. A descriptive name would make it easier to grep for all occurrences, and avoid the need for the comment lines. Thats not necesarily for this patch though, but an observation from reading it. I found a few smaller nitpicks as well, some of these might go away by a pg_indent run but I've included them all here regardless: This change, and the subsequent whitespace removal later in the same function, seems a bit pointless: - /* read the relation id */ relid = pq_getmsgint(in, 4); - Braces should go on the next line: + if (options->proto.logical.binary) { This should be a C /* ... 
*/ comment, or perhaps just removed since the code is quite self explanatory: + // default to false + *binary_basetypes = false; Indentation here: - errmsg("conflicting or redundant options"))); + errmsg("conflicting or redundant options %s already provided", defel->defname))); ..as well as here (there are a few like this one): + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("incompatible datum size"))); Capitalization of "after" to make it a proper sentence: + * after we know that the subscriber is requesting binary check to make sure Excessive whitespace and indentation in a few places, and not enough in some: + if (binary_given) + { + values[Anum_pg_subscription_subbinary - 1] = ... + if ( *binary_basetypes == true ) ... + if (sizeof(int) != int_size) ... + if( float4_byval != ... + if (sizeof(long) != long_size) + ereport(ERROR, ... + if (tupleData->format[remoteattnum] =='u') ... + bool binary_basetypes; That's all for now. cheers ./daniel
Attachment
On Tue, 7 Jul 2020 at 10:01, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 7 Jul 2020, at 02:16, Dave Cramer <davecramer@gmail.com> wrote:
> OK, rebased it down to 2 patches, attached.
I took a look at this patchset today. The feature clearly seems like something
which we'd benefit from having, especially if it allows for the kind of
extensions that were discussed at the beginning of this thread. In general I
think it's in pretty good shape, there are however a few comments:
The patch lacks any kind of test, which I think is required for it to be
considered for committing. It also doesn't update the \dRs view in psql to
include the subbinary column which IMO it should. I took the liberty to write
this as well as tests as I was playing with the patch, the attached 0003
contains this, while 0001 and 0002 are your patches included to ensure the
CFBot can do its thing. This was kind of thrown together to have something
while testing, so it definitely needs a once-over or two.
I have put all your requests, other than the indentation (which can be dealt with by pg_indent), into another patch, which I reordered ahead of yours.
This should make it easier to see that all of your issues have been addressed.
I did not do the macros for updated, inserted, and deleted; I will give that a go tomorrow.
The comment here implies that unchanged is the default value for format, but
isn't this actually setting it to text formatted value?
+ /* default is unchanged */
+ tuple->format = palloc(natts * sizeof(char));
+ memset(tuple->format, 't', natts * sizeof(char));
Also, since the values member isn't memset() with a default, this seems a bit
misleading at best no?
For the rest of the backend we aren't including the defname in the errmsg like
what is done here. Maybe we should, but I think that should be done
consistently if so, and we should stick to just "conflicting or redundant
options" for now. At the very least, this needs a comma between "options" and
the defname and ideally the defname wrapped in \".
- errmsg("conflicting or redundant options")));
+ errmsg("conflicting or redundant options %s already provided", defel->defname)));
I added these since this will now be used outside of logical replication, and getting reasonable error messages when setting up
replication is useful. I added the \" and the comma.
These types of constructs are IMHO quite hard to read:
+ if(
+ #ifdef WORDS_BIGENDIAN
+ true
+ #else
+ false
+ #endif
+ != bigendian)
How about spelling out the statement completely for both cases, or perhaps
encapsulating the logic in a macro? Something like the below perhaps?
+ #ifdef WORDS_BIGENDIAN
+ if (bigendian != true)
+ #else
+ if (bigendian != false)
+ #endif
This change needs to be removed before a commit, just highlighting that here to
avoid it going unnoticed.
-/* #define WAL_DEBUG */
+#define WAL_DEBUG
Done
Reading this I'm left wondering if we shouldn't introduce macros for the kinds,
since we now compare with 'u', 't' etc in quite a few places and add comments
explaining the types everywhere. A descriptive name would make it easier to
grep for all occurrences, and avoid the need for the comment lines. That's not
necessarily for this patch though, but an observation from reading it.
I'll take a look at adding macros tomorrow.
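One possible shape for such macros; the names are illustrative, and whether or how this landed in the tree may differ:

/* Illustrative names only: per-column kind bytes used in the protocol. */
#define LOGICALREP_COLUMN_NULL		'n'		/* column is NULL */
#define LOGICALREP_COLUMN_UNCHANGED	'u'		/* unchanged TOASTed value */
#define LOGICALREP_COLUMN_TEXT		't'		/* text-format value follows */
#define LOGICALREP_COLUMN_BINARY	'b'		/* binary-format value follows */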
I've taken care of much of this below
I found a few smaller nitpicks as well, some of these might go away by a
pg_indent run but I've included them all here regardless:
This change, and the subsequent whitespace removal later in the same function,
seems a bit pointless:
- /* read the relation id */
relid = pq_getmsgint(in, 4);
-
Braces should go on the next line:
+ if (options->proto.logical.binary) {
This should be a C /* ... */ comment, or perhaps just removed since the code
is quite self explanatory:
+ // default to false
+ *binary_basetypes = false;
Indentation here:
- errmsg("conflicting or redundant options")));
+ errmsg("conflicting or redundant options %s already provided", defel->defname)));
..as well as here (there are a few like this one):
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("incompatible datum size")));
Capitalization of "after" to make it a proper sentence:
+ * after we know that the subscriber is requesting binary check to make sure
Excessive whitespace and indentation in a few places, and not enough in some:
+ if (binary_given)
+ {
+ values[Anum_pg_subscription_subbinary - 1] =
...
+ if ( *binary_basetypes == true )
...
+ if (sizeof(int) != int_size)
...
+ if( float4_byval !=
...
+ if (sizeof(long) != long_size)
+ ereport(ERROR,
...
+ if (tupleData->format[remoteattnum] =='u')
...
+ bool binary_basetypes;
That's all for now.
cheers ./daniel
Attachment
> On 7 Jul 2020, at 22:53, Dave Cramer <davecramer@gmail.com> wrote:
> I have put all your requests other than the indentation as that can be dealt with by pg_indent into another patch which I reordered ahead of yours
> This should make it easier to see that all of your issues have been addressed.
Thanks for the update! Do note that my patch included a new file which is missing from this patchset:
src/test/subscription/t/014_binary.pl
This is, IMO, the most interesting test of this feature so it would be good to be included. It's a basic test and can no doubt be extended to be even more relevant, but it's a start.
> I did not do the macro for updated, inserted, deleted, will give that a go tomorrow.
This might not be a blocker, but personally I think it would make the code more readable. Anyone else have an opinion on this?
> I added these since this will now be used outside of logical replication and getting reasonable error messages when setting up
> replication is useful. I added the \" and ,
I think the "lack of detail" in the existing error messages is intentional to make translation easier, but I might be wrong here.
Reading through the new patch, and running the tests, I'm marking this as Ready for Committer. It does need some cosmetic TLC, quite possibly just from pg_indent but I didn't validate if it will take care of everything, and comment touchups (there is still a TODO comment around wording that needs to be resolved). However, I think it's in good enough shape for consideration at this point.
cheers ./daniel
Daniel Gustafsson <daniel@yesql.se> writes: > Thanks for the update! Do note that my patch included a new file which is > missing from this patchset: > src/test/subscription/t/014_binary.pl > This is, IMO, the most interesting test of this feature so it would be good to > be included. It's a basic test and can no doubt be extended to be even more > relevant, but it's a start. I was about to complain that the latest patchset includes no meaningful test cases, but I assume that this missing file contains those. >> I did not do the macro for updated, inserted, deleted, will give that a go tomorrow. > This might not be a blocker, but personally I think it would make the > code more readable. Anyone else have an opinion on this? +1 for using macros. > Reading through the new patch, and running the tests, I'm marking this as Ready > for Committer. It does need some cosmetic TLC, quite possibly just from > pg_indent but I didn't validate if it will take care of everything, and comment > touchups (there is still a TODO comment around wording that needs to be > resolved). However, I think it's in good enough shape for consideration at > this point. I took a quick look through the patch and had some concerns: * Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks. Those are quite dead so far as a patch for HEAD is concerned --- in fact, since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so, the patch is actively doing the wrong thing there. Not that it matters. This code will never appear in any branch where float timestamps could be a thing. * I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness, etc, make any sense either. Those surely do not affect the on-the-wire representation of values --- or if they do, we've blown it somewhere else. I'd just take out all those checks and assume that the binary representation is sufficiently portable. (If it's not, it's more or less the user's problem, just as in binary COPY.) * Please also remove debugging hacks such as enabling WAL_DEBUG. * It'd likely be wise for the documentation to point out that binary mode only works if all types to be transferred have send/receive functions. BTW, while it's not the job of this patch to fix it, I find it quite distressing that we're apparently repeating the lookups of the type I/O functions for every row transferred. I'll set this back to WoA, but I concur with Daniel's opinion that it doesn't seem that far from committability. regards, tom lane
On Fri, 10 Jul 2020 at 14:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
> Thanks for the update! Do note that my patch included a new file which is
> missing from this patchset:
> src/test/subscription/t/014_binary.pl
> This is, IMO, the most interesting test of this feature so it would be good to
> be included. It's a basic test and can no doubt be extended to be even more
> relevant, but it's a start.
I was about to complain that the latest patchset includes no meaningful
test cases, but I assume that this missing file contains those.
>> I did not do the macro for updated, inserted, deleted, will give that a go tomorrow.
> This might not be a blocker, but personally I think it would make the
> code more readable. Anyone else have an opinion on this?
+1 for using macros.
Got it, will add.
> Reading through the new patch, and running the tests, I'm marking this as Ready
> for Committer. It does need some cosmetic TLC, quite possibly just from
> pg_indent but I didn't validate if it will take care of everything, and comment
> touchups (there is still a TODO comment around wording that needs to be
> resolved). However, I think it's in good enough shape for consideration at
> this point.
I took a quick look through the patch and had some concerns:
* Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks.
Those are quite dead so far as a patch for HEAD is concerned --- in fact,
since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so,
the patch is actively doing the wrong thing there. Not that it matters.
This code will never appear in any branch where float timestamps could
be a thing.
* I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness,
etc, make any sense either. Those surely do not affect the on-the-wire
representation of values --- or if they do, we've blown it somewhere else.
I'd just take out all those checks and assume that the binary
representation is sufficiently portable. (If it's not, it's more or less
the user's problem, just as in binary COPY.)
So is there any point in having them as options, then?
* Please also remove debugging hacks such as enabling WAL_DEBUG.
* It'd likely be wise for the documentation to point out that binary
mode only works if all types to be transferred have send/receive
functions.
will do
BTW, while it's not the job of this patch to fix it, I find it quite
distressing that we're apparently repeating the lookups of the type
I/O functions for every row transferred.
I'll set this back to WoA, but I concur with Daniel's opinion that
it doesn't seem that far from committability.
Thanks for looking at this
Dave Cramer
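On Tom's aside about repeating the type I/O lookups for every row, a minimal sketch of per-attribute caching. The lookup functions (getTypeBinaryInputInfo, fmgr_info, ReceiveFunctionCall) are the real APIs, while the cache structure and surrounding names are hypothetical:

typedef struct AttrRecvCache
{
	Oid			typreceive;		/* OID of the type's receive function */
	Oid			typioparam;		/* I/O parameter for the receive function */
	FmgrInfo	recvfunc;		/* resolved function, ready to call */
} AttrRecvCache;

	/* Filled once per attribute, e.g. when the relation map entry is built. */
	getTypeBinaryInputInfo(att->atttypid, &cache->typreceive, &cache->typioparam);
	fmgr_info(cache->typreceive, &cache->recvfunc);

	/* Then, for every row and column, no further catalog lookups are needed. */
	values[i] = ReceiveFunctionCall(&cache->recvfunc, &buf,
									cache->typioparam, att->atttypmod);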
Hi,
On 11/07/2020 14:14, Dave Cramer wrote:
> On Fri, 10 Jul 2020 at 14:21, Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> wrote:
>
> > > Reading through the new patch, and running the tests, I'm marking this as Ready
> > > for Committer. It does need some cosmetic TLC, quite possibly just from
> > > pg_indent but I didn't validate if it will take care of everything, and comment
> > > touchups (there is still a TODO comment around wording that needs to be
> > > resolved). However, I think it's in good enough shape for consideration at
> > > this point.
> >
> > I took a quick look through the patch and had some concerns:
> >
> > * Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks.
> > Those are quite dead so far as a patch for HEAD is concerned --- in fact,
> > since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so,
> > the patch is actively doing the wrong thing there. Not that it matters.
> > This code will never appear in any branch where float timestamps could
> > be a thing.
> >
> > * I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness,
> > etc, make any sense either. Those surely do not affect the on-the-wire
> > representation of values --- or if they do, we've blown it somewhere else.
> > I'd just take out all those checks and assume that the binary
> > representation is sufficiently portable. (If it's not, it's more or less
> > the user's problem, just as in binary COPY.)
>
> So is there any point in having them as options then ?
I am guessing this is copied from pglogical, right? We have them there because it can optionally send data in the on-disk format (not the network binary format) and there this matters, but for network binary format they do not matter as Tom says.
-- Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 11/07/2020 14:14, Dave Cramer wrote:
>> So is there any point in having them as options then ?
> I am guessing this is copied from pglogical, right? We have them there
> because it can optionally send data in the on-disk format (not the
> network binary format) and there this matters, but for network binary
> format they do not matter as Tom says.
Ah, I wondered why that was there at all. Yes, you should just delete all that logic --- it's irrelevant as long as we use the send/recv functions.
regards, tom lane
On Sat, 11 Jul 2020 at 10:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 11/07/2020 14:14, Dave Cramer wrote:
>> So is there any point in having them as options then ?
> I am guessing this is copied from pglogical, right? We have them there
> because it can optionally send data in the on-disk format (not the
> network binary format) and there this matters, but for network binary
> format they do not matter as Tom says.
Ah, I wondered why that was there at all. Yes, you should just delete
all that logic --- it's irrelevant as long as we use the send/recv
functions.
regards, tom lane
Ok,
removed all the unnecessary options.
Added the test case that Daniel had created.
Added a note to the docs.
Note WAL_DEBUG is removed in patch 3. I could rebase that into patch 1 if required.
Thanks,
Dave
Attachment
> On 13 Jul 2020, at 15:11, Dave Cramer <davecramer@gmail.com> wrote: I took another look at the updated version today. Since there now were some unused variables and (I believe) unnecessary checks (int size and endianness etc) left, I took the liberty to fix those. I also fixed some markup in the catalog docs, did some minor tidying up and ran pgindent on it. The attached is a squash of the 4 patches in your email with the above fixes. I'm again marking it RfC since I believe all concerns raised so far has been addressed. > Added the test case that Daniel had created. Nope, still missing AFAICT =) But I've included it in the attached. cheers ./daniel
Attachment
On Tue, 14 Jul 2020 at 09:26, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 13 Jul 2020, at 15:11, Dave Cramer <davecramer@gmail.com> wrote:
I took another look at the updated version today. Since there now were some
unused variables and (I believe) unnecessary checks (int size and endianness
etc) left, I took the liberty to fix those. I also fixed some markup in the
catalog docs, did some minor tidying up and ran pgindent on it.
The attached is a squash of the 4 patches in your email with the above fixes.
I'm again marking it RfC since I believe all concerns raised so far has been
addressed.
> Added the test case that Daniel had created.
Nope, still missing AFAICT =) But I've included it in the attached.
Thanks!
Dave
So I started looking through this seriously, and my first question is why do the docs and code keep saying that "base types" are sent in binary? Why not just "data"? Are there any cases where we don't use binary format, if the subscription requests it?
If there's not a concrete reason to use that terminology, I'd rather flush it, because it seems confusing.
regards, tom lane
On Tue, 14 Jul 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I started looking through this seriously, and my first question
is why do the docs and code keep saying that "base types" are sent
in binary? Why not just "data"? Are there any cases where we
don't use binary format, if the subscription requests it?
If there's not a concrete reason to use that terminology,
I'd rather flush it, because it seems confusing.
Well for some reason I thought there were some types that did not have send and receive functions.
I've changed the docs to say "data", and renamed the flag from binary_basetypes to just binary.
See attached.
Thanks,
Dave
Attachment
Dave Cramer <davecramer@gmail.com> writes:
> On Tue, 14 Jul 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I started looking through this seriously, and my first question
>> is why do the docs and code keep saying that "base types" are sent
>> in binary? Why not just "data"? Are there any cases where we
>> don't use binary format, if the subscription requests it?
> Well for some reason I thought there were some types that did not have send
> and receive functions.
There are, but they're all base types, so this terminology is still unhelpful ;-). It'd be possible for the sender to send binary for columns it has a typsend function for, and otherwise send text. However, this only helps if the receiver has receive functions for all those types; in cross-version cases they might disagree about which types can be sent in binary.
(Hm ... maybe we could have the receiver verify that it has typreceive for every column included in its version of the table, before asking for binary mode?)
regards, tom lane
Hi,
On 2020-07-14 14:08:53 -0400, Dave Cramer wrote:
> On Tue, 14 Jul 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > So I started looking through this seriously, and my first question
> > is why do the docs and code keep saying that "base types" are sent
> > in binary? Why not just "data"? Are there any cases where we
> > don't use binary format, if the subscription requests it?
> >
> > If there's not a concrete reason to use that terminology,
> > I'd rather flush it, because it seems confusing.
>
> Well for some reason I thought there were some types that did not have send
> and receive functions.
There's also send/receive functions that do not work across systems, unfortunately :(. In particular record and array send functions embed type oids and their receive functions verify that they match the local system. Which basically means that if there's any difference in oid assignment order between two systems that they will not allow to send/recv such data between them :(. I suspect that is what that comments might have been referring to?
I've several times suggested that we should remove those type checks in recv, as they afaict don't provide any actual value. But unfortunately there hasn't been much response to that. See e.g. https://postgr.es/m/20160426001713.hbqdiwvf4mkzkg55%40alap3.anarazel.de
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> There's also send/receive functions that do not work across systems,
> unfortunately :(. In particular record and array send functions embed
> type oids and their receive functions verify that they match the local
> system. Which basically means that if there's any difference in oid
> assignment order between two systems that they will not allow to
> send/recv such data between them :(.
It's not a problem particularly for built-in types, but I agree there's an issue for extension types.
> I've several times suggested that we should remove those type checks in
> recv, as they afaict don't provide any actual value. But unfortunately
> there hasn't been much response to that. See e.g.
> https://postgr.es/m/20160426001713.hbqdiwvf4mkzkg55%40alap3.anarazel.de
Maybe we could compromise by omitting the check if both OIDs are outside the built-in range?
regards, tom lane
Hi,
On 2020-07-14 19:46:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > There's also send/receive functions that do not work across systems,
> > unfortunately :(. In particular record and array send functions embed
> > type oids and their receive functions verify that they match the local
> > system. Which basically means that if there's any difference in oid
> > assignment order between two systems that they will not allow to
> > send/recv such data between them :(.
>
> It's not a problem particularly for built-in types, but I agree
> there's an issue for extension types.
I'm not so sure. That's true for builtin types within a single major version, but not necessarily across major versions. Not that I can immediately recall cases where we renumbered type oids. It also assumes that the type specification exactly matches between the source / target system. It's probably not a great idea to try to use send/recv for meaningfully different types, but it doesn't seem to crazy to e.g. allow to e.g. change varchar to text while doing a major version upgrade over logical rep.
What is the gain in having these checks? recv functions need to be safe against arbitrary input, so a type crosscheck doesn't buy additional safety in that regard. Not that a potential attacker couldn't just change the content anyways?
> > I've several times suggested that we should remove those type checks in
> > recv, as they afaict don't provide any actual value. But unfortunately
> > there hasn't been much response to that. See e.g.
> > https://postgr.es/m/20160426001713.hbqdiwvf4mkzkg55%40alap3.anarazel.de
>
> Maybe we could compromise by omitting the check if both OIDs are
> outside the built-in range?
Hm. That'd be a lot better than the current situation. So I'd definitely go for that if that's what we can agree on.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> What is the gain in having these checks? recv functions need to be safe
> against arbitrary input, so a type crosscheck doesn't buy additional
> safety in that regard. Not that a potential attacker couldn't just
> change the content anyways?
You're confusing security issues with user-friendliness issues. Detecting that you sent the wrong type via an OID mismatch error is a lot less painful than trying to figure out why you've got errors along the line of "incorrect binary data format".
regards, tom lane
Hi,
On 2020-07-14 22:28:48 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What is the gain in having these checks? recv functions need to be safe
> > against arbitrary input, so a type crosscheck doesn't buy additional
> > safety in that regard. Not that a potential attacker couldn't just
> > change the content anyways?
>
> You're confusing security issues with user-friendliness issues.
> Detecting that you sent the wrong type via an OID mismatch error
> is a lot less painful than trying to figure out why you've got
> errors along the line of "incorrect binary data format".
An oid mismatch error without knowing what that's about isn't very helpful either.
How about adding an errcontext that shows the "source type oid", the target type oid & type name and, for records, the column name of the target table? That'd make this a lot easier to debug.
Greetings,
Andres Freund
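A sketch of the kind of error context Andres is describing. Only errcontext() itself is the real API here; the callback and the struct are hypothetical:

typedef struct BinaryRecvErrorCtx
{
	const char *attname;		/* target column name, if known */
	Oid			remote_typid;	/* type OID embedded in the incoming data */
	Oid			local_typid;	/* type OID expected locally */
} BinaryRecvErrorCtx;

static void
binary_recv_error_callback(void *arg)
{
	BinaryRecvErrorCtx *ctx = (BinaryRecvErrorCtx *) arg;

	errcontext("processing remote binary data for column \"%s\": remote type %u, local type %u",
			   ctx->attname, ctx->remote_typid, ctx->local_typid);
}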
On Tue, 14 Jul 2020 at 22:47, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-07-14 22:28:48 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What is the gain in having these checks? recv functions need to be safe
> > against arbitrary input, so a type crosscheck doesn't buy additional
> > safety in that regard. Not that a potential attacker couldn't just
> > change the content anyways?
>
> You're confusing security issues with user-friendliness issues.
> Detecting that you sent the wrong type via an OID mismatch error
> is a lot less painful than trying to figure out why you've got
> errors along the line of "incorrect binary data format".
An oid mismatch error without knowing what that's about isn't very
helpful either.
How about adding an errcontext that shows the "source type oid", the
target type oid & type name and, for records, the column name of the
target table? That'd make this a lot easier to debug.
So, looking at how to confirm that the subscriber has receive functions for all of the types:
AFAICT we don't have that information, since the publication determines what is sent.
This code at line 482 in proto.c attempts to limit what is sent in binary. We could certainly be more restrictive here.
if (binary &&
OidIsValid(typclass->typreceive) &&
(att->atttypid < FirstNormalObjectId || typclass->typtype != 'c') &&
(att->atttypid < FirstNormalObjectId || typclass->typelem == InvalidOid))
Dave Cramer
Working through this ... what is the rationale for having changed the API of logicalrep_read_update? It seems kind of random, especially since no comparable change was made to logicalrep_read_insert. If there's actually a good reason, it seems like it'd apply to both. If there's not, I'd be inclined to not change the API, because this sort of thing is a recipe for bugs when making cross-version patches.
regards, tom lane
I've pushed this patch, with a number of adjustments, some cosmetic and some not so much (no pg_dump support!?). We're not quite done though ...

Dave Cramer <davecramer@gmail.com> writes:
> So looking at how to confirm that the subscriber has receive functions for
> all of the types.
> AFAICT we don't have that information since the publication determines what
> is sent?

Yeah, at the point where we need to send the option, we seem not to have a lot of info. In practice, if the sender has a typsend function, the only way the subscriber doesn't have a matching typreceive function is if it's an older PG version. I think it's sufficient to document that you can't use binary mode in that case, so that's what I did. (Note that getTypeBinaryInputInfo will say "no binary input function available for type %s" in such a case, so that seemed like adequate error handling.)

> On Tue, 14 Jul 2020 at 22:47, Andres Freund <andres@anarazel.de> wrote:
>> An oid mismatch error without knowing what that's about isn't very
>> helpful either.
>> How about adding an errcontext that shows the "source type oid", the
>> target type oid & type name and, for records, the column name of the
>> target table? That'd make this a lot easier to debug.

> This code line 482 in proto.c attempts to limit what is sent in binary. We
> could certainly be more restrictive here.

I think Andres' point is to be *less* restrictive. I left that logic as-is in the committed patch, but we could do something like the attached to improve the situation.

regards, tom lane

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2b1356ee24..d9837a5971 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -494,22 +494,9 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple, bool binar
 		typclass = (Form_pg_type) GETSTRUCT(typtup);
 
 		/*
-		 * Choose whether to send in binary. Obviously, the option must be
-		 * requested and the type must have a send function. Also, if the
-		 * type is not built-in then it must not be a composite or array type.
-		 * Such types contain type OIDs, which will likely not match at the
-		 * receiver if it's not a built-in type.
-		 *
-		 * XXX this could be relaxed if we changed record_recv and array_recv
-		 * to be less picky.
-		 *
-		 * XXX this fails to apply the restriction to domains over such types.
+		 * Send in binary if requested and type has suitable send function.
 		 */
-		if (binary &&
-			OidIsValid(typclass->typsend) &&
-			(att->atttypid < FirstGenbkiObjectId ||
-			 (typclass->typtype != TYPTYPE_COMPOSITE &&
-			  typclass->typelem == InvalidOid)))
+		if (binary && OidIsValid(typclass->typsend))
 		{
 			bytea	   *outputbytes;
 			int			len;
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 800107d4e7..392445ea03 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1308,13 +1308,34 @@ array_recv(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 				 errmsg("invalid array flags")));
 
+	/* Check element type recorded in the data */
 	element_type = pq_getmsgint(buf, sizeof(Oid));
+
+	/*
+	 * From a security standpoint, it doesn't matter whether the input's
+	 * element type matches what we expect: the element type's receive
+	 * function has to be robust enough to cope with invalid data. However,
+	 * from a user-friendliness standpoint, it's nicer to complain about type
+	 * mismatches than to throw "improper binary format" errors. But there's
+	 * a problem: only built-in types have OIDs that are stable enough to
+	 * believe that a mismatch is a real issue. So complain only if both OIDs
+	 * are in the built-in range. Otherwise, carry on with the element type
+	 * we "should" be getting.
+	 */
 	if (element_type != spec_element_type)
 	{
-		/* XXX Can we allow taking the input element type in any cases? */
-		ereport(ERROR,
-				(errcode(ERRCODE_DATATYPE_MISMATCH),
-				 errmsg("wrong element type")));
+		if (element_type < FirstGenbkiObjectId &&
+			spec_element_type < FirstGenbkiObjectId)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("binary data has array element type %u (%s) instead of expected %u (%s)",
+							element_type,
+							format_type_extended(element_type, -1,
+												 FORMAT_TYPE_ALLOW_INVALID),
+							spec_element_type,
+							format_type_extended(spec_element_type, -1,
												 FORMAT_TYPE_ALLOW_INVALID))));
+		element_type = spec_element_type;
 	}
 
 	for (i = 0; i < ndim; i++)
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 80cba2f4c2..674cf0a55d 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -551,13 +551,33 @@ record_recv(PG_FUNCTION_ARGS)
 			continue;
 		}
 
-		/* Verify column datatype */
+		/* Check column type recorded in the data */
 		coltypoid = pq_getmsgint(buf, sizeof(Oid));
-		if (coltypoid != column_type)
+
+		/*
+		 * From a security standpoint, it doesn't matter whether the input's
+		 * column type matches what we expect: the column type's receive
+		 * function has to be robust enough to cope with invalid data.
+		 * However, from a user-friendliness standpoint, it's nicer to
+		 * complain about type mismatches than to throw "improper binary
+		 * format" errors. But there's a problem: only built-in types have
+		 * OIDs that are stable enough to believe that a mismatch is a real
+		 * issue. So complain only if both OIDs are in the built-in range.
+		 * Otherwise, carry on with the column type we "should" be getting.
+		 */
+		if (coltypoid != column_type &&
+			coltypoid < FirstGenbkiObjectId &&
+			column_type < FirstGenbkiObjectId)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("wrong data type: %u, expected %u",
-							coltypoid, column_type)));
+					 errmsg("binary data has type %u (%s) instead of expected %u (%s) in record column %d",
+							coltypoid,
+							format_type_extended(coltypoid, -1,
												 FORMAT_TYPE_ALLOW_INVALID),
+							column_type,
+							format_type_extended(column_type, -1,
												 FORMAT_TYPE_ALLOW_INVALID),
+							i + 1)));
 
 		/* Get and check the item length */
 		itemlen = pq_getmsgint(buf, 4);
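For readers following the subscriber-side part of this: a binary datum ultimately has to go through the type's receive function, which is where the "no binary input function available" error mentioned above comes from. A minimal sketch of that path using the usual fmgr helpers (variable names are illustrative, not the committed worker.c code):

	Oid		typreceive;
	Oid		typioparam;
	Datum	value;

	/* errors out if the local type has no binary input function */
	getTypeBinaryInputInfo(atttypid, &typreceive, &typioparam);

	/* buf holds the raw bytes the publisher sent for this column */
	value = OidReceiveFunctionCall(typreceive, &buf, typioparam, atttypmod);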
On Sat, Jul 18, 2020 at 9:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've pushed this patch, with a number of adjustments, some cosmetic
> and some not so much (no pg_dump support!?). We're not quite
> done though ...

Skink's latest run reports a failure that I surmise was caused by this patch:

==722318== VALGRINDERROR-BEGIN
==722318== Invalid read of size 1
==722318==    at 0x4F4CC9: apply_handle_update (worker.c:834)
==722318==    by 0x4F4F81: apply_dispatch (worker.c:1427)
==722318==    by 0x4F5104: LogicalRepApplyLoop (worker.c:1635)
==722318==    by 0x4F57BF: ApplyWorkerMain (worker.c:2141)
==722318==    by 0x4BD49E: StartBackgroundWorker (bgworker.c:813)
==722318==    by 0x4CBAB4: do_start_bgworker (postmaster.c:5865)
==722318==    by 0x4CBBF5: maybe_start_bgworkers (postmaster.c:6091)
==722318==    by 0x4CC4BF: sigusr1_handler (postmaster.c:5260)
==722318==    by 0x486413F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==722318==    by 0x4DC7845: select (select.c:41)
==722318==    by 0x4CCE40: ServerLoop (postmaster.c:1691)
==722318==    by 0x4CE106: PostmasterMain (postmaster.c:1400)
==722318==  Address 0x78cb0ab is 443 bytes inside a recently re-allocated block of size 8,192 alloc'd
==722318==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==722318==    by 0x6A55BD: AllocSetContextCreateInternal (aset.c:468)
==722318==    by 0x280262: AtStart_Memory (xact.c:1108)
==722318==    by 0x2806ED: StartTransaction (xact.c:1979)
==722318==    by 0x282128: StartTransactionCommand (xact.c:2829)
==722318==    by 0x4F5514: ApplyWorkerMain (worker.c:2014)
==722318==    by 0x4BD49E: StartBackgroundWorker (bgworker.c:813)
==722318==    by 0x4CBAB4: do_start_bgworker (postmaster.c:5865)
==722318==    by 0x4CBBF5: maybe_start_bgworkers (postmaster.c:6091)
==722318==    by 0x4CC4BF: sigusr1_handler (postmaster.c:5260)
==722318==    by 0x486413F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==722318==    by 0x4DC7845: select (select.c:41)
==722318==
==722318== VALGRINDERROR-END

See https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-07-20%2002%3A37%3A51

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Skink's latest run reports a failure that I surmise was caused by this patch: Yeah, I've just been digging through that. The patch didn't create the bug, but it allowed valgrind to detect it, because the column status array is now "just big enough" rather than being always MaxTupleAttributeNumber entries. To wit, the problem is that the code in apply_handle_update that computes target_rte->updatedCols is junk. The immediate issue is that it fails to apply the remote-to-local column number mapping, so that it's looking at the wrong colstatus entries, possibly including entries past the end of the array. I'm fixing that, but even after that, there's a semantic problem: LOGICALREP_COLUMN_UNCHANGED is just a weak optimization, cf the code that sends it, in proto.c around line 480. colstatus will often *not* be that for columns that were in fact not updated on the remote side. I wonder whether we need to take steps to improve that. CC'ing Peter E., as this issue arose with b9c130a1fdf. regards, tom lane
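In other words, the per-column status array is indexed by the publisher's column positions, so the subscriber has to go through its attribute map before consulting it. A rough sketch of that idea, not the exact committed fix (names loosely follow the LogicalRepRelMapEntry conventions of that era):

	/* Sketch only: walk the local descriptor, map each local column to its
	 * remote position, and only then look at that column's status byte. */
	TupleDesc	desc = RelationGetDescr(rel->localrel);

	for (int i = 0; i < desc->natts; i++)
	{
		int			remoteattnum = rel->attrmap->attnums[i];

		if (remoteattnum < 0)
			continue;			/* column doesn't exist on the publisher */

		if (newtup.colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)
			target_rte->updatedCols =
				bms_add_member(target_rte->updatedCols,
							   i + 1 - FirstLowInvalidHeapAttributeNumber);
	}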
Hi,

On 20/07/2020 17:51, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
>> Skink's latest run reports a failure that I surmise was caused by this patch:
>
> Yeah, I've just been digging through that. The patch didn't create
> the bug, but it allowed valgrind to detect it, because the column
> status array is now "just big enough" rather than being always
> MaxTupleAttributeNumber entries. To wit, the problem is that the
> code in apply_handle_update that computes target_rte->updatedCols
> is junk.
>
> The immediate issue is that it fails to apply the remote-to-local
> column number mapping, so that it's looking at the wrong colstatus
> entries, possibly including entries past the end of the array.
>
> I'm fixing that, but even after that, there's a semantic problem:
> LOGICALREP_COLUMN_UNCHANGED is just a weak optimization, cf the code
> that sends it, in proto.c around line 480. colstatus will often *not*
> be that for columns that were in fact not updated on the remote side.
> I wonder whether we need to take steps to improve that.

LOGICALREP_COLUMN_UNCHANGED is not trying to optimize anything; there is certainly no effort made to avoid sending columns that were not updated by logical replication itself. It's just something we invented to handle the fact that values of TOASTed columns that were not updated are simply not visible to logical decoding (unless the table has REPLICA IDENTITY FULL), as they are neither written to WAL nor accessible via a historic snapshot. So the output plugin simply does not see the real value.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
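To illustrate Petr's point, the sender-side logic is roughly this (paraphrased from logicalrep_write_tuple; not an exact excerpt): when the decoded new tuple still carries an on-disk TOAST pointer for a column, the plugin cannot dereference it, so all it can do is tell the subscriber "unchanged".

	/* Paraphrased sketch: an un-updated TOASTed column shows up as an
	 * on-disk TOAST pointer whose payload logical decoding never saw. */
	if (!isnull[i] &&
		att->attlen == -1 &&
		VARATT_IS_EXTERNAL_ONDISK(values[i]))
	{
		pq_sendbyte(out, LOGICALREP_COLUMN_UNCHANGED);
		continue;
	}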
Petr Jelinek <petr@2ndquadrant.com> writes: > On 20/07/2020 17:51, Tom Lane wrote: >> I'm fixing that, but even after that, there's a semantic problem: >> LOGICALREP_COLUMN_UNCHANGED is just a weak optimization, cf the code >> that sends it, in proto.c around line 480. colstatus will often *not* >> be that for columns that were in fact not updated on the remote side. >> I wonder whether we need to take steps to improve that. > LOGICALREP_COLUMN_UNCHANGED is not trying to optimize anything, there is > certainly no effort made to not send columns that were not updated by > logical replication itself. It's just something we invented in order to > handle the fact that values for TOASTed columns that were not updated > are simply not visible to logical decoding (unless table has REPLICA > IDENTITY FULL) as they are not written to WAL nor accessible via > historic snapshot. So the output plugin simply does not see the real value. Hm. So the comment I added a couple days ago is wrong; can you propose a better one? However, be that as it may, we do have a provision in the protocol that can handle marking columns unchanged. I'm thinking if we tried a bit harder to identify unchanged columns on the sending side, we could both fix this semantic deficiency for triggers and improve efficiency by reducing transmission of unneeded data. regards, tom lane
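One way the sending side could "try harder", sketched under the assumption that an old tuple image is available at all (it generally isn't unless the column is part of the replica identity or the table uses REPLICA IDENTITY FULL); this is a hypothetical sketch, not an existing pgoutput feature:

	/* Hypothetical: when both old and new values are at hand, mark
	 * byte-identical columns as unchanged so the subscriber can treat them
	 * as not updated, and the data need not be transmitted at all. */
	if (oldtuple != NULL &&
		!old_isnull[i] && !new_isnull[i] &&
		datumIsEqual(old_values[i], new_values[i],
					 att->attbyval, att->attlen))
	{
		/* note: datumIsEqual can give false negatives for toasted values */
		pq_sendbyte(out, LOGICALREP_COLUMN_UNCHANGED);
		continue;
	}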