Thread: Binary support for pgoutput plugin
Is there a reason why pgoutput sends data in text format? Seems to me that sending data in binary would provide a considerable performance improvement.
Dave Cramer
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> Is there a reason why pgoutput sends data in text format? Seems to
> me that sending data in binary would provide a considerable
> performance improvement.

Are you seeing something that suggests that the text output is taking
a lot of time or other resources?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Dave Cramer
On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> Is there a reason why pgoutput sends data in text format? Seems to
> me that sending data in binary would provide a considerable
> performance improvement.
Are you seeing something that suggests that the text output is taking
a lot of time or other resources?
Actually it's on the other end that there is improvement. Parsing text takes much longer for almost everything except ironically text.
To be more transparent there is some desire to use pgoutput for something other than logical replication. Change Data Capture clients such as Debezium have a requirement for a stable plugin which is shipped with core as this is always available in cloud providers offerings. There's no reason that I am aware of that they cannot use pgoutput for this. There's also no reason that I am aware that binary outputs can't be supported. The protocol would have to change slightly and I am working on a POC patch.
Thing is they aren't all written in C so using binary does provide a pretty substantial win on the decoding end.
Dave
Hi,

On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
> > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > Is there a reason why pgoutput sends data in text format? Seems to
> > > me that sending data in binary would provide a considerable
> > > performance improvement.
> >
> > Are you seeing something that suggests that the text output is taking
> > a lot of time or other resources?
>
> Actually it's on the other end that there is improvement. Parsing text
> takes much longer for almost everything except ironically text.

It's on both sides, I'd say. E.g. float (until v12), timestamp, bytea
are all much more expensive to convert from binary to text.

> To be more transparent there is some desire to use pgoutput for something
> other than logical replication. Change Data Capture clients such as
> Debezium have a requirement for a stable plugin which is shipped with core
> as this is always available in cloud providers offerings. There's no reason
> that I am aware of that they cannot use pgoutput for this.

Except that that's not pgoutput's purpose, and we shouldn't make it
meaningfully more complicated or slower to achieve this. Don't think
there's a conflict in this case though.

> There's also no reason that I am aware that binary outputs can't be
> supported.

Well, it *does* increase version dependencies, and does make replication
more complicated, because type oids etc cannot be relied to be the same
on source and target side.

> The protocol would have to change slightly and I am working
> on a POC patch.

Hm, what would have to be changed protocol wise? IIRC that'd just be a
different datum type? Or is that what you mean?

    pq_sendbyte(out, 't');  /* 'text' data follows */

IIRC there was code for the binary protocol in a predecessor of
pgoutput.

I think if we were to add binary output - and I think we should - we
ought to only accept a patch if it's also used in core.

Greetings,

Andres Freund
Dave Cramer
On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com> wrote:
Hi,
On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
>
> > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > Is there a reason why pgoutput sends data in text format? Seems to
> > > me that sending data in binary would provide a considerable
> > > performance improvement.
> >
> > Are you seeing something that suggests that the text output is taking
> > a lot of time or other resources?
> >
> > Actually it's on the other end that there is improvement. Parsing text
> takes much longer for almost everything except ironically text.
It's on both sides, I'd say. E.g. float (until v12), timestamp, bytea
are all much more expensive to convert from binary to text.
> To be more transparent there is some desire to use pgoutput for something
> other than logical replication. Change Data Capture clients such as
> Debezium have a requirement for a stable plugin which is shipped with core
> as this is always available in cloud providers offerings. There's no reason
> that I am aware of that they cannot use pgoutput for this.
Except that that's not pgoutput's purpose, and we shouldn't make it
meaningfully more complicated or slower to achieve this. Don't think
there's a conflict in this case though.
agreed, my intent was to slightly bend it to my will :)
> There's also no reason that I am aware that binary outputs can't be
> supported.
Well, it *does* increase version dependencies, and does make replication
more complicated, because type oids etc cannot be relied to be the same
on source and target side.
I was about to agree with this but if the type oids change from source to target you
still can't decode the text version properly. Unless I mis-understand something here ?
> The protocol would have to change slightly and I am working
> on a POC patch.
Hm, what would have to be changed protocol wise? IIRC that'd just be a
different datum type? Or is that what you mean?
pq_sendbyte(out, 't'); /* 'text' data follows */
I haven't really thought this through completely but one place JDBC has problems with binary is with
timestamps with timezone as we don't know which timezone to use. Is it safe to assume everything is in UTC
since the server stores in UTC ? Then there are UDF's. My original thought was to use options to send in the
types that I wanted in binary, everything else could be sent as text.
IIRC there was code for the binary protocol in a predecessor of
pgoutput.
Hmmm, that might be a good place to start. I will do some digging through the git history.
I think if we were to add binary output - and I think we should - we
ought to only accept a patch if it's also used in core.
Certainly; as not doing so would make my work completely irrelevant for my purpose.
Thanks,
Dave
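[To make the datum-marker question above concrete: today logicalrep_write_tuple() emits, per column, 'n' (null), 'u' (unchanged toast) or 't' followed by counted text. Below is a minimal sketch of what a binary variant could look like. It is not the actual patch; write_one_column() and the binary_ok flag are purely illustrative.]

/*
 * Sketch only: a per-column writer with an additional 'b' marker for
 * binary data (length word + typsend output), alongside the existing
 * 'n'/'u'/'t' markers used by logicalrep_write_tuple().
 */
#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"
#include "utils/lsyscache.h"

static void
write_one_column(StringInfo out, Oid typid, Datum value, bool isnull,
                 bool binary_ok)    /* hypothetical per-column policy */
{
    if (isnull)
    {
        pq_sendbyte(out, 'n');      /* null column */
        return;
    }

    if (binary_ok)
    {
        Oid         typsend;
        bool        typisvarlena;
        bytea      *outputbytes;

        getTypeBinaryOutputInfo(typid, &typsend, &typisvarlena);
        outputbytes = OidSendFunctionCall(typsend, value);

        pq_sendbyte(out, 'b');      /* binary data follows */
        pq_sendint32(out, VARSIZE(outputbytes) - VARHDRSZ);
        pq_sendbytes(out, VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ);
        pfree(outputbytes);
    }
    else
    {
        Oid         typoutput;
        bool        typisvarlena;
        char       *outputstr;

        getTypeOutputInfo(typid, &typoutput, &typisvarlena);
        outputstr = OidOutputFunctionCall(typoutput, value);

        pq_sendbyte(out, 't');      /* text data follows */
        pq_sendcountedtext(out, outputstr, strlen(outputstr), false);
        pfree(outputstr);
    }
}

[On the receiving side the matching change would be to look up the type's receive function with getTypeBinaryInputInfo() and call OidReceiveFunctionCall() instead of the text input function.]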
Hi,

On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com>
> wrote:
> > > There's also no reason that I am aware that binary outputs can't be
> > > supported.
> >
> > Well, it *does* increase version dependencies, and does make replication
> > more complicated, because type oids etc cannot be relied to be the same
> > on source and target side.
>
> I was about to agree with this but if the type oids change from source
> to target you still can't decode the text version properly. Unless I
> mis-understand something here ?

The text format doesn't care about oids. I don't see how it'd be a
problem? Note that some people *intentionally* use different types from
source to target system when logically replicating. So you can't rely on
the target table's types under any circumstance.

I think you really have to use the textual type which we already write
out (cf logicalrep_write_typ()) to call the binary input functions. And
you can send only data as binary that's from builtin types - otherwise
there's no guarantee at all that the target system has something
compatible. And even if you just assumed that all extensions etc are
present, you can't transport arrays / composite types in binary: For
hard to discern reasons we a) embed type oids in them b) verify them. b)
won't ever work for non-builtin types, because oids are assigned
dynamically.

> > I think if we were to add binary output - and I think we should - we
> > ought to only accept a patch if it's also used in core.
>
> Certainly; as not doing so would make my work completely irrelevant for my
> purpose.

What I mean is that the builtin logical replication would have to use
this on the receiving side too.

Greetings,

Andres Freund
On Tue, 4 Jun 2019 at 16:46, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com>
> wrote:
> > > There's also no reason that I am aware that binary outputs can't be
> > > supported.
> >
> > Well, it *does* increase version dependencies, and does make replication
> > more complicated, because type oids etc cannot be relied to be the same
> > on source and target side.
> >
> I was about to agree with this but if the type oids change from source
> to target you still can't decode the text version properly. Unless I
> mis-understand something here ?
The text format doesn't care about oids. I don't see how it'd be a
problem? Note that some people *intentionally* use different types from
source to target system when logically replicating. So you can't rely on
the target table's types under any circumstance.
I think you really have to use the textual type which we already write
out (cf logicalrep_write_typ()) to call the binary input functions. And
you can send only data as binary that's from builtin types - otherwise
there's no guarantee at all that the target system has something
compatible. And even if you just assumed that all extensions etc are
present, you can't transport arrays / composite types in binary: For
hard to discern reasons we a) embed type oids in them b) verify them. b)
won't ever work for non-builtin types, because oids are assigned
dynamically.
I figured arrays and UDT's would be problematic.
> > I think if we were to add binary output - and I think we should - we
> > ought to only accept a patch if it's also used in core.
> >
>
> Certainly; as not doing so would make my work completely irrelevant for my
> purpose.
What I mean is that the builtin logical replication would have to use
this on the receiving side too.
Got it, thanks for validating that the idea isn't nuts. Now I *have* to produce a POC.
Thanks,
Dave
On 6/4/19 4:39 PM, Dave Cramer wrote:
> I haven't really thought this through completely but one place JDBC has
> problems with binary is with timestamps with timezone as we don't know
> which timezone to use. Is it safe to assume everything is in UTC since
> the server stores in UTC ?

PL/Java, when converting to the Java 8 java.time types (because those are
sane), will turn a timestamp with timezone into an OffsetDateTime with
explicit offset zero (UTC), no matter what timezone may have been used when
the value was input (as you've observed, there's no way to recover that).

In the return direction, if given an OffsetDateTime with any nonzero offset,
it will adjust the value to UTC for postgres.

So, yes, say I.

Regards,
-Chap
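[To spell out what "everything is UTC" means on the wire: with 64-bit integer datetimes, a binary timestamptz is a big-endian signed microsecond count relative to 2000-01-01 00:00:00 UTC, so a consumer only has to shift the epoch. A minimal client-side sketch, assuming integer datetimes; pre-2000 values would need floor division, omitted here:]

#include <stdint.h>
#include <time.h>

/* Seconds between the Unix epoch (1970-01-01) and the PostgreSQL epoch (2000-01-01). */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)

/* Read a big-endian signed 64-bit integer from the wire. */
static int64_t
read_be64(const unsigned char *buf)
{
    uint64_t    v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return (int64_t) v;
}

/* Convert an 8-byte binary timestamptz into Unix seconds (UTC) plus microseconds. */
static void
decode_timestamptz(const unsigned char *wire, time_t *secs, int32_t *usecs)
{
    int64_t     pg_usecs = read_be64(wire); /* microseconds since the PG epoch */

    *secs = (time_t) (pg_usecs / 1000000 + PG_EPOCH_OFFSET_SECS);
    *usecs = (int32_t) (pg_usecs % 1000000);
}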
On Tue, Jun 04, 2019 at 04:55:33PM -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:46, Andres Freund <andres@anarazel.de> wrote:
> > On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> > > On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com> wrote:
> > > > > There's also no reason that I am aware that binary outputs can't be
> > > > > supported.
> > > >
> > > > Well, it *does* increase version dependencies, and does make replication
> > > > more complicated, because type oids etc cannot be relied to be the same
> > > > on source and target side.
> > >
> > > I was about to agree with this but if the type oids change from source
> > > to target you still can't decode the text version properly. Unless I
> > > mis-understand something here ?
> >
> > The text format doesn't care about oids. I don't see how it'd be a
> > problem? Note that some people *intentionally* use different types from
> > source to target system when logically replicating. So you can't rely on
> > the target table's types under any circumstance.
> >
> > I think you really have to use the textual type which we already write
> > out (cf logicalrep_write_typ()) to call the binary input functions. And
> > you can send only data as binary that's from builtin types - otherwise
> > there's no guarantee at all that the target system has something
> > compatible. And even if you just assumed that all extensions etc are
> > present, you can't transport arrays / composite types in binary: For
> > hard to discern reasons we a) embed type oids in them b) verify them. b)
> > won't ever work for non-builtin types, because oids are assigned
> > dynamically.
>
> I figured arrays and UDT's would be problematic.

Would it make sense to work toward a binary format that's not
architecture-specific? I recall from COPY that our binary format is
not standardized across, for example, big- and little-endian machines.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi,

On 2019-06-05 00:05:02 +0200, David Fetter wrote:
> Would it make sense to work toward a binary format that's not
> architecture-specific? I recall from COPY that our binary format is
> not standardized across, for example, big- and little-endian machines.

I think you recall wrongly. It's obviously possible that we have bugs
around this, but output/input routines are supposed to handle a
endianess independent format. That usually means that you have to do
endianess conversions, but that doesn't make it non-standardized.

- Andres
On Tue, 4 Jun 2019 at 18:08, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-05 00:05:02 +0200, David Fetter wrote:
> Would it make sense to work toward a binary format that's not
> architecture-specific? I recall from COPY that our binary format is
> not standardized across, for example, big- and little-endian machines.
I think you recall wrongly. It's obviously possible that we have bugs
around this, but output/input routines are supposed to handle a
endianess independent format. That usually means that you have to do
endianess conversions, but that doesn't make it non-standardized.
Additionally there are a number of drivers that already know how to handle our binary types.
I don't really think there's a win here. I also want to keep the changes small.
Dave
Hi,

On 05/06/2019 00:08, Andres Freund wrote:
> Hi,
>
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle a
> endianess independent format. That usually means that you have to do
> endianess conversions, but that doesn't make it non-standardized.
>

Yeah, there are really 3 formats of data we have, text protocol, binary
network protocol and internal on disk format. The internal on disk
format will not work across big/little-endian but network binary
protocol will.

FWIW I don't think the code for binary format was included in original
logical replication patch (I really tried to keep it as minimal as
possible), but the code and protocol is pretty much ready for adding that.

That said, pglogical has code which handles this (I guess Andres means
that by predecessor of pgoutput) so if you look for example at the
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help you give some ideas on how to approach this.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions
https://www.2ndQuadrant.com/
Hi,
On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
Hi,
On 05/06/2019 00:08, Andres Freund wrote:
> Hi,
>
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle a
> endianess independent format. That usually means that you have to do
> endianess conversions, but that doesn't make it non-standardized.
>
Yeah, there are really 3 formats of data we have, text protocol, binary
network protocol and internal on disk format. The internal on disk
format will not work across big/little-endian but network binary
protocol will.
FWIW I don't think the code for binary format was included in original
logical replication patch (I really tried to keep it as minimal as
possible), but the code and protocol is pretty much ready for adding that.
Yes, I looked through the public history and could not find it. Thanks for confirming.
That said, pglogical has code which handles this (I guess Andres means
that by predecessor of pgoutput) so if you look for example at the
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help you give some ideas on how to approach this.
Thanks for the tip!
Dave Cramer
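[For reference while digging: modeled loosely on pglogical's decide_datum_transfer(), and following the earlier points about non-builtin oids, the per-datum decision a binary-capable pgoutput would presumably have to make boils down to something like the sketch below. The function name, the allow_binary flag and the FirstNormalObjectId cutoff are illustrative choices, not an existing pgoutput API.]

#include "postgres.h"
#include "access/htup_details.h"
#include "access/transam.h"     /* FirstNormalObjectId */
#include "catalog/pg_type.h"
#include "utils/syscache.h"

/* Decide whether a column of type "typid" is safe to ship in binary. */
static bool
use_binary_for_datum(Oid typid, bool allow_binary)
{
    HeapTuple   tup;
    Form_pg_type typform;
    bool        has_send;

    /* Subscriber did not ask for binary transfer at all. */
    if (!allow_binary)
        return false;

    /*
     * Non-builtin type oids are assigned dynamically, so there is no
     * guarantee the receiving side has anything compatible; fall back to
     * the text representation for those.
     */
    if (typid >= FirstNormalObjectId)
        return false;

    tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for type %u", typid);
    typform = (Form_pg_type) GETSTRUCT(tup);

    /* The type must actually have a binary send function. */
    has_send = OidIsValid(typform->typsend);
    ReleaseSysCache(tup);

    return has_send;
}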
On Wed, 5 Jun 2019 at 07:21, Dave Cramer <davecramer@gmail.com> wrote:
Hi,

On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

Hi,
On 05/06/2019 00:08, Andres Freund wrote:
> Hi,
>
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle a
> endianess independent format. That usually means that you have to do
> endianess conversions, but that doesn't make it non-standardized.
>
Yeah, there are really 3 formats of data we have, text protocol, binary
network protocol and internal on disk format. The internal on disk
format will not work across big/little-endian but network binary
protocol will.
FWIW I don't think the code for binary format was included in original
logical replication patch (I really tried to keep it as minimal as
possible), but the code and protocol is pretty much ready for adding that.

Yes, I looked through the public history and could not find it. Thanks for confirming.
That said, pglogical has code which handles this (I guess Andres means
that by predecessor of pgoutput) so if you look for example at the
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help you give some ideas on how to approach this.

Thanks for the tip!
Looking at:
https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
this seems completely ignored. What was the intent?
Dave
Hi

On June 5, 2019 8:51:10 AM PDT, Dave Cramer <davecramer@gmail.com> wrote:
> On Wed, 5 Jun 2019 at 07:21, Dave Cramer <davecramer@gmail.com> wrote:
>> Hi,
>>
>> On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>
>>> Hi,
>>>
>>> On 05/06/2019 00:08, Andres Freund wrote:
>>> > Hi,
>>> >
>>> > On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>>> >> Would it make sense to work toward a binary format that's not
>>> >> architecture-specific? I recall from COPY that our binary format is
>>> >> not standardized across, for example, big- and little-endian machines.
>>> >
>>> > I think you recall wrongly. It's obviously possible that we have bugs
>>> > around this, but output/input routines are supposed to handle a
>>> > endianess independent format. That usually means that you have to do
>>> > endianess conversions, but that doesn't make it non-standardized.
>>>
>>> Yeah, there are really 3 formats of data we have, text protocol, binary
>>> network protocol and internal on disk format. The internal on disk
>>> format will not work across big/little-endian but network binary
>>> protocol will.
>>>
>>> FWIW I don't think the code for binary format was included in original
>>> logical replication patch (I really tried to keep it as minimal as
>>> possible), but the code and protocol is pretty much ready for adding that.
>>
>> Yes, I looked through the public history and could not find it. Thanks for
>> confirming.
>>
>>> That said, pglogical has code which handles this (I guess Andres means
>>> that by predecessor of pgoutput) so if you look for example at the
>>> write_tuple/read_tuple/decide_datum_transfer in
>>> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
>>> that can help you give some ideas on how to approach this.
>>
>> Thanks for the tip!
>
> Looking at:
> https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
>
> this seems completely ignored. What was the intent?

That's about the output of the plugin, not the datatypes. And independent of
text/binary output, the protocol contains non-printable chars.

Andres
Hi,
On Wed, 5 Jun 2019 at 12:01, Andres Freund <andres@anarazel.de> wrote:
Hi
On June 5, 2019 8:51:10 AM PDT, Dave Cramer <davecramer@gmail.com> wrote:
>On Wed, 5 Jun 2019 at 07:21, Dave Cramer <davecramer@gmail.com> wrote:
>
>> Hi,
>>
>>
>> On Wed, 5 Jun 2019 at 07:18, Petr Jelinek
><petr.jelinek@2ndquadrant.com>
>> wrote:
>>
>>> Hi,
>>>
>>> On 05/06/2019 00:08, Andres Freund wrote:
>>> > Hi,
>>> >
>>> > On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>>> >> Would it make sense to work toward a binary format that's not
>>> >> architecture-specific? I recall from COPY that our binary format
>is
>>> >> not standardized across, for example, big- and little-endian
>machines.
>>> >
>>> > I think you recall wrongly. It's obviously possible that we have
>bugs
>>> > around this, but output/input routines are supposed to handle a
>>> > endianess independent format. That usually means that you have to
>do
>>> > endianess conversions, but that doesn't make it non-standardized.
>>> >
>>>
>>> Yeah, there are really 3 formats of data we have, text protocol,
>binary
>>> network protocol and internal on disk format. The internal on disk
>>> format will not work across big/little-endian but network binary
>>> protocol will.
>>>
>>> FWIW I don't think the code for binary format was included in
>original
>>> logical replication patch (I really tried to keep it as minimal as
>>> possible), but the code and protocol is pretty much ready for adding
>that.
>>>
>> Yes, I looked through the public history and could not find it.
>Thanks for
>> confirming.
>>
>>>
>>> That said, pglogical has code which handles this (I guess Andres
>means
>>> that by predecessor of pgoutput) so if you look for example at the
>>> write_tuple/read_tuple/decide_datum_transfer in
>>>
>>>
>https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
>>> that can help you give some ideas on how to approach this.
>>>
>>
>> Thanks for the tip!
>>
>
>Looking at:
>https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
>
>this seems completely ignored. What was the intent?
That's about the output of the plugin, not the datatypes. And independent of text/binary output, the protocol contains non-printable chars.
Andres
So one of the things they would like added is to get not null information in the schema record. This is so they can mark the field Optional in Java. I presume this would also have some uses in other languages. As I understand it this would require a protocol bump. If this were to be accepted, are there any outstanding asks that would be useful to add if we were going to bump the protocol?
Dave
Hi,

On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get not null information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we
> were going to bump the protocol?

I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.

Greetings,

Andres Freund
Hi,
On Wed, 5 Jun 2019 at 18:50, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get not null information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we were
> going to bump the protocol?
I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.
I'm not intimately familiar with their code. I will query them more about the ask.
I am curious why you are "strongly" opposed however. We already have the information. Adding doesn't seem onerous.
Dave
Hi,

On 2019-06-05 19:05:05 -0400, Dave Cramer wrote:
> I am curious why you are "strongly" opposed however. We already have the
> information. Adding doesn't seem onerous.

(thought I'd already replied with this)

The problem is that I don't recognize a limiting principle:

If we want NOT NULL information for clients, why don't we include the
underlying types for arrays, and the fields in composite types? What
about foreign keys? And unique keys?

And then we suddenly need tracking for all these, so we don't always
send out that information when we previously already did - and in some
of the cases there's no infrastructure for that.

I just don't quite buy that the output plugin built for pg's logical
replication needs is a good place to include a continually increasing
amount of metadata that logical replication doesn't need. That's going
to add overhead and make the code more complicated.

Greetings,

Andres Freund
On 06/07/19 19:27, Andres Freund wrote:
> The problem is that I don't recognize a limiting principle:
>
> If we want NOT NULL information for clients, why don't we include the
> underlying types for arrays, and the fields in composite types? What
> about foreign keys? And unique keys?

This reminds me of an idea I had for a future fe/be protocol version, right
after a talk by Alyssa Ritchie and Henrietta Dombrovskaya at the last
2Q PGConf. [1]

It seems they had ended up designing a whole 'nother "protocol level"
involving queries wrapping their results as JSON and an app layer that
unwraps again, after trying a simpler first approach that was foiled by the
inability to see into arrays and anonymous record types in the 'describe'
response.

I thought, in a new protocol rev, why not let the driver send additional
'describe' messages after the first one, to drill into structure of
individual columns mentioned in the first response, before sending the
'execute' message?

If it doesn't want the further detail, it doesn't have to ask.

> And then we suddenly need tracking for all these, so we don't always
> send out that information when we previously already did

If it's up to the client driver, it can track what it needs or already has.

I haven't looked too deeply into the replication protocol ... it happens
under a kind of copy-both, right?, so maybe there's a way for the receiver
to send some inquiries back, but maybe in a windowed, full-duplex way where
it might have to buffer some incoming messages before getting the response
to an inquiry message it sent.

Would those be thinkable thoughts for a future protocol rev?

Regards,
-Chap

[1] https://www.2qpgconf.com/schedule/information-exchange-techniques-for-javapostgresql-applications/
Hi,

On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
> It seems they had ended up designing a whole 'nother "protocol level"
> involving queries wrapping their results as JSON and an app layer that
> unwraps again, after trying a simpler first approach that was foiled by the
> inability to see into arrays and anonymous record types in the 'describe'
> response.

I suspect quite a few people would have to have left the project before
this would happen.

> I thought, in a new protocol rev, why not let the driver send additional
> 'describe' messages after the first one, to drill into structure of
> individual columns mentioned in the first response, before sending the
> 'execute' message?
>
> If it doesn't want the further detail, it doesn't have to ask.
>
> > And then we suddenly need tracking for all these, so we don't always
> > send out that information when we previously already did
>
> If it's up to the client driver, it can track what it needs or already has.
>
> I haven't looked too deeply into the replication protocol ... it happens
> under a kind of copy-both, right?, so maybe there's a way for the receiver
> to send some inquiries back, but maybe in a windowed, full-duplex way where
> it might have to buffer some incoming messages before getting the response
> to an inquiry message it sent.

That'd be a *lot* of additional complexity, and pretty much prohibitive
from a performance POV. We'd have to not continue decoding on the server
side *all* the time to give the client a chance to inquire additional
information.

Greetings,

Andres Freund
On 06/07/19 21:01, Andres Freund wrote:
> On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
>> It seems they had ended up designing a whole 'nother "protocol level"
>> involving queries wrapping their results as JSON and an app layer that
>> unwraps again, after trying a simpler first approach that was foiled by the
>> inability to see into arrays and anonymous record types in the 'describe'
>> response.
>
> I suspect quite a few people would have to have left the project before
> this would happen.

I'm not sure I understand what you're getting at. The "whole 'nother
protocol" was something they actually implemented, at the application level,
by rewriting their queries to produce JSON and their client to unwrap it.
It wasn't proposed to go into postgres ... but it was a workaround they were
forced into by the current protocol's inability to tell them what they were
getting.

> That'd be a *lot* of additional complexity, and pretty much prohibitive
> from a performance POV. We'd have to not continue decoding on the server
> side *all* the time to give the client a chance to inquire additional
> information.

Does anything travel in the client->server direction during replication?
I thought I saw CopyBoth mentioned. Is there a select()/poll() being done
so those messages can be received?

It does seem that the replication protocol would be the tougher problem.
For the regular extended-query protocol, it seems like allowing an extra
Describe or two before Execute might not be awfully hard.

Regards,
-Chap
Hi,

On 2019-06-07 21:16:12 -0400, Chapman Flack wrote:
> On 06/07/19 21:01, Andres Freund wrote:
> > That'd be a *lot* of additional complexity, and pretty much prohibitive
> > from a performance POV. We'd have to not continue decoding on the server
> > side *all* the time to give the client a chance to inquire additional
> > information.
>
> Does anything travel in the client->server direction during replication?
> I thought I saw CopyBoth mentioned. Is there a select()/poll() being done
> so those messages can be received?

Yes, acknowledgements of how far data has been received (and how far
processed), which is then used to release resources (WAL, xid horizon) and
allow synchronous replication to block until something has been received.

- Andres
On Fri, Jun 07, 2019 at 06:01:12PM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
>> It seems they had ended up designing a whole 'nother "protocol level"
>> involving queries wrapping their results as JSON and an app layer that
>> unwraps again, after trying a simpler first approach that was foiled by the
>> inability to see into arrays and anonymous record types in the 'describe'
>> response.
>
> I suspect quite a few people would have to have left the project before
> this would happen.
>
>> I thought, in a new protocol rev, why not let the driver send additional
>> 'describe' messages after the first one, to drill into structure of
>> individual columns mentioned in the first response, before sending the
>> 'execute' message?
>>
>> If it doesn't want the further detail, it doesn't have to ask.
>>
>> > And then we suddenly need tracking for all these, so we don't always
>> > send out that information when we previously already did
>>
>> If it's up to the client driver, it can track what it needs or already has.
>>
>> I haven't looked too deeply into the replication protocol ... it happens
>> under a kind of copy-both, right?, so maybe there's a way for the receiver
>> to send some inquiries back, but maybe in a windowed, full-duplex way where
>> it might have to buffer some incoming messages before getting the response
>> to an inquiry message it sent.
>
> That'd be a *lot* of additional complexity, and pretty much prohibitive
> from a performance POV. We'd have to not continue decoding on the server
> side *all* the time to give the client a chance to inquire additional
> information.
>

I kinda agree with this, and I think it's an argument why replication
solutions that need such additional metadata (e.g. because they have no
database to query) should not rely on pgoutput but should invent their own
decoding plugin. Which is why it's a plugin.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
This should have gone to hackers as well
---------- Forwarded message ---------
From: Dave Cramer <davecramer@gmail.com>
Date: Sat, Jun 8, 2019, 6:41 PM
Subject: Re: Binary support for pgoutput plugin
To: Tomas Vondra <tomas.vondra@2ndquadrant.com>
On Sat, Jun 8, 2019, 6:27 PM Tomas Vondra, <tomas.vondra@2ndquadrant.com> wrote:
On Fri, Jun 07, 2019 at 06:01:12PM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-06-07 20:52:38 -0400, Chapman Flack wrote:
>> It seems they had ended up designing a whole 'nother "protocol level"
>> involving queries wrapping their results as JSON and an app layer that
>> unwraps again, after trying a simpler first approach that was foiled by the
>> inability to see into arrays and anonymous record types in the 'describe'
>> response.
>
>I suspect quite a few people would have to have left the projectbefore
>this would happen.
>
>
>> I thought, in a new protocol rev, why not let the driver send additional
>> 'describe' messages after the first one, to drill into structure of
>> individual columns mentioned in the first response, before sending the
>> 'execute' message?
>>
>> If it doesn't want the further detail, it doesn't have to ask.
>>
>> > And then we suddenly need tracking for all these, so we don't always
>> > send out that information when we previously already did
>>
>> If it's up to the client driver, it can track what it needs or already has.
>
>> I haven't looked too deeply into the replication protocol ... it happens
>> under a kind of copy-both, right?, so maybe there's a way for the receiver
>> to send some inquiries back, but maybe in a windowed, full-duplex way where
>> it might have to buffer some incoming messages before getting the response
>> to an inquiry message it sent.
>
>That'd be a *lot* of additional complexity, and pretty much prohibitive
>from a performance POV. We'd have to not continue decoding on the server
>side *all* the time to give the client a chance to inquire additional
>information.
>
I kinda agree with this, and I think it's an argument why replication
solutions that need such additional metadata (e.g. because they have no
database to query) should not rely on pgoutput but should invent their own
decoding plugin. Which is why it's a plugin.
So the reason we are discussing using pgoutput plugin is because it is part of core and guaranteed to be in cloud providers solutions. I'm trying to find a balance here between using what we have and burdening core with additional code to look after. Not sending the metadata is not a deal breaker, but I can see some value in it.
Dave
Hi,

On 2019-06-08 19:41:34 -0400, Dave Cramer wrote:
> So the reason we are discussing using pgoutput plugin is because it is part
> of core and guaranteed to be in cloud providers solutions.

IMO people needing this should then band together and write one that's
suitable, rather than trying to coerce test_decoding and now pgoutput
into something they're not made for.

Greetings,

Andres Freund
On Sat, 8 Jun 2019 at 20:09, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-08 19:41:34 -0400, Dave Cramer wrote:
> So the reason we are discussing using pgoutput plugin is because it is part
> of core and guaranteed to be in cloud providers solutions.
IMO people needing this should then band together and write one that's
suitable, rather than trying to coerce test_decoding and now pgoutput
into something they're not made for.
At the moment it would look a lot like pgoutput. For now I'm fine with no changes to pgoutput other than binary
Once we have some real use cases we can look at writing a new one. I would imagine we would actually start with
pgoutput and just add to it.
Thanks,
Dave
On Sat, Jun 08, 2019 at 08:40:43PM -0400, Dave Cramer wrote:
> On Sat, 8 Jun 2019 at 20:09, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2019-06-08 19:41:34 -0400, Dave Cramer wrote:
> > > So the reason we are discussing using pgoutput plugin is because it is part
> > > of core and guaranteed to be in cloud providers solutions.
> >
> > IMO people needing this should then band together and write one that's
> > suitable, rather than trying to coerce test_decoding and now pgoutput
> > into something they're not made for.
>
> At the moment it would look a lot like pgoutput. For now I'm fine with no
> changes to pgoutput other than binary. Once we have some real use cases we
> can look at writing a new one. I would imagine we would actually start with
> pgoutput and just add to it.

I understand the desire to make this work for managed cloud environments,
we support quite a few customers who would benefit from it. But pgoutput is
meant specifically for built-in replication, and adding complexity that is
useless for that use case does not seem like a good tradeoff. From this POV
the binary mode is fine, because it'd benefit pgoutput, but the various
other stuff mentioned here (e.g. nullability) is not.

And if we implement a new plugin for use by out-of-core stuff, I guess we'd
probably done it in an extension. But even having it in contrib would not
make it automatically installed on managed systems, because AFAIK the
various providers only allow whitelisted extensions. At which point there's
little difference compared to external extensions.

I think the best party to implement such extension is whoever implements
such replication system (say Debezium), because they are the ones who know
which format / behavior would work for them. And they can also show the
benefit to their users, who can then push the cloud providers to install
the extension. Of course, that'll take a long time (but it's unclear how
long), and until then they'll have to provide some fallback.

This is a bit of a chicken-egg problem, with three parties - our project,
projects building on logical replication and cloud providers. And no matter
how you slice it, the party implementing it has only limited (if any)
control over what the other parties allow.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
So back to binary output.
From what I can tell the place to specify binary options would be in the create publication and or in replication slots?
The challenge as I see it is that the subscriber would have to be able to decode binary output.
Any thoughts on how to handle this? At the moment I'm assuming that this would only work for subscribers that knew how to handle binary.
Regards,
Dave
Hi,

On 10/06/2019 13:27, Dave Cramer wrote:
> So back to binary output.
>
> From what I can tell the place to specify binary options would be in the
> create publication and or in replication slots?
>
> The challenge as I see it is that the subscriber would have to be able
> to decode binary output.
>
> Any thoughts on how to handle this? At the moment I'm assuming that this
> would only work for subscribers that knew how to handle binary.
>

Given that we don't need to write anything extra to WAL on upstream to
support binary output, why not just have the request for binary data as
an option for the pgoutput and have it chosen dynamically? Then it's the
subscriber who asks for binary output via option(s) to START_REPLICATION.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions
https://www.2ndQuadrant.com/
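[A rough sketch of what "an option for pgoutput" could mean in code: the subscriber passes e.g. binary 'true' among the START_REPLICATION options, and pgoutput's startup callback walks its DefElem list and flips a flag that the write path then consults. The option name and helper are illustrative; at this point pgoutput only understands proto_version and publication_names.]

#include "postgres.h"
#include "commands/defrem.h"    /* defGetBoolean */
#include "nodes/parsenodes.h"   /* DefElem */

/* Scan the output-plugin options for a hypothetical "binary" flag. */
static bool
parse_binary_option(List *options)
{
    ListCell   *lc;
    bool        binary = false; /* default stays text */

    foreach(lc, options)
    {
        DefElem    *defel = (DefElem *) lfirst(lc);

        if (strcmp(defel->defname, "binary") == 0)
            binary = defGetBoolean(defel);
    }

    return binary;
}

[The subscriber side would then thread a matching CREATE/ALTER SUBSCRIPTION setting through to the options it sends with START_REPLICATION, which is where the catalog discussion below comes in.]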
OK, before I go too much further down this rabbit hole I'd like feedback on the current code. See attached patch
There is one obvious hack where in binary mode I reset the input cursor to allow the binary input to be re-read
From what I can tell the alternative is to convert the data in logicalrep_read_tuple but that would require moving a lot of the logic currently in worker.c to proto.c. This seems minimally invasive.
and thanks Petr for the tip to use pglogical for ideas.
Thanks,
Dave Cramer
Attachment
On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
Hi,
On 10/06/2019 13:27, Dave Cramer wrote:
> So back to binary output.
>
> From what I can tell the place to specify binary options would be in the
> create publication and or in replication slots?
>
> The challenge as I see it is that the subscriber would have to be able
> to decode binary output.
>
> Any thoughts on how to handle this? At the moment I'm assuming that this
> would only work for subscribers that knew how to handle binary.
>
Given that we don't need to write anything extra to WAL on upstream to
support binary output, why not just have the request for binary data as
an option for the pgoutput and have it chosen dynamically? Then it's the
subscriber who asks for binary output via option(s) to START_REPLICATION.
If I understand this correctly this would add something to the CREATE/ALTER SUBSCRIPTION commands in the WITH clause.
Additionally another column would be required for pg_subscription for the binary option.
Does it make sense to add an options column which would just be a comma separated string?
Not that I have future options in mind but seems like something that might come up in the future.
Dave Cramer
On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
> On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com>
> wrote:
> > Hi,
> >
> > On 10/06/2019 13:27, Dave Cramer wrote:
> > > So back to binary output.
> > >
> > > From what I can tell the place to specify binary options would be in the
> > > create publication and or in replication slots?
> > >
> > > The challenge as I see it is that the subscriber would have to be able
> > > to decode binary output.
> > >
> > > Any thoughts on how to handle this? At the moment I'm assuming that this
> > > would only work for subscribers that knew how to handle binary.
> >
> > Given that we don't need to write anything extra to WAL on upstream to
> > support binary output, why not just have the request for binary data as
> > an option for the pgoutput and have it chosen dynamically? Then it's the
> > subscriber who asks for binary output via option(s) to START_REPLICATION.
>
> If I understand this correctly this would add something to the CREATE/ALTER
> SUBSCRIPTION commands in the WITH clause.
> Additionally another column would be required for pg_subscription for the
> binary option.
> Does it make sense to add an options column which would just be a comma
> separated string?
> Not that I have future options in mind but seems like something that might
> come up in the future.

I'd just add a boolean column to the catalog. That's what I did in the
patch adding support for streaming in-progress transactions. I don't think
we expect many additional parameters, so it makes little sense to optimize
for that case.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Dave Cramer
On Fri, 14 Jun 2019 at 14:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
>On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com>
>wrote:
>
>> Hi,
>>
>> On 10/06/2019 13:27, Dave Cramer wrote:
>> > So back to binary output.
>> >
>> > From what I can tell the place to specify binary options would be in the
>> > create publication and or in replication slots?
>> >
>> > The challenge as I see it is that the subscriber would have to be able
>> > to decode binary output.
>> >
>> > Any thoughts on how to handle this? At the moment I'm assuming that this
>> > would only work for subscribers that knew how to handle binary.
>> >
>>
>> Given that we don't need to write anything extra to WAL on upstream to
>> support binary output, why not just have the request for binary data as
>> an option for the pgoutput and have it chosen dynamically? Then it's the
>> subscriber who asks for binary output via option(s) to START_REPLICATION.
>>
>
>If I understand this correctly this would add something to the CREATE/ALTER
>SUBSCRIPTION commands in the WITH clause.
>Additionally another column would be required for pg_subscription for the
>binary option.
>Does it make sense to add an options column which would just be a comma
>separated string?
>Not that I have future options in mind but seems like something that might
>come up in the future.
>
I'd just add a boolean column to the catalog. That's what I did in the
patch adding support for streaming in-progress transactions. I don't think
we expect many additional parameters, so it makes little sense to optimize
for that case.
Which is what I have done. Thanks
I've attached both patches for comments.
I still have to add documentation.
Regards,
Dave
Attachment
On Fri, 14 Jun 2019 at 15:42, Dave Cramer <davecramer@gmail.com> wrote:
Dave Cramer

On Fri, 14 Jun 2019 at 14:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Wed, Jun 12, 2019 at 10:35:48AM -0400, Dave Cramer wrote:
>On Mon, 10 Jun 2019 at 07:49, Petr Jelinek <petr.jelinek@2ndquadrant.com>
>wrote:
>
>> Hi,
>>
>> On 10/06/2019 13:27, Dave Cramer wrote:
>> > So back to binary output.
>> >
>> > From what I can tell the place to specify binary options would be in the
>> > create publication and or in replication slots?
>> >
>> > The challenge as I see it is that the subscriber would have to be able
>> > to decode binary output.
>> >
>> > Any thoughts on how to handle this? At the moment I'm assuming that this
>> > would only work for subscribers that knew how to handle binary.
>> >
>>
>> Given that we don't need to write anything extra to WAL on upstream to
>> support binary output, why not just have the request for binary data as
>> an option for the pgoutput and have it chosen dynamically? Then it's the
>> subscriber who asks for binary output via option(s) to START_REPLICATION.
>>
>
>If I understand this correctly this would add something to the CREATE/ALTER
>SUBSCRIPTION commands in the WITH clause.
>Additionally another column would be required for pg_subscription for the
>binary option.
>Does it make sense to add an options column which would just be a comma
>separated string?
>Not that I have future options in mind but seems like something that might
>come up in the future.
>
I'd just add a boolean column to the catalog. That's what I did in the
patch adding support for streaming in-progress transactions. I don't think
we expect many additional parameters, so it makes little sense to optimize
for that case.

Which is what I have done. Thanks

I've attached both patches for comments.
I still have to add documentation.

Regards,
Dave
Additional patch for documentation.
Dave Cramer
Attachment
On Wed, 5 Jun 2019 at 18:50, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get not null information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we were
> going to bump the protocol?
I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.
So, talking some more to the guys who want to use this for Change Data Capture, they pointed out that if the schema changes while they are offline, there is no way to query the catalog information.
Dave
> On Mon, Jun 17, 2019 at 10:29:26AM -0400, Dave Cramer wrote:
> > Which is what I have done. Thanks
> >
> > I've attached both patches for comments.
> > I still have to add documentation.
>
> Additional patch for documentation.

Thank you for the patch! Unfortunately 0002 has some conflicts, could
you please send a rebased version? In the meantime I have few questions:

-LogicalRepRelId
+void
 logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
 {
    char action;
-   LogicalRepRelId relid;
-
-   /* read the relation id */
-   relid = pq_getmsgint(in, 4);

    action = pq_getmsgbyte(in);
    if (action != 'N')
@@ -175,7 +173,6 @@ logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)

    logicalrep_read_tuple(in, newtup);
-
-   return relid;
 }

....

-   relid = logicalrep_read_insert(s, &newtup);
+   /* read the relation id */
+   relid = pq_getmsgint(s, 4);
    rel = logicalrep_rel_open(relid, RowExclusiveLock);
+
+   logicalrep_read_insert(s, &newtup);

Maybe I'm missing something, what is the reason of moving pq_getmsgint
out of logicalrep_read_*? Just from the patch it seems that the code is
equivalent.

> There is one obvious hack where in binary mode I reset the input
> cursor to allow the binary input to be re-read
> From what I can tell the alternative is to convert the data in
> logicalrep_read_tuple but that would require moving a lot of the logic
> currently in worker.c to proto.c. This seems minimally invasive.

Which logic has to be moved for example? Actually it sounds more natural
to me, if this functionality would be in e.g. logicalrep_read_tuple
rather than slot_store_data, since it has something to do with reading
data. And it seems that in pglogical it's also located in
pglogical_read_tuple.
On Sun, 27 Oct 2019 at 11:00, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Mon, Jun 17, 2019 at 10:29:26AM -0400, Dave Cramer wrote:
> > Which is what I have done. Thanks
> >
> > I've attached both patches for comments.
> > I still have to add documentation.
>
> Additional patch for documentation.
Thank you for the patch! Unfortunately 0002 has some conflicts, could
you please send a rebased version? In the meantime I have few questions:
-LogicalRepRelId
+void
logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
{
char action;
- LogicalRepRelId relid;
-
- /* read the relation id */
- relid = pq_getmsgint(in, 4);
action = pq_getmsgbyte(in);
if (action != 'N')
@@ -175,7 +173,6 @@ logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
logicalrep_read_tuple(in, newtup);
- return relid;
}
....
- relid = logicalrep_read_insert(s, &newtup);
+ /* read the relation id */
+ relid = pq_getmsgint(s, 4);
rel = logicalrep_rel_open(relid, RowExclusiveLock);
+
+ logicalrep_read_insert(s, &newtup);
Maybe I'm missing something, what is the reason of moving pq_getmsgint
out of logicalrep_read_*? Just from the patch it seems that the code is
equivalent.
> There is one obvious hack where in binary mode I reset the input
> cursor to allow the binary input to be re-read From what I can tell
> the alternative is to convert the data in logicalrep_read_tuple but
> that would require moving a lot of the logic currently in worker.c to
> proto.c. This seems minimally invasive.
Which logic has to be moved for example? Actually it sounds more natural
to me, if this functionality would be in e.g. logicalrep_read_tuple
rather than slot_store_data, since it has something to do with reading
data. And it seems that in pglogical it's also located in
pglogical_read_tuple.
Ok, I've rebased and reverted logicalrep_read_insert
As to the last comment, honestly it's been so long I can't remember why I put that comment in there.
Thanks for reviewing
Dave
Attachment
On Thu, Oct 31, 2019 at 3:03 AM Dave Cramer <davecramer@gmail.com> wrote:
> Ok, I've rebased and reverted logicalrep_read_insert

Hi Dave,

From the code style police (actually just from cfbot, which is set up
to complain about declarations after statements, a bit of C99 we
aren't ready for):

proto.c:557:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  int len = pq_getmsgint(in, 4); /* read length */
      ^
proto.c:573:6: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
  int len = pq_getmsgint(in, 4); /* read length */
      ^
On Sun, 3 Nov 2019 at 21:47, Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Oct 31, 2019 at 3:03 AM Dave Cramer <davecramer@gmail.com> wrote:
> Ok, I've rebased and reverted logicalrep_read_insert
Hi Dave,
From the code style police (actually just from cfbot, which is set up
to complain about declarations after statements, a bit of C99 we
aren't ready for):
proto.c:557:6: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
int len = pq_getmsgint(in, 4); /* read length */
^
proto.c:573:6: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
int len = pq_getmsgint(in, 4); /* read length */
^
Thomas,
Thanks for the review.
See attached
Attachment
> On Tue, Nov 05, 2019 at 07:16:10AM -0500, Dave Cramer wrote:
>
> See attached

--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1779,6 +1779,7 @@ ApplyWorkerMain(Datum main_arg)
    options.slotname = myslotname;
    options.proto.logical.proto_version = LOGICALREP_PROTO_VERSION_NUM;
    options.proto.logical.publication_names = MySubscription->publications;
+   options.proto.logical.binary = MySubscription->binary;

I'm a bit confused, shouldn't be there also

--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
    sub->name = pstrdup(NameStr(subform->subname));
    sub->owner = subform->subowner;
    sub->enabled = subform->subenabled;
+   sub->binary = subform->subbinary;

in the GetSubscription?
On Fri, 8 Nov 2019 at 11:20, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Tue, Nov 05, 2019 at 07:16:10AM -0500, Dave Cramer wrote:
>
> See attached
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1779,6 +1779,7 @@ ApplyWorkerMain(Datum main_arg)
options.slotname = myslotname;
options.proto.logical.proto_version = LOGICALREP_PROTO_VERSION_NUM;
options.proto.logical.publication_names = MySubscription->publications;
+ options.proto.logical.binary = MySubscription->binary;
I'm a bit confused, shouldn't be there also
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
sub->name = pstrdup(NameStr(subform->subname));
sub->owner = subform->subowner;
sub->enabled = subform->subenabled;
+ sub->binary = subform->subbinary;
in the GetSubscription?
yes, I have added this. I will supply an updated patch later.
Now a bigger question(s).
Previously someone mentioned that we need to confirm whether the two servers are compatible for binary or not.
Checking to make sure the two servers have the same endianness is obvious.
Sizeof int, long, float, double, timestamp (float/int) at a minimum.
This could be done in libpqrcv_startstreaming. The remaining question is: do we fall back to text mode if needed, or simply fail?
Dave
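For context, the sort of check being discussed might look roughly like the sketch below, assuming a hypothetical bigendian flag obtained from the publisher's reported options. (As discussed further down the thread, such checks were ultimately judged unnecessary for the network binary format.)

	/*
	 * Sketch only: reject the connection if the publisher's reported byte
	 * order does not match this build.  "bigendian" is a hypothetical flag
	 * taken from the startup options; not the committed implementation.
	 */
#ifdef WORDS_BIGENDIAN
	if (!bigendian)
#else
	if (bigendian)
#endif
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("publisher and subscriber byte order do not match")));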
On 2019-Nov-11, Dave Cramer wrote:
> Previously someone mentioned that we need to confirm whether the two
> servers are compatible for binary or not.
>
> Checking to make sure the two servers have the same endianness is obvious.
> Sizeof int, long, float, double, timestamp (float/int) at a minimum.
>
> this could be done in libpqrcv_startstreaming. The question I have
> remaining is do we fall back to text mode if needed or simply fail ?
I think it makes more sense to have it fail. If the user wants to retry in text mode, they can do that easily enough; but if we make it fall back automatically and they set up the receiver wrongly by mistake, they would pay the performance penalty without noticing.
-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 11 Nov 2019 at 12:04, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-11, Dave Cramer wrote:
> Previously someone mentioned that we need to confirm whether the two
> servers are compatible for binary or not.
>
> Checking to make sure the two servers have the same endianness is obvious.
> Sizeof int, long, float, double, timestamp (float/int) at a minimum.
>
> this could be done in libpqrcv_startstreaming. The question I have
> remaining is do we fall back to text mode if needed or simply fail ?
I think it makes more sense to have it fail. If the user wants to retry
in text mode, they can do that easily enough; but if we make it
fall-back automatically and they set up the receiver wrongly by mistake,
they would pay the performance penalty without noticing.
Alvaro,
Thanks; after sending this I pretty much came to the same conclusion.
Dave
On Mon, 11 Nov 2019 at 12:07, Dave Cramer <davecramer@gmail.com> wrote:
On Mon, 11 Nov 2019 at 12:04, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-11, Dave Cramer wrote:
> Previously someone mentioned that we need to confirm whether the two
> servers are compatible for binary or not.
>
> Checking to make sure the two servers have the same endianness is obvious.
> Sizeof int, long, float, double, timestamp (float/int) at a minimum.
>
> this could be done in libpqrcv_startstreaming. The question I have
> remaining is do we fall back to text mode if needed or simply fail ?
I think it makes more sense to have it fail. If the user wants to retry
in text mode, they can do that easily enough; but if we make it
fall-back automatically and they set up the receiver wrongly by mistake,
they would pay the performance penalty without noticing.
Alvaro,
Thanks; after sending this I pretty much came to the same conclusion.
Dave
Following 2 patches address Dmitry's concern and check for compatibility.
Attachment
> On Mon, Nov 11, 2019 at 11:15:45AM -0500, Dave Cramer wrote:
> On Fri, 8 Nov 2019 at 11:20, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > On Tue, Nov 05, 2019 at 07:16:10AM -0500, Dave Cramer wrote:
> > >
> > > See attached
> >
> > --- a/src/backend/replication/logical/worker.c
> > +++ b/src/backend/replication/logical/worker.c
> > @@ -1779,6 +1779,7 @@ ApplyWorkerMain(Datum main_arg)
> > options.slotname = myslotname;
> > options.proto.logical.proto_version = LOGICALREP_PROTO_VERSION_NUM;
> > options.proto.logical.publication_names = MySubscription->publications;
> > + options.proto.logical.binary = MySubscription->binary;
> >
> > I'm a bit confused, shouldn't be there also
> >
> > --- a/src/backend/catalog/pg_subscription.c
> > +++ b/src/backend/catalog/pg_subscription.c
> > @@ -71,6 +71,7 @@ GetSubscription(Oid subid, bool missing_ok)
> > sub->name = pstrdup(NameStr(subform->subname));
> > sub->owner = subform->subowner;
> > sub->enabled = subform->subenabled;
> > + sub->binary = subform->subbinary;
> >
> > in the GetSubscription?
> >
> > yes, I have added this. I will supply an updated patch later.
> > Now a bigger question(s).
Well, without this change it wasn't working for me at all. Other than that yes, it was a small question :)
On 2019-Nov-11, Dave Cramer wrote:
> Following 2 patches address Dmitry's concern and check for compatibility.
Please resend the whole patchset, so that the patch tester can verify the series. (Doing it helps humans, too).
-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 11 Nov 2019 at 15:17, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-11, Dave Cramer wrote:
> Following 2 patches address Dmitry's concern and check for compatibility.
Please resend the whole patchset, so that the patch tester can verify
the series. (Doing it helps humans, too).
Attached,
Thanks,
Dave
Attachment
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Hi,
On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote:
> Attached,
The latest patch set does not apply correctly. Could you send a rebase please? I am moving the patch to next CF, waiting on author.
--
Michael
Attachment
Rebased against head
Dave Cramer
On Sat, 30 Nov 2019 at 20:48, Michael Paquier <michael@paquier.xyz> wrote:
Hi,
On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote:
> Attached,
The latest patch set does not apply correctly. Could you send a
rebase please? I am moving the patch to next CF, waiting on author.
--
Michael
Attachment
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0002-add-binary-column-to-pg_subscription.patch
On Mon, 2 Dec 2019 at 14:35, Dave Cramer <davecramer@gmail.com> wrote:
Rebased against head
Dave Cramer
On Sat, 30 Nov 2019 at 20:48, Michael Paquier <michael@paquier.xyz> wrote:
Hi,
On Mon, Nov 11, 2019 at 03:24:59PM -0500, Dave Cramer wrote:
> Attached,
The latest patch set does not apply correctly. Could you send a
rebase please? I am moving the patch to next CF, waiting on author.
--
Michael
Dave Cramer
Dave Cramer <davecramer@gmail.com> writes:
> Rebased against head
The cfbot's failing to apply this [1]. It looks like the reason is only that you included a catversion bump in what you submitted. Protocol is to *not* do that in submitted patches, but rely on the committer to add it at the last minute --- otherwise you'll waste a lot of time rebasing the patch, which is what it needs now.
regards, tom lane
[1] http://cfbot.cputube.org/patch_27_2152.log
On Fri, 28 Feb 2020 at 18:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Rebased against head
The cfbot's failing to apply this [1]. It looks like the reason is only
that you included a catversion bump in what you submitted. Protocol is to
*not* do that in submitted patches, but rely on the committer to add it at
the last minute --- otherwise you'll waste a lot of time rebasing the
patch, which is what it needs now.
regards, tom lane
[1] http://cfbot.cputube.org/patch_27_2152.log
rebased and removed the catversion bump.
Attachment
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
Hi Dave, On 29/02/2020 18:44, Dave Cramer wrote: > > > rebased and removed the catversion bump. Looked into this and it generally seems okay, but I do have one gripe here: > + tuple->values[i].data = palloc(len + 1); > + /* and data */ > + > + pq_copymsgbytes(in, tuple->values[i].data, len); > + tuple->values[i].len = len; > + tuple->values[i].cursor = 0; > + tuple->values[i].maxlen = len; > + /* not strictly necessary but the docs say it is required */ > + tuple->values[i].data[len] = '\0'; > + break; > + } > + case 't': /* text formatted value */ > + { > + tuple->changed[i] = true; > + int len = pq_getmsgint(in, 4); /* read length */ > > /* and data */ > - tuple->values[i] = palloc(len + 1); > - pq_copymsgbytes(in, tuple->values[i], len); > - tuple->values[i][len] = '\0'; > + tuple->values[i].data = palloc(len + 1); > + pq_copymsgbytes(in, tuple->values[i].data, len); > + tuple->values[i].data[len] = '\0'; > + tuple->values[i].len = len; The cursor should be set to 0 in the text formatted case too if this is how we chose to encode data. However I am not quite convinced I like the StringInfoData usage here. Why not just change the struct to include additional array of lengths rather than replacing the existing values array with StringInfoData array, that seems generally both simpler and should have smaller memory footprint too, no? We could also merge the binary and changed arrays into single char array named something like format (as data can be either unchanged, binary or text) and just reuse same identifiers we have in protocol. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi Dave,
On 29/02/2020 18:44, Dave Cramer wrote:
>
>
> rebased and removed the catversion bump.
Looked into this and it generally seems okay, but I do have one gripe here:
> + tuple->values[i].data = palloc(len + 1);
> + /* and data */
> +
> + pq_copymsgbytes(in, tuple->values[i].data, len);
> + tuple->values[i].len = len;
> + tuple->values[i].cursor = 0;
> + tuple->values[i].maxlen = len;
> + /* not strictly necessary but the docs say it is required */
> + tuple->values[i].data[len] = '\0';
> + break;
> + }
> + case 't': /* text formatted value */
> + {
> + tuple->changed[i] = true;
> + int len = pq_getmsgint(in, 4); /* read length */
>
> /* and data */
> - tuple->values[i] = palloc(len + 1);
> - pq_copymsgbytes(in, tuple->values[i], len);
> - tuple->values[i][len] = '\0';
> + tuple->values[i].data = palloc(len + 1);
> + pq_copymsgbytes(in, tuple->values[i].data, len);
> + tuple->values[i].data[len] = '\0';
> + tuple->values[i].len = len;
The cursor should be set to 0 in the text formatted case too if this is
how we chose to encode data.
However I am not quite convinced I like the StringInfoData usage here.
Why not just change the struct to include additional array of lengths
rather than replacing the existing values array with StringInfoData
array, that seems generally both simpler and should have smaller memory
footprint too, no?
Can you explain this a bit more? I don't really see a huge difference in memory usage.
We still need length and the data copied into LogicalRepTupleData when sending the data in binary, no?
We could also merge the binary and changed arrays into single char array
named something like format (as data can be either unchanged, binary or
text) and just reuse same identifiers we have in protocol.
This seems like a good idea.
Dave Cramer
--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
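A rough sketch of the struct shape Petr is suggesting, with per-column lengths kept separately and the changed/binary flags collapsed into a single format array that reuses the protocol identifiers. Field names are illustrative, not the committed definition:

typedef struct LogicalRepTupleData
{
	/* Illustrative only; not the committed definition. */
	char	   *values[MaxTupleAttributeNumber];	/* column values, text or binary */
	int			lengths[MaxTupleAttributeNumber];	/* length of each value in bytes */
	char		format[MaxTupleAttributeNumber];	/* 'n', 'u', 't' or 'b' per column */
} LogicalRepTupleData;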
Hi, On 08/03/2020 00:18, Dave Cramer wrote: > On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com > <mailto:petr@2ndquadrant.com>> wrote: > > Hi Dave, > > On 29/02/2020 18:44, Dave Cramer wrote: > > > > > > rebased and removed the catversion bump. > > Looked into this and it generally seems okay, but I do have one > gripe here: > > > + tuple->values[i].data = > palloc(len + 1); > > + /* and data */ > > + > > + pq_copymsgbytes(in, > tuple->values[i].data, len); > > + tuple->values[i].len = len; > > + tuple->values[i].cursor = 0; > > + tuple->values[i].maxlen = len; > > + /* not strictly necessary > but the docs say it is required */ > > + tuple->values[i].data[len] > = '\0'; > > + break; > > + } > > + case 't': /* text > formatted value */ > > + { > > + tuple->changed[i] = true; > > + int len = pq_getmsgint(in, > 4); /* read length */ > > > > /* and data */ > > - tuple->values[i] = > palloc(len + 1); > > - pq_copymsgbytes(in, > tuple->values[i], len); > > - tuple->values[i][len] = '\0'; > > + tuple->values[i].data = > palloc(len + 1); > > + pq_copymsgbytes(in, > tuple->values[i].data, len); > > + tuple->values[i].data[len] > = '\0'; > > + tuple->values[i].len = len; > > The cursor should be set to 0 in the text formatted case too if this is > how we chose to encode data. > > However I am not quite convinced I like the StringInfoData usage here. > Why not just change the struct to include additional array of lengths > rather than replacing the existing values array with StringInfoData > array, that seems generally both simpler and should have smaller memory > footprint too, no? > > > Can you explain this a bit more? I don't really see a huge difference in > memory usage. > We still need length and the data copied into LogicalRepTupleData when > sending the data in binary, no? > Yes but we don't need to have fixed sized array of 1600 elements that contains maxlen and cursor positions of the StringInfoData struct which we don't use for anything AFAICS. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Fri, 3 Apr 2020 at 03:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
On 08/03/2020 00:18, Dave Cramer wrote:
> On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
> Hi Dave,
>
> On 29/02/2020 18:44, Dave Cramer wrote:
> >
> >
> > rebased and removed the catversion bump.
>
> Looked into this and it generally seems okay, but I do have one
> gripe here:
>
> > + tuple->values[i].data =
> palloc(len + 1);
> > + /* and data */
> > +
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].len = len;
> > + tuple->values[i].cursor = 0;
> > + tuple->values[i].maxlen = len;
> > + /* not strictly necessary
> but the docs say it is required */
> > + tuple->values[i].data[len]
> = '\0';
> > + break;
> > + }
> > + case 't': /* text
> formatted value */
> > + {
> > + tuple->changed[i] = true;
> > + int len = pq_getmsgint(in,
> 4); /* read length */
> >
> > /* and data */
> > - tuple->values[i] =
> palloc(len + 1);
> > - pq_copymsgbytes(in,
> tuple->values[i], len);
> > - tuple->values[i][len] = '\0';
> > + tuple->values[i].data =
> palloc(len + 1);
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].data[len]
> = '\0';
> > + tuple->values[i].len = len;
>
> The cursor should be set to 0 in the text formatted case too if this is
> how we chose to encode data.
>
> However I am not quite convinced I like the StringInfoData usage here.
> Why not just change the struct to include additional array of lengths
> rather than replacing the existing values array with StringInfoData
> array, that seems generally both simpler and should have smaller memory
> footprint too, no?
>
>
> Can you explain this a bit more? I don't really see a huge difference in
> memory usage.
> We still need length and the data copied into LogicalRepTupleData when
> sending the data in binary, no?
>
Yes but we don't need to have fixed sized array of 1600 elements that
contains maxlen and cursor positions of the StringInfoData struct which
we don't use for anything AFAICS.
OK, I can see an easy way to allocate only the number of elements required, but since OidReceiveFunctionCall takes
StringInfo as one of its arguments, this seems like the easy path, unless there is something I am missing?
Dave
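To illustrate the trade-off: OidReceiveFunctionCall does take a StringInfo, but that does not force the tuple struct itself to carry full StringInfoData entries; a throwaway StringInfoData can be assembled on the stack just before the call. A minimal sketch, assuming hypothetical values/lengths arrays and omitting the typreceive/typioparam lookups:

	StringInfoData buf;

	/* Wrap the stored bytes in a temporary StringInfo for the receive call. */
	buf.data = tupleData->values[remoteattnum];		/* raw bytes (hypothetical layout) */
	buf.len = tupleData->lengths[remoteattnum];		/* hypothetical length array */
	buf.maxlen = buf.len;
	buf.cursor = 0;

	values[i] = OidReceiveFunctionCall(typreceive, &buf,
									   typioparam, att->atttypmod);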
On Fri, 3 Apr 2020 at 16:44, Dave Cramer <davecramer@gmail.com> wrote:
On Fri, 3 Apr 2020 at 03:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
On 08/03/2020 00:18, Dave Cramer wrote:
> On Fri, 6 Mar 2020 at 08:54, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
> Hi Dave,
>
> On 29/02/2020 18:44, Dave Cramer wrote:
> >
> >
> > rebased and removed the catversion bump.
>
> Looked into this and it generally seems okay, but I do have one
> gripe here:
>
> > + tuple->values[i].data =
> palloc(len + 1);
> > + /* and data */
> > +
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].len = len;
> > + tuple->values[i].cursor = 0;
> > + tuple->values[i].maxlen = len;
> > + /* not strictly necessary
> but the docs say it is required */
> > + tuple->values[i].data[len]
> = '\0';
> > + break;
> > + }
> > + case 't': /* text
> formatted value */
> > + {
> > + tuple->changed[i] = true;
> > + int len = pq_getmsgint(in,
> 4); /* read length */
> >
> > /* and data */
> > - tuple->values[i] =
> palloc(len + 1);
> > - pq_copymsgbytes(in,
> tuple->values[i], len);
> > - tuple->values[i][len] = '\0';
> > + tuple->values[i].data =
> palloc(len + 1);
> > + pq_copymsgbytes(in,
> tuple->values[i].data, len);
> > + tuple->values[i].data[len]
> = '\0';
> > + tuple->values[i].len = len;
>
> The cursor should be set to 0 in the text formatted case too if this is
> how we chose to encode data.
>
> However I am not quite convinced I like the StringInfoData usage here.
> Why not just change the struct to include additional array of lengths
> rather than replacing the existing values array with StringInfoData
> array, that seems generally both simpler and should have smaller memory
> footprint too, no?
>
>
> Can you explain this a bit more? I don't really see a huge difference in
> memory usage.
> We still need length and the data copied into LogicalRepTupleData when
> sending the data in binary, no?
>
Yes but we don't need to have fixed sized array of 1600 elements that
contains maxlen and cursor positions of the StringInfoData struct which
we don't use for anything AFAICS.
New patch that fixes a number of errors in the validity check, reduces memory usage by allocating the changed data dynamically, and
collapses the changed and binary arrays into a single format array.
Dave Cramer
Attachment
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0008-Changed-binary-and-changed-to-format-and-use-existin.patch
> On 7 Apr 2020, at 21:45, Dave Cramer <davecramer@gmail.com> wrote:
> New patch that fixes a number of errors in the check for validity as well as reduces the memory usage by
> dynamically allocating the data changed as well as collapsing the changed and binary arrays into a format array.
The 0001 patch fails to apply, and possibly others in the series. Please submit a rebased version. Marking the CF entry as Waiting for Author in the meantime.
cheers ./daniel
Honestly I'm getting a little weary of fixing this up only to have the patch not get reviewed.
Apparently it has no value so unless someone is willing to review it and get it committed I'm just going to let it go.
Thanks,
Dave Cramer
On Wed, 1 Jul 2020 at 04:53, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 7 Apr 2020, at 21:45, Dave Cramer <davecramer@gmail.com> wrote:
> New patch that fixes a number of errors in the check for validity as well as reduces the memory usage by
> dynamically allocating the data changed as well as collapsing the changed and binary arrays into a format array.
The 0001 patch fails to apply, and possibly other in the series. Please submit
a rebased version. Marking the CF entry as Waiting for Author in the meantime.
cheers ./daniel
rebased
Thanks,
Dave Cramer
On Wed, 1 Jul 2020 at 06:43, Dave Cramer <davecramer@gmail.com> wrote:
Honestly I'm getting a little weary of fixing this up only to have the patch not get reviewed.
Apparently it has no value so unless someone is willing to review it and get it committed I'm just going to let it go.
Thanks,
Dave Cramer
On Wed, 1 Jul 2020 at 04:53, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 7 Apr 2020, at 21:45, Dave Cramer <davecramer@gmail.com> wrote:
> New patch that fixes a number of errors in the check for validity as well as reduces the memory usage by
> dynamically allocating the data changed as well as collapsing the changed and binary arrays into a format array.
The 0001 patch fails to apply, and possibly other in the series. Please submit
a rebased version. Marking the CF entry as Waiting for Author in the meantime.
cheers ./daniel
Attachment
- 0001-First-pass-at-working-code-without-subscription-opti.patch
- 0005-Remove-C99-declaration-and-code.patch
- 0002-add-binary-column-to-pg_subscription.patch
- 0004-get-relid-inside-of-logical_read_insert.patch
- 0003-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
- 0006-make-sure-the-subcription-is-marked-as-binary.patch
- 0007-check-that-the-subscriber-is-compatible-with-the-pub.patch
- 0008-Changed-binary-and-changed-to-format-and-use-existin.patch
> On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
>
> rebased
Thanks! The new version of the 0001 patch has a compiler warning due to mixed declarations and code:
worker.c: In function ‘slot_store_data’:
worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 int cursor = tupleData->values[remoteattnum]->cursor;
 ^
worker.c: In function ‘slot_modify_data’:
worker.c:485:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
 int cursor = tupleData->values[remoteattnum]->cursor;
 ^
I didn't investigate to see if it was new, but Travis is running with Werror which fails this build.
cheers ./daniel
On 2020-Jul-05, Daniel Gustafsson wrote:
> > On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
> >
> > rebased
>
> Thanks! The new version of 0001 patch has a compiler warning due to mixed
> declarations and code:
>
> worker.c: In function ‘slot_store_data’:
> worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
AFAICS this is fixed in 0005.
I'm going to suggest to use "git rebase -i" so that fixes for bugs that earlier patches introduce are applied as fix-ups in those patches; we don't need or want to see the submitter's intermediate versions. Ideally, each submitted patch should be free of such problems, so that we can consider each individual patch in the series in isolation. Indeed, evidently the cfbot considers things that way.
-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 5 Jul 2020, at 23:11, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Jul-05, Daniel Gustafsson wrote:
>
>>> On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
>>>
>>> rebased
>>
>> Thanks! The new version of 0001 patch has a compiler warning due to mixed
>> declarations and code:
>>
>> worker.c: In function ‘slot_store_data’:
>> worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
>
> AFAICS this is fixed in 0005.
Yes and no, 0005 fixes one such instance but the one failing the build is another one in worker.c (the below being from 0008 which in turn change the row in question from previous patches):
+ int cursor = tupleData->values[remoteattnum]->cursor;
> I'm going to suggest to use "git rebase -i"
+1
cheers ./daniel
On Sun, 5 Jul 2020 at 17:28, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 5 Jul 2020, at 23:11, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Jul-05, Daniel Gustafsson wrote:
>
>>> On 2 Jul 2020, at 18:41, Dave Cramer <davecramer@gmail.com> wrote:
>>>
>>> rebased
>>
>> Thanks! The new version of 0001 patch has a compiler warning due to mixed
>> declarations and code:
>>
>> worker.c: In function ‘slot_store_data’:
>> worker.c:366:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
>
> AFAICS this is fixed in 0005.
Yes and no, 0005 fixes one such instance but the one failing the build is
another one in worker.c (the below being from 0008 which in turn change the row
in question from previous patches):
+ int cursor = tupleData->values[remoteattnum]->cursor;
> I'm going to suggest to use "git rebase -i"
+1
Strangely I don't see those errors when I build on my machine, but I will fix them.
As for rebase -i, I'll do what is advised here for squashing them. Just one patch now?
Thanks,
> On 6 Jul 2020, at 14:58, Dave Cramer <davecramer@gmail.com> wrote:
> as far as rebase -i do what is advised here for squashing them. Just one patch now ?
One patch per logical change, if there are two disjoint changes in the patchset where one builds on top of the other then multiple patches are of course fine. My personal rule-of-thumb is to split it the way I envision it committed.
cheers ./daniel
On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 6 Jul 2020, at 14:58, Dave Cramer <davecramer@gmail.com> wrote:
> as far as rebase -i do what is advised here for squashing them. Just one patch now ?
One patch per logical change, if there are two disjoint changes in the patchset
where one builds on top of the other then multiple patches are of course fine.
My personal rule-of-thumb is to split it the way I envision it committed.
At this point it is the result of 3 rebases. I guess I'll have to break it up differently..
Thanks,
Dave
On Mon, 6 Jul 2020 at 09:35, Dave Cramer <davecramer@gmail.com> wrote:
On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 6 Jul 2020, at 14:58, Dave Cramer <davecramer@gmail.com> wrote:
> as far as rebase -i do what is advised here for squashing them. Just one patch now ?
One patch per logical change, if there are two disjoint changes in the patchset
where one builds on top of the other then multiple patches are of course fine.
My personal rule-of-thumb is to split it the way I envision it committed.
At this point it is the result of 3 rebases. I guess I'll have to break it up differently..
OK, rebased it down to 2 patches, attached.
Thanks,
Dave
Attachment
> On 7 Jul 2020, at 02:16, Dave Cramer <davecramer@gmail.com> wrote: > OK, rebased it down to 2 patches, attached. I took a look at this patchset today. The feature clearly seems like something which we'd benefit from having, especially if it allows for the kind of extensions that were discussed at the beginning of this thread. In general I think it's in pretty good shape, there are however a few comments: The patch lacks any kind of test, which I think is required for it to be considered for committing. It also doesn't update the \dRs view in psql to include the subbinary column which IMO it should. I took the liberty to write this as well as tests as I was playing with the patch, the attached 0003 contains this, while 0001 and 0002 are your patches included to ensure the CFBot can do it's thing. This was kind of thrown together to have something while testing, so it definately need a once-over or two. The comment here implies that unchanged is the default value for format, but isn't this actually setting it to text formatted value? + /* default is unchanged */ + tuple->format = palloc(natts * sizeof(char)); + memset(tuple->format, 't', natts * sizeof(char)); Also, since the values member isn't memset() with a default, this seems a bit misleading at best no? For the rest of the backend we aren't including the defname in the errmsg like what is done here. Maybe we should, but I think that should be done consistently if so, and we should stick to just "conflicting or redundant options" for now. At the very least, this needs a comma between "options" and the defname and ideally the defname wrapped in \". - errmsg("conflicting or redundant options"))); + errmsg("conflicting or redundant options %s already provided", defel->defname))); These types of constructs are IMHO quite hard to read: + if( + #ifdef WORDS_BIGENDIAN + true + #else + false + #endif + != bigendian) How about spelling out the statement completely for both cases, or perhaps encapsulating the logic in a macro? Something like the below perhaps? + #ifdef WORDS_BIGENDIAN + if (bigendian != true) + #else + if (bigendian != false) + #endif This change needs to be removed before a commit, just highlighting that here to avoid it going unnoticed. -/* #define WAL_DEBUG */ +#define WAL_DEBUG Reading this I'm left wondering if we shoulnd't introduce macros for the kinds, since we now compare with 'u', 't' etc in quite a few places and add comments explaining the types everywhere. A descriptive name would make it easier to grep for all occurrences, and avoid the need for the comment lines. Thats not necesarily for this patch though, but an observation from reading it. I found a few smaller nitpicks as well, some of these might go away by a pg_indent run but I've included them all here regardless: This change, and the subsequent whitespace removal later in the same function, seems a bit pointless: - /* read the relation id */ relid = pq_getmsgint(in, 4); - Braces should go on the next line: + if (options->proto.logical.binary) { This should be a C /* ... 
*/ comment, or perhaps just removed since the code is quite self explanatory: + // default to false + *binary_basetypes = false; Indentation here: - errmsg("conflicting or redundant options"))); + errmsg("conflicting or redundant options %s already provided", defel->defname))); ..as well as here (there are a few like this one): + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("incompatible datum size"))); Capitalization of "after" to make it a proper sentence: + * after we know that the subscriber is requesting binary check to make sure Excessive whitespace and indentation in a few places, and not enough in some: + if (binary_given) + { + values[Anum_pg_subscription_subbinary - 1] = ... + if ( *binary_basetypes == true ) ... + if (sizeof(int) != int_size) ... + if( float4_byval != ... + if (sizeof(long) != long_size) + ereport(ERROR, ... + if (tupleData->format[remoteattnum] =='u') ... + bool binary_basetypes; That's all for now. cheers ./daniel
Attachment
On Tue, 7 Jul 2020 at 10:01, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 7 Jul 2020, at 02:16, Dave Cramer <davecramer@gmail.com> wrote:
> OK, rebased it down to 2 patches, attached.
I took a look at this patchset today. The feature clearly seems like something
which we'd benefit from having, especially if it allows for the kind of
extensions that were discussed at the beginning of this thread. In general I
think it's in pretty good shape, there are however a few comments:
The patch lacks any kind of test, which I think is required for it to be
considered for committing. It also doesn't update the \dRs view in psql to
include the subbinary column which IMO it should. I took the liberty to write
this as well as tests as I was playing with the patch, the attached 0003
contains this, while 0001 and 0002 are your patches included to ensure the
CFBot can do its thing. This was kind of thrown together to have something
while testing, so it definitely needs a once-over or two.
I have put all your requests, other than the indentation (which can be dealt with by pg_indent), into another patch, which I reordered ahead of yours.
This should make it easier to see that all of your issues have been addressed.
I did not do the macros for updated, inserted, and deleted; I will give that a go tomorrow.
The comment here implies that unchanged is the default value for format, but
isn't this actually setting it to text formatted value?
+ /* default is unchanged */
+ tuple->format = palloc(natts * sizeof(char));
+ memset(tuple->format, 't', natts * sizeof(char));
Also, since the values member isn't memset() with a default, this seems a bit
misleading at best no?
For the rest of the backend we aren't including the defname in the errmsg like
what is done here. Maybe we should, but I think that should be done
consistently if so, and we should stick to just "conflicting or redundant
options" for now. At the very least, this needs a comma between "options" and
the defname and ideally the defname wrapped in \".
- errmsg("conflicting or redundant options")));
+ errmsg("conflicting or redundant options %s already provided", defel->defname)));
I added these since this will now be used outside of logical replication, and getting reasonable error messages when setting up
replication is useful. I added the \" and the comma.
These types of constructs are IMHO quite hard to read:
+ if(
+ #ifdef WORDS_BIGENDIAN
+ true
+ #else
+ false
+ #endif
+ != bigendian)
How about spelling out the statement completely for both cases, or perhaps
encapsulating the logic in a macro? Something like the below perhaps?
+ #ifdef WORDS_BIGENDIAN
+ if (bigendian != true)
+ #else
+ if (bigendian != false)
+ #endif
This change needs to be removed before a commit, just highlighting that here to
avoid it going unnoticed.
-/* #define WAL_DEBUG */
+#define WAL_DEBUG
Done
Reading this I'm left wondering if we shouldn't introduce macros for the kinds,
since we now compare with 'u', 't' etc in quite a few places and add comments
explaining the types everywhere. A descriptive name would make it easier to
grep for all occurrences, and avoid the need for the comment lines. That's not
necessarily for this patch though, but an observation from reading it.
I'll take a look at adding macros tomorrow.
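One possible shape for such macros; the names are illustrative, and whether or how this landed in the tree may differ:

/* Illustrative names only: per-column kind bytes used in the protocol. */
#define LOGICALREP_COLUMN_NULL		'n'		/* column is NULL */
#define LOGICALREP_COLUMN_UNCHANGED	'u'		/* unchanged TOASTed value */
#define LOGICALREP_COLUMN_TEXT		't'		/* text-format value follows */
#define LOGICALREP_COLUMN_BINARY	'b'		/* binary-format value follows */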
I've taken care of much of this below
I found a few smaller nitpicks as well, some of these might go away by a
pg_indent run but I've included them all here regardless:
This change, and the subsequent whitespace removal later in the same function,
seems a bit pointless:
- /* read the relation id */
relid = pq_getmsgint(in, 4);
-
Braces should go on the next line:
+ if (options->proto.logical.binary) {
This should be a C /* ... */ comment, or perhaps just removed since the code
is quite self explanatory:
+ // default to false
+ *binary_basetypes = false;
Indentation here:
- errmsg("conflicting or redundant options")));
+ errmsg("conflicting or redundant options %s already provided", defel->defname)));
..as well as here (there are a few like this one):
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("incompatible datum size")));
Capitalization of "after" to make it a proper sentence:
+ * after we know that the subscriber is requesting binary check to make sure
Excessive whitespace and indentation in a few places, and not enough in some:
+ if (binary_given)
+ {
+ values[Anum_pg_subscription_subbinary - 1] =
...
+ if ( *binary_basetypes == true )
...
+ if (sizeof(int) != int_size)
...
+ if( float4_byval !=
...
+ if (sizeof(long) != long_size)
+ ereport(ERROR,
...
+ if (tupleData->format[remoteattnum] =='u')
...
+ bool binary_basetypes;
That's all for now.
cheers ./daniel
Attachment
> On 7 Jul 2020, at 22:53, Dave Cramer <davecramer@gmail.com> wrote:
> I have put all your requests other than the indentation as that can be dealt with by pg_indent into another patch which I reordered ahead of yours
> This should make it easier to see that all of your issues have been addressed.
Thanks for the update! Do note that my patch included a new file which is missing from this patchset:
src/test/subscription/t/014_binary.pl
This is, IMO, the most interesting test of this feature so it would be good to be included. It's a basic test and can no doubt be extended to be even more relevant, but it's a start.
> I did not do the macro for updated, inserted, deleted, will give that a go tomorrow.
This might not be a blocker, but personally I think it would make the code more readable. Anyone else have an opinion on this?
> I added these since this will now be used outside of logical replication and getting reasonable error messages when setting up
> replication is useful. I added the \" and ,
I think the "lack of detail" in the existing error messages is intentional to make translation easier, but I might be wrong here.
Reading through the new patch, and running the tests, I'm marking this as Ready for Committer. It does need some cosmetic TLC, quite possibly just from pg_indent but I didn't validate if it will take care of everything, and comment touchups (there is still a TODO comment around wording that needs to be resolved). However, I think it's in good enough shape for consideration at this point.
cheers ./daniel
Daniel Gustafsson <daniel@yesql.se> writes: > Thanks for the update! Do note that my patch included a new file which is > missing from this patchset: > src/test/subscription/t/014_binary.pl > This is, IMO, the most interesting test of this feature so it would be good to > be included. It's a basic test and can no doubt be extended to be even more > relevant, but it's a start. I was about to complain that the latest patchset includes no meaningful test cases, but I assume that this missing file contains those. >> I did not do the macro for updated, inserted, deleted, will give that a go tomorrow. > This might not be a blocker, but personally I think it would make the > code more readable. Anyone else have an opinion on this? +1 for using macros. > Reading through the new patch, and running the tests, I'm marking this as Ready > for Committer. It does need some cosmetic TLC, quite possibly just from > pg_indent but I didn't validate if it will take care of everything, and comment > touchups (there is still a TODO comment around wording that needs to be > resolved). However, I think it's in good enough shape for consideration at > this point. I took a quick look through the patch and had some concerns: * Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks. Those are quite dead so far as a patch for HEAD is concerned --- in fact, since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so, the patch is actively doing the wrong thing there. Not that it matters. This code will never appear in any branch where float timestamps could be a thing. * I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness, etc, make any sense either. Those surely do not affect the on-the-wire representation of values --- or if they do, we've blown it somewhere else. I'd just take out all those checks and assume that the binary representation is sufficiently portable. (If it's not, it's more or less the user's problem, just as in binary COPY.) * Please also remove debugging hacks such as enabling WAL_DEBUG. * It'd likely be wise for the documentation to point out that binary mode only works if all types to be transferred have send/receive functions. BTW, while it's not the job of this patch to fix it, I find it quite distressing that we're apparently repeating the lookups of the type I/O functions for every row transferred. I'll set this back to WoA, but I concur with Daniel's opinion that it doesn't seem that far from committability. regards, tom lane
On Fri, 10 Jul 2020 at 14:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
> Thanks for the update! Do note that my patch included a new file which is
> missing from this patchset:
> src/test/subscription/t/014_binary.pl
> This is, IMO, the most interesting test of this feature so it would be good to
> be included. It's a basic test and can no doubt be extended to be even more
> relevant, but it's a start.
I was about to complain that the latest patchset includes no meaningful
test cases, but I assume that this missing file contains those.
>> I did not do the macro for updated, inserted, deleted, will give that a go tomorrow.
> This might not be a blocker, but personally I think it would make the
> code more readable. Anyone else have an opinion on this?
+1 for using macros.
Got it, will add.
> Reading through the new patch, and running the tests, I'm marking this as Ready
> for Committer. It does need some cosmetic TLC, quite possibly just from
> pg_indent but I didn't validate if it will take care of everything, and comment
> touchups (there is still a TODO comment around wording that needs to be
> resolved). However, I think it's in good enough shape for consideration at
> this point.
I took a quick look through the patch and had some concerns:
* Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks.
Those are quite dead so far as a patch for HEAD is concerned --- in fact,
since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so,
the patch is actively doing the wrong thing there. Not that it matters.
This code will never appear in any branch where float timestamps could
be a thing.
* I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness,
etc, make any sense either. Those surely do not affect the on-the-wire
representation of values --- or if they do, we've blown it somewhere else.
I'd just take out all those checks and assume that the binary
representation is sufficiently portable. (If it's not, it's more or less
the user's problem, just as in binary COPY.)
So is there any point in having them as options, then?
* Please also remove debugging hacks such as enabling WAL_DEBUG.
* It'd likely be wise for the documentation to point out that binary
mode only works if all types to be transferred have send/receive
functions.
will do
BTW, while it's not the job of this patch to fix it, I find it quite
distressing that we're apparently repeating the lookups of the type
I/O functions for every row transferred.
I'll set this back to WoA, but I concur with Daniel's opinion that
it doesn't seem that far from committability.
Thanks for looking at this
Dave Cramer
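On Tom's aside about repeating the type I/O lookups for every row, a minimal sketch of per-attribute caching. The lookup functions (getTypeBinaryInputInfo, fmgr_info, ReceiveFunctionCall) are the real APIs, while the cache structure and surrounding names are hypothetical:

typedef struct AttrRecvCache
{
	Oid			typreceive;		/* OID of the type's receive function */
	Oid			typioparam;		/* I/O parameter for the receive function */
	FmgrInfo	recvfunc;		/* resolved function, ready to call */
} AttrRecvCache;

	/* Filled once per attribute, e.g. when the relation map entry is built. */
	getTypeBinaryInputInfo(att->atttypid, &cache->typreceive, &cache->typioparam);
	fmgr_info(cache->typreceive, &cache->recvfunc);

	/* Then, for every row and column, no further catalog lookups are needed. */
	values[i] = ReceiveFunctionCall(&cache->recvfunc, &buf,
									cache->typioparam, att->atttypmod);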
Hi,
On 11/07/2020 14:14, Dave Cramer wrote:
> On Fri, 10 Jul 2020 at 14:21, Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> wrote:
>
> > > Reading through the new patch, and running the tests, I'm marking this as Ready
> > > for Committer. It does need some cosmetic TLC, quite possibly just from
> > > pg_indent but I didn't validate if it will take care of everything, and comment
> > > touchups (there is still a TODO comment around wording that needs to be
> > > resolved). However, I think it's in good enough shape for consideration at
> > > this point.
> >
> > I took a quick look through the patch and had some concerns:
> >
> > * Please strip out the PG_VERSION_NUM and USE_INTEGER_DATETIMES checks.
> > Those are quite dead so far as a patch for HEAD is concerned --- in fact,
> > since USE_INTEGER_DATETIMES hasn't even been defined since v10 or so,
> > the patch is actively doing the wrong thing there. Not that it matters.
> > This code will never appear in any branch where float timestamps could
> > be a thing.
> >
> > * I doubt that the checks on USE_FLOAT4/8_BYVAL, sizeof(int), endiannness,
> > etc, make any sense either. Those surely do not affect the on-the-wire
> > representation of values --- or if they do, we've blown it somewhere else.
> > I'd just take out all those checks and assume that the binary
> > representation is sufficiently portable. (If it's not, it's more or less
> > the user's problem, just as in binary COPY.)
>
> So is there any point in having them as options then ?
I am guessing this is copied from pglogical, right? We have them there because it can optionally send data in the on-disk format (not the network binary format) and there this matters, but for network binary format they do not matter as Tom says.
-- Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 11/07/2020 14:14, Dave Cramer wrote:
>> So is there any point in having them as options then ?
> I am guessing this is copied from pglogical, right? We have them there
> because it can optionally send data in the on-disk format (not the
> network binary format) and there this matters, but for network binary
> format they do not matter as Tom says.
Ah, I wondered why that was there at all. Yes, you should just delete all that logic --- it's irrelevant as long as we use the send/recv functions.
regards, tom lane
On Sat, 11 Jul 2020 at 10:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 11/07/2020 14:14, Dave Cramer wrote:
>> So is there any point in having them as options then ?
> I am guessing this is copied from pglogical, right? We have them there
> because it can optionally send data in the on-disk format (not the
> network binary format) and there this matters, but for network binary
> format they do not matter as Tom says.
Ah, I wondered why that was there at all. Yes, you should just delete
all that logic --- it's irrelevant as long as we use the send/recv
functions.
regards, tom lane
Ok,
removed all the unnecessary options.
Added the test case that Daniel had created.
Added a note to the docs.
Note WAL_DEBUG is removed in patch 3. I could rebase that into patch 1 if required.
Thanks,
Dave
Attachment
> On 13 Jul 2020, at 15:11, Dave Cramer <davecramer@gmail.com> wrote: I took another look at the updated version today. Since there now were some unused variables and (I believe) unnecessary checks (int size and endianness etc) left, I took the liberty to fix those. I also fixed some markup in the catalog docs, did some minor tidying up and ran pgindent on it. The attached is a squash of the 4 patches in your email with the above fixes. I'm again marking it RfC since I believe all concerns raised so far has been addressed. > Added the test case that Daniel had created. Nope, still missing AFAICT =) But I've included it in the attached. cheers ./daniel
Attachment
On Tue, 14 Jul 2020 at 09:26, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 13 Jul 2020, at 15:11, Dave Cramer <davecramer@gmail.com> wrote:
I took another look at the updated version today. Since there now were some
unused variables and (I believe) unnecessary checks (int size and endianness
etc) left, I took the liberty to fix those. I also fixed some markup in the
catalog docs, did some minor tidying up and ran pgindent on it.
The attached is a squash of the 4 patches in your email with the above fixes.
I'm again marking it RfC since I believe all concerns raised so far has been
addressed.
> Added the test case that Daniel had created.
Nope, still missing AFAICT =) But I've included it in the attached.
Thanks!
Dave
So I started looking through this seriously, and my first question is why do the docs and code keep saying that "base types" are sent in binary? Why not just "data"? Are there any cases where we don't use binary format, if the subscription requests it?
If there's not a concrete reason to use that terminology, I'd rather flush it, because it seems confusing.
regards, tom lane
On Tue, 14 Jul 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I started looking through this seriously, and my first question
is why do the docs and code keep saying that "base types" are sent
in binary? Why not just "data"? Are there any cases where we
don't use binary format, if the subscription requests it?
If there's not a concrete reason to use that terminology,
I'd rather flush it, because it seems confusing.
Well for some reason I thought there were some types that did not have send and receive functions.
I've changed the docs to say "data", and renamed the flag from binary_basetypes to just binary.
See attached.
Thanks,
Dave
Attachment
Dave Cramer <davecramer@gmail.com> writes:
> On Tue, 14 Jul 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I started looking through this seriously, and my first question
>> is why do the docs and code keep saying that "base types" are sent
>> in binary? Why not just "data"? Are there any cases where we
>> don't use binary format, if the subscription requests it?
> Well for some reason I thought there were some types that did not have send
> and receive functions.
There are, but they're all base types, so this terminology is still unhelpful ;-). It'd be possible for the sender to send binary for columns it has a typsend function for, and otherwise send text. However, this only helps if the receiver has receive functions for all those types; in cross-version cases they might disagree about which types can be sent in binary.
(Hm ... maybe we could have the receiver verify that it has typreceive for every column included in its version of the table, before asking for binary mode?)
regards, tom lane
Hi,
On 2020-07-14 14:08:53 -0400, Dave Cramer wrote:
> On Tue, 14 Jul 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > So I started looking through this seriously, and my first question
> > is why do the docs and code keep saying that "base types" are sent
> > in binary? Why not just "data"? Are there any cases where we
> > don't use binary format, if the subscription requests it?
> >
> > If there's not a concrete reason to use that terminology,
> > I'd rather flush it, because it seems confusing.
>
> Well for some reason I thought there were some types that did not have send
> and receive functions.
There's also send/receive functions that do not work across systems, unfortunately :(. In particular record and array send functions embed type oids and their receive functions verify that they match the local system. Which basically means that if there's any difference in oid assignment order between two systems that they will not allow to send/recv such data between them :(. I suspect that is what that comments might have been referring to?
I've several times suggested that we should remove those type checks in recv, as they afaict don't provide any actual value. But unfortunately there hasn't been much response to that. See e.g. https://postgr.es/m/20160426001713.hbqdiwvf4mkzkg55%40alap3.anarazel.de
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> There's also send/receive functions that do not work across systems,
> unfortunately :(. In particular record and array send functions embed
> type oids and their receive functions verify that they match the local
> system. Which basically means that if there's any difference in oid
> assignment order between two systems that they will not allow to
> send/recv such data between them :(.
It's not a problem particularly for built-in types, but I agree there's an issue for extension types.
> I've several times suggested that we should remove those type checks in
> recv, as they afaict don't provide any actual value. But unfortunately
> there hasn't been much response to that. See e.g.
> https://postgr.es/m/20160426001713.hbqdiwvf4mkzkg55%40alap3.anarazel.de
Maybe we could compromise by omitting the check if both OIDs are outside the built-in range?
regards, tom lane
Hi,
On 2020-07-14 19:46:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > There's also send/receive functions that do not work across systems,
> > unfortunately :(. In particular record and array send functions embed
> > type oids and their receive functions verify that they match the local
> > system. Which basically means that if there's any difference in oid
> > assignment order between two systems that they will not allow to
> > send/recv such data between them :(.
>
> It's not a problem particularly for built-in types, but I agree
> there's an issue for extension types.
I'm not so sure. That's true for builtin types within a single major version, but not necessarily across major versions. Not that I can immediately recall cases where we renumbered type oids. It also assumes that the type specification exactly matches between the source / target system. It's probably not a great idea to try to use send/recv for meaningfully different types, but it doesn't seem to crazy to e.g. allow to e.g. change varchar to text while doing a major version upgrade over logical rep.
What is the gain in having these checks? recv functions need to be safe against arbitrary input, so a type crosscheck doesn't buy additional safety in that regard. Not that a potential attacker couldn't just change the content anyways?
> > I've several times suggested that we should remove those type checks in
> > recv, as they afaict don't provide any actual value. But unfortunately
> > there hasn't been much response to that. See e.g.
> > https://postgr.es/m/20160426001713.hbqdiwvf4mkzkg55%40alap3.anarazel.de
>
> Maybe we could compromise by omitting the check if both OIDs are
> outside the built-in range?
Hm. That'd be a lot better than the current situation. So I'd definitely go for that if that's what we can agree on.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> What is the gain in having these checks? recv functions need to be safe
> against arbitrary input, so a type crosscheck doesn't buy additional
> safety in that regard. Not that a potential attacker couldn't just
> change the content anyways?
You're confusing security issues with user-friendliness issues. Detecting that you sent the wrong type via an OID mismatch error is a lot less painful than trying to figure out why you've got errors along the line of "incorrect binary data format".
regards, tom lane
Hi,
On 2020-07-14 22:28:48 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What is the gain in having these checks? recv functions need to be safe
> > against arbitrary input, so a type crosscheck doesn't buy additional
> > safety in that regard. Not that a potential attacker couldn't just
> > change the content anyways?
>
> You're confusing security issues with user-friendliness issues.
> Detecting that you sent the wrong type via an OID mismatch error
> is a lot less painful than trying to figure out why you've got
> errors along the line of "incorrect binary data format".
An oid mismatch error without knowing what that's about isn't very helpful either.
How about adding an errcontext that shows the "source type oid", the target type oid & type name and, for records, the column name of the target table? That'd make this a lot easier to debug.
Greetings,
Andres Freund
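A sketch of the kind of error context Andres is describing. Only errcontext() itself is the real API here; the callback and the struct are hypothetical:

typedef struct BinaryRecvErrorCtx
{
	const char *attname;		/* target column name, if known */
	Oid			remote_typid;	/* type OID embedded in the incoming data */
	Oid			local_typid;	/* type OID expected locally */
} BinaryRecvErrorCtx;

static void
binary_recv_error_callback(void *arg)
{
	BinaryRecvErrorCtx *ctx = (BinaryRecvErrorCtx *) arg;

	errcontext("processing remote binary data for column \"%s\": remote type %u, local type %u",
			   ctx->attname, ctx->remote_typid, ctx->local_typid);
}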
On Tue, 14 Jul 2020 at 22:47, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-07-14 22:28:48 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > What is the gain in having these checks? recv functions need to be safe
> > against arbitrary input, so a type crosscheck doesn't buy additional
> > safety in that regard. Not that a potential attacker couldn't just
> > change the content anyways?
>
> You're confusing security issues with user-friendliness issues.
> Detecting that you sent the wrong type via an OID mismatch error
> is a lot less painful than trying to figure out why you've got
> errors along the line of "incorrect binary data format".
An oid mismatch error without knowing what that's about isn't very
helpful either.
How about adding an errcontext that shows the "source type oid", the
target type oid & type name and, for records, the column name of the
target table? That'd make this a lot easier to debug.
So, looking at how to confirm that the subscriber has receive functions for all of the types:
AFAICT we don't have that information, since the publication determines what is sent.
This code at line 482 in proto.c attempts to limit what is sent in binary. We could certainly be more restrictive here.
if (binary &&
OidIsValid(typclass->typreceive) &&
(att->atttypid < FirstNormalObjectId || typclass->typtype != 'c') &&
(att->atttypid < FirstNormalObjectId || typclass->typelem == InvalidOid))
Dave Cramer
Working through this ... what is the rationale for having changed the API of logicalrep_read_update? It seems kind of random, especially since no comparable change was made to logicalrep_read_insert. If there's actually a good reason, it seems like it'd apply to both. If there's not, I'd be inclined to not change the API, because this sort of thing is a recipe for bugs when making cross-version patches.
regards, tom lane
I've pushed this patch, with a number of adjustments, some cosmetic and some not so much (no pg_dump support!?). We're not quite done though ...

Dave Cramer <davecramer@gmail.com> writes:
> So looking at how to confirm that the subscriber has receive functions for
> all of the types.
> AFAICT we don't have that information since the publication determines what
> is sent?

Yeah, at the point where we need to send the option, we seem not to have a lot of info. In practice, if the sender has a typsend function, the only way the subscriber doesn't have a matching typreceive function is if it's an older PG version. I think it's sufficient to document that you can't use binary mode in that case, so that's what I did. (Note that getTypeBinaryInputInfo will say "no binary input function available for type %s" in such a case, so that seemed like adequate error handling.)

> On Tue, 14 Jul 2020 at 22:47, Andres Freund <andres@anarazel.de> wrote:
>> An oid mismatch error without knowing what that's about isn't very
>> helpful either.
>> How about adding an errcontext that shows the "source type oid", the
>> target type oid & type name and, for records, the column name of the
>> target table? That'd make this a lot easier to debug.

> This code line 482 in proto.c attempts to limit what is sent in binary. We
> could certainly be more restrictive here.

I think Andres' point is to be *less* restrictive. I left that logic as-is in the committed patch, but we could do something like the attached to improve the situation.

regards, tom lane

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 2b1356ee24..d9837a5971 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -494,22 +494,9 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple, bool binar
 		typclass = (Form_pg_type) GETSTRUCT(typtup);
 
 		/*
-		 * Choose whether to send in binary. Obviously, the option must be
-		 * requested and the type must have a send function. Also, if the
-		 * type is not built-in then it must not be a composite or array type.
-		 * Such types contain type OIDs, which will likely not match at the
-		 * receiver if it's not a built-in type.
-		 *
-		 * XXX this could be relaxed if we changed record_recv and array_recv
-		 * to be less picky.
-		 *
-		 * XXX this fails to apply the restriction to domains over such types.
+		 * Send in binary if requested and type has suitable send function.
 		 */
-		if (binary &&
-			OidIsValid(typclass->typsend) &&
-			(att->atttypid < FirstGenbkiObjectId ||
-			 (typclass->typtype != TYPTYPE_COMPOSITE &&
-			  typclass->typelem == InvalidOid)))
+		if (binary && OidIsValid(typclass->typsend))
 		{
 			bytea	   *outputbytes;
 			int			len;
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 800107d4e7..392445ea03 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1308,13 +1308,34 @@ array_recv(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 				 errmsg("invalid array flags")));
 
+	/* Check element type recorded in the data */
 	element_type = pq_getmsgint(buf, sizeof(Oid));
+
+	/*
+	 * From a security standpoint, it doesn't matter whether the input's
+	 * element type matches what we expect: the element type's receive
+	 * function has to be robust enough to cope with invalid data. However,
+	 * from a user-friendliness standpoint, it's nicer to complain about type
+	 * mismatches than to throw "improper binary format" errors. But there's
+	 * a problem: only built-in types have OIDs that are stable enough to
+	 * believe that a mismatch is a real issue. So complain only if both OIDs
+	 * are in the built-in range. Otherwise, carry on with the element type
+	 * we "should" be getting.
+	 */
 	if (element_type != spec_element_type)
 	{
-		/* XXX Can we allow taking the input element type in any cases? */
-		ereport(ERROR,
-				(errcode(ERRCODE_DATATYPE_MISMATCH),
-				 errmsg("wrong element type")));
+		if (element_type < FirstGenbkiObjectId &&
+			spec_element_type < FirstGenbkiObjectId)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("binary data has array element type %u (%s) instead of expected %u (%s)",
+							element_type,
+							format_type_extended(element_type, -1,
+												 FORMAT_TYPE_ALLOW_INVALID),
+							spec_element_type,
+							format_type_extended(spec_element_type, -1,
												 FORMAT_TYPE_ALLOW_INVALID))));
+		element_type = spec_element_type;
 	}
 
 	for (i = 0; i < ndim; i++)
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 80cba2f4c2..674cf0a55d 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -551,13 +551,33 @@ record_recv(PG_FUNCTION_ARGS)
 			continue;
 		}
 
-		/* Verify column datatype */
+		/* Check column type recorded in the data */
 		coltypoid = pq_getmsgint(buf, sizeof(Oid));
-		if (coltypoid != column_type)
+
+		/*
+		 * From a security standpoint, it doesn't matter whether the input's
+		 * column type matches what we expect: the column type's receive
+		 * function has to be robust enough to cope with invalid data.
+		 * However, from a user-friendliness standpoint, it's nicer to
+		 * complain about type mismatches than to throw "improper binary
+		 * format" errors. But there's a problem: only built-in types have
+		 * OIDs that are stable enough to believe that a mismatch is a real
+		 * issue. So complain only if both OIDs are in the built-in range.
+		 * Otherwise, carry on with the column type we "should" be getting.
+		 */
+		if (coltypoid != column_type &&
+			coltypoid < FirstGenbkiObjectId &&
+			column_type < FirstGenbkiObjectId)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("wrong data type: %u, expected %u",
-							coltypoid, column_type)));
+					 errmsg("binary data has type %u (%s) instead of expected %u (%s) in record column %d",
+							coltypoid,
+							format_type_extended(coltypoid, -1,
												 FORMAT_TYPE_ALLOW_INVALID),
+							column_type,
+							format_type_extended(column_type, -1,
												 FORMAT_TYPE_ALLOW_INVALID),
+							i + 1)));
 
 		/* Get and check the item length */
 		itemlen = pq_getmsgint(buf, 4);
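For readers following the subscriber-side part of this: a binary datum ultimately has to go through the type's receive function, which is where the "no binary input function available" error mentioned above comes from. A minimal sketch of that path using the usual fmgr helpers (variable names are illustrative, not the committed worker.c code):

	Oid		typreceive;
	Oid		typioparam;
	Datum	value;

	/* errors out if the local type has no binary input function */
	getTypeBinaryInputInfo(atttypid, &typreceive, &typioparam);

	/* buf holds the raw bytes the publisher sent for this column */
	value = OidReceiveFunctionCall(typreceive, &buf, typioparam, atttypmod);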
On Sat, Jul 18, 2020 at 9:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've pushed this patch, with a number of adjustments, some cosmetic
> and some not so much (no pg_dump support!?). We're not quite
> done though ...

Skink's latest run reports a failure that I surmise was caused by this patch:

==722318== VALGRINDERROR-BEGIN
==722318== Invalid read of size 1
==722318==    at 0x4F4CC9: apply_handle_update (worker.c:834)
==722318==    by 0x4F4F81: apply_dispatch (worker.c:1427)
==722318==    by 0x4F5104: LogicalRepApplyLoop (worker.c:1635)
==722318==    by 0x4F57BF: ApplyWorkerMain (worker.c:2141)
==722318==    by 0x4BD49E: StartBackgroundWorker (bgworker.c:813)
==722318==    by 0x4CBAB4: do_start_bgworker (postmaster.c:5865)
==722318==    by 0x4CBBF5: maybe_start_bgworkers (postmaster.c:6091)
==722318==    by 0x4CC4BF: sigusr1_handler (postmaster.c:5260)
==722318==    by 0x486413F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==722318==    by 0x4DC7845: select (select.c:41)
==722318==    by 0x4CCE40: ServerLoop (postmaster.c:1691)
==722318==    by 0x4CE106: PostmasterMain (postmaster.c:1400)
==722318==  Address 0x78cb0ab is 443 bytes inside a recently re-allocated block of size 8,192 alloc'd
==722318==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==722318==    by 0x6A55BD: AllocSetContextCreateInternal (aset.c:468)
==722318==    by 0x280262: AtStart_Memory (xact.c:1108)
==722318==    by 0x2806ED: StartTransaction (xact.c:1979)
==722318==    by 0x282128: StartTransactionCommand (xact.c:2829)
==722318==    by 0x4F5514: ApplyWorkerMain (worker.c:2014)
==722318==    by 0x4BD49E: StartBackgroundWorker (bgworker.c:813)
==722318==    by 0x4CBAB4: do_start_bgworker (postmaster.c:5865)
==722318==    by 0x4CBBF5: maybe_start_bgworkers (postmaster.c:6091)
==722318==    by 0x4CC4BF: sigusr1_handler (postmaster.c:5260)
==722318==    by 0x486413F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==722318==    by 0x4DC7845: select (select.c:41)
==722318==
==722318== VALGRINDERROR-END

See https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2020-07-20%2002%3A37%3A51

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Skink's latest run reports a failure that I surmise was caused by this patch: Yeah, I've just been digging through that. The patch didn't create the bug, but it allowed valgrind to detect it, because the column status array is now "just big enough" rather than being always MaxTupleAttributeNumber entries. To wit, the problem is that the code in apply_handle_update that computes target_rte->updatedCols is junk. The immediate issue is that it fails to apply the remote-to-local column number mapping, so that it's looking at the wrong colstatus entries, possibly including entries past the end of the array. I'm fixing that, but even after that, there's a semantic problem: LOGICALREP_COLUMN_UNCHANGED is just a weak optimization, cf the code that sends it, in proto.c around line 480. colstatus will often *not* be that for columns that were in fact not updated on the remote side. I wonder whether we need to take steps to improve that. CC'ing Peter E., as this issue arose with b9c130a1fdf. regards, tom lane
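In other words, the per-column status array is indexed by the publisher's column positions, so the subscriber has to go through its attribute map before consulting it. A rough sketch of that idea, not the exact committed fix (names loosely follow the LogicalRepRelMapEntry conventions of that era):

	/* Sketch only: walk the local descriptor, map each local column to its
	 * remote position, and only then look at that column's status byte. */
	TupleDesc	desc = RelationGetDescr(rel->localrel);

	for (int i = 0; i < desc->natts; i++)
	{
		int			remoteattnum = rel->attrmap->attnums[i];

		if (remoteattnum < 0)
			continue;			/* column doesn't exist on the publisher */

		if (newtup.colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)
			target_rte->updatedCols =
				bms_add_member(target_rte->updatedCols,
							   i + 1 - FirstLowInvalidHeapAttributeNumber);
	}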
Hi,

On 20/07/2020 17:51, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
>> Skink's latest run reports a failure that I surmise was caused by this patch:
>
> Yeah, I've just been digging through that. The patch didn't create
> the bug, but it allowed valgrind to detect it, because the column
> status array is now "just big enough" rather than being always
> MaxTupleAttributeNumber entries. To wit, the problem is that the
> code in apply_handle_update that computes target_rte->updatedCols
> is junk.
>
> The immediate issue is that it fails to apply the remote-to-local
> column number mapping, so that it's looking at the wrong colstatus
> entries, possibly including entries past the end of the array.
>
> I'm fixing that, but even after that, there's a semantic problem:
> LOGICALREP_COLUMN_UNCHANGED is just a weak optimization, cf the code
> that sends it, in proto.c around line 480. colstatus will often *not*
> be that for columns that were in fact not updated on the remote side.
> I wonder whether we need to take steps to improve that.

LOGICALREP_COLUMN_UNCHANGED is not trying to optimize anything; there is certainly no effort made to avoid sending columns that were not updated by logical replication itself. It's just something we invented to handle the fact that values of TOASTed columns that were not updated are simply not visible to logical decoding (unless the table has REPLICA IDENTITY FULL), as they are neither written to WAL nor accessible via a historic snapshot. So the output plugin simply does not see the real value.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
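To illustrate Petr's point, the sender-side logic is roughly this (paraphrased from logicalrep_write_tuple; not an exact excerpt): when the decoded new tuple still carries an on-disk TOAST pointer for a column, the plugin cannot dereference it, so all it can do is tell the subscriber "unchanged".

	/* Paraphrased sketch: an un-updated TOASTed column shows up as an
	 * on-disk TOAST pointer whose payload logical decoding never saw. */
	if (!isnull[i] &&
		att->attlen == -1 &&
		VARATT_IS_EXTERNAL_ONDISK(values[i]))
	{
		pq_sendbyte(out, LOGICALREP_COLUMN_UNCHANGED);
		continue;
	}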
Petr Jelinek <petr@2ndquadrant.com> writes: > On 20/07/2020 17:51, Tom Lane wrote: >> I'm fixing that, but even after that, there's a semantic problem: >> LOGICALREP_COLUMN_UNCHANGED is just a weak optimization, cf the code >> that sends it, in proto.c around line 480. colstatus will often *not* >> be that for columns that were in fact not updated on the remote side. >> I wonder whether we need to take steps to improve that. > LOGICALREP_COLUMN_UNCHANGED is not trying to optimize anything, there is > certainly no effort made to not send columns that were not updated by > logical replication itself. It's just something we invented in order to > handle the fact that values for TOASTed columns that were not updated > are simply not visible to logical decoding (unless table has REPLICA > IDENTITY FULL) as they are not written to WAL nor accessible via > historic snapshot. So the output plugin simply does not see the real value. Hm. So the comment I added a couple days ago is wrong; can you propose a better one? However, be that as it may, we do have a provision in the protocol that can handle marking columns unchanged. I'm thinking if we tried a bit harder to identify unchanged columns on the sending side, we could both fix this semantic deficiency for triggers and improve efficiency by reducing transmission of unneeded data. regards, tom lane
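One way the sending side could "try harder", sketched under the assumption that an old tuple image is available at all (it generally isn't unless the column is part of the replica identity or the table uses REPLICA IDENTITY FULL); this is a hypothetical sketch, not an existing pgoutput feature:

	/* Hypothetical: when both old and new values are at hand, mark
	 * byte-identical columns as unchanged so the subscriber can treat them
	 * as not updated, and the data need not be transmitted at all. */
	if (oldtuple != NULL &&
		!old_isnull[i] && !new_isnull[i] &&
		datumIsEqual(old_values[i], new_values[i],
					 att->attbyval, att->attlen))
	{
		/* note: datumIsEqual can give false negatives for toasted values */
		pq_sendbyte(out, LOGICALREP_COLUMN_UNCHANGED);
		continue;
	}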