Re: Binary support for pgoutput plugin - Mailing list pgsql-hackers

From Dave Cramer
Subject Re: Binary support for pgoutput plugin
Date
Msg-id CADK3HHKF+TQRHoUYqvyq9s9Et-yzeSqP32C+v7bJKJDqF7AATg@mail.gmail.com
In response to Binary support for pgoutput plugin  (Dave Cramer <davecramer@gmail.com>)
Responses Re: Binary support for pgoutput plugin
Re: Binary support for pgoutput plugin
List pgsql-hackers

Dave Cramer


On Tue, 4 Jun 2019 at 16:30, Andres Freund <andres.freund@enterprisedb.com> wrote:
Hi,

On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> On Mon, 3 Jun 2019 at 20:54, David Fetter <david@fetter.org> wrote:
>
> > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > Is there a reason why pgoutput sends data in text format? Seems to
> > > me that sending data in binary would provide a considerable
> > > performance improvement.
> >
> > Are you seeing something that suggests that the text output is taking
> > a lot of time or other resources?
> >
> Actually it's on the other end that there is improvement. Parsing text
> takes much longer for almost everything except, ironically, text.

It's on both sides, I'd say. E.g. float (until v12), timestamp, bytea
are all much more expensive to convert from binary to text.
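To illustrate why binary values can be cheaper to consume, here is a minimal C sketch that decodes a binary timestamp or timestamptz value. It assumes the integer-datetimes representation (the only one since PostgreSQL 10): a big-endian int64 counting microseconds since 2000-01-01 00:00:00 (UTC in the case of timestamptz). Decoding is a byte swap plus a constant offset, with no text parsing at all. The function names are hypothetical and are not libpq or pgjdbc APIs.

```c
#include <stdint.h>

/* Seconds from the Unix epoch (1970-01-01) to the PostgreSQL epoch (2000-01-01). */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)
#define USECS_PER_SEC        INT64_C(1000000)

/* Read a big-endian (network byte order) signed 64-bit integer. */
static int64_t read_be_int64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return (int64_t) v;         /* two's-complement reinterpretation */
}

/* Convert an 8-byte binary timestamp/timestamptz (microseconds since
 * 2000-01-01 00:00:00, assuming integer datetimes) to microseconds since
 * the Unix epoch. Hypothetical helper; it ignores the +/-infinity
 * sentinel values, which would overflow here. */
static int64_t pg_binary_ts_to_unix_usecs(const unsigned char *p)
{
    return read_be_int64(p) + PG_EPOCH_OFFSET_SECS * USECS_PER_SEC;
}
```

For example, eight zero bytes decode to 946684800000000, which is 2000-01-01 00:00:00 UTC expressed as Unix microseconds.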


> To be more transparent there is some desire to use pgoutput for something
> other than logical replication. Change Data Capture clients such as
> Debezium have a requirement for a stable plugin which is shipped with core
> as this is always available in cloud providers' offerings. There's no reason
> that I am aware of that they cannot use pgoutput for this.

Except that that's not pgoutput's purpose, and we shouldn't make it
meaningfully more complicated or slower to achieve this. Don't think
there's a conflict in this case though.

Agreed; my intent was to slightly bend it to my will :)


> There's also no reason that I am aware that binary outputs can't be
> supported.

Well, it *does* increase version dependencies, and does make replication
more complicated, because type oids etc cannot be relied on to be the same
on source and target side.

I was about to agree with this, but if the type oids change from source to target, you
still can't decode the text version properly either. Unless I misunderstand something here?
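On the OID question, a receiver can match types by name instead of trusting OIDs, whichever wire format the values use. A minimal C sketch of that lookup follows, assuming the sender announces each non-built-in type with its schema and name alongside its OID (pgoutput's Type message carries both, with an empty namespace standing for pg_catalog). The LocalType table and resolve_remote_type() are hypothetical stand-ins for a real catalog lookup.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* stands in for PostgreSQL's Oid typedef */
#define INVALID_OID ((Oid) 0)

/* One row of a hypothetical receiver-side type catalog. */
typedef struct {
    const char *nspname;        /* schema name; "" stands for pg_catalog */
    const char *typname;        /* type name */
    Oid         oid;            /* OID of the type on this (receiving) server */
} LocalType;

/* Map a type announced by the sender, identified by schema and name,
 * to the receiver's own OID. The sender's OID is deliberately not used,
 * since it need not match. Returns INVALID_OID for unknown types. */
static Oid resolve_remote_type(const LocalType *catalog, size_t n,
                               const char *nspname, const char *typname)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(catalog[i].nspname, nspname) == 0 &&
            strcmp(catalog[i].typname, typname) == 0)
            return catalog[i].oid;
    }
    return INVALID_OID;
}
```

A user-defined type such as public.mood might have one OID on the source and 16500 (an arbitrary example value) on the target; only the schema-qualified name ties the two together.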


> The protocol would have to change slightly and I am working
> on a POC patch.

Hm, what would have to be changed protocol-wise? IIRC that'd just be a
different datum type? Or is that what you mean?
                pq_sendbyte(out, 't');  /* 'text' data follows */
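The change discussed here amounts to one more possible marker byte per column. A minimal C sketch, assuming the existing layout of a text column (the 't' byte, then an Int32 length, then the value bytes without a terminator) and a hypothetical 'b' marker for binary values. OutBuf and the send_* helpers are simplified stand-ins for PostgreSQL's StringInfo and pq_send* functions, not the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size output buffer standing in for PostgreSQL's StringInfo. */
typedef struct {
    unsigned char data[256];
    size_t        len;
} OutBuf;

static void send_byte(OutBuf *out, unsigned char b)
{
    assert(out->len + 1 <= sizeof(out->data));
    out->data[out->len++] = b;
}

/* Append a 32-bit integer in network (big-endian) byte order. */
static void send_int32(OutBuf *out, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        send_byte(out, (unsigned char) (v >> shift));
}

static void send_bytes(OutBuf *out, const void *src, size_t n)
{
    assert(out->len + n <= sizeof(out->data));
    memcpy(out->data + out->len, src, n);
    out->len += n;
}

/* Write one column: a format marker, a length word, then the payload.
 * 't' is the existing text marker; 'b' is a hypothetical binary marker.
 * Only the marker differs between the two formats. */
static void write_column(OutBuf *out, int binary, const void *val, uint32_t len)
{
    send_byte(out, binary ? 'b' : 't');
    send_int32(out, len);
    send_bytes(out, val, len);
}
```

Because the length word is present in both cases, a receiver that does not understand a given binary type can still skip over the column.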

I haven't really thought this through completely, but one place where JDBC has problems with binary is
timestamps with time zone, as we don't know which time zone to use. Is it safe to assume everything is in UTC,
since the server stores it in UTC? Then there are UDFs. My original thought was to use options to pass in the
types that I wanted in binary; everything else could be sent as text.
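The idea of letting the client choose which types come back in binary could be sketched in C as below. The option value format (a comma-separated list of type OIDs, e.g. "17,701,1184" for bytea, float8 and timestamptz) and both function names are hypothetical; any type not listed falls back to text.

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stands in for PostgreSQL's Oid typedef */

#define MAX_BINARY_TYPES 32

/* Set of type OIDs the client asked to receive in binary (hypothetical). */
typedef struct {
    Oid    oids[MAX_BINARY_TYPES];
    size_t count;
} BinaryTypeSet;

/* Parse a hypothetical option value such as "17,701,1184".
 * Returns false on empty entries, non-digits, OID 0, out-of-range values,
 * a trailing comma, or more than MAX_BINARY_TYPES entries. */
static bool parse_binary_types(const char *value, BinaryTypeSet *set)
{
    const char *p = value;

    set->count = 0;
    while (*p != '\0') {
        char *end;
        unsigned long oid;

        if (!isdigit((unsigned char) *p))
            return false;               /* empty entry, sign, or space */
        oid = strtoul(p, &end, 10);
        if (oid == 0 || oid > UINT_MAX || set->count == MAX_BINARY_TYPES)
            return false;
        set->oids[set->count++] = (Oid) oid;

        if (*end == ',') {
            end++;
            if (*end == '\0')
                return false;           /* trailing comma */
        } else if (*end != '\0') {
            return false;               /* junk after a number */
        }
        p = end;
    }
    return true;
}

/* Decide the format for one column; every unlisted type is sent as text. */
static bool send_as_binary(const BinaryTypeSet *set, Oid typid)
{
    for (size_t i = 0; i < set->count; i++)
        if (set->oids[i] == typid)
            return true;
    return false;
}
```

A linear scan is adequate for the handful of types a client would realistically list, and defaulting to text keeps UDFs and unknown user-defined types working unchanged.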

IIRC there was code for the binary protocol in a predecessor of
pgoutput.

Hmmm, that might be a good place to start. I will do some digging through the git history.

I think if we were to add binary output - and I think we should - we
ought to only accept a patch if it's also used in core.

Certainly, as not doing so would make my work completely irrelevant for my purpose.

Thanks,

Dave
