Thread: Logical decoding for operations on zheap tables

Logical decoding for operations on zheap tables

From

Amit Kapila

Date:

31 December 2018, 04:26:48

To support logical decoding for zheap operations, we need a way to
ensure zheap tuples can be registered as change streams.   One idea
could be that we make ReorderBufferChange aware of another kind of
tuples as well, something like this:

@@ -100,6 +123,20 @@ typedef struct ReorderBufferChange
  ReorderBufferTupleBuf *newtuple;
  } tp;
+ struct
+ {
+ /* relation that has been changed */
+ RelFileNode relnode;
+
+ /* no previously reassembled toast chunks are necessary anymore */
+ bool clear_toast_afterwards;
+
+ /* valid for DELETE || UPDATE */
+ ReorderBufferZTupleBuf *oldtuple;
+ /* valid for INSERT || UPDATE */
+ ReorderBufferZTupleBuf *newtuple;
+ } ztp;
+


+/* an individual zheap tuple, stored in one chunk of memory */
+typedef struct ReorderBufferZTupleBuf
+{
..
+ /* tuple header, the interesting bit for users of logical decoding */
+ ZHeapTupleData tuple;
..
+} ReorderBufferZTupleBuf;

Apart from this, we need to define different decode functions for
zheap operations as the WAL data is different for heap and zheap, so
same functions can't be used to decode.

I have written a very hacky version to support zheap Insert operation
based on the above idea.  If we want to go with this approach, we
might need a better way to represent a different type of tuple in
ReorderBufferChange.

The yet another approach could be that in the decode functions after
forming zheap tuples from WAL, we can convert them to heap tuples.  I
have not tried that, so not sure if it can work, but it seems to me if
we can avoid tuple conversion overhead, it will be good.

This email is primarily to discuss about how the logical decoding for
basic DML operations (Insert/Update/Delete) will work in zheap.  We
might need some special mechanism to deal with sub-transactions as
zheap doesn't generate a transaction id for sub-transactions, but we
can discuss that separately.

Thoughts?

Note - This patch is based on pluggable_zheap branch
(https://github.com/anarazel/postgres-pluggable-storage)

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

decode_zinsert_1.patch

Re: Logical decoding for operations on zheap tables

From

Amit Kapila

Date:

03 January 2019, 03:58:12

On Mon, Dec 31, 2018 at 9:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> To support logical decoding for zheap operations, we need a way to
> ensure zheap tuples can be registered as change streams.   One idea
> could be that we make ReorderBufferChange aware of another kind of
> tuples as well, something like this:
>
..
>
> Apart from this, we need to define different decode functions for
> zheap operations as the WAL data is different for heap and zheap, so
> same functions can't be used to decode.
>
> I have written a very hacky version to support zheap Insert operation
> based on the above idea.
>

I went ahead and tried to implement the decoding for Delete operation
as well based on the above approach and the result is attached.

>
> The yet another approach could be that in the decode functions after
> forming zheap tuples from WAL, we can convert them to heap tuples.  I
> have not tried that, so not sure if it can work, but it seems to me if
> we can avoid tuple conversion overhead, it will be good.
>

While implementing the decoding for delete operation, I noticed that
the main changes required are to write a decode operation and
additional WAL (like old tuple) which anyway is required even if we
pursue this approach, so I think it might be better to with the
approach where we don't need tuple conversion (aka something similar
to what is done in attached patch).

Note - This patch is based on pluggable-zheap branch
(https://github.com/anarazel/postgres-pluggable-storage)

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

decode_zops_2.patch

Re: Logical decoding for operations on zheap tables

From

Andres Freund

Date:

03 January 2019, 18:00:55

Hi,

On 2018-12-31 09:56:48 +0530, Amit Kapila wrote:
> To support logical decoding for zheap operations, we need a way to
> ensure zheap tuples can be registered as change streams.   One idea
> could be that we make ReorderBufferChange aware of another kind of
> tuples as well, something like this:
> 
> @@ -100,6 +123,20 @@ typedef struct ReorderBufferChange
>   ReorderBufferTupleBuf *newtuple;
>   } tp;
> + struct
> + {
> + /* relation that has been changed */
> + RelFileNode relnode;
> +
> + /* no previously reassembled toast chunks are necessary anymore */
> + bool clear_toast_afterwards;
> +
> + /* valid for DELETE || UPDATE */
> + ReorderBufferZTupleBuf *oldtuple;
> + /* valid for INSERT || UPDATE */
> + ReorderBufferZTupleBuf *newtuple;
> + } ztp;
> +
> 
> 
> +/* an individual zheap tuple, stored in one chunk of memory */
> +typedef struct ReorderBufferZTupleBuf
> +{
> ..
> + /* tuple header, the interesting bit for users of logical decoding */
> + ZHeapTupleData tuple;
> ..
> +} ReorderBufferZTupleBuf;
> 
> Apart from this, we need to define different decode functions for
> zheap operations as the WAL data is different for heap and zheap, so
> same functions can't be used to decode.

I'm very strongly opposed to that. We shouldn't have expose every
possible storage method to output plugins, that'll make extensibility
a farce.  I think we'll either have to re-form a HeapTuple or decide
to bite the bullet and start exposing tuples via slots.


> This email is primarily to discuss about how the logical decoding for
> basic DML operations (Insert/Update/Delete) will work in zheap.  We
> might need some special mechanism to deal with sub-transactions as
> zheap doesn't generate a transaction id for sub-transactions, but we
> can discuss that separately.

Subtransactions seems to be the hardest part besides the tuple format
issue, so I think we should discuss that very soon.

> +/*
> + * Write zheap's INSERT to the output stream.
> + */
> +void
> +logicalrep_write_zinsert(StringInfo out, Relation rel, ZHeapTuple newtuple)
> +{
> +    pq_sendbyte(out, 'I');        /* action INSERT */
> +
> +    Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
> +           rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
> +           rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
> +
> +    /* use Oid as relation identifier */
> +    pq_sendint32(out, RelationGetRelid(rel));
> +
> +    pq_sendbyte(out, 'N');        /* new tuple follows */
> +    //logicalrep_write_tuple(out, rel, newtuple);
> +}

Obviously we need to do better - I don't think we should have
tuple-specific replication messages.


>  /*
>   * Write relation description to the output stream.
>   */
> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index 23466bade2..70fb5e2934 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -393,6 +393,19 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
>                  change->data.tp.oldtuple = NULL;
>              }
>              break;
> +        case REORDER_BUFFER_CHANGE_ZINSERT:

This really needs to be undistinguishable from normal CHANGE_INSERT...

Greetings,

Andres Freund

Re: Logical decoding for operations on zheap tables

From

Alvaro Herrera

Date:

03 January 2019, 18:13:42

On 2019-Jan-03, Andres Freund wrote:

> > Apart from this, we need to define different decode functions for
> > zheap operations as the WAL data is different for heap and zheap, so
> > same functions can't be used to decode.
> 
> I'm very strongly opposed to that. We shouldn't have expose every
> possible storage method to output plugins, that'll make extensibility
> a farce.  I think we'll either have to re-form a HeapTuple or decide
> to bite the bullet and start exposing tuples via slots.

Hmm, without looking at the patches, I agree that the tuples should be
given as slots to the logical decoding interface.  I wonder if we need a
further function in the TTS interface to help decoding, or is the
"getattr" stuff sufficient.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logical decoding for operations on zheap tables

From

Andres Freund

Date:

03 January 2019, 18:23:30

Hi,

On 2019-01-03 15:13:42 -0300, Alvaro Herrera wrote:
> On 2019-Jan-03, Andres Freund wrote:
> 
> > > Apart from this, we need to define different decode functions for
> > > zheap operations as the WAL data is different for heap and zheap, so
> > > same functions can't be used to decode.
> > 
> > I'm very strongly opposed to that. We shouldn't have expose every
> > possible storage method to output plugins, that'll make extensibility
> > a farce.  I think we'll either have to re-form a HeapTuple or decide
> > to bite the bullet and start exposing tuples via slots.
> 
> Hmm, without looking at the patches, I agree that the tuples should be
> given as slots to the logical decoding interface.  I wonder if we need a
> further function in the TTS interface to help decoding, or is the
> "getattr" stuff sufficient.

What precisely do you mean with "getattr stuff"? I'd assume that you'd
normally do a slot_getallattrs() and then access tts_values/nulls
directly. I don't think there's anything missing in the slot interface
itself, but using slots probably would require some careful
considerations around memory management, possibly a decoding specific
slot implementation even.

Greetings,

Andres Freund

Re: Logical decoding for operations on zheap tables

From

Alvaro Herrera

Date:

03 January 2019, 18:38:30

On 2019-Jan-03, Andres Freund wrote:

> Hi,
> 
> On 2019-01-03 15:13:42 -0300, Alvaro Herrera wrote:

> > Hmm, without looking at the patches, I agree that the tuples should be
> > given as slots to the logical decoding interface.  I wonder if we need a
> > further function in the TTS interface to help decoding, or is the
> > "getattr" stuff sufficient.
> 
> What precisely do you mean with "getattr stuff"? I'd assume that you'd
> normally do a slot_getallattrs() and then access tts_values/nulls
> directly.

Ah, yeah, you deform the tuple first and then access the arrays
directly, right.  I was just agreeing with your point that forming a
heaptuple only to have logical decoding grab individual attrs from there
didn't sound terribly optimal.

> I don't think there's anything missing in the slot interface itself,
> but using slots probably would require some careful considerations
> around memory management, possibly a decoding specific slot
> implementation even.

A specific slot implementation sounds like more work than I was
envisioning.  Can't we just "pin" a slot to a memory context or
something like that, to keep it alive until decoding is done with it?
It seems useful to avoid creating another copy of the tuple in memory
(which we would need if, if I understand you correctly, we need to form
the tuple under a different slot implementation from whatever the origin
is).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logical decoding for operations on zheap tables

From

Amit Kapila

Date:

04 January 2019, 03:24:34

On Thu, Jan 3, 2019 at 11:30 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2018-12-31 09:56:48 +0530, Amit Kapila wrote:
> > To support logical decoding for zheap operations, we need a way to
> > ensure zheap tuples can be registered as change streams.   One idea
> > could be that we make ReorderBufferChange aware of another kind of
> > tuples as well, something like this:
> >
> > @@ -100,6 +123,20 @@ typedef struct ReorderBufferChange
> >   ReorderBufferTupleBuf *newtuple;
> >   } tp;
> > + struct
> > + {
> > + /* relation that has been changed */
> > + RelFileNode relnode;
> > +
> > + /* no previously reassembled toast chunks are necessary anymore */
> > + bool clear_toast_afterwards;
> > +
> > + /* valid for DELETE || UPDATE */
> > + ReorderBufferZTupleBuf *oldtuple;
> > + /* valid for INSERT || UPDATE */
> > + ReorderBufferZTupleBuf *newtuple;
> > + } ztp;
> > +
> >
> >
> > +/* an individual zheap tuple, stored in one chunk of memory */
> > +typedef struct ReorderBufferZTupleBuf
> > +{
> > ..
> > + /* tuple header, the interesting bit for users of logical decoding */
> > + ZHeapTupleData tuple;
> > ..
> > +} ReorderBufferZTupleBuf;
> >
> > Apart from this, we need to define different decode functions for
> > zheap operations as the WAL data is different for heap and zheap, so
> > same functions can't be used to decode.
>
> I'm very strongly opposed to that. We shouldn't have expose every
> possible storage method to output plugins, that'll make extensibility
> a farce.  I think we'll either have to re-form a HeapTuple or decide
> to bite the bullet and start exposing tuples via slots.
>

To be clear, you are against exposing different format of tuples to
plugins, not having different decoding routines for other storage
engines, because later part is unavoidable due to WAL format.   Now,
about tuple format, I guess it would be a lot better if we expose via
slots, but won't that make existing plugins to change the way they
decode the tuple, maybe that is okay?  OTOH, re-forming the heap tuple
has a cost which might be okay for the time being or first version,
but eventually, we want to avoid that.  The other reason why I
refrained from tuple conversion was that I was not sure if we anywhere
rely on the transaction information in the tuple during decode
process, because that will be tricky to mimic, but I guess we don't
check that.

The only point for exposing a different tuple format via plugin was a
performance which I think can be addressed if we expose via slots.  I
don't want to take up exposing slots instead of tuples for plugins as
part of this project and I think if we want to go with that, it is
better done as part of pluggable API?

>
> > This email is primarily to discuss about how the logical decoding for
> > basic DML operations (Insert/Update/Delete) will work in zheap.  We
> > might need some special mechanism to deal with sub-transactions as
> > zheap doesn't generate a transaction id for sub-transactions, but we
> > can discuss that separately.
>
> Subtransactions seems to be the hardest part besides the tuple format
> issue, so I think we should discuss that very soon.
>

Agreed, I am going to look at that part next.

>
> >  /*
> >   * Write relation description to the output stream.
> >   */
> > diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> > index 23466bade2..70fb5e2934 100644
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -393,6 +393,19 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
> >                               change->data.tp.oldtuple = NULL;
> >                       }
> >                       break;
> > +             case REORDER_BUFFER_CHANGE_ZINSERT:
>
> This really needs to be undistinguishable from normal CHANGE_INSERT...
>

Sure, it will be if we decide to either re-form heap tuple or expose via slots.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Logical decoding for operations on zheap tables

From

Andres Freund

Date:

04 January 2019, 03:30:58

Hi,

On 2019-01-04 08:54:34 +0530, Amit Kapila wrote:
> On Thu, Jan 3, 2019 at 11:30 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2018-12-31 09:56:48 +0530, Amit Kapila wrote:
> > > To support logical decoding for zheap operations, we need a way to
> > > ensure zheap tuples can be registered as change streams.   One idea
> > > could be that we make ReorderBufferChange aware of another kind of
> > > tuples as well, something like this:
> > >
> > > @@ -100,6 +123,20 @@ typedef struct ReorderBufferChange
> > >   ReorderBufferTupleBuf *newtuple;
> > >   } tp;
> > > + struct
> > > + {
> > > + /* relation that has been changed */
> > > + RelFileNode relnode;
> > > +
> > > + /* no previously reassembled toast chunks are necessary anymore */
> > > + bool clear_toast_afterwards;
> > > +
> > > + /* valid for DELETE || UPDATE */
> > > + ReorderBufferZTupleBuf *oldtuple;
> > > + /* valid for INSERT || UPDATE */
> > > + ReorderBufferZTupleBuf *newtuple;
> > > + } ztp;
> > > +
> > >
> > >
> > > +/* an individual zheap tuple, stored in one chunk of memory */
> > > +typedef struct ReorderBufferZTupleBuf
> > > +{
> > > ..
> > > + /* tuple header, the interesting bit for users of logical decoding */
> > > + ZHeapTupleData tuple;
> > > ..
> > > +} ReorderBufferZTupleBuf;
> > >
> > > Apart from this, we need to define different decode functions for
> > > zheap operations as the WAL data is different for heap and zheap, so
> > > same functions can't be used to decode.
> >
> > I'm very strongly opposed to that. We shouldn't have expose every
> > possible storage method to output plugins, that'll make extensibility
> > a farce.  I think we'll either have to re-form a HeapTuple or decide
> > to bite the bullet and start exposing tuples via slots.
> >
> 
> To be clear, you are against exposing different format of tuples to
> plugins, not having different decoding routines for other storage
> engines, because later part is unavoidable due to WAL format.

Correct.


> Now,
> about tuple format, I guess it would be a lot better if we expose via
> slots, but won't that make existing plugins to change the way they
> decode the tuple, maybe that is okay?

I think one-off API changes are ok. What I'm strictly against is
primarily that output plugins will have to deal with more and more
different tuple formats.


> OTOH, re-forming the heap tuple
> has a cost which might be okay for the time being or first version,
> but eventually, we want to avoid that.

Right.


> The other reason why I refrained from tuple conversion was that I
> was not sure if we anywhere rely on the transaction information in
> the tuple during decode process, because that will be tricky to
> mimic, but I guess we don't check that.

Shouldn't be necessary - in fact, most of that information isn't in
the heap wal records in the first place.


> The only point for exposing a different tuple format via plugin was a
> performance which I think can be addressed if we expose via slots.  I
> don't want to take up exposing slots instead of tuples for plugins as
> part of this project and I think if we want to go with that, it is
> better done as part of pluggable API?

No, I don't think it makes sense to address this is as part of
pluggable storage. That patchset is already way too invasive and
large.

Greetings,

Andres Freund

Re: Logical decoding for operations on zheap tables

From

Amit Kapila

Date:

12 January 2019, 11:32:29

On Fri, Jan 4, 2019 at 9:01 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-01-04 08:54:34 +0530, Amit Kapila wrote:
> > The only point for exposing a different tuple format via plugin was a
> > performance which I think can be addressed if we expose via slots.  I
> > don't want to take up exposing slots instead of tuples for plugins as
> > part of this project and I think if we want to go with that, it is
> > better done as part of pluggable API?
>
> No, I don't think it makes sense to address this is as part of
> pluggable storage. That patchset is already way too invasive and
> large.
>

Fair enough.  I think that for now (and maybe for the first version
that can be committed) we might want to use heap tuple format.  There
will be some overhead but I think code-wise, things will be simpler.
I have prototyped it for Insert and Delete operations of zheap and the
only thing that is required are new decode functions, see the attached
patch.  I have done very minimal testing of this patch as this is just
to show you and others the direction we are taking (w.r.t tuple
format) to support logical decoding in zheap.

Thanks for the feedback, further thoughts are welcome!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

decode_zops_3.patch

Re: Logical decoding for operations on zheap tables

From

Dilip Kumar

Date:

22 January 2019, 04:37:30

On Sat, Jan 12, 2019 at 5:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Fair enough.  I think that for now (and maybe for the first version
> that can be committed) we might want to use heap tuple format.  There
> will be some overhead but I think code-wise, things will be simpler.
> I have prototyped it for Insert and Delete operations of zheap and the
> only thing that is required are new decode functions, see the attached
> patch.  I have done very minimal testing of this patch as this is just
> to show you and others the direction we are taking (w.r.t tuple
> format) to support logical decoding in zheap.
+ */
+ zhtup = DecodeXLogZTuple(tupledata, datalen);
+ reloid = RelidByRelfilenode(change->data.tp.relnode.spcNode,
+ change->data.tp.relnode.relNode);
+ relation = RelationIdGetRelation(reloid);

We need to start a transaction for fetching the relation if it's a
walsender process.  I have fixed this issue in the patch and also
implemented decode functions for zheap update and multi-insert.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

decode_zops_v4.patch