Re: Logical decoding for operations on zheap tables - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Logical decoding for operations on zheap tables
Date
Msg-id CAA4eK1JaY3c6WFJoWYj0Vawew3g2o=iqmWChSk54FvhKReo9Og@mail.gmail.com
In response to Re: Logical decoding for operations on zheap tables  (Andres Freund <andres@anarazel.de>)
Responses Re: Logical decoding for operations on zheap tables
List pgsql-hackers
On Thu, Jan 3, 2019 at 11:30 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2018-12-31 09:56:48 +0530, Amit Kapila wrote:
> > To support logical decoding for zheap operations, we need a way to
> > ensure zheap tuples can be registered as change streams.   One idea
> > could be that we make ReorderBufferChange aware of another kind of
> > tuples as well, something like this:
> >
> > @@ -100,6 +123,20 @@ typedef struct ReorderBufferChange
> >   ReorderBufferTupleBuf *newtuple;
> >   } tp;
> > + struct
> > + {
> > + /* relation that has been changed */
> > + RelFileNode relnode;
> > +
> > + /* no previously reassembled toast chunks are necessary anymore */
> > + bool clear_toast_afterwards;
> > +
> > + /* valid for DELETE || UPDATE */
> > + ReorderBufferZTupleBuf *oldtuple;
> > + /* valid for INSERT || UPDATE */
> > + ReorderBufferZTupleBuf *newtuple;
> > + } ztp;
> > +
> >
> >
> > +/* an individual zheap tuple, stored in one chunk of memory */
> > +typedef struct ReorderBufferZTupleBuf
> > +{
> > ..
> > + /* tuple header, the interesting bit for users of logical decoding */
> > + ZHeapTupleData tuple;
> > ..
> > +} ReorderBufferZTupleBuf;
> >
> > Apart from this, we need to define different decode functions for
> > zheap operations as the WAL data is different for heap and zheap, so
> > same functions can't be used to decode.
>
> I'm very strongly opposed to that. We shouldn't have expose every
> possible storage method to output plugins, that'll make extensibility
> a farce.  I think we'll either have to re-form a HeapTuple or decide
> to bite the bullet and start exposing tuples via slots.
>

To be clear, you are against exposing different tuple formats to
plugins, not against having different decoding routines for other
storage engines, because the latter is unavoidable due to the WAL
format.  Now, about the tuple format: I guess it would be a lot better
to expose tuples via slots, but won't that force existing plugins to
change the way they decode the tuple?  Maybe that is okay.  OTOH,
re-forming the heap tuple has a cost, which might be okay for the time
being or for a first version, but eventually we want to avoid that.
The other reason why I refrained from tuple conversion was that I was
not sure whether we rely anywhere on the transaction information in
the tuple during the decode process, because that would be tricky to
mimic, but I guess we don't check that.
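
For illustration, re-forming could look roughly like the sketch below.
It is self-contained C, not PostgreSQL code: DemoHeapTuple, DemoZTuple
and demo_reform_tuple are hypothetical stand-ins, and the packed layout
of the "zheap-like" tuple is an assumption made up for the example.  It
also assumes the transaction id is supplied by the caller (e.g. taken
from the WAL record) rather than read from the storage tuple.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none are PostgreSQL definitions. */
#define DEMO_MAX_ATTRS 8

/* The one format the output plugin already understands ("heap-like"). */
typedef struct DemoHeapTuple
{
    uint32_t xmin;                    /* transaction id carried in the tuple */
    int      natts;
    int32_t  values[DEMO_MAX_ATTRS];
    uint8_t  isnull[DEMO_MAX_ATTRS];
} DemoHeapTuple;

/* A storage-specific format ("zheap-like") with no per-tuple xid. */
typedef struct DemoZTuple
{
    int      natts;
    uint8_t  nullbits;                /* bit i set => attribute i is NULL */
    int32_t  packed[DEMO_MAX_ATTRS];  /* non-null values only, in order */
} DemoZTuple;

/*
 * Re-form a storage-specific tuple into the common format.
 * Returns 0 on success, -1 on invalid input.
 */
static int
demo_reform_tuple(const DemoZTuple *src, uint32_t xid, DemoHeapTuple *dst)
{
    int next = 0;                     /* next unread slot in src->packed */

    if (src == NULL || dst == NULL ||
        src->natts < 0 || src->natts > DEMO_MAX_ATTRS)
        return -1;

    memset(dst, 0, sizeof(*dst));
    dst->xmin = xid;                  /* supplied by the caller, not the tuple */
    dst->natts = src->natts;

    for (int i = 0; i < src->natts; i++)
    {
        if (src->nullbits & (1u << i))
            dst->isnull[i] = 1;
        else
            dst->values[i] = src->packed[next++];
    }

    assert(next <= src->natts);
    return 0;
}
```

The per-tuple copy in the loop is the cost mentioned above: every
decoded change pays for one extra pass over its attributes.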

The only reason for exposing a different tuple format to plugins was
performance, which I think can be addressed if we expose tuples via
slots.  I don't want to take up exposing slots instead of tuples to
plugins as part of this project; if we want to go that way, I think it
is better done as part of the pluggable API?
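
To make the slot idea concrete, here is a rough, self-contained sketch.
It does not use PostgreSQL's actual slot API: DemoSlot, DemoSlotOps and
both row layouts are hypothetical, invented only to show a consumer
reading attributes through callbacks without knowing the storage format.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the PostgreSQL slot API. */
typedef struct DemoSlot DemoSlot;

typedef struct DemoSlotOps
{
    int     (*natts)(const DemoSlot *slot);
    /* Fetch attribute attnum (0-based); *isnull is set to 0 or 1. */
    int32_t (*getattr)(const DemoSlot *slot, int attnum, int *isnull);
} DemoSlotOps;

struct DemoSlot
{
    const DemoSlotOps *ops;    /* format-specific callbacks */
    const void        *tuple;  /* opaque to the consumer */
};

/* "heap-like" row: one entry per attribute. */
typedef struct DemoHeapRow
{
    int natts; int32_t values[4]; uint8_t isnull[4];
} DemoHeapRow;

/* "zheap-like" row: null bitmap plus packed non-null values. */
typedef struct DemoZRow
{
    int natts; uint8_t nullbits; int32_t packed[4];
} DemoZRow;

static int
heap_natts(const DemoSlot *s)
{
    return ((const DemoHeapRow *) s->tuple)->natts;
}

static int32_t
heap_getattr(const DemoSlot *s, int attnum, int *isnull)
{
    const DemoHeapRow *r = (const DemoHeapRow *) s->tuple;

    assert(attnum >= 0 && attnum < r->natts);
    *isnull = r->isnull[attnum];
    return *isnull ? 0 : r->values[attnum];
}

static int
z_natts(const DemoSlot *s)
{
    return ((const DemoZRow *) s->tuple)->natts;
}

static int32_t
z_getattr(const DemoSlot *s, int attnum, int *isnull)
{
    const DemoZRow *r = (const DemoZRow *) s->tuple;
    int pos = 0;

    assert(attnum >= 0 && attnum < r->natts);
    *isnull = (r->nullbits >> attnum) & 1;
    if (*isnull)
        return 0;
    /* Count the non-null attributes stored before attnum. */
    for (int i = 0; i < attnum; i++)
        if (!((r->nullbits >> i) & 1))
            pos++;
    return r->packed[pos];
}

static const DemoSlotOps demo_heap_ops = { heap_natts, heap_getattr };
static const DemoSlotOps demo_z_ops = { z_natts, z_getattr };

/* Stand-in for an output plugin: sums non-null attributes via the slot. */
static int64_t
demo_plugin_sum(const DemoSlot *slot)
{
    int64_t sum = 0;

    for (int i = 0; i < slot->ops->natts(slot); i++)
    {
        int     isnull;
        int32_t v = slot->ops->getattr(slot, i, &isnull);

        if (!isnull)
            sum += v;
    }
    return sum;
}
```

Here nothing is copied up front; the consumer pays only for the
attributes it actually fetches, which is where the performance argument
for slots comes from.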

>
> > This email is primarily to discuss about how the logical decoding for
> > basic DML operations (Insert/Update/Delete) will work in zheap.  We
> > might need some special mechanism to deal with sub-transactions as
> > zheap doesn't generate a transaction id for sub-transactions, but we
> > can discuss that separately.
>
> Subtransactions seems to be the hardest part besides the tuple format
> issue, so I think we should discuss that very soon.
>

Agreed, I am going to look at that part next.

>
> >  /*
> >   * Write relation description to the output stream.
> >   */
> > diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> > index 23466bade2..70fb5e2934 100644
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -393,6 +393,19 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
> >                               change->data.tp.oldtuple = NULL;
> >                       }
> >                       break;
> > +             case REORDER_BUFFER_CHANGE_ZINSERT:
>
> This really needs to be undistinguishable from normal CHANGE_INSERT...
>

Sure, it will be if we decide either to re-form the heap tuple or to
expose tuples via slots.
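
As a rough sketch of what that would mean: the decode routines stay
separate per WAL format, but both emit the same change kind.  The code
below is self-contained C with invented names and invented record
layouts (DemoChange, demo_decode and the two 5-byte formats are all
hypothetical), not the real reorderbuffer or WAL definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names and record layouts below are hypothetical. */
typedef enum DemoChangeAction
{
    DEMO_CHANGE_INSERT,
    DEMO_CHANGE_UPDATE,
    DEMO_CHANGE_DELETE
} DemoChangeAction;

/* The single change record that consumers see, whatever the storage. */
typedef struct DemoChange
{
    DemoChangeAction action;
    uint32_t         relnode;
} DemoChange;

typedef enum DemoRmgr { DEMO_RM_HEAP, DEMO_RM_ZHEAP } DemoRmgr;

/* Assumed heap-style record: [opcode:1][relnode:4], opcodes 0/1/2. */
static int
demo_decode_heap(const uint8_t *rec, size_t len, DemoChange *out)
{
    if (len != 5)
        return -1;
    switch (rec[0])
    {
        case 0: out->action = DEMO_CHANGE_INSERT; break;
        case 1: out->action = DEMO_CHANGE_UPDATE; break;
        case 2: out->action = DEMO_CHANGE_DELETE; break;
        default: return -1;
    }
    memcpy(&out->relnode, rec + 1, sizeof(out->relnode));
    return 0;
}

/* Assumed zheap-style record: [relnode:4][opcode:1], opcodes 'I'/'U'/'D'. */
static int
demo_decode_zheap(const uint8_t *rec, size_t len, DemoChange *out)
{
    if (len != 5)
        return -1;
    switch (rec[4])
    {
        case 'I': out->action = DEMO_CHANGE_INSERT; break;
        case 'U': out->action = DEMO_CHANGE_UPDATE; break;
        case 'D': out->action = DEMO_CHANGE_DELETE; break;
        default: return -1;
    }
    memcpy(&out->relnode, rec, sizeof(out->relnode));
    return 0;
}

/* Dispatch on the record's origin; the output type is the same. */
static int
demo_decode(DemoRmgr rmgr, const uint8_t *rec, size_t len, DemoChange *out)
{
    switch (rmgr)
    {
        case DEMO_RM_HEAP:
            return demo_decode_heap(rec, len, out);
        case DEMO_RM_ZHEAP:
            return demo_decode_zheap(rec, len, out);
    }
    return -1;
}
```

There is deliberately no ZINSERT-style action in DemoChangeAction:
the storage difference ends at the decode routine.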

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

