Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Msg-id CAFiTN-v5=_2qHXXZoDTS=Ua5NJNdK-9oeH5G7_tbLYDZabTJ7g@mail.gmail.com
In response to Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, May 26, 2020 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 25, 2020 at 8:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, May 22, 2020 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 4.
> > > + * XXX Do we need to allocate it in TopMemoryContext?
> > > + */
> > > +static void
> > > +subxact_info_add(TransactionId xid)
> > > {
> > > ..
> > >
> > > For this and other places in a patch like in function
> > > stream_open_file(), instead of using TopMemoryContext, can we consider
> > > using a new memory context LogicalStreamingContext or something like
> > > that. We can create LogicalStreamingContext under TopMemoryContext.  I
> > > don't see any need of using TopMemoryContext here.
> >
> > But, when we will delete/reset the LogicalStreamingContext?
> >
>
> Why can't we reset it at each stream stop message?

Done this way.

>
> >  because
> > we are planning to keep this memory until the worker is alive, so it
> > is supposed to be in the top memory context.
> >
>
> Which part of the allocation do we want to keep till the worker is alive?
> Why do we need memory related to subxacts till the worker is alive?  As
> we have it now, after reading subxact info (subxact_info_read), we need
> to ensure that it is freed after its usage, due to which we need to
> remember it and perform pfree at various places.
>
> I think we should explore the possibility of switching to this new
> context in the start stream message and resetting it in the stop
> stream message.  That might help in avoiding
> MemoryContextSwitchTo(TopMemoryContext) at various places.
>
> >  If we create any other context
> > with the same life span as TopMemoryContext then what is the point?
> >
>
> It is helpful for debugging.  It is recommended that we don't use the
> top memory context unless it is really required.  Read about it in
> src/backend/utils/mmgr/README.

The xids array is now allocated in ApplyContext.

> > > 8.
> > > + * XXX Maybe we should only include the checksum when the cluster is
> > > + * initialized with checksums?
> > > + */
> > > +static void
> > > +subxact_info_write(Oid subid, TransactionId xid)
> > >
> > > Do we really need to have the checksum for temporary files? I have
> > > checked a few other similar cases like SharedFileSet stuff for
> > > parallel hash join but didn't find them using checksums.  Can you also
> > > once see other usages of temporary files and then let us decide if we
> > > see any reason to have checksums for this?
> >
> > Yeah, I can see that the other places don't use a checksum either.
> >
>
> So, unless someone speaks up before you are ready for the next version
> of the patch, can we remove it?

Done
> > > Another point is we don't seem to be doing this for 'changes' file,
> > > see stream_write_change.  So, not sure, there is any sense to write
> > > checksum for subxact file.
> >
> > I can see there is a comment atop this function:
> >
> > * XXX The subxact file includes CRC32C of the contents. Maybe we should
> > * include something like that here too, but doing so will not be as
> > * straightforward, because we write the file in chunks.
> >
>
> You can remove this comment as well.  I don't know how advantageous it
> is to checksum temporary files.  We can anyway add it later if there
> is a reason for doing so.

Done

> >
> > > 12.
> > > maybe_send_schema()
> > > {
> > > ..
> > > + if (in_streaming)
> > > + {
> > > + /*
> > > + * TOCHECK: We have to send schema after each catalog change and it may
> > > + * occur when streaming already started, so we have to track new catalog
> > > + * changes somehow.
> > > + */
> > > + schema_sent = get_schema_sent_in_streamed_txn(relentry, topxid);
> > > ..
> > > ..
> > > }
> > >
> > > I think it is good to once verify/test what this comment says but as
> > > per code we should be sending the schema after each catalog change as
> > > we invalidate the streamed_txns list in rel_sync_cache_relation_cb
> > > which must be called during relcache invalidation.  Do we see any
> > > problem with that mechanism?
> >
> > I have tested this; I think we are already sending the schema after
> > each catalog change.
> >
>
> Then remove "TOCHECK" in the above comment.

Done


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


