Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Msg-id: CAFiTN-v5=_2qHXXZoDTS=Ua5NJNdK-9oeH5G7_tbLYDZabTJ7g@mail.gmail.com
In response to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Tue, May 26, 2020 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 25, 2020 at 8:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, May 22, 2020 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 4.
> > > + * XXX Do we need to allocate it in TopMemoryContext?
> > > + */
> > > +static void
> > > +subxact_info_add(TransactionId xid)
> > > {
> > > ..
> > >
> > > For this and other places in the patch, like the function
> > > stream_open_file(), instead of using TopMemoryContext, can we
> > > consider using a new memory context LogicalStreamingContext or
> > > something like that? We can create LogicalStreamingContext under
> > > TopMemoryContext. I don't see any need for using TopMemoryContext
> > > here.
> >
> > But when will we delete/reset the LogicalStreamingContext?
>
> Why can't we reset it at each stream stop message?

Done.

> > Because we are planning to keep this memory until the worker is
> > alive, it was supposed to be in the top memory context.
>
> Which part of the allocation do we want to keep till the worker is
> alive? Why do we need memory related to subxacts till the worker is
> alive? As we have it now, after reading subxact info
> (subxact_info_read), we need to ensure that it is freed after its
> usage, due to which we need to remember it and perform pfree at
> various places.
>
> I think we should see whether we could switch to this new context in
> the start stream message and reset it in the stop stream message.
> That might help in avoiding MemoryContextSwitchTo(TopMemoryContext)
> at various places.
>
> > If we create any other context with the same life span as
> > TopMemoryContext, then what is the point?
>
> It is helpful for debugging. It is recommended that we don't use the
> top memory context unless it is really required. Read about it in
> src/backend/utils/mmgr/README.

xids is now allocated in ApplyContext.

> > > 8.
> > > + * XXX Maybe we should only include the checksum when the cluster is
> > > + * initialized with checksums?
> > > + */
> > > +static void
> > > +subxact_info_write(Oid subid, TransactionId xid)
> > >
> > > Do we really need to have the checksum for temporary files? I have
> > > checked a few other similar cases, like the SharedFileSet stuff for
> > > parallel hash join, but didn't find them using checksums. Can you
> > > also look at other usages of temporary files, and then let us
> > > decide if we see any reason to have checksums for this?
> >
> > Yeah, even I can see that other places do not use a checksum.
>
> So, unless someone speaks up before you are ready for the next version
> of the patch, can we remove it?

Done.

> > > Another point is we don't seem to be doing this for the 'changes'
> > > file, see stream_write_change. So I am not sure there is any sense
> > > in writing a checksum for the subxact file.
> >
> > I can see there is a comment atop this function:
> >
> > * XXX The subxact file includes CRC32C of the contents. Maybe we should
> > * include something like that here too, but doing so will not be as
> > * straightforward, because we write the file in chunks.
>
> You can remove this comment as well. I don't know how advantageous it
> is to checksum temporary files. We can anyway add it later if there
> is a reason for doing so.

Done.

> > > 12.
> > > maybe_send_schema()
> > > {
> > > ..
> > > + if (in_streaming)
> > > + {
> > > + /*
> > > + * TOCHECK: We have to send schema after each catalog change and it may
> > > + * occur when streaming already started, so we have to track new catalog
> > > + * changes somehow.
> > > + */
> > > + schema_sent = get_schema_sent_in_streamed_txn(relentry, topxid);
> > > ..
> > > ..
> > > }
> > >
> > > I think it is good to verify/test what this comment says, but as
> > > per the code we should be sending the schema after each catalog
> > > change, as we invalidate the streamed_txns list in
> > > rel_sync_cache_relation_cb, which must be called during relcache
> > > invalidation. Do we see any problem with that mechanism?
> >
> > I have tested this; I think we are already sending the schema after
> > each catalog change.
>
> Then remove "TOCHECK" in the above comment.

Done.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
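[Editor's note] The lifecycle Amit suggests above (allocate per-stream data in a dedicated context, then reset the whole context at each stream stop so no pointer-by-pointer pfree is needed) can be illustrated with a minimal, self-contained arena sketch in plain C. This is only an illustration of the pattern, not PostgreSQL's actual MemoryContext API; all names below are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Minimal bump-allocator "context": everything allocated from it is
 * released in one shot by a reset, analogous to how MemoryContextReset
 * frees an entire PostgreSQL memory context at once.
 */
typedef struct StreamContext
{
	char   *buf;
	size_t	cap;
	size_t	used;
} StreamContext;

/* Create the context once, e.g. when the apply worker starts. */
StreamContext *
stream_context_create(size_t cap)
{
	StreamContext *cxt = malloc(sizeof(StreamContext));

	cxt->buf = malloc(cap);
	cxt->cap = cap;
	cxt->used = 0;
	return cxt;
}

/* Allocate per-stream data (e.g. subxact bookkeeping) from the context. */
void *
stream_alloc(StreamContext *cxt, size_t n)
{
	void   *p;

	if (cxt->used + n > cxt->cap)
		return NULL;			/* a real allocator would grow instead */
	p = cxt->buf + cxt->used;
	cxt->used += n;
	return p;
}

/*
 * Called at each "stream stop" message: one reset releases everything
 * the stream allocated, and the space is reused by the next stream.
 */
void
stream_context_reset(StreamContext *cxt)
{
	cxt->used = 0;
}
```

The point of the design: callers never track individual allocations; the context boundary (stream start/stop) defines the lifetime.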
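[Editor's note] The quoted XXX comment says checksumming the chunk-written file "will not be as straightforward". For context, the chunking part by itself is manageable, because CRC-32C can be accumulated incrementally across chunks. Below is a minimal bit-at-a-time sketch of incremental CRC-32C; it is not PostgreSQL's optimized implementation, just a compact reference version of the same algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, bit-at-a-time. */
uint32_t
crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* One-shot helper: initialize with all ones, finalize by inverting. */
uint32_t
crc32c(const void *data, size_t len)
{
	return crc32c_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

Because the running CRC is just a 32-bit value carried between calls, each chunk written to the file can be fed to `crc32c_update` as it goes out, with the finalized value appended at the end.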
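[Editor's note] The mechanism verified in point 12 — remember, per relation, which streamed top-level transactions the schema has already been sent in, and clear that list on relcache invalidation so the schema is resent after a catalog change — can be sketched as follows. The types here are cut-down stand-ins, not the patch's real structures; only the function names `get_schema_sent_in_streamed_txn` and `rel_sync_cache_relation_cb` are taken from the discussion above.

```c
#include <stdbool.h>

#define MAX_STREAMED_XIDS 16

typedef unsigned int TransactionId;

/* Cut-down stand-in for a per-relation output-plugin cache entry. */
typedef struct RelationSyncEntry
{
	TransactionId streamed_txns[MAX_STREAMED_XIDS];	/* xids schema was sent in */
	int			nstreamed;
} RelationSyncEntry;

/* Has the schema already been sent in this streamed top-level xact? */
bool
get_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId topxid)
{
	for (int i = 0; i < entry->nstreamed; i++)
		if (entry->streamed_txns[i] == topxid)
			return true;
	return false;
}

/* Remember that the schema went out in this streamed xact. */
void
set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId topxid)
{
	if (entry->nstreamed < MAX_STREAMED_XIDS)
		entry->streamed_txns[entry->nstreamed++] = topxid;
}

/*
 * Relcache invalidation callback: a catalog change on the relation
 * clears the list, so the next change sent in any streamed transaction
 * triggers the schema to be sent again.
 */
void
rel_sync_cache_relation_cb(RelationSyncEntry *entry)
{
	entry->nstreamed = 0;
}
```

This is why no explicit "track new catalog changes" step is needed: invalidation empties the list, and the next `get_schema_sent_in_streamed_txn` lookup returns false.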