Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers
From | Dilip Kumar
---|---
Subject | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date |
Msg-id | CAFiTN-urcPsyoOJAuhhsH=s02jgxWg8Cyr3b+s_kuXChoMKJUg@mail.gmail.com
In response to | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List | pgsql-hackers
On Tue, May 19, 2020 at 2:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 18, 2020 at 5:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 3.
> > + /*
> > + * If streaming is enable and we have serialized this transaction because
> > + * it had incomplete tuple. So if now we have got the complete tuple we
> > + * can stream it.
> > + */
> > + if (ReorderBufferCanStream(rb) && can_stream && rbtxn_is_serialized(toptxn)
> > + && !(rbtxn_has_toast_insert(txn)) && !(rbtxn_has_spec_insert(txn)))
> > + {
> >
> > This comment is just saying what you are doing in the if-check.  I
> > think you need to explain the rationale behind it.  I don't like the
> > variable name 'can_stream' because it matches ReorderBufferCanStream
> > whereas it is for a different purpose; how about naming it
> > 'change_complete' or something like that?  The check has many
> > conditions; can we move it to a separate function to make the code
> > here look clean?
> >
>
> Do we really need this?  Immediately after this check, we are calling
> ReorderBufferCheckMemoryLimit which will anyway stream the changes if
> required.

Actually, ReorderBufferCheckMemoryLimit is only meant for checking
whether we need to stream the changes due to the memory limit.  But
suppose that when the memory limit was exceeded we could not stream the
transaction because it had only an incomplete toast insert, so we
serialized it instead.  Now we get the tuple which makes the changes
complete, but at this point we are not crossing the memory limit, as
the changes were already serialized.  So I am not sure whether it is a
good idea to stream the transaction as soon as we get the complete
changes, or whether we should wait until the next time the memory limit
is exceeded and select a suitable candidate at that time.  Ideally, if
we are in streaming mode and the transaction is serialized, that means
it was already a candidate for streaming but could not be streamed due
to the incomplete changes, so shouldn't we stream it immediately as
soon as its changes are complete, even though we are now below the
memory limit?  Our target is to stream, not to spill, so we should try
to stream the spilled changes at the first opportunity.  (A rough
sketch of this appears below.)

> Can we move the changes related to the detection of incomplete data to
> a separate function?

Ok.

> Another comment on v20-0010-Bugfix-handling-of-incomplete-toast-tuple:
>
> + else if (rbtxn_has_toast_insert(txn) &&
> + ChangeIsInsertOrUpdate(change->action))
> + {
> + toptxn->txn_flags &= ~RBTXN_HAS_TOAST_INSERT;
> + can_stream = true;
> + }
> ..
> +#define ChangeIsInsertOrUpdate(action) \
> + (((action) == REORDER_BUFFER_CHANGE_INSERT) || \
> + ((action) == REORDER_BUFFER_CHANGE_UPDATE) || \
> + ((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT))
>
> How can we clear the RBTXN_HAS_TOAST_INSERT flag on a
> REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT action?

A partial toast insert means we have inserted into the toast table but
not yet into the main table.  So even if it is a spec insert we can
form the complete tuple; however, we still cannot stream it because we
haven't got the spec_confirm, but we mark another flag for that.  If
the insert is a spec insert, the toast inserts will also be spec
inserts, and as part of those toast spec inserts we mark the tuple as
partial, so clearing that flag should happen when the spec insert is
done for the main table, right?
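To make the above concrete, here is a rough sketch of pulling the
detection into a helper and streaming a serialized transaction as soon
as its changes become complete.  The helper name and the call site are
illustrative assumptions, not code from the patch, and
ReorderBufferStreamTXN stands in for whatever routine performs the
streaming:

```c
/*
 * Illustrative helper (name assumed, not from the patch): a top-level
 * transaction's changes are complete when it has no pending toast
 * insert and no unconfirmed speculative insert, so it is safe to pick
 * as a streaming candidate.
 */
static bool
ReorderBufferTxnIsComplete(ReorderBufferTXN *toptxn)
{
    return !rbtxn_has_toast_insert(toptxn) &&
           !rbtxn_has_spec_insert(toptxn);
}

/*
 * Illustrative call site: if we are in streaming mode and this
 * transaction was serialized only because its changes were incomplete,
 * stream it as soon as the changes become complete instead of waiting
 * for the memory limit to be hit again; the target is to stream, not
 * to spill.
 */
if (ReorderBufferCanStream(rb) && change_complete &&
    rbtxn_is_serialized(toptxn) &&
    ReorderBufferTxnIsComplete(toptxn))
    ReorderBufferStreamTXN(rb, toptxn);
```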
> IIUC, the basic idea used to handle incomplete changes (which is
> possible in case of toast tuples and speculative inserts) is to mark
> such TXNs as containing incomplete changes, and then while finding the
> largest top-level TXN for streaming, we ignore such TXNs and move to
> the next largest TXN.  If none of the TXNs have complete changes then
> we choose the largest (sub)transaction and spill it to bring the
> in-memory changes below the logical_decoding_work_mem threshold.  This
> idea can work, but the strategy to choose the transaction is
> suboptimal for cases where TXNs have some changes that are complete
> followed by an incomplete toast or speculative tuple.  I was having an
> offlist discussion with Robert on this problem and he suggested that
> it would be better if we track the complete part of the changes
> separately, so that we can avoid the drawback mentioned above.  I have
> thought about this and I think it can work if we track the size and
> LSN of the completed changes.  I think we need to ensure that if there
> is a concurrent abort then we discard all changes for the current
> (sub)transaction, not only up to the completed-changes LSN, whereas if
> the streaming is successful then we can truncate the changes only up
> to the completed-changes LSN.  What do you think?
>
> I wonder why you have done this as 0010 in the patch series; it should
> be 0006, after 0005-Implement-streaming-mode-in-ReorderBuffer.patch.
> If we do it that way then it would be easier for me to review.  Is
> there a reason for not doing so?

No reason, I can do that.  Actually, later we can merge the changes
into 0005; I kept them separate for review.  Anyway, in the next
version I will make it 0006.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
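For reference, a minimal sketch of the "track the completed part of the
changes separately" idea quoted above, assuming hypothetical field
names (complete_size, complete_lsn) and an update hook that are not
part of the actual patch:

```c
/* Illustrative additions to ReorderBufferTXN (field names assumed). */
typedef struct ReorderBufferTXN
{
    /* ... existing fields ... */

    Size        complete_size;  /* bytes of changes known to be complete */
    XLogRecPtr  complete_lsn;   /* end LSN of the last complete change */
} ReorderBufferTXN;

/*
 * Illustrative update hook, called after queuing each change: if the
 * top-level transaction currently has no pending toast or speculative
 * insert, everything queued so far is complete, so remember its size
 * and LSN.  On a successful stream, changes could then be truncated
 * only up to complete_lsn; on a concurrent abort, all changes of the
 * (sub)transaction would still be discarded.
 */
static void
ReorderBufferUpdateComplete(ReorderBufferTXN *toptxn,
                            ReorderBufferChange *change)
{
    if (!rbtxn_has_toast_insert(toptxn) &&
        !rbtxn_has_spec_insert(toptxn))
    {
        toptxn->complete_size = toptxn->size;   /* 'size' field assumed */
        toptxn->complete_lsn = change->lsn;
    }
}
```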