Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From: Dilip Kumar
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Msg-id: CAFiTN-vV_eO7xjJq0iHyFqcMA2GiohPPnd-ohLhib9vMqG0Z1w@mail.gmail.com
In response to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Thu, Jan 9, 2020 at 12:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jan 8, 2020 at 1:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I have observed one more design issue.
> > > >
> > >
> > > Good observation.
> > >
> > > >  The problem is that when we
> > > > get toasted chunks we remember the changes in memory (in a hash
> > > > table) but don't stream them until we get the actual change on
> > > > the main table.  Now, the problem is that we might get the
> > > > changes of the toast table and the main table in different
> > > > streams.  So basically, in a stream, if we have only got the
> > > > toasted tuples, then even after ReorderBufferStreamTXN the memory
> > > > usage will not be reduced.
> > > >
> > >
> > > I think we can't split such changes across streams (unless we
> > > design an entirely new solution to send partial changes of toast
> > > data), so we need to send them together.  We can keep a flag like
> > > data_complete in ReorderBufferTxn and mark it complete only when we
> > > are able to assemble the entire tuple.  Now, whenever we try to
> > > stream the changes once we reach the memory threshold, we can check
> > > whether the data_complete flag is true; only then do we send the
> > > changes, otherwise we pick the next largest transaction.  I think
> > > we can retry this a few times, and if we get incomplete data for
> > > multiple transactions, then we can decide to spill a transaction,
> > > or maybe we can directly spill the first (largest) transaction that
> > > has incomplete data.
> > >
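A rough sketch of that selection loop, with entirely made-up names
(TxnInfo, PickTxnToStream); this is not the actual reorderbuffer.c API,
just an illustration of the idea:

#include <stdbool.h>
#include <stddef.h>

#define MAX_RETRIES 3

typedef struct TxnInfo
{
    size_t  size;           /* bytes of changes queued in memory */
    bool    data_complete;  /* all toast chunks assembled? */
} TxnInfo;

/*
 * Return the largest transaction whose tuple data is fully assembled,
 * and set *stream accordingly.  If the first MAX_RETRIES candidates
 * all have incomplete data, fall back to the largest one so the
 * caller can spill it to disk instead of streaming.  txns_by_size is
 * assumed to be sorted by size, descending.
 */
static TxnInfo *
PickTxnToStream(TxnInfo **txns_by_size, int ntxns, bool *stream)
{
    for (int i = 0; i < ntxns && i < MAX_RETRIES; i++)
    {
        if (txns_by_size[i]->data_complete)
        {
            *stream = true;     /* safe to stream: tuples are whole */
            return txns_by_size[i];
        }
    }

    /* Too many incomplete candidates: spill the largest instead. */
    *stream = false;
    return ntxns > 0 ? txns_by_size[0] : NULL;
}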
> > Yeah, we might do something along this line.  Basically, we need to
> > mark the top transaction as data-incomplete if any of its
> > subtransactions has incomplete data (it will always be the latest
> > subtransaction of the top transaction).  Also, for streaming we
> > check the largest top transaction, whereas for spilling we just need
> > the largest (sub)transaction.  So we also need to decide, while
> > picking the largest top transaction for streaming, how we go about
> > the spill if we get a few transactions with incomplete data.  Do we
> > spill all the subtransactions under that top transaction, or do we
> > again find the largest (sub)transaction for spilling?
> >
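To illustrate the marking step (again with hypothetical names rather
than the real ReorderBufferTXN fields):

#include <stdbool.h>
#include <stddef.h>

typedef struct Txn
{
    struct Txn *toptxn;          /* NULL if this is a top transaction */
    bool        data_incomplete; /* still waiting for toast chunks */
} Txn;

/*
 * Mark a (sub)transaction as having incomplete data and propagate the
 * mark to its top-level transaction, since the streaming decision is
 * made by looking at the top transaction.
 */
static void
MarkDataIncomplete(Txn *txn)
{
    txn->data_incomplete = true;
    if (txn->toptxn != NULL)
        txn->toptxn->data_incomplete = true;
}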
>
> I think it is better to do the latter, as that will lead to spilling
> only the required changes (the minimum needed to get the memory below
> the threshold).
Makes sense to me.
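
A minimal sketch of spilling just the largest (sub)transaction until
we are under the threshold; the structs and SpillToDisk() below are
stand-ins for the real reorderbuffer structures and routines, assuming
(as in the patch) that the buffer tracks its total size:

#include <stddef.h>

typedef struct SubTxn
{
    struct SubTxn *next;    /* next (sub)transaction in the buffer */
    size_t         size;    /* bytes of changes held in memory */
} SubTxn;

typedef struct Buffer
{
    SubTxn *txns;           /* all (sub)transactions, unordered */
    size_t  size;           /* total memory used by changes */
    size_t  threshold;      /* logical_work_mem-style limit */
} Buffer;

/* Placeholder for serializing a transaction's changes to disk. */
static void
SpillToDisk(Buffer *rb, SubTxn *txn)
{
    rb->size -= txn->size;  /* the in-memory changes are released */
    txn->size = 0;
}

/*
 * Spill the single largest (sub)transaction, repeatedly, until memory
 * drops below the threshold, so that only the minimum set of changes
 * ends up on disk.
 */
static void
SpillUntilBelowThreshold(Buffer *rb)
{
    while (rb->size >= rb->threshold)
    {
        SubTxn *largest = NULL;

        for (SubTxn *t = rb->txns; t != NULL; t = t->next)
            if (largest == NULL || t->size > largest->size)
                largest = t;

        if (largest == NULL || largest->size == 0)
            break;          /* nothing left to spill */

        SpillToDisk(rb, largest);
    }
}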

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


