Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] logical decoding of two-phase transactions |
Date | |
Msg-id | CAA4eK1L4fZGY0YQOuHRkZU4ThBmLBpObwdndJ0UuuWLRswhi3g@mail.gmail.com |
In response to | Re: [HACKERS] logical decoding of two-phase transactions (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] logical decoding of two-phase transactions
Re: [HACKERS] logical decoding of two-phase transactions |
List | pgsql-hackers |
On Mon, Nov 30, 2020 at 7:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 2:36 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Sure, but you can see in your example above it got skipped due to
> start_decoding_at not due to DecodingContextReady. So, the problem as
> mentioned by me previously was how we distinguish those cases because
> it can skip due to start_decoding_at during restart as well when we
> would have already sent the prepare to the subscriber.
>
> One idea could be that the subscriber skips the transaction if it sees
> the transaction is already prepared.
>

To skip it, we need to send the GID in the begin message and then, on the subscriber side, check whether the prepared xact already exists; if so, set a flag. The flag needs to be set in begin/start_stream and reset in stop_stream/commit/abort. Using the flag, we can skip the entire contents of the prepared xact. On the ReorderBuffer side also, we need to get and set the GID in the txn even when we skip it, because we need to send the same at commit time. In this solution, we won't be able to send it during normal start_stream because by that time we won't know the GID, and I think that won't be required. Note that this is only required when we skipped sending prepare; otherwise, we just need to send Commit Prepared at commit time.

Another way to solve this problem on the publisher side is to maintain, in some file at the slot level, whether we have sent prepare for a particular txn. Basically, after sending prepare, we need to update the slot information on disk to indicate that the particular GID is sent (we can probably store the GID and the LSN of the Prepare). Then, the next time we have to skip prepare for whatever reason, we can check for the persistent information on disk for that GID: if it exists, we need to send just Commit Prepared; otherwise, the entire transaction.
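To make the publisher-side idea concrete, here is a toy model in Python. It is only a sketch of the decision described above, not PostgreSQL code: the names PreparedSentRegistry, record_prepare_sent, was_prepare_sent and action_at_commit are all hypothetical, and an in-memory dict stands in for whatever per-slot on-disk state would actually be used.

```python
# Toy model only. Every name here is hypothetical and invented for
# illustration; none of it is an existing PostgreSQL function or struct.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PreparedSentRegistry:
    """Per-slot record of prepares already sent: GID -> LSN of the PREPARE."""

    sent: Dict[str, int] = field(default_factory=dict)

    def record_prepare_sent(self, gid: str, prepare_lsn: int) -> None:
        # Assumption: the real thing would persist this with the slot on
        # disk right after the prepare has been sent to the subscriber.
        self.sent[gid] = prepare_lsn

    def was_prepare_sent(self, gid: str) -> bool:
        return gid in self.sent


def action_at_commit(registry: PreparedSentRegistry, gid: str) -> str:
    """Decide what to send at commit time when the prepare was skipped."""
    if registry.was_prepare_sent(gid):
        # The subscriber already has the prepared xact.
        return "commit_prepared_only"
    # The subscriber never saw the prepare, so it needs all the changes.
    return "entire_transaction"
```

For example, after `record_prepare_sent("gid-1", 100)`, a later skip of that prepare would lead to only Commit Prepared being sent for "gid-1", whereas an unknown GID would get the entire transaction.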
We can remove this information during or after CheckPointSnapBuild; basically, we can remove the information of all GIDs that are after cutoff LSN computed via ReplicationSlotsComputeLogicalRestartLSN. Now, we can even think of removing this information after Commit Prepared, but I am not sure if that is correct because we can't lose this information unless start_decoding_at (or restart_lsn) is moved past the commit LSN.

Now, to persist this information, there could be multiple possibilities: (a) maintain a flexible array of GIDs at the end of ReplicationSlotPersistentData, (b) have a separate state file per slot for prepared xacts, (c) have a separate state file for each prepared xact per slot. With (a), during upgrade from the previous version there could be a problem because the previous data won't match the new data, but I am not sure if we maintain slots info intact after upgrade. I think (c) would be simplest but, OTOH, having many such files per slot (in case there are more prepared xacts) might not be a good idea.

One more thing that needs to be thought about is, when we are sending the entire xact at commit time, whether we will send prepare separately. Because, if we don't send it separately, then later allowing the PREPARE on the master to wait for prepare via subscribers won't be possible?

Thoughts?

--
With Regards,
Amit Kapila.