Re: Global snapshots - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Global snapshots |
Date | |
Msg-id | CAA4eK1JVsMWUD4q-b+vawehFzJb6Qg0AOGq7qOGL_gm6EEhdJg@mail.gmail.com |
In response to | Re: Global snapshots (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses | Re: Global snapshots |
List | pgsql-hackers |
On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 20 Jun 2020 at 21:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> > <a.lepikhov@postgrespro.ru> wrote:
> >
> > > > Also, can you let us know if this
> > > > supports 2PC in some way and if so how is it different from what the
> > > > other thread on the same topic [1] is trying to achieve?
> > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > > 2PC machinery. Now I'd not judge which approach is better.
> > >
> >
>
> Sorry for being late.
>

No problem, your summary and comparison of both approaches are quite
helpful.

>
> I studied this patch and did a simple comparison between this patch
> (0002 patch) and my 2PC patch.
>
> In terms of atomic commit, the features that are not implemented in
> this patch but in the 2PC patch are:
>
> * Crash safe.
> * PREPARE TRANSACTION command support.
> * Query cancel during waiting for the commit.
> * Automatically in-doubt transaction resolution.
>
> On the other hand, the feature that is implemented in this patch but
> not in the 2PC patch is:
>
> * Executing PREPARE TRANSACTION (and other commands) in parallel
>
> When the 2PC patch was proposed, IIRC it was like this patch (0002
> patch). I mean, it changed only postgres_fdw to support 2PC. But after
> discussion, we changed the approach to have the core manage foreign
> transaction for crash-safe. From my perspective, this patch has a
> minimum implementation of 2PC to work the global snapshot feature and
> has some missing features important for supporting crash-safe atomic
> commit. So I personally think we should consider how to integrate this
> global snapshot feature with the 2PC patch, rather than improving this
> patch if we want crash-safe atomic commit.
>

Okay, but isn't there also an advantage to this approach (managing 2PC
at the postgres_fdw level), namely that any node will be capable of
handling global transactions rather than doing them via a central
coordinator? I mean any node can do writes or reads rather than
probably routing them (at least writes) via a coordinator node. Now, I
agree that even if this advantage is there in the current approach, we
can't lose the crash-safety aspect of the other approach. Will you be
able to summarize what the problem was w.r.t. crash safety and how
your patch has dealt with it?

> Looking at the commit procedure with this patch:
>
> When starting a new transaction on a foreign server, postgres_fdw
> executes pg_global_snapshot_import() to import the global snapshot.
> After some work, in pre-commit phase we do:
>
> 1. generate global transaction id, say 'gid'
> 2. execute PREPARE TRANSACTION 'gid' on all participants.
> 3. prepare global snapshot locally, if the local node also involves
> the transaction
> 4. execute pg_global_snapshot_prepare('gid') for all participants
>
> During step 2 to 4, we calculate the maximum CSN from the CSNs
> returned from each pg_global_snapshot_prepare() executions.
>
> 5. assign global snapshot locally, if the local node also involves the
> transaction
> 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
>
> Then, we commit locally (i.e. mark the current transaction as
> committed in clog).
>
> After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> participants.
>

As per my current understanding, the overall idea is as follows. For
global transactions, pg_global_snapshot_prepare('gid') will set the
transaction status as InDoubt and generate a CSN (let's call it
NodeCSN) at the node where that function is executed; it also returns
the NodeCSN to the coordinator.
Then the coordinator (the current postgres_fdw node on which the write
transaction is being executed) computes MaxCSN based on the return
value (NodeCSN) of prepare (pg_global_snapshot_prepare) from all
nodes. It then assigns MaxCSN to each node. Finally, when Commit
Prepared is issued for each node, that MaxCSN will be written to each
node including the current node. So, with this idea, each node will
have the same view of the CSN value corresponding to any particular
transaction.

For snapshot management, the node which receives the query generates a
CSN (CurrentCSN) and follows the simple rule that a tuple having an
xid with a CSN less than CurrentCSN will be visible. Now, it is
possible that when we are examining a tuple, the CSN corresponding to
the xid that has written the tuple has the value INDOUBT, which
indicates that the transaction is not yet committed on all nodes. In
that case we wait till we get a valid CSN value corresponding to the
xid and then use it to check if the tuple is visible.

Now, one thing to note here is that for global transactions we
primarily rely on the CSN value corresponding to a transaction for its
visibility even though we still maintain CLOG for local transaction
status.

Leaving aside the incomplete parts and/or flaws of the current patch,
does the above match the top-level idea of this patch? I am not sure
if my understanding of this patch at this stage is completely correct
or whether we want to follow the approach of this patch, but I think
at least let's first be sure if such a top-level idea can achieve what
we want to do here.

> Considering how to integrate this global snapshot feature with the 2PC
> patch, what the 2PC patch needs to at least change is to allow FDW to
> store an FDW-private data that is passed to subsequent FDW transaction
> API calls.
> Currently, in the current 2PC patch, we call Prepare API
> for each participant servers one by one, and the core pass only
> metadata such as ForeignServer, UserMapping, and global transaction
> identifier. So it's not easy to calculate the maximum CSN across
> multiple transaction API calls. I think we can change the 2PC patch to
> add a void pointer into FdwXactRslvState, struct passed from the core,
> in order to store FDW-private data. It's going to be the maximum CSN
> in this case. That way, at the first Prepare API calls postgres_fdw
> allocates the space and stores CSN to that space. And at subsequent
> Prepare API calls it can calculate the maximum of csn, and then is
> able to the step 3 to 6 when preparing the transaction on the last
> participant. Another idea would be to change 2PC patch so that the
> core passes a bunch of participants grouped by FDW.
>

IIUC, with this the coordinator needs to communicate with the nodes
twice at the prepare stage: once to prepare the transaction on each
node and get the CSN from each node, and then again to communicate
MaxCSN to each node? Also, we probably need the InDoubt CSN status at
the prepare phase to make snapshots and global visibility work.

> I’ve not read this patch deeply yet and have considered it without any
> coding but my first feeling is not hard to integrate this feature with
> the 2PC patch.
>

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
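
The flow discussed in this message (collect a NodeCSN from each participant at prepare time, take the maximum, assign it everywhere, then apply the "CSN less than CurrentCSN" visibility rule) can be sketched as a small toy model. This is only an illustration under stated assumptions, not code from either patch: the `Node` class, its method names, the integer `clock`, and the `IN_DOUBT` marker are all hypothetical stand-ins, and the real pg_global_snapshot_* functions may behave differently.

```python
from dataclasses import dataclass, field

# Hypothetical marker for "prepared, final CSN not yet known".
IN_DOUBT = object()


@dataclass
class Node:
    """Toy participant; `clock` stands in for the node's local CSN source."""
    clock: int
    csn_log: dict = field(default_factory=dict)   # gid -> CSN or IN_DOUBT
    assigned: dict = field(default_factory=dict)  # gid -> MaxCSN awaiting commit

    def snapshot_prepare(self, gid):
        # Mark the transaction InDoubt and return a NodeCSN to the coordinator.
        self.clock += 1
        self.csn_log[gid] = IN_DOUBT
        return self.clock

    def snapshot_assign(self, gid, max_csn):
        # Remember the CSN that the coordinator computed for this transaction.
        self.assigned[gid] = max_csn

    def commit_prepared(self, gid):
        # Replace InDoubt with the CSN shared by all participants.
        self.csn_log[gid] = self.assigned.pop(gid)

    def is_visible(self, gid, current_csn):
        csn = self.csn_log.get(gid)
        if csn is None:
            return False   # transaction unknown on this node
        if csn is IN_DOUBT:
            return None    # a real reader would wait for a valid CSN here
        return csn < current_csn


def global_commit(gid, nodes):
    """Coordinator side: two round trips, then COMMIT PREPARED everywhere."""
    max_csn = max(node.snapshot_prepare(gid) for node in nodes)
    for node in nodes:
        node.snapshot_assign(gid, max_csn)
    for node in nodes:
        node.commit_prepared(gid)
    return max_csn
```

In this model every node ends up recording the same CSN for a given gid, and the two loops before `commit_prepared` correspond to the two rounds of communication at the prepare stage asked about above. Crash safety is deliberately not modeled.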