Re: Global snapshots - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Global snapshots
Msg-id: CA+fd4k6oZtO-MFYmunHVecGaTWre8YKDNTSfX9hZhQh6Kui1kA@mail.gmail.com
In response to: Re: Global snapshots (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Global snapshots
List: pgsql-hackers
On Sat, 20 Jun 2020 at 21:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
> >
> > On 6/19/20 11:48 AM, Amit Kapila wrote:
> > > On Wed, Jun 10, 2020 at 8:36 AM Andrey V. Lepikhov
> > > <a.lepikhov@postgrespro.ru> wrote:
> > >> On 09.06.2020 11:41, Fujii Masao wrote:
> > >>> The patches seem not to be registered in CommitFest yet.
> > >>> Are you planning to do that?
> > >> Not now. It is a sharding-related feature. I'm not sure that this
> > >> approach is fully consistent with the sharding way now.
> > > Can you please explain in detail why you think so? There is no
> > > commit message explaining what each patch does, so it is difficult to
> > > understand why you said so.
> > For now I used this patch set for providing correct visibility in the
> > case of access to the table with foreign partitions from many nodes in
> > parallel. So I saw this patch set as a sharding-related feature, but
> > [1] shows another useful application.
> > The CSN-based approach has weak points such as:
> > 1. Dependency on clock synchronization
> > 2. The need to guarantee that the CSN increases monotonically across
> > an instance restart/crash etc.
> > 3. The need to delay advancing OldestXmin because it can be needed
> > for a transaction snapshot at another node.
> >
>
> So, is anyone working on improving these parts of the patch? AFAICS
> from what Bruce has shared [1], some people from HighGo are working on
> it, but I don't see any discussion of that yet.
>
> > So I do not have full conviction that it will be better than a single
> > distributed transaction manager.
> >
>
> When you say "single distributed transaction manager" do you mean
> something like pg_dtm, which is inspired by Postgres-XL?
>
> > > Also, can you let us know if this
> > > supports 2PC in some way and if so how is it different from what the
> > > other thread on the same topic [1] is trying to achieve?
> > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > the 2PC machinery. For now I'd not judge which approach is better.
> >
>

Sorry for being late.

> Yeah, I have studied both approaches a little and I feel the main
> difference seems to be that in this patch atomicity is tightly coupled
> with how we achieve global visibility; basically, in this patch "all
> running transactions are marked as InDoubt on all nodes in prepare
> phase, and after that, each node commit it and stamps each xid with a
> given GlobalCSN.". There are no separate APIs for
> prepare/commit/rollback exposed by postgres_fdw as we do in the
> approach followed by Sawada-San's patch. It seems to me that in the
> patch in this email one of the postgres_fdw nodes can be a sort of
> coordinator which prepares and commits the transaction on all other
> nodes, whereas that is not true in Sawada-San's patch (where the
> coordinator is a local Postgres node, am I right Sawada-San?).

Yeah, where to manage foreign transactions is different: postgres_fdw
manages foreign transactions in this patch, whereas the PostgreSQL core
does that in the 2PC patch.

> I feel if Sawada-San or someone involved in the other patch also
> studies this approach and tries to come up with some form of
> comparison, then we might be able to make a better decision. It is
> possible that there are a few good things in each approach which we
> can use.
>

I studied this patch and did a simple comparison between this patch
(the 0002 patch) and my 2PC patch.
In terms of atomic commit, the features that are not implemented in this
patch but are in the 2PC patch are:

* Crash safety.
* PREPARE TRANSACTION command support.
* Query cancellation while waiting for the commit.
* Automatic in-doubt transaction resolution.

On the other hand, the feature that is implemented in this patch but not
in the 2PC patch is:

* Executing PREPARE TRANSACTION (and other commands) in parallel.

When the 2PC patch was first proposed, IIRC it was like this patch (the
0002 patch); I mean, it changed only postgres_fdw to support 2PC. But
after discussion we changed the approach so that the core manages
foreign transactions, for crash safety. From my perspective, this patch
has a minimal implementation of 2PC, just enough to make the global
snapshot feature work, and is missing some features that are important
for crash-safe atomic commit. So I personally think we should consider
how to integrate this global snapshot feature with the 2PC patch, rather
than improving this patch, if we want crash-safe atomic commit.

Looking at the commit procedure with this patch:

When starting a new transaction on a foreign server, postgres_fdw
executes pg_global_snapshot_import() to import the global snapshot.
After some work, in the pre-commit phase we do:

1. Generate a global transaction id, say 'gid'.
2. Execute PREPARE TRANSACTION 'gid' on all participants.
3. Prepare the global snapshot locally, if the local node is also
   involved in the transaction.
4. Execute pg_global_snapshot_prepare('gid') on all participants.

During steps 2 to 4, we calculate the maximum CSN from the CSNs returned
by each pg_global_snapshot_prepare() execution.

5. Assign the global snapshot locally, if the local node is also
   involved in the transaction.
6. Execute pg_global_snapshot_assign('gid', max-csn) on all
   participants.

Then we commit locally (i.e., mark the current transaction as committed
in clog). After that, in the post-commit phase, we execute COMMIT
PREPARED 'gid' on all participants. (A rough sketch of this sequence
appears at the end of this message.)

Considering how to integrate this global snapshot feature with the 2PC
patch, what the 2PC patch needs to change, at a minimum, is to allow an
FDW to store FDW-private data that is passed to subsequent FDW
transaction API calls. In the current 2PC patch, we call the Prepare API
for each participant server one by one, and the core passes only
metadata such as the ForeignServer, the UserMapping, and the global
transaction identifier. So it's not easy to calculate the maximum CSN
across multiple transaction API calls.

I think we can change the 2PC patch to add a void pointer to
FdwXactRslvState, the struct passed from the core, in order to store
FDW-private data; in this case it would hold the maximum CSN. That way,
on the first Prepare API call postgres_fdw allocates the space and
stores the CSN there, on subsequent Prepare API calls it updates the
maximum CSN, and it is then able to do steps 3 to 6 when preparing the
transaction on the last participant (see the second sketch below).
Another idea would be to change the 2PC patch so that the core passes a
bunch of participants grouped by FDW.

I've not read this patch deeply yet and have considered this without
doing any coding, but my first impression is that it would not be hard
to integrate this feature with the 2PC patch.
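To make the commit procedure above concrete, here is a minimal C sketch
of a coordinator driving the participant-side steps (2, 4, and 6) over
raw libpq. The function name, the uint64 representation of the CSN, and
the omission of error handling and of the local-node steps 3 and 5 are
my simplifications for illustration; this is not code from the patch.

```c
/*
 * Illustrative sketch only -- not code from the patch.  Drives the
 * participant-side parts of the pre-commit sequence (steps 2, 4, 6)
 * over already-open libpq connections.  Error handling is omitted.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
global_precommit(PGconn **participants, int nparts, const char *gid)
{
    char        sql[256];
    uint64_t    max_csn = 0;
    int         i;

    /* Step 2: PREPARE TRANSACTION 'gid' on all participants. */
    snprintf(sql, sizeof(sql), "PREPARE TRANSACTION '%s'", gid);
    for (i = 0; i < nparts; i++)
        PQclear(PQexec(participants[i], sql));

    /*
     * Step 4: pg_global_snapshot_prepare('gid') on all participants,
     * folding each returned CSN into the running maximum.
     */
    snprintf(sql, sizeof(sql),
             "SELECT pg_global_snapshot_prepare('%s')", gid);
    for (i = 0; i < nparts; i++)
    {
        PGresult   *res = PQexec(participants[i], sql);
        uint64_t    csn = strtoull(PQgetvalue(res, 0, 0), NULL, 10);

        if (csn > max_csn)
            max_csn = csn;
        PQclear(res);
    }

    /* Step 6: stamp every participant with the agreed maximum CSN. */
    snprintf(sql, sizeof(sql),
             "SELECT pg_global_snapshot_assign('%s', %llu)",
             gid, (unsigned long long) max_csn);
    for (i = 0; i < nparts; i++)
        PQclear(PQexec(participants[i], sql));

    /*
     * The caller then commits locally (marking the transaction committed
     * in clog) and, in the post-commit phase, runs COMMIT PREPARED 'gid'
     * on every participant.
     */
}
```

In the actual patch, postgres_fdw issues these commands through its own
connection machinery rather than raw libpq as above.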
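And here is a rough PostgreSQL-style C sketch of the void-pointer idea.
FdwXactRslvState, ForeignServer, UserMapping, and the global transaction
identifier are named above; the exact field layout, the is_last flag,
and the prepare_remote_xact() helper are hypothetical, only to show
where the running maximum CSN would live.

```c
/*
 * Hypothetical sketch -- the field layout, the is_last flag, and
 * prepare_remote_xact() are illustrative assumptions, not the 2PC
 * patch's actual definitions.
 */
#include "postgres.h"
#include "foreign/foreign.h"

typedef struct FdwXactRslvState
{
    ForeignServer *server;       /* participant's foreign server */
    UserMapping   *usermapping;  /* user mapping for the connection */
    char          *fdwxact_id;   /* global transaction identifier */
    void          *fdw_private;  /* proposed: FDW-private data carried
                                  * across transaction API calls */
} FdwXactRslvState;

/*
 * Hypothetical helper: PREPARE TRANSACTION plus
 * pg_global_snapshot_prepare() on one participant, returning its CSN.
 */
extern uint64 prepare_remote_xact(FdwXactRslvState *state);

/*
 * Hypothetical Prepare callback in postgres_fdw: accumulates the
 * maximum CSN across calls and finishes steps 3 to 6 on the last one.
 */
static void
postgresPrepareForeignTransaction(FdwXactRslvState *state, bool is_last)
{
    uint64     *max_csn;

    /* First call for this transaction: allocate the shared space. */
    if (state->fdw_private == NULL)
        state->fdw_private = palloc0(sizeof(uint64));
    max_csn = (uint64 *) state->fdw_private;

    *max_csn = Max(*max_csn, prepare_remote_xact(state));

    if (is_last)
    {
        /*
         * Steps 3, 5, 6 would go here: prepare and assign the global
         * snapshot locally, then pg_global_snapshot_assign() with
         * *max_csn on every participant.
         */
    }
}
```

The appeal of this shape is that it keeps the per-participant call
structure of the current Prepare API while still letting the FDW
accumulate state across calls; the alternative of passing participants
grouped by FDW would instead hand the whole list to one callback.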
Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services