Re: Global snapshots - Mailing list pgsql-hackers
From | Alexey Kondratov |
---|---|
Subject | Re: Global snapshots |
Date | |
Msg-id | 3ef7877bfed0582019eab3d462a43275@postgrespro.ru Whole thread Raw |
In response to | Re: Global snapshots ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>) |
Responses |
Re: Global snapshots
|
List | pgsql-hackers |
Hi, On 2020-07-27 09:44, Andrey V. Lepikhov wrote: > On 7/27/20 11:22 AM, tsunakawa.takay@fujitsu.com wrote: >> >> US8356007B2 - Distributed transaction management for database systems >> with multiversioning - Google Patents >> https://patents.google.com/patent/US8356007 >> >> >> If it is, can we circumvent this patent? >> > > Thank you for the research (and previous links too). > I haven't seen this patent before. This should be carefully studied. I had a look on the patch set, although it is quite outdated, especially on 0003. Two thoughts about 0003: First, IIUC atomicity of the distributed transaction in the postgres_fdw is achieved by the usage of 2PC. I think that this postgres_fdw 2PC support should be separated from global snapshots. It could be useful to have such atomic distributed transactions even without a proper visibility, which is guaranteed by the global snapshot. Especially taking into account the doubts about Clock-SI and general questions about algorithm choosing criteria above in the thread. Thus, I propose to split 0003 into two parts and add a separate GUC 'postgres_fdw.use_twophase', which could be turned on independently from 'postgres_fdw.use_global_snapshots'. Of course if the latter is enabled, then 2PC should be forcedly turned on as well. Second, there are some problems with errors handling in the 0003 (thanks to Arseny Sher for review). +error: + if (!res) + { + sql = psprintf("ABORT PREPARED '%s'", fdwTransState->gid); + BroadcastCmd(sql); + elog(ERROR, "Failed to PREPARE transaction on remote node"); + } It seems that we should never reach this point, just because BroadcastStmt will throw an ERROR if it fails to prepare transaction on the foreign server: + if (PQresultStatus(result) != expectedStatus || + (handler && !handler(result, arg))) + { + elog(WARNING, "Failed command %s: status=%d, expected status=%d", sql, PQresultStatus(result), expectedStatus); + pgfdw_report_error(ERROR, result, entry->conn, true, sql); + allOk = false; + } Moreover, It doesn't make much sense to try to abort prepared xacts, since if we failed to prepare it somewhere, then some foreign servers may become unavailable already and this doesn't provide us a 100% guarantee of clean up. + /* COMMIT open transaction of we were doing 2PC */ + if (fdwTransState->two_phase_commit && + (event == XACT_EVENT_PARALLEL_COMMIT || event == XACT_EVENT_COMMIT)) + { + BroadcastCmd(psprintf("COMMIT PREPARED '%s'", fdwTransState->gid)); + } At this point, the host (local) transaction is already committed and there is no way to abort it gracefully. However, BroadcastCmd may rise an ERROR that will cause a PANIC, since it is non-recoverable state: PANIC: cannot abort transaction 487, it was already committed Attached is a patch, which implements a plain 2PC in the postgres_fdw and adds a GUC 'postgres_fdw.use_twophase'. Also it solves these errors handling issues above and tries to add proper comments everywhere. I think, that 0003 should be rebased on the top of it, or it could be a first patch in the set, since it may be used independently. What do you think? Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
pgsql-hackers by date: