Re: Question concerning XTM (eXtensible Transaction Manager API) - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Question concerning XTM (eXtensible Transaction Manager API)
Date
Msg-id 5DF86E4E-4F3C-4D39-A9DE-97586D51B8AD@postgrespro.ru
In response to Re: Question concerning XTM (eXtensible Transaction Manager API)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Question concerning XTM (eXtensible Transaction Manager API)  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers

On Nov 17, 2015, at 10:44 AM, Amit Kapila wrote:


I think the general idea is that if the commit is WAL-logged, then the
operation is considered committed on the local node, and commit should
happen on any node only once prepare from all nodes is successful.
And after that the transaction is not supposed to abort.  But I think you are
trying to optimize the DTM in some way that does not follow that kind of protocol.

DTM still follows the 2PC protocol:
first the transaction is saved in the WAL on all nodes, and only after that is the commit completed on all nodes.
We try to avoid maintaining separate log files for 2PC (as is done now for prepared transactions)
and do not want to change the logic of working with WAL.
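
To make the ordering concrete, here is a rough C sketch of that commit path. All names here (dtm_prepare, dtm_commit_global, NodeList, GlobalXid) are invented for illustration; this is the shape of the protocol, not the actual pg_dtm code:

    #include <stdbool.h>

    typedef int GlobalXid;                        /* illustrative global xid */
    typedef struct { int count; int node_ids[16]; } NodeList;

    /* Placeholders standing in for the real per-node operations */
    bool dtm_prepare(int node, GlobalXid gxid);   /* WAL-log tx on node     */
    void dtm_commit(int node, GlobalXid gxid);    /* make tx visible there  */
    void dtm_abort(NodeList *nodes, GlobalXid gxid);

    /* Commit is attempted only after every node has durably logged the
     * transaction; past that point the transaction must not abort. */
    bool
    dtm_commit_global(NodeList *nodes, GlobalXid gxid)
    {
        for (int i = 0; i < nodes->count; i++)    /* phase 1: prepare */
            if (!dtm_prepare(nodes->node_ids[i], gxid))
            {
                dtm_abort(nodes, gxid);           /* any failure aborts all */
                return false;
            }
        for (int i = 0; i < nodes->count; i++)    /* phase 2: commit */
            dtm_commit(nodes->node_ids[i], gxid);
        return true;
    }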

The DTM approach is based on the assumption that the PostgreSQL CLOG and visibility rules allow us to "hide" a transaction even if its commit is already written to the WAL.
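
In other words, local commit status and global visibility are decoupled. A minimal sketch of such a visibility override follows; the hook and helper names are invented here, not the real XTM signatures:

    typedef int TransactionId;
    typedef enum { XID_INPROGRESS, XID_COMMITTED, XID_ABORTED } XidStatus;

    XidStatus clog_get_status(TransactionId xid);     /* local CLOG state  */
    bool arbiter_commit_visible(TransactionId xid);   /* global decision   */

    /* A transaction whose commit record is already in WAL/CLOG is still
     * reported as in-progress until the arbiter confirms the global
     * commit, so other snapshots cannot see it prematurely. */
    XidStatus
    dtm_get_transaction_status(TransactionId xid)
    {
        XidStatus local = clog_get_status(xid);

        if (local == XID_COMMITTED && !arbiter_commit_visible(xid))
            return XID_INPROGRESS;                    /* "hide" the commit */

        return local;
    }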


By the way, how will the arbiter do recovery in a scenario where it
crashes?  Won't it need to contact all nodes for the status of in-progress or
prepared transactions?

The current answer is that the arbiter cannot crash. To provide fault tolerance we spawn replicas of the arbiter, which are managed using the Raft protocol.
If the master crashes or the network is partitioned, a new master is chosen.
PostgreSQL backends have a list of possible arbiter addresses. Once the connection with the arbiter is broken, a backend tries to reestablish it using the alternative addresses.
But only the master accepts incoming connections.
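
The reconnect logic on the backend side is then essentially a loop over the configured addresses: since only the Raft master accepts connections, whichever address answers is the current master. A sketch, with arbiter_try_connect() standing in for the real connection call:

    int arbiter_try_connect(const char *addr);  /* placeholder: fd or -1 */

    /* Returns a socket to the current arbiter master.  Real code would
     * add a back-off delay instead of spinning. */
    int
    arbiter_reconnect(const char **addrs, int naddrs)
    {
        for (;;)
            for (int i = 0; i < naddrs; i++)
            {
                int sock = arbiter_try_connect(addrs[i]);
                if (sock >= 0)
                    return sock;        /* this address is the master */
            }
    }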


I think it would be better if a more detailed design of DTM with respect to
transaction management and recovery could be put up on the wiki for
discussion on this topic.  I have seen that you have already documented many
details of the system, but the complete picture of DTM is still not clear.

I agree.
But please notice that pg_dtm is just one possible implementation of distributed transaction management.
We are also experimenting with other implementations, for example pg_tsdtm, which is based on timestamps. It doesn't require a central arbiter and so shows much better (almost linear) scalability.
But recovery in the case of pg_tsdtm is even more obscure.
Also, the performance of pg_tsdtm greatly depends on system clock synchronization and network delays. We get about 70k TPS on a cluster of 12 nodes connected by a 10Gbit network,
but when we run the same test on hosts located in different geographic regions (several thousand km apart), performance falls to 15 TPS.
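
For reference, the timestamp-based commit roughly follows the Clock-SI idea: each participant returns its local clock at prepare time, and the commit timestamp is the maximum of those clocks, so no central arbiter is needed. A sketch reusing the illustrative NodeList/GlobalXid types from above (node_prepare and node_commit_at are invented names, not the pg_tsdtm API):

    typedef long long csn_t;                      /* commit timestamp (CSN) */

    csn_t node_prepare(int node, GlobalXid gxid); /* returns node's clock   */
    void node_commit_at(int node, GlobalXid gxid, csn_t csn);

    /* All nodes agree on the same CSN: the max of their local clocks.
     * Clock skew therefore directly adds to commit latency, which is
     * consistent with the cross-region numbers quoted above. */
    csn_t
    tsdtm_commit_global(NodeList *nodes, GlobalXid gxid)
    {
        csn_t csn = 0;
        for (int i = 0; i < nodes->count; i++)
        {
            csn_t local = node_prepare(nodes->node_ids[i], gxid);
            if (local > csn)
                csn = local;
        }
        for (int i = 0; i < nodes->count; i++)
            node_commit_at(nodes->node_ids[i], gxid, csn);
        return csn;
    }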
