Thread: Anyone know why PostgreSQL doesn't support 2 phase execution?
...and what kind of performance hit we take (and under what circumstances) for not having it?
On Sat, 2003-04-05 at 11:35, Ron Peacetree wrote:
> ...and what kind of performance hit we take (and under what
> circumstances) for not having it?

Do you mean 2-phase commits? If so, how do you take a performance hit
from *not* having it? PostgreSQL doesn't have it (prepare & commit
phases) simply because nobody has completed and submitted an
implementation. Satoshi is working on the problem.

If not, what do you mean by 2 phase execution?

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
Ron Peacetree wrote:
> ...and what kind of performance hit we take (and under what
> circumstances) for not having it?

Are you thinking of "two phase commit"? There is no "performance hit
for not having it," and it does not currently apply to PostgreSQL.

Two phase commit is only needed when updates need to be applied
simultaneously on multiple databases. That is, you'd have something
like:

    CREATE DISTRIBUTED TRANSACTION X1;
    insert into table TBL1 in database ('db1', 'id1', 'auth1')
        values (1, 2, 100.00, now());
    insert into table TBL2 in database ('db2', 'id2', 'auth2')
        values (1, 3, -100.00, now());
    COMMIT DISTRIBUTED TRANSACTION X1;

where the "in database ('db1', 'id1', 'auth1')" part indicates some
form of connection parameters for the database.

There certainly is merit to having two phase commit; it allows
coordinating updates to multiple databases.

The "degradation of performance" that results from not having this is
that you can't have distributed transactions. That's not a
"performance hit;" that's a case of "you can't do distributed
transactions." And distributed transactions are probably /more/
expensive than nondistributed ones, so it is more readily argued that
by not supporting them, you are spared the performance cost of using
distributed transactions in the first place.

--
output = reverse("gro.gultn@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Rules of the Evil Overlord #195. "I will not use hostages as bait in a
trap. Unless you're going to use them for negotiation or as human
shields, there's no point in taking them." <http://www.eviloverlord.com/>
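For readers unfamiliar with the mechanics under discussion: two phase commit splits the final COMMIT into a "prepare" step, in which each participant database durably records the transaction's work and votes that it is able to commit, and a second step in which the coordinator finalizes (or aborts) the transaction everywhere. A minimal sketch of the coordinator's view, using the PREPARE TRANSACTION / COMMIT PREPARED style of command found in databases that expose two phase commit (not something PostgreSQL offered at the time of this thread) and reusing the hypothetical tables from the example above:

    -- On participant database db1:
    BEGIN;
    INSERT INTO tbl1 VALUES (1, 2, 100.00, now());
    PREPARE TRANSACTION 'x1';   -- phase 1: durably record the work, vote "ready"

    -- On participant database db2:
    BEGIN;
    INSERT INTO tbl2 VALUES (1, 3, -100.00, now());
    PREPARE TRANSACTION 'x1';

    -- Coordinator, phase 2, only after every participant reported "ready":
    COMMIT PREPARED 'x1';       -- issued against each participant
    -- (if any participant failed to prepare, issue ROLLBACK PREPARED 'x1'
    --  everywhere instead)

The crash-recovery question raised later in the thread concerns exactly the transactions left in this prepared-but-not-yet-committed state: they must survive a restart, locks and all, until the coordinator resolves them.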
"Rod Taylor" <rbt@rbt.ca> wrote in message news:1049735504.40144.35.camel@jester... > --=-v98gp7DTtZFJa7ee6YMf > Content-Type: text/plain > Content-Transfer-Encoding: quoted-printable > > On Sat, 2003-04-05 at 11:35, Ron Peacetree wrote: > > ...and what kind of performance hit we take (and under what > > circumstances) for not having it? > > Do you mean 2-phase commits? If so, how do you take a performance hit > from *not* having it? PostgreSQL doesn't have it (prepare & forget > phases)) simply because nobody has completed and submitted an > implementation. Satoshi is working on the problem. > > If not, what do you mean by 2 phase execution? > The performance hit as in "in comparison to DB's that =do= have two phase execution (and commit for that matter), just how much slower is PostgreSQL?" Two phase execution and two phase commit are two different concepts. Two phase execution splits the execution of queries explicitly into a "do all the book keeping and setup stuff before execution" phase and an actual execution phase. The benefit is that if you are going to say, step through a largish table in chunks, doing the same query on each chunk, two phase execution allows the DB to do everything (syntax checking, query planning, blah, blah) except the actual execution =once= and reuse it for each subsequent chunk. Think of it as a close cousin to loop unrolling. It also helps parallel performance since you can hand the "blessed" set up query plan to multiple processes and those processes can focus on just getting work done. The lack of two phase =commit= is a also a potential performance hit in comparison to DB products that have it, but the more important issue there is that there are SMP/distributed apps that really can't work acceptably unless a DB product has two phase commit. The three "biggies" in DB land, SQL Server, Oracle, and DB2, have both features. I suspect that PostgreSQL will need to as well...
On Mon, 2003-04-07 at 14:59, Ron Peacetree wrote:
> Two phase execution and two phase commit are two different concepts.
>
> Two phase execution splits the execution of queries explicitly into a
> "do all the bookkeeping and setup stuff before execution" phase and
> an actual execution phase. The benefit is that if you are going to,
> say, step through a largish table in chunks, doing the same query on
> each chunk, two phase execution allows the DB to do everything (syntax
> checking, query planning, blah, blah) except the actual execution
> =once= and reuse it for each subsequent chunk.

If "stepping through a largish table in chunks" is implemented as a
single SQL query, PostgreSQL already does this internally (the parsing,
planning, rewriting, and execution phases are distinct operations
inside the backend). If the stepping is done as a bunch of similar
queries, you can use prepared queries (as of PostgreSQL 7.3) to do the
parsing, planning and rewriting only once, and then reuse the query
plan multiple times.

> It also helps parallel performance, since
> you can hand the "blessed" set up query plan to multiple processes and
> those processes can focus on just getting work done.

Prepared queries are per-backend as of PostgreSQL 7.3, so this can't be
done (and I'm a little skeptical that it would be very useful...)

Cheers,

Neil
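For the "single SQL query" case Neil mentions, a cursor is one way to step through a largish table in chunks while parsing and planning the query only once. A minimal sketch, again against the hypothetical tbl1 table from the earlier example:

    BEGIN;

    DECLARE big_scan CURSOR FOR
        SELECT id, amount FROM tbl1 ORDER BY id;

    FETCH 1000 FROM big_scan;   -- first chunk; the query was planned once, above
    FETCH 1000 FROM big_scan;   -- next chunk reuses the same open execution

    CLOSE big_scan;
    COMMIT;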
Ron Peacetree wrote:
> [...]
> The lack of two phase =commit= is also a potential performance hit
> in comparison to DB products that have it, but the more important
> issue there is that there are SMP/distributed apps that really can't
> work acceptably unless a DB product has two phase commit.
>
> The three "biggies" in DB land, SQL Server, Oracle, and DB2, have both
> features. I suspect that PostgreSQL will need to as well...

Ron, do you actually have some ideas how to do 2 phase commits?
Especially things like how to re-lock during startup after a crash and
the like? Or is your knowledge in reality only buzzwords collected from
high glossy tradeshow flyers?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
"Jan Wieck" <JanWieck@Yahoo.com> wrote in message news:3E942A99.A6102F2D@Yahoo.com... > Ron Peacetree wrote: > > [...] > > The lack of two phase =commit= is a also a potential performance hit > > in comparison to DB products that have it, but the more important > > issue there is that there are SMP/distributed apps that really can't > > work acceptably unless a DB product has two phase commit. > > > > The three "biggies" in DB land, SQL Server, Oracle, and DB2, have both > > features. I suspect that PostgreSQL will need to as well... > > Ron, do you actually have some ideas how to do 2 phase commits? > Especially things like how to re-lock during startup after a crash and > the like? Or is your knowledge in reality only buzzwords collected from > high glossy tradeshow flyers? > If "some ideas" means "do I know how to code it into PostgreSQL right now", the answer is no. If "some ideas" means "do I understand the general problem at a technical level well enough to be thinking about the algorithms and datastructures needed to support the functionality" the answer is yes. So I'd say a fair response to your questions is that my knowledge is in between the two extremes you've described, but probably closer to the first than the second ;-). We can have a private email discussion on the topic if you wish.