Thread: Performance with new nested-xacts code
I was all set to launch into a diatribe about the half dozen performance issues I think we *must* fix in the new nested-transactions code, and thought I'd back it up by citing some pgbench numbers. So I ran pgbench runs using yesterday's CVS tip and current. I had to fix a small memory leak before I could get through pgbench -i at all :-(, but after that I found that today's tip seems a good 10% faster than yesterday's.

This brought me up short. I sure as heck do not see anything in that patch that would represent a performance gain over before, especially not in the very vanilla-flavor cases exercised by pgbench. Do you see an explanation? I'm a bit worried that we've managed to dike out some essential operation or other...

Can anyone else reproduce these results? The test case I'm using is
	pgbench -i -s 10 bench
followed by repeated
	pgbench -c 5 -t 1000 bench

I've built PG with --enable-debug and --enable-cassert, and am running with -F (fsync off) but otherwise absolutely factory-stock postgresql.conf. The hardware is a not-so-new-anymore Dell P4 with a run-of-the-mill IDE disk drive, running RHL 8.0. Obviously none of this is tuned at all, but the question is why CVS tip got faster when it should by rights be slower.

			regards, tom lane
On Thu, Jul 01, 2004 at 12:21:55AM -0400, Tom Lane wrote:

> I was all set to launch into a diatribe about the half dozen performance
> issues I think we *must* fix in the new nested-transactions code,

I completely agree, of course.

> This brought me up short.  I sure as heck do not see anything in that
> patch that would represent a performance gain over before, especially
> not in the very vanilla-flavor cases exercised by pgbench.  Do you see
> an explanation?  I'm a bit worried that we've managed to dike out some
> essential operation or other...

Nope, nothing; in fact, I think the only thing that changed in the normal (no subxacts) code path are the tqual.c tests, and those ought to be more expensive than before, not less ... The only thing that changed somewhat is the new GUC code you wrote, but I doubt that could account for a 10% performance increase.

> Can anyone else reproduce these results?  The test case I'm using is
> 	pgbench -i -s 10 bench
> followed by repeated
> 	pgbench -c 5 -t 1000 bench

I measure the same in 7 runs:

	Yesterday	Today
	241.37		243.02
	215.30		228.58
	235.13		225.57
	194.72		253.23
	215.98		272.14
	239.15		262.35
	220.17		249.42
Min	194.72		225.57
Max	241.37		272.14
Avg	223.12		247.76

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Luckily the califont exploded today, because otherwise I would have
died of boredom" (Papelucho)
Hi Tom,

As requested - although the results are all over the place... :-(

One interesting factor in these tests is that the max tps without the new code was 74.7; with the new code, 85.8.

This is a Sony laptop, slow IDE disk, Fedora Core 2:

[grant@localhost pgsql-HEAD]$ uname -a
Linux localhost.localdomain 2.6.6-1.435 #1 Mon Jun 14 09:09:07 EDT 2004 i686 i686 i386 GNU/Linux

./bin/postmaster -F

HTH.

Regards,
Grant

-- PRE NESTED XACTS

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 74.632059 (including connections establishing)
tps = 74.710309 (excluding connections establishing)

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 61.405658 (including connections establishing)
tps = 61.471754 (excluding connections establishing)

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 59.702545 (including connections establishing)
tps = 59.754499 (excluding connections establishing)

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 54.531685 (including connections establishing)
tps = 54.584432 (excluding connections establishing)

-- POST NESTED XACTS

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 72.656915 (including connections establishing)
tps = 72.732723 (excluding connections establishing)

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 85.687383 (including connections establishing)
tps = 85.822281 (excluding connections establishing)

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 59.479127 (including connections establishing)
tps = 59.540478 (excluding connections establishing)

[grant@localhost pgbench]$ ./pgbench -c 5 -t 1000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
tps = 51.675145 (including connections establishing)
tps = 51.715526 (excluding connections establishing)

Tom Lane wrote:
[snip]
> Can anyone else reproduce these results?  The test case I'm using is
> 	pgbench -i -s 10 bench
> followed by repeated
> 	pgbench -c 5 -t 1000 bench
> I've built PG with --enable-debug and --enable-cassert, and am running
> with -F (fsync off) but otherwise absolutely factory-stock
> postgresql.conf.  The hardware is a not-so-new-anymore Dell P4 with
> run-of-the-mill IDE disk drive, running RHL 8.0.  Obviously none of this
> is tuned at all, but the question is why did CVS tip get faster when it
> should by rights be slower.
On Thu, Jul 01, 2004 at 12:21:55AM -0400, Tom Lane wrote:

> This brought me up short.  I sure as heck do not see anything in that
> patch that would represent a performance gain over before, especially
> not in the very vanilla-flavor cases exercised by pgbench.  Do you see
> an explanation?

Oh, also the inval.c code now is saving a lot of pfrees at each transaction end.

By the way, thank you very much for the code review. The commit message said "with some help from," but it actually should have been something like "more than half of the patch rewritten by." Lots of things are much simpler and of course a lot of bugs are gone.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"You knock on that door or the sun will be shining on places inside you
that the sun doesn't usually shine" (in Death: "The High Cost of Living")
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> Oh, also the inval.c code now is saving a lot of pfrees at each
> transaction end.

Nope, that's not it; the old code actually did no retail pfree's anyway --- I just diked out what was really dead code. Besides which, pgbench doesn't do any catalog updates in its main loop, and so there shouldn't be anything for inval.c to do anyhow.

The only thing that's occurred to me since last night is that I simplified the data structures in trigger.c enough to get rid of a separate memory context for them. That means one less MemoryContextCreate/Delete per transaction cycle. It's tough to believe that that's a 10% gain ... but OTOH, with --enable-cassert turned on, there's a lot of per-context overhead invested in memory checking.

Were your numbers also taken with --enable-cassert? It might be instructive to compare numbers taken without.

> By the way, thank you very much for the code review.  The commit message
> said "with some help from," but it actually should have been something
> like "more than half of the patch rewritten by."  Lots of things are
> much simpler and of course a lot of bugs are gone.

And, no doubt, some new ones introduced ;-). I have a long list of things we still need to look at, which I will post as soon as I've cleaned it up into a readable form.

I did not yet incorporate the localbuf changes you sent; I think we should first decide what our theory is about cursors, which will determine whether we need to change bufmgr.c. There's no point in cloning that code until it's right. And I think you sent me one other patch recently that didn't make it into this commit, either.

			regards, tom lane
On Thu, Jul 01, 2004 at 08:51:59AM -0400, Tom Lane wrote:

> The only thing that's occurred to me since last night is that I
> simplified the data structures in trigger.c enough to get rid of
> a separate memory context for them.  That means one less
> MemoryContextCreate/Delete per transaction cycle.  It's tough to
> believe that that's a 10% gain ... but OTOH, with --enable-cassert
> turned on, there's a lot of per-context overhead invested in
> memory checking.
>
> Were your numbers also taken with --enable-cassert?  It might be
> instructive to compare numbers taken without.

Ah, yes, it was with asserts enabled. I'll try again.

> I did not yet incorporate the localbuf changes you sent; I think we
> should first decide what our theory is about cursors, which will
> determine whether we need to change bufmgr.c.  There's no point in
> cloning that code until it's right.

Well, my opinion is that cursors and other resources should at least be usable from an inner subtransaction when declared in its parent -- because if that can't be done we are wasting some of the benefits: we can't just "stick everything in a subtransaction" to be able to retry if it fails. It is a pity that we can't roll back FETCH or lo_close(), but at least we can keep them declared/open across a subtransaction commit.

So I think we should only be releasing resources and restoring refcounts at subtransaction abort, not commit -- and of course we would only complain about nonzero refcounts at main transaction commit. I have changes for this in the relcache and catcache code because it was needed to make large objects behave; I'll do it for bufmgr too. (Anyway, there's something strange about large objects, because at main transaction commit the locking code says "corrupted proclock table" -- I haven't been able to figure out why.)

> And I think you sent me one other patch recently that didn't make it
> into this commit, either.

Yes, it was the password file management. But it was untested; let me do that first (we need some docs on that file).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"To have more, you must desire less"
On Thu, Jul 01, 2004 at 09:07:11AM -0400, Alvaro Herrera wrote:

> > Were your numbers also taken with --enable-cassert?  It might be
> > instructive to compare numbers taken without.
>
> Ah, yes, it was with asserts enabled.  I'll try again.

With asserts disabled the situation seems reversed:

	Yesterday	Today
	325.76		277.92
	304.12		272.92
	264.32		463.23
	282.75		252.96
	284.92		268.38
	327.97		244.47
	364.94		243.23
Min	264.32		243.23
Max	364.94		463.23
Avg	307.82		289.02

There's a lot more variance though, which I can easily attribute to the server being more loaded (this is a Uni server; last night there was nobody connected and today there are quite a few people, though the majority is just using tin, pine or elm).

(These numbers are not comparable to what I posted yesterday because I changed the checkpoint_segments parameter in between.)

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Listen and you will forget; see and you will remember; do and you will
understand" (Confucius)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> Well, my opinion is that cursors and other resources should at least be
> usable from a inner subtransaction in its parent -- because if that
> can't be done we are wasting some of the benefits, because we can't just
> "stick everything in a subtransaction" to be able to retry if it fails.
> It is a pity that we can't roll back FETCH or lo_close() but at least we
> can keep them declared/open across a subtransaction commit.

AFAICS we can't allow an inner transaction to use a cursor that was declared in an outer transaction, because if the inner transaction fails then it's not just a matter of the FETCH not rolling back; the subtransaction abort will restore state in the bufmgr and other places that is simply inconsistent with the state of the cursor's plantree.

If we don't restore bufmgr state at subxact commit, I think that it would work to do

	begin;
	begin;
	declare cursor c ...;
	end;		-- cursor, bufmgr state NOT changed here
	fetch from c;
	...

It seems though that we might have a lot of problems with figuring out which subsystems ought to restore state at subxact commit and which not. Another point is that this will NOT work:

	begin;
	begin;
	declare cursor c ...;
	end;		-- cursor, bufmgr state NOT changed here
	begin;
	fetch from c;
	abort;		-- oops, wrong state restored here

so the rule would have to be something like "cursors can only be touched by the highest subxact nesting level they have ever been visible to". Yuck.

			regards, tom lane
On Thu, 1 Jul 2004, Tom Lane wrote:

> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Well, my opinion is that cursors and other resources should at least be
> > usable from a inner subtransaction in its parent -- because if that
> > can't be done we are wasting some of the benefits, because we can't just
> > "stick everything in a subtransaction" to be able to retry if it fails.
> > It is a pity that we can't roll back FETCH or lo_close() but at least we
> > can keep them declared/open across a subtransaction commit.
>
> AFAICS we can't allow an inner transaction to use a cursor that was
> declared in an outer transaction, because if the inner transaction fails
> then it's not just a matter of the FETCH not rolling back; the
> subtransaction abort will restore state in the bufmgr and other places
> that is simply inconsistent with the state of the cursor's plantree.

This isn't just directly declared CURSORs, but also V3 protocol portals, which makes it very difficult for a driver to use. An individual writing direct BEGIN, DECLARE CURSOR, and so on statements can work around the restrictions here because they know exactly what they are doing and exactly what statements are sent to the backend. From a driver perspective it has no idea what the end user's intention is, and therefore cannot do things like transform a query to a cursor-based statement, or even use the V3 protocol, because it has no idea if the caller is going to use a subtransaction at some point.

Kris Jurka
On Thu, Jul 01, 2004 at 10:19:08AM -0400, Tom Lane wrote:

> AFAICS we can't allow an inner transaction to use a cursor that was
> declared in an outer transaction, because if the inner transaction fails
> then it's not just a matter of the FETCH not rolling back; the
> subtransaction abort will restore state in the bufmgr and other places
> that is simply inconsistent with the state of the cursor's plantree.

Well! I was reading some code about cursors/portals and it certainly is not an easy issue to handle.

> 	begin;
> 	begin;
> 	declare cursor c ...;
> 	end;		-- cursor, bufmgr state NOT changed here
> 	fetch from c;

I tried modifying bufmgr, relcache and catcache to see this (with a simple example) and it works.

> It seems though that we might have a lot of problems with figuring out
> which subsystems ought to restore state at subxact commit and which not.

AFAICS the only ones that restore state at subxact commit right now are relcache, catcache and bufmgr. I think the three of them should not do so, to support this.

> [...] so the rule would have to be something like "cursors can only be
> touched by the highest subxact nesting level they have ever been
> visible to".  Yuck.

Yeah. Another answer would be to reset the executor state if the cursor is modified in a subtransaction that aborts:

	begin;
	declare cursor c ...;
	fetch 1 from c;		-- returns tuple 1
	begin;
	fetch 1 from c;		-- returns tuple 2
	rollback;
	fetch 1 from c;		-- returns tuple 1 again

This is mightily ugly, but I think it's the most usable of the options seen so far. I'm not sure how hard it is to do that -- maybe it's just a matter of running PortalStart() again for the cursor? What do you think?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There are two times in a man's life when he should not speculate:
when he can't afford it, and when he can" (Mark Twain)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> Yeah.  Another answer would be to reset the executor state if the cursor
> is modified in a subtransaction that aborts:

Reset is no solution --- rewinding the cursor to the beginning still leaves it in a state that is inconsistent with the restored state of the bufmgr refcounts, etc. We have to restore the executor tree to the state it was in when we entered the subtransaction, not any earlier or later state.

The problem is that that's a *major* bit of work, and probably impossible to do completely (how are you going to get a user-written SRF to restore state?). But I think it would be the best solution if we can think of a reasonable way to do it.

Another idea I've been wondering about involves leaving the cursor's state alone at subtrans abort, and instead trying to fix up the bufmgr &etc state to be correct for this situation. This seems not real easy, since I'm not sure how we distinguish state changes associated with advancing an outer cursor from those associated with completely-inside-the-subxact operations. But it seems at least theoretically doable without breaking user SRFs. Also, it's possible that we could arrange things so that major cost is incurred only when a subxact actually aborts, rather than in the main-line path of control. (Expending lots of cycles at every subxact start to save state that we might never need really sticks in my craw...)

One possible plan of attack for this approach is to abandon the notion that bufmgr per se is responsible for figuring out what to reset its state to. Instead we would insist on doing a proper shutdown of inside-the-transaction portals, and expect that doing so would bring the refcounts to where they oughta be. I think that this would have been an unworkably fragile solution back in the day when the present error recovery approach was designed, because there were too many bugs and we were often recovering from the effects of those bugs as much as anything else.
But maybe now we could get away with it.

BTW, I've been more or less ignoring the nearby debate about whether cursors ought to roll back at subxact abort or not, because right now I don't know how to implement *either* behavior. Unless we have credible theories about how to implement both, it's a bit useless to debate which is better.

			regards, tom lane
Tom Lane wrote:

> BTW, I've been more or less ignoring the nearby debate about whether
> cursors ought to roll back at subxact abort or not, because right now
> I don't know how to implement *either* behavior.  Unless we have
> credible theories about how to implement both, it's a bit useless to
> debate which is better.

If/when you have a choice -- the JDBC driver code gets easier if you don't roll back at abort. If rollback of cursor state can happen, then the driver will need to preserve client-side state associated with each cursor (i.e. the driver's idea of its position) at the start of each subtransaction so it can restore it on rollback -- or always use FETCH ABSOLUTE (which I think needs SCROLL cursors?).

-O
Tom Lane wrote:

> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>
> > Well, my opinion is that cursors and other resources should at least be
> > usable from a inner subtransaction in its parent -- because if that
> > can't be done we are wasting some of the benefits, because we can't just
> > "stick everything in a subtransaction" to be able to retry if it fails.
> > It is a pity that we can't roll back FETCH or lo_close() but at least we
> > can keep them declared/open across a subtransaction commit.
>
> AFAICS we can't allow an inner transaction to use a cursor that was
> declared in an outer transaction, because if the inner transaction fails
> then it's not just a matter of the FETCH not rolling back; the
> subtransaction abort will restore state in the bufmgr and other places
> that is simply inconsistent with the state of the cursor's plantree.

Another data point: DB2 claims this behaviour:

# The impact on cursors resulting from a ROLLBACK TO SAVEPOINT depends on the statements within the savepoint:
  * If the savepoint contains DDL on which a cursor is dependent, the cursor is marked invalid. Attempts to use such a cursor results in an error (SQLSTATE 57007).
  * Otherwise:
    o If the cursor is referenced in the savepoint, the cursor remains open and is positioned before the next logical row of the result table. (A FETCH must be performed before a positioned UPDATE or DELETE statement is issued.)
    o Otherwise, the cursor is not affected by the ROLLBACK TO SAVEPOINT (it remains open and positioned).

-O