Thread: Re: [PATCHES] COPY with no WAL, in certain circumstances
"Simon Riggs" <simon@2ndquadrant.com> writes: > The rule is: if the relfilenode for a table is new in this transaction > (and therefore the whole things will be dropped at end-of-transaction) > then *all* COPY commands are able to avoid writing WAL safely, if: > - PITR is not enabled > - there is no active portal (which could have been opened on an earlier > commandid and could therefore see data prior to the switch to the new > relfilenode). In those cases, *not* using WAL causes no problems at all, > so sleep well without it. Uh ... what in the world has an active portal got to do with it? I think you've confused snapshot considerations with crash recovery. regards, tom lane
On Sat, 2007-01-06 at 21:18 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > The rule is: if the relfilenode for a table is new in this transaction
> > (and therefore the whole thing will be dropped at end-of-transaction)
> > then *all* COPY commands are able to avoid writing WAL safely, if:
> > - PITR is not enabled
> > - there is no active portal (which could have been opened on an earlier
> >   commandid and could therefore see data prior to the switch to the new
> >   relfilenode).
> > In those cases, *not* using WAL causes no problems at all, so sleep
> > well without it.
>
> Uh ... what in the world has an active portal got to do with it?
> I think you've confused snapshot considerations with crash recovery.

The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY as
well, so the active portal consideration does apply in this case. (We
discussed the idea of setting FrozenTransactionId about a year ago; I
now agree that wouldn't work, but setting the hint bits does work.)

That is important, because otherwise the first person to read the newly
loaded table has to re-write the whole table again; right now we ignore
that cost as being associated with the original COPY, but from most
users' perspective it is part of it. It's common practice to issue a
select count(*) from the table after it's been loaded, precisely so
that later readers of the table don't suffer.

Which makes me think we can still use the no-WAL optimisation, but just
without setting HEAP_XMIN_COMMITTED when there is an active portal.

(I should also mention that the creation of the relfilenode can happen
in earlier committed subtransactions as well. There is also a great big
list of commands that issue implicit transactions, none of which can
therefore be used with this optimisation either.)

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
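The shortcut being discussed amounts to stamping each loaded row's "xmin
committed" hint bit at insert time. A minimal sketch, assuming the
8.2-era tuple header macros; the function name is illustrative, not
taken from the patch:

    /*
     * Sketch only: pre-set the "xmin committed" hint bit on a tuple
     * just loaded by COPY, so the first later reader need not consult
     * pg_clog and dirty the page again.  Only safe if no one can see
     * the new relfilenode unless our own transaction commits.
     */
    #include "access/htup.h"

    static void
    mark_tuple_preloaded(HeapTuple tuple)
    {
        tuple->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
    }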
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Sat, 2007-01-06 at 21:18 -0500, Tom Lane wrote: >> Uh ... what in the world has an active portal got to do with it? >> I think you've confused snapshot considerations with crash recovery. > The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY as > well. I think you just talked yourself out of getting this patch applied. regards, tom lane
On Sun, 2007-01-07 at 03:53 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > On Sat, 2007-01-06 at 21:18 -0500, Tom Lane wrote:
> >> Uh ... what in the world has an active portal got to do with it?
> >> I think you've confused snapshot considerations with crash recovery.
>
> > The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY
> > as well.
>
> I think you just talked yourself out of getting this patch applied.

Maybe; what would be your explanation? Do you have a failure case you
know of? Perhaps if one exists, there is another route.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
On Sun, Jan 07, 2007 at 11:46:29AM +0000, Simon Riggs wrote:
> On Sun, 2007-01-07 at 03:53 -0500, Tom Lane wrote:
> > "Simon Riggs" <simon@2ndquadrant.com> writes:
> > > The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY
> > > as well.
> >
> > I think you just talked yourself out of getting this patch applied.
>
> Maybe; what would be your explanation? Do you have a failure case you
> know of? Perhaps if one exists, there is another route.

One thing I pondered while looking at this: how do you know the user is
going to commit the transaction after the COPY is complete? Could they
run analyze or vacuum or some other DDL command on the table that would
get confused by the disparity between the hint bits and the xlog?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
On Sun, 2007-01-07 at 12:59 +0100, Martijn van Oosterhout wrote:
> One thing I pondered while looking at this: how do you know the user is
> going to commit the transaction after the COPY is complete? Could they
> run analyze or vacuum or some other DDL command on the table that would
> get confused by the disparity between the hint bits and the xlog?

If it crashes, we'll clean up the file. At end of statement it is synced
to disk. There is no failure condition where the rows continue to exist
on disk && the table relfilenode shows a committed transaction pointing
to the file containing the marked-valid-but-actually-not rows. There is
a failure condition where the new relfilenode is on disk, but the
version of the table that points to it will not be visible. (You can't
run a VACUUM inside a transaction block.) Everybody else is locked out
because the CREATE or TRUNCATE has taken an AccessExclusiveLock.

I've just re-checked the conditions from tqual.c and they all check
out, AFAICS.

There would be a problem *if* it were possible to issue a
self-referential COPY, like this:

    COPY foo FROM (select * from foo)

which would exhibit the Halloween problem. But this is not yet
possible, and if it were we would be able to check for that and avoid
it.

I'm not saying I haven't made a mistake, but I've done lots of thinking
and checking to confirm that this is a valid thing to do. That in
itself is never enough, which is why I/we talk together. If somebody
does find a problem, it's a small thing to remove it from the patch,
since it is an additional enhancement on top of the basic WAL removal.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
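For context, a sketch of the gating test the basic no-WAL part relies
on. XLogArchivingActive() and rd_createSubid are real in the tree of
that era; rd_newRelfilenodeSubid is assumed here as the analogous
marker a TRUNCATE would leave behind:

    /*
     * Sketch only: COPY may skip WAL when the file backing the new
     * relfilenode vanishes on abort, so an fsync at end of statement
     * replaces per-row WAL records -- unless PITR must see the data.
     */
    bool        use_wal = true;

    if (!XLogArchivingActive() &&
        (rel->rd_createSubid != InvalidSubTransactionId ||
         rel->rd_newRelfilenodeSubid != InvalidSubTransactionId))
        use_wal = false;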
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Sun, 2007-01-07 at 03:53 -0500, Tom Lane wrote: >> I think you just talked yourself out of getting this patch applied. > Maybe; what would be your explanation? The main reason is that you were guilty of false advertising. This patch was described as being an application of a known-and-agreed-safe optimization to a new case, viz letting COPY into a new table use a whole-file fsync instead of WAL-logging individual records. I suspect most people didn't look at it closely because it sounded like nothing very new; I certainly didn't. Now we find out that you've also decided you can subvert the MVCC system in the name of speed. This is NOT something the hackers community has discussed and agreed to, and I for one doubt that it's safe. The active-portal kluge that you've just mentioned is nothing but a kluge, proving that you thought of some cases where it would fail. But I doubt you thought of everything. In any case the correct method for dealing with a new optimization of questionable safety or value is to submit it as a separate patch, not to hope that the committer will fail to notice that the patch doesn't do what you said it did. regards, tom lane
I wrote:
> ... The active-portal kluge that you've just mentioned is nothing but
> a kluge, proving that you thought of some cases where it would fail.
> But I doubt you thought of everything.

BTW, a sufficient counterexample for that kluge is that neither SPI nor
SQL-function execution uses a separate portal for invoked commands.
Thus testing whether there's only one active portal isn't sufficient to
prove that you're not inside a function executing in serializable mode,
which could therefore have a transaction snapshot predating the COPY.

It's conceivable that it's safe anyway, or could be made so with some
rejiggering of the tests in tqual.c, but counting active portals
doesn't do anything to help.

			regards, tom lane
"Simon Riggs" <simon@2ndquadrant.com> writes: > There is no failure condition where the rows continue to exist > on disk && the table relfilenode shows a committed transaction pointing > to the file containing the marked-valid-but-actually-not rows. What of BEGIN; CREATE TABLE foo ...; SAVEPOINT x; COPY foo FROM ...; ROLLBACK TO x; COMMIT; regards, tom lane
On Sun, 2007-01-07 at 11:14 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > On Sun, 2007-01-07 at 03:53 -0500, Tom Lane wrote:
> >> I think you just talked yourself out of getting this patch applied.
>
> > Maybe; what would be your explanation?
>
> The main reason is that you were guilty of false advertising.

It was not my intention to do that, but I see that is how it has come
out. I am at fault and will withdraw that part of the patch.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
On Sun, 2007-01-07 at 11:29 -0500, Tom Lane wrote:
> I wrote:
> > ... The active-portal kluge that you've just mentioned is nothing but
> > a kluge, proving that you thought of some cases where it would fail.
> > But I doubt you thought of everything.
>
> BTW, a sufficient counterexample for that kluge is that neither SPI nor
> SQL-function execution uses a separate portal for invoked commands.
> Thus testing whether there's only one active portal isn't sufficient to
> prove that you're not inside a function executing in serializable mode,
> which could therefore have a transaction snapshot predating the COPY.

Chewing the last pieces of my Bowler hat while reading. I don't have
many left ;-(

> It's conceivable that it's safe anyway, or could be made so with some
> rejiggering of the tests in tqual.c, but counting active portals
> doesn't do anything to help.

I'll rethink, but, as you say, with a separate proposal and patch.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
On Sun, 2007-01-07 at 11:29 -0500, Tom Lane wrote:
> I wrote:
> > ... The active-portal kluge that you've just mentioned is nothing but
> > a kluge, proving that you thought of some cases where it would fail.
> > But I doubt you thought of everything.

New patch submitted to -patches on a different thread.

...continuing this discussion about setting HEAP_XMIN_COMMITTED...

> BTW, a sufficient counterexample for that kluge is that neither SPI nor
> SQL-function execution uses a separate portal for invoked commands.
> Thus testing whether there's only one active portal isn't sufficient to
> prove that you're not inside a function executing in serializable mode,
> which could therefore have a transaction snapshot predating the COPY.

What would the best/acceptable way be to test for this condition? Using

    if (IsXactIsoLevelSerializable)

would not be a very tight condition, but at least it would avoid
putting additional status flags into every transaction just to test for
this case in COPY statements. ISTM unlikely that people would try to
use COPY in Serializable mode; what do people think?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
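A sketch of where such a guard might sit in copy.c, reusing the use_wal
flag from the earlier sketch; IsXactIsoLevelSerializable is the
existing macro from access/xact.h, while set_hint_bits is invented for
illustration:

    /*
     * Sketch only: withhold the hint-bit shortcut under SERIALIZABLE,
     * since a function could be running with a transaction snapshot
     * taken before the COPY began.  The WAL skip itself stays usable.
     */
    bool        set_hint_bits = false;

    if (!use_wal && !IsXactIsoLevelSerializable)
        set_hint_bits = true;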
"Simon Riggs" <simon@2ndquadrant.com> writes: > ...continuing this discussion about setting HEAP_XMIN_COMMITTED... >> BTW, a sufficient counterexample for that kluge is that neither SPI or >> SQL-function execution use a separate portal for invoked commands. > What would the best/acceptable way be to test for this condition? My opinion is that the only reliable answer would be to find a way not to have to test. Tuples entered by your own transaction are normally considered good by tqual.c anyway, and thus I think we might be pretty close to having it Just Work, but you'd have to go through all the cases in tqual.c and see if any don't work. The other point is that to make such an optimization you have to consider the subtransaction history. For WAL you only have to know that the table will disappear if the top-level transaction fails, but to pre-set commit bits requires being sure that the table will disappear if the current subxact fails --- not the same thing at all. regards, tom lane
On Tue, 2007-01-09 at 16:31 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > ...continuing this discussion about setting HEAP_XMIN_COMMITTED...
>
> >> BTW, a sufficient counterexample for that kluge is that neither SPI
> >> nor SQL-function execution uses a separate portal for invoked
> >> commands.
>
> > What would the best/acceptable way be to test for this condition?
>
> My opinion is that the only reliable answer would be to find a way not
> to have to test. Tuples entered by your own transaction are normally
> considered good by tqual.c anyway, and thus I think we might be pretty
> close to having it Just Work, but you'd have to go through all the
> cases in tqual.c and see if any don't work.

I agree we could get this to Just Work by altering the
HeapTupleSatisfies...() functions so that their first test is

    if (TransactionIdIsCurrentTransactionId(xvac))

rather than

    if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))

I had ruled that out, unconsciously preferring the localised check in
COPY, but I agree that test was too complex.

Taking this direct approach does have a lot of promise: it looks like
HeapTupleSatisfiesSnapshot() currently does 4 if-tests to check that an
undeleted row is visible, and all other paths do much more work.
Increasing the number of checks to 5 might not hurt that much. Branch
prediction should work well for it, since when you are the
CurrentTransactionId the test would pass 100% of the time, and when
you're not, it would fail 100% of the time, so the CPU should react to
it positively, I think. I'll run some tests and see if there's a
noticeable difference.

> The other point is that to make such an optimization you have to
> consider the subtransaction history. For WAL you only have to know
> that the table will disappear if the top-level transaction fails, but
> to pre-set commit bits requires being sure that the table will
> disappear if the current subxact fails --- not the same thing at all.

Right, you reminded me of that on the other part of the thread. It
seems straightforward to put a test into COPY so that the optimisation
applies only if you're in the Xid that made the relfilenode change.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > I agree we could get this to Just Work by altering > HeapTupleSatisfies...() functions so that their first test is > if (TransactionIdIsCurrentTransactionId(xvac)) > rather then > if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)) Huh? That doesn't make any sense at all. xvac wouldn't normally be valid. I don't want to put TransactionIdIsCurrentTransactionId into the main path of the tqual routines if at all possible; it's not an especially cheap test, particularly if deeply nested in subtransactions. My thought was that for SatisfiesSnapshot, you'd fall through the first big chunk if XMIN_COMMITTED is set and then fail the XidInSnapshot test. Then a TransactionIdIsCurrentTransactionId test could be added in that fairly-unusual failure path, where it wouldn't slow the main path of control. Something like if (XidInSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot)) { if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple))) { if (HeapTupleHeaderGetCmin(tuple) >=snapshot->curcid) return false; /* inserted after scan started */ } else returnfalse; /* treat as still in progress */ } This would require rejiggering snapshots to include our own xid, see comment for XidInSnapshot. That would add some distributed cost to executions of XidInSnapshot for recently-committed tuples, but it would avoid adding any cycles to the path taken for tuples committed before the xmin horizon, which is the normal case that has to be kept fast. Haven't looked at whether an equivalent hack is possible for the other tqual routines. regards, tom lane
On Wed, 2007-01-10 at 10:37 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > I agree we could get this to Just Work by altering the
> > HeapTupleSatisfies...() functions so that their first test is
> >     if (TransactionIdIsCurrentTransactionId(xvac))

Oh? Sorry, I meant xmin, not xvac, at that point. Cut N Paste thinko.

> > rather than
> >     if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
>
> Huh? That doesn't make any sense at all. xvac wouldn't normally be
> valid.
>
> I don't want to put TransactionIdIsCurrentTransactionId into the main
> path of the tqual routines if at all possible; it's not an especially
> cheap test, particularly if deeply nested in subtransactions.

Phew, well I'm relieved. Such a mainline change did make me nervous.

> This would require rejiggering snapshots to include our own xid; see
> the comment for XidInSnapshot. That would add some distributed cost to
> executions of XidInSnapshot for recently-committed tuples, but it would
> avoid adding any cycles to the path taken for tuples committed before
> the xmin horizon, which is the normal case that has to be kept fast.
>
> Haven't looked at whether an equivalent hack is possible for the other
> tqual routines.

Will check, thanks.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com