Thread: Hot standby, slot ids and stuff
I've mentioned before that I don't like the slotid stuff. From an architectural point of view, we should try to keep the extra communication between primary and standby to a minimum, for the sake of robustness. The primary shouldn't have such a big role in keeping track of which xids it has already emitted in WAL records, which subxids have been marked, and so on.

Since I've been whining about that for some time, I figured I have to put my money where my mouth is, so here's a patch based on v6a that eliminates the concept of slotids, as well as the new xl_info2 field in XLogRecord. This seems much simpler to me. I haven't given it much testing, but it seems to work. There's a whole bunch of comments marked with XXX that need resolving, though.

Attached is a patch against CVS HEAD (hs-noslotid-1.patch) as well as an incremental patch against your v6a (hs-noslotid-against-v6a.patch). Sorry if you were just working on some big changes and I joggled your elbow. I also didn't check what changes you had in v7 yet.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment
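For readers without the patch handy, here is a rough sketch of the record header in question. The base layout follows xlog.h of that era; the xl_info2 field is the one v6a added and this patch removes, and its width and placement here are my assumption rather than something taken from the patch. The typedefs are stand-ins so the fragment compiles on its own.

#include <stdint.h>

/* stand-in typedefs; the real ones come from PostgreSQL's headers */
typedef uint32_t pg_crc32;
typedef uint32_t TransactionId;
typedef uint8_t RmgrId;
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} XLogRecPtr;

typedef struct XLogRecord
{
    pg_crc32      xl_crc;       /* CRC for this record */
    XLogRecPtr    xl_prev;      /* ptr to previous record in log */
    TransactionId xl_xid;       /* xact id */
    uint32_t      xl_tot_len;   /* total len of entire record */
    uint32_t      xl_len;       /* total len of rmgr data */
    uint8_t       xl_info;      /* flag bits */
    RmgrId        xl_rmid;      /* resource manager for this record */
    uint8_t       xl_info2;     /* v6a only: extra hot-standby flag bits,
                                 * eliminated by the no-slotid patch */
    /* the stock struct has two padding bytes here; rmgr data follows */
} XLogRecord;

The two padding bytes in the stock layout are also where a 2-byte slotid can hide without growing the record, a point that comes up again later in the thread.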
On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:
> I've mentioned before that I don't like the slotid stuff. From an
> architectural point of view, we should try to keep the extra
> communication between primary and standby to a minimum, for the sake of
> robustness. The primary shouldn't have such a big role in keeping track
> of which xids it has already emitted in WAL records, which subxids have
> been marked, and so on.
>
> Since I've been whining about that for some time, I figured I have to
> put my money where my mouth is

Yes, you do, but I think not like this. Turning up with a patch and saying "my way is better" but not actually saying what that way *is* just confuses things.

If you want to do things a different way you need to say what you want to do and what effects those changes will have. Are there tradeoffs? If so, what are they? ISTM easier to discuss them on list first than to write a load of code and ask us all to read, understand and comment.

> , so here's a patch based on v6a that
> eliminates the concept of slotids, as well as the new xl_info2 field in
> XLogRecord. This seems much simpler to me. I haven't given it much
> testing, but it seems to work. There's a whole bunch of comments marked
> with XXX that need resolving, though.
>
> Attached is a patch against CVS HEAD (hs-noslotid-1.patch) as well as an
> incremental patch against your v6a (hs-noslotid-against-v6a.patch).
> Sorry if you were just working on some big changes and I joggled your
> elbow. I also didn't check what changes you had in v7 yet.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:
> I've mentioned before that I don't like the slotid stuff. From an
> architectural point of view, we should try to keep the extra
> communication between primary and standby to a minimum, for the sake of
> robustness. The primary shouldn't have such a big role in keeping track
> of which xids it has already emitted in WAL records, which subxids have
> been marked, and so on.

Removing slotid does improve the code, I agree. It was only ever there for the functions and features it provides.

The effects of removing slotid are at least these:

* If FATAL errors occur while we have long-running transactions, we have no way of removing those entries from the recovery procs. Since we have a fixed pool of recovery transactions, we won't have anywhere to store that data. Snapshot sizes have a fixed maximum set by max_connections, so eventually we would not be able to take a snapshot at all, and we'd need an "ERROR: unable to take valid snapshot".

* If FATAL errors occur while holding AccessExclusiveLock, we have no way of releasing those locks until the recovery proc is stale, which might be some time. Not sure if your patch releases those?

* The xid->proc lookup is now very slow and needs to be called more frequently; it will still be slower even with the further optimisations you mention. Possibly a minor point with the right tuning.

* slotid is removed from the xlrec header; this makes no difference on 64-bit systems because of alignment.

...perhaps more, not sure yet. All we need to do is decide if we want these things or not. If not, then we can remove the slotid stuff.

If it wasn't clear before, this was, for me, never a discussion about a particular way of coding things. I care little for that and would probably go with your suggestions more often than not. It's simply about what we want it to do. If you want to argue in favour of restrictions to make things simpler, that's OK and I could probably live with them quite happily myself. I'm trying to implement the project philosophy of It Just Works, no restrictions or caveats - and I know if I had not then the patch would have been rejected on that basis, as has happened many times before.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> * If FATAL errors occur while we have long-running transactions, we
> have no way of removing those entries from the recovery procs. Since we
> have a fixed pool of recovery transactions, we won't have anywhere to
> store that data. Snapshot sizes have a fixed maximum set by
> max_connections, so eventually we would not be able to take a snapshot
> at all, and we'd need an "ERROR: unable to take valid snapshot".

When a backend dies with FATAL, it writes an abort record before exiting.

(I was under the impression it doesn't until a few minutes ago myself, when I actually read the shutdown code :-))

> * If FATAL errors occur while holding AccessExclusiveLock, we have no
> way of releasing those locks until the recovery proc is stale, which
> might be some time. Not sure if your patch releases those?

I don't think so; that needs to be fixed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
> If you want to do things a different way you need to say what you want
> to do and what effects those changes will have.

I want to reduce the coupling between the primary and the standby. The less they need to communicate, the better. I want to get rid of slotid, and as much of the other extra information carried in WAL records as I can. I believe that will make the patch both simpler and more robust.

> Are there tradeoffs? If so, what are they?

I don't think there's any big difference in user-visible behavior. RecordKnownAssignedTransactionId now needs to be called for every xlog record, as opposed to just the first record where an xid appears, because I eliminated the hint flag in WAL that marked those records. And it needs to look up the recovery proc by xid, instead of using the slot id. But I don't think that will have a significant impact on performance.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * If FATAL errors occur while we have long-running transactions, we
> > have no way of removing those entries from the recovery procs. Since we
> > have a fixed pool of recovery transactions, we won't have anywhere to
> > store that data. Snapshot sizes have a fixed maximum set by
> > max_connections, so eventually we would not be able to take a snapshot
> > at all, and we'd need an "ERROR: unable to take valid snapshot".
>
> When a backend dies with FATAL, it writes an abort record before exiting.
>
> (I was under the impression it doesn't until a few minutes ago myself,
> when I actually read the shutdown code :-))

Not in all cases; keep reading :-)

The good thing is we have a choice now as to whether we care about that, or not.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>> When a backend dies with FATAL, it writes an abort record before exiting.
>>
>> (I was under the impression it doesn't until a few minutes ago myself,
>> when I actually read the shutdown code :-))
>
> Not in all cases; keep reading :-)

Want to give a clue? ShutdownPostgres is registered as an on_shmem_exit hook in InitPostgres, and ShutdownPostgres calls AbortOutOfAnyTransaction.

PANIC is another story, but in that case the primary will go down, and will write a new checkpoint soon after it starts up again.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
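A toy, standalone illustration of the mechanism described here; this is not PostgreSQL source, just the same callback-registry pattern that on_shmem_exit()/proc_exit() implement, showing why even a FATAL exit still rolls back the open transaction and gets an abort record into WAL.

#include <stdio.h>
#include <stdlib.h>

#define MAX_EXIT_CALLBACKS 20

typedef void (*exit_callback) (int code, void *arg);

static struct
{
    exit_callback func;
    void         *arg;
} callbacks[MAX_EXIT_CALLBACKS];
static int ncallbacks = 0;

/* analogue of on_shmem_exit(): remember a cleanup hook */
static void
register_exit_hook(exit_callback func, void *arg)
{
    if (ncallbacks >= MAX_EXIT_CALLBACKS)
        abort();
    callbacks[ncallbacks].func = func;
    callbacks[ncallbacks].arg = arg;
    ncallbacks++;
}

/* analogue of proc_exit(): run hooks in reverse order, then exit */
static void
proc_exit_like(int code)
{
    while (ncallbacks > 0)
    {
        ncallbacks--;
        callbacks[ncallbacks].func(code, callbacks[ncallbacks].arg);
    }
    exit(code);
}

/* stand-in for ShutdownPostgres, which calls AbortOutOfAnyTransaction */
static void
shutdown_postgres(int code, void *arg)
{
    (void) code;
    (void) arg;
    printf("rolling back open transaction; abort record goes to WAL\n");
}

int
main(void)
{
    register_exit_hook(shutdown_postgres, NULL);    /* done in InitPostgres */

    /* ... session runs; a FATAL error ends up on the proc_exit() path ... */
    proc_exit_like(1);
}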
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>> When a backend dies with FATAL, it writes an abort record before exiting.
>>
>> (I was under the impression it doesn't until a few minutes ago myself,
>> when I actually read the shutdown code :-))

> Not in all cases; keep reading :-)

If it doesn't, that's a bug. A FATAL exit is not supposed to leave the shared state corrupted, it's only supposed to be a forcible session termination. Any open transaction should be rolled back.

regards, tom lane
On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> >> When a backend dies with FATAL, it writes an abort record before exiting.
> >>
> >> (I was under the impression it doesn't until a few minutes ago myself,
> >> when I actually read the shutdown code :-))
>
> > Not in all cases; keep reading :-)
>
> If it doesn't, that's a bug. A FATAL exit is not supposed to leave the
> shared state corrupted, it's only supposed to be a forcible session
> termination. Any open transaction should be rolled back.

Please look back at the earlier discussion we had on this exact point:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php

Heikki, perhaps now you understand my continued opposition to your ideas for simplification: I had already thought of them and was forced to rule them out, not through my own choice. Tom, note that I listen to what you say and try to write code that meets those requirements.

From here, I don't mind which way we go. I want code that is as simple as possible to "do the job", whatever we agree that to be.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>>>> When a backend dies with FATAL, it writes an abort record before exiting.
>>>>
>>>> (I was under the impression it doesn't until a few minutes ago myself,
>>>> when I actually read the shutdown code :-))
>>> Not in all cases; keep reading :-)
>> If it doesn't, that's a bug. A FATAL exit is not supposed to leave the
>> shared state corrupted, it's only supposed to be a forcible session
>> termination. Any open transaction should be rolled back.
>
> Please look back at the earlier discussion we had on this exact point:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php

I think the running-xacts list we dump to WAL at every checkpoint is enough to handle that. Just treat the dead transaction as in-progress until the next running-xacts record. It's presumably extremely rare to have a process die with FATAL and not write an abort record.

A related issue is that currently the recovery PANICs if it runs out of recovery procs. I think that's not acceptable, regardless of whether we use slotids or some other method to avoid it in normal operation, because it means you can't recover at all if you set max_connections too low in the standby (or in the primary, and you have to recover from a crash), or you run out of recovery procs because an abort failed in the primary, like discussed on that thread. The standby should just fast-forward to the next running-xacts record in that case.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
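A minimal sketch of that proposal, with hypothetical names throughout (RecoveryProcAdd and ProcessRunningXactsRecord are illustrative, not from the patch): on overflow the startup process gives up tracking and invalidates snapshot data, then rebuilds it from the first running-xacts record that fits.

typedef unsigned int TransactionId;

#define MAX_RECOVERY_PROCS 100      /* standby's max_connections */

static int nRecoveryProcs = 0;      /* tracked in-progress xacts */
static int snapshotDataValid = 1;   /* may standby backends take snapshots? */

static void
RecoveryProcAdd(TransactionId xid)
{
    if (nRecoveryProcs >= MAX_RECOVERY_PROCS)
    {
        /* previously: elog(PANIC, "out of recovery procs") */
        snapshotDataValid = 0;      /* fast-forward: give up tracking */
        return;
    }
    /* ... store xid in a free recovery proc ... */
    (void) xid;
    nRecoveryProcs++;
}

static void
ProcessRunningXactsRecord(const TransactionId *xids, int nxids)
{
    if (nxids <= MAX_RECOVERY_PROCS)
    {
        /* ... rebuild the recovery procs from the xid list ... */
        (void) xids;
        nRecoveryProcs = nxids;
        snapshotDataValid = 1;      /* snapshots become possible again */
    }
}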
On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> >> Simon Riggs <simon@2ndQuadrant.com> writes:
> >>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> >>>> When a backend dies with FATAL, it writes an abort record before exiting.
> >>>>
> >>>> (I was under the impression it doesn't until a few minutes ago myself,
> >>>> when I actually read the shutdown code :-))
> >>> Not in all cases; keep reading :-)
> >> If it doesn't, that's a bug. A FATAL exit is not supposed to leave the
> >> shared state corrupted, it's only supposed to be a forcible session
> >> termination. Any open transaction should be rolled back.
> >
> > Please look back at the earlier discussion we had on this exact point:
> > http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php
>
> I think the running-xacts list we dump to WAL at every checkpoint is
> enough to handle that. Just treat the dead transaction as in-progress
> until the next running-xacts record. It's presumably extremely rare to
> have a process die with FATAL and not write an abort record.

I agree, but I'll wait for Tom to speak further.

> A related issue is that currently the recovery PANICs if it runs out of
> recovery procs. I think that's not acceptable, regardless of whether we
> use slotids or some other method to avoid it in normal operation,
> because it means you can't recover at all if you set max_connections too
> low in the standby (or in the primary, and you have to recover from a
> crash), or you run out of recovery procs because an abort failed in the
> primary, like discussed on that thread.
> The standby should just
> fast-forward to the next running-xacts record in that case.

What do you mean by "fast forward"?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
>> A related issue is that currently the recovery PANICs if it runs out of
>> recovery procs. I think that's not acceptable, regardless of whether we
>> use slotids or some other method to avoid it in normal operation,
>> because it means you can't recover at all if you set max_connections too
>> low in the standby (or in the primary, and you have to recover from a
>> crash), or you run out of recovery procs because an abort failed in the
>> primary, like discussed on that thread.
>
>> The standby should just
>> fast-forward to the next running-xacts record in that case.
>
> What do you mean by "fast forward"?

I mean the standby should stop trying to track the in-progress transactions in recovery procs, and apply the WAL records like it does before the consistent state is reached.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> >> A related issue is that currently the recovery PANICs if it runs out of
> >> recovery procs. I think that's not acceptable, regardless of whether we
> >> use slotids or some other method to avoid it in normal operation,
> >> because it means you can't recover at all if you set max_connections too
> >> low in the standby (or in the primary, and you have to recover from a
> >> crash), or you run out of recovery procs because an abort failed in the
> >> primary, like discussed on that thread.
> >
> >> The standby should just
> >> fast-forward to the next running-xacts record in that case.
> >
> > What do you mean by "fast forward"?
>
> I mean the standby should stop trying to track the in-progress
> transactions in recovery procs, and apply the WAL records like it does
> before the consistent state is reached.

If you say something is not acceptable you need to say what behaviour you do find acceptable in its place. It's good to come up with new ideas, but it's not good to ignore the problems the new ideas have. This is a general point that applies both here and to your proposals with synch rep. It's much harder to say how it should work in a way that covers all the requirements and points others have made; otherwise it's just handwaving.

So, if we don't PANIC, how should we behave? Without full information on running-xacts we would be unable to take a snapshot, so should:

* backends be forcibly disconnected?
* backends hang waiting for snapshot info to become available again, within X minutes' worth of WAL time?
* backends throw an ERROR: unable to provide snapshot at this time, DETAIL: retry your statement later.
...other alternatives

and possibly prevent new connections.

If max_connections is higher on the primary then the standby will *never* be available for querying. Should we have multiple ERRORs depending upon whether the situation is hopefully-temporary or looks-permanent?

Don't assume I want the PANIC. That clearly needs to be revisited if we change slotids. I just want to make a balanced judgement based upon a full consideration of the options.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
>> I mean the standby should stop trying to track the in-progress
>> transactions in recovery procs, and apply the WAL records like it does
>> before the consistent state is reached.
>
> ...
>
> So, if we don't PANIC, how should we behave?
>
> Without full information on running-xacts we would be unable to take a
> snapshot, so should:
> * backends be forcibly disconnected?
> * backends hang waiting for snapshot info to become available again,
> within X minutes' worth of WAL time?
> * backends throw an ERROR: unable to provide snapshot at this time,
> DETAIL: retry your statement later.
> ...other alternatives
>
> and possibly prevent new connections.

All of those seem reasonable to me. The 2nd option seems nicest; "X minutes" should probably be controlled by max_standby_delay, after which you can throw an error.

If we care enough, we could also keep tracking the transactions in backend-private memory of the startup process, until there's enough room in the proc array. That would make the outage shorter, because you wouldn't have to wait until the next running-xacts record, but only until enough transactions have finished that they all fit in the proc array again.

But whatever is the simplest, really.

> If max_connections is higher on the primary then the standby will *never*
> be available for querying. Should we have multiple ERRORs depending upon
> whether the situation is hopefully-temporary or looks-permanent?
>
> Don't assume I want the PANIC. That clearly needs to be revisited if we
> change slotids.

It needs to be revisited whether we change slotids or not, IMHO.

Note that with slotids, you have a problem as soon as any of the slots that don't exist on the standby are used, regardless of how many concurrent transactions there actually are. Without slots you only have a problem if you really have more than the standby's max_connections concurrent transactions. That makes a big difference in practice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
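Roughly what that second option could look like on the backend side; wait_for_valid_snapshot_data is a hypothetical function, plain stdio stands in for ereport(), and max_standby_delay is the timeout suggested above.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile int snapshot_data_valid = 0;  /* flipped by the startup process */
static int max_standby_delay = 30;            /* seconds, per the GUC */

static void
wait_for_valid_snapshot_data(void)
{
    int waited;

    for (waited = 0; !snapshot_data_valid; waited++)
    {
        if (waited >= max_standby_delay)
        {
            /* the real code would ereport(ERROR, ...) here */
            fprintf(stderr, "ERROR:  unable to provide snapshot at this time\n");
            fprintf(stderr, "DETAIL:  retry your statement later\n");
            exit(1);
        }
        sleep(1);   /* the real code would pg_usleep() or similar */
    }
}

int
main(void)
{
    wait_for_valid_snapshot_data();   /* blocks until tracking resumes, or errors */
    return 0;
}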
On Fri, 2009-01-09 at 14:38 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> >> I mean the standby should stop trying to track the in-progress
> >> transactions in recovery procs, and apply the WAL records like it does
> >> before the consistent state is reached.
> >
> > ...
> >
> > So, if we don't PANIC, how should we behave?
> >
> > Without full information on running-xacts we would be unable to take a
> > snapshot, so should:
> > * backends be forcibly disconnected?
> > * backends hang waiting for snapshot info to become available again,
> > within X minutes' worth of WAL time?
> > * backends throw an ERROR: unable to provide snapshot at this time,
> > DETAIL: retry your statement later.
> > ...other alternatives
> >
> > and possibly prevent new connections.
>
> All of those seem reasonable to me. The 2nd option seems nicest; "X
> minutes" should probably be controlled by max_standby_delay, after which
> you can throw an error.

Hmm, we use the recovery procs to track transactions that have TransactionIds assigned. That means we will overflow only if we approach 100% write transactions at any time, or if we have more write transactions in progress than we have max_connections on the standby. So it sounds like the overflow situation would probably be both rare and, if it did occur, may not occur for long periods.

> If we care enough, we could also keep tracking the transactions in
> backend-private memory of the startup process, until there's enough room
> in the proc array. That would make the outage shorter, because you
> wouldn't have to wait until the next running-xacts record, but only until
> enough transactions have finished that they all fit in the proc array
> again.
>
> But whatever is the simplest, really.

The above does sound best, since it would allow us to have the snapshot hang for a short period. But at this stage of the game, more complex. For now, since it looks like it would happen fairly rarely, I'd opt for the simplest: throw an ERROR.

> > If max_connections is higher on the primary then the standby will *never*
> > be available for querying. Should we have multiple ERRORs depending upon
> > whether the situation is hopefully-temporary or looks-permanent?
> >
> > Don't assume I want the PANIC. That clearly needs to be revisited if we
> > change slotids.
>
> It needs to be revisited whether we change slotids or not, IMHO.
>
> Note that with slotids, you have a problem as soon as any of the slots
> that don't exist on the standby are used, regardless of how many
> concurrent transactions there actually are. Without slots you only have a
> problem if you really have more than the standby's max_connections
> concurrent transactions. That makes a big difference in practice.

Sometimes, but mostly people set max_connections higher because they intend to use those extra connections. So no real advantage there against the slotid approach :-)

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Fri, 2009-01-09 at 11:20 +0000, Simon Riggs wrote:
> On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> > >> Simon Riggs <simon@2ndQuadrant.com> writes:
> > >>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> > >>>> When a backend dies with FATAL, it writes an abort record before exiting.
> > >>>>
> > >>>> (I was under the impression it doesn't until a few minutes ago myself,
> > >>>> when I actually read the shutdown code :-))
> > >>> Not in all cases; keep reading :-)
> > >> If it doesn't, that's a bug. A FATAL exit is not supposed to leave the
> > >> shared state corrupted, it's only supposed to be a forcible session
> > >> termination. Any open transaction should be rolled back.
> > >
> > > Please look back at the earlier discussion we had on this exact point:
> > > http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php
> >
> > I think the running-xacts list we dump to WAL at every checkpoint is
> > enough to handle that. Just treat the dead transaction as in-progress
> > until the next running-xacts record. It's presumably extremely rare to
> > have a process die with FATAL and not write an abort record.
>
> I agree, but I'll wait for Tom to speak further.

OK, will proceed without this. Time is pressing.

Heikki and I both agree that FATAL errors that fail to write abort records are rare and an acceptable problem in real usage. If they do occur, their nuisance factor is short-lived because of measures taken within the patch. Hot Standby does not *rely* upon there always being an abort record for FATAL errors, so we cannot reasonably say the current design would be "unacceptably fragile", as I had once thought.

So based upon that, out comes the slotid concept and, luckily, many of the cruftier aspects of the patch. Less code, probably fewer bugs. Good thing.

I will produce a new v7 with those changes and merge the changes for v6b also, so we can begin review again from there. Hi ho, hi ho...

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:
> Since I've been whining about that for some time, I figured I have to
> put my money where my mouth is, so here's a patch based on v6a that
> eliminates the concept of slotids, as well as the new xl_info2 field in
> XLogRecord. This seems much simpler to me. I haven't given it much
> testing, but it seems to work. There's a whole bunch of comments marked
> with XXX that need resolving, though.

There's a confusion in the patch between the top-level xid and the parent xid. The xl field is named parentxid but actually contains the top-level xid. That distinction is important because the code now uses the top-level xid to locate the recovery proc, formerly the role of the slotid.

This leads to an error when we SubTransSetParent(child_xid, top_xid), since this assumes that the top_xid is the parent, which it is not. Mostly you wouldn't notice unless you were looking up the subtrans status for an xid that had committed but was the child of an aborted subtransaction, with the top-level xid having > 64 subtransactions. It's possible the confusion leads to other bugs in UnobservedXid processing, but I didn't look too hard at that.

AFAICS we need both the parent and top xids. Or put another way, we need the parent xid and other info that allows us to uniquely determine the proc we need to update. Now the "other info..." could be the top xid, or it could also be the slotid, which then avoids later zig-zagging to look up the proc. I wasn't looking for ways to reintroduce slotid, but it seems more logical to keep slotid in light of the above. However, you will probably view this as intransigence, so I will await your input.

I'm very happy that GetStandbyInfoForTransaction() and all the XLR2 flags have bitten the dust and will sleep for eternity.

For xl_rel_lock you add a field called xid and then ask /* xid of the *parent* transaction. XXX why parent? */. You've done this because it replaced slotid. But looking at that, I think the v6a patch had a bug there: a subtransaction abort record would release locks for the whole top-level xact. So we need to pass both the top-level xid (or slotid) and the xid for each lock, then release using the actual xid only.

You also ask: shouldn't we call StartupSUBTRANS() and the other startup functions like we do below, before letting anyone in? My answer is that the current role of StartupSUBTRANS and friends is not appropriate at that point, since they zero out those structures. I left those routines in place thinking "startup" meant "moving to normal running". If we did have a "startupsubtrans" at the point you note, it would currently be empty: we don't keep track of the latest page during recovery. Perhaps we should, but then we'd need to do the equivalent of ExtendSUBTRANS etc., which it seemed easier to avoid.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sat, 2009-01-10 at 09:40 +0000, Simon Riggs wrote:
> This leads to an error when we SubTransSetParent(child_xid, top_xid),
> since this assumes that the top_xid is the parent, which it is not.
> Mostly you wouldn't notice unless you were looking up the subtrans
> status for an xid that had committed but was the child of an aborted
> subtransaction, with the top-level xid having > 64 subtransactions.
> It's possible the confusion leads to other bugs in UnobservedXid
> processing, but I didn't look too hard at that.
>
> AFAICS we need both the parent and top xids.

I wonder if it's possible to derive the parent by looking at the highest/most newly assigned xid? Abort records would remove aborted subtransactions, and AFAIK we currently only ever assign a new subtransaction from the latest current subtransaction. (This wouldn't necessarily be true if we supported true branch-anywhere subtransactions, but we don't.) Sounds correct, but I'm not really sure.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sat, 2009-01-10 at 11:45 +0000, Simon Riggs wrote:
> On Sat, 2009-01-10 at 09:40 +0000, Simon Riggs wrote:
>
> > This leads to an error when we SubTransSetParent(child_xid, top_xid),
> > since this assumes that the top_xid is the parent, which it is not.
> > Mostly you wouldn't notice unless you were looking up the subtrans
> > status for an xid that had committed but was the child of an aborted
> > subtransaction, with the top-level xid having > 64 subtransactions.
> > It's possible the confusion leads to other bugs in UnobservedXid
> > processing, but I didn't look too hard at that.
> >
> > AFAICS we need both the parent and top xids.
>
> I wonder if it's possible to derive the parent by looking at the
> highest/most newly assigned xid? Abort records would remove aborted
> subtransactions, and AFAIK we currently only ever assign a new
> subtransaction from the latest current subtransaction. (This wouldn't
> necessarily be true if we supported true branch-anywhere
> subtransactions, but we don't.) Sounds correct, but I'm not really sure.

Starting to sound like a do-me-later-if-ever optimisation, and certainly nothing I want to rely on in court. I'm progressing with parent_xid added to the xlog record header, for now.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> There's a confusion in the patch between the top-level xid and the
> parent xid. The xl field is named parentxid but actually contains the
> top-level xid. That distinction is important because the code now uses
> the top-level xid to locate the recovery proc, formerly the role of the
> slotid.

True, I changed the meaning halfway through. My thinking was that we can get away by only tracking the top-level xact of each subtransaction, not the whole tree of subtransactions.

> This leads to an error when we SubTransSetParent(child_xid, top_xid),
> since this assumes that the top_xid is the parent, which it is not.
> Mostly you wouldn't notice unless you were looking up the subtrans
> status for an xid that had committed but was the child of an aborted
> subtransaction, with the top-level xid having > 64 subtransactions.

Hmm. When a subtransaction aborts, RecordTransactionAbort is called and clog is updated to show the subtransaction and all its subcommitted children as aborted. I think we're safe, though it wouldn't hurt to double-check. It's an important point that needs documenting, though. I completely neglected that.

> I wasn't looking for ways to reintroduce slotid, but it seems more
> logical to keep slotid in light of the above. However, you will probably
> view this as intransigence, so I will await your input.

Yeah, it sure does seem like intransigence :-)

> For xl_rel_lock you add a field called xid and then ask
> /* xid of the *parent* transaction. XXX why parent? */.
> You've done this because it replaced slotid. But looking at that, I
> think the v6a patch had a bug there: a subtransaction abort record would
> release locks for the whole top-level xact. So we need to pass both the
> top-level xid (or slotid) and the xid for each lock, then release using
> the actual xid only.

Hmm, would just the subtransaction xid be enough? If we use that to release, what do we need the top-level xid for?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
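The safety argument can be spelled out with a toy model (stand-in array and names, not the real clog code): abort processing in the TransactionIdAbortTree() style marks the subtransaction and every child it reports, including subcommitted ones, as aborted, so no later visibility check needs the intermediate parent's status.

typedef unsigned int TransactionId;

enum clog_status
{
    CLOG_IN_PROGRESS,
    CLOG_COMMITTED,
    CLOG_ABORTED,
    CLOG_SUB_COMMITTED
};

static enum clog_status clog[4096];     /* toy stand-in for pg_clog */

/* toy analogue of TransactionIdAbortTree() */
static void
abort_tree(TransactionId xid, const TransactionId *children, int nchildren)
{
    int i;

    /* children go first, even those already marked SUB_COMMITTED */
    for (i = 0; i < nchildren; i++)
        clog[children[i]] = CLOG_ABORTED;
    clog[xid] = CLOG_ABORTED;
}

With xids 100 (top level), 101 (aborting subxact) and 102 (subcommitted child of 101), replaying 101's abort record marks both 101 and 102 aborted, so linking 102 straight to 100 in subtrans would not change the visibility verdict.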
On Sun, 2009-01-11 at 10:41 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > There's a confusion in the patch between the top-level xid and the
> > parent xid. The xl field is named parentxid but actually contains the
> > top-level xid. That distinction is important because the code now uses
> > the top-level xid to locate the recovery proc, formerly the role of the
> > slotid.
>
> True, I changed the meaning halfway through. My thinking was that we can
> get away by only tracking the top-level xact of each subtransaction, not
> the whole tree of subtransactions.
>
> > This leads to an error when we SubTransSetParent(child_xid, top_xid),
> > since this assumes that the top_xid is the parent, which it is not.
> > Mostly you wouldn't notice unless you were looking up the subtrans
> > status for an xid that had committed but was the child of an aborted
> > subtransaction, with the top-level xid having > 64 subtransactions.
>
> Hmm. When a subtransaction aborts, RecordTransactionAbort is called and
> clog is updated to show the subtransaction and all its subcommitted
> children as aborted. I think we're safe, though it wouldn't hurt to
> double-check.

That part of your argument is correct, but we are not safe. But please look at XactLockTableWait() and see what you think. According to the comments there, this would lead to different lock release behaviour.

Beauty, thy name is not subtransaction.

If you agree that we need the parent xid then we have further problems. Adding another xid onto the header of each WAL record will add 8 bytes onto each record because of alignment. This was the other reason for slotid, since I was able to fit that into just 2 bytes and avoid the 8-byte loss. Moving swiftly on, I have this morning thought of an alternative, so at least we now have some choice. Rather than store the parent xid itself, we store the difference between the current xid and the parent xid. Typically this will be less than 65535; when it is not, we set it to zero but issue an xid-assignment xlog record.

However, I think XactLockTableWait() doesn't need to know the parent either. (This feels more like wishful thinking, but here goes anyway.) We release locks *after* TransactionIdAbortTree() has fully executed, so the test for TransactionIdIsInProgress(xid) will always see the abort status, if set. Notice that if we are awake at all it is because the top-level transaction is complete or our subxid is aborted. So there is never any need to look at the status of the parent, nor in fact any need to look at the procarray at all, which is always a waste of effort. If you believe that, you'll want to commit the attached patch (or something similar with comments refactored etc).

> > For xl_rel_lock you add a field called xid and then ask
> > /* xid of the *parent* transaction. XXX why parent? */.
> > You've done this because it replaced slotid. But looking at that, I
> > think the v6a patch had a bug there: a subtransaction abort record would
> > release locks for the whole top-level xact. So we need to pass both the
> > top-level xid (or slotid) and the xid for each lock, then release using
> > the actual xid only.
>
> Hmm, would just the subtransaction xid be enough? If we use that to
> release, what do we need the top-level xid for?

So we can release all locks taken by subxids at once, when we learn that a top-level xid has disappeared. A minor point.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachment
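The delta idea from the message above could look like this sketch (hypothetical helper names; xid wraparound is ignored for brevity): the parent travels as a 16-bit distance below the record's own xid, with zero as the doesn't-fit sentinel that forces a separate xid-assignment record.

#include <stdint.h>

typedef uint32_t TransactionId;

/* returns 0 when the parent must travel in an xid-assignment record */
static uint16_t
encode_parent_delta(TransactionId xid, TransactionId parent_xid)
{
    TransactionId delta = xid - parent_xid;   /* parent was assigned first */

    if (delta == 0 || delta >= UINT16_MAX)
        return 0;
    return (uint16_t) delta;
}

/* returns 0 (an invalid xid) when the delta was the sentinel */
static TransactionId
decode_parent_delta(TransactionId xid, uint16_t delta)
{
    return (delta == 0) ? 0 : xid - delta;
}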
Heikki, can I get your feedback on this urgently please? I want to respond positively to your review comments and complete something you will find acceptable. But I need to know what that is, please.

On Sun, 2009-01-11 at 11:55 +0000, Simon Riggs wrote:
> On Sun, 2009-01-11 at 10:41 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > There's a confusion in the patch between the top-level xid and the
> > > parent xid. The xl field is named parentxid but actually contains the
> > > top-level xid. That distinction is important because the code now uses
> > > the top-level xid to locate the recovery proc, formerly the role of
> > > the slotid.
> >
> > True, I changed the meaning halfway through. My thinking was that we can
> > get away by only tracking the top-level xact of each subtransaction, not
> > the whole tree of subtransactions.
> >
> > > This leads to an error when we SubTransSetParent(child_xid, top_xid),
> > > since this assumes that the top_xid is the parent, which it is not.
> > > Mostly you wouldn't notice unless you were looking up the subtrans
> > > status for an xid that had committed but was the child of an aborted
> > > subtransaction, with the top-level xid having > 64 subtransactions.
> >
> > Hmm. When a subtransaction aborts, RecordTransactionAbort is called and
> > clog is updated to show the subtransaction and all its subcommitted
> > children as aborted. I think we're safe, though it wouldn't hurt to
> > double-check.
>
> That part of your argument is correct, but we are not safe. But please
> look at XactLockTableWait() and see what you think. According to the
> comments there, this would lead to different lock release behaviour.
>
> Beauty, thy name is not subtransaction.
>
> If you agree that we need the parent xid then we have further problems.
> Adding another xid onto the header of each WAL record will add 8 bytes
> onto each record because of alignment. This was the other reason for
> slotid, since I was able to fit that into just 2 bytes and avoid the
> 8-byte loss. Moving swiftly on, I have this morning thought of an
> alternative, so at least we now have some choice. Rather than store the
> parent xid itself, we store the difference between the current xid and
> the parent xid. Typically this will be less than 65535; when it is not,
> we set it to zero but issue an xid-assignment xlog record.
>
> However, I think XactLockTableWait() doesn't need to know the parent
> either. (This feels more like wishful thinking, but here goes anyway.)
> We release locks *after* TransactionIdAbortTree() has fully executed, so
> the test for TransactionIdIsInProgress(xid) will always see the abort
> status, if set. Notice that if we are awake at all it is because the
> top-level transaction is complete or our subxid is aborted. So there is
> never any need to look at the status of the parent, nor in fact any need
> to look at the procarray at all, which is always a waste of effort. If
> you believe that, you'll want to commit the attached patch (or something
> similar with comments refactored etc).
>
> > > For xl_rel_lock you add a field called xid and then ask
> > > /* xid of the *parent* transaction. XXX why parent? */.
> > > You've done this because it replaced slotid. But looking at that, I
> > > think the v6a patch had a bug there: a subtransaction abort record
> > > would release locks for the whole top-level xact. So we need to pass
> > > both the top-level xid (or slotid) and the xid for each lock, then
> > > release using the actual xid only.
> >
> > Hmm, would just the subtransaction xid be enough? If we use that to
> > release, what do we need the top-level xid for?
>
> So we can release all locks taken by subxids at once, when we learn that
> a top-level xid has disappeared. A minor point.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 2009-01-12 at 14:10 +0200, Heikki Linnakangas wrote:
> > However, I think XactLockTableWait() doesn't need to know the parent
> > either. (This feels more like wishful thinking, but here goes anyway.)
> > We release locks *after* TransactionIdAbortTree() has fully executed, so
> > the test for TransactionIdIsInProgress(xid) will always see the abort
> > status, if set. Notice that if we are awake at all it is because the
> > top-level transaction is complete or our subxid is aborted. So there is
> > never any need to look at the status of the parent, nor in fact any need
> > to look at the procarray at all, which is always a waste of effort.
>
> Right, we don't currently write a WAL record at subtransaction commit,
> only at subtransaction abort or top-level commit. So the problem
> described in the comment at XactLockTableWait() can't arise in the standby.

Good. So we can exclude parent_xid and just include the top-level xid. We are now officially back inside the current xlog record padding: we don't need to increase the WAL record length, and therefore we can lose slotid (yay!).

> Actually, I wonder if we should write a WAL record at subtransaction
> commit too, to save on shared memory in the standby as well.

You understand that the current design never performs XactLockTableInsert() for individual top/sub transactions? There would be no memory overhead to save. The only locks allowed are AccessShareLocks, which never conflict with anything except AccessExclusiveLocks. If an AEL conflicts with an ASL, we blow away the ASL holders. If an ASL requestor is blocked by an AEL, the wait is performed on the single lock table entry for the Startup process, which acts as a proxy for all emulated AEL lock holders.

> > If
> > you believe that, you'll want to commit the attached patch (or something
> > similar with comments refactored etc).
>
> Umm, we still need the SubTransGetParent() call in normal operation.

OK, that was really just a thought experiment anyway, to prove the point one way or the other.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> Rather than store the
> parent xid itself, we store the difference between the current xid and
> the parent xid. Typically this will be less than 65535; when it is not,
> we set it to zero but issue an xid-assignment xlog record.

That sounds pretty hacky.

> However, I think XactLockTableWait() doesn't need to know the parent
> either. (This feels more like wishful thinking, but here goes anyway.)
> We release locks *after* TransactionIdAbortTree() has fully executed, so
> the test for TransactionIdIsInProgress(xid) will always see the abort
> status, if set. Notice that if we are awake at all it is because the
> top-level transaction is complete or our subxid is aborted. So there is
> never any need to look at the status of the parent, nor in fact any need
> to look at the procarray at all, which is always a waste of effort.

Right, we don't currently write a WAL record at subtransaction commit, only at subtransaction abort or top-level commit. So the problem described in the comment at XactLockTableWait() can't arise in the standby.

Actually, I wonder if we should write a WAL record at subtransaction commit too, to save on shared memory in the standby as well.

> If
> you believe that, you'll want to commit the attached patch (or something
> similar with comments refactored etc).

Umm, we still need the SubTransGetParent() call in normal operation.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
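For context, the loop being discussed looks roughly like this; this is a paraphrase of that era's lmgr.c from memory, so treat it as a sketch rather than the exact source. Since no WAL record is written at subtransaction commit, a subcommitted xid's lock is gone while the xid still shows as in progress, and the loop must climb to the parent.

void
XactLockTableWait(TransactionId xid)
{
    LOCKTAG     tag;

    for (;;)
    {
        /* block until whoever holds the xid's lock releases it */
        SET_LOCKTAG_TRANSACTION(tag, xid);
        (void) LockAcquire(&tag, ShareLock, false, false);
        LockRelease(&tag, ShareLock, false);

        /* truly finished (committed or aborted): done waiting */
        if (!TransactionIdIsInProgress(xid))
            break;

        /*
         * Still in progress but the lock is gone: the subtransaction
         * committed, so wait on its parent instead.
         */
        xid = SubTransGetParent(xid);
    }
}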
On Sat, 2009-01-10 at 09:40 +0000, Simon Riggs wrote:
> But looking at that, I
> think the v6a patch had a bug there: a subtransaction abort record
> would release locks for the whole top-level xact.

Fixed.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > If you want to do things a different way you need to say what you want
> > to do and what effects those changes will have.
>
> I want to reduce the coupling between the primary and the standby. The
> less they need to communicate, the better. I want to get rid of slotid,
> and as much of the other extra information carried in WAL records as I
> can. I believe that will make the patch both simpler and more robust.
>
> > Are there tradeoffs? If so, what are they?
>
> I don't think there's any big difference in user-visible behavior.

I notice that we are no longer able to record the databaseId for a connection, so query conflict resolution cannot be isolated to individual databases any longer. We might sometimes infer a transaction's databaseId from certain WAL records, but that is only going to be possible within each rmgr, which is fairly strange.

I'll leave all of the databaseId stuff in there in case we think of anything good.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2009-01-15 at 22:10 +0000, Simon Riggs wrote:
> I notice that we are no longer able to record the databaseId for a
> connection, so query conflict resolution cannot be isolated to
> individual databases any longer.

Wrong way round. Incoming WAL records have the dbOid on them, so we can still check for conflicts against the db of the standby backends. Phew!

> I'll leave all of the databaseId stuff in there in case we think of
> anything good.

Ripping out now as the final part of the earlier refactoring.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support