Thread: Hot standby, slot ids and stuff

Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
I've mentioned before that I don't like the slotid stuff. From an
architectural point of view, we should try to keep the extra
communication between primary and standby to a minimum, for the sake of
robustness. The primary shouldn't have such a big role in keeping track
of which xids it has already emitted in WAL records, which subxids have
been marked, and so on.

Since I've been whining about that for some time, I figured I have to
put my money where my mouth is, so here's a patch based on v6a that
eliminates the concept of slotids, as well as the new xl_info2 field in
XLogRecord. This seems much simpler to me. I haven't given it much
testing, but seems to work. There's a whole bunch of comments marked
with XXX that need resolving, though.

Attached is a patch against CVS HEAD (hs-noslotid-1.patch) as well as an
incremental patch against your v6a (hs-noslotid-against-v6a.patch).
Sorry if you were just working on some big changes and I joggled your
elbow. I also didn't check what changes you had in v7 yet.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> I've mentioned before that I don't like the slotid stuff. From an 
> architectural point of view, we should try to keep the extra 
> communication between primary and standby to a minimum, for the sake of 
> robustness. The primary shouldn't have such a big role in keeping track 
> of which xids it has already emitted in WAL records, which subxids have 
> been marked, and so on.
> 
> Since I've been whining about that for some time, I figured I have to 
> put my money where my mouth is

Yes, you do, but I think not like this. Turning up with a patch and
saying "my way is better" but not actually saying what that way *is*
just confuses things.

If you want to do things a different way you need to say what you want
to do and what effects those changes will have. Are there tradeoffs? If
so what are they? ISTM easier to discuss them on list first than to
write a load of code and ask us all to read, understand and comment.

> , so here's a patch based on v6a that 
> eliminates the concept of slotids, as well as the new xl_info2 field in 
> XLogRecord. This seems much simpler to me. I haven't given it much 
> testing, but seems to work. There's a whole bunch of comments marked 
> with XXX that need resolving, though.
> 
> Attached is a patch agains CVS HEAD (hs-noslotid-1.patch) as well as an 
> incremental patch against your v6a (hs-noslotid-against-v6a.patch). 
> Sorry if you were just working on some big changes and I joggled your 
> elbow. I also didn't check what changes you had in v7 yet.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> I've mentioned before that I don't like the slotid stuff. From an 
> architectural point of view, we should try to keep the extra 
> communication between primary and standby to a minimum, for the sake of 
> robustness. The primary shouldn't have such a big role in keeping track 
> of which xids it has already emitted in WAL records, which subxids have 
> been marked, and so on.

Removing slotid does improve the code, I agree. It was never there for
reasons other than the functions and features it provides.

Effects of removing slotid are at least these

* if FATAL errors occur, yet we have long running transactions then we
have no way of removing those entries from the recovery procs. Since we
have a fixed pool of recovery transactions we won't have anywhere to
store that data. Snapshot sizes are fixed maximum with max_connections,
so eventually we would not be able to take a snapshot at all, and we'd
need to have a "ERROR:  unable to take valid snapshot". 

* if FATAL errors occur while holding AccessExclusiveLock then we have
no way of releasing those locks until the recovery proc is stale, which
might be some time. Not sure if your patch releases those?

* xid->proc lookup is now very slow and needs to be called more
frequently; it will still be slower even with the further optimisations
you mention. Possibly a minor point with the right tuning.

* slotid removed from xlrec header; makes no difference on 64-bit
systems because of alignment issues.

...perhaps more, not sure yet.

All we need to do is decide if we want these things or not. If not, then
we can remove the slotid stuff.

If it wasn't clear before, this was, for me, never a discussion about a
particular way of coding things. I care little for that and would
probably go with your suggestions more often than not. It's just simply
about what we want it to do. If you want to argue in favour of
restrictions to make things simpler, that's OK and I could probably live
with them quite happily myself. I'm trying to implement the project
philosophy of It Just Works, no restrictions or caveats - and I know if
I had not then the patch would have been rejected on that basis, as has
happened many times before.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> * if FATAL errors occur, yet we have long running transactions then we
> have no way of removing those entries from the recovery procs. Since we
> have a fixed pool of recovery transactions we won't have anywhere to
> store that data. Snapshot sizes are fixed maximum with max_connections,
> so eventually we would not be able to take a snapshot at all, and we'd
> need to have a "ERROR:  unable to take valid snapshot". 

When a backend dies with FATAL, it writes an abort record before exiting.

(I was under the impression it doesn't until a few minutes ago myself, 
when I actually read the shutdown code :-))

> * if FATAL errors occur while holding AccessExclusiveLock then we have
> no way of releasing those locks until the recovery proc is stale, which
> might be some time. Not sure if your patch releases those?

I don't think so, that needs to be fixed.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> If you want to do things a different way you need to say what you want
> to do and what effects those changes will have. 

I want to reduce the coupling between the primary and the standby. The 
less they need to communicate, the better. I want to get rid of slotid, 
and as much of the other extra information carried in WAL records as I 
can. I believe that will make the patch both simpler and more robust.

> Are there tradeoffs? If so what are they?

I don't think there's any big difference in user-visible behavior. 
RecordKnownAssignedTransactionId now needs to be called for every xlog 
record as opposed to just the first record where an xid appears, because 
I eliminated the hint flag in WAL to mark those records. And it needs to 
look up the recovery proc by xid, instead of using the slot id. But I 
don't think that will have a significant impact on performance.
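
To make that concrete without asserting how either patch does it, here is
a toy sketch of the two lookup styles (RecoveryProc, proc_by_slot and
proc_by_xid are invented names, purely for illustration):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct RecoveryProc
{
    TransactionId xid;          /* 0 = slot unused */
} RecoveryProc;

#define MAX_RECOVERY_PROCS 8

static RecoveryProc procs[MAX_RECOVERY_PROCS];

/* With slotids: the WAL record carries the index, so this is a direct fetch. */
static RecoveryProc *
proc_by_slot(int slotid)
{
    return &procs[slotid];
}

/* Without slotids: find the proc by xid; a scan here, a hash table later. */
static RecoveryProc *
proc_by_xid(TransactionId xid)
{
    int     i;

    for (i = 0; i < MAX_RECOVERY_PROCS; i++)
        if (procs[i].xid == xid)
            return &procs[i];
    return NULL;                /* xid not tracked yet */
}

int
main(void)
{
    procs[3].xid = 1234;
    printf("by slot: xid %u\n", proc_by_slot(3)->xid);
    printf("by xid:  slot %ld\n", (long) (proc_by_xid(1234) - procs));
    return 0;
}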

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * if FATAL errors occur, yet we have long running transactions then we
> > have no way of removing those entries from the recovery procs. Since we
> > have a fixed pool of recovery transactions we won't have anywhere to
> > store that data. Snapshot sizes are fixed maximum with max_connections,
> > so eventually we would not be able to take a snapshot at all, and we'd
> > need to have a "ERROR:  unable to take valid snapshot". 
> 
> When a backend dies with FATAL, it writes an abort record before exiting.
> 
> (I was under the impression it doesn't until few minutes ago myself, 
> when I actually read the shutdown code :-))

Not in all cases; keep reading :-)

The good thing is we have a choice now as to whether we care about that,
or not.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>> When a backend dies with FATAL, it writes an abort record before exiting.
>>
>> (I was under the impression it doesn't until few minutes ago myself, 
>> when I actually read the shutdown code :-))
> 
> Not in all cases; keep reading :-)

Want to give a clue? ShutdownPostgres is registered as an on_shmem_exit 
hook in InitPostgres, and ShutdownPostgres calls AbortOutOfAnyTransaction.
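
Here is that flow as a stand-alone toy, with atexit() standing in for
on_shmem_exit() and the function bodies invented for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

static bool xact_open = false;

/* Stand-in for AbortOutOfAnyTransaction(): roll back whatever is still
 * open, which in the real backend also emits the abort WAL record. */
static void
AbortOutOfAnyTransaction(void)
{
    if (xact_open)
    {
        printf("writing abort record and rolling back\n");
        xact_open = false;
    }
}

/* Stand-in for ShutdownPostgres(); the real one is registered with
 * on_shmem_exit() in InitPostgres, here plain atexit() plays that role. */
static void
ShutdownPostgres(void)
{
    AbortOutOfAnyTransaction();
}

int
main(void)
{
    atexit(ShutdownPostgres);
    xact_open = true;           /* pretend a transaction is in progress */
    exit(1);                    /* a FATAL exit still runs the hook */
}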

PANIC is another story, but in that case the primary will go down, and 
will write a new checkpoint soon after it starts up again.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>> When a backend dies with FATAL, it writes an abort record before exiting.
>> 
>> (I was under the impression it doesn't until few minutes ago myself, 
>> when I actually read the shutdown code :-))

> Not in all cases; keep reading :-)

If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
shared state corrupted, it's only supposed to be a forcible session
termination.  Any open transaction should be rolled back.
        regards, tom lane


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> >> When a backend dies with FATAL, it writes an abort record before exiting.
> >> 
> >> (I was under the impression it doesn't until few minutes ago myself, 
> >> when I actually read the shutdown code :-))
> 
> > Not in all cases; keep reading :-)
> 
> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
> shared state corrupted, it's only supposed to be a forcible session
> termination.  Any open transaction should be rolled back.

Please look back at the earlier discussion we had on this exact point:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php

Heikki, perhaps now you understand my continued opposition to your ideas
for simplification: I had already thought of them and was forced to rule
them out, not through my own choice. Tom, note that I listen to what you
say and try to write code that meets those requirements.

From here, I don't mind which way we go. I want code that is as simple
as possible to "do the job", whatever we agree that to be.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>>>> When a backend dies with FATAL, it writes an abort record before exiting.
>>>>
>>>> (I was under the impression it doesn't until few minutes ago myself, 
>>>> when I actually read the shutdown code :-))
>>> Not in all cases; keep reading :-)
>> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
>> shared state corrupted, it's only supposed to be a forcible session
>> termination.  Any open transaction should be rolled back.
> 
> Please look back at the earlier discussion we had on this exact point:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php

I think the running-xacts list we dump to WAL at every checkpoint is 
enough to handle that. Just treat the dead transaction as in-progress 
until the next running-xacts record. It's presumably extremely rare to 
have a process die with FATAL, and not write an abort record.
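
Roughly what I have in mind, as a toy sketch (prune_at_running_xacts and
the other names are invented, this is not code from the patch):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t TransactionId;

#define MAX_TRACKED 16

static TransactionId tracked[MAX_TRACKED];  /* 0 = empty slot */

/*
 * Called when a running-xacts record is replayed: any xid we are tracking
 * that the primary no longer reports as running must have ended, even if
 * we never saw a commit or abort record for it (e.g. a FATAL exit).
 */
static void
prune_at_running_xacts(const TransactionId *running, int nrunning)
{
    int     i,
            j;

    for (i = 0; i < MAX_TRACKED; i++)
    {
        bool    still_running = false;

        if (tracked[i] == 0)
            continue;
        for (j = 0; j < nrunning; j++)
            if (running[j] == tracked[i])
                still_running = true;
        if (!still_running)
        {
            printf("forgetting xid %u (treated as ended)\n", tracked[i]);
            tracked[i] = 0;
        }
    }
}

int
main(void)
{
    TransactionId running[] = {100};

    tracked[0] = 100;           /* normal transaction, still running */
    tracked[1] = 101;           /* died with FATAL, no abort record seen */
    prune_at_running_xacts(running, 1);
    return 0;
}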

A related issue is that currently the recovery PANICs if it runs out of 
recovery procs. I think that's not acceptable, regardless of whether we 
use slotids or some other method to avoid it in normal operation, 
because it means you can't recover at all if you set max_connections too 
low in the standby (or in the primary, if you have to recover from 
crash), or you run out of recovery procs because an abort failed in 
the primary, as discussed on that thread. The standby should just 
fast-forward to the next running-xacts record in that case.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> >> Simon Riggs <simon@2ndQuadrant.com> writes:
> >>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> >>>> When a backend dies with FATAL, it writes an abort record before exiting.
> >>>>
> >>>> (I was under the impression it doesn't until few minutes ago myself, 
> >>>> when I actually read the shutdown code :-))
> >>> Not in all cases; keep reading :-)
> >> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
> >> shared state corrupted, it's only supposed to be a forcible session
> >> termination.  Any open transaction should be rolled back.
> > 
> > Please look back at the earlier discussion we had on this exact point:
> > http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php
> 
> I think the running-xacts list we dump to WAL at every checkpoint is 
> enough to handle that. Just treat the dead transaction as in-progress 
> until the next running-xacts record. It's presumably extremely rare to 
> have a process die with FATAL, and not write an abort record.

I agree, but I'll wait for Tom to speak further.

> A related issue is that currently the recovery PANICs if it runs out of 
> recovery procs. I think that's not acceptable, regardless of whether we 
> use slotids or some other method to avoid it in normal operation, 
> because it means you can't recover at all if you set max_connections too 
> low in the standby (or in the primary, and you have to recover from 
> crash), or you run out of recovery procs because of an abort failed in 
> the primary like discussed on that thread. 

> The standby should just 
> fast-forward to the next running-xacts record in that case.

What do you mean by "fast forward"?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
>> A related issue is that currently the recovery PANICs if it runs out of 
>> recovery procs. I think that's not acceptable, regardless of whether we 
>> use slotids or some other method to avoid it in normal operation, 
>> because it means you can't recover at all if you set max_connections too 
>> low in the standby (or in the primary, and you have to recover from 
>> crash), or you run out of recovery procs because of an abort failed in 
>> the primary like discussed on that thread. 
> 
>> The standby should just 
>> fast-forward to the next running-xacts record in that case.
> 
> What do you mean by "fast forward"?

I mean the standby should stop trying to track the in progress 
transactions in recovery procs, and apply the WAL records like it does 
before the consistent state is reached.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> >> A related issue is that currently the recovery PANICs if it runs out of 
> >> recovery procs. I think that's not acceptable, regardless of whether we 
> >> use slotids or some other method to avoid it in normal operation, 
> >> because it means you can't recover at all if you set max_connections too 
> >> low in the standby (or in the primary, and you have to recover from 
> >> crash), or you run out of recovery procs because of an abort failed in 
> >> the primary like discussed on that thread. 
> > 
> >> The standby should just 
> >> fast-forward to the next running-xacts record in that case.
> > 
> > What do you mean by "fast forward"?
> 
> I mean the standby should stop trying to track the in progress 
> transactions in recovery procs, and apply the WAL records like it does 
> before the consistent state is reached.

If you say something is not acceptable you need to say what behaviour
you do find acceptable in its place. It's good to come up with new
ideas, but it's not good to ignore the problems the new ideas have. This
is a general point that applies both here and to your proposals with
synch rep. It's much harder to say how it should work in a way that
covers all the requirements and points others have made; otherwise it's
just handwaving.

So, if we don't PANIC, how should we behave?

Without full information on running-xacts we would be unable to take a
snapshot, so should:
* backends be forcibly disconnected?
* backends hang waiting for snapshot info to be re-available again in X
minutes worth of WAL time?
* backends throw an ERROR:  unable to provide snapshot at this time,
DETAIL: retry your statement later. 
...other alternatives

and possibly prevent new connections.

If max_connections is higher on primary then the standby will *never* be
available for querying. Should we have multiple ERRORs depending upon
whether the situation is hopefully-temporary or looks-permanent?

Don't assume I want the PANIC. That clearly needs to be revisited if we
change slotids. I just want to make a balanced judgement based upon a
full consideration of the options.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
>> I mean the standby should stop trying to track the in progress 
>> transactions in recovery procs, and apply the WAL records like it does 
>> before the consistent state is reached.
> 
> ...
> 
> So, if we don't PANIC, how should we behave?
> 
> Without full information on running-xacts we would be unable to take a
> snapshot, so should:
> * backends be forcibly disconnected?
> * backends hang waiting for snapshot info to be re-available again in X
> minutes worth of WAL time?
> * backends throw an ERROR:  unable to provide snapshot at this time,
> DETAIL: retry your statement later. 
> ...other alternatives
> 
> and possibly prevent new connections.

All of those seem reasonable to me. The 2nd option seems nicest; "X 
minutes" should probably be controlled by max_standby_delay, after which 
you can throw an error.

If we care enough, we could also keep tracking the transactions in 
backend-private memory of the startup process, until there's enough room 
in proc array. That would make the outage shorter, because you wouldn't 
have to wait until the next running-xacts record, but only until enough 
transactions have finished that they all fit in proc array again.
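
Just to show the shape of that idea (invented names, not from any patch):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define PROCARRAY_SLOTS 2       /* deliberately tiny for the example */
#define OVERFLOW_MAX    8

static TransactionId procarray[PROCARRAY_SLOTS];   /* shared, 0 = free */
static TransactionId overflow[OVERFLOW_MAX];        /* startup-private */
static int noverflow = 0;

static void
track_xid(TransactionId xid)
{
    int     i;

    for (i = 0; i < PROCARRAY_SLOTS; i++)
        if (procarray[i] == 0)
        {
            procarray[i] = xid;
            return;
        }
    /* no room: park it in the startup process's private memory */
    overflow[noverflow++] = xid;
    printf("xid %u parked in overflow list\n", xid);
}

static void
untrack_xid(TransactionId xid)
{
    int     i;

    for (i = 0; i < PROCARRAY_SLOTS; i++)
        if (procarray[i] == xid)
        {
            procarray[i] = 0;
            /* a slot opened up: promote a parked xid immediately,
             * instead of waiting for the next running-xacts record */
            if (noverflow > 0)
            {
                procarray[i] = overflow[--noverflow];
                printf("xid %u promoted into the proc array\n", procarray[i]);
            }
            return;
        }
}

int
main(void)
{
    track_xid(10);
    track_xid(11);
    track_xid(12);              /* overflows */
    untrack_xid(10);            /* 12 gets promoted */
    return 0;
}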

But whatever is the simplest, really.

> If max_connections is higher on primary then the standby will *never* be
> available for querying. Should we have multiple ERRORs depending upon
> whether the situation is hopefully-temporary or looks-permanent?
> 
> Don't assume I want the PANIC. That clearly needs to be revisited if we
> change slotids. 

It needs to be revisited whether we change slotids or not, IMHO.

Note that with slotids, you have a problem as soon as any of the slots 
that don't exist on the standby are used, regardless of how many concurrent 
transactions there actually are. Without slots you only have a problem if 
you really have more than the standby's max_connections concurrent 
transactions. That makes a big difference in practice.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Fri, 2009-01-09 at 14:38 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> >> I mean the standby should stop trying to track the in progress 
> >> transactions in recovery procs, and apply the WAL records like it does 
> >> before the consistent state is reached.
> > 
> > ...
> > 
> > So, if we don't PANIC, how should we behave?
> > 
> > Without full information on running-xacts we would be unable to take a
> > snapshot, so should:
> > * backends be forcibly disconnected?
> > * backends hang waiting for snapshot info to be re-available again in X
> > minutes worth of WAL time?
> > * backends throw an ERROR:  unable to provide snapshot at this time,
> > DETAIL: retry your statement later. 
> > ...other alternatives
> > 
> > and possibly prevent new connections.
> 
> All of those seem reasonable to me. The 2nd option seems nicest, "X 
> minutes" should probably be controlled by max_standby_delay, after which 
> you can throw an error.

Hmm, we use the recovery procs to track transactions that have
TransactionIds assigned. That means we will overflow only if we approach
100% write transactions at any time, or if we have more write
transactions in progress than we have max_connections on the standby.

So it sounds like the overflow situation would probably be both rare
and, if it did occur, unlikely to last for long.

> If we care enough, we could also keep tracking the transactions in 
> backend-private memory of the startup process, until there's enough room 
> in proc array. That would make the outage shorter, because you wouldn't 
> have to wait until the next running-xacts record, but only until enough 
> transactions have finished that they all fit in proc array again.
> 
> But whatever is the simplest, really.

The above does sound best since it would allow us to have the snapshot
hang for a short period. But at this stage of the game, more complex.

For now though, since it looks like it would happen fairly rarely, I'd
opt for the simplest: throw an ERROR.

> > If max_connections is higher on primary then the standby will *never* be
> > available for querying. Should we have multiple ERRORs depending upon
> > whether the situation is hopefully-temporary or looks-permanent?
> > 
> > Don't assume I want the PANIC. That clearly needs to be revisited if we
> > change slotids. 
> 
> It needs to be revisited whether we change slotids or not, IMHO.
> 
> Note that with slotids, you have a problem as soon as any of the slots 
> that don't exist on standby are used, regardless of how many concurrent 
> transactions there actually is. Without slots you only have a problem if 
>   you really have more than standby's max_connections concurrent 
> transactions. That makes a big difference in practice.

Sometimes, but mostly people set max_connections higher because they
intend to use those extra connections. So no real advantage there
against the slotid approach :-)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Fri, 2009-01-09 at 11:20 +0000, Simon Riggs wrote:
> On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> > >> Simon Riggs <simon@2ndQuadrant.com> writes:
> > >>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> > >>>> When a backend dies with FATAL, it writes an abort record before exiting.
> > >>>>
> > >>>> (I was under the impression it doesn't until few minutes ago myself, 
> > >>>> when I actually read the shutdown code :-))
> > >>> Not in all cases; keep reading :-)
> > >> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
> > >> shared state corrupted, it's only supposed to be a forcible session
> > >> termination.  Any open transaction should be rolled back.
> > > 
> > > Please look back at the earlier discussion we had on this exact point:
> > > http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php
> > 
> > I think the running-xacts list we dump to WAL at every checkpoint is 
> > enough to handle that. Just treat the dead transaction as in-progress 
> > until the next running-xacts record. It's presumably extremely rare to 
> > have a process die with FATAL, and not write an abort record.
> 
> I agree, but I'll wait for Tom to speak further.

OK, will proceed without this. Time is pressing.

Heikki and I both agree that FATAL errors that fail to write abort
records are rare and an acceptable problem in real usage. If they do
occur, their nuisance factor is short-lived because of measures taken
within the patch.

Hot Standby does not *rely* upon there always being an abort record for
FATAL errors, so we cannot reasonably say the current design would be
"unacceptably fragile" as I had once thought.

So based upon that, out comes the slotid concept and, luckily, many of the
cruftier aspects of the patch. Less code, probably fewer bugs. Good
thing.

I will produce a new v7 with those changes and merge the changes for v6b
also, so we can begin review again from there. 

Hi ho, hi ho...

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> Since I've been whining about that for some time, I figured I have to 
> put my money where my mouth is, so here's a patch based on v6a that 
> eliminates the concept of slotids, as well as the new xl_info2 field in 
> XLogRecord. This seems much simpler to me. I haven't given it much 
> testing, but seems to work. There's a whole bunch of comments marked 
> with XXX that need resolving, though.

There's a confusion in the patch between the top-level xid and the parent
xid. The xl field is named parentxid but actually contains the top-level
xid. That distinction is important because the code now uses the top-level
xid to locate the recovery proc, formerly the role of the slotid.

This leads to an error when we SubTransSetParent(child_xid, top_xid);
since this assumes that the top_xid is the parent, which it is not.
Mostly you wouldn't notice unless you were looking up the subtrans
status for an xid that had committed but was the child of an aborted
subtransaction, with the top level xid having > 64 subtransactions. It's
possible the confusion leads to other bugs in UnobservedXid processing,
but I didn't look too hard at that.
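
To spell the distinction out with a toy example (invented xids and a
stand-in array for pg_subtrans; this is not code from either patch):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define MAX_XID 200

/* toy stand-in for pg_subtrans: parent[x] is the recorded parent of x */
static TransactionId parent[MAX_XID];

static void
SubTransSetParent(TransactionId child, TransactionId p)
{
    parent[child] = p;
}

int
main(void)
{
    TransactionId top = 100;    /* top-level transaction */
    TransactionId sub = 101;    /* subtransaction of 100, later aborted */
    TransactionId leaf = 102;   /* subtransaction of 101 */

    /* what normal operation records: the immediate parent */
    SubTransSetParent(sub, top);
    SubTransSetParent(leaf, sub);
    printf("correct:   parent(%u) = %u\n", leaf, parent[leaf]);

    /* what happens when the top-level xid is passed instead: the
     * intermediate (aborted) level 101 disappears from the tree */
    SubTransSetParent(leaf, top);
    printf("flattened: parent(%u) = %u\n", leaf, parent[leaf]);
    return 0;
}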

AFAICS we need both parent and top xids. Or put another way, we need the
parent xid and other info that allows us to uniquely determine the proc
we need to update. Now the "other info..." could be top xid or it could
also be slotid, which then avoids later zig-zagging to look up the proc.

I wasn't looking for ways to reintroduce slotid, but it seems more
logical to keep slotid in light of the above. However, you will probably
view this as intransigence, so I will await your input.

I'm very happy that GetStandbyInfoForTransaction() and all the XLR2
flags have bitten the dust and will sleep for eternity.

For xl_rel_lock you add a field called xid and then ask
/* xid of the *parent* transaction. XXX why parent? */.
You've done this because it replaced slotid. But looking at that, I
think the 6a patch had a bug there: a subtransaction abort record would
release locks for the whole top level xact. So we need to pass both top
level xid (or slotid) and xid for each lock, then release using the
actual xid only.

You also ask: Shouldn't we call StartupSUBTRANS() and the other startup
functions like we do below, before letting anyone in?

My answer is that the current role of StartupSUBTRANS and friends is not
appropriate at that point, since they zero out those structures. I left
those routines in place thinking "startup" meant "moving to normal
running". If we did have a "startupsubtrans" at the point you note, it
would currently be empty: we don't keep track of the latest page during
recovery. Perhaps we should, but then we'd need to do the equivalent of
ExtendSubtrans etc, which it seemed easier to avoid.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 09:40 +0000, Simon Riggs wrote:

> This leads to an error when we SubTransSetParent(child_xid, top_xid);
> since this assumes that the top_xid is the parent, which it is not.
> Mostly you wouldn't notice unless you were looking up the subtrans
> status for an xid that had committed but was the child of an aborted
> subtransaction, with the top level xid having > 64 subtransactions.
> It's possible the confusion leads to other bugs in UnobservedXid
> processing, but I didn't look too hard at that.
> 
> AFAICS we need both parent and top xids.

I wonder if it's possible to derive the parent by looking at the
highest/most newly assigned xid? Abort records would remove aborted
subtransactions and AFAIK we currently assign a new subtransaction only
ever from the latest current subtransaction. (This wouldn't necessarily
be true if we supported true branch-anywhere subtransactions, but we
don't.) Sounds correct, but not really sure.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 11:45 +0000, Simon Riggs wrote:
> On Sat, 2009-01-10 at 09:40 +0000, Simon Riggs wrote:
> 
> > This leads to an error when we SubTransSetParent(child_xid, top_xid);
> > since this assumes that the top_xid is the parent, which it is not.
> > Mostly you wouldn't notice unless you were looking up the subtrans
> > status for an xid that had committed but was the child of an aborted
> > subtransaction, with the top level xid having > 64 subtransactions.
> > It's possible the confusion leads to other bugs in UnobservedXid
> > processing, but I didn't look too hard at that.
> > 
> > AFAICS we need both parent and top xids.
> 
> I wonder if its possible to derive the parent by looking at the
> highest/most newly assigned xid? Abort records would remove aborted
> subtransactions and AFAIK we currently assign a new subtransaction only
> ever from the latest current subtransaction. (This wouldn't be
> necessarily true if supported true branch-anywhere subtransactions, but
> we don't). Sounds correct, but not really sure.

Starting to sound like a do-me-later-if-ever optimisation and certainly
nothing I want to rely on in court.

I'm progressing with parent_xid added to the xlog record header, for
now.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> There's a confusion in the patch between top level xid and parent xid.
> The xl field is named parentxid but actually contains top level. That
> distinction is important because the code now uses the top level xid to
> locate the recovery proc, formerly the role of the slotid.

True, I changed the meaning halfway through. My thinking was that we can 
get away with only tracking the top-level xact of each subtransaction, not 
the whole tree of subtransactions.

> This leads to an error when we SubTransSetParent(child_xid, top_xid);
> since this assumes that the top_xid is the parent, which it is not.
> Mostly you wouldn't notice unless you were looking up the subtrans
> status for an xid that had committed but was the child of an aborted
> subtransaction, with the top level xid having > 64 subtransactions.

Hmm. When a subtransaction aborts, RecordTransactionAbort is called and 
clog is updated to show the subtransaction and all its subcommitted 
children as aborted. I think we're safe, though it wouldn't hurt to 
double-check.

It's an important point that needs documenting, though. I completely 
neglected that.

> I'm wasn't looking for ways to reintroduce slotid, but it seems more
> logical to keep slotid in light of the above. However, you will probably
> view this as intransigence, so I will await your input.

Yeah, it sure does seem like intransigence :-)

> For xl_rel_lock you add a field called xid and then ask
> /* xid of the *parent* transaction. XXX why parent? */.
> You've done this because it replaced slotid. But looking at that, I
> think the 6a patch had a bug there: a subtransaction abort record would
> release locks for the whole top level xact. So we need to pass both top
> level xid (or slotid) and xid for each lock, then release using the
> actual xid only.

Hmm, would just the subtransaction xid be enough? If we use that to 
release, what do we need the top-level xid for?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Sun, 2009-01-11 at 10:41 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > There's a confusion in the patch between top level xid and parent xid.
> > The xl field is named parentxid but actually contains top level. That
> > distinction is important because the code now uses the top level xid to
> > locate the recovery proc, formerly the role of the slotid.
>
> True, I changed the meaning halfway through. My thinking was that we can
> get away by only tracking the top-level xact of each subtransaction, not
> the whole tree of subtransactions.
>
> > This leads to an error when we SubTransSetParent(child_xid, top_xid);
> > since this assumes that the top_xid is the parent, which it is not.
> > Mostly you wouldn't notice unless you were looking up the subtrans
> > status for an xid that had committed but was the child of an aborted
> > subtransaction, with the top level xid having > 64 subtransactions.
>
> Hmm. When a subtransaction aborts, RecordTransactionAbort is called and
> clog is updated to show the subtransaction and all it's subcommitted
> children as aborted. I think we're safe, though it wouldn't hurt to
> double-check.

That part of your argument is correct but we are not safe. But please
look at XactLockTableWait() and see what you think. According to
comments this would lead to different lock release behaviour.

Beauty, thy name is not subtransaction.

If you agree that we need the parent xid then we have further problems.
Adding another xid onto the header of each WAL record will add 8 bytes
onto each record because of alignment. This was the other reason for
slotid, since I was able to fit that into just 2 bytes and avoid the 8
byte loss. Moving swiftly on, I have this morning thought of an
alternative, so at least we now have some choice. Rather than store the
parent xid itself we store the difference between the current xid and
the parent xid. Typically this will be less than 65535; when it is not
we set it to zero but issue an xid assignment xlog record.
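
As a toy sketch of that encoding (invented function names, and ignoring
xid wraparound for simplicity):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* 0 is the sentinel meaning "no delta: look for an xid assignment record" */
static uint16_t
encode_parent_delta(TransactionId xid, TransactionId parent_xid)
{
    uint32_t    delta = xid - parent_xid;

    if (delta == 0 || delta > 0xFFFF)
        return 0;               /* caller emits an xid assignment xlog record */
    return (uint16_t) delta;
}

static TransactionId
decode_parent_delta(TransactionId xid, uint16_t delta)
{
    if (delta == 0)
        return 0;               /* parent comes from the assignment record */
    return xid - delta;
}

int
main(void)
{
    TransactionId xid = 100050;
    TransactionId parent = 100020;
    uint16_t    d = encode_parent_delta(xid, parent);

    printf("delta = %u, decoded parent = %u\n",
           (unsigned) d, (unsigned) decode_parent_delta(xid, d));
    return 0;
}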

However, I think XactLockTableWait() doesn't need to know the parent
either. (This feels more like wishful thinking, but here goes anyway).
We release locks *after* TransactionIdAbortTree() has fully executed, so
the test for TransactionIdIsInProgress(xid) will always see the abort
status, if set. Notice that if we are awake at all it is because the
top-level transaction is complete or our subxid is aborted. So there is
never any need to look at the status of the parent, nor in fact any need
to look at the procarray at all, which is always a waste of effort. If
you believe that, you'll want to commit the attached patch (or something
similar with comments refactored etc).

> For xl_rel_lock you add a field called xid and then ask
> > /* xid of the *parent* transaction. XXX why parent? */.
> > You've done this because it replaced slotid. But looking at that, I
> > think the 6a patch had a bug there: a subtransaction abort record would
> > release locks for the whole top level xact. So we need to pass both top
> > level xid (or slotid) and xid for each lock, then release using the
> > actual xid only.
>
> Hmm, would just the subtransaction xid be enough? If we use that to
> release, what do we need the top-level xid for?

So we can release all locks taken by subxids at once, when we learn that
a top level xid has disappeared. A minor point.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
Heikki, can I get your feedback on this urgently please? I want to
respond positively to your review comments and complete something you
will find acceptable. But I need to know what that is, please.


On Sun, 2009-01-11 at 11:55 +0000, Simon Riggs wrote:
> On Sun, 2009-01-11 at 10:41 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > There's a confusion in the patch between top level xid and parent xid.
> > > The xl field is named parentxid but actually contains top level. That
> > > distinction is important because the code now uses the top level xid to
> > > locate the recovery proc, formerly the role of the slotid.
> > 
> > True, I changed the meaning halfway through. My thinking was that we can 
> > get away by only tracking the top-level xact of each subtransaction, not 
> > the whole tree of subtransactions.
> > 
> > > This leads to an error when we SubTransSetParent(child_xid, top_xid);
> > > since this assumes that the top_xid is the parent, which it is not.
> > > Mostly you wouldn't notice unless you were looking up the subtrans
> > > status for an xid that had committed but was the child of an aborted
> > > subtransaction, with the top level xid having > 64 subtransactions.
> > 
> > Hmm. When a subtransaction aborts, RecordTransactionAbort is called and 
> > clog is updated to show the subtransaction and all it's subcommitted 
> > children as aborted. I think we're safe, though it wouldn't hurt to 
> > double-check.
> 
> That part of your argument is correct but we are not safe. But please
> look at XactLockTableWait() and see what you think. According to
> comments this would lead to different lock release behaviour. 
> 
> Beauty, thy name is not subtransaction.
> 
> If you agree that we need the parent xid then we have further problems.
> Adding another xid onto the header of each WAL record will add 8 bytes
> onto each record because of alignment. This was the other reason for
> slotid, since I was able to fit that into just 2 bytes and avoid the 8
> byte loss. Moving swiftly on, I have this morning though of an
> alternative, so at least we now have some choice. Rather than store the
> parent xid itself we store the difference between the current xid and
> the parent xid. Typically this will be less than 65535; when it is not
> we set it to zero but issue an xid assignment xlog record.
> 
> However, I think XactLockTableWait() doesn't need to know the parent
> either. (This feels more like wishful thinking, but here goes anyway).
> We release locks *after* TransactionIdAbortTree() has fully executed, so
> the test for TransactionIdIsInProgress(xid) will always see the abort
> status, if set. Notice that if we are awake at all it is because the
> top-level transaction is complete or our subxid is aborted. So there is
> never any need to look at the status of the parent, nor in fact any need
> to look at the procarray at all, which is always a waste of effort. If
> you believe that, you'll want to commit the attached patch (or something
> similar with comments refactored etc).
> 
> > For xl_rel_lock you add a field called xid and then ask
> > > /* xid of the *parent* transaction. XXX why parent? */.
> > > You've done this because it replaced slotid. But looking at that, I
> > > think the 6a patch had a bug there: a subtransaction abort record would
> > > release locks for the whole top level xact. So we need to pass both top
> > > level xid (or slotid) and xid for each lock, then release using the
> > > actual xid only.
> > 
> > Hmm, would just the subtransaction xid be enough? If we use that to 
> > release, what do we need the top-level xid for?
> 
> So we can release all locks taken by subxids at once, when we learn that
> a top level xid has disappeared. A minor point.
> 
--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Mon, 2009-01-12 at 14:10 +0200, Heikki Linnakangas wrote:

> > However, I think XactLockTableWait() doesn't need to know the parent
> > either. (This feels more like wishful thinking, but here goes anyway).
> > We release locks *after* TransactionIdAbortTree() has fully executed, so
> > the test for TransactionIdIsInProgress(xid) will always see the abort
> > status, if set. Notice that if we are awake at all it is because the
> > top-level transaction is complete or our subxid is aborted. So there is
> > never any need to look at the status of the parent, nor in fact any need
> > to look at the procarray at all, which is always a waste of effort. 
> 
> Right, we don't currently write a WAL record at subtransaction commit, 
> only at subtransaction abort or top-level commit. So the problem 
> described in the comment at XactLockTableWait() can't arise in the standby.

Good. So we can exclude parent_xid and just include top level xid. So we
are now officially back inside the current xlog record padding: we don't
need to increase WAL record length and therefore we can lose slotid
(yay!).

> Actually, I wonder if we should write a WAL record at subtransaction 
> commit too, to save on shared memory in the standby as well.

You understand that the current design never performs
XactLockTableInsert() for individual top/sub transactions? There would
be no memory overhead to save.

The only locks allowed are AccessShareLocks, which never conflict with
anything except AccessExclusiveLocks. If an AEL conflicts with an ASL
we blow away the ASL holders. If an ASL requestor is blocked by an
AEL the wait is performed on the single lock table entry for the Startup
process, which acts as a proxy for all emulated AEL lock holders.
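
A toy decision routine, just to make the three cases explicit (invented
names, not code from the patch):

#include <stdio.h>
#include <stddef.h>

typedef enum
{
    ACCESS_SHARE,
    ACCESS_EXCLUSIVE
} LockMode;

typedef struct Lock
{
    LockMode    mode;
} Lock;

/*
 * The only conflict possible in the standby is AccessShare versus
 * AccessExclusive; which side yields depends on who got there first.
 */
static void
resolve(LockMode requested, const Lock *held)
{
    if (held == NULL ||
        (requested == ACCESS_SHARE && held->mode == ACCESS_SHARE))
    {
        printf("granted immediately\n");
        return;
    }
    if (requested == ACCESS_EXCLUSIVE)
    {
        /* replay of an AEL from the primary: cancel the ASL holders */
        printf("cancelling AccessShareLock holders, Startup takes the lock\n");
        return;
    }
    /* ASL requested while Startup holds the proxy AEL: wait on that entry */
    printf("waiting on the Startup process's proxy lock\n");
}

int
main(void)
{
    Lock    startup_ael = {ACCESS_EXCLUSIVE};
    Lock    backend_asl = {ACCESS_SHARE};

    resolve(ACCESS_SHARE, NULL);                /* nothing held */
    resolve(ACCESS_EXCLUSIVE, &backend_asl);    /* replayed AEL vs readers */
    resolve(ACCESS_SHARE, &startup_ael);        /* reader vs replayed AEL */
    return 0;
}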

> > If
> > you believe that, you'll want to commit the attached patch (or something
> > similar with comments refactored etc).
> 
> Umm, we still need the SubTransGetParent() call in normal operation.

OK, that was really just a thought experiment anyway to prove the point
one way or the other.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Rather than store the
> parent xid itself we store the difference between the current xid and
> the parent xid. Typically this will be less than 65535; when it is not
> we set it to zero but issue an xid assignment xlog record.

That sounds pretty hacky.

> However, I think XactLockTableWait() doesn't need to know the parent
> either. (This feels more like wishful thinking, but here goes anyway).
> We release locks *after* TransactionIdAbortTree() has fully executed, so
> the test for TransactionIdIsInProgress(xid) will always see the abort
> status, if set. Notice that if we are awake at all it is because the
> top-level transaction is complete or our subxid is aborted. So there is
> never any need to look at the status of the parent, nor in fact any need
> to look at the procarray at all, which is always a waste of effort. 

Right, we don't currently write a WAL record at subtransaction commit, 
only at subtransaction abort or top-level commit. So the problem 
described in the comment at XactLockTableWait() can't arise in the standby.

Actually, I wonder if we should write a WAL record at subtransaction 
commit too, to save on shared memory in the standby as well.

> If
> you believe that, you'll want to commit the attached patch (or something
> similar with comments refactored etc).

Umm, we still need the SubTransGetParent() call in normal operation.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 09:40 +0000, Simon Riggs wrote:

> But looking at that, I
> think the 6a patch had a bug there: a subtransaction abort record
> would release locks for the whole top level xact. 

Fixed.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > If you want to do things a different way you need to say what you want
> > to do and what effects those changes will have. 
> 
> I want to reduce the coupling between the primary and the master. The 
> less they need to communicate, the better. I want to get rid of slotid, 
> and as many of the other extra information carried in WAL records that I 
> can. I believe that will make the patch both simpler and more robust.
> 
> > Are there tradeoffs? If so what are they?
> 
> I don't think there's any big difference in user-visible behavior. 

I notice that we are no longer able to record the databaseId for a
connection, so query conflict resolution cannot be isolated to
individual databases any longer.

We might sometimes infer a transaction's databaseId from certain WAL
records but that is only going to be possible within each rmgr, which is
fairly strange.

I'll leave all of the databaseId stuff in there in case we think of
anything good.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Hot standby, slot ids and stuff

From
Simon Riggs
Date:
On Thu, 2009-01-15 at 22:10 +0000, Simon Riggs wrote:

> I notice that we are no longer able to record the databaseId for a
> connection, so query conflict resolution cannot be isolated to
> individual databases any longer.

Wrong way round. Incoming WAL records have the dbOid on them, so we can
still check for conflicts against the db of the standby backends.
Phew!
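
As a toy sketch of that check (invented names, not code from the patch):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t Oid;

typedef struct StandbyBackend
{
    int     pid;
    Oid     databaseId;
} StandbyBackend;

/* A conflict carried by a WAL record only touches backends connected to
 * that record's database; everyone else is left alone. */
static void
resolve_conflict(Oid rec_dboid, const StandbyBackend *backends, int n)
{
    int     i;

    for (i = 0; i < n; i++)
    {
        if (backends[i].databaseId != rec_dboid)
            continue;           /* different database: no conflict */
        printf("conflict applies to backend %d\n", backends[i].pid);
    }
}

int
main(void)
{
    StandbyBackend b[] = {{1001, 16384}, {1002, 16385}};

    resolve_conflict(16384, b, 2);
    return 0;
}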

> I'll leave all of the databaseId stuff in there in case we think of
> anything good.

Ripping out now as final part of earlier refactoring.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support