Thread: Subtransaction commits and Hot Standby

Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
Subtransactions cause a couple of problems for Hot Standby:
* we don't record new subtransactionids in WAL, so we have no direct way
to issue updates to subtrans while in recovery mode as would normally
happen when we assign subtransaction ids (subxids)
* we don't record subtransaction commit in WAL, so we have no direct way
to issue updates to clog to show TRANSACTION_STATUS_SUB_COMMITTED while
in recovery mode

The obvious solutions are to record this in WAL by doing
(1) every new subxid create an entry in WAL, either by creating a whole
new record or by augmenting the first WAL record created by the new
subxid
(2) we issue a WAL record every time we subcommit

Lets see if we can improve on that:

If we did not have an entry in subtrans, what would happen? We would be
unable to tell whether a subtransaction was visible or not in a snapshot
because TransactionIdIsInProgress() requires it. And so, if an xid was
active at time-of-snapshot yet has now committed we would mistakenly
conclude it was visible. So we must be able to track every xid back to
our snapshot *if* there should be a linkage.

However, we only need to check subtrans if the snapshot cache has
overflowed. Is it possible to only insert entries into subtrans when
they will be required, rather than always? Hmmm, nothing clear, so we
must do (1) it seems.

If we did not have an entry in clog, what would happen? Correct clog
entries are *not* required to prove whether an xid is a subxid and part
of our snapshot, we only need subtrans for that.
XidInMVCCSnapshot() calls SubTransGetTopmostTransaction() which doesn't
touch clog at all. Clog entries are currently required when we issue
TransactionIdDidCommit() on a subxid. If clog entries were missing then
we would only make a mistake about the correct status of a subxid when
we were mid-way through updating the status of an xid to committed.

So clog update might still be needed for subtransactions, but not until
commit time, so we can completely avoid the need to generate WAL records
at time of subcommit for use in Hot Standby. And it would seem that if
we update the clog atomically, we would not need to mark the subcommit
state in clog at all, even in normal running.

Right now we lock and unlock the clog for each committed subtransaction
at commit time, which is wasteful. A better scheme: pre-scan the list of xids to derive list of pages if we have just a
singlepage to update {       update all entries on page in one action } else {       loop thru xids marking them all as
subcommittedmarktop level transaction committed       loop thus xids again marking them all as committed }
 

All clog updates would be performed page-at-a-time, in ascending xid
order.

This seems likely to work well since many subtransactions will be on
same clog page as the top-level xid and the locking will often be more
efficient than the current case of repeated single lock acquisitions. It
also means we can skip RecordSubTransactionCommit() entirely,
significantly reducing clog contention.

Anybody see a problem there?

If not, I will work on separate patches:
* re-work subtrans commit so that we use the form described above. This
should improve performance for both normal and standby modes, hence do
this as a stand-alone patch
* include code in the main hot standby patch to update subtrans during
recovery when a new subtransaction is created

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Subtransaction commits and Hot Standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Subtransactions cause a couple of problems for Hot Standby:

Do we need to treat subtransactions any differently from normal 
transactions? Just treat all subtransactions as top-level transactions 
until commit, and mark them all as committed when you see the commit 
record for the top-level transaction.

> Right now we lock and unlock the clog for each committed subtransaction
> at commit time, which is wasteful. A better scheme:
>   pre-scan the list of xids to derive list of pages
>   if we have just a single page to update
>   {
>         update all entries on page in one action
>   }
>   else
>   {
>         loop thru xids marking them all as subcommitted
>     mark top level transaction committed
>         loop thus xids again marking them all as committed
>   }
> 
> All clog updates would be performed page-at-a-time, in ascending xid
> order.
> 
> This seems likely to work well since many subtransactions will be on
> same clog page as the top-level xid and the locking will often be more
> efficient than the current case of repeated single lock acquisitions. It
> also means we can skip RecordSubTransactionCommit() entirely,
> significantly reducing clog contention.
> 
> Anybody see a problem there?

Hmm, I don't see anything immediately wrong with that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Subtransaction commits and Hot Standby

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> Subtransactions cause a couple of problems for Hot Standby:
>
> Do we need to treat subtransactions any differently from normal  
> transactions? Just treat all subtransactions as top-level transactions  
> until commit, and mark them all as committed when you see the commit  
> record for the top-level transaction.

This could lead to inconsistent results -- some of the subtransactions
could be marked as committed while others are still in progress.  Unless
we want to be able to atomically mark them all as committed, but I don't
think that's really an option because it could mean holding the clog
lock for a long time, possibly involving I/O of clog pages.

>> Right now we lock and unlock the clog for each committed subtransaction
>> at commit time, which is wasteful. A better scheme:
>>   pre-scan the list of xids to derive list of pages
>>   if we have just a single page to update
>>   {
>>         update all entries on page in one action
>>   }
>>   else
>>   {
>>         loop thru xids marking them all as subcommitted
>>     mark top level transaction committed
>>         loop thus xids again marking them all as committed
>>   }

> Hmm, I don't see anything immediately wrong with that.

Neither do I.

I wonder if the improved clog API required to mark multiple transactions
as committed at once would be also useful to TransactionIdCommitTree
which is used in regular transaction commit.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-16 at 17:01 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Subtransactions cause a couple of problems for Hot Standby:
> 
> Do we need to treat subtransactions any differently from normal 
> transactions? Just treat all subtransactions as top-level transactions 
> until commit, and mark them all as committed when you see the commit 
> record for the top-level transaction.

If we do that, snapshots become infinitely sized objects though, which
then requires us to invent some way of scrolling it to disk. So having
removed the need for subtrans, I then need to reinvent something similar
(or at least something like a multitrans entry).

Perhaps it is sufficient to throw an error if the subxid cache
overflows? But I suspect that may not be acceptable...

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> >> Subtransactions cause a couple of problems for Hot Standby:
> >
> > Do we need to treat subtransactions any differently from normal  
> > transactions? Just treat all subtransactions as top-level transactions  
> > until commit, and mark them all as committed when you see the commit  
> > record for the top-level transaction.
> 
> This could lead to inconsistent results -- some of the subtransactions
> could be marked as committed while others are still in progress.  Unless
> we want to be able to atomically mark them all as committed, but I don't
> think that's really an option because it could mean holding the clog
> lock for a long time, possibly involving I/O of clog pages.

If we did that we would need to mark them all subcomitted and then mark
them all committed. So its possible, but not desirable.

> I wonder if the improved clog API required to mark multiple
> transactions
> as committed at once would be also useful to TransactionIdCommitTree
> which is used in regular transaction commit.

Yes, I think its an improvement for regular commits/subcommits also.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Subtransaction commits and Hot Standby

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Perhaps it is sufficient to throw an error if the subxid cache
> overflows? But I suspect that may not be acceptable...

Certainly not.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:

> >> Right now we lock and unlock the clog for each committed subtransaction
> >> at commit time, which is wasteful. A better scheme:
> >>   pre-scan the list of xids to derive list of pages
> >>   if we have just a single page to update
> >>   {
> >>         update all entries on page in one action
> >>   }
> >>   else
> >>   {
> >>         loop thru xids marking them all as subcommitted
> >>     mark top level transaction committed
> >>         loop thus xids again marking them all as committed
> >>   }
>
> > Hmm, I don't see anything immediately wrong with that.
>
> Neither do I.
>
> I wonder if the improved clog API required to mark multiple transactions
> as committed at once would be also useful to TransactionIdCommitTree
> which is used in regular transaction commit.

I enclose a patch to transform what we have now into what I think is
possible. If we agree this is possible, then I will do further work to
optimise transam.c (using clog.c changes also). So this is an
"intermediate" or precursor patch for discussion only.

 backend/access/transam/transam.c  |   78 ++++++++++++-----------------!
 backend/access/transam/twophase.c |    4 !
 backend/access/transam/xact.c     |   50 -----------------!!!!!!!
 include/access/transam.h          |    7 !!!
 4 files changed, 32 insertions(+), 78 deletions(-), 29 modifications(!)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-16 at 15:38 +0100, Simon Riggs wrote:
> On Tue, 2008-09-16 at 17:01 +0300, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > Subtransactions cause a couple of problems for Hot Standby:
> > 
> > Do we need to treat subtransactions any differently from normal 
> > transactions? Just treat all subtransactions as top-level transactions 
> > until commit, and mark them all as committed when you see the commit 
> > record for the top-level transaction.
> 
> If we do that, snapshots become infinitely sized objects though, which
> then requires us to invent some way of scrolling it to disk. So having
> removed the need for subtrans, I then need to reinvent something similar
> (or at least something like a multitrans entry).

Currently we keep track of whether the whole subxid cache has
overflowed, or not. It seems possible to track for overflows of
individual parts of the cache. That makes the code path for subxid
overflow in GetSnapshotData() slightly slower, but it's not the common
case. We save time elsewhere in more common cases.

We would be able to avoid making an entry in subtrans for new subxids
unless our local backend has overflowed its cache. That will reduce
subtrans access frequency considerably and greatly reduce the number of
requests that might need to perform I/O, possibly to zero. It would also
avoid the need for generating WAL records for new subxids for standby.

The path thru XidInMVCCSnapshot() would then require us to *always*
check the subxid cache, even if it has overflowed. If we find the xid
then we don't need to check subtrans at all. That's quite useful because
searching the subxid cache is cheaper than looking in subtrans and the
probability it would be there rather than in subtrans is still good,
even for overflows of up to 3-5 times the subxid cache. It would
increase the cost of subxid checking slightly when running with very
high numbers of subxids.

For Hot Standby, this would mean we could avoid generating WAL records
for new subxids in most cases - only generate them when our backend's
subxid cache has overflowed. On the standby it then means we can store
xids into a fixed size snapshot without worrying about whether it
overflows because the xids all fitted in the snapshot on the master
(whose xids we are emulating), *or* we have a WAL record that tells us
the cache overflowed and we make the insert into subtrans instead. When
we use the standby snapshot we look in subxid cache first and if we
don't find it then we check in subtrans.

Sounds possible?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-16 at 11:08 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > Perhaps it is sufficient to throw an error if the subxid cache
> > overflows? But I suspect that may not be acceptable...
> 
> Certainly not.

Yeh :-) ... it was just a rhetorical question. I'll try to avoid those.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:

> I wonder if the improved clog API required to mark multiple
> transactions as committed at once would be also useful to
> TransactionIdCommitTree which is used in regular transaction commit.

I've hacked together this concept patch (WIP).

Not fully tested yet, but it gives a flavour of the API rearrangements
required for atomic clog updates. It passes make check, but that's not
saying enough for a serious review yet. I expect to pick this up again
next week.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Thu, 2008-09-18 at 15:59 +0100, Simon Riggs wrote:
> On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:
>
> > I wonder if the improved clog API required to mark multiple
> > transactions as committed at once would be also useful to
> > TransactionIdCommitTree which is used in regular transaction commit.
>
> I've hacked together this concept patch (WIP).
>
> Not fully tested yet, but it gives a flavour of the API rearrangements
> required for atomic clog updates. It passes make check, but that's not
> saying enough for a serious review yet. I expect to pick this up again
> next week.

I've tested this some more and am much happier with it now.
Also added README details; there are no user interface or behaviour
changes.

The patch removes the need for RecordSubTransactionCommit() which

* reduces response times of subtransaction queries because we are able
to apply these changes in batches at commit time. This requires a
batch-style API that now works atomically, so there is much change in
transam.c

* reduces the path length for visibility tests for all users viewing
concurrent subtransaction activity since we are much less likely to
waste time following a long trail to an uncommitted higher-level
transaction

* removes the need for additional WAL logging to implement
subtransaction commits for Hot Standby

So half the patch is refactoring, half rearranging of clog access
functions to support batched-access.

An early review would greatly help my work on Hot Standby. Thanks.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: [PATCHES] Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Tue, 2008-09-23 at 22:47 +0100, Simon Riggs wrote:

> I've tested this some more and am much happier with it now.

The concept is fine, but I've found a coding bug in further testing.
Please wait now for new version before review.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: [PATCHES] Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Wed, 2008-09-24 at 13:48 +0100, Simon Riggs wrote:
> On Tue, 2008-09-23 at 22:47 +0100, Simon Riggs wrote:
>
> > I've tested this some more and am much happier with it now.
>
> The concept is fine, but I've found a coding bug in further testing.
> Please wait now for new version before review.

OK, spent long time testing various batching scenarios for this using a
custom test harness to simulate various spreads of xids in transaction
trees. All looks fine now.

The main work is done in new clog.c functions:
TransactionIdSetTreeStatus() which sets whole tree atomically by calling
TransactionIdSetPageStatus(), which in turn calls
TransactionIdSetStatusBit() for each xid status change.

TransactionIdSetPageStatus() performs locking and handles write_ok
problem, as did code it replaces. TransactionIdSetPageStatus() is called
theoretical minimum number of times for any transaction tree.

Patch slightly fumbles diff-ing new and replacement code, so there are
two chunks that appear to show I'm removing locking. I'm not!!

Everything else is just API changes.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: Subtransaction commits and Hot Standby

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> OK, spent long time testing various batching scenarios for this using a
> custom test harness to simulate various spreads of xids in transaction
> trees. All looks fine now.

I had a look and was mostly rephrasing some comments and the README
(hopefully I didn't make any of them worse than they were), when I
noticed that the code to iterate thru pages could be refactored.  I
think the change makes the algorithm in TransactionIdSetTreeStatus
easier to follow.

I also noticed that TransactionIdSetPageStatus has a "subcommit" arg
which is unexplained.  I sort-of understand the point, but I think it's
better that you fill in the explanation in the header comment (marked
with XXX)

I hope I didn't break the code with the new function
set_tree_status_by_pages -- please recheck that part.

I didn't test this beyond regression tests.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Sun, 2008-10-05 at 14:51 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > OK, spent long time testing various batching scenarios for this using a
> > custom test harness to simulate various spreads of xids in transaction
> > trees. All looks fine now.
> 
> I had a look and was mostly rephrasing some comments and the README
> (hopefully I didn't make any of them worse than they were), when I
> noticed that the code to iterate thru pages could be refactored.  I
> think the change makes the algorithm in TransactionIdSetTreeStatus
> easier to follow.

OK, thanks for the review.

> I also noticed that TransactionIdSetPageStatus has a "subcommit" arg
> which is unexplained.  I sort-of understand the point, but I think it's
> better that you fill in the explanation in the header comment (marked
> with XXX)

I'll explain some more in the code, and in the README with worked
examples of what we need to do and why.

> I hope I didn't break the code with the new function
> set_tree_status_by_pages -- please recheck that part.

Eyeballs OK.

> I didn't test this beyond regression tests.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Sun, 2008-10-05 at 14:51 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > OK, spent long time testing various batching scenarios for this using a
> > custom test harness to simulate various spreads of xids in transaction
> > trees. All looks fine now.
>
> I had a look and was mostly rephrasing some comments and the README
> (hopefully I didn't make any of them worse than they were), when I
> noticed that the code to iterate thru pages could be refactored.  I
> think the change makes the algorithm in TransactionIdSetTreeStatus
> easier to follow.

Yes, all fits on one screen when reading it now.

> I also noticed that TransactionIdSetPageStatus has a "subcommit" arg
> which is unexplained.  I sort-of understand the point, but I think it's
> better that you fill in the explanation in the header comment (marked
> with XXX)

I've changed the logic slightly to remove the need for the subcommit
argument. So no need to explain now.

Added an Assert to check for what should be an impossible call.

Example provided in comments for a complex update.

> I hope I didn't break the code with the new function
> set_tree_status_by_pages -- please recheck that part.

Renamed to set_status_by_pages because we never use this on the whole
tree. Added comments to say that.

Overall, cleaner and more readable now. Thanks.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: Subtransaction commits and Hot Standby

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Renamed to set_status_by_pages because we never use this on the whole
> tree. Added comments to say that.
> 
> Overall, cleaner and more readable now. Thanks.

Committed, thanks.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Subtransaction commits and Hot Standby

From
Simon Riggs
Date:
On Mon, 2008-10-20 at 16:23 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > Renamed to set_status_by_pages because we never use this on the whole
> > tree. Added comments to say that.
> > 
> > Overall, cleaner and more readable now. Thanks.
> 
> Committed, thanks.

Cheers.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support