Thread: Reviewers Guide to Deferred Transactions/Transaction Guarantee

Reviewers Guide to Deferred Transactions/Transaction Guarantee

From
"Simon Riggs"
Date:
transaction_guarantee.v11.patch
- keep current, cleanup, more comments and docs

Brief Performance Analysis
--------------------------

I've tested 3 scenarios:
1. normal
2. wal_writer_delay = 100ms
3. wal_writer_delay = 100ms and transaction_guarantee = off

On my laptop, with a scale=1 pgbench database and 1 connection, I
consistently get around 85 tps in mode (1), with a slight performance
drop in mode (2). In mode (3) I get anywhere from 200 to 900 tps,
depending upon how well cached everything is, with 700 tps being fairly
typical. fsync = off gives around 900 tps.

Also good speedups with multiple session tests.

make installcheck passes in 120 sec in mode (3), though 155 sec in mode
(1) and 158 sec in mode (2).
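
To put the figures above in proportion, here is a small calculation using only the numbers quoted in this message. The variable names are made up for the illustration; nothing here comes from the patch.

```python
# Figures quoted above: single-connection pgbench, scale=1.
baseline_tps = 85                                   # mode (1), normal
mode3_tps = {"low": 200, "typical": 700, "high": 900}

# Speedup of mode (3) relative to mode (1).
speedup = {name: round(tps / baseline_tps, 1) for name, tps in mode3_tps.items()}

# make installcheck wall-clock times in seconds, per mode.
installcheck_sec = {1: 155, 2: 158, 3: 120}
saving_pct = round(
    100 * (installcheck_sec[1] - installcheck_sec[3]) / installcheck_sec[1], 1
)

print(speedup)
print(saving_pct)
```

So the typical mode (3) result is roughly eight times the normal throughput, while the regression tests, which are not commit-bound, finish a little over a fifth faster.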

Basic Implementation
--------------------

xact.c
xact.h

The basic implementation simply records the LSN of the xlog commit
record in a shared memory area, the deferred fsync cache.

ipci.c

The cache is protected by an LWLock called DeferredFsyncLock.

lwlock.h

A WALWriter process wakes up regularly to perform a background flush of
WAL up to the point of the highest LSN in the deferred fsync cache.

walwriter.c
walwriter.h
postmaster.c

WALWriter can be enabled only at server start.
(All above same as March 11 version)
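
The mechanism described in this section can be pictured with a toy model. This is only a sketch under assumptions: the class and function names are hypothetical, a Python lock stands in for DeferredFsyncLock, and LSNs are plain integers. It is not the code in the patch.

```python
import threading


class DeferredFsyncCache:
    """Toy model: commit-record LSNs of transactions not yet flushed."""

    def __init__(self):
        self.lock = threading.Lock()   # stands in for DeferredFsyncLock
        self.pending = {}              # xid -> LSN of its commit record

    def record_commit(self, xid, commit_lsn):
        # Called at commit instead of waiting for the WAL flush.
        with self.lock:
            self.pending[xid] = commit_lsn

    def highest_lsn(self):
        with self.lock:
            return max(self.pending.values(), default=None)

    def forget_up_to(self, flushed_lsn):
        # Entries at or below the flushed point are now durable.
        with self.lock:
            self.pending = {
                xid: lsn for xid, lsn in self.pending.items() if lsn > flushed_lsn
            }


class ToyWal:
    """Toy WAL that only tracks how far it has been flushed."""

    def __init__(self):
        self.flushed_lsn = 0

    def flush(self, upto):
        self.flushed_lsn = max(self.flushed_lsn, upto)


def walwriter_cycle(cache, wal):
    """One wakeup: flush WAL up to the highest LSN in the cache."""
    target = cache.highest_lsn()
    if target is None:
        return False                   # nothing deferred, nothing to do
    wal.flush(target)
    cache.forget_up_to(target)
    return True


cache = DeferredFsyncCache()
wal = ToyWal()
cache.record_commit(xid=100, commit_lsn=4096)
cache.record_commit(xid=101, commit_lsn=8192)
did_work = walwriter_cycle(cache, wal)
print(did_work, wal.flushed_lsn, cache.pending)
```

A single flush to the highest cached LSN covers every earlier commit record as well, which is why only the maximum matters for the background flush itself.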

Correctness
-----------

postgres.c

Only certain code paths can execute transaction_guarantee = off
transactions, though the main code paths for OLTP allow it.

xlog.c

CreateCheckpoint() must protect against starting a checkpoint when
commits are not yet flushed, so an additional flush must occur here.

vacuum.c

VACUUM FULL cannot move tuples until their states are all known, so this
command triggers a background flush also.

clog.c
clog.h
slru.c
slru.h

Changes to Clog and SLRU enforce the basic rule of WAL-before-data.
Without them, the clog record of a commit might reach disk before the
WAL for that commit has been flushed. This is implemented by storing an
LSN for each clog page.
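
As an illustration of the per-page LSN idea, the following toy model flushes WAL up to a clog page's LSN before the page itself is written. All names are hypothetical and LSNs are plain integers; it is a sketch of the WAL-before-data rule, not the SLRU code.

```python
class ToyWal:
    """Toy WAL that tracks the flushed position and counts real flushes."""

    def __init__(self, flushed_lsn=0):
        self.flushed_lsn = flushed_lsn
        self.flush_calls = 0

    def flush(self, upto):
        if upto > self.flushed_lsn:
            self.flushed_lsn = upto
            self.flush_calls += 1


class ClogPage:
    """Toy clog page carrying the highest commit LSN recorded on it."""

    def __init__(self):
        self.status = {}      # xid -> status string
        self.page_lsn = 0     # highest commit LSN of any xid on this page
        self.on_disk = False

    def set_committed(self, xid, commit_lsn):
        self.status[xid] = "committed"
        self.page_lsn = max(self.page_lsn, commit_lsn)
        self.on_disk = False  # page is dirty again


def write_clog_page(page, wal):
    # WAL-before-data: the commit records must be durable before the
    # clog page that reports those commits is written out.
    if page.page_lsn > wal.flushed_lsn:
        wal.flush(page.page_lsn)
    page.on_disk = True


wal = ToyWal(flushed_lsn=600)
page = ClogPage()
page.set_committed(xid=10, commit_lsn=500)   # already covered by the flush
page.set_committed(xid=11, commit_lsn=900)   # deferred, not yet flushed
write_clog_page(page, wal)
print(wal.flushed_lsn, wal.flush_calls, page.on_disk)
```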

transam.c
transam.h
twophase.c
xact.c

The above files have API changes that allow the LSN at transaction
commit to be passed through to the Clog.

tqual.c
tqual.h
multixact.c
multixact.h

Visibility hint bits must also not be set before the transaction is
flushed, so other changes are required to ensure we store the LSN of
each transaction, not just the maximum LSN. Changes to tqual.c appear
extensive, though this is just refactoring to allow us to make
additional function calls before setting bits - there are no functional
changes to any HeapTupleSatisfies... functions.
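
A rough sketch of why a per-transaction LSN is needed for hint bits: the check below withholds the hint only for a transaction whose own commit record is not yet flushed. The dictionary layout and function name are assumptions for the example and do not mirror tqual.c.

```python
def maybe_set_committed_hint(tup, deferred_lsns, flushed_lsn):
    """Set the 'xmin committed' hint only if the commit record is on disk.

    deferred_lsns maps xid -> commit LSN for deferred commits; an xid that
    is absent is treated as already flushed.
    """
    commit_lsn = deferred_lsns.get(tup["xmin"])
    if commit_lsn is None or commit_lsn <= flushed_lsn:
        tup["xmin_committed_hint"] = True
        return True
    return False   # tuple is still treated as committed; only the hint waits


deferred_lsns = {200: 1000, 201: 3000}
flushed_lsn = 2000

t_flushed = {"xmin": 200, "xmin_committed_hint": False}
t_pending = {"xmin": 201, "xmin_committed_hint": False}
t_old = {"xmin": 50, "xmin_committed_hint": False}

results = [
    maybe_set_committed_hint(t, deferred_lsns, flushed_lsn)
    for t in (t_flushed, t_pending, t_old)
]
print(results)
```

With only the maximum LSN (3000 here), the hint for xid 200 would also have to wait, even though its commit record is already durable.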

xact.c

Contains the module for the Deferred Transaction functions and in
particular the deferred transaction cache. This could be a separate
module, since there is only a slight link with the other xact.c code.

User Interface
--------------

guc.c
postgresql.conf.sample
guc_table.h

New parameters have been added, with a new parameter grouping of
WAL_COMMITS created to control the various commit parameters.

Performance Tuning
------------------

The WALWriter wakes up every wal_writer_delay milliseconds. There are two
protections against mis-setting this parameter.

pmsignal.h

The WALWriter will also be woken by a signal if the DF cache has nearly
filled and flushing would be desirable.

The WALWriter will also loop without any delay if the number of
transactions committed while it was writing WAL is above a threshold
value.
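
The two protections can be sketched as two small decisions. The threshold values, the "nearly full" fraction and the function names below are assumptions chosen for the example; the patch's actual values are not stated in this message.

```python
def cache_needs_wakeup(entries_used, capacity, nearly_full_fraction=0.75):
    """Protection 1: signal the WALWriter when the DF cache is nearly full."""
    return entries_used >= capacity * nearly_full_fraction


def next_sleep_ms(wal_writer_delay_ms, commits_during_last_write, busy_threshold):
    """Protection 2: skip the delay if many commits arrived while writing."""
    if commits_during_last_write > busy_threshold:
        return 0
    return wal_writer_delay_ms


# Quiet system: sleep the full delay, no early wakeup needed.
quiet = (cache_needs_wakeup(10, 100), next_sleep_ms(100, 3, busy_threshold=50))

# Busy system: cache nearly full and many commits during the last write.
busy = (cache_needs_wakeup(80, 100), next_sleep_ms(100, 120, busy_threshold=50))

print(quiet, busy)
```

Either way, a wal_writer_delay that is set too high for the workload does not by itself let the cache overflow or the flush fall far behind.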

Docs
----

The fsync parameter has been removed from postgresql.conf.sample and the
docs, though it still exists in this patch to allow performance testing
during Beta. It is suggested that fsync=off should mean the same thing as
transaction_guarantee = off, wal_writer_delay = 100ms, if it is
specified in postgresql.conf or on the server command line.

A new section in wal.sgml will describe this in more detail later.

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

All of the above are fairly minor changes.

Any other thoughts/comments/tests welcome.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

From
"Simon Riggs"
Date:
On Thu, 2007-04-05 at 22:56 +0100, Simon Riggs wrote:
> transaction_guarantee.v11.patch

correct files attached

> Open Questions
> --------------
>
> 1. Should the DFC use a standard hash table? Custom code allows both
> additional speed and the ability to signal when it fills.
>
> 2. Should tqual.c update the LSN of a heap page with the LSN of the
> transaction commit that it can read from the DF cache?

I now think we should update the LSN of the page, but I have not changed
this yet.

> 3. Should the WALWriter also do the wal_buffers half-full write at the
> start of XLogInsert() ?

Not that important

> 4. The recent changes to remove CheckpointStartLock haven't changed the
> code path for deferred transactions, so a similar solution might be
> possible there also.

Some further discussion required here, I think. That change may actually
have introduced a slight risk into the patch. Will raise at review.

> 5. Is it correct to do WAL-before-flush for clog only, or should this
> be multixact also?

Not necessary


Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
>> 4. The recent changes to remove CheckpointStartLock haven't changed the
>> code path for deferred transactions, so a similar solution might be
>> possible there also.

> Some further discussion required here, I think. That change may actually
> have introduced a slight risk into the patch. Will raise at review.

Given that you're going to be gone for the next two weeks, I'm wondering
when you think that discussion will happen.

            regards, tom lane

Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

From
"Simon Riggs"
Date:
On Sun, 2007-04-08 at 11:05 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> >> 4. The recent changes to remove CheckpointStartLock haven't changed the
> >> code path for deferred transactions, so a similar solution might be
> >> possible there also.
>
> > Some further discussion required here, I think. That change may actually
> > have introduced a slight risk into the patch. Will raise at review.
>
> Given that you're going to be gone for the next two weeks, I'm wondering
> when you think that discussion will happen.

Well, now is good... but I would never say "this must happen now".

I'm sorry my schedule is busy at this time; I really thought the change
of dates would mean I'd avoid my normal disappearing trick. Previously
it's been family holidays, now it's just other business I am called to.


My concern was this:

If we flush the currently outstanding deferred transactions then that
doesn't guarantee they have all reached the clog. Previously, a deferred
transaction would not release the CheckpointStartLock until after the
clog had been updated.

If we wait for all currently inCommit transactions to end this will
cover all deferred transactions also. So I think I just need to flush
deferred transactions prior to the wait and this will be valid. Would
you agree?


Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

From
"Simon Riggs"
Date:
On Sun, 2007-04-08 at 17:02 +0100, Simon Riggs wrote:

> My concern was this:
>
> If we flush the currently outstanding deferred transactions then that
> doesn't guarantee they have all reached the clog. Previously, a deferred
> transaction would not release the CheckpointStartLock until after the
> clog had been updated.
>
> If we wait for all currently inCommit transactions to end this will
> cover all deferred transactions also. So I think I just need to flush
> deferred transactions prior to the wait and this will be valid. Would
> you agree?

I'm good with this now, sorry for the noise.

From the existing code in CreateCheckpoint, I just need to add a
background flush immediately prior to the newly added waits. That would
replace what I've got in the current patch, where I hold the lock across
the calculation of the WAL insert pointer for the checkpoint, which was
overly cautious - there is no need for prior WAL to be flushed at that
point.
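
The ordering discussed here (flush deferred transactions, then wait for inCommit transactions, then carry on with the checkpoint) can be written down as a toy sequence. The callables and the event log are hypothetical stand-ins; this is not CreateCheckpoint() itself.

```python
def create_checkpoint(flush_deferred, in_commit_xids, wait_for_xid, log):
    """Toy ordering: background flush first, then the inCommit waits."""
    flush_deferred()                   # flush outstanding deferred commits
    log.append("flush")
    for xid in list(in_commit_xids()):
        wait_for_xid(xid)              # wait until this xid leaves inCommit
        log.append(("wait", xid))
    log.append("checkpoint")           # remaining checkpoint work goes here


events = []
in_commit = {301, 302}

create_checkpoint(
    flush_deferred=lambda: None,
    in_commit_xids=lambda: sorted(in_commit),
    wait_for_xid=in_commit.discard,
    log=events,
)
print(events)
```

Once the waits finish, every transaction that was inCommit, deferred or not, has also updated the clog, which is the property the earlier concern was about.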


Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

From
ITAGAKI Takahiro
Date:
"Simon Riggs" <simon@2ndquadrant.com> wrote:

> > transaction_guarantee.v11.patch
> correct files attached

This is a small fix to transaction_guarantee patch.
WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
Other changes are only for suppressing warnings.

We might also need to increase NUM_AUXILIARY_PROCS (=3) for WAL writer,
but I didn't change it in the patch. (I don't know why the value is 3
-- bgwriter, autovacuum launcher and ... what?)


BTW, the following TODO item comes to my mind:
| Allow WAL traffic to be streamed to another server for stand-by replication
We have to open sockets to another server when we want to stream WAL.
If there were WAL writer, we can save the number of those sockets.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

From
"Simon Riggs"
Date:
On Tue, 2007-04-10 at 20:46 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <simon@2ndquadrant.com> wrote:
>
> > > transaction_guarantee.v11.patch
> > correct files attached
>
> This is a small fix to transaction_guarantee patch.
> WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
> Other changes are only for suppressing warnings.

Thanks

> BTW, the following TODO item comes to my mind:
> | Allow WAL traffic to be streamed to another server for stand-by replication
> We have to open sockets to another server when we want to stream WAL.
> If there were WAL writer, we can save the number of those sockets.

I'll be looking at designs for that in the next cycle.


Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

From
Bruce Momjian
Date:
Simon Riggs wrote:
> On Tue, 2007-04-10 at 20:46 +0900, ITAGAKI Takahiro wrote:
> > "Simon Riggs" <simon@2ndquadrant.com> wrote:
> >
> > > > transaction_guarantee.v11.patch
> > > correct files attached
> >
> > This is a small fix to transaction_guarantee patch.
> > WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
> > Other changes are only for suppressing warnings.
>
> Thanks
>
> > BTW, the following TODO item comes to my mind:
> > | Allow WAL traffic to be streamed to another server for stand-by replication
> > We have to open sockets to another server when we want to stream WAL.
> > If there were WAL writer, we can save the number of those sockets.
>
> I'll be looking at designs for that in the next cycle.

Already a TODO:

* Allow WAL traffic to be streamed to another server for stand-by
  replication

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Reviewers Guide to Deferred Transactions/Transaction Guarantee

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.


Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.


Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
