Thread: Reviewers Guide to Deferred Transactions/Transaction Guarantee
transaction_guarantee.v11.patch
- keep current, cleanup, more comments and docs

Brief Performance Analysis
--------------------------

I've tested 3 scenarios:
1. normal
2. wal_writer_delay = 100ms
3. wal_writer_delay = 100ms and transaction_guarantee = off

On my laptop, with a scale=1 pgbench database and 1 connection, I
consistently get around 85 tps in mode (1), with a slight performance
drop in mode (2). In mode (3) I get anywhere from 200 to 900 tps,
depending upon how well cached everything is, with 700 tps being fairly
typical. fsync = off gives around 900 tps.

Multiple-session tests also show good speedups.

make installcheck passes in 120 sec in mode (3), compared with 155 sec
in mode (1) and 158 sec in mode (2).

Basic Implementation
--------------------

xact.c
xact.h

The basic implementation simply records the LSN of the xlog commit
record in a shared memory area, the deferred fsync cache.

ipci.c

The cache is protected by an LWLock called DeferredFsyncLock.

lwlock.h

A WALWriter process wakes up regularly to perform a background flush of
WAL up to the point of the highest LSN in the deferred fsync cache.

walwriter.c
walwriter.h
postmaster.c

WALWriter can be enabled only at server start.
(All above same as March 11 version)

Correctness
-----------

postgres.c

Only certain code paths can execute transaction_guarantee = off
transactions, though the main code paths for OLTP allow it.

xlog.c

CreateCheckpoint() must protect against starting a checkpoint when
commits are not yet flushed, so an additional flush must occur here.

vacuum.c

VACUUM FULL cannot move tuples until their states are all known, so this
command also triggers a background flush.

clog.c
clog.h
slru.c
slru.h

Changes to Clog and SLRU enforce the basic rule of WAL-before-data.
Without them, the record of a commit might reach disk before the WAL
for that commit has been flushed. This is implemented by storing an LSN
for each clog page.
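As an illustration of the per-page LSN rule just described, here is a minimal sketch in C. It is not code from the patch: the names (`clog_page_lsn`, `clog_set_committed`, `clog_write_page`, `wal_flush`) and the fixed page count are hypothetical, and `Lsn` is a simplified stand-in for the real WAL pointer type. The sketch only shows the ordering: each clog page remembers the highest commit LSN recorded on it, and writing the page first forces WAL out to that LSN.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;                 /* simplified stand-in for a WAL position */

#define N_CLOG_PAGES 4                /* hypothetical, for illustration only */

Lsn wal_flushed_upto = 0;             /* WAL is durable up to this LSN */
Lsn clog_page_lsn[N_CLOG_PAGES];      /* highest commit LSN recorded per page */
int wal_flush_calls = 0;              /* counts real flushes, for the demo */

/* Pretend to flush WAL up to the given LSN (no-op if already flushed). */
void wal_flush(Lsn upto)
{
    if (upto > wal_flushed_upto) {
        wal_flushed_upto = upto;
        wal_flush_calls++;
    }
}

/* Mark a transaction committed on a clog page, remembering its commit LSN. */
void clog_set_committed(int page, Lsn commit_lsn)
{
    assert(page >= 0 && page < N_CLOG_PAGES);
    if (commit_lsn > clog_page_lsn[page])
        clog_page_lsn[page] = commit_lsn;
}

/* Write a clog page out: WAL-before-data means WAL must be flushed first. */
void clog_write_page(int page)
{
    assert(page >= 0 && page < N_CLOG_PAGES);
    if (clog_page_lsn[page] > wal_flushed_upto)
        wal_flush(clog_page_lsn[page]);
    /* ... the page itself would be written to disk here ... */
}
```

Keeping one LSN per page rather than one per transaction means a page write may flush slightly more WAL than strictly needed, in exchange for a very small amount of extra state.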

transam.c
transam.h
twophase.c
xact.c

The above files have API changes that allow the LSN at transaction
commit to be passed through to the Clog.

tqual.c
tqual.h
multixact.c
multixact.h

Visibility hint bits must also not be set before the transaction is
flushed, so other changes are required to ensure we store the LSN of
each transaction, not just the maximum LSN. Changes to tqual.c appear
extensive, though this is just refactoring to allow us to make
additional function calls before setting bits - there are no functional
changes to any HeapTupleSatisfies... functions.

xact.c

Contains the module for the Deferred Transaction functions and in
particular the deferred transaction cache. This could be a separate
module, since there is only a slight link with the other xact.c code.

User Interface
--------------

guc.c
postgresql.conf.sample
guc_table.h

New parameters have been added, with a new parameter grouping of
WAL_COMMITS created to control the various commit parameters.

Performance Tuning
------------------

The WALWriter wakes up every wal_writer_delay milliseconds. There are two
protections against mis-setting this parameter.

pmsignal.h

The WALWriter will also be woken by a signal if the DF cache has nearly
filled and flushing would be desirable.

The WALWriter will also loop without any delay if the number of
transactions committed while it was writing WAL is above a threshold
value.

Docs
----
The fsync parameter has been removed from postgresql.conf.sample and the
docs, though it still exists in this patch to allow performance testing
during Beta. It is suggested that fsync = off should mean the same thing
as transaction_guarantee = off, wal_writer_delay = 100ms, if it is
specified in postgresql.conf or on the server command line.

A new section in wal.sgml will describe this in more detail, later.

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

All of the above are fairly minor changes.

Any other thoughts/comments/tests welcome.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
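The deferred fsync cache and the two wal_writer_delay protections described in the message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's code: the struct layout, the function names and the three constants are hypothetical, a plain array stands in for the custom cache, and the locking that DeferredFsyncLock would provide is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;          /* simplified stand-in for a WAL position */

#define DFC_SIZE        8      /* hypothetical cache capacity */
#define DFC_NEARLY_FULL 6      /* hypothetical "signal the WALWriter" mark */
#define BUSY_COMMITS    4      /* hypothetical "loop without delay" mark */

typedef struct {
    Lsn  commit_lsn[DFC_SIZE]; /* LSNs of commit records not yet flushed */
    int  n_entries;
    bool wake_requested;       /* stands in for the signal to the WALWriter */
} DeferredCache;

/* Backend side: record the commit LSN instead of flushing WAL directly.
 * Returns false when the cache is full and the caller must flush itself. */
bool dfc_record_commit(DeferredCache *c, Lsn lsn)
{
    assert(c->n_entries <= DFC_SIZE);
    if (c->n_entries == DFC_SIZE)
        return false;
    c->commit_lsn[c->n_entries++] = lsn;
    if (c->n_entries >= DFC_NEARLY_FULL)
        c->wake_requested = true;          /* protection 1: early wake-up */
    return true;
}

/* WALWriter side: empty the cache and return the highest LSN that the
 * background flush must reach (0 if there was nothing to flush). */
Lsn dfc_drain(DeferredCache *c)
{
    Lsn highest = 0;
    for (int i = 0; i < c->n_entries; i++)
        if (c->commit_lsn[i] > highest)
            highest = c->commit_lsn[i];
    c->n_entries = 0;
    c->wake_requested = false;
    return highest;
}

/* How long the WALWriter sleeps before its next cycle, in milliseconds. */
int walwriter_next_sleep_ms(const DeferredCache *c, int wal_writer_delay_ms)
{
    /* protection 2: many commits arrived meanwhile, so loop immediately */
    if (c->wake_requested || c->n_entries >= BUSY_COMMITS)
        return 0;
    return wal_writer_delay_ms;
}
```

With these two checks, a wal_writer_delay that is set too high delays flushing only while the commit rate is low; under load the writer is woken early or simply does not sleep.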
Attachment
On Thu, 2007-04-05 at 22:56 +0100, Simon Riggs wrote:
> transaction_guarantee.v11.patch

correct files attached

> Open Questions
> --------------
>
> 1. Should the DFC use a standard hash table? Custom code allows both
> additional speed and the ability to signal when it fills.
>
> 2. Should tqual.c update the LSN of a heap page with the LSN of the
> transaction commit that it can read from the DF cache?

I now think we should update the LSN of the page, but I have not
changed this yet.

> 3. Should the WALWriter also do the wal_buffers half-full write at the
> start of XLogInsert() ?

Not that important

> 4. The recent changes to remove CheckpointStartLock haven't changed the
> code path for deferred transactions, so a similar solution might be
> possible there also.

Some further discussion required here, I think. That change may actually
have introduced a slight risk into the patch. Will raise at review.

> 5. Is it correct to do WAL-before-flush for clog only, or should this
> be multixact also?

Not necessary

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
Attachment
"Simon Riggs" <simon@2ndquadrant.com> writes: >> 4. The recent changes to remove CheckpointStartLock haven't changed the >> code path for deferred transactions, so a similar solution might be >> possible there also. > Some further discussion required here, I think. That change may actually > have introduced a slight risk into the patch. Will raise at review. Given that you're going to be gone for the next two weeks, I'm wondering when you think that discussion will happen. regards, tom lane
On Sun, 2007-04-08 at 11:05 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> >> 4. The recent changes to remove CheckpointStartLock haven't changed the
> >> code path for deferred transactions, so a similar solution might be
> >> possible there also.
>
> > Some further discussion required here, I think. That change may actually
> > have introduced a slight risk into the patch. Will raise at review.
>
> Given that you're going to be gone for the next two weeks, I'm wondering
> when you think that discussion will happen.

Well, now is good... but I would never say "this must happen now". I'm
sorry my schedule is busy at this time; I really thought the change of
dates would mean I'd avoid my normal disappearing trick. Previously it's
been family holidays, now it's just other business I am called to.

My concern was this:

If we flush the currently outstanding deferred transactions, that
doesn't guarantee they have all reached the clog. Previously, a deferred
transaction would not release the CheckpointStartLock until after the
clog had been updated.

If we wait for all currently inCommit transactions to end, this will
cover all deferred transactions also. So I think I just need to flush
deferred transactions prior to the wait and this will be valid. Would
you agree?

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
On Sun, 2007-04-08 at 17:02 +0100, Simon Riggs wrote:

> My concern was this:
>
> If we flush the currently outstanding deferred transactions then that
> doesn't guarantee they have all reached the clog. Previously, a deferred
> transaction would not release the CheckpointStartLock until after the
> clog had been updated.
>
> If we wait for all currently inCommit transactions to end this will
> cover all deferred transactions also. So I think I just need to flush
> deferred transactions prior to the wait and this will be valid. Would
> you agree?

I'm good with this now, sorry for the noise.

Starting from the existing code in CreateCheckpoint, I just need to add a
background flush immediately prior to the newly added waits. That would
replace what I've got in the current patch, where I hold the lock across
the calculation of the WAL insert pointer for the checkpoint. That was
more cautious than necessary: there is no need for prior WAL to be
flushed at that point.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
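The ordering settled on above (flush deferred commits first, then wait for in-commit transactions, then proceed with the checkpoint) can be sketched as below. This is a single-threaded illustration with hypothetical names, not the CreateCheckpoint() code: the "wait" simply pretends every in-commit transaction finishes, and a step log records the order so it can be checked.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;          /* simplified stand-in for a WAL position */

typedef enum {
    STEP_FLUSH_DEFERRED,
    STEP_WAIT_IN_COMMIT,
    STEP_WRITE_CHECKPOINT
} Step;

#define MAX_STEPS 8

typedef struct {
    Lsn  highest_deferred_lsn; /* highest LSN in the deferred fsync cache */
    Lsn  wal_flushed_upto;     /* WAL is durable up to this LSN */
    int  n_in_commit;          /* transactions between WAL insert and clog update */
    Step steps[MAX_STEPS];     /* order in which the steps ran */
    int  n_steps;
} CheckpointState;

void record_step(CheckpointState *s, Step step)
{
    assert(s->n_steps < MAX_STEPS);
    s->steps[s->n_steps++] = step;
}

/* Background flush: make every deferred commit record durable. */
void flush_deferred_commits(CheckpointState *s)
{
    if (s->highest_deferred_lsn > s->wal_flushed_upto)
        s->wal_flushed_upto = s->highest_deferred_lsn;
    record_step(s, STEP_FLUSH_DEFERRED);
}

/* Stand-in for the wait: assume all in-commit transactions have now
 * updated the clog. A real implementation would block here. */
void wait_for_in_commit(CheckpointState *s)
{
    s->n_in_commit = 0;
    record_step(s, STEP_WAIT_IN_COMMIT);
}

/* The proposed ordering: flush immediately before the waits. */
void create_checkpoint_sketch(CheckpointState *s)
{
    flush_deferred_commits(s);
    wait_for_in_commit(s);
    record_step(s, STEP_WRITE_CHECKPOINT);
}
```

The flush alone does not guarantee that the flushed commits have reached the clog, which is why the wait follows it rather than being replaced by it.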
"Simon Riggs" <simon@2ndquadrant.com> wrote: > > transaction_guarantee.v11.patch > correct files attached This is a small fix to transaction_guarantee patch. WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms. Other changes are only for suppressing warnings. We might also need to increase NUM_AUXILIARY_PROCS (=3) for WAL writer, but I didn't change it in the patch. (I don't know why the value is 3 -- bgwriter, autovacuum launcher and ... what?) BTW, the following TODO item comes to my mind: | Allow WAL traffic to be streamed to another server for stand-by replication We have to open sockets to another server when we want to stream WAL. If there were WAL writer, we can save the number of those sockets. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Tue, 2007-04-10 at 20:46 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <simon@2ndquadrant.com> wrote:
> >
> > transaction_guarantee.v11.patch
> > correct files attached
>
> This is a small fix to transaction_guarantee patch.
> WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
> Other changes are only for suppressing warnings.

Thanks

> BTW, the following TODO item comes to my mind:
> | Allow WAL traffic to be streamed to another server for stand-by replication
> We have to open sockets to another server when we want to stream WAL.
> If there were WAL writer, we can save the number of those sockets.

I'll be looking at designs for that in the next cycle.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
Simon Riggs wrote:
> On Tue, 2007-04-10 at 20:46 +0900, ITAGAKI Takahiro wrote:
> > "Simon Riggs" <simon@2ndquadrant.com> wrote:
> > >
> > > transaction_guarantee.v11.patch
> > > correct files attached
> >
> > This is a small fix to transaction_guarantee patch.
> > WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
> > Other changes are only for suppressing warnings.
>
> Thanks
>
> > BTW, the following TODO item comes to my mind:
> > | Allow WAL traffic to be streamed to another server for stand-by replication
> > We have to open sockets to another server when we want to stream WAL.
> > If there were WAL writer, we can save the number of those sockets.
>
> I'll be looking at designs for that in the next cycle.

Already a TODO:

	* Allow WAL traffic to be streamed to another server for stand-by
	  replication

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:
> transaction_guarantee.v11.patch
> - keep current, cleanup, more comments and docs
> [ Attachment, skipping... ]

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Thu, 2007-04-05 at 22:56 +0100, Simon Riggs wrote:
> > transaction_guarantee.v11.patch
>
> correct files attached
> [ Attachment, skipping... ]

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> This is a small fix to transaction_guarantee patch.
> WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
> Other changes are only for suppressing warnings.
> [ Attachment, skipping... ]

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com