Reviewers Guide to Deferred Transactions/Transaction Guarantee - Mailing list pgsql-patches
From | Simon Riggs |
---|---|
Subject | Reviewers Guide to Deferred Transactions/Transaction Guarantee |
Date | |
Msg-id | 1175810202.3623.437.camel@silverbirch.site |
Responses | Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee ("Simon Riggs" <simon@2ndquadrant.com>); Re: Reviewers Guide to Deferred Transactions/Transaction Guarantee (Bruce Momjian <bruce@momjian.us>) |
List | pgsql-patches |
transaction_guarantee.v11.patch - keep current, cleanup, more comments and docs

Brief Performance Analysis
--------------------------

I've tested 3 scenarios:
1. normal
2. wal_writer_delay = 100ms
3. wal_writer_delay = 100ms and transaction_guarantee = off

On my laptop, with a scale=1 pgbench database and 1 connection, I
consistently get around 85 tps in mode (1), with a slight performance
drop in mode (2). In mode (3) I get anywhere from 200 tps to 900 tps,
depending upon how well cached everything is, with 700 tps being fairly
typical. fsync = off gives around 900 tps. Multi-session tests also
show good speedups.

make installcheck passes in 120 sec in mode (3), compared with 155 sec
in mode (1) and 158 sec in mode (2).

Basic Implementation
--------------------

xact.c
xact.h
The basic implementation simply records the LSN of the xlog commit
record in a shared memory area, the deferred fsync cache.

ipci.c
The cache is protected by an LWLock called DeferredFsyncLock.

lwlock.h
A WALWriter process wakes up regularly to perform a background flush of
WAL up to the point of the highest LSN in the deferred fsync cache.

walwriter.c
walwriter.h
postmaster.c
WALWriter can be enabled only at server start.
(All above same as March 11 version)

Correctness
-----------

postgres.c
Only certain code paths can execute transaction_guarantee = off
transactions, though the main code paths for OLTP allow it.

xlog.c
CreateCheckpoint() must protect against starting a checkpoint when
commits are not yet flushed, so an additional flush must occur here.

vacuum.c
VACUUM FULL cannot move tuples until their states are all known, so this
command triggers a background flush also.

clog.c
clog.h
slru.c
slru.h
Changes to Clog and SLRU enforce the basic rule of WAL-before-data;
without them, the clog record of a commit might reach disk before the
WAL for that commit has been flushed. This is implemented by storing an
LSN for each clog page.
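For reviewers, the per-page LSN rule above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not code
from the patch: every name here (SketchClog, sketch_clog_set_committed,
sketch_wal_flush, sketch_clog_write_page) is hypothetical, LSNs are
reduced to plain integers, and the actual disk write is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef uint64_t SketchLSN;

#define SKETCH_CLOG_PAGES 4

typedef struct SketchClog
{
    SketchLSN page_lsn[SKETCH_CLOG_PAGES];   /* highest commit LSN seen per page */
    bool      page_dirty[SKETCH_CLOG_PAGES]; /* page has unwritten changes */
    SketchLSN wal_flushed_upto;              /* WAL is durable up to this LSN */
    int       wal_flushes;                   /* number of flushes that advanced WAL */
} SketchClog;

/* Record a commit on a clog page and remember the commit record's LSN. */
void
sketch_clog_set_committed(SketchClog *clog, int pageno, SketchLSN commit_lsn)
{
    assert(pageno >= 0 && pageno < SKETCH_CLOG_PAGES);
    if (commit_lsn > clog->page_lsn[pageno])
        clog->page_lsn[pageno] = commit_lsn;
    clog->page_dirty[pageno] = true;
}

/* Stand-in for a WAL flush: make WAL durable at least up to 'upto'. */
void
sketch_wal_flush(SketchClog *clog, SketchLSN upto)
{
    if (upto > clog->wal_flushed_upto)
    {
        clog->wal_flushed_upto = upto;
        clog->wal_flushes++;
    }
}

/*
 * Write out a clog page. WAL-before-data: if the page carries a commit
 * whose WAL record is not yet durable, flush WAL up to the page's LSN
 * before the page itself may be written.
 */
void
sketch_clog_write_page(SketchClog *clog, int pageno)
{
    assert(pageno >= 0 && pageno < SKETCH_CLOG_PAGES);
    if (!clog->page_dirty[pageno])
        return;
    if (clog->page_lsn[pageno] > clog->wal_flushed_upto)
        sketch_wal_flush(clog, clog->page_lsn[pageno]);
    /* The page would be written to disk here. */
    clog->page_dirty[pageno] = false;
}
```

Keeping one LSN per page means a page write never forces more WAL to
disk than the newest commit recorded on that page requires; pages whose
commits are already durable are written with no extra flush.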

transam.c
transam.h
twophase.c
xact.c
The above files have API changes that allow the LSN at transaction
commit to be passed through to the Clog.

tqual.c
tqual.h
multixact.c
multixact.h
Visibility hint bits must also not be set before the transaction is
flushed, so other changes are required to ensure we store the LSN of
each transaction, not just the maximum LSN. Changes to tqual.c appear
extensive, though this is just refactoring to allow us to make
additional function calls before setting bits - there are no functional
changes to any HeapTupleSatisfies... functions.

xact.c
Contains the module for the Deferred Transaction functions and in
particular the deferred transaction cache. This could be a separate
module, since there is only a slight link with the other xact.c code.

User Interface
--------------

guc.c
postgresql.conf.sample
guc_table.h
New parameters have been added, with a new parameter grouping of
WAL_COMMITS created to control the various commit parameters.

Performance Tuning
------------------

The WALWriter wakes up every wal_writer_delay milliseconds. There are
two protections against mis-setting this parameter.

pmsignal.h
The WALWriter will also be woken by a signal if the DF cache has nearly
filled and flushing would be desirable.

The WALWriter will also loop without any delay if the number of
transactions committed while it was writing WAL is above a threshold
value.

Docs
----

The fsync parameter has been removed from postgresql.conf.sample and the
docs, though it still exists in this patch to allow performance testing
during Beta. It is suggested that fsync = off should mean the same thing
as transaction_guarantee = off, wal_writer_delay = 100ms, if it is
specified in postgresql.conf or on the server command line.

A new section in wal.sgml will describe this in more detail, later.

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert()?

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

5. Is it correct to do WAL-before-flush for clog only, or should this be
multixact also?

All of the above are fairly minor changes.

Any other thoughts/comments/tests welcome.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
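
Reviewer's note on the Performance Tuning section: the two protections
against a badly chosen wal_writer_delay can be sketched as two small
decisions. This is a hedged illustration only. The struct, the function
names, the threshold field and the "nearly full" percentage are all
assumptions made for the example; the patch's real values and names are
not given in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical settings; only wal_writer_delay is named in the message. */
typedef struct SketchWalWriterConfig
{
    int wal_writer_delay_ms;   /* normal sleep between flush cycles */
    int busy_threshold;        /* assumed: commits-per-cycle limit */
    int cache_capacity;        /* assumed: slots in the DF cache */
    int nearly_full_pct;       /* assumed: fill level that triggers a signal */
} SketchWalWriterConfig;

/*
 * How long the WALWriter should sleep before its next cycle. If more
 * transactions committed during the last WAL write than the threshold
 * allows, loop again immediately instead of sleeping.
 */
int
sketch_next_sleep_ms(const SketchWalWriterConfig *cfg,
                     int commits_during_last_write)
{
    if (commits_during_last_write > cfg->busy_threshold)
        return 0;
    return cfg->wal_writer_delay_ms;
}

/*
 * Whether a committing backend should signal the WALWriter early
 * because the deferred fsync cache has nearly filled.
 */
bool
sketch_should_signal(const SketchWalWriterConfig *cfg, int cache_entries)
{
    /* Integer arithmetic avoids floating point in the comparison. */
    return cache_entries * 100 >= cfg->cache_capacity * cfg->nearly_full_pct;
}
```

With these two checks, a delay that is set too long is bounded by the
cache-fill signal, and a burst of commits is absorbed by back-to-back
cycles rather than waiting out the full delay.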
Attachment