Reviewers Guide to Deferred Transactions/Transaction Guarantee - Mailing list pgsql-patches

From Simon Riggs
Subject Reviewers Guide to Deferred Transactions/Transaction Guarantee
Date
Msg-id 1175810202.3623.437.camel@silverbirch.site
Responses Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee  ("Simon Riggs" <simon@2ndquadrant.com>)
Re: Reviewers Guide to Deferred Transactions/Transaction Guarantee  (Bruce Momjian <bruce@momjian.us>)
List pgsql-patches
transaction_guarantee.v11.patch
- keep current, cleanup, more comments and docs

Brief Performance Analysis
--------------------------

I've tested 3 scenarios:
1. normal
2. wal_writer_delay = 100ms
3. wal_writer_delay = 100ms and transaction_guarantee = off

On my laptop, with a scale=1 pgbench database and 1 connection, I
consistently get around 85 tps in mode (1), with a slight performance
drop in mode (2). In mode (3) I get anywhere from 200 to 900 tps,
depending upon how well cached everything is, with 700 tps being fairly
typical. fsync = off gives around 900 tps.

Also good speedups with multiple session tests.

make installcheck passes in 120 sec in mode (3), though 155 sec in mode
(1) and 158 sec in mode (2).

Basic Implementation
--------------------

xact.c
xact.h

The basic implementation simply records the LSN of the xlog commit
record in a shared memory area, the deferred fsync cache.

ipci.c

The cache is protected by an LWlock called DeferredFsyncLock.

lwlock.h

A WALWriter process wakes up regularly to perform a background flush of
WAL up to the point of the highest LSN in the deferred fsync cache.
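
The mechanism described above can be sketched as a toy model. This is
only an illustration in Python, not the patch's C code: ToyWal,
DeferredFsyncCache, commit() and walwriter_cycle() are hypothetical
names, LSNs are plain integers, and locking (DeferredFsyncLock) is
left out because the model is single-threaded.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Stand-in for WAL: tracks how far records are inserted and flushed."""
    insert_lsn: int = 0
    flushed_lsn: int = 0

    def append_commit(self) -> int:
        # Each commit record advances the insert position by one unit.
        self.insert_lsn += 1
        return self.insert_lsn

    def flush(self, upto: int) -> None:
        # The flushed position never moves backwards.
        self.flushed_lsn = max(self.flushed_lsn, upto)


@dataclass
class DeferredFsyncCache:
    """Hypothetical cache: xid -> LSN of a commit record not yet flushed."""
    pending: dict = field(default_factory=dict)

    def record(self, xid: int, lsn: int) -> None:
        self.pending[xid] = lsn

    def highest_lsn(self) -> int:
        return max(self.pending.values(), default=0)


def commit(wal, cache, xid, guarantee=True):
    """Write a commit record; flush now, or defer the flush to the writer."""
    lsn = wal.append_commit()
    if guarantee:
        wal.flush(lsn)          # normal commit: wait for the flush
    else:
        cache.record(xid, lsn)  # deferred commit: return immediately
    return lsn


def walwriter_cycle(wal, cache):
    """One background wakeup: flush WAL up to the highest cached LSN."""
    target = cache.highest_lsn()
    if target:
        wal.flush(target)
        cache.pending.clear()
```

In this model a deferred commit returns before its record is durable,
and the window of possible loss is bounded by how often
walwriter_cycle() runs.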

walwriter.c
walwriter.h
postmaster.c

WALWriter can be enabled only at server start.
(All above same as March 11 version)

Correctness
-----------

postgres.c

Only certain code paths can execute transaction_guarantee = off
transactions, though the main code paths for OLTP allow it.

xlog.c

CreateCheckpoint() must protect against starting a checkpoint when
commits are not yet flushed, so an additional flush must occur here.

vacuum.c

VACUUM FULL cannot move tuples until their states are all known, so this
command triggers a background flush also.

clog.c
clog.h
slru.c
slru.h

Changes to Clog and SLRU enforce the basic rule of WAL-before-data.
Without them, the clog record of a deferred commit could reach disk
before the corresponding WAL has been flushed. This is implemented by
storing an LSN for each clog page.
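
A rough sketch of the per-page LSN rule, again as a Python toy model
rather than the actual slru.c code. ToyClogPage, write_clog_page() and
the flush_wal callback are assumptions made for illustration only.

```python
class ToyClogPage:
    """Hypothetical clog page carrying the highest commit LSN it records."""

    def __init__(self):
        self.status = {}    # xid -> "committed"
        self.page_lsn = 0   # highest commit LSN recorded on this page
        self.on_disk = False

    def set_committed(self, xid, commit_lsn):
        self.status[xid] = "committed"
        self.page_lsn = max(self.page_lsn, commit_lsn)
        self.on_disk = False  # page is dirty again


def write_clog_page(page, flushed_lsn, flush_wal):
    """Write the page, flushing WAL first if the page is ahead of it.

    flush_wal(lsn) is assumed to flush WAL up to lsn and return the new
    flushed position. Returns the flushed LSN after the write.
    """
    if page.page_lsn > flushed_lsn:
        # WAL-before-data: commit records must be durable before the page.
        flushed_lsn = flush_wal(page.page_lsn)
    page.on_disk = True
    return flushed_lsn
```

A single LSN per page is enough here because writing the page only
needs to know the newest commit it contains.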

transam.c
transam.h
twophase.c
xact.c

The above files have API changes that allow the LSN at transaction
commit to be passed through to the Clog.

tqual.c
tqual.h
multixact.c
multixact.h

Visibility hint bits must also not be set before the transaction is
flushed, so other changes are required to ensure we store the LSN of
each transaction, not just the maximum LSN. Changes to tqual.c appear
extensive, though this is just refactoring to allow us to make
additional function calls before setting bits - there are no functional
changes to any HeapTupleSatisfies... functions.
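
To illustrate why a per-transaction LSN is wanted rather than only the
maximum, here is a small Python sketch. It is not the tqual.c code:
maybe_set_committed_hint() and the "XMIN_COMMITTED" string are
illustrative stand-ins, and a transaction missing from the cache is
assumed to be already flushed.

```python
def maybe_set_committed_hint(tuple_hints, xid, commit_lsns, flushed_lsn):
    """Set the committed hint only if xid's commit record is on disk.

    commit_lsns maps xid -> commit LSN for commits that may still be
    unflushed. Returns True if the hint was set.
    """
    lsn = commit_lsns.get(xid)
    if lsn is not None and lsn > flushed_lsn:
        # Commit record not yet durable: leave the tuple unhinted for now.
        return False
    tuple_hints.add("XMIN_COMMITTED")
    return True
```

With commit_lsns = {7: 100, 8: 300} and WAL flushed to 200, xid 7 can
be hinted while xid 8 cannot. If only the maximum LSN (300) were
stored, xid 7 would have to wait as well.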

xact.c

Contains the module for the Deferred Transaction functions and in
particular the deferred transaction cache. This could be a separate
module, since there is only a slight link with the other xact.c code.

User Interface
--------------

guc.c
postgresql.conf.sample
guc_tables.h

New parameters have been added, with a new parameter grouping of
WAL_COMMITS created to control the various commit parameters.

Performance Tuning
------------------

The WALWriter wakes up every wal_writer_delay milliseconds. There are two
protections against mis-setting this parameter.

pmsignal.h

The WALWriter will also be woken by a signal if the deferred fsync (DF)
cache has nearly filled and flushing would be desirable.

The WALWriter will also loop without any delay if the number of
transactions committed while it was writing WAL is above a threshold
value.
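
The two protections might be modelled roughly as follows. This is a
Python sketch under stated assumptions: the function names, the 0.75
"nearly full" fraction and the busy threshold are all hypothetical and
are not taken from the patch.

```python
def should_signal_writer(cache_used, cache_size, nearly_full_fraction=0.75):
    """Committing backend: wake the writer early if the cache is nearly full."""
    return cache_used >= cache_size * nearly_full_fraction


def next_sleep_ms(wal_writer_delay_ms, commits_during_write, busy_threshold):
    """Writer: choose how long to sleep before the next flush cycle."""
    if commits_during_write > busy_threshold:
        # Many commits arrived while we were writing: loop again at once.
        return 0
    return wal_writer_delay_ms
```

With wal_writer_delay = 100 and an assumed threshold of 20, a cycle that
saw 50 commits arrive would loop immediately, while a cycle that saw 5
would sleep the full 100 ms.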

Docs
----
The fsync parameter has been removed from postgresql.conf.sample and the
docs, though it still exists in this patch to allow performance testing
during Beta. It is suggested that fsync = off should mean the same thing
as transaction_guarantee = off, wal_writer_delay = 100ms, if it is
specified in postgresql.conf or on the server command line.

A new section in wal.sgml will describe this in more detail, later.

Open Questions
--------------

1. Should the DF cache use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

All of the above are fairly minor changes.

Any other thoughts/comments/tests welcome.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

