Re: [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee - Mailing list pgsql-hackers

On Thu, 2007-04-12 at 15:56 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > transaction_guarantee.v11.patch 

Thanks for the review.

> I can't help feeling that this is enormously overcomplicated.

I agree with all but one of your comments; see below.

> The "DFC" in particular seems to not be worth its overhead.  Why wouldn't
> we simply track the newest commit record at all times, and then whenever
> the wal writer wakes up, it would write/fsync that far (or write/fsync
> all completed WAL pages, if there's no new commit record to worry
> about)?

> The other interesting issue is not letting hint-bit updates get to disk
> in advance of the WAL flush, but I don't see a need to track those at
> a per-transaction level: just advance page LSN to latest commit record
> any time a hint bit is updated.  The commit will likely be flushed
> before we'd be interested in writing the buffer out anyway.  Moreover,
> the way you are doing it creates a conflict in that the DFC has to
> guarantee to remember every unflushed transaction, whereas it really
> needs to be just an approximate cache for its performance to be good.

I've spent a few hours thinking on this and I'm happy with it now. The
lure of removing that much code is too strong to resist; it's certainly
easier to remove code after freeze than it is to add it.

Advancing the LSN too far was a worry of mine, but we have the code now
to cope if that proves to be a problem in testing. So let's strip that
out.
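
To make the simplified scheme concrete, here is a minimal sketch in
Python. It is a toy model, not PostgreSQL code: the class and function
names (Wal, Page, wal_writer_cycle, set_hint_bit, write_page) and the
8192-byte WAL page size are assumptions made for illustration only. It
tracks just the newest commit record, lets the WAL writer flush that far
(or through the last completed WAL page when there is no new commit),
and advances the page LSN on every hint-bit update.

```python
WAL_PAGE_SIZE = 8192  # assumed page size for this toy model


class Wal:
    """Toy WAL: LSNs are plain byte offsets."""

    def __init__(self):
        self.insert_lsn = 0         # end of WAL inserted so far
        self.flushed_lsn = 0        # WAL is durable up to here
        self.newest_commit_lsn = 0  # end of the most recent commit record

    def insert(self, nbytes, is_commit=False):
        self.insert_lsn += nbytes
        if is_commit:
            self.newest_commit_lsn = self.insert_lsn
        return self.insert_lsn

    def flush(self, upto):
        # Never move backwards, never flush past what was inserted.
        self.flushed_lsn = max(self.flushed_lsn, min(upto, self.insert_lsn))

    def wal_writer_cycle(self):
        if self.newest_commit_lsn > self.flushed_lsn:
            # An unflushed commit exists: flush through it.
            self.flush(self.newest_commit_lsn)
        else:
            # No new commit: flush all completed WAL pages.
            self.flush(self.insert_lsn - self.insert_lsn % WAL_PAGE_SIZE)


class Page:
    def __init__(self):
        self.lsn = 0
        self.hinted_xids = set()


def set_hint_bit(page, xid, wal):
    # No per-transaction tracking: just advance the page LSN to the
    # newest commit record whenever a hint bit is set.
    page.hinted_xids.add(xid)
    page.lsn = max(page.lsn, wal.newest_commit_lsn)


def write_page(page, wal):
    # Usual WAL-before-data rule: flush WAL through the page LSN first.
    if page.lsn > wal.flushed_lsn:
        wal.flush(page.lsn)
    return wal.flushed_lsn >= page.lsn
```

In this model the page LSN may be advanced further than strictly
necessary, which is the worry mentioned above; the cost only shows up if
a buffer has to be written before the WAL writer has caught up.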

> I see the concern about not letting clog pages go to disk before the
> corresponding WAL data is flushed, but that could be handled much more
> simply: just force a flush through the newest commit record before any
> write of a clog page.  Those writes are infrequent enough (every 32K
> transactions or one checkpoint) that this seems not a serious problem.

This bit I'm not that happy with. You're right it's fairly infrequent,
but the clog pages are typically written when we extend the clog. That
happens while holding XidGenLock and ProcArrayLock, so holding those
across an additional (and real) I/O is going to make that blockage
worse. We've gone to great pains in other places to remove logjams and
we know that the follow-on effects of logjams are not swift to clear
when the system is running at full load on multi-CPU systems.

The code to implement this is pretty clean: a few extra lines in
clog/slru and bubbled-up API changes. 

I was actually thinking of adding something to the bgwriter to clean the
LRU block of the clog, if it was dirty, once per cycle, to further
reduce the possibility of I/O at that point.

> AFAIK there is no need to associate any forced flush with multixacts;
> there is no state saved across crashes for those anyway.

Agreed. 

> I don't see a point in allowing the WAL writer to be disabled ---
> I believe it will be a performance win just like the bgwriter,
> independently of whether transaction_guarantee is used or not,
> by helping to keep down the number of dirty WAL buffers.  That in
> turn allows some other simplifications, like not needing an assign hook
> for transaction_guarantee.

That would be pleasant. The other changes make hint bit setting need an
LWLock request, so I wanted to include a way of saying "I never ever
want to use transaction_guarantee = off". I see the beauty of your
suggestion and agree.

So keep the parameter, but let it default to 100ms?
Range 10-1000ms?

> I disagree with your desire to remove the fsync parameter.  It may have
> less use than before with this feature, but that doesn't mean it has
> none.

OK

> > 3. Should the WALWriter also do the wal_buffers half-full write at the
> > start of XLogInsert() ?
> 
> That should go away entirely; to me the main point of the separate
> wal-writer process is to take over responsibility for not letting too
> many dirty wal buffers accumulate.

Yes


I'll make the agreed changes by next Wed/Thurs. 

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



