Re: Group commit, revised - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Group commit, revised
Msg-id CA+U5nM+Od94quNOsObo7gfiCRijwtJSWko8Gz9tyorHY5e96Sw@mail.gmail.com
In response to Re: Group commit, revised  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Group commit, revised  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Jan 18, 2012 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 17, 2012 at 12:37 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> I found it very helpful to reduce wal_writer_delay in pgbench tests, when
>> running with synchronous_commit=off. The reason is that hint bits don't get
>> set until the commit record is flushed to disk, so making the flushes more
>> frequent reduces the contention on the clog. However, Simon made async
>> commits nudge WAL writer if the WAL page fills up, so I'm not sure how
>> relevant that experience is anymore.

Still completely relevant and orthogonal to this discussion. The patch
retains multi-modal behaviour.

> There's still a small but measurable effect there in master.  I think
> we might be able to make it fully auto-tuning, but I don't think we're
> fully there yet (not sure how much this patch changes that equation).
>
> I suggested a design similar to the one you just proposed to Simon
> when he originally suggested this feature.  It seems that if the WAL
> writer is the only one doing WAL flushes, then there must be some IPC
> overhead - and context switching - involved whenever WAL is flushed.
> But clearly we're saving something somewhere else, on the basis of
> Peter's results, so maybe it's not worth worrying about.  It does seem
> pretty odd to have all the regular backends going through the WAL
> writer and the auxiliary processes using a different mechanism,
> though.  If we got rid of that, maybe WAL writer wouldn't even require
> a lock, if there's only one process that can be doing it at a time.

When we did sync rep it made sense to have the WALSender do the work
and for others to just wait. It would be quite strange to require a
different design for essentially the same thing for normal commits and
WAL flushes to local disk. I should mention the original proposal for
streaming replication had each backend send data to standby
independently and that was recognised as a bad idea after some time.
Same for sync rep also.

The gain is that previously there was measurable contention for the
WALWriteLock; now there is none. Plus, the gang effect continues to
work even when the database gets busy, which isn't true of the
piggyback commits we use now.
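
To make the shape of that argument concrete, here is a toy Python model,
not the patch and not PostgreSQL code. It assumes one dedicated flusher
thread and committers that only wait for the flushed position to pass
their own. Every name in it (GroupCommitLog, commit, flush_calls, the
integer position standing in for an LSN) is hypothetical, and the "flush"
is a placeholder rather than a real write and fsync.

```python
import threading


class GroupCommitLog:
    """Toy model: one flusher thread; committers wait on a flushed position."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inserted = 0    # highest position appended (stand-in for an LSN)
        self._flushed = 0     # highest position known to be flushed
        self._stop = False
        self.flush_calls = 0  # how many flushes the single flusher performed
        self._flusher = threading.Thread(target=self._run, daemon=True)
        self._flusher.start()

    def _flush_to_disk(self, upto):
        # Placeholder for the real write + fsync; only counts calls here.
        self.flush_calls += 1

    def _run(self):
        while True:
            with self._cond:
                # Sleep until there is unflushed work or we are asked to stop.
                while self._inserted == self._flushed and not self._stop:
                    self._cond.wait()
                if self._stop and self._inserted == self._flushed:
                    return
                target = self._inserted
            # Flush outside the lock so that new commits can queue up
            # behind this flush and be covered by the next one.
            self._flush_to_disk(target)
            with self._cond:
                self._flushed = target
                self._cond.notify_all()

    def commit(self):
        """Append one commit record and block until it has been flushed."""
        with self._cond:
            self._inserted += 1
            my_pos = self._inserted
            self._cond.notify_all()          # nudge the flusher
            while self._flushed < my_pos:    # committers never flush themselves
                self._cond.wait()
        return my_pos

    def close(self):
        with self._cond:
            self._stop = True
            self._cond.notify_all()
        self._flusher.join()
```

In this sketch only the flusher thread ever calls _flush_to_disk, so
nothing competes for the right to flush, and any commits that arrive
while a flush is in progress are covered together by the next one. It
says nothing about the real costs being discussed here (IPC, context
switches, fsync latency).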

Not sure why it's odd to have backends do one thing and auxiliaries do
another. The whole point of auxiliary processes is that they do a
specific thing, different from normal backends. Anyway, the important
thing is to have auxiliary processes be independent of each other as
much as possible, which simplifies error handling and state logic in
the postmaster.

With regard to context switching, we're making a kernel call to fsync,
so we'll get a context switch anyway. The whole process is similar to
the way LWLock wakeup works.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

