Thread: Async Commit, v21

Async Commit, v21

From
"Simon Riggs"
Date:
Latest version of synchronous_commit = on | off

Applies cleanly to CVS HEAD. Passes make check with/without
synchronous_commit on in postgresql.conf, while running
--enable-cassert.

I expect to review this tomorrow to make sure everything is correctly
changed before requesting final review/commit.

Docs included at top of main patch.

All comments welcome.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com

Attachment

Re: Async Commit, v21

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Simon Riggs wrote:
> Latest version of synchronous_commit = on | off
>
> Applies cleanly to CVS HEAD. Passes make check with/without
> synchronous_commit on in postgresql.conf, while running
> --enable-cassert.
>
> I expect to review this tomorrow to make sure everything is correctly
> changed before requesting final review/commit.
>
> Docs included at top of main patch.
>
> All comments welcome.
>
> --
>   Simon Riggs
>   EnterpriseDB  http://www.enterprisedb.com

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Tue, 2007-07-17 at 01:28 -0400, Bruce Momjian wrote:
> Your patch has been added to the PostgreSQL unapplied patches list at:
>
>     http://momjian.postgresql.org/cgi-bin/pgpatches
>
> It will be applied as soon as one of the PostgreSQL committers reviews
> and approves it.
>
> ---------------------------------------------------------------------------
>
>
> Simon Riggs wrote:
> > Latest version of synchronous_commit = on | off
> >
> > Applies cleanly to CVS HEAD. Passes make check with/without
> > synchronous_commit on in postgresql.conf, while running
> > --enable-cassert.
> >
> > I expect to review this tomorrow to make sure everything is correctly
> > changed before requesting final review/commit.
> >
> > Docs included at top of main patch.
> >
> > All comments welcome.
> >

Here's the latest version. I've reviewed this to check that this does
what I want it to do, re-written various comments and changed a few
minor points in the code.

I've also added a chunk to transam/README that describes the workings of
the patch from a high level.

Now ready for final review.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com

Attachment

Re: Async Commit, v21 (now: v22)

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Simon Riggs wrote:
> On Tue, 2007-07-17 at 01:28 -0400, Bruce Momjian wrote:
> > Your patch has been added to the PostgreSQL unapplied patches list at:
> >
> >     http://momjian.postgresql.org/cgi-bin/pgpatches
> >
> > It will be applied as soon as one of the PostgreSQL committers reviews
> > and approves it.
> >
> > ---------------------------------------------------------------------------
> >
> >
> > Simon Riggs wrote:
> > > Latest version of synchronous_commit = on | off
> > >
> > > Applies cleanly to CVS HEAD. Passes make check with/without
> > > synchronous_commit on in postgresql.conf, while running
> > > --enable-cassert.
> > >
> > > I expect to review this tomorrow to make sure everything is correctly
> > > changed before requesting final review/commit.
> > >
> > > Docs included at top of main patch.
> > >
> > > All comments welcome.
> > >
>
> Here's the latest version. I've reviewed this to check that this does
> what I want it to do, re-written various comments and changed a few
> minor points in the code.
>
> I've also added a chunk to transam/README that describes the workings of
> the patch from a high level.
>
> Now ready for final review.
>
> --
>   Simon Riggs
>   EnterpriseDB  http://www.enterprisedb.com

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Async Commit, v21 (now: v22)

From
Peter Eisentraut
Date:
Am Dienstag, 17. Juli 2007 20:31 schrieb Simon Riggs:
> Here's the latest version. I've reviewed this to check that this does
> what I want it to do, re-written various comments and changed a few
> minor points in the code.
>
> I've also added a chunk to transam/README that describes the workings of
> the patch from a high level.
>
> Now ready for final review.

I'm not sure the following explanation is all that clear:

+    <para>
+     Asynchronous commit provides different behaviour to setting
+     <varname>fsync</varname> = off, since that is a server-wide
+     setting that will alter the behaviour of all transactions,
+     overriding the setting of <varname>synchronous_commit</varname>,
+     as well as risking much wider data loss.  With <varname>fsync</varname>
+       = off the WAL written but not fsynced, so data is lost only in case
+       of a system crash. With asynchronous commit the WAL is not written
+       to disk at all by the user, so data is lost if there is a database
+       server crash, as well as when the system crashes.
+    </para>

On the one hand, it claims that fsync off has much wider data loss
implications than asynchronous commit, on the other hand, it states that the
risk of a loss due to asynchronous commit happening is larger than fsync off.
I *think* I know what this is trying to say, but I suspect that the average
user might not be able to make a good choice of settings.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Async Commit, v21 (now: v22)

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Here's the latest version. I've reviewed this to check that this does
> what I want it to do, re-written various comments and changed a few
> minor points in the code.
>
> I've also added a chunk to transam/README that describes the workings of
> the patch from a high level.
>
> Now ready for final review.

Just a couple of quick comments:

Having to call XLogBackgroundFlush(2) to flush the WAL up to the latest
async commit is a weird interface. How about passing a "bool force"
argument instead? Or adding another function "XLogFlushCommits".

walwriter.h is missing the

#ifndef XXXX_H
#define XXXX_H

boilerplate.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Wed, 2007-07-18 at 16:44 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Here's the latest version. I've reviewed this to check that this does
> > what I want it to do, re-written various comments and changed a few
> > minor points in the code.
> >
> > I've also added a chunk to transam/README that describes the workings of
> > the patch from a high level.
> >
> > Now ready for final review.
>
> Just a couple of quick comments:
>
> Having to call XLogBackgroundFlush(2) to flush the WAL up to the latest
> async commit is a weird interface. How about passing a "bool force"
> argument instead? Or adding another function "XLogFlushCommits".
>
> walwriter.h is missing the
>
> #ifndef XXXX_H
> #define XXXX_H
>
> boilerplate.

OK. Will do, thanks.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> OK. Will do, thanks.

Make sure to remove the bogus comment about "pgstat considers our data
as gone" in walwriter.c as well (in the sigjmp block)

--
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Wed, 2007-07-18 at 12:04 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > OK. Will do, thanks.
>
> Make sure to remove the bogus comment about "pgstat considers our data
> as gone" in walwriter.c as well (in the sigjmp block)

Thanks, its gone in v23.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Wed, 2007-07-18 at 17:16 +0100, Simon Riggs wrote:
> On Wed, 2007-07-18 at 12:04 -0400, Alvaro Herrera wrote:
> > Simon Riggs wrote:
> >
> > > OK. Will do, thanks.
> >
> > Make sure to remove the bogus comment about "pgstat considers our data
> > as gone" in walwriter.c as well (in the sigjmp block)
>
> Thanks, its gone in v23.

Thanks to everyone for the various review comments.

Here's v23, including all suggested changes, plus some reworking of the
transaction APIs to reduce the footprint of the patch.

Performance tests: pgbench -c 4 -t 10000
- pgbench standard, sync 800 tps, async 1200 tps
- single insert only, sync 3000 tps, async 9000 tps

Async performance appears to be just short of fsync=off

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com

Attachment

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Wed, 2007-07-18 at 12:12 +0200, Peter Eisentraut wrote:
> Am Dienstag, 17. Juli 2007 20:31 schrieb Simon Riggs:
> > Here's the latest version. I've reviewed this to check that this does
> > what I want it to do, re-written various comments and changed a few
> > minor points in the code.
> >
> > I've also added a chunk to transam/README that describes the workings of
> > the patch from a high level.
> >
> > Now ready for final review.
>
> I'm not sure the following explanation is all that clear:
>
> +    <para>
> +     Asynchronous commit provides different behaviour to setting
> +     <varname>fsync</varname> = off, since that is a server-wide
> +     setting that will alter the behaviour of all transactions,
> +     overriding the setting of <varname>synchronous_commit</varname>,
> +     as well as risking much wider data loss.  With <varname>fsync</varname>
> +       = off the WAL written but not fsynced, so data is lost only in case
> +       of a system crash. With asynchronous commit the WAL is not written
> +       to disk at all by the user, so data is lost if there is a database
> +       server crash, as well as when the system crashes.
> +    </para>
>
> On the one hand, it claims that fsync off has much wider data loss
> implications than asynchronous commit, on the other hand, it states that the
> risk of a loss due to asynchronous commit happening is larger than fsync off.
> I *think* I know what this is trying to say, but I suspect that the average
> user might not be able to make a good choice of settings.

Thanks for reading. Updated version in new patch.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Thanks for reading. Updated version in new patch.

What was the reasoning for basing walwriter.c on autovacuum (which needs
to be able to execute transactions) rather than bgwriter (which does
not)?  The shutdown logic in particular seems all wrong; you can't have
a process connected to shared memory that is going to outlive the
postmaster.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Mon, 2007-07-23 at 18:59 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Thanks for reading. Updated version in new patch.
>
> What was the reasoning for basing walwriter.c on autovacuum (which needs
> to be able to execute transactions) rather than bgwriter (which does
> not)?

Writing WAL means we have to have xlog access and therefore shared
memory access. Don't really need the ability to execute transactions
though, tis true, but I wasn't aware there was a distinction.

>  The shutdown logic in particular seems all wrong; you can't have
> a process connected to shared memory that is going to outlive the
> postmaster.

It seemed to work cleanly when I tested it initially, but I'll take
another look tomorrow. By version 23 I was going code-blind.

Autovac is the most clean implementation of a special process, so seemed
like a good prototype. I'd thought I'd combed out any pointless code
though.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Autovac is the most clean implementation of a special process, so seemed
> like a good prototype. I'd thought I'd combed out any pointless code
> though.

What, you mean there's pointless code in autovac?  Hey, be sure to let
me know about it!

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Async Commit, v21 (now: v22)

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Here's v23, including all suggested changes, plus some reworking of the
> transaction APIs to reduce the footprint of the patch.

What's the thing about doing the flush twice in a couple of comments in
calls to XLogBackgroundFlush?  Are they just leftover comments from
older code?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Mon, 2007-07-23 at 18:59 -0400, Tom Lane wrote:
>> The shutdown logic in particular seems all wrong; you can't have
>> a process connected to shared memory that is going to outlive the
>> postmaster.

> It seemed to work cleanly when I tested it initially, but I'll take
> another look tomorrow. By version 23 I was going code-blind.

Leave it be for the moment --- I'm working on it now so it'd just be
duplicated effort.  I'm inclined though to rebase at least the signal
handling on bgwriter.c; don't like different processes using different
signal interpretations unless there's a good reason for it.

I came across another point worthy of mention: as given, the patch turns
XLogWrite's "flexible write" logic into dead code, because there are no
callers that pass flexible = true.  We could rip that out, but it seems
to me there's still some value to it under high load conditions ("high
load" meaning we fill multiple wal pages per wal_writer_delay).  What
I'm inclined to do is modify XLogBackgroundFlush so that it uses
flexible = true when it's flushing whole pages.  The downside to that
is that it could take as many as three walwriter cycles, instead of two,
to guararantee an async commit is down to disk:
    * first write/flush stops at buffer wraparound point
    * second one stops at last complete page
    * third finally is certain to write any remaining commit record
This seems like no big problem to me, but does anyone want to object?

(BTW, in case you can't tell from the drift of my questions, I've
separated the patch into "add background wal writer" and "add async
commit", and am working on the first half.)

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> What's the thing about doing the flush twice in a couple of comments in
> calls to XLogBackgroundFlush?  Are they just leftover comments from
> older code?

I was wondering that too --- they looked like obsolete comments to me.

My current thinking BTW is that trying to make XLogBackgroundFlush serve
two purposes is counterproductive.  It should be dedicated to use by the
walwriter only, and the checkpoint case should simply read the async
commit pointer and call regular XLogFlush with it.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
I wrote:
> (BTW, in case you can't tell from the drift of my questions, I've
> separated the patch into "add background wal writer" and "add async
> commit", and am working on the first half.)

I've committed the first half of that.  Something that still needs
investigation is what the default wal_writer_delay should be.  I left
it at 200ms as submitted, but in some crude testing here (just running
the regression tests and strace'ing the walwriter) it seemed that I had
to crank it down to 10ms before the walwriter was really doing the
majority of the wal-flushing work.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I wrote:
>> (BTW, in case you can't tell from the drift of my questions, I've
>> separated the patch into "add background wal writer" and "add async
>> commit", and am working on the first half.)
>
> I've committed the first half of that.  Something that still needs
> investigation is what the default wal_writer_delay should be.  I left
> it at 200ms as submitted, but in some crude testing here (just running
> the regression tests and strace'ing the walwriter) it seemed that I had
> to crank it down to 10ms before the walwriter was really doing the
> majority of the wal-flushing work.

Without async commits? Do we really want the walwriter doing the majority of
the wal-flushing work for normal commits? It seems like that's not going to be
any advantage over just having some random backend do the commit.

The only way the walwriter seems like an advantage is either with async
commits or with something like commit_delay and some extra logic in the
walwriter to aim for grouping together the maximum number of commits.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Mon, 2007-07-23 at 20:02 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > Autovac is the most clean implementation of a special process, so seemed
> > like a good prototype. I'd thought I'd combed out any pointless code
> > though.
>
> What, you mean there's pointless code in autovac?  Hey, be sure to let
> me know about it!

Nothing pointless in autovac. It's clean, which is why I used it for the
basis for walwriter.c. Once copied, some parts were not needed.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Mon, 2007-07-23 at 21:11 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > What's the thing about doing the flush twice in a couple of comments in
> > calls to XLogBackgroundFlush?  Are they just leftover comments from
> > older code?
>
> I was wondering that too --- they looked like obsolete comments to me.

True, recent API change meant they were slightly off.

> My current thinking BTW is that trying to make XLogBackgroundFlush serve
> two purposes is counterproductive.  It should be dedicated to use by the
> walwriter only, and the checkpoint case should simply read the async
> commit pointer and call regular XLogFlush with it.

OK

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Tue, 2007-07-24 at 00:58 -0400, Tom Lane wrote:
> I wrote:
> > (BTW, in case you can't tell from the drift of my questions, I've
> > separated the patch into "add background wal writer" and "add async
> > commit", and am working on the first half.)
>
> I've committed the first half of that.

Cool

> Something that still needs
> investigation is what the default wal_writer_delay should be.  I left
> it at 200ms as submitted, but in some crude testing here (just running
> the regression tests and strace'ing the walwriter) it seemed that I had
> to crank it down to 10ms before the walwriter was really doing the
> majority of the wal-flushing work.

There are two things to consider here.

If you are running solely async commit then walwriter should be at a
somewhat higher setting. The default is set at the "works well on crappy
hardware" value, for which my laptop is a good simulation. 50ms or below
reduces benefit considerably.

If you are not running async commits then walwriter can provide some
form of group commit. To do that you need to wind the time down. I think
that's what your seeing now.

My hope is to formalise that in the next release, so that walwriter can
autotune and to allow group commit

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Mon, 2007-07-23 at 21:06 -0400, Tom Lane wrote:

> I came across another point worthy of mention: as given, the patch turns
> XLogWrite's "flexible write" logic into dead code, because there are no
> callers that pass flexible = true.  We could rip that out, but it seems
> to me there's still some value to it under high load conditions ("high
> load" meaning we fill multiple wal pages per wal_writer_delay).  What
> I'm inclined to do is modify XLogBackgroundFlush so that it uses
> flexible = true when it's flushing whole pages.  The downside to that
> is that it could take as many as three walwriter cycles, instead of two,
> to guararantee an async commit is down to disk:
>     * first write/flush stops at buffer wraparound point
>     * second one stops at last complete page
>     * third finally is certain to write any remaining commit record
> This seems like no big problem to me, but does anyone want to object?

Sure, that would work.

The logic is still the same: we only need the last flush if there is a
pause in the flow of transactions, so the majority of transactions will
still reach disk in one wal_writer_delay.

Most of the time the first and second phases are performed in just one
write anyway. It's just the few times we're next to the circular buffer
wrap point we would need this.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Florian Weimer
Date:
+    <para>
+     Asynchronous commit provides different behaviour to setting
+     <varname>fsync</varname> = off, which is a server-wide
+     setting that will alter the behaviour of all transactions,
+     overriding the setting of <varname>synchronous_commit</varname>,
+     as well as risking much wider data loss.  With <varname>fsync</varname>
+     = off the WAL written but not fsynced, so data is lost only in case
+     of a system crash. Both WAL and datafiles written within the last
+     few seconds would be at risk, affecting all types of database
+     transactions. The precise number depends upon whether your operating
+     system is configured for automatic filesystem writeback and what the
+     delay is set too; Linux currently defaults to 30 seconds.  With
+     asynchronous commit the WAL is not written to disk at all at commit
+     time, so data is lost if there is a database server crash, whether or
+     not the system crashes at the same time.
+    </para>

I think fsync=off also endagers metadata, while synchronous_commit=off
should be perfectly safe as far as the metadata is concerned.
Wouldn't this be worth mentioning as well?


--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Tue, 2007-07-24 at 10:51 +0200, Florian Weimer wrote:
> +    <para>
> +     Asynchronous commit provides different behaviour to setting
> +     <varname>fsync</varname> = off, which is a server-wide
> +     setting that will alter the behaviour of all transactions,
> +     overriding the setting of <varname>synchronous_commit</varname>,
> +     as well as risking much wider data loss.  With <varname>fsync</varname>
> +     = off the WAL written but not fsynced, so data is lost only in case
> +     of a system crash. Both WAL and datafiles written within the last
> +     few seconds would be at risk, affecting all types of database
> +     transactions. The precise number depends upon whether your operating
> +     system is configured for automatic filesystem writeback and what the
> +     delay is set too; Linux currently defaults to 30 seconds.  With
> +     asynchronous commit the WAL is not written to disk at all at commit
> +     time, so data is lost if there is a database server crash, whether or
> +     not the system crashes at the same time.
> +    </para>
>
> I think fsync=off also endagers metadata, while synchronous_commit=off
> should be perfectly safe as far as the metadata is concerned.
> Wouldn't this be worth mentioning as well?

Well, I think "wider data loss" covers it for me, but I don't have a
problem with people wanting to detail the various problems that can
occur.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Florian Weimer
Date:
* Simon Riggs:

>> I think fsync=off also endagers metadata, while synchronous_commit=off
>> should be perfectly safe as far as the metadata is concerned.
>> Wouldn't this be worth mentioning as well?
>
> Well, I think "wider data loss" covers it for me, but I don't have a
> problem with people wanting to detail the various problems that can
> occur.

To be honest, I'm mostly concerned with the "synchronous_commit=off
does not wedge the whole database" part of that statement.  For me as
a user, this is a crucial piece of information, and I don't like it
when such information is merely implied.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Without async commits? Do we really want the walwriter doing the
> majority of the wal-flushing work for normal commits? It seems like
> that's not going to be any advantage over just having some random
> backend do the commit.

Sure: the advantage is that the backends (ie, user query processing)
don't get blocked on fsync's.  This is not really different from the
rationale for having the bgwriter.  It's probably most useful for large
transactions, which up to now generally had to stop and flush the WAL
buffers every few pages worth of WAL output.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
Greg Smith
Date:
On Tue, 24 Jul 2007, Gregory Stark wrote:

> Do we really want the walwriter doing the majority of the wal-flushing
> work for normal commits? It seems like that's not going to be any
> advantage over just having some random backend do the commit.

Might there be some advantage in high-throughput/multi-[cpu|core]
situations due to improved ability to keep that code in a single
processor?  CPU cache issues are turning into scalability bottlenecks in
so many designs I came across lately.  A distinct walwriter might be more
likely to be (or even be explicitly) bound to a processor and stay there
than when the code executes on any random backend.  The obvious flip side
is that increased moving of data between processors more often may balance
or even negate any potential improvement there.

More on the system tuning side, I know I've found that the background
writer is a separate process enormously helpful because of how it allows
monitoring the activity level of just it relative to the whole.  There are
enough WAL-bound systems out there (particularly when there's no caching
disk controller) that I could see similar value to being able to watch a
distinct walwriter.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Async Commit, v21 (now: v22)

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> Without async commits? Do we really want the walwriter doing the
>> majority of the wal-flushing work for normal commits? It seems like
>> that's not going to be any advantage over just having some random
>> backend do the commit.
>
> Sure: the advantage is that the backends (ie, user query processing)
> don't get blocked on fsync's.  This is not really different from the
> rationale for having the bgwriter.  It's probably most useful for large
> transactions, which up to now generally had to stop and flush the WAL
> buffers every few pages worth of WAL output.

I wonder what it would take to offload the CRC calculation to the wal
writer. And if that would then become a bottleneck, making it actually
counterproductive.

No, not in this release :).

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Async Commit, v21 (now: v22)

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Without async commits? Do we really want the walwriter doing the
>> majority of the wal-flushing work for normal commits? It seems like
>> that's not going to be any advantage over just having some random
>> backend do the commit.
>
> Sure: the advantage is that the backends (ie, user query processing)
> don't get blocked on fsync's.  This is not really different from the
> rationale for having the bgwriter.

I'm puzzled though. How do they not get blocked on fsyncs? They can't proceed
past their commit until the fsync happens whether they do it themselves or the
walwriter does it.

> It's probably most useful for large transactions, which up to now generally
> had to stop and flush the WAL buffers every few pages worth of WAL output.

That could be useful though the backend doesn't have to fsync when it writes
out those buffers, does it?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Tue, 2007-07-24 at 10:01 -0400, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > Without async commits? Do we really want the walwriter doing the
> > majority of the wal-flushing work for normal commits? It seems like
> > that's not going to be any advantage over just having some random
> > backend do the commit.
>
> Sure: the advantage is that the backends (ie, user query processing)
> don't get blocked on fsync's.  This is not really different from the
> rationale for having the bgwriter.

Let's measure things and set the defaults accordingly.

> It's probably most useful for large
> transactions, which up to now generally had to stop and flush the WAL
> buffers every few pages worth of WAL output.

That should be a reasonable gain from avoiding CPU/disk flip-flopping,
but we are still CPU bound on COPY. Will measure.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Sure: the advantage is that the backends (ie, user query processing)
>> don't get blocked on fsync's.  This is not really different from the
>> rationale for having the bgwriter.

> I'm puzzled though. How do they not get blocked on fsyncs? They can't proceed
> past their commit until the fsync happens whether they do it themselves or the
> walwriter does it.

Sure, they'll block on an fsync when they commit.  Even then, the
walwriter can be an advantage if it's already flushed previous WAL
blocks: writing and flushing one page of WAL is faster than writing
and flushing a lot of pages, no?

>> It's probably most useful for large transactions, which up to now generally
>> had to stop and flush the WAL buffers every few pages worth of WAL output.

> That could be useful though the backend doesn't have to fsync when it writes
> out those buffers, does it?

A lot of systems seem to favor synchronous write methods for WAL, in
which you effectively *do* fsync when you write.  There's also the
problem that if you have to write a dirty buffer, you must first ensure
WAL is fsync'd up through its LSN.  (So to some extent this is also
offloading work from the bgwriter.)

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
Alvaro Herrera
Date:
Florian Weimer wrote:

> I think fsync=off also endagers metadata, while synchronous_commit=off
> should be perfectly safe as far as the metadata is concerned.
> Wouldn't this be worth mentioning as well?

Is it true that a transaction is turned into sync commit as soon as it
writes on a system catalog?  Is it desirable to make it so?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Async Commit, v21 (now: v22)

From
"Jim C. Nasby"
Date:
On Tue, Jul 24, 2007 at 02:08:00PM -0400, Alvaro Herrera wrote:
> Florian Weimer wrote:
>
> > I think fsync=off also endagers metadata, while synchronous_commit=off
> > should be perfectly safe as far as the metadata is concerned.
> > Wouldn't this be worth mentioning as well?
>
> Is it true that a transaction is turned into sync commit as soon as it
> writes on a system catalog?  Is it desirable to make it so?

If we don't do that then regular users have the ability to put the
catalog (and by extension everything else) at risk... I think we need
that, though it would be nice if it wasn't an issue for temporary
objects (but I can't see any way to accomplish that).
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Attachment

Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Tue, Jul 24, 2007 at 02:08:00PM -0400, Alvaro Herrera wrote:
>> Is it true that a transaction is turned into sync commit as soon as it
>> writes on a system catalog?  Is it desirable to make it so?

> If we don't do that then regular users have the ability to put the
> catalog (and by extension everything else) at risk...

How do you arrive at that conclusion?  The point of the async commit
patch is that transactions might be lost, as in not really committed,
but there can be no database corruption.  Otherwise we'd never consider
making it a userset config setting.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Tue, 2007-07-24 at 16:25 -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > On Tue, Jul 24, 2007 at 02:08:00PM -0400, Alvaro Herrera wrote:
> >> Is it true that a transaction is turned into sync commit as soon as it
> >> writes on a system catalog?  Is it desirable to make it so?
>
> > If we don't do that then regular users have the ability to put the
> > catalog (and by extension everything else) at risk...
>
> How do you arrive at that conclusion?  The point of the async commit
> patch is that transactions might be lost, as in not really committed,
> but there can be no database corruption.  Otherwise we'd never consider
> making it a userset config setting.

There isn't an explicit test for writing to catalog tables; as Tom says
this is as safe as any other data.

There is an explicit test for whether the transaction has modified
files; if so the commit is always synchronous, even if explicitly
requested otherwise. Also, utility commands never perform async commits,
so overall there aren't that many of the commonly used DDL commands that
could be performed and still have it be an async commit.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

>> If we don't do that then regular users have the ability to put the
>> catalog (and by extension everything else) at risk...
>
> How do you arrive at that conclusion?  The point of the async commit
> patch is that transactions might be lost, as in not really committed,
> but there can be no database corruption.  Otherwise we'd never consider
> making it a userset config setting.

I think the danger that arises is not related to catalogs so much as it is
related to end-of-transaction filesystem operations such as dropping heap
files. If those operations are done but the related transaction commit is lost
then you have a problem.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> There is an explicit test for whether the transaction has modified
> files; if so the commit is always synchronous, even if explicitly
> requested otherwise. Also, utility commands never perform async commits,
> so overall there aren't that many of the commonly used DDL commands that
> could be performed and still have it be an async commit.

Huh?  I see neither a reason for these restrictions nor any way that you
could enforce them without horrid kluges.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Tue, 2007-07-24 at 17:53 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > There is an explicit test for whether the transaction has modified
> > files; if so the commit is always synchronous, even if explicitly
> > requested otherwise. Also, utility commands never perform async commits,
> > so overall there aren't that many of the commonly used DDL commands that
> > could be performed and still have it be an async commit.
>
> Huh?  I see neither a reason for these restrictions

So that we don't lose DROP TABLEs, TRUNCATEs for example. If the
database crashed between removing the file and flushing the WAL then the
relfilenode would be incorrectly set for a non-temp table and we would
have no easy way of working out what to do. That seems like a problem to
me, if we let it occur; the patch avoids that trivially.

> nor any way that you
> could enforce them without horrid kluges.

In RecordTransactionCommit we do smgrGetPendingDeletes(), so we know if
there are files to be removed at commit time. If there are files to be
removed then in the async commit patch we override the setting of
synchronous_commit that was requested for that transaction so that the
transaction will always be synchronous.

So this works for DROP TABLE, DROP INDEX and TRUNCATE.

CREATE TABLE would not be such a problem since if we crash between
commit and WAL flush then the reference to the file would not be
present, even if the table would be. Data loss, but no catalog integrity
issue.

We *might* want to override the commit to be synchronous if there is
anything at all in the pending delete list, whether atCommit or atAbort.
This would then work for both CREATE and DROP. An additional smgr call
to do this would be fast and easy, as well as modular since there are
already calls from xact.c to smgr.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


Re: Async Commit, v21 (now: v22)

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Here's v23, including all suggested changes, plus some reworking of the
> transaction APIs to reduce the footprint of the patch.

Applied with some editorialization --- I found a few bugs, as well as
things that seemed they could be neater.  Notable was that the patch
would have significantly increased the contention for CLogControlLock by
requiring two lock cycles to fetch a transaction's commit status and
LSN.  I changed it so that TransactionIdGetStatus returns both pieces
of information.

            regards, tom lane

Re: Async Commit, v21 (now: v22)

From
"Simon Riggs"
Date:
On Wed, 2007-08-01 at 18:54 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Here's v23, including all suggested changes, plus some reworking of the
> > transaction APIs to reduce the footprint of the patch.
>
> Applied with some editorialization ---

Thanks

>  I found a few bugs, as well as
> things that seemed they could be neater.

Keen to improve: any chance of explaining, possibly offlist?

>  Notable was that the patch
> would have significantly increased the contention for CLogControlLock by
> requiring two lock cycles to fetch a transaction's commit status and
> LSN.  I changed it so that TransactionIdGetStatus returns both pieces
> of information.

Thanks for picking that up.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com