Thread: PostgreSQL as a local in-memory cache

From:
"jgardner@jonathangardner.net"
Date:

We have a fairly unique need for a local, in-memory cache. This will
store data aggregated from other sources. Generating the data only
takes a few minutes, and it is updated often. There will be some
fairly expensive queries of arbitrary complexity run at a fairly high
rate. We're looking for high concurrency and reasonable performance
throughout.

The entire data set is roughly 20 MB in size. We've tried Carbonado in
front of SleepycatJE only to discover that it chokes at a fairly low
concurrency and that Carbonado's rule-based optimizer is wholly
insufficient for our needs. We've also tried Carbonado's Map
Repository which suffers the same problems.

I've since moved the backend database to a local PostgreSQL instance
hoping to take advantage of PostgreSQL's superior performance at high
concurrency. Of course, at the default settings, it performs quite
poorly compares to the Map Repository and Sleepycat JE.

My question is how can I configure the database to run as quickly as
possible if I don't care about data consistency or durability? That
is, the data is updated so often and it can be reproduced fairly
rapidly so that if there is a server crash or random particles from
space mess up memory we'd just restart the machine and move on.

I've never configured PostgreSQL to work like this and I thought maybe
someone here had some ideas on a good approach to this.

From:
"jgardner@jonathangardner.net"
Date:

On Jun 14, 7:14 pm, "jgard...@jonathangardner.net"
<jgard...@jonathangardner.net> wrote:
> We have a fairly unique need for a local, in-memory cache. This will
> store data aggregated from other sources. Generating the data only
> takes a few minutes, and it is updated often. There will be some
> fairly expensive queries of arbitrary complexity run at a fairly high
> rate. We're looking for high concurrency and reasonable performance
> throughout.
>
> The entire data set is roughly 20 MB in size. We've tried Carbonado in
> front of SleepycatJE only to discover that it chokes at a fairly low
> concurrency and that Carbonado's rule-based optimizer is wholly
> insufficient for our needs. We've also tried Carbonado's Map
> Repository which suffers the same problems.
>
> I've since moved the backend database to a local PostgreSQL instance
> hoping to take advantage of PostgreSQL's superior performance at high
> concurrency. Of course, at the default settings, it performs quite
> poorly compares to the Map Repository and Sleepycat JE.
>
> My question is how can I configure the database to run as quickly as
> possible if I don't care about data consistency or durability? That
> is, the data is updated so often and it can be reproduced fairly
> rapidly so that if there is a server crash or random particles from
> space mess up memory we'd just restart the machine and move on.
>
> I've never configured PostgreSQL to work like this and I thought maybe
> someone here had some ideas on a good approach to this.

Just to summarize what I've been able to accomplish so far. By turning
fsync and synchronize_commit off, and moving the data dir to tmpfs,
I've been able to run the expensive queries much faster than BDB or
the MapRepository that comes with Carbonado. This is because
PostgreSQL's planner is so much faster and better than whatever
Carbonado has. Tweaking indexes has only made things run faster.

Right now I'm wrapping up the project so that we can do some serious
performance benchmarks. I'll let you all know how it goes.

Also, just a note that setting up PostgreSQL for these weird scenarios
turned out to be just a tiny bit harder than setting up SQLite. I
remember several years ago when there was a push to simplify the
configuration and installation of PostgreSQL, and I believe that that
has born fruit.

From:
Josh Berkus
Date:

All,

So, I've been discussing this because using PostgreSQL on the caching
layer has become more common that I think most people realize.  Jonathan
is one of 4 companies I know of who are doing this, and with the growth
of Hadoop and other large-scale data-processing technologies, I think
demand will increase.

Especially as, in repeated tests, PostgreSQL with persistence turned off
is just as fast as the fastest nondurable NoSQL database.  And it has a
LOT more features.

Now, while fsync=off and tmpfs for WAL more-or-less eliminate the IO for
durability, they don't eliminate the CPU time.  Which means that a
caching version of PostgreSQL could be even faster.   To do that, we'd
need to:

a) Eliminate WAL logging entirely
b) Eliminate checkpointing
c) Turn off the background writer
d) Have PostgreSQL refuse to restart after a crash and instead call an
exteral script (for reprovisioning)

Of the three above, (a) is the most difficult codewise.  (b)(c) and (d)
should be relatively straightforwards, although I believe that we now
have the bgwriter doing some other essential work besides syncing
buffers.  There's also a narrower use-case in eliminating (a), since a
non-fsync'd server which was recording WAL could be used as part of a
replication chain.

This isn't on hackers because I'm not ready to start working on a patch,
but I'd like some feedback on the complexities of doing (b) and (c) as
well as how many people could use a non-persistant, in-memory postgres.

--
                                   -- Josh Berkus
                                      PostgreSQL Experts Inc.
                                      http://www.pgexperts.com

From:
"Pierre C"
Date:

> Especially as, in repeated tests, PostgreSQL with persistence turned off
> is just as fast as the fastest nondurable NoSQL database.  And it has a
> LOT more features.

An option to completely disable WAL for such use cases would make it a lot
faster, especially in the case of heavy concurrent writes.

> Now, while fsync=off and tmpfs for WAL more-or-less eliminate the IO for
> durability, they don't eliminate the CPU time.

Actually the WAL overhead is some CPU and lots of locking.

> Which means that a caching version of PostgreSQL could be even faster.
> To do that, we'd need to:
>
> a) Eliminate WAL logging entirely
> b) Eliminate checkpointing
> c) Turn off the background writer
> d) Have PostgreSQL refuse to restart after a crash and instead call an
> exteral script (for reprovisioning)
>
> Of the three above, (a) is the most difficult codewise.

Actually, it's pretty easy, look in xlog.c


From:
Dimitri Fontaine
Date:

Hi,

Josh Berkus <> writes:
> a) Eliminate WAL logging entirely
> b) Eliminate checkpointing
> c) Turn off the background writer
> d) Have PostgreSQL refuse to restart after a crash and instead call an
> exteral script (for reprovisioning)

Well I guess I'd prefer a per-transaction setting, allowing to bypass
WAL logging and checkpointing. Forcing the backend to care itself for
writing the data I'm not sure is a good thing, but if you say so.

Then you could have the GUC set for a whole cluster, only a database
etc. We already have synchronous_commit to trade durability against
performances, we could maybe support protect_data = off too.

The d) point I'm not sure still applies if you have per transaction
setting, which I think makes the most sense. The data you choose not to
protect is missing at restart, just add some way to register a hook
there. We already have one (shared_preload_libraries) but it requires
coding in C.

Calling a user function at the end of recovery and before accepting
connection would be good I think. A user function (per database) is
better than a script because if you want to run it before accepting
connections and still cause changes in the database…

Regards,
--
dim

From:
Tom Lane
Date:

Dimitri Fontaine <> writes:
> Josh Berkus <> writes:
>> a) Eliminate WAL logging entirely
>> b) Eliminate checkpointing
>> c) Turn off the background writer
>> d) Have PostgreSQL refuse to restart after a crash and instead call an
>> exteral script (for reprovisioning)

> Well I guess I'd prefer a per-transaction setting, allowing to bypass
> WAL logging and checkpointing.

Not going to happen; this is all or nothing.

> Forcing the backend to care itself for
> writing the data I'm not sure is a good thing, but if you say so.

Yeah, I think proposal (c) is likely to be a net loss.

(a) and (d) are probably simple, if by "reprovisioning" you mean
"rm -rf $PGDATA; initdb".  Point (b) will be a bit trickier because
there are various housekeeping activities tied into checkpoints.
I think you can't actually remove checkpoints altogether, just
skip the flush-dirty-pages part.

            regards, tom lane

From:
Greg Smith
Date:

Josh Berkus wrote:
> a) Eliminate WAL logging entirely
> c) Turn off the background writer

Note that if you turn off full_page_writes and set
bgwriter_lru_maxpages=0, you'd get a substantial move in both these
directions without touching any code.  Would help prove those as useful
directions to move toward or not.  The difference in WAL writes just
after a checkpoint in particular, due to the full_page_writes behavior,
is a significant portion of total WAL activity on most systems.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
"Pierre C"
Date:

> Well I guess I'd prefer a per-transaction setting, allowing to bypass
> WAL logging and checkpointing. Forcing the backend to care itself for
> writing the data I'm not sure is a good thing, but if you say so.

Well if the transaction touches a system catalog it better be WAL-logged...

A per-table (or per-index) setting makes more sense IMHO. For instance "on
recovery, truncate this table" (this was mentioned before).
Another option would be "make the table data safe, but on recovery,
destroy and rebuild this index" : because on a not so large, often updated
table, with often updated indexes, it may not take long to rebuild the
indexes, but all those wal-logged index updates do add some overhead.


From:
Josh Berkus
Date:

> Well I guess I'd prefer a per-transaction setting, allowing to bypass
> WAL logging and checkpointing.

Not even conceiveable.  For this to work, we're talking about the whole
database installation.  This is only a set of settings for a database
*server* which is considered disposable and replaceable, where if it
shuts down unexpectedly, you throw it away and replace it.

> Forcing the backend to care itself for
> writing the data I'm not sure is a good thing, but if you say so.

Oh, yeah, I guess we'd only be turning off the LRU cache operations of
the background writer.  Same with checkpoints.  Copying between
shared_buffers and the LRU cache would still happen.

> Calling a user function at the end of recovery and before accepting
> connection would be good I think. A user function (per database) is
> better than a script because if you want to run it before accepting
> connections and still cause changes in the database…

Hmmm, you're not quite following my idea.  There is no recovery.  If the
database shuts down unexpectedly, it's toast and you replace it from
another copy somewhere else.

> (a) and (d) are probably simple, if by "reprovisioning" you mean
> "rm -rf $PGDATA; initdb".

Exactly.  Followed by "scp database_image".  Or heck, just replacing the
whole VM.

> Point (b) will be a bit trickier because
> there are various housekeeping activities tied into checkpoints.
> I think you can't actually remove checkpoints altogether, just
> skip the flush-dirty-pages part.

Yes, and we'd want to flush dirty pages on an actual shutdown command.
We do want to be able to shut down the DB on purpose.

> Well if the transaction touches a system catalog it better be
> WAL-logged...

Given the above, why?


--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
Tom Lane
Date:

Josh Berkus <> writes:
>> (a) and (d) are probably simple, if by "reprovisioning" you mean
>> "rm -rf $PGDATA; initdb".

> Exactly.  Followed by "scp database_image".  Or heck, just replacing the
> whole VM.

Right, that would work.  I don't think you really need to implement that
inside Postgres.  I would envision having the startup script do it, ie

    rm -rf $PGDATA
    cp -pr prepared-database-image $PGDATA

    # this loop exits when postmaster exits normally
    while ! postmaster ...
    do
        rm -rf $PGDATA
        cp -pr prepared-database-image $PGDATA
    done

Then all you need is a tweak to make the postmaster exit(1) after
a crash instead of trying to launch recovery.

            regards, tom lane

From:
Matthew Wakeling
Date:

Dimitri Fontaine wrote:
>> Well I guess I'd prefer a per-transaction setting

Not possible, as many others have said. As soon as you make an unsafe
transaction, all the other transactions have nothing to rely on.

On Thu, 17 Jun 2010, Pierre C wrote:
> A per-table (or per-index) setting makes more sense IMHO. For instance "on
> recovery, truncate this table" (this was mentioned before).

That would be much more valuable.

I'd like to point out the costs involved in having a whole separate
"version" of Postgres that has all this safety switched off. Package
managers will not thank anyone for having to distribute another version of
the system, and woe betide the user who installs the wrong version because
"it runs faster". No, this is much better as a configurable option.

Going back to the "on recovery, truncate this table". We already have a
mechanism for skipping the WAL writes on an entire table - we do that for
tables that have been created in the current transaction. It would surely
be a small step to allow this to be configurably permanent on a particular
table.

Moreover, we already have a mechanism for taking a table that has had
non-logged changes, and turning it into a fully logged table - we do that
to the above mentioned tables when the transaction commits. I would
strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may
involve some more acrobatics if the table is currently in use by multiple
transactions, but would be valuable.

This would allow users to create "temporary tables" that can be shared by
several connections. It would also allow bulk loading in parallel of a
single large table.

With these suggestions, we would still need to WAL-log all the metadata
changes, but I think in most circumstances that is not going to be a large
burden on performance.

Matthew

--
 Picard: I was just paid a visit from Q.
 Riker:  Q! Any idea what he's up to?
 Picard: No. He said he wanted to be "nice" to me.
 Riker:  I'll alert the crew.

From:
"Pierre C"
Date:

> I'd like to point out the costs involved in having a whole separate
> "version"

It must be a setting, not a version.

For instance suppose you have a session table for your website and a users
table.

- Having ACID on the users table is of course a must ;
- for the sessions table you can drop the "D"

Server crash would force all users to re-login on your website but if your
server crashes enough that your users complain about that, you have
another problem anyway. Having the sessions table not WAL-logged (ie
faster) would not prevent you from having sessions.user_id REFERENCES
users( user_id ) ... so mixing safe and unsafe tables would be much more
powerful than just having unsafe tables.

And I really like the idea of non-WAL-logged indexes, too, since they can
be rebuilt as needed, the DBA could decide between faster index updates
but rebuild on crash, or normal updates and fast recovery.

Also materialized views etc, you can rebuild them on crash and the added
update speed would be good.

> Moreover, we already have a mechanism for taking a table that has had
> non-logged changes, and turning it into a fully logged table - we do
> that to the above mentioned tables when the transaction commits. I would
> strongly recommend providing an option to ALTER TABLE MAKE SAFE, which
> may involve some more acrobatics if the table is currently in use by
> multiple transactions, but would be valuable.

I believe the old discussions called this ALTER TABLE SET PERSISTENCE.

> This would allow users to create "temporary tables" that can be shared
> by several connections. It would also allow bulk loading in parallel of
> a single large table.

This would need to WAL-log the entire table to send it to the slaves if
replication is enabled, but it's a lot faster than replicating each record.


From:
Josh Berkus
Date:

> It must be a setting, not a version.
>
> For instance suppose you have a session table for your website and a
> users table.
>
> - Having ACID on the users table is of course a must ;
> - for the sessions table you can drop the "D"

You're trying to solve a different use-case than the one I am.

Your use-case will be solved by global temporary tables.  I suggest that
you give Robert Haas some help & feedback on that.

My use case is people using PostgreSQL as a cache, or relying entirely
on replication for durability.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
Josh Berkus
Date:

On 6/18/10 2:15 AM, Matthew Wakeling wrote:
> I'd like to point out the costs involved in having a whole separate
> "version" of Postgres that has all this safety switched off. Package
> managers will not thank anyone for having to distribute another version
> of the system, and woe betide the user who installs the wrong version
> because "it runs faster". No, this is much better as a configurable option.

Agreed, although initial alphas of this concept are likely to in fact be
a separate source code tree.  Eventually when we have it working well it
could become an initdb-time option.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
Robert Haas
Date:

On Thu, Jun 17, 2010 at 1:29 PM, Josh Berkus <> wrote:
> a) Eliminate WAL logging entirely

In addition to global temporary tables, I am also planning to
implement unlogged tables, which are, precisely, tables for which no
WAL is written.  On restart, any such tables will be truncated.  That
should give you the ability to do this (by making all your tables
unlogged).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Bruce Momjian
Date:

Tom Lane wrote:
> Dimitri Fontaine <> writes:
> > Josh Berkus <> writes:
> >> a) Eliminate WAL logging entirely

If we elimiate WAL logging, that means a reinstall is required for even
a postmaster crash, which is a new non-durable behavior.

Also, we just added wal_level = minimal, which might end up being a poor
name choice of we want wal_level = off in PG 9.1.  Perhaps we should
have used wal_level = crash_safe in 9.0.

I have added the following TODO:

    Consider a non-crash-safe wal_level that eliminates WAL activity

        * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Bruce Momjian
Date:

Tom Lane wrote:
> Dimitri Fontaine <> writes:
> > Josh Berkus <> writes:
> >> a) Eliminate WAL logging entirely
> >> b) Eliminate checkpointing
> >> c) Turn off the background writer
> >> d) Have PostgreSQL refuse to restart after a crash and instead call an
> >> exteral script (for reprovisioning)
>
> > Well I guess I'd prefer a per-transaction setting, allowing to bypass
> > WAL logging and checkpointing.
>
> Not going to happen; this is all or nothing.
>
> > Forcing the backend to care itself for
> > writing the data I'm not sure is a good thing, but if you say so.
>
> Yeah, I think proposal (c) is likely to be a net loss.
>
> (a) and (d) are probably simple, if by "reprovisioning" you mean
> "rm -rf $PGDATA; initdb".  Point (b) will be a bit trickier because
> there are various housekeeping activities tied into checkpoints.
> I think you can't actually remove checkpoints altogether, just
> skip the flush-dirty-pages part.

Based on this thread, I have developed the following documentation patch
that outlines the performance enhancements possible if durability is not
required.  The patch also documents that synchronous_commit = false has
potential committed transaction loss from a database crash (as well as
an OS crash).

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.282
diff -c -c -r1.282 config.sgml
*** doc/src/sgml/config.sgml    22 Jun 2010 02:57:49 -0000    1.282
--- doc/src/sgml/config.sgml    23 Jun 2010 18:53:26 -0000
***************
*** 1463,1469 ****
          really guaranteed to be safe against a server crash.  (The maximum
          delay is three times <xref linkend="guc-wal-writer-delay">.)  Unlike
          <xref linkend="guc-fsync">, setting this parameter to <literal>off</>
!         does not create any risk of database inconsistency: a crash might
          result in some recent allegedly-committed transactions being lost, but
          the database state will be just the same as if those transactions had
          been aborted cleanly.  So, turning <varname>synchronous_commit</> off
--- 1463,1470 ----
          really guaranteed to be safe against a server crash.  (The maximum
          delay is three times <xref linkend="guc-wal-writer-delay">.)  Unlike
          <xref linkend="guc-fsync">, setting this parameter to <literal>off</>
!         does not create any risk of database inconsistency: an operating
!         system or database crash crash might
          result in some recent allegedly-committed transactions being lost, but
          the database state will be just the same as if those transactions had
          been aborted cleanly.  So, turning <varname>synchronous_commit</> off
Index: doc/src/sgml/perform.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v
retrieving revision 1.80
diff -c -c -r1.80 perform.sgml
*** doc/src/sgml/perform.sgml    29 May 2010 21:08:04 -0000    1.80
--- doc/src/sgml/perform.sgml    23 Jun 2010 18:53:26 -0000
***************
*** 1104,1107 ****
--- 1104,1169 ----
    </sect2>
    </sect1>

+   <sect1 id="non-durability">
+    <title>Non-Durable Settings</title>
+
+    <indexterm zone="non-durability">
+     <primary>non-durable</primary>
+    </indexterm>
+
+    <para>
+     Durability is a database feature that guarantees the recording of
+     committed transactions even if if the server crashes or loses
+     power.  However, durability adds significant database overhead,
+     so if your site does not require such a guarantee,
+     <productname>PostgreSQL</productname> can be configured to run
+     much faster.  The following are configuration changes you can make
+     to improve performance in such cases;  they do not invalidate
+     commit guarantees related to database crashes, only abrupt operating
+     system stoppage, except as mentioned below:
+
+     <itemizedlist>
+      <listitem>
+       <para>
+        Place the database cluster's data directory in a memory-backed
+        file system (i.e. <acronym>RAM</> disk).  This eliminates all
+        database disk I/O, but limits data storage to the amount of
+        available memory (and perhaps swap).
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Turn off <xref linkend="guc-fsync">;  there is no need to flush
+        data to disk.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Turn off <xref linkend="guc-full-page-writes">;  there is no need
+        to guard against partial page writes.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Increase <xref linkend="guc-checkpoint-segments"> and <xref
+        linkend="guc-checkpoint-timeout"> ; this reduces the frequency
+        of checkpoints, but increases the storage requirements of
+        <filename>/pg_xlog</>.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Turn off <xref linkend="guc-synchronous-commit">;  there might be no
+        need to write the <acronym>WAL</acronym> to disk on every
+        commit.  This does affect database crash transaction durability.
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+   </sect1>
+
   </chapter>

From:
Pavel Stehule
Date:

2010/6/23 Bruce Momjian <>:
> Tom Lane wrote:
>> Dimitri Fontaine <> writes:
>> > Josh Berkus <> writes:
>> >> a) Eliminate WAL logging entirely
>
> If we elimiate WAL logging, that means a reinstall is required for even
> a postmaster crash, which is a new non-durable behavior.
>
> Also, we just added wal_level = minimal, which might end up being a poor
> name choice of we want wal_level = off in PG 9.1.  Perhaps we should
> have used wal_level = crash_safe in 9.0.
>
> I have added the following TODO:
>
>        Consider a non-crash-safe wal_level that eliminates WAL activity
>
>            * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php
>
> --

isn't fsync to off enought?

Regards

Pavel

>  Bruce Momjian  <>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
>  + None of us is going to be here forever. +
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

From:
Bruce Momjian
Date:

Pavel Stehule wrote:
> 2010/6/23 Bruce Momjian <>:
> > Tom Lane wrote:
> >> Dimitri Fontaine <> writes:
> >> > Josh Berkus <> writes:
> >> >> a) Eliminate WAL logging entirely
> >
> > If we elimiate WAL logging, that means a reinstall is required for even
> > a postmaster crash, which is a new non-durable behavior.
> >
> > Also, we just added wal_level = minimal, which might end up being a poor
> > name choice of we want wal_level = off in PG 9.1. ?Perhaps we should
> > have used wal_level = crash_safe in 9.0.
> >
> > I have added the following TODO:
> >
> > ? ? ? ?Consider a non-crash-safe wal_level that eliminates WAL activity
> >
> > ? ? ? ? ? ?* http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php
> >
> > --
>
> isn't fsync to off enought?

Well,  testing reported in the thread showed other settings also help,
though the checkpoint lengthening was not tested.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Robert Haas
Date:

On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian <> wrote:
> Tom Lane wrote:
>> Dimitri Fontaine <> writes:
>> > Josh Berkus <> writes:
>> >> a) Eliminate WAL logging entirely
>
> If we elimiate WAL logging, that means a reinstall is required for even
> a postmaster crash, which is a new non-durable behavior.
>
> Also, we just added wal_level = minimal, which might end up being a poor
> name choice of we want wal_level = off in PG 9.1.  Perhaps we should
> have used wal_level = crash_safe in 9.0.
>
> I have added the following TODO:
>
>        Consider a non-crash-safe wal_level that eliminates WAL activity
>
>            * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php

I don't think we need a system-wide setting for that.  I believe that
the unlogged tables I'm working on will handle that case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Dave Page
Date:

On Wed, Jun 23, 2010 at 9:25 PM, Robert Haas <> wrote:
> On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian <> wrote:
>> Tom Lane wrote:
>>> Dimitri Fontaine <> writes:
>>> > Josh Berkus <> writes:
>>> >> a) Eliminate WAL logging entirely
>>
>> If we elimiate WAL logging, that means a reinstall is required for even
>> a postmaster crash, which is a new non-durable behavior.
>>
>> Also, we just added wal_level = minimal, which might end up being a poor
>> name choice of we want wal_level = off in PG 9.1.  Perhaps we should
>> have used wal_level = crash_safe in 9.0.
>>
>> I have added the following TODO:
>>
>>        Consider a non-crash-safe wal_level that eliminates WAL activity
>>
>>            * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php
>
> I don't think we need a system-wide setting for that.  I believe that
> the unlogged tables I'm working on will handle that case.

Aren't they going to be truncated at startup? If the entire system is
running without WAL, we would only need to do that in case of an
unclean shutdown wouldn't we?


--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Bruce Momjian
Date:

Robert Haas wrote:
> On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian <> wrote:
> > Tom Lane wrote:
> >> Dimitri Fontaine <> writes:
> >> > Josh Berkus <> writes:
> >> >> a) Eliminate WAL logging entirely
> >
> > If we elimiate WAL logging, that means a reinstall is required for even
> > a postmaster crash, which is a new non-durable behavior.
> >
> > Also, we just added wal_level = minimal, which might end up being a poor
> > name choice of we want wal_level = off in PG 9.1. ?Perhaps we should
> > have used wal_level = crash_safe in 9.0.
> >
> > I have added the following TODO:
> >
> > ? ? ? ?Consider a non-crash-safe wal_level that eliminates WAL activity
> >
> > ? ? ? ? ? ?* http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php
>
> I don't think we need a system-wide setting for that.  I believe that
> the unlogged tables I'm working on will handle that case.

Uh, will we have some global unlogged setting, like for the system
tables and stuff?  It seems like an heavy burden to tell people they
have to create ever object as unlogged, and we would still generate log
for things like transaction commits.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Tom Lane
Date:

Dave Page <> writes:
> On Wed, Jun 23, 2010 at 9:25 PM, Robert Haas <> wrote:
>> I don't think we need a system-wide setting for that. �I believe that
>> the unlogged tables I'm working on will handle that case.

> Aren't they going to be truncated at startup? If the entire system is
> running without WAL, we would only need to do that in case of an
> unclean shutdown wouldn't we?

The problem with a system-wide no-WAL setting is it means you can't
trust the system catalogs after a crash.  Which means you are forced to
use initdb to recover from any crash, in return for not a lot of savings
(for typical usages where there's not really much churn in the
catalogs).  I tend to agree with Robert that a way to not log content
updates for individual user tables is likely to be much more useful in
practice.

            regards, tom lane

From:
Bruce Momjian
Date:

Tom Lane wrote:
> Dave Page <> writes:
> > On Wed, Jun 23, 2010 at 9:25 PM, Robert Haas <> wrote:
> >> I don't think we need a system-wide setting for that. �I believe that
> >> the unlogged tables I'm working on will handle that case.
>
> > Aren't they going to be truncated at startup? If the entire system is
> > running without WAL, we would only need to do that in case of an
> > unclean shutdown wouldn't we?
>
> The problem with a system-wide no-WAL setting is it means you can't
> trust the system catalogs after a crash.  Which means you are forced to

True, and in fact any postmaster crash could lead to curruption.

> use initdb to recover from any crash, in return for not a lot of savings
> (for typical usages where there's not really much churn in the
> catalogs).  I tend to agree with Robert that a way to not log content
> updates for individual user tables is likely to be much more useful in
> practice.

OK, TODO removed.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Dimitri Fontaine
Date:

Tom Lane <> writes:
> The problem with a system-wide no-WAL setting is it means you can't
> trust the system catalogs after a crash.  Which means you are forced to
> use initdb to recover from any crash, in return for not a lot of savings
> (for typical usages where there's not really much churn in the
> catalogs).

What about having a "catalog only" WAL setting, userset ?

I'm not yet clear on the point but it well seems that the per
transaction WAL setting is impossible because of catalogs (meaning
mainly DDL support), but I can see us enforcing durability and crash
safety there.

That would probably mean that setting WAL level this low yet doing any
kind of DDL would need to be either an ERROR, or better yet, a WARNING
telling that the WAL level can not be that low so has been raised by the
system.

Regards,
--
dim

From:
Rob Wultsch
Date:

On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus <> wrote:
>
>> It must be a setting, not a version.
>>
>> For instance suppose you have a session table for your website and a
>> users table.
>>
>> - Having ACID on the users table is of course a must ;
>> - for the sessions table you can drop the "D"
>
> You're trying to solve a different use-case than the one I am.
>
> Your use-case will be solved by global temporary tables.  I suggest that
> you give Robert Haas some help & feedback on that.
>
> My use case is people using PostgreSQL as a cache, or relying entirely
> on replication for durability.
>
> --
>                                  -- Josh Berkus
>                                     PostgreSQL Experts Inc.
>                                     http://www.pgexperts.com
>


Is he? Wouldn't a global temporary table have content that is not
visible between db connections? A db session many not be the same as a
user session.

--
Rob Wultsch


From:
Robert Haas
Date:

On Thu, Jun 24, 2010 at 4:40 AM, Rob Wultsch <> wrote:
> On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus <> wrote:
>>
>>> It must be a setting, not a version.
>>>
>>> For instance suppose you have a session table for your website and a
>>> users table.
>>>
>>> - Having ACID on the users table is of course a must ;
>>> - for the sessions table you can drop the "D"
>>
>> You're trying to solve a different use-case than the one I am.
>>
>> Your use-case will be solved by global temporary tables.  I suggest that
>> you give Robert Haas some help & feedback on that.
>>
>> My use case is people using PostgreSQL as a cache, or relying entirely
>> on replication for durability.
>
> Is he? Wouldn't a global temporary table have content that is not
> visible between db connections? A db session many not be the same as a
> user session.
>

I'm planning to implement global temporary tables, which can have
different contents for each user session.

And I'm also planning to implement unlogged tables, which have the
same contents for all sessions but are not WAL-logged (and are
truncated on startup).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Josh Berkus
Date:

> And I'm also planning to implement unlogged tables, which have the
> same contents for all sessions but are not WAL-logged (and are
> truncated on startup).

Yep.  And it's quite possible that this will be adequate for most users.

And it's also possible that the extra CPU which Robert isn't getting rid
of (bgwriter, checkpointing, etc.) does not have a measurable impact on
performance.  At this point, my idea (which I call
"RunningWithScissorsDB") is only an idea for experimentation and
performance testing.  It's pretty far off from being a TODO.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
Pavel Stehule
Date:

2010/6/24 Josh Berkus <>:
>
>> And I'm also planning to implement unlogged tables, which have the
>> same contents for all sessions but are not WAL-logged (and are
>> truncated on startup).

this is similar MySQL's memory tables. Personally, I don't see any
practical sense do same work on PostgreSQL now, when memcached exists.
Much more important is smarter cache controlling then we have now -
maybe with priorities for some tables and some operations
(applications) - sometimes we don't need use cache for extra large
scans.

Regards

Pavel Stehule


>
> Yep.  And it's quite possible that this will be adequate for most users.
>
> And it's also possible that the extra CPU which Robert isn't getting rid
> of (bgwriter, checkpointing, etc.) does not have a measurable impact on
> performance.  At this point, my idea (which I call
> "RunningWithScissorsDB") is only an idea for experimentation and
> performance testing.  It's pretty far off from being a TODO.
>

> --
>                                  -- Josh Berkus
>                                     PostgreSQL Experts Inc.
>                                     http://www.pgexperts.com
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

From:
"Joshua D. Drake"
Date:

On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote:
> 2010/6/24 Josh Berkus <>:
> >
> >> And I'm also planning to implement unlogged tables, which have the
> >> same contents for all sessions but are not WAL-logged (and are
> >> truncated on startup).
>
> this is similar MySQL's memory tables. Personally, I don't see any
> practical sense do same work on PostgreSQL now, when memcached exists.

Because memcache is yet another layer and increases overhead to the
application developers by adding yet another layer to work with. Non
logged tables would rock.

SELECT * FROM foo;

:D

JD




--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
From:
Pavel Stehule
Date:

2010/6/24 Joshua D. Drake <>:
> On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote:
>> 2010/6/24 Josh Berkus <>:
>> >
>> >> And I'm also planning to implement unlogged tables, which have the
>> >> same contents for all sessions but are not WAL-logged (and are
>> >> truncated on startup).
>>
>> this is similar MySQL's memory tables. Personally, I don't see any
>> practical sense do same work on PostgreSQL now, when memcached exists.
>
> Because memcache is yet another layer and increases overhead to the
> application developers by adding yet another layer to work with. Non
> logged tables would rock.

I see only one positive point - it can help to people with broken
design application with migration to PostgreSQL.

There are different interesting feature - cached procedure's results
like Oracle 11. - it's more general.

only idea.

For me memory tables are nonsens, but what about memory cached
materialised views (maybe periodically refreshed)?

Regards

Pavel

>
> SELECT * FROM foo;
>
> :D

:)
>
> JD
>
>
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
>
>

From:
"A.M."
Date:

On Jun 24, 2010, at 4:01 PM, Pavel Stehule wrote:

> 2010/6/24 Joshua D. Drake <>:
>> On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote:
>>> 2010/6/24 Josh Berkus <>:
>>>>
>>>>> And I'm also planning to implement unlogged tables, which have the
>>>>> same contents for all sessions but are not WAL-logged (and are
>>>>> truncated on startup).
>>>
>>> this is similar MySQL's memory tables. Personally, I don't see any
>>> practical sense do same work on PostgreSQL now, when memcached exists.
>>
>> Because memcache is yet another layer and increases overhead to the
>> application developers by adding yet another layer to work with. Non
>> logged tables would rock.
>
> I see only one positive point - it can help to people with broken
> design application with migration to PostgreSQL.

The broken design is being required to work around PostgreSQL's lack of this optimization.

>
> There are different interesting feature - cached procedure's results
> like Oracle 11. - it's more general.
>
> only idea.
>
> For me memory tables are nonsens, but what about memory cached
> materialised views (maybe periodically refreshed)?

Non-WAL-logged, non-fsynced tables are not equivalent to MySQL "memory tables". Such tables simply contain transient
information.One can already make "memory tables" in PostgreSQL by making a tablespace in a tmpfs partition. 

I have been eagerly waiting for this feature for six years so that I can write proper queries against ever-changing
sessiondata with transactional semantics (which memcached cannot offer). The only restriction I see for these transient
datatables is that they cannot be referenced by standard tables using foreign key constraints. Otherwise, these tables
behavelike any other. That's the benefit. 

Cheers,
M

From:
Pavel Stehule
Date:

2010/6/24 A.M. <>:
>
> On Jun 24, 2010, at 4:01 PM, Pavel Stehule wrote:
>
>> 2010/6/24 Joshua D. Drake <>:
>>> On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote:
>>>> 2010/6/24 Josh Berkus <>:
>>>>>
>>>>>> And I'm also planning to implement unlogged tables, which have the
>>>>>> same contents for all sessions but are not WAL-logged (and are
>>>>>> truncated on startup).
>>>>
>>>> this is similar MySQL's memory tables. Personally, I don't see any
>>>> practical sense do same work on PostgreSQL now, when memcached exists.
>>>
>>> Because memcache is yet another layer and increases overhead to the
>>> application developers by adding yet another layer to work with. Non
>>> logged tables would rock.
>>
>> I see only one positive point - it can help to people with broken
>> design application with migration to PostgreSQL.
>
> The broken design is being required to work around PostgreSQL's lack of this optimization.
>
>>
>> There are different interesting feature - cached procedure's results
>> like Oracle 11. - it's more general.
>>
>> only idea.
>>
>> For me memory tables are nonsens, but what about memory cached
>> materialised views (maybe periodically refreshed)?
>
> Non-WAL-logged, non-fsynced tables are not equivalent to MySQL "memory tables". Such tables simply contain transient
information.One can already make "memory tables" in PostgreSQL by making a tablespace in a tmpfs partition. 
>
> I have been eagerly waiting for this feature for six years so that I can write proper queries against ever-changing
sessiondata with transactional semantics (which memcached cannot offer). The only restriction I see for these transient
datatables is that they cannot be referenced by standard tables using foreign key constraints. Otherwise, these tables
behavelike any other. That's the benefit. 
>

if you remove WAL, then there are MVCC still - you have to do VACUUM,
you have to do ANALYZE, you have to thinking about indexes ...
Processing pipe for simple query is long too. The removing WAL doesn't
do memory database from Postgres. But You have to know best, what do
you do.

Regards

Pavel Stehule

p.s. maybe memcached is too simply for you - there are more NoSQL db

> Cheers,
> M
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

From:
Josh Berkus
Date:

> this is similar MySQL's memory tables. Personally, I don't see any
> practical sense do same work on PostgreSQL now, when memcached exists.

Thing is, if you only have one table (say, a sessions table) which you
don't want logged, you don't necessarily want to fire up a 2nd software
application just for that.  Plus, recent testing seems to show that with
no logging, memcached isn't really faster than PG.

Also, like for asynch_commit, this is something where users are
currently turning off fsync.  Any option where we can present users with
controlled, predictable data loss instead of random corruption is a good
one.

> Much more important is smarter cache controlling then we have now -
> maybe with priorities for some tables and some operations
> (applications) - sometimes we don't need use cache for extra large
> scans.

Well, that would be good *too*.  You working on it?  ;-)

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
Pavel Stehule
Date:

2010/6/24 Josh Berkus <>:
>
>> this is similar MySQL's memory tables. Personally, I don't see any
>> practical sense do same work on PostgreSQL now, when memcached exists.
>
> Thing is, if you only have one table (say, a sessions table) which you
> don't want logged, you don't necessarily want to fire up a 2nd software
> application just for that.  Plus, recent testing seems to show that with
> no logging, memcached isn't really faster than PG.

sorry, I thinking some else. Not only WAL does significant overhead.
You need litlle bit more memory, much more processing time. With very
fast operations, the bottle neck will be in interprocess communication
- but it doesn't mean so pg isn't slower than memcached. I repeating
it again - there are no any universal tool for all tasks.

>
> Also, like for asynch_commit, this is something where users are
> currently turning off fsync.  Any option where we can present users with
> controlled, predictable data loss instead of random corruption is a good
> one.
>

it isn't too simple. What about statistics? These are used in system table.

>> Much more important is smarter cache controlling then we have now -
>> maybe with priorities for some tables and some operations
>> (applications) - sometimes we don't need use cache for extra large
>> scans.
>
> Well, that would be good *too*.  You working on it?  ;-)
>

no - just I know about possible problems with memory control.

> --
>                                  -- Josh Berkus
>                                     PostgreSQL Experts Inc.
>                                     http://www.pgexperts.com
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

From:
"Joshua D. Drake"
Date:

On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote:
> 2010/6/24 Josh Berkus <>:
> >
> >> And I'm also planning to implement unlogged tables, which have the
> >> same contents for all sessions but are not WAL-logged (and are
> >> truncated on startup).
>
> this is similar MySQL's memory tables. Personally, I don't see any
> practical sense do same work on PostgreSQL now, when memcached exists.

Because memcache is yet another layer and increases overhead to the
application developers by adding yet another layer to work with. Non
logged tables would rock.

SELECT * FROM foo;

:D

JD




--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering


From:
Bruce Momjian
Date:

Bruce Momjian wrote:
> Tom Lane wrote:
> > Dimitri Fontaine <> writes:
> > > Josh Berkus <> writes:
> > >> a) Eliminate WAL logging entirely
> > >> b) Eliminate checkpointing
> > >> c) Turn off the background writer
> > >> d) Have PostgreSQL refuse to restart after a crash and instead call an
> > >> exteral script (for reprovisioning)
> >
> > > Well I guess I'd prefer a per-transaction setting, allowing to bypass
> > > WAL logging and checkpointing.
> >
> > Not going to happen; this is all or nothing.
> >
> > > Forcing the backend to care itself for
> > > writing the data I'm not sure is a good thing, but if you say so.
> >
> > Yeah, I think proposal (c) is likely to be a net loss.
> >
> > (a) and (d) are probably simple, if by "reprovisioning" you mean
> > "rm -rf $PGDATA; initdb".  Point (b) will be a bit trickier because
> > there are various housekeeping activities tied into checkpoints.
> > I think you can't actually remove checkpoints altogether, just
> > skip the flush-dirty-pages part.
>
> Based on this thread, I have developed the following documentation patch
> that outlines the performance enhancements possible if durability is not
> required.  The patch also documents that synchronous_commit = false has
> potential committed transaction loss from a database crash (as well as
> an OS crash).

Applied.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Robert Haas
Date:

On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <> wrote:
>> The patch also documents that synchronous_commit = false has
>> potential committed transaction loss from a database crash (as well as
>> an OS crash).

Is this actually true?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Bruce Momjian
Date:

Robert Haas wrote:
> On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <> wrote:
> >> The patch also documents that synchronous_commit = false has
> >> potential committed transaction loss from a database crash (as well as
> >> an OS crash).
>
> Is this actually true?

I asked on IRC and was told it is true, and looking at the C code it
looks true.  What synchronous_commit = false does is to delay writing
the wal buffers to disk and fsyncing them, not just fsync, which is
where the commit loss due to db process crash comes from.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
"Kevin Grittner"
Date:

Bruce Momjian <> wrote:

> What synchronous_commit = false does is to delay writing
> the wal buffers to disk and fsyncing them, not just fsync

Ah, that answers the question Josh Berkus asked here:

http://archives.postgresql.org/pgsql-performance/2010-06/msg00285.php

(which is something I was wondering about, too.)

-Kevin

From:
Robert Haas
Date:

On Tue, Jun 29, 2010 at 9:32 AM, Bruce Momjian <> wrote:
> Robert Haas wrote:
>> On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <> wrote:
>> >> The patch also documents that synchronous_commit = false has
>> >> potential committed transaction loss from a database crash (as well as
>> >> an OS crash).
>>
>> Is this actually true?
>
> I asked on IRC and was told it is true, and looking at the C code it
> looks true.  What synchronous_commit = false does is to delay writing
> the wal buffers to disk and fsyncing them, not just fsync, which is
> where the commit loss due to db process crash comes from.

Ah, I see.  Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Bruce Momjian
Date:

Robert Haas wrote:
> On Tue, Jun 29, 2010 at 9:32 AM, Bruce Momjian <> wrote:
> > Robert Haas wrote:
> >> On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <> wrote:
> >> >> The patch also documents that synchronous_commit = false has
> >> >> potential committed transaction loss from a database crash (as well as
> >> >> an OS crash).
> >>
> >> Is this actually true?
> >
> > I asked on IRC and was told it is true, and looking at the C code it
> > looks true. ?What synchronous_commit = false does is to delay writing
> > the wal buffers to disk and fsyncing them, not just fsync, which is
> > where the commit loss due to db process crash comes from.
>
> Ah, I see.  Thanks.

I am personally surprised it was designed that way;  I thought we would
just delay fsync.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Tom Lane
Date:

Bruce Momjian <> writes:
>>> I asked on IRC and was told it is true, and looking at the C code it
>>> looks true. ?What synchronous_commit = false does is to delay writing
>>> the wal buffers to disk and fsyncing them, not just fsync, which is
>>> where the commit loss due to db process crash comes from.

>> Ah, I see.  Thanks.

> I am personally surprised it was designed that way;  I thought we would
> just delay fsync.

That would require writing and syncing to be separable actions.  If
you're using O_SYNC or similar, they aren't.

            regards, tom lane

From:
Bruce Momjian
Date:

Tom Lane wrote:
> Bruce Momjian <> writes:
> >>> I asked on IRC and was told it is true, and looking at the C code it
> >>> looks true. ?What synchronous_commit = false does is to delay writing
> >>> the wal buffers to disk and fsyncing them, not just fsync, which is
> >>> where the commit loss due to db process crash comes from.
>
> >> Ah, I see.  Thanks.
>
> > I am personally surprised it was designed that way;  I thought we would
> > just delay fsync.
>
> That would require writing and syncing to be separable actions.  If
> you're using O_SYNC or similar, they aren't.

Ah, very good point.  I have added a C comment to clarify why this is
the current behavior;  attached and applied.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.291
diff -c -c -r1.291 xact.c
*** src/backend/access/transam/xact.c    13 May 2010 11:39:30 -0000    1.291
--- src/backend/access/transam/xact.c    29 Jun 2010 18:33:47 -0000
***************
*** 1028,1034 ****
      if (XactSyncCommit || forceSyncCommit || haveNonTemp)
      {
          /*
!          * Synchronous commit case.
           *
           * Sleep before flush! So we can flush more than one commit records
           * per single fsync.  (The idea is some other backend may do the
--- 1028,1034 ----
      if (XactSyncCommit || forceSyncCommit || haveNonTemp)
      {
          /*
!          * Synchronous commit case:
           *
           * Sleep before flush! So we can flush more than one commit records
           * per single fsync.  (The idea is some other backend may do the
***************
*** 1054,1060 ****
      else
      {
          /*
!          * Asynchronous commit case.
           *
           * Report the latest async commit LSN, so that the WAL writer knows to
           * flush this commit.
--- 1054,1065 ----
      else
      {
          /*
!          * Asynchronous commit case:
!          *
!          * This enables possible committed transaction loss in the case of a
!          * postmaster crash because WAL buffers are left unwritten.
!          * Ideally we could issue the WAL write without the fsync, but
!          * some wal_sync_methods do not allow separate write/fsync.
           *
           * Report the latest async commit LSN, so that the WAL writer knows to
           * flush this commit.

From:
Jignesh Shah
Date:

On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <> wrote:
> Tom Lane wrote:
>> Bruce Momjian <> writes:
>> >>> I asked on IRC and was told it is true, and looking at the C code it
>> >>> looks true. ?What synchronous_commit = false does is to delay writing
>> >>> the wal buffers to disk and fsyncing them, not just fsync, which is
>> >>> where the commit loss due to db process crash comes from.
>>
>> >> Ah, I see.  Thanks.
>>
>> > I am personally surprised it was designed that way;  I thought we would
>> > just delay fsync.
>>
>> That would require writing and syncing to be separable actions.  If
>> you're using O_SYNC or similar, they aren't.
>
> Ah, very good point.  I have added a C comment to clarify why this is
> the current behavior;  attached and applied.
>
> --
>  Bruce Momjian  <>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com


Though has anybody seen a behaviour where synchronous_commit=off is
slower than synchronous_commit=on  ? Again there are two cases here
one with O_* flag and other with f*sync flags. But I had seen that
behavior with PostgreSQL 9.0 beta(2 I think) though havent really
investigated it much yet .. (though now I dont remember which
wal_sync_method flag) . Just curious if anybody has seen that
behavior..

Regards,
Jignesh

From:
Bruce Momjian
Date:

Jignesh Shah wrote:
> On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <> wrote:
> > Tom Lane wrote:
> >> Bruce Momjian <> writes:
> >> >>> I asked on IRC and was told it is true, and looking at the C code it
> >> >>> looks true. ?What synchronous_commit = false does is to delay writing
> >> >>> the wal buffers to disk and fsyncing them, not just fsync, which is
> >> >>> where the commit loss due to db process crash comes from.
> >>
> >> >> Ah, I see. ?Thanks.
> >>
> >> > I am personally surprised it was designed that way; ?I thought we would
> >> > just delay fsync.
> >>
> >> That would require writing and syncing to be separable actions. ?If
> >> you're using O_SYNC or similar, they aren't.
> >
> > Ah, very good point. ?I have added a C comment to clarify why this is
> > the current behavior; ?attached and applied.
> >
> > --
> > ?Bruce Momjian ?<> ? ? ? ?http://momjian.us
> > ?EnterpriseDB ? ? ? ? ? ? ? ? ? ? ? ? ? ? http://enterprisedb.com
>
>
> Though has anybody seen a behaviour where synchronous_commit=off is
> slower than synchronous_commit=on  ? Again there are two cases here
> one with O_* flag and other with f*sync flags. But I had seen that
> behavior with PostgreSQL 9.0 beta(2 I think) though havent really
> investigated it much yet .. (though now I dont remember which
> wal_sync_method flag) . Just curious if anybody has seen that
> behavior..

I have trouble believing how synchronous_commit=off could be slower than
'on'.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Brad Nicholson
Date:

On Tue, 2010-06-29 at 21:39 -0400, Bruce Momjian wrote:
> Jignesh Shah wrote:
> > On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <> wrote:
> > > Tom Lane wrote:
> > >> Bruce Momjian <> writes:
> > >> >>> I asked on IRC and was told it is true, and looking at the C code it
> > >> >>> looks true. ?What synchronous_commit = false does is to delay writing
> > >> >>> the wal buffers to disk and fsyncing them, not just fsync, which is
> > >> >>> where the commit loss due to db process crash comes from.
> > >>
> > >> >> Ah, I see. ?Thanks.
> > >>
> > >> > I am personally surprised it was designed that way; ?I thought we would
> > >> > just delay fsync.
> > >>
> > >> That would require writing and syncing to be separable actions. ?If
> > >> you're using O_SYNC or similar, they aren't.
> > >
> > > Ah, very good point. ?I have added a C comment to clarify why this is
> > > the current behavior; ?attached and applied.
> > >
> > > --
> > > ?Bruce Momjian ?<> ? ? ? ?http://momjian.us
> > > ?EnterpriseDB ? ? ? ? ? ? ? ? ? ? ? ? ? ? http://enterprisedb.com
> >
> >
> > Though has anybody seen a behaviour where synchronous_commit=off is
> > slower than synchronous_commit=on  ? Again there are two cases here
> > one with O_* flag and other with f*sync flags. But I had seen that
> > behavior with PostgreSQL 9.0 beta(2 I think) though havent really
> > investigated it much yet .. (though now I dont remember which
> > wal_sync_method flag) . Just curious if anybody has seen that
> > behavior..
>
> I have trouble believing how synchronous_commit=off could be slower than
> 'on'.
>

I wonder if it could be contention on wal buffers?

Say I've turned synchronous_commit off, I drive enough traffic fill up
my wal_buffers.  I assume that we would have to start writing buffers
down to disk before allocating to the new process.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



From:
Bruce Momjian
Date:

Brad Nicholson wrote:
> > > > Ah, very good point. ?I have added a C comment to clarify why this is
> > > > the current behavior; ?attached and applied.
> > > >
> > > > --
> > > > ?Bruce Momjian ?<> ? ? ? ?http://momjian.us
> > > > ?EnterpriseDB ? ? ? ? ? ? ? ? ? ? ? ? ? ? http://enterprisedb.com
> > >
> > >
> > > Though has anybody seen a behaviour where synchronous_commit=off is
> > > slower than synchronous_commit=on  ? Again there are two cases here
> > > one with O_* flag and other with f*sync flags. But I had seen that
> > > behavior with PostgreSQL 9.0 beta(2 I think) though havent really
> > > investigated it much yet .. (though now I dont remember which
> > > wal_sync_method flag) . Just curious if anybody has seen that
> > > behavior..
> >
> > I have trouble believing how synchronous_commit=off could be slower than
> > 'on'.
> >
>
> I wonder if it could be contention on wal buffers?
>
> Say I've turned synchronous_commit off, I drive enough traffic fill up
> my wal_buffers.  I assume that we would have to start writing buffers
> down to disk before allocating to the new process.

Uh, good question.  I know this report showed ynchronous_commit=off as
faster than 'on':

    http://archives.postgresql.org/pgsql-performance/2010-06/msg00277.php

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

From:
Dave Crooke
Date:

I haven't jumped in yet on this thread, but here goes ....

If you're really looking for query performance, then any database which is designed with reliability and ACID consistency in mind is going to inherently have some mis-fit features.

Some other ideas to consider, depending on your query mix:

1. MySQL with the MyISAM database (non-ACID)

2. Put an in-application generic query cache in front of the DB, that runs in the app address space, e.g. Cache' if using Java

3. Using a DB is a good way to get generic querying capability, but if the "where" clause in the querying is over a small set of meta-data, and SQL syntax is not a big requirement, consider non-RDBMS alternatives, e.g. use XPath over a W3C DOM object tree to get primary keys to in-memory hash tables (possibly distributed with something like memcached)

On Mon, Jun 14, 2010 at 9:14 PM, <> wrote:
We have a fairly unique need for a local, in-memory cache. This will
store data aggregated from other sources. Generating the data only
takes a few minutes, and it is updated often. There will be some
fairly expensive queries of arbitrary complexity run at a fairly high
rate. We're looking for high concurrency and reasonable performance
throughout.

The entire data set is roughly 20 MB in size. We've tried Carbonado in
front of SleepycatJE only to discover that it chokes at a fairly low
concurrency and that Carbonado's rule-based optimizer is wholly
insufficient for our needs. We've also tried Carbonado's Map
Repository which suffers the same problems.

I've since moved the backend database to a local PostgreSQL instance
hoping to take advantage of PostgreSQL's superior performance at high
concurrency. Of course, at the default settings, it performs quite
poorly compares to the Map Repository and Sleepycat JE.

My question is how can I configure the database to run as quickly as
possible if I don't care about data consistency or durability? That
is, the data is updated so often and it can be reproduced fairly
rapidly so that if there is a server crash or random particles from
space mess up memory we'd just restart the machine and move on.

I've never configured PostgreSQL to work like this and I thought maybe
someone here had some ideas on a good approach to this.

--
Sent via pgsql-performance mailing list ()
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

From:
Craig James
Date:

On 6/30/10 9:42 AM, Dave Crooke wrote:
> I haven't jumped in yet on this thread, but here goes ....
>
> If you're really looking for query performance, then any database which
> is designed with reliability and ACID consistency in mind is going to
> inherently have some mis-fit features.
>
> Some other ideas to consider, depending on your query mix:
>
> 1. MySQL with the MyISAM database (non-ACID)
>
> 2. Put an in-application generic query cache in front of the DB, that
> runs in the app address space, e.g. Cache' if using Java
>
> 3. Using a DB is a good way to get generic querying capability, but if
> the "where" clause in the querying is over a small set of meta-data, and
> SQL syntax is not a big requirement, consider non-RDBMS alternatives,
> e.g. use XPath over a W3C DOM object tree to get primary keys to
> in-memory hash tables (possibly distributed with something like memcached)

These would be good suggestions if the "throwaway" database was the only one.  But in real life, these throwaway
databasesare built from other databases that are NOT throwaway, where the data matters and ACID is critical.  In other
words,they'll probably need Postgres anyway. 

Sure, you could use both Postgres and MySQL/ISAM, but that means installing and maintaining both, plus building all of
theother application layers to work on both systems. 

Craig

From:
Jignesh Shah
Date:

On Tue, Jun 29, 2010 at 9:39 PM, Bruce Momjian <> wrote:
> Jignesh Shah wrote:
>> On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <> wrote:
>> > Tom Lane wrote:
>> >> Bruce Momjian <> writes:
>> >> >>> I asked on IRC and was told it is true, and looking at the C code it
>> >> >>> looks true. ?What synchronous_commit = false does is to delay writing
>> >> >>> the wal buffers to disk and fsyncing them, not just fsync, which is
>> >> >>> where the commit loss due to db process crash comes from.
>> >>
>> >> >> Ah, I see. ?Thanks.
>> >>
>> >> > I am personally surprised it was designed that way; ?I thought we would
>> >> > just delay fsync.
>> >>
>> >> That would require writing and syncing to be separable actions. ?If
>> >> you're using O_SYNC or similar, they aren't.
>> >
>> > Ah, very good point. ?I have added a C comment to clarify why this is
>> > the current behavior; ?attached and applied.
>> >
>> > --
>> > ?Bruce Momjian ?<> ? ? ? ?http://momjian.us
>> > ?EnterpriseDB ? ? ? ? ? ? ? ? ? ? ? ? ? ? http://enterprisedb.com
>>
>>
>> Though has anybody seen a behaviour where synchronous_commit=off is
>> slower than synchronous_commit=on  ? Again there are two cases here
>> one with O_* flag and other with f*sync flags. But I had seen that
>> behavior with PostgreSQL 9.0 beta(2 I think) though havent really
>> investigated it much yet .. (though now I dont remember which
>> wal_sync_method flag) . Just curious if anybody has seen that
>> behavior..
>
> I have trouble believing how synchronous_commit=off could be slower than
> 'on'.
>
> --
>  Bruce Momjian  <>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
>  + None of us is going to be here forever. +
>

Hi Bruce,

Let me clarify  the problem a bit.. If the underlying WAL disk is SSD
then it seems I can get synchronous_commit=on to work faster than
synchronous_commit=off.. Yes sounds unintuitive to me. But the results
seems to point in that direction. It could be that it hit some other
bottleneck with synchronous_commit=off reaches that
synchronous_commit=on does not hit (or has not hit yet).

 Brads point of wal buffers could be valid. Though typically I havent
seen the need to increase it beyond 1024kB yet.

Hopefully I will retry it with the latest PostgreSQL 9.0 bits and see
it happens again.
More on that later.

Regards,
Jignesh

From:
Greg Smith
Date:

  On 6/30/2010 2:21 PM, Jignesh Shah wrote:
> If the underlying WAL disk is SSD then it seems I can get
> synchronous_commit=on to work faster than
> synchronous_commit=off..

The first explanation that pops to mind is that synchronous_commit is
writing all the time, which doesn't have the same sort of penalty on
SSD.  Whereas if you turn it off, then there are some idle periods where
the SSD could be writing usefully, but instead it's buffering for the
next burst instead.  The importance of that can be magnified on
operating systems that do their own buffering and tend to lag behind
writes until they see an fsync call, like is the case on Linux with ext3.