Thread: PostgreSQL as a local in-memory cache
We have a fairly unique need for a local, in-memory cache. This will store data aggregated from other sources. Generating the data only takes a few minutes, and it is updated often. There will be some fairly expensive queries of arbitrary complexity run at a fairly high rate. We're looking for high concurrency and reasonable performance throughout. The entire data set is roughly 20 MB in size. We've tried Carbonado in front of SleepycatJE only to discover that it chokes at a fairly low concurrency and that Carbonado's rule-based optimizer is wholly insufficient for our needs. We've also tried Carbonado's Map Repository, which suffers the same problems. I've since moved the backend database to a local PostgreSQL instance, hoping to take advantage of PostgreSQL's superior performance at high concurrency. Of course, at the default settings, it performs quite poorly compared to the Map Repository and Sleepycat JE. My question is: how can I configure the database to run as quickly as possible if I don't care about data consistency or durability? That is, the data is updated so often, and can be reproduced fairly rapidly, that if there is a server crash or random particles from space mess up memory, we'd just restart the machine and move on. I've never configured PostgreSQL to work like this, and I thought maybe someone here had some ideas on a good approach to this.
On Jun 14, 7:14 pm, "jgard...@jonathangardner.net" <jgard...@jonathangardner.net> wrote: > We have a fairly unique need for a local, in-memory cache. This will > store data aggregated from other sources. Generating the data only > takes a few minutes, and it is updated often. There will be some > fairly expensive queries of arbitrary complexity run at a fairly high > rate. We're looking for high concurrency and reasonable performance > throughout. > > The entire data set is roughly 20 MB in size. We've tried Carbonado in > front of SleepycatJE only to discover that it chokes at a fairly low > concurrency and that Carbonado's rule-based optimizer is wholly > insufficient for our needs. We've also tried Carbonado's Map > Repository which suffers the same problems. > > I've since moved the backend database to a local PostgreSQL instance > hoping to take advantage of PostgreSQL's superior performance at high > concurrency. Of course, at the default settings, it performs quite > poorly compared to the Map Repository and Sleepycat JE. > > My question is how can I configure the database to run as quickly as > possible if I don't care about data consistency or durability? That > is, the data is updated so often and it can be reproduced fairly > rapidly so that if there is a server crash or random particles from > space mess up memory we'd just restart the machine and move on. > > I've never configured PostgreSQL to work like this and I thought maybe > someone here had some ideas on a good approach to this. Just to summarize what I've been able to accomplish so far. By turning fsync and synchronous_commit off, and moving the data dir to tmpfs, I've been able to run the expensive queries much faster than BDB or the MapRepository that comes with Carbonado. This is because PostgreSQL's planner is so much faster and better than whatever Carbonado has. Tweaking indexes has only made things run faster. Right now I'm wrapping up the project so that we can do some serious performance benchmarks. I'll let you all know how it goes. Also, just a note that setting up PostgreSQL for these weird scenarios turned out to be only a tiny bit harder than setting up SQLite. I remember several years ago when there was a push to simplify the configuration and installation of PostgreSQL, and I believe that effort has borne fruit.
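For anyone reading the archive who wants to reproduce Jonathan's setup, here is a minimal sketch. The mount point, tmpfs size, and paths are illustrative assumptions, not details from his post:

    # Put the whole cluster on a RAM-backed filesystem (the data set is
    # ~20 MB, so 256 MB leaves generous headroom for WAL and indexes)
    mount -t tmpfs -o size=256M tmpfs /mnt/pgcache
    chown postgres:postgres /mnt/pgcache
    initdb -D /mnt/pgcache/data

    # In /mnt/pgcache/data/postgresql.conf:
    fsync = off                 # never force WAL writes to stable storage
    synchronous_commit = off    # commits return before the WAL flush

With the data directory itself in RAM, the fsyncs would be nearly free anyway; turning them off mainly saves system-call and locking overhead.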
All, So, I've been discussing this because using PostgreSQL on the caching layer has become more common than I think most people realize. Jonathan's is one of four companies I know of who are doing this, and with the growth of Hadoop and other large-scale data-processing technologies, I think demand will increase. Especially as, in repeated tests, PostgreSQL with persistence turned off is just as fast as the fastest nondurable NoSQL database. And it has a LOT more features. Now, while fsync=off and tmpfs for WAL more-or-less eliminate the IO for durability, they don't eliminate the CPU time. Which means that a caching version of PostgreSQL could be even faster. To do that, we'd need to: a) Eliminate WAL logging entirely b) Eliminate checkpointing c) Turn off the background writer d) Have PostgreSQL refuse to restart after a crash and instead call an external script (for reprovisioning) Of the four above, (a) is the most difficult codewise. (b), (c) and (d) should be relatively straightforward, although I believe that we now have the bgwriter doing some other essential work besides syncing buffers. There's also a narrower use-case in eliminating (a), since a non-fsync'd server which was recording WAL could be used as part of a replication chain. This isn't on hackers because I'm not ready to start working on a patch, but I'd like some feedback on the complexities of doing (b) and (c) as well as how many people could use a non-persistent, in-memory postgres. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
> Especially as, in repeated tests, PostgreSQL with persistence turned off > is just as fast as the fastest nondurable NoSQL database. And it has a > LOT more features. An option to completely disable WAL for such use cases would make it a lot faster, especially in the case of heavy concurrent writes. > Now, while fsync=off and tmpfs for WAL more-or-less eliminate the IO for > durability, they don't eliminate the CPU time. Actually the WAL overhead is some CPU and lots of locking. > Which means that a caching version of PostgreSQL could be even faster. > To do that, we'd need to: > > a) Eliminate WAL logging entirely > b) Eliminate checkpointing > c) Turn off the background writer > d) Have PostgreSQL refuse to restart after a crash and instead call an > external script (for reprovisioning) > > Of the four above, (a) is the most difficult codewise. Actually, it's pretty easy; look in xlog.c.
Hi, Josh Berkus <josh@agliodbs.com> writes: > a) Eliminate WAL logging entirely > b) Eliminate checkpointing > c) Turn off the background writer > d) Have PostgreSQL refuse to restart after a crash and instead call an > external script (for reprovisioning) Well I guess I'd prefer a per-transaction setting, allowing one to bypass WAL logging and checkpointing. Forcing the backend to take care of writing the data itself, I'm not sure that's a good thing, but if you say so. Then you could have the GUC set for a whole cluster, only a database, etc. We already have synchronous_commit to trade durability against performance; we could maybe support protect_data = off too. The d) point I'm not sure still applies if you have a per-transaction setting, which I think makes the most sense. The data you choose not to protect is missing at restart; just add some way to register a hook there. We already have one (shared_preload_libraries) but it requires coding in C. Calling a user function at the end of recovery and before accepting connections would be good, I think. A user function (per database) is better than a script, because if you want to run it before accepting connections and still cause changes in the database… Regards, -- dim
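Worth noting for readers of the archive: the synchronous_commit trade-off Dimitri mentions is already available per transaction today. A quick sketch (the table is hypothetical):

    BEGIN;
    SET LOCAL synchronous_commit TO off;  -- applies to this transaction only
    INSERT INTO sessions (session_id, user_id) VALUES ('abc123', 42);
    COMMIT;  -- returns before the WAL record is flushed to disk

Unlike the per-transaction WAL bypass proposed here, this only defers the WAL flush; the WAL is still written, so a crash can lose the last few commits but never corrupts the database.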
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Josh Berkus <josh@agliodbs.com> writes: >> a) Eliminate WAL logging entirely >> b) Eliminate checkpointing >> c) Turn off the background writer >> d) Have PostgreSQL refuse to restart after a crash and instead call an >> external script (for reprovisioning) > Well I guess I'd prefer a per-transaction setting, allowing one to bypass > WAL logging and checkpointing. Not going to happen; this is all or nothing. > Forcing the backend to take care of > writing the data itself, I'm not sure that's a good thing, but if you say so. Yeah, I think proposal (c) is likely to be a net loss. (a) and (d) are probably simple, if by "reprovisioning" you mean "rm -rf $PGDATA; initdb". Point (b) will be a bit trickier because there are various housekeeping activities tied into checkpoints. I think you can't actually remove checkpoints altogether, just skip the flush-dirty-pages part. regards, tom lane
Josh Berkus wrote: > a) Eliminate WAL logging entirely > c) Turn off the background writer Note that if you turn off full_page_writes and set bgwriter_lru_maxpages=0, you'd get a substantial move in both these directions without touching any code. That would help prove whether or not those are useful directions to move in. The difference in WAL writes just after a checkpoint in particular, due to the full_page_writes behavior, is a significant portion of total WAL activity on most systems. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
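A sketch of Greg's suggestion as postgresql.conf lines (everything else left at defaults):

    full_page_writes = off       # skip the whole-page images written after each checkpoint
    bgwriter_lru_maxpages = 0    # disable the background writer's LRU scan

On a durable system full_page_writes protects against torn pages, so turning it off is only reasonable here because the cluster is treated as disposable.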
> Well I guess I'd prefer a per-transaction setting, allowing one to bypass > WAL logging and checkpointing. Forcing the backend to take care of > writing the data itself, I'm not sure that's a good thing, but if you say so. Well, if the transaction touches a system catalog it had better be WAL-logged... A per-table (or per-index) setting makes more sense IMHO. For instance "on recovery, truncate this table" (this was mentioned before). Another option would be "make the table data safe, but on recovery, destroy and rebuild this index": because on a not-so-large, often-updated table with often-updated indexes, it may not take long to rebuild the indexes, but all those WAL-logged index updates do add some overhead.
> Well I guess I'd prefer a per-transaction setting, allowing one to bypass > WAL logging and checkpointing. Not even conceivable. For this to work, we're talking about the whole database installation. This is only a set of settings for a database *server* which is considered disposable and replaceable, where if it shuts down unexpectedly, you throw it away and replace it. > Forcing the backend to take care of > writing the data itself, I'm not sure that's a good thing, but if you say so. Oh, yeah, I guess we'd only be turning off the LRU cache operations of the background writer. Same with checkpoints. Copying between shared_buffers and the LRU cache would still happen. > Calling a user function at the end of recovery and before accepting > connections would be good, I think. A user function (per database) is > better than a script, because if you want to run it before accepting > connections and still cause changes in the database… Hmmm, you're not quite following my idea. There is no recovery. If the database shuts down unexpectedly, it's toast and you replace it from another copy somewhere else. > (a) and (d) are probably simple, if by "reprovisioning" you mean > "rm -rf $PGDATA; initdb". Exactly. Followed by "scp database_image". Or heck, just replacing the whole VM. > Point (b) will be a bit trickier because > there are various housekeeping activities tied into checkpoints. > I think you can't actually remove checkpoints altogether, just > skip the flush-dirty-pages part. Yes, and we'd want to flush dirty pages on an actual shutdown command. We do want to be able to shut down the DB on purpose. > Well, if the transaction touches a system catalog it had better be > WAL-logged... Given the above, why? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> (a) and (d) are probably simple, if by "reprovisioning" you mean >> "rm -rf $PGDATA; initdb". > Exactly. Followed by "scp database_image". Or heck, just replacing the > whole VM. Right, that would work. I don't think you really need to implement that inside Postgres. I would envision having the startup script do it, ie

    rm -rf $PGDATA
    cp -pr prepared-database-image $PGDATA
    # this loop exits when postmaster exits normally
    while ! postmaster ...
    do
        rm -rf $PGDATA
        cp -pr prepared-database-image $PGDATA
    done

Then all you need is a tweak to make the postmaster exit(1) after a crash instead of trying to launch recovery. regards, tom lane
Dimitri Fontaine wrote: >> Well I guess I'd prefer a per-transaction setting Not possible, as many others have said. As soon as you make an unsafe transaction, all the other transactions have nothing to rely on. On Thu, 17 Jun 2010, Pierre C wrote: > A per-table (or per-index) setting makes more sense IMHO. For instance "on > recovery, truncate this table" (this was mentioned before). That would be much more valuable. I'd like to point out the costs involved in having a whole separate "version" of Postgres that has all this safety switched off. Package managers will not thank anyone for having to distribute another version of the system, and woe betide the user who installs the wrong version because "it runs faster". No, this is much better as a configurable option. Going back to the "on recovery, truncate this table". We already have a mechanism for skipping the WAL writes on an entire table - we do that for tables that have been created in the current transaction. It would surely be a small step to allow this to be configurably permanent on a particular table. Moreover, we already have a mechanism for taking a table that has had non-logged changes, and turning it into a fully logged table - we do that to the above mentioned tables when the transaction commits. I would strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may involve some more acrobatics if the table is currently in use by multiple transactions, but would be valuable. This would allow users to create "temporary tables" that can be shared by several connections. It would also allow bulk loading in parallel of a single large table. With these suggestions, we would still need to WAL-log all the metadata changes, but I think in most circumstances that is not going to be a large burden on performance. Matthew -- Picard: I was just paid a visit from Q. Riker: Q! Any idea what he's up to? Picard: No. He said he wanted to be "nice" to me. Riker: I'll alert the crew.
> I'd like to point out the costs involved in having a whole separate > "version" It must be a setting, not a version. For instance, suppose you have a session table for your website and a users table. - Having ACID on the users table is of course a must; - for the sessions table you can drop the "D". A server crash would force all users to log in to your website again, but if your server crashes often enough that your users complain about it, you have another problem anyway. Having the sessions table not WAL-logged (i.e. faster) would not prevent you from having sessions.user_id REFERENCES users( user_id ) ... so mixing safe and unsafe tables would be much more powerful than just having unsafe tables; see the sketch after this message. And I really like the idea of non-WAL-logged indexes, too: since they can be rebuilt as needed, the DBA could decide between faster index updates but a rebuild on crash, or normal updates and fast recovery. Also materialized views, etc.: you can rebuild them on crash, and the added update speed would be good. > Moreover, we already have a mechanism for taking a table that has had > non-logged changes, and turning it into a fully logged table - we do > that to the above mentioned tables when the transaction commits. I would > strongly recommend providing an option to ALTER TABLE MAKE SAFE, which > may involve some more acrobatics if the table is currently in use by > multiple transactions, but would be valuable. I believe the old discussions called this ALTER TABLE SET PERSISTENCE. > This would allow users to create "temporary tables" that can be shared > by several connections. It would also allow bulk loading in parallel of > a single large table. This would need to WAL-log the entire table to send it to the slaves if replication is enabled, but it's a lot faster than replicating each record.
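A sketch of the mixed-durability schema Pierre describes, written with the CREATE UNLOGGED TABLE syntax that at this point in the thread was still only a proposal (whether an unlogged table may carry a foreign key to a logged one is exactly the kind of detail that was still open):

    -- fully durable: must survive any crash
    CREATE TABLE users (
        user_id  serial PRIMARY KEY,
        login    text NOT NULL UNIQUE
    );

    -- durability dropped: comes back empty after a crash, but much
    -- cheaper to update because no WAL is written for it
    CREATE UNLOGGED TABLE sessions (
        session_id  text PRIMARY KEY,
        user_id     integer NOT NULL REFERENCES users (user_id),
        expires_at  timestamptz NOT NULL
    );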
> It must be a setting, not a version. > > For instance suppose you have a session table for your website and a > users table. > > - Having ACID on the users table is of course a must ; > - for the sessions table you can drop the "D" You're trying to solve a different use-case than the one I am. Your use-case will be solved by global temporary tables. I suggest that you give Robert Haas some help & feedback on that. My use case is people using PostgreSQL as a cache, or relying entirely on replication for durability. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 6/18/10 2:15 AM, Matthew Wakeling wrote: > I'd like to point out the costs involved in having a whole separate > "version" of Postgres that has all this safety switched off. Package > managers will not thank anyone for having to distribute another version > of the system, and woe betide the user who installs the wrong version > because "it runs faster". No, this is much better as a configurable option. Agreed, although initial alphas of this concept are in fact likely to be a separate source code tree. Eventually, when we have it working well, it could become an initdb-time option. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, Jun 17, 2010 at 1:29 PM, Josh Berkus <josh@agliodbs.com> wrote: > a) Eliminate WAL logging entirely In addition to global temporary tables, I am also planning to implement unlogged tables, which are, precisely, tables for which no WAL is written. On restart, any such tables will be truncated. That should give you the ability to do this (by making all your tables unlogged). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Tom Lane wrote: > Dimitri Fontaine <dfontaine@hi-media.com> writes: > > Josh Berkus <josh@agliodbs.com> writes: > >> a) Eliminate WAL logging entirely If we eliminate WAL logging, that means a reinstall is required for even a postmaster crash, which is a new non-durable behavior. Also, we just added wal_level = minimal, which might end up being a poor name choice if we want wal_level = off in PG 9.1. Perhaps we should have used wal_level = crash_safe in 9.0. I have added the following TODO: Consider a non-crash-safe wal_level that eliminates WAL activity * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Tom Lane wrote: > Dimitri Fontaine <dfontaine@hi-media.com> writes: > > Josh Berkus <josh@agliodbs.com> writes: > >> a) Eliminate WAL logging entirely > >> b) Eliminate checkpointing > >> c) Turn off the background writer > >> d) Have PostgreSQL refuse to restart after a crash and instead call an > >> external script (for reprovisioning) > > > Well I guess I'd prefer a per-transaction setting, allowing one to bypass > > WAL logging and checkpointing. > > Not going to happen; this is all or nothing. > > > Forcing the backend to take care of > > writing the data itself, I'm not sure that's a good thing, but if you say so. > > Yeah, I think proposal (c) is likely to be a net loss. > > (a) and (d) are probably simple, if by "reprovisioning" you mean > "rm -rf $PGDATA; initdb". Point (b) will be a bit trickier because > there are various housekeeping activities tied into checkpoints. > I think you can't actually remove checkpoints altogether, just > skip the flush-dirty-pages part. Based on this thread, I have developed the following documentation patch that outlines the performance enhancements possible if durability is not required. The patch also documents that synchronous_commit = false has potential committed transaction loss from a database crash (as well as an OS crash). -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +

Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.282
diff -c -c -r1.282 config.sgml
*** doc/src/sgml/config.sgml	22 Jun 2010 02:57:49 -0000	1.282
--- doc/src/sgml/config.sgml	23 Jun 2010 18:53:26 -0000
***************
*** 1463,1469 ****
      really guaranteed to be safe against a server crash.  (The maximum
      delay is three times <xref linkend="guc-wal-writer-delay">.)  Unlike
      <xref linkend="guc-fsync">, setting this parameter to <literal>off</>
!     does not create any risk of database inconsistency: a crash might
      result in some recent allegedly-committed transactions being lost, but
      the database state will be just the same as if those transactions had
      been aborted cleanly.  So, turning <varname>synchronous_commit</> off
--- 1463,1470 ----
      really guaranteed to be safe against a server crash.  (The maximum
      delay is three times <xref linkend="guc-wal-writer-delay">.)  Unlike
      <xref linkend="guc-fsync">, setting this parameter to <literal>off</>
!     does not create any risk of database inconsistency: an operating
!     system or database crash might
      result in some recent allegedly-committed transactions being lost, but
      the database state will be just the same as if those transactions had
      been aborted cleanly.  So, turning <varname>synchronous_commit</> off
Index: doc/src/sgml/perform.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v
retrieving revision 1.80
diff -c -c -r1.80 perform.sgml
*** doc/src/sgml/perform.sgml	29 May 2010 21:08:04 -0000	1.80
--- doc/src/sgml/perform.sgml	23 Jun 2010 18:53:26 -0000
***************
*** 1104,1107 ****
--- 1104,1169 ----
   </sect2>
   </sect1>
+
+  <sect1 id="non-durability">
+   <title>Non-Durable Settings</title>
+
+   <indexterm zone="non-durability">
+    <primary>non-durable</primary>
+   </indexterm>
+
+   <para>
+    Durability is a database feature that guarantees the recording of
+    committed transactions even if the server crashes or loses
+    power.  However, durability adds significant database overhead,
+    so if your site does not require such a guarantee,
+    <productname>PostgreSQL</productname> can be configured to run
+    much faster.  The following are configuration changes you can make
+    to improve performance in such cases; they do not invalidate
+    commit guarantees related to database crashes, only abrupt operating
+    system stoppage, except as mentioned below:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      Place the database cluster's data directory in a memory-backed
+      file system (i.e. <acronym>RAM</> disk).  This eliminates all
+      database disk I/O, but limits data storage to the amount of
+      available memory (and perhaps swap).
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Turn off <xref linkend="guc-fsync">; there is no need to flush
+      data to disk.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Turn off <xref linkend="guc-full-page-writes">; there is no need
+      to guard against partial page writes.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Increase <xref linkend="guc-checkpoint-segments"> and <xref
+      linkend="guc-checkpoint-timeout">; this reduces the frequency
+      of checkpoints, but increases the storage requirements of
+      <filename>/pg_xlog</>.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Turn off <xref linkend="guc-synchronous-commit">; there might be no
+      need to write the <acronym>WAL</acronym> to disk on every
+      commit.  This does affect database crash transaction durability.
+     </para>
+    </listitem>
+   </itemizedlist>
+   </para>
+  </sect1>
+
  </chapter>
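Pulling the patch's checklist together in one place, a non-durable cluster's postgresql.conf might look like this sketch (the checkpoint values are illustrative assumptions, not numbers from the patch):

    fsync = off                  # no flushes to stable storage
    synchronous_commit = off     # commits do not wait for the WAL write
    full_page_writes = off       # no torn-page protection needed
    checkpoint_segments = 32     # checkpoint rarely ...
    checkpoint_timeout = 30min   # ... at the cost of more pg_xlog space

plus the data directory on a memory-backed filesystem, as described earlier in the thread.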
2010/6/23 Bruce Momjian <bruce@momjian.us>: > Tom Lane wrote: >> Dimitri Fontaine <dfontaine@hi-media.com> writes: >> > Josh Berkus <josh@agliodbs.com> writes: >> >> a) Eliminate WAL logging entirely > > If we eliminate WAL logging, that means a reinstall is required for even > a postmaster crash, which is a new non-durable behavior. > > Also, we just added wal_level = minimal, which might end up being a poor > name choice if we want wal_level = off in PG 9.1. Perhaps we should > have used wal_level = crash_safe in 9.0. > > I have added the following TODO: > > Consider a non-crash-safe wal_level that eliminates WAL activity > > * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php > > -- Isn't setting fsync to off enough? Regards Pavel > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + None of us is going to be here forever. +
Pavel Stehule wrote: > 2010/6/23 Bruce Momjian <bruce@momjian.us>: > > Tom Lane wrote: > >> Dimitri Fontaine <dfontaine@hi-media.com> writes: > >> > Josh Berkus <josh@agliodbs.com> writes: > >> >> a) Eliminate WAL logging entirely > > > > If we eliminate WAL logging, that means a reinstall is required for even > > a postmaster crash, which is a new non-durable behavior. > > > > Also, we just added wal_level = minimal, which might end up being a poor > > name choice if we want wal_level = off in PG 9.1. Perhaps we should > > have used wal_level = crash_safe in 9.0. > > > > I have added the following TODO: > > > > Consider a non-crash-safe wal_level that eliminates WAL activity > > > > * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php > > > > -- > > Isn't setting fsync to off enough? Well, testing reported in the thread showed other settings also help, though the checkpoint lengthening was not tested. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian <bruce@momjian.us> wrote: > Tom Lane wrote: >> Dimitri Fontaine <dfontaine@hi-media.com> writes: >> > Josh Berkus <josh@agliodbs.com> writes: >> >> a) Eliminate WAL logging entirely > > If we eliminate WAL logging, that means a reinstall is required for even > a postmaster crash, which is a new non-durable behavior. > > Also, we just added wal_level = minimal, which might end up being a poor > name choice if we want wal_level = off in PG 9.1. Perhaps we should > have used wal_level = crash_safe in 9.0. > > I have added the following TODO: > > Consider a non-crash-safe wal_level that eliminates WAL activity > > * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php I don't think we need a system-wide setting for that. I believe that the unlogged tables I'm working on will handle that case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, Jun 23, 2010 at 9:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian <bruce@momjian.us> wrote: >> Tom Lane wrote: >>> Dimitri Fontaine <dfontaine@hi-media.com> writes: >>> > Josh Berkus <josh@agliodbs.com> writes: >>> >> a) Eliminate WAL logging entirely >> >> If we eliminate WAL logging, that means a reinstall is required for even >> a postmaster crash, which is a new non-durable behavior. >> >> Also, we just added wal_level = minimal, which might end up being a poor >> name choice if we want wal_level = off in PG 9.1. Perhaps we should >> have used wal_level = crash_safe in 9.0. >> >> I have added the following TODO: >> >> Consider a non-crash-safe wal_level that eliminates WAL activity >> >> * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php > > I don't think we need a system-wide setting for that. I believe that > the unlogged tables I'm working on will handle that case. Aren't they going to be truncated at startup? If the entire system is running without WAL, we would only need to do that in case of an unclean shutdown, wouldn't we? -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Tom Lane wrote: > >> Dimitri Fontaine <dfontaine@hi-media.com> writes: > >> > Josh Berkus <josh@agliodbs.com> writes: > >> >> a) Eliminate WAL logging entirely > > > > If we eliminate WAL logging, that means a reinstall is required for even > > a postmaster crash, which is a new non-durable behavior. > > > > Also, we just added wal_level = minimal, which might end up being a poor > > name choice if we want wal_level = off in PG 9.1. Perhaps we should > > have used wal_level = crash_safe in 9.0. > > > > I have added the following TODO: > > > > Consider a non-crash-safe wal_level that eliminates WAL activity > > > > * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php > > I don't think we need a system-wide setting for that. I believe that > the unlogged tables I'm working on will handle that case. Uh, will we have some global unlogged setting, like for the system tables and stuff? It seems like a heavy burden to tell people they have to create every object as unlogged, and we would still generate WAL for things like transaction commits. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Dave Page <dpage@pgadmin.org> writes: > On Wed, Jun 23, 2010 at 9:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't think we need a system-wide setting for that. I believe that >> the unlogged tables I'm working on will handle that case. > Aren't they going to be truncated at startup? If the entire system is > running without WAL, we would only need to do that in case of an > unclean shutdown, wouldn't we? The problem with a system-wide no-WAL setting is it means you can't trust the system catalogs after a crash. Which means you are forced to use initdb to recover from any crash, in return for not a lot of savings (for typical usages where there's not really much churn in the catalogs). I tend to agree with Robert that a way to not log content updates for individual user tables is likely to be much more useful in practice. regards, tom lane
Tom Lane wrote: > Dave Page <dpage@pgadmin.org> writes: > > On Wed, Jun 23, 2010 at 9:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> I don't think we need a system-wide setting for that. I believe that > >> the unlogged tables I'm working on will handle that case. > > > Aren't they going to be truncated at startup? If the entire system is > > running without WAL, we would only need to do that in case of an > > unclean shutdown, wouldn't we? > > The problem with a system-wide no-WAL setting is it means you can't > trust the system catalogs after a crash. Which means you are forced to True, and in fact any postmaster crash could lead to corruption. > use initdb to recover from any crash, in return for not a lot of savings > (for typical usages where there's not really much churn in the > catalogs). I tend to agree with Robert that a way to not log content > updates for individual user tables is likely to be much more useful in > practice. OK, TODO removed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Tom Lane <tgl@sss.pgh.pa.us> writes: > The problem with a system-wide no-WAL setting is it means you can't > trust the system catalogs after a crash. Which means you are forced to > use initdb to recover from any crash, in return for not a lot of savings > (for typical usages where there's not really much churn in the > catalogs). What about having a "catalog only" WAL setting, userset? I'm not yet clear on the point, but it does seem that the per-transaction WAL setting is impossible because of the catalogs (meaning mainly DDL support); I can see us still enforcing durability and crash safety there. That would probably mean that setting the WAL level this low yet doing any kind of DDL would need to be either an ERROR, or better yet, a WARNING telling the user that the WAL level cannot be that low and so has been raised by the system. Regards, -- dim
On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> It must be a setting, not a version. >> >> For instance suppose you have a session table for your website and a >> users table. >> >> - Having ACID on the users table is of course a must ; >> - for the sessions table you can drop the "D" > > You're trying to solve a different use-case than the one I am. > > Your use-case will be solved by global temporary tables. I suggest that > you give Robert Haas some help & feedback on that. > > My use case is people using PostgreSQL as a cache, or relying entirely > on replication for durability. > > -- > -- Josh Berkus > PostgreSQL Experts Inc. > http://www.pgexperts.com > Is he? Wouldn't a global temporary table have content that is not visible between db connections? A db session may not be the same as a user session. -- Rob Wultsch wultsch@gmail.com
On Thu, Jun 24, 2010 at 4:40 AM, Rob Wultsch <wultsch@gmail.com> wrote: > On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus <josh@agliodbs.com> wrote: >> >>> It must be a setting, not a version. >>> >>> For instance suppose you have a session table for your website and a >>> users table. >>> >>> - Having ACID on the users table is of course a must ; >>> - for the sessions table you can drop the "D" >> >> You're trying to solve a different use-case than the one I am. >> >> Your use-case will be solved by global temporary tables. I suggest that >> you give Robert Haas some help & feedback on that. >> >> My use case is people using PostgreSQL as a cache, or relying entirely >> on replication for durability. > > Is he? Wouldn't a global temporary table have content that is not > visible between db connections? A db session many not be the same as a > user session. > I'm planning to implement global temporary tables, which can have different contents for each user session. And I'm also planning to implement unlogged tables, which have the same contents for all sessions but are not WAL-logged (and are truncated on startup). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
> And I'm also planning to implement unlogged tables, which have the > same contents for all sessions but are not WAL-logged (and are > truncated on startup). Yep. And it's quite possible that this will be adequate for most users. And it's also possible that the extra CPU which Robert isn't getting rid of (bgwriter, checkpointing, etc.) does not have a measurable impact on performance. At this point, my idea (which I call "RunningWithScissorsDB") is only an idea for experimentation and performance testing. It's pretty far off from being a TODO. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
2010/6/24 Josh Berkus <josh@agliodbs.com>: > >> And I'm also planning to implement unlogged tables, which have the >> same contents for all sessions but are not WAL-logged (and are >> truncated on startup). This is similar to MySQL's memory tables. Personally, I don't see much practical sense in doing the same work in PostgreSQL now, when memcached exists. Much more important is smarter cache control than we have now - maybe with priorities for some tables and some operations (applications) - sometimes we don't need to use the cache for extra-large scans. Regards Pavel Stehule > > Yep. And it's quite possible that this will be adequate for most users. > > And it's also possible that the extra CPU which Robert isn't getting rid > of (bgwriter, checkpointing, etc.) does not have a measurable impact on > performance. At this point, my idea (which I call > "RunningWithScissorsDB") is only an idea for experimentation and > performance testing. It's pretty far off from being a TODO. > > -- > -- Josh Berkus > PostgreSQL Experts Inc. > http://www.pgexperts.com
On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote: > 2010/6/24 Josh Berkus <josh@agliodbs.com>: > > > >> And I'm also planning to implement unlogged tables, which have the > >> same contents for all sessions but are not WAL-logged (and are > >> truncated on startup). > > This is similar to MySQL's memory tables. Personally, I don't see much > practical sense in doing the same work in PostgreSQL now, when memcached exists. Because memcached is yet another layer, and it increases overhead for application developers by forcing them to work with yet another interface. Non-logged tables would rock. SELECT * FROM foo; :D JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering
2010/6/24 Joshua D. Drake <jd@commandprompt.com>: > On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote: >> 2010/6/24 Josh Berkus <josh@agliodbs.com>: >> > >> >> And I'm also planning to implement unlogged tables, which have the >> >> same contents for all sessions but are not WAL-logged (and are >> >> truncated on startup). >> >> This is similar to MySQL's memory tables. Personally, I don't see much >> practical sense in doing the same work in PostgreSQL now, when memcached exists. > > Because memcached is yet another layer, and it increases overhead for > application developers by forcing them to work with yet another interface. Non-logged > tables would rock. I see only one positive point - it can help people with broken application designs migrate to PostgreSQL. There is a different interesting feature - cached procedure results, like Oracle 11g has - which is more general. Only an idea. For me, memory tables are nonsense, but what about memory-cached materialized views (maybe periodically refreshed)? Regards Pavel > > SELECT * FROM foo; > > :D :) > > JD > > -- > PostgreSQL.org Major Contributor > Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 > Consulting, Training, Support, Custom Development, Engineering
On Jun 24, 2010, at 4:01 PM, Pavel Stehule wrote: > 2010/6/24 Joshua D. Drake <jd@commandprompt.com>: >> On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote: >>> 2010/6/24 Josh Berkus <josh@agliodbs.com>: >>>> >>>>> And I'm also planning to implement unlogged tables, which have the >>>>> same contents for all sessions but are not WAL-logged (and are >>>>> truncated on startup). >>> >>> This is similar to MySQL's memory tables. Personally, I don't see much >>> practical sense in doing the same work in PostgreSQL now, when memcached exists. >> >> Because memcached is yet another layer, and it increases overhead for >> application developers by forcing them to work with yet another interface. Non-logged >> tables would rock. > > I see only one positive point - it can help people with broken > application designs migrate to PostgreSQL. The broken design is being required to work around PostgreSQL's lack of this optimization. > > There is a different interesting feature - cached procedure results, > like Oracle 11g has - which is more general. > > Only an idea. > > For me, memory tables are nonsense, but what about memory-cached > materialized views (maybe periodically refreshed)? Non-WAL-logged, non-fsynced tables are not equivalent to MySQL "memory tables". Such tables simply contain transient information. One can already make "memory tables" in PostgreSQL by making a tablespace in a tmpfs partition. I have been eagerly waiting for this feature for six years so that I can write proper queries against ever-changing session data with transactional semantics (which memcached cannot offer). The only restriction I see for these transient data tables is that they cannot be referenced by standard tables using foreign key constraints. Otherwise, these tables behave like any other. That's the benefit. Cheers, M
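A sketch of the tmpfs tablespace trick A.M. mentions (the mount point, tablespace, and table names are assumptions; the tablespace's contents vanish at reboot, so this only suits genuinely disposable data):

    -- first, as root in a shell:  mount -t tmpfs -o size=128M tmpfs /mnt/pg_ram
    -- then, as a superuser in psql:
    CREATE TABLESPACE ramspace LOCATION '/mnt/pg_ram';

    CREATE TABLE session_cache (
        key    text PRIMARY KEY,
        value  bytea
    ) TABLESPACE ramspace;

Note that WAL for such a table is still written normally, which is why the unlogged-tables work discussed above would still be a win on top of this.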
2010/6/24 A.M. <agentm@themactionfaction.com>: > > On Jun 24, 2010, at 4:01 PM, Pavel Stehule wrote: > >> 2010/6/24 Joshua D. Drake <jd@commandprompt.com>: >>> On Thu, 2010-06-24 at 21:14 +0200, Pavel Stehule wrote: >>>> 2010/6/24 Josh Berkus <josh@agliodbs.com>: >>>>> >>>>>> And I'm also planning to implement unlogged tables, which have the >>>>>> same contents for all sessions but are not WAL-logged (and are >>>>>> truncated on startup). >>>> >>>> This is similar to MySQL's memory tables. Personally, I don't see much >>>> practical sense in doing the same work in PostgreSQL now, when memcached exists. >>> >>> Because memcached is yet another layer, and it increases overhead for >>> application developers by forcing them to work with yet another interface. Non-logged >>> tables would rock. >> >> I see only one positive point - it can help people with broken >> application designs migrate to PostgreSQL. > > The broken design is being required to work around PostgreSQL's lack of this optimization. > >> >> There is a different interesting feature - cached procedure results, >> like Oracle 11g has - which is more general. >> >> Only an idea. >> >> For me, memory tables are nonsense, but what about memory-cached >> materialized views (maybe periodically refreshed)? > > Non-WAL-logged, non-fsynced tables are not equivalent to MySQL "memory tables". Such tables simply contain transient information. One can already make "memory tables" in PostgreSQL by making a tablespace in a tmpfs partition. > > I have been eagerly waiting for this feature for six years so that I can write proper queries against ever-changing session data with transactional semantics (which memcached cannot offer). The only restriction I see for these transient data tables is that they cannot be referenced by standard tables using foreign key constraints. Otherwise, these tables behave like any other. That's the benefit. > If you remove WAL, MVCC is still there - you have to do VACUUM, you have to do ANALYZE, you have to think about indexes ... The processing pipeline for even a simple query is long, too. Removing WAL doesn't turn Postgres into a memory database. But you know best what you are doing. Regards Pavel Stehule p.s. maybe memcached is too simple for you - there are other NoSQL databases > Cheers, > M
> this is similar MySQL's memory tables. Personally, I don't see any > practical sense do same work on PostgreSQL now, when memcached exists. Thing is, if you only have one table (say, a sessions table) which you don't want logged, you don't necessarily want to fire up a 2nd software application just for that. Plus, recent testing seems to show that with no logging, memcached isn't really faster than PG. Also, like for asynch_commit, this is something where users are currently turning off fsync. Any option where we can present users with controlled, predictable data loss instead of random corruption is a good one. > Much more important is smarter cache controlling then we have now - > maybe with priorities for some tables and some operations > (applications) - sometimes we don't need use cache for extra large > scans. Well, that would be good *too*. You working on it? ;-) -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
2010/6/24 Josh Berkus <josh@agliodbs.com>: > >> This is similar to MySQL's memory tables. Personally, I don't see much >> practical sense in doing the same work in PostgreSQL now, when memcached exists. > > Thing is, if you only have one table (say, a sessions table) which you > don't want logged, you don't necessarily want to fire up a 2nd software > application just for that. Plus, recent testing seems to show that with > no logging, memcached isn't really faster than PG. Sorry, I was thinking of something else. WAL is not the only significant overhead: you need a little more memory and much more processing time. With very fast operations, the bottleneck will be interprocess communication - but that doesn't mean PG isn't still slower than memcached. I'll repeat it again - there is no universal tool for all tasks. > > Also, like for asynch_commit, this is something where users are > currently turning off fsync. Any option where we can present users with > controlled, predictable data loss instead of random corruption is a good > one. > It isn't that simple. What about statistics? Those are stored in system tables. >> Much more important is smarter cache control than we have now - >> maybe with priorities for some tables and some operations >> (applications) - sometimes we don't need to use the cache for extra-large >> scans. > > Well, that would be good *too*. You working on it? ;-) > No - I just know about possible problems with memory control. > -- > -- Josh Berkus > PostgreSQL Experts Inc. > http://www.pgexperts.com
Bruce Momjian wrote: > Tom Lane wrote: > > Dimitri Fontaine <dfontaine@hi-media.com> writes: > > > Josh Berkus <josh@agliodbs.com> writes: > > >> a) Eliminate WAL logging entirely > > >> b) Eliminate checkpointing > > >> c) Turn off the background writer > > >> d) Have PostgreSQL refuse to restart after a crash and instead call an > > >> external script (for reprovisioning) > > > > > Well I guess I'd prefer a per-transaction setting, allowing one to bypass > > > WAL logging and checkpointing. > > > > Not going to happen; this is all or nothing. > > > > > Forcing the backend to take care of > > > writing the data itself, I'm not sure that's a good thing, but if you say so. > > > > Yeah, I think proposal (c) is likely to be a net loss. > > > > (a) and (d) are probably simple, if by "reprovisioning" you mean > > "rm -rf $PGDATA; initdb". Point (b) will be a bit trickier because > > there are various housekeeping activities tied into checkpoints. > > I think you can't actually remove checkpoints altogether, just > > skip the flush-dirty-pages part. > > Based on this thread, I have developed the following documentation patch > that outlines the performance enhancements possible if durability is not > required. The patch also documents that synchronous_commit = false has > potential committed transaction loss from a database crash (as well as > an OS crash). Applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <bruce@momjian.us> wrote: >> The patch also documents that synchronous_commit = false has >> potential committed transaction loss from a database crash (as well as >> an OS crash). Is this actually true? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> The patch also documents that synchronous_commit = false has > >> potential committed transaction loss from a database crash (as well as > >> an OS crash). > > Is this actually true? I asked on IRC and was told it is true, and looking at the C code it looks true. What synchronous_commit = false does is to delay writing the WAL buffers to disk and fsyncing them, not just the fsync, which is where the commit loss due to a db process crash comes from. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Bruce Momjian <bruce@momjian.us> wrote: > What synchronous_commit = false does is to delay writing > the WAL buffers to disk and fsyncing them, not just the fsync Ah, that answers the question Josh Berkus asked here: http://archives.postgresql.org/pgsql-performance/2010-06/msg00285.php (which is something I was wondering about, too.) -Kevin
On Tue, Jun 29, 2010 at 9:32 AM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <bruce@momjian.us> wrote: >> >> The patch also documents that synchronous_commit = false has >> >> potential committed transaction loss from a database crash (as well as >> >> an OS crash). >> >> Is this actually true? > > I asked on IRC and was told it is true, and looking at the C code it > looks true. What synchronous_commit = false does is to delay writing > the WAL buffers to disk and fsyncing them, not just the fsync, which is > where the commit loss due to a db process crash comes from. Ah, I see. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Tue, Jun 29, 2010 at 9:32 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > >> On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> >> The patch also documents that synchronous_commit = false has > >> >> potential committed transaction loss from a database crash (as well as > >> >> an OS crash). > >> > >> Is this actually true? > > > > I asked on IRC and was told it is true, and looking at the C code it > > looks true. What synchronous_commit = false does is to delay writing > > the WAL buffers to disk and fsyncing them, not just the fsync, which is > > where the commit loss due to a db process crash comes from. > > Ah, I see. Thanks. I am personally surprised it was designed that way; I thought we would just delay fsync. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Bruce Momjian <bruce@momjian.us> writes: >>> I asked on IRC and was told it is true, and looking at the C code it >>> looks true. What synchronous_commit = false does is to delay writing >>> the WAL buffers to disk and fsyncing them, not just the fsync, which is >>> where the commit loss due to a db process crash comes from. >> Ah, I see. Thanks. > I am personally surprised it was designed that way; I thought we would > just delay fsync. That would require writing and syncing to be separable actions. If you're using O_SYNC or similar, they aren't. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > >>> I asked on IRC and was told it is true, and looking at the C code it > >>> looks true. What synchronous_commit = false does is to delay writing > >>> the WAL buffers to disk and fsyncing them, not just the fsync, which is > >>> where the commit loss due to a db process crash comes from. > > >> Ah, I see. Thanks. > > > I am personally surprised it was designed that way; I thought we would > > just delay fsync. > > That would require writing and syncing to be separable actions. If > you're using O_SYNC or similar, they aren't. Ah, very good point. I have added a C comment to clarify why this is the current behavior; attached and applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +

Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.291
diff -c -c -r1.291 xact.c
*** src/backend/access/transam/xact.c	13 May 2010 11:39:30 -0000	1.291
--- src/backend/access/transam/xact.c	29 Jun 2010 18:33:47 -0000
***************
*** 1028,1034 ****
  	if (XactSyncCommit || forceSyncCommit || haveNonTemp)
  	{
  		/*
! 		 * Synchronous commit case.
  		 *
  		 * Sleep before flush! So we can flush more than one commit records
  		 * per single fsync.  (The idea is some other backend may do the
--- 1028,1034 ----
  	if (XactSyncCommit || forceSyncCommit || haveNonTemp)
  	{
  		/*
! 		 * Synchronous commit case:
  		 *
  		 * Sleep before flush! So we can flush more than one commit records
  		 * per single fsync.  (The idea is some other backend may do the
***************
*** 1054,1060 ****
  	else
  	{
  		/*
! 		 * Asynchronous commit case.
  		 *
  		 * Report the latest async commit LSN, so that the WAL writer knows to
  		 * flush this commit.
--- 1054,1065 ----
  	else
  	{
  		/*
! 		 * Asynchronous commit case:
! 		 *
! 		 * This enables possible committed transaction loss in the case of a
! 		 * postmaster crash because WAL buffers are left unwritten.
! 		 * Ideally we could issue the WAL write without the fsync, but
! 		 * some wal_sync_methods do not allow separate write/fsync.
  		 *
  		 * Report the latest async commit LSN, so that the WAL writer knows to
  		 * flush this commit.
On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <bruce@momjian.us> wrote: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >> >>> I asked on IRC and was told it is true, and looking at the C code it >> >>> looks true. What synchronous_commit = false does is to delay writing >> >>> the WAL buffers to disk and fsyncing them, not just the fsync, which is >> >>> where the commit loss due to a db process crash comes from. >> >> >> Ah, I see. Thanks. >> >> > I am personally surprised it was designed that way; I thought we would >> > just delay fsync. >> >> That would require writing and syncing to be separable actions. If >> you're using O_SYNC or similar, they aren't. > > Ah, very good point. I have added a C comment to clarify why this is > the current behavior; attached and applied. > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com Though, has anybody seen a behaviour where synchronous_commit=off is slower than synchronous_commit=on? Again, there are two cases here: one with the O_* flags and the other with the f*sync flags. But I had seen that behavior with PostgreSQL 9.0 beta (2, I think), though I haven't really investigated it much yet (and now I don't remember which wal_sync_method flag). Just curious if anybody has seen that behavior. Regards, Jignesh
Jignesh Shah wrote: > On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Tom Lane wrote: > >> Bruce Momjian <bruce@momjian.us> writes: > >> >>> I asked on IRC and was told it is true, and looking at the C code it > >> >>> looks true. What synchronous_commit = false does is to delay writing > >> >>> the WAL buffers to disk and fsyncing them, not just the fsync, which is > >> >>> where the commit loss due to a db process crash comes from. > >> > >> >> Ah, I see. Thanks. > >> > >> > I am personally surprised it was designed that way; I thought we would > >> > just delay fsync. > >> > >> That would require writing and syncing to be separable actions. If > >> you're using O_SYNC or similar, they aren't. > > > > Ah, very good point. I have added a C comment to clarify why this is > > the current behavior; attached and applied. > > > > -- > > Bruce Momjian <bruce@momjian.us> http://momjian.us > > EnterpriseDB http://enterprisedb.com > > Though, has anybody seen a behaviour where synchronous_commit=off is > slower than synchronous_commit=on? Again, there are two cases here: > one with the O_* flags and the other with the f*sync flags. But I had seen that > behavior with PostgreSQL 9.0 beta (2, I think), though I haven't really > investigated it much yet (and now I don't remember which > wal_sync_method flag). Just curious if anybody has seen that > behavior. I have trouble believing how synchronous_commit=off could be slower than 'on'. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
On Tue, 2010-06-29 at 21:39 -0400, Bruce Momjian wrote: > Jignesh Shah wrote: > > On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > Tom Lane wrote: > > >> Bruce Momjian <bruce@momjian.us> writes: > > >> >>> I asked on IRC and was told it is true, and looking at the C code it > > >> >>> looks true. What synchronous_commit = false does is to delay writing > > >> >>> the WAL buffers to disk and fsyncing them, not just the fsync, which is > > >> >>> where the commit loss due to a db process crash comes from. > > >> > > >> >> Ah, I see. Thanks. > > >> > > >> > I am personally surprised it was designed that way; I thought we would > > >> > just delay fsync. > > >> > > >> That would require writing and syncing to be separable actions. If > > >> you're using O_SYNC or similar, they aren't. > > > > > > Ah, very good point. I have added a C comment to clarify why this is > > > the current behavior; attached and applied. > > > > Though, has anybody seen a behaviour where synchronous_commit=off is > > slower than synchronous_commit=on? Again, there are two cases here: > > one with the O_* flags and the other with the f*sync flags. But I had seen that > > behavior with PostgreSQL 9.0 beta(2, I think), though I haven't really > > investigated it much yet (and now I don't remember which > > wal_sync_method flag). Just curious if anybody has seen that > > behavior. > > I have trouble believing how synchronous_commit=off could be slower than > 'on'. > I wonder if it could be contention on WAL buffers? Say I've turned synchronous_commit off, and I drive enough traffic to fill up my wal_buffers. I assume that we would have to start writing buffers down to disk before allocating them to the next process. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
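If Brad's contention theory holds, the usual mitigation would be to enlarge the WAL buffer pool; a sketch (the value is an illustrative assumption; the default of this era was only 64 kB):

    wal_buffers = 16MB    # more room before backends must wait on WAL writes

That does not remove the WAL-insert locking Pierre mentioned earlier in the thread, but it makes it less likely that a committing backend has to write out someone else's buffers.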
Brad Nicholson wrote:
> I wonder if it could be contention on the WAL buffers? Say I've turned
> synchronous_commit off and I drive enough traffic to fill up my
> wal_buffers. I assume we would then have to start writing buffers down
> to disk before handing them out to the next transaction.

Uh, good question. I know this report showed synchronous_commit=off as faster than 'on':

http://archives.postgresql.org/pgsql-performance/2010-06/msg00277.php

--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + None of us is going to be here forever. +
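One rough way to check Jignesh's observation on a particular disk is to time a batch of small committed transactions under each setting; a dedicated tool like pgbench is the more rigorous option. A sketch only, with the bench table, row count, and connection details all made up:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CommitLatencyTest {
        // Times n single-row transactions under the given synchronous_commit mode.
        static long timeCommits(Connection conn, String mode, int n) throws SQLException {
            conn.setAutoCommit(false);
            Statement st = conn.createStatement();
            st.execute("SET synchronous_commit = " + mode); // "on" or "off"
            conn.commit();
            st.close();

            long start = System.nanoTime();
            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO bench (v) VALUES (?)"); // hypothetical table
            for (int i = 0; i < n; i++) {
                ps.setString(1, "row " + i);
                ps.executeUpdate();
                conn.commit(); // one commit (WAL flush, or its deferral) per row
            }
            ps.close();
            return (System.nanoTime() - start) / 1000000L; // elapsed ms
        }

        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/cachedb", "app", "secret");
            System.out.println("on:  " + timeCommits(conn, "on", 5000) + " ms");
            System.out.println("off: " + timeCommits(conn, "off", 5000) + " ms");
            conn.close();
        }
    }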
I haven't jumped in yet on this thread, but here goes ....
If you're really looking for raw query performance, then any database designed with reliability and ACID consistency in mind is inherently going to carry some features that don't fit your use case.
Some other ideas to consider, depending on your query mix:
1. MySQL with the MyISAM storage engine (non-ACID)
2. Put a generic in-application query cache in front of the DB, running in the app's address space, e.g. Cache' if using Java (see the sketch after this list)
3. Using a DB is a good way to get generic querying capability, but if the "where" clauses only range over a small set of meta-data and SQL syntax is not a hard requirement, consider non-RDBMS alternatives, e.g. use XPath over a W3C DOM object tree to look up primary keys into in-memory hash tables (possibly distributed with something like memcached)
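As a sketch of option 2 above: a minimal in-process memoizing cache keyed by the exact query text. The class name and single-connection design are illustrative only; a production version would want a connection pool, size bounds, and invalidation tied to the data refresh:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;

    public class QueryCache {
        private final ConcurrentHashMap<String, List<String>> cache =
            new ConcurrentHashMap<String, List<String>>();
        private final Connection conn; // a real version would use a pool

        public QueryCache(Connection conn) { this.conn = conn; }

        // Returns the first column of each row, memoized on the SQL text.
        public List<String> query(String sql) throws SQLException {
            List<String> hit = cache.get(sql);
            if (hit != null) return hit;      // served from the app's address space
            List<String> rows = new ArrayList<String>();
            synchronized (conn) {             // single shared connection in this sketch
                Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery(sql);
                while (rs.next()) rows.add(rs.getString(1));
                rs.close(); st.close();
            }
            rows = Collections.unmodifiableList(rows);
            cache.put(sql, rows);             // racy but benign: worst case is duplicate work
            return rows;
        }

        // Call whenever the aggregated data set is regenerated.
        public void invalidateAll() { cache.clear(); }
    }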
On 6/30/10 9:42 AM, Dave Crooke wrote:
> I haven't jumped in yet on this thread, but here goes ....
>
> If you're really looking for raw query performance, then any database
> designed with reliability and ACID consistency in mind is inherently
> going to carry some features that don't fit your use case.
>
> Some other ideas to consider, depending on your query mix:
>
> 1. MySQL with the MyISAM storage engine (non-ACID)
>
> 2. Put a generic in-application query cache in front of the DB, running
> in the app's address space, e.g. Cache' if using Java
>
> 3. Using a DB is a good way to get generic querying capability, but if
> the "where" clauses only range over a small set of meta-data and SQL
> syntax is not a hard requirement, consider non-RDBMS alternatives, e.g.
> use XPath over a W3C DOM object tree to look up primary keys into
> in-memory hash tables (possibly distributed with something like
> memcached)

These would be good suggestions if the "throwaway" database were the only one. But in real life, these throwaway databases are built from other databases that are NOT throwaway, where the data matters and ACID is critical. In other words, they'll probably need Postgres anyway. Sure, you could use both Postgres and MySQL/MyISAM, but that means installing and maintaining both, plus building all of the other application layers to work on both systems.

Craig
On Tue, Jun 29, 2010 at 9:39 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Jignesh Shah wrote:
>> Though, has anybody seen behaviour where synchronous_commit=off is
>> slower than synchronous_commit=on? [...]
>
> I have trouble believing that synchronous_commit=off could be slower
> than 'on'.

Hi Bruce,

Let me clarify the problem a bit. If the underlying WAL disk is an SSD, then it seems I can get synchronous_commit=on to run faster than synchronous_commit=off. Yes, that sounds unintuitive to me too, but the results seem to point in that direction. It could be that synchronous_commit=off hits some other bottleneck that synchronous_commit=on does not hit (or has not hit yet). Brad's point about the WAL buffers could be valid, though typically I haven't seen the need to increase wal_buffers beyond 1024kB. Hopefully I will retry it with the latest PostgreSQL 9.0 bits and see whether it happens again. More on that later.

Regards,
Jignesh
On 6/30/2010 2:21 PM, Jignesh Shah wrote:
> If the underlying WAL disk is an SSD, then it seems I can get
> synchronous_commit=on to run faster than synchronous_commit=off.

The first explanation that pops to mind is that with synchronous_commit=on the drive is writing all the time, which doesn't carry the same penalty on an SSD. Whereas if you turn it off, there are idle periods in which the SSD could be writing usefully, but instead the data sits buffered waiting for the next burst. The importance of that can be magnified on operating systems that do their own write caching and tend to lag behind on writes until they see an fsync call, as is the case on Linux with ext3.