Thread: unlogged tables
I am attempting to test this

https://commitfest.postgresql.org/action/patch_view?id=424

but I'm not sure which version of PG this should be applied to. (It would be really neat, on here: https://commitfest.postgresql.org/action/commitfest_view?id=8, if there was a note that said: test this stuff against git tag X or branch Y or whatever.)

I got the git:

git clone git://git.postgresql.org/git/postgresql.git

downloaded the patches, and applied them OK. Then did ./configure and make. After much spewage I got:

bufmgr.c: In function 'PrefetchBuffer':
bufmgr.c:126:10: error: 'struct RelationData' has no member named 'rd_istemp'
make[4]: *** [bufmgr.o] Error 1

Just to make sure everything was OK with the original, I reset:

git reset --hard HEAD^
./configure
make

and all was well. So I tried again:

make clean
make maintainer-clean
patch -p1 < relpersistence-v1.patch

.. ok .. but then...

$ patch -p1 < unlogged-tables-v1.patch
patching file doc/src/sgml/indexam.sgml
patching file doc/src/sgml/ref/create_table.sgml
patching file doc/src/sgml/ref/create_table_as.sgml
patching file src/backend/access/gin/gininsert.c
patching file src/backend/access/gist/gist.c
patching file src/backend/access/hash/hash.c
patching file src/backend/access/nbtree/nbtree.c
patching file src/backend/access/transam/xlog.c
patching file src/backend/catalog/catalog.c
patching file src/backend/catalog/heap.c
patching file src/backend/catalog/index.c
patching file src/backend/catalog/storage.c
patching file src/backend/parser/gram.y
patching file src/backend/storage/file/Makefile
patching file src/backend/storage/file/copydir.c
patching file src/backend/storage/file/fd.c
The next patch would create the file src/backend/storage/file/reinit.c, which already exists!  Assume -R? [n]

That didn't happen the first time... I'm almost positive. Not sure what I should do now.

-Andy
On Mon, Nov 15, 2010 at 8:56 PM, Andy Colson <andy@squeakycode.net> wrote:
> I am attempting to test this
>
> https://commitfest.postgresql.org/action/patch_view?id=424
>
> but I'm not sure which version of PG this should be applied to. (it would
> be really neat, on here:
> https://commitfest.postgresql.org/action/commitfest_view?id=8
> if there was a note that said, test this stuff against git tag X or
> branch Y or whatever)

They're pretty much all against the master branch.

> I got the git:
>
> git clone git://git.postgresql.org/git/postgresql.git
>
> downloaded the patches, and applied them ok. then did ./configure and make
>
> after much spewage I got:
>
> bufmgr.c: In function 'PrefetchBuffer':
> bufmgr.c:126:10: error: 'struct RelationData' has no member named
> 'rd_istemp'
> make[4]: *** [bufmgr.o] Error 1

Woops. Good catch. I guess USE_PREFETCH isn't defined on my system. That line needs to be changed to say RelationUsesLocalBuffers(reln) rather than reln->rd_istemp. Updated patches attached.

> That didn't happen the first time... I'm almost positive.

When you applied the patches the first time, it created that file; but git reset --hard doesn't remove untracked files.

> Not sure what I should do now.

git clean -dfx
git reset --hard
git pull

Apply attached patches.

configure
make
make install

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I was able to apply and compile and run OK; creating unlogged tables seems to work as well.

I patched up pgbench to optionally create unlogged tables, and ran it both ways. I get ~80 tps normally, and ~1,500 tps with unlogged. (That's from memory; I was playing with it last night at home.)

I also have a "real world" test I can try (import apache logs and run a few stats).

What other things would be good to test:

indexes?
analyze/stats/plans?
dump/restore?

Is "create temp unlogged table stuff(...)" an option?

-Andy
Andy Colson <andy@squeakycode.net> writes:
> Is "create temp unlogged table stuff(...)" an option?

temp tables are unlogged already.

			regards, tom lane
On Tue, Nov 16, 2010 at 1:09 PM, Andy Colson <andy@squeakycode.net> wrote:
> I was able to apply and compile and run ok, creating unlogged tables seems
> to work as well.
>
> I patched up pgbench to optionally create unlogged tables, and ran it both
> ways. I get ~80tps normally, and ~1,500tps with unlogged. (Thats from
> memory, was playing with it last night at home)

What do you get with normal tables but with fsync, full_page_writes, and synchronous_commit turned off? What do you get with normal tables but with synchronous_commit (only) off? Can you detect any performance regression on normal tables with the patch vs. without the patch?

> I also have a "real world" test I can try (import apache logs and run a few
> stats).

That would be great.

> What other things would be good to test:
> indexes?
> analyze/stats/plans?
> dump/restore?

All of those. I guess there's a question of what pg_dump should emit for an unlogged table. Clearly, we need to dump a CREATE UNLOGGED TABLE statement (which we do), and right now we also dump the table contents - which seems reasonable, but arguably someone could say that we ought not to dump the contents of anything less than a full-fledged, permanent table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
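For the comparison Robert asks for, the relevant settings would be toggled in postgresql.conf between pgbench runs. A sketch of the two configurations (the setting names are real GUCs of this era; the grouping into "runs" is just for illustration):

```
# Run A: normal tables, all durability-related settings off
fsync = off
full_page_writes = off
synchronous_commit = off

# Run B: normal tables, only asynchronous commit
fsync = on
full_page_writes = on
synchronous_commit = off
```

Comparing those two against the unlogged-table numbers separates the cost of WAL flushing at commit from the cost of WAL generation itself.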
Excerpts from Robert Haas's message of mar nov 16 15:34:55 -0300 2010:
> On Tue, Nov 16, 2010 at 1:09 PM, Andy Colson <andy@squeakycode.net> wrote:
> > dump/restore?
>
> All of those. I guess there's a question of what pg_dump should emit
> for an unlogged table. Clearly, we need to dump a CREATE UNLOGGED
> TABLE statement (which we do), and right now we also dump the table
> contents - which seems reasonable, but arguably someone could say that
> we ought not to dump the contents of anything less than a
> full-fledged, permanent table.

I think if you do a regular backup of the complete database, unlogged tables should come out empty, but if you specifically request a dump of it, it shouldn't.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of mar nov 16 15:34:55 -0300 2010:
>> On Tue, Nov 16, 2010 at 1:09 PM, Andy Colson <andy@squeakycode.net> wrote:
>> > dump/restore?
>>
>> All of those. I guess there's a question of what pg_dump should emit
>> for an unlogged table. Clearly, we need to dump a CREATE UNLOGGED
>> TABLE statement (which we do), and right now we also dump the table
>> contents - which seems reasonable, but arguably someone could say that
>> we ought not to dump the contents of anything less than a
>> full-fledged, permanent table.
>
> I think if you do a regular backup of the complete database, unlogged
> tables should come out empty, but if you specifically request a dump of
> it, it shouldn't.

Oh, wow. That seems confusing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera
>> I think if you do a regular backup of the complete database, unlogged
>> tables should come out empty, but if you specifically request a dump of
>> it, it shouldn't.

> Oh, wow. That seems confusing.

I don't like it either.

I think allowing pg_dump to dump the data in an unlogged table is not only reasonable, but essential. Imagine that someone determines that his reliability needs will be adequately served by unlogged tables plus hourly backups. Now you're going to tell him that that doesn't work because pg_dump arbitrarily excludes the data in unlogged tables?

			regards, tom lane
On 11/16/2010 02:06 PM, Robert Haas wrote:
> On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>>
>> I think if you do a regular backup of the complete database, unlogged
>> tables should come out empty, but if you specifically request a dump of
>> it, it shouldn't.
> Oh, wow. That seems confusing.

Yeah. And unnecessary. If you want it excluded we already have a switch for that.

cheers

andrew
On Tue, Nov 16, 2010 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Nov 16, 2010 at 1:58 PM, Alvaro Herrera
>>> I think if you do a regular backup of the complete database, unlogged
>>> tables should come out empty, but if you specifically request a dump of
>>> it, it shouldn't.
>
>> Oh, wow. That seems confusing.
>
> I don't like it either.
>
> I think allowing pg_dump to dump the data in an unlogged table is not
> only reasonable, but essential. Imagine that someone determines that
> his reliability needs will be adequately served by unlogged tables plus
> hourly backups. Now you're going to tell him that that doesn't work
> because pg_dump arbitrarily excludes the data in unlogged tables?

Yeah, you'd have to allow a flag to control the behavior. And in that case I'd rather the flag have a single default rather than different defaults depending on whether or not individual tables were selected. Something like --omit-unlogged-data.

Incidentally, unlogged tables plus hourly backups is not dissimilar to what some NoSQL products are offering for reliability. Except with PG, you can (or soon will be able to, hopefully) selectively apply that lowered degree of reliability to a subset of your data for which you determine it's appropriate, while maintaining full reliability guarantees for other data. I am not aware of any other product which offers that level of fine-grained control over durability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> Yeah, you'd have to allow a flag to control the behavior. And in that
> case I'd rather the flag have a single default rather than different
> defaults depending on whether or not individual tables were selected.
> Something like --omit-unlogged-data.

Are you sure we don't want to default the other way? It seems to me that most people using unlogged tables won't want to back them up ... especially since the share lock for pgdump will add overhead for the kinds of high-volume updates people want to do with unlogged tables.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Nov 16, 2010 at 02:00:33PM -0800, Josh Berkus wrote:
> > Yeah, you'd have to allow a flag to control the behavior. And in
> > that case I'd rather the flag have a single default rather than
> > different defaults depending on whether or not individual tables
> > were selected. Something like --omit-unlogged-data.
>
> Are you sure we don't want to default the other way?

+1 for defaulting the other way.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
> It seems to me
> that most people using unlogged tables won't want to back them up ...
> especially since the share lock for pgdump will add overhead for the
> kinds of high-volume updates people want to do with unlogged tables.

Or perhaps most people will want them backed up, because, them being unlogged, the backup is the only way to get them back in case of a crash?
On 11/16/10 2:08 PM, Peter Eisentraut wrote:
> On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
>> It seems to me
>> that most people using unlogged tables won't want to back them up ...
>> especially since the share lock for pgdump will add overhead for the
>> kinds of high-volume updates people want to do with unlogged tables.
>
> Or perhaps most people will want them backed up, because them being
> unlogged the backup is the only way to get them back in case of a crash?

Yeah, hard to tell, really. Which default is less likely to become a foot-gun?

Maybe it's time for a survey on -general.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
> > Yeah, you'd have to allow a flag to control the behavior. And in that
> > case I'd rather the flag have a single default rather than different
> > defaults depending on whether or not individual tables were selected.
> > Something like --omit-unlogged-data.
>
> Are you sure we don't want to default the other way? It seems to me
> that most people using unlogged tables won't want to back them up ...

+1

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On 11/16/2010 05:12 PM, Josh Berkus wrote:
> On 11/16/10 2:08 PM, Peter Eisentraut wrote:
>> On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
>>> It seems to me
>>> that most people using unlogged tables won't want to back them up ...
>>> especially since the share lock for pgdump will add overhead for the
>>> kinds of high-volume updates people want to do with unlogged tables.
>> Or perhaps most people will want them backed up, because them being
>> unlogged the backup is the only way to get them back in case of a crash?
> Yeah, hard to tell, really. Which default is less likely to become a
> foot-gun?
>
> Maybe it's time for a survey on -general.

I would argue pretty strongly that backing something up is much less likely to be a foot-gun than not backing it up, and treating unlogged tables the same as logged tables for this purpose is also much less likely to be a foot-gun. As I pointed out upthread, we already have a mechanism for not backing up selected objects. I'd much rather have a rule that says "everything gets backed up by default" than one that says "everything gets backed up by default except unlogged tables".

cheers

andrew
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 16, 2010 at 3:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think allowing pg_dump to dump the data in an unlogged table is not
>> only reasonable, but essential.

> Yeah, you'd have to allow a flag to control the behavior. And in that
> case I'd rather the flag have a single default rather than different
> defaults depending on whether or not individual tables were selected.
> Something like --omit-unlogged-data.

As long as the default is to include the data, I wouldn't object to having such a flag. A default that drops data seems way too foot-gun-like.

			regards, tom lane
On Wed, 2010-11-17 at 00:08 +0200, Peter Eisentraut wrote:
> On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
> > It seems to me
> > that most people using unlogged tables won't want to back them up ...
> > especially since the share lock for pgdump will add overhead for the
> > kinds of high-volume updates people want to do with unlogged tables.
>
> Or perhaps most people will want them backed up, because them being
> unlogged the backup is the only way to get them back in case of a crash?

To me, the use of unlogged tables is going to be for dynamic, volatile data that can be rebuilt from an integrity set on a crash. Session tables, metadata tables, dynamic updates that are batched to logged tables every 10 minutes, that type of thing.

I think Berkus has a good idea on asking general.

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Tuesday 16 November 2010 23:12:10 Josh Berkus wrote:
> On 11/16/10 2:08 PM, Peter Eisentraut wrote:
> > On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
> >> It seems to me
> >> that most people using unlogged tables won't want to back them up ...
> >> especially since the share lock for pgdump will add overhead for the
> >> kinds of high-volume updates people want to do with unlogged tables.
> >
> > Or perhaps most people will want them backed up, because them being
> > unlogged the backup is the only way to get them back in case of a crash?
>
> Yeah, hard to tell, really. Which default is less likely to become a
> foot-gun?

Well. Maybe both possibilities are equally probable (which I think is unlikely), but the difference in impact is pretty clear.

One way, your backup runs too long and too much data changes; the other way round, you lose the data which you assumed was safely backed up.

Isn't that a *really* easy decision?

Andres
Andres Freund <andres@anarazel.de> wrote:
> One way your backup runs too long and too much data changes, the
> other way round you loose the data which you assumed safely
> backuped.
>
> Isn't that a *really* easy decision?

Yeah. Count me in the camp which wants the default behavior to be that pg_dump backs up all permanent tables, even those which aren't WAL-logged (and therefore aren't kept up in PITR backups, hot/warm standbys, or streaming replication).

-Kevin
On Tuesday 16 November 2010 23:30:29 Andres Freund wrote:
> On Tuesday 16 November 2010 23:12:10 Josh Berkus wrote:
> > On 11/16/10 2:08 PM, Peter Eisentraut wrote:
> > > On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
> > >> It seems to me
> > >> that most people using unlogged tables won't want to back them up ...
> > >> especially since the share lock for pgdump will add overhead for the
> > >> kinds of high-volume updates people want to do with unlogged tables.
> > >
> > > Or perhaps most people will want them backed up, because them being
> > > unlogged the backup is the only way to get them back in case of a
> > > crash?
> >
> > Yeah, hard to tell, really. Which default is less likely to become a
> > foot-gun?
>
> Well. Maybe both possibilities are just probable (which I think is
> unlikely), but the different impact is pretty clear.
>
> One way your backup runs too long and too much data changes, the other way
> round you lose the data which you assumed was safely backed up.
>
> Isn't that a *really* easy decision?

Oh, and another argument: Which are you more likely to discover: a backup that consistently runs for a short time, or a backup that's getting slower and larger...

Andres
Josh Berkus <josh@agliodbs.com> writes:
>> Yeah, you'd have to allow a flag to control the behavior. And in that
>> case I'd rather the flag have a single default rather than different
>> defaults depending on whether or not individual tables were selected.
>> Something like --omit-unlogged-data.

> Are you sure we don't want to default the other way? It seems to me
> that most people using unlogged tables won't want to back them up ...

That's a very debatable assumption. You got any evidence for it? Personally, I don't think pg_dump should ever default to omitting data.

> especially since the share lock for pgdump will add overhead for the
> kinds of high-volume updates people want to do with unlogged tables.

Say what? pg_dump just takes AccessShareLock. That doesn't add any overhead.

			regards, tom lane
> That's a very debatable assumption. You got any evidence for it?
> Personally, I don't think pg_dump should ever default to omitting
> data.

Survey launched, although it may become a moot point, given how this discussion is going.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Nov 16, 2010 at 5:30 PM, Andres Freund <andres@anarazel.de> wrote:
> On Tuesday 16 November 2010 23:12:10 Josh Berkus wrote:
>> On 11/16/10 2:08 PM, Peter Eisentraut wrote:
>> > On tis, 2010-11-16 at 14:00 -0800, Josh Berkus wrote:
>> >> It seems to me
>> >> that most people using unlogged tables won't want to back them up ...
>> >> especially since the share lock for pgdump will add overhead for the
>> >> kinds of high-volume updates people want to do with unlogged tables.
>> >
>> > Or perhaps most people will want them backed up, because them being
>> > unlogged the backup is the only way to get them back in case of a crash?
>>
>> Yeah, hard to tell, really. Which default is less likely to become a
>> foot-gun?
>
> Well. Maybe both possibilities are just probable (which I think is unlikely),
> but the different impact is pretty clear.
>
> One way your backup runs too long and too much data changes, the other way
> round you lose the data which you assumed was safely backed up.
>
> Isn't that a *really* easy decision?

Yeah, it seems pretty clear to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown? Greetings Marcin Mańk
On Tue, Nov 16, 2010 at 5:57 PM, marcin mank <marcin.mank@gmail.com> wrote:
> Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

I don't think so. To make that work, you'd need to keep track of every backing file that might contain pages not fsync()'d to disk, and at shutdown time you'd need to fsync() them all before shutting down. Doing that would require an awful lot of bookkeeping for a pretty marginal gain.

Maybe it would be useful to have:

ALTER TABLE .. READ [ONLY|WRITE];

...and preserve unlogged tables that are also read-only. Or perhaps something specific to unlogged tables:

ALTER TABLE .. QUIESCE;

...which would take an AccessExclusiveLock, make the table read-only, fsync() it, and tag it for restart-survival. But I'm happy to leave all of this until we gain some field experience with this feature, and have a better idea what features people would most like to see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 16, 2010 at 5:57 PM, marcin mank <marcin.mank@gmail.com> wrote:
>> Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?

> I don't think so. To make that work, you'd need to keep track of
> every backing file that might contain pages not fsync()'d to disk, and
> at shutdown time you'd need to fsync() them all before shutting down.

This is presuming that we want to guarantee the same level of safety for unlogged tables as for regular. Which, it seems to me, is exactly what people *aren't* asking for. Why not just write the data and shut down? If you're unlucky enough to have a system crash immediately after that, well, you might have corrupt data in the unlogged tables ... but that doesn't seem real probable.

			regards, tom lane
On 11/16/10 4:40 PM, Robert Haas wrote: > But I'm happy to leave all of this until we gain some field experience > with this feature, and have a better idea what features people would > most like to see. +1. Let's not complicate this. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Tue, Nov 16, 2010 at 7:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Nov 16, 2010 at 5:57 PM, marcin mank <marcin.mank@gmail.com> wrote:
>>> Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?
>
>> I don't think so. To make that work, you'd need to keep track of
>> every backing file that might contain pages not fsync()'d to disk, and
>> at shutdown time you'd need to fsync() them all before shutting down.
>
> This is presuming that we want to guarantee the same level of safety for
> unlogged tables as for regular. Which, it seems to me, is exactly what
> people *aren't* asking for. Why not just write the data and shut down?
> If you're unlucky enough to have a system crash immediately after that,
> well, you might have corrupt data in the unlogged tables ... but that
> doesn't seem real probable.

I have a hard time getting excited about a system that is designed to ensure that we probably don't have data corruption. The whole point of this feature is to relax the usual data integrity guarantees in a controlled way. A small but uncertain risk of corruption is not an improvement over a simple, predictable behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 16, 2010 at 02:07:35PM -0800, David Fetter wrote:
> On Tue, Nov 16, 2010 at 02:00:33PM -0800, Josh Berkus wrote:
> > > Yeah, you'd have to allow a flag to control the behavior. And in
> > > that case I'd rather the flag have a single default rather than
> > > different defaults depending on whether or not individual tables
> > > were selected. Something like --omit-unlogged-data.
> >
> > Are you sure we don't want to default the other way?
>
> +1 for defaulting the other way.

Upon further reflection, I'm switching to the "default to backing up unlogged tables" side.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
On 17.11.2010 03:56, Robert Haas wrote:
> On Tue, Nov 16, 2010 at 7:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Robert Haas<robertmhaas@gmail.com> writes:
>>> On Tue, Nov 16, 2010 at 5:57 PM, marcin mank<marcin.mank@gmail.com> wrote:
>>>> Can (should ?) unlogged tables' contents survive graceful (non-crash) shutdown?
>>
>>> I don't think so. To make that work, you'd need to keep track of
>>> every backing file that might contain pages not fsync()'d to disk, and
>>> at shutdown time you'd need to fsync() them all before shutting down.
>>
>> This is presuming that we want to guarantee the same level of safety for
>> unlogged tables as for regular. Which, it seems to me, is exactly what
>> people *aren't* asking for. Why not just write the data and shut down?
>> If you're unlucky enough to have a system crash immediately after that,
>> well, you might have corrupt data in the unlogged tables ... but that
>> doesn't seem real probable.
>
> I have a hard time getting excited about a system that is designed to
> ensure that we probably don't have data corruption. The whole point
> of this feature is to relax the usual data integrity guarantees in a
> controlled way. A small but uncertain risk of corruption is not an
> improvement over a simple, predictable behavior.

I agree with Robert: the point of unlogged tables is that the system knows to zap them away if there's any risk of having corruption in them. A corrupt page can lead to all kinds of errors. We try to handle corruption gracefully, but I wouldn't be surprised if you managed to even get a segfault caused by a torn page if you're unlucky.

fsync()ing the file at shutdown doesn't seem too bad to me from a performance point of view; we tolerate that for all other tables. And you can always truncate the table yourself before shutdown.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> fsync()ing the file at shutdown doesn't seem too bad to me from
> performance point of view, we tolerate that for all other tables. And
> you can always truncate the table yourself before shutdown.

The objection to that was not about performance. It was about how to find out what needs to be fsync'd.

			regards, tom lane
On Wed, Nov 17, 2010 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> fsync()ing the file at shutdown doesn't seem too bad to me from
>> performance point of view, we tolerate that for all other tables. And
>> you can always truncate the table yourself before shutdown.
>
> The objection to that was not about performance. It was about how
> to find out what needs to be fsync'd.

Just a crazy brainstorming thought, but....

If this is a clean shutdown, then all the non-unlogged tables have been checkpointed, so they should have no dirty pages in them anyway. So we could just fsync everything.

-- 
greg
Greg Stark <gsstark@mit.edu> wrote: > If this is a clean shutdown then all the non-unlogged tables have > been checkpointed so they should have no dirty pages in them > anyways. So we could just fsync everything. Or just all the unlogged tables. -Kevin
On Wed, Nov 17, 2010 at 11:00 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Nov 17, 2010 at 3:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> fsync()ing the file at shutdown doesn't seem too bad to me from
>>> performance point of view, we tolerate that for all other tables. And
>>> you can always truncate the table yourself before shutdown.
>>
>> The objection to that was not about performance. It was about how
>> to find out what needs to be fsync'd.
>
> Just a crazy brainstorming thought, but....
>
> If this is a clean shutdown then all the non-unlogged tables have been
> checkpointed so they should have no dirty pages in them anyways. So we
> could just fsync everything.

Hmm, that reminds me: checkpoints should really skip writing buffers belonging to unlogged relations altogether; and any fsync against an unlogged relation should be skipped. I need to go take a look at what's required to make that happen, either as part of this patch or as a follow-on commit.

It might be interesting to have a kind of semi-unlogged table where we write a special xlog record for the first access after each checkpoint but otherwise don't xlog. On redo, we truncate the tables mentioned, but not any others, since they're presumably OK. But that's not what I'm trying to design here. I'm trying to optimize it for the case where you DON'T care about durability and you just want it to be as fast as possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert, All: I hope you're following the thread on -general about this feature. We're getting a lot of feedback. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 17.11.2010 17:11, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes:
>> fsync()ing the file at shutdown doesn't seem too bad to me from
>> performance point of view, we tolerate that for all other tables. And
>> you can always truncate the table yourself before shutdown.
>
> The objection to that was not about performance. It was about how
> to find out what needs to be fsync'd.

I must be missing something: we handle that just fine with normal tables, why is it a problem for unlogged tables?

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 17.11.2010 17:11, Tom Lane wrote:
>> The objection to that was not about performance. It was about how
>> to find out what needs to be fsync'd.

> I must be missing something: we handle that just fine with normal
> tables, why is it a problem for unlogged tables?

Hmm ... that's a good point. If we simply treat unlogged tables the same as regular for checkpointing purposes, don't we end up having flushed them all correctly during a shutdown checkpoint? I was thinking that WAL-logging had some influence on that logic, but it doesn't.

Robert is probably going to object that he wanted to prevent any fsyncing for unlogged tables, but the discussion over in pgsql-general is crystal clear that people do NOT want to lose unlogged data over a clean shutdown and restart. If all it takes to do that is to refrain from lobotomizing the checkpoint logic for unlogged tables, I say we should refrain.

			regards, tom lane
On Wed, Nov 17, 2010 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Robert, All:
>
> I hope you're following the thread on -general about this feature.
> We're getting a lot of feedback.

I haven't been; I'm not subscribed to general; it'd be useful to CC me next time.

Reading through the thread in the archives, it seems like people are mostly confused. Some are confused about the current behavior of the patch (no, it really does always truncate your tables, I swear); others are confused about how WAL logging works (of course a backend crash doesn't truncate an ordinary table - that's because it's WAL LOGGED); and still others are maybe not exactly confused, but hoping that unlogged table = MyISAM (try not to corrupt your data, but don't get too bent out of shape about the possibility that it may get corrupted anyway).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> On 17.11.2010 17:11, Tom Lane wrote: >>> The objection to that was not about performance. It was about how >>> to find out what needs to be fsync'd. > >> I must be missing something: we handle that just fine with normal >> tables, why is it a problem for unlogged tables? > > Hmm ... that's a good point. If we simply treat unlogged tables the > same as regular for checkpointing purposes, don't we end up having > flushed them all correctly during a shutdown checkpoint? I was thinking > that WAL-logging had some influence on that logic, but it doesn't. > > Robert is probably going to object that he wanted to prevent any > fsyncing for unlogged tables, but the discussion over in pgsql-general > is crystal clear that people do NOT want to lose unlogged data over > a clean shutdown and restart. If all it takes to do that is to refrain > from lobotomizing the checkpoint logic for unlogged tables, I say we > should refrain. I think that's absolutely a bad idea. I seriously do not want to have a conversation with someone about why their unlogged tables are exacerbating their checkpoint I/O spikes. I'd be happy to have two modes, though. We should probably revisit the syntax. One, it seems that CREATE UNLOGGED TABLE is not as clear as I thought it was. Two, when (not if) we add more durability levels, we don't want to create keywords for all of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert is probably going to object that he wanted to prevent any >> fsyncing for unlogged tables, but the discussion over in pgsql-general >> is crystal clear that people do NOT want to lose unlogged data over >> a clean shutdown and restart. If all it takes to do that is to refrain >> from lobotomizing the checkpoint logic for unlogged tables, I say we >> should refrain. > I think that's absolutely a bad idea. The customer is always right, and I think we are hearing loud and clear what the customers want. Please let's not go out of our way to create a feature that isn't what they want. regards, tom lane
On Wed, Nov 17, 2010 at 02:16:06PM -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Robert is probably going to object that he wanted to prevent any > >> fsyncing for unlogged tables, but the discussion over in pgsql-general > >> is crystal clear that people do NOT want to lose unlogged data over > >> a clean shutdown and restart. If all it takes to do that is to refrain > >> from lobotomizing the checkpoint logic for unlogged tables, I say we > >> should refrain. > > > I think that's absolutely a bad idea. > > The customer is always right, and I think we are hearing loud and clear > what the customers want. Please let's not go out of our way to create > a feature that isn't what they want. > > regards, tom lane > I would be fine with only having a safe shutdown with unlogged tables and skip the checkpoint I/O all other times. Cheers, Ken
On 11/17/2010 02:22 PM, Kenneth Marshall wrote: > On Wed, Nov 17, 2010 at 02:16:06PM -0500, Tom Lane wrote: >> Robert Haas<robertmhaas@gmail.com> writes: >>> On Wed, Nov 17, 2010 at 1:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> Robert is probably going to object that he wanted to prevent any >>>> fsyncing for unlogged tables, but the discussion over in pgsql-general >>>> is crystal clear that people do NOT want to lose unlogged data over >>>> a clean shutdown and restart. If all it takes to do that is to refrain >>>> from lobotomizing the checkpoint logic for unlogged tables, I say we >>>> should refrain. >>> I think that's absolutely a bad idea. >> The customer is always right, and I think we are hearing loud and clear >> what the customers want. Please let's not go out of our way to create >> a feature that isn't what they want. > I would be fine with only having a safe shutdown with unlogged tables > and skip the checkpoint I/O all other times. Yeah, I was just thinking something like that would be good, and should overcome Robert's objection to the whole idea. I also agree with Tom's sentiment above. To answer another point I see Tom made on the -general list: while individual backends may crash from time to time, crashes of the whole Postgres server are very rare in my experience in production environments. It's really pretty robust, unless you're doing crazy stuff. So that makes it all the more important that we can restart a server cleanly (say, to change a config setting) without losing the unlogged tables. If we don't allow that we'll make a laughing stock of ourselves. Honestly. cheers andrew
Excerpts from Robert Haas's message of Wed Nov 17 15:48:56 -0300 2010: > On Wed, Nov 17, 2010 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote: > > Robert, All: > > > > I hope you're following the thread on -general about this feature. > > We're getting a lot of feedback. > > I haven't been; I'm not subscribed to general; it'd be useful to CC me > next time. FWIW I've figured that being subscribed to the lists is good even if I have my mail client configured to hide these emails by default. It's a lot easier for searching stuff that someone else references. (I made the mistake of having it hide all pg-general email even though I was CC'ed, though, which is the trivial way to implement this. I don't recommend repeating this mistake.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/17/2010 02:22 PM, Kenneth Marshall wrote: >> I would be fine with only having a safe shutdown with unlogged tables >> and skip the checkpoint I/O all other times. > Yeah, I was just thinking something like that would be good, and should > overcome Robert's objection to the whole idea. I don't think you can fsync only in the shutdown checkpoint and assume your data is safe, if you didn't fsync a write a few moments earlier. Now, a few minutes ago Robert was muttering about supporting more than one kind of degraded-reliability table. I could see inventing "unlogged" tables, which means exactly that (no xlog support, but we still checkpoint/fsync as usual), and "unsynced" tables which also/instead suppress fsync activity. The former type could be assumed to survive a clean shutdown/restart, while the latter wouldn't. This would let people pick their poison. regards, tom lane
> Now, a few minutes ago Robert was muttering about supporting more than > one kind of degraded-reliability table. I could see inventing > "unlogged" tables, which means exactly that (no xlog support, but we > still checkpoint/fsync as usual), and "unsynced" tables which > also/instead suppress fsync activity. The former type could be assumed > to survive a clean shutdown/restart, while the latter wouldn't. This > would let people pick their poison. We're assuming here that the checkpoint activity for the unlogged table causes significant load on a production system. Maybe we should do some testing before we try to make this overly complex? I wouldn't be surprised to find that on most filesystems the extra checkpointing of the unlogged tables adds only minor overhead. Shouldn't be hard to build out pgbench into something which will test this ... if only I had a suitable test machine available. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
[ forgot to comment on this part ] Andrew Dunstan <andrew@dunslane.net> writes: > To answer another point I see Tom made on the -general list: while > individual backends may crash from time to time, crashes of the whole > Postgres server are very rare in my experience in production > environments. Well, if you mean the postmaster darn near never goes down, that's true, because we go out of our way to ensure it does as little as possible. But that has got zip to do with this discussion, because a backend crash has to be assumed to have corrupted unlogged tables. There are some folk over in -general who are wishfully thinking that only a postmaster crash would lose their unlogged data, but that's simply wrong. Backend crashes *will* truncate those tables; there is no way around that. The comment I made was that my experience as to how often backends crash might not square with production experience --- but you do have to draw the distinction between a backend crash and a postmaster crash. regards, tom lane
On 11/17/2010 02:44 PM, Tom Lane wrote: > [ forgot to comment on this part ] > > Andrew Dunstan<andrew@dunslane.net> writes: >> To answer another point I see Tom made on the -general list: while >> individual backends may crash from time to time, crashes of the whole >> Postgres server are very rare in my experience in production >> environments. > Well, if you mean the postmaster darn near never goes down, that's true, > because we go out of our way to ensure it does as little as possible. > But that has got zip to do with this discussion, because a backend crash > has to be assumed to have corrupted unlogged tables. There are some > folk over in -general who are wishfully thinking that only a postmaster > crash would lose their unlogged data, but that's simply wrong. Backend > crashes *will* truncate those tables; there is no way around that. The > comment I made was that my experience as to how often backends crash > might not square with production experience --- but you do have to draw > the distinction between a backend crash and a postmaster crash. OK. I'd missed that. Thanks for clarifying. cheers andrew
On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> The customer is always right, and I think we are hearing loud and clear >>> what the customers want. Please let's not go out of our way to create >>> a feature that isn't what they want. >> >> I would be fine with only having a safe shutdown with unlogged tables >> and skip the checkpoint I/O all other times. > > Yeah, I was just thinking something like that would be good, and should > overcome Robert's objection to the whole idea. Could we slow down here a bit and talk through the ideas here in a logical fashion? The customer is always right, but the informed customer makes better decisions than the uninformed customer. This idea, as proposed, does not work. If you only include dirty buffers at the final checkpoint before shutting down, you have no guarantee that any buffers that you either didn't write or didn't fsync previously are actually on disk. Therefore, you have no guarantee that the table data is not corrupted. So you really have to decide between including the unlogged-table buffers in EVERY checkpoint and not ever including them at all. Which one is right depends on your use case. For example, consider the poster who said that, when this feature is available, they plan to try ripping out their memcached instance and replacing it with PostgreSQL running unlogged tables. Suppose this poster (or someone else in a similar situation) has a 64 GB machine and is currently running a 60 GB memcached instance on it, which is not an unrealistic scenario for memcached. Suppose further that he dirties 25% of that data each hour. memcached is currently doing no writes to disk. When he switches to PostgreSQL and sets checkpoints_segments to a gazillion and checkpoint_timeout to the maximum, he's going to start writing 15 GB of data to disk every hour - data which he clearly doesn't care about losing, or preserving across restarts, because he's currently storing it in memcached.
In fact, with memcached, he'll not only lose data at shutdown - he'll lose data on a regular basis when everything is running normally. We can try to convince ourselves that someone in this situation will not care about needing to get 15GB of disposable data per hour from memory to disk in order to have a feature that he doesn't need, but I think it's going to be pretty hard to make that credible. Now, second use case. Consider someone who is currently running PostgreSQL in a non-durable configuration, with fsync=off, full_page_writes=off, and synchronous_commit=off. This person - who is based on someone I spoke with at PG West - is doing a large amount of data processing using PostGIS. Their typical workflow is to load a bunch of data, run a simulation, and then throw away the entire database. They don't want to pay the cost of durability because if they crash in mid-simulation they will simply rerun it. Being fast is more important. Whether or not this person will be happy with the proposed behavior is a bit harder to say. If it kills performance, they will definitely hate it. But if the performance penalty is only modest, they may enjoy the convenience of being able to shut down the database and start it up again later without losing data. Third use case. Someone on pgsql-general mentioned that they want to write logs to PG, and can abide losing them if a crash happens, but not on a clean shutdown and restart. This person clearly shuts down their production database a lot more often than I do, but that is OK. By explicit stipulation, they want the survive-a-clean-shutdown behavior. I have no problem supporting that use case, providing they are willing to take the associated performance penalty at checkpoint time, which we don't know because we haven't asked, but I'm fine with assuming it's useful even though I probably wouldn't use it much myself. > I also agree with Tom's sentiment above. 
> > To answer another point I see Tom made on the -general list: while > individual backends may crash from time to time, crashes of the whole > Postgres server are very rare in my experience in production environments. > It's really pretty robust, unless you're doing crazy stuff. So that makes it > all the more important that we can restart a server cleanly (say, to change > a config setting) without losing the unlogged tables. If we don't allow that > we'll make a laughing stock of ourselves. Honestly. Let's please not assume that there is only one reasonable option here, or that I have not thought about some of these issues. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 17, 2010 at 2:42 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> Now, a few minutes ago Robert was muttering about supporting more than >> one kind of degraded-reliability table. I could see inventing >> "unlogged" tables, which means exactly that (no xlog support, but we >> still checkpoint/fsync as usual), and "unsynced" tables which >> also/instead suppress fsync activity. The former type could be assumed >> to survive a clean shutdown/restart, while the latter wouldn't. This >> would let people pick their poison. > > We're assuming here that the checkpoint activity for the unlogged table > causes significant load on a production system. Maybe we should do some > testing before we try to make this overly complex? I wouldn't be > surprised to find that on most filesystems the extra checkpointing of > the unlogged tables adds only small minority overhead. > > Shouldn't be hard to build out pgbench into something which will test > this ... if only I had a suitable test machine available. I guess the point I'd make here is that checkpoint I/O will be a problem for unlogged tables in exactly the same situations in which it is a problem for regular tables. There is some amount of I/O that your system can handle before the additional I/O caused by checkpoints starts to become a problem. If unlogged tables (or one particular variant of unlogged tables) don't need to participate in checkpoints, then you will be able to use unlogged tables, in situations where they are appropriate to the workload, to control your I/O load and hopefully keep it below the level where it causes a problem. Of course, there will also be workloads where your system has plenty of spare capacity (in which case it won't matter) or where your system is going to be overwhelmed anyway (in which case it doesn't really matter either). But if you are somewhere between those two extremes, this has to matter. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wednesday 17 November 2010 20:54:14 Robert Haas wrote: > On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > >>> The customer is always right, and I think we are hearing loud and clear > >>> what the customers want. Please let's not go out of our way to create > >>> a feature that isn't what they want. > >> > >> I would be fine with only having a safe shutdown with unlogged tables > >> and skip the checkpoint I/O all other times. > > > > Yeah, I was just thinking something like that would be good, and should > > overcome Robert's objection to the whole idea. > > Could we slow down here a bit and talk through the ideas here in a > logical fashion? > > The customer is always right, but the informed customer makes better > decisions than the uninformed customer. This idea, as proposed, does > not work. If you only include dirty buffers at the final checkpoint > before shutting down, you have no guarantee that any buffers that you > either didn't write or didn't fsync previously are actually on disk. > Therefore, you have no guarantee that the table data is not corrupted. > So you really have to decide between including the unlogged-table > buffers in EVERY checkpoint and not ever including them at all. Which > one is right depends on your use case. How can you get a buffer which was not written out *at all*? Do you want to force all such pages to stay in shared_buffers? That sounds quite a bit more complicated than what you proposed... > For example, consider the poster who said that, when this feature is > available, they plan to try ripping out their memcached instance and > replacing it with PostgreSQL running unlogged tables. Suppose this > poster (or someone else in a similar situation) has a 64 GB machine and is > currently running a 60 GB memcached instance on it, which is not an > unrealistic scenario for memcached. Suppose further that he dirties > 25% of that data each hour. memcached is currently doing no writes to > disk.
When he switches to PostgreSQL and sets checkpoints_segments to > a gazillion and checkpoint_timeout to the maximum, he's going to start > writing 15 GB of data to disk every hour - data which he clearly > doesn't care about losing, or preserving across restarts, because he's > currently storing it in memcached. In fact, with memcached, he'll not > only lose data at shutdown - he'll lose data on a regular basis when > everything is running normally. We can try to convince ourselves that > someone in this situation will not care about needing to get 15GB of > disposable data per hour from memory to disk in order to have a > feature that he doesn't need, but I think it's going to be pretty hard > to make that credible. To really support that use case we would first need to make shared_buffers properly scale to 64GB - which unfortunately, in my experience, is not yet the case. Also, see the issues in the previous paragraph - I have severe doubts you can support such a memcached scenario by pg. Either you spill to disk if your buffers overflow (fine with me) or you need to throw away data the way memcached does. I doubt there is a sensible implementation in pg for the latter. So you will have to write to disk at some point... > Third use case. Someone on pgsql-general mentioned that they want to > write logs to PG, and can abide losing them if a crash happens, but > not on a clean shutdown and restart. This person clearly shuts down > their production database a lot more often than I do, but that is OK. > By explicit stipulation, they want the survive-a-clean-shutdown > behavior. I have no problem supporting that use case, providing they > are willing to take the associated performance penalty at checkpoint > time, which we don't know because we haven't asked, but I'm fine with > assuming it's useful even though I probably wouldn't use it much > myself. Maybe I am missing something - but why does this imply we have to write data at checkpoints?
Just fsyncing every file belonging to a persistently-unlogged (or whatever sensible name anyone can come up with) table is not prohibitively expensive - in fact doing that on a local $PGDATA with approx 300GB and loads of tables takes less than 15s on a system with hot inode/dentry cache and no dirty files. (just `find $PGDATA -print0|xargs -0 fsync_many_files` with fsync_many_files being a tiny C program doing posix_fadvise(POSIX_FADV_DONTNEED) on all files and then fsyncing every one). The assumption of a hot inode cache is realistic I think. Andres
On Wed, Nov 17, 2010 at 2:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 11/17/2010 02:22 PM, Kenneth Marshall wrote: >>> I would be fine with only having a safe shutdown with unlogged tables >>> and skip the checkpoint I/O all other times. > >> Yeah, I was just thinking something like that would be good, and should >> overcome Robert's objection to the whole idea. > > I don't think you can fsync only in the shutdown checkpoint and assume > your data is safe, if you didn't fsync a write a few moments earlier. > > Now, a few minutes ago Robert was muttering about supporting more than > one kind of degraded-reliability table. I could see inventing > "unlogged" tables, which means exactly that (no xlog support, but we > still checkpoint/fsync as usual), and "unsynced" tables which > also/instead suppress fsync activity. The former type could be assumed > to survive a clean shutdown/restart, while the latter wouldn't. This > would let people pick their poison. OK, so we're proposing a hierarchy like this. 1. PERMANENT (already exists). Permanent tables are WAL-logged, participate in checkpoints, and are fsync'd. They survive crashes and clean restarts, and are replicated. 2. UNLOGGED (what this patch currently implements). Unlogged tables are not WAL-logged, but they do participate in checkpoints and they are fsync'd on request. They survive clean restarts, but on a crash they are truncated. They are not replicated. 3. UNSYNCED (future work). Unsynced tables are not WAL-logged, do not participate in checkpoints, and are never fsync'd. After any sort of crash or shutdown, clean or otherwise, they are truncated. They are not replicated. 4. GLOBAL TEMPORARY (future work). Global temporary tables are not WAL-logged, do not participate in checkpoints, and are never fsync'd. The contents of each global temporary table are private to that session, so that they can use the local buffer manager rather than shared buffers. 
Multiple sessions can use a global temporary table at the same time, and each sees separate contents. At session exit, any contents inserted by the owning backend are lost; since all sessions exit on crash or shutdown, all contents are also lost at that time. 5. LOCAL TEMPORARY (our current temp tables). Local temporary tables are not WAL-logged, do not participate in checkpoints, and are never fsync'd. The table definition and all of its contents are private to the session, so that they are dropped at session exit (or at transaction end if ON COMMIT DROP is used). Since all sessions exit on crash or shutdown, all table definitions and all table contents are lost at that time. It's possible to imagine a few more stops on this hierarchy. For example, you could have an ASYNCHRONOUS table between (1) and (2) that always acts as if synchronous_commit=off, but is otherwise replicated and durable over crashes; or a MINIMALLY LOGGED table that is XLOG'd as if wal_level=minimal even when the actual value of wal_level is otherwise, and is therefore crash-safe but not replication-safe; or a level that is similar to unlogged but we XLOG the first event that dirties a page after each checkpoint, and therefore even on a crash we need only remove the tables for which such an XLOG record has been written. All of those are a bit speculative perhaps but we could jam them in there if there's demand, I suppose. I don't particularly care for the name UNSYNCED, and I'm starting not to like UNLOGGED much either, although at least that one is an actual word. PERMANENT and the flavors of TEMPORARY are reasonably comprehensible as a description of user-visible behavior, but UNLOGGED and UNSYNCED sound a lot like they're discussing internal details that the user might not actually understand or care about. I don't have a better idea right off the top of my head, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
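[For readers skimming the thread, the five proposed levels can be restated compactly. The enum below is purely illustrative; its names are hypothetical and are not identifiers from the patch.]

```c
/*
 * Illustrative summary of the proposed durability hierarchy, in the
 * order given in the message above. Hypothetical names only.
 */
typedef enum DurabilityLevel
{
    TABLE_PERMANENT,    /* WAL-logged, checkpointed, fsync'd; survives crash; replicated */
    TABLE_UNLOGGED,     /* no WAL; checkpointed and fsync'd; survives clean restart, truncated on crash */
    TABLE_UNSYNCED,     /* no WAL, checkpoint, or fsync; truncated after any restart */
    TABLE_GLOBAL_TEMP,  /* shared definition, per-session contents in local buffers */
    TABLE_LOCAL_TEMP    /* definition and contents private to a single session */
} DurabilityLevel;
```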
On 11/17/2010 03:37 PM, Robert Haas wrote: > I don't particularly care for the name UNSYNCED, and I'm starting not > to like UNLOGGED much either, although at least that one is an actual > word. PERMANENT and the flavors of TEMPORARY are reasonably > comprehensible as a description of user-visible behavior, but UNLOGGED > and UNSYNCED sound a lot like they're discussing internal details > that the user might not actually understand or care about. I don't > have a better idea right off the top of my head, though. Maybe VOLATILE for UNSYNCED? Not sure about UNLOGGED. cheers andrew
On Wed, Nov 17, 2010 at 3:35 PM, Andres Freund <andres@anarazel.de> wrote: >> The customer is always right, but the informed customer makes better >> decisions than the uninformed customer. This idea, as proposed, does >> not work. If you only include dirty buffers at the final checkpoint >> before shutting down, you have no guarantee that any buffers that you >> either didn't write or didn't fsync previously are actually on disk. >> Therefore, you have no guarantee that the table data is not corrupted. >> So you really have to decide between including the unlogged-table >> buffers in EVERY checkpoint and not ever including them at all. Which >> one is right depends on your use case. > How can you get a buffer which was not written out *at all*? Do you want to > force all such pages to stay in shared_buffers? That sounds quite a bit more > complicated than what you proposed... Oh, you're right. We always have to write buffers before kicking them out of shared_buffers, but if we don't fsync them we have no guarantee they're actually on disk. >> For example, consider the poster who said that, when this feature is >> available, they plan to try ripping out their memcached instance and >> replacing it with PostgreSQL running unlogged tables. Suppose this >> poster (or someone else in a similar situation) has a 64 GB machine and is >> currently running a 60 GB memcached instance on it, which is not an >> unrealistic scenario for memcached. Suppose further that he dirties >> 25% of that data each hour. memcached is currently doing no writes to >> disk. When he switches to PostgreSQL and sets checkpoints_segments to >> a gazillion and checkpoint_timeout to the maximum, he's going to start >> writing 15 GB of data to disk every hour - data which he clearly >> doesn't care about losing, or preserving across restarts, because he's >> currently storing it in memcached.
In fact, with memcached, he'll not >> only lose data at shutdown - he'll lose data on a regular basis when >> everything is running normally. We can try to convince ourselves that >> someone in this situation will not care about needing to get 15GB of >> disposable data per hour from memory to disk in order to have a >> feature that he doesn't need, but I think it's going to be pretty hard >> to make that credible. > To really support that use case we would first need to make shared_buffers > properly scale to 64GB - which unfortunately, in my experience, is not yet the > case. Well, that's something to aspire to. :-) > Also, see the issues in the previous paragraph - I have severe doubts you can > support such a memcached scenario by pg. Either you spill to disk if your > buffers overflow (fine with me) or you need to throw away data the way memcached does. I > doubt there is a sensible implementation in pg for the latter. > > So you will have to write to disk at some point... I agree that there are difficulties, but again, doing checkpoint I/O for data that the user was willing to throw away is going in the wrong direction. >> Third use case. Someone on pgsql-general mentioned that they want to >> write logs to PG, and can abide losing them if a crash happens, but >> not on a clean shutdown and restart. This person clearly shuts down >> their production database a lot more often than I do, but that is OK. >> By explicit stipulation, they want the survive-a-clean-shutdown >> behavior. I have no problem supporting that use case, providing they >> are willing to take the associated performance penalty at checkpoint >> time, which we don't know because we haven't asked, but I'm fine with >> assuming it's useful even though I probably wouldn't use it much >> myself. > Maybe I am missing something - but why does this imply we have to write data > at checkpoints?
> Just fsyncing every file belonging to a persistently-unlogged (or whatever > sensible name anyone can come up with) table is not prohibitively expensive - in fact > doing that on a local $PGDATA with approx 300GB and loads of tables > takes less than 15s on a system with hot inode/dentry cache and no dirty files. > (just `find $PGDATA -print0|xargs -0 fsync_many_files` with fsync_many_files > being a tiny C program doing posix_fadvise(POSIX_FADV_DONTNEED) on all files > and then fsyncing every one). > The assumption of a hot inode cache is realistic I think. Hmm. I don't really want to try to do it in this patch because it's complicated enough already, but if people don't mind the shutdown sequence potentially being slowed down a bit, that might allow us to have the best of both worlds without needing to invent multiple durability levels. I was sort of assuming that people wouldn't want to slow down the shutdown sequence to avoid losing data they've already declared isn't that valuable, but evidently I underestimated the demand for kinda-durable tables. If the overhead of doing this isn't too severe, it might be the way to go. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 17, 2010 at 03:48:52PM -0500, Andrew Dunstan wrote: > On 11/17/2010 03:37 PM, Robert Haas wrote: > >I don't particularly care for the name UNSYNCED, and I'm starting > >not to like UNLOGGED much either, although at least that one is an > >actual word. PERMANENT and the flavors of TEMPORARY are > >reasonably comprehensible as a description of user-visible > >behavior, but UNLOGGED and UNSYNCED sound a lot like they're > >discussing internal details that the user might not actually > >understand or care about. I don't have a better idea right off the > >top of my head, though. > > Maybe VOLATILE for UNSYNCED? Not sure about UNLOGGED. +1 for describing the end-user-visible behavior. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 11/17/2010 12:48 PM, Andrew Dunstan wrote: > > Maybe VOLATILE for UNSYNCED? Not sure about UNLOGGED. > UNSAFE and EXTREMELY_UNSAFE?? :) Cheers, Steve
Robert Haas <robertmhaas@gmail.com> wrote: > OK, so we're proposing a hierarchy like this. > > 1. PERMANENT (already exists). > 2. UNLOGGED (what this patch currently implements). > 3. UNSYNCED (future work). > 4. GLOBAL TEMPORARY (future work). > 5. LOCAL TEMPORARY (our current temp tables). All of the above would have real uses in our shop. > It's possible to imagine a few more stops on this hierarchy. Some of these might be slightly preferred over the above in certain circumstances, but that's getting down to fine tuning. I think the five listed above are more important than the "speculative" ones mentioned. > I don't particularly care for the name UNSYNCED EVANESCENT? > I'm starting not to like UNLOGGED much either EPHEMERAL? Actually, the UNSYNCED and UNLOGGED seem fairly clear.... -Kevin
On Nov 17, 2010, at 4:00 PM, Kevin Grittner wrote:

> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> OK, so we're proposing a hierarchy like this.
>>
>> 1. PERMANENT (already exists).
>> 2. UNLOGGED (what this patch currently implements).
>> 3. UNSYNCED (future work).
>> 4. GLOBAL TEMPORARY (future work).
>> 5. LOCAL TEMPORARY (our current temp tables).
>
> All of the above would have real uses in our shop.
>
>> It's possible to imagine a few more stops on this hierarchy.
>
> Some of these might be slightly preferred over the above in certain
> circumstances, but that's getting down to fine tuning. I think the
> five listed above are more important than the "speculative" ones
> mentioned.
>
>> I don't particularly care for the name UNSYNCED
>
> EVANESCENT?
>
>> I'm starting not to like UNLOGGED much either
>
> EPHEMERAL?
>
> Actually, UNSYNCED and UNLOGGED seem fairly clear....

Unless one thinks that the types could be combined: perhaps a table declaration could use both UNLOGGED and UNSYNCED?

Cheers,
M
On Wed, Nov 17, 2010 at 4:00 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> OK, so we're proposing a hierarchy like this.
>>
>> 1. PERMANENT (already exists).
>> 2. UNLOGGED (what this patch currently implements).
>> 3. UNSYNCED (future work).
>> 4. GLOBAL TEMPORARY (future work).
>> 5. LOCAL TEMPORARY (our current temp tables).
>
> All of the above would have real uses in our shop.
>
>> It's possible to imagine a few more stops on this hierarchy.
>
> Some of these might be slightly preferred over the above in certain
> circumstances, but that's getting down to fine tuning. I think the
> five listed above are more important than the "speculative" ones
> mentioned.
>
>> I don't particularly care for the name UNSYNCED
>
> EVANESCENT?
>
>> I'm starting not to like UNLOGGED much either
>
> EPHEMERAL?
>
> Actually, UNSYNCED and UNLOGGED seem fairly clear....

I think Andrew's suggestion of VOLATILE is pretty good. It's hard to come up with multiple words that express gradations of "we might decide to chuck your data if things go south", though. Then again, if we go with Andres's suggestion, maybe we can get by with one level.

Or if we still end up with multiple levels, maybe it's best to use VOLATILE for everything >1 and <4, and then have a subordinate clause to specify gradations:

CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
THAT EXPLAIN THE DETAILS GO HERE;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/17/2010 04:00 PM, Kevin Grittner wrote: > Actually, the UNSYNCED and UNLOGGED seem fairly clear.... I think Robert's right. These names won't convey much to someone not steeped in our technology. cheers andrew
Excerpts from Robert Haas's message of mié nov 17 17:51:37 -0300 2010:

> On Wed, Nov 17, 2010 at 3:35 PM, Andres Freund <andres@anarazel.de> wrote:
> > How can you get a buffer which was not written out *at all*? Do you want to
> > force all such pages to stay in shared_buffers? That sounds quite a bit more
> > complicated than what you proposed...
>
> Oh, you're right. We always have to write buffers before kicking them
> out of shared_buffers, but if we don't fsync them we have no guarantee
> they're actually on disk.

You could just open all the segments and fsync them.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
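[Editor's note: Álvaro's suggestion — open all the segments of a relation and fsync them — could look roughly like the sketch below. This is a guess, not code from any patch: the function name, return convention, and fixed-size path buffer are invented here, and a real implementation would go through PostgreSQL's smgr/fd layer rather than raw open(). It relies only on the fact that PostgreSQL splits large relations into segments named "base", "base.1", "base.2", and so on.]

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: fsync every on-disk segment of a relation.
 * Segments are named "base", "base.1", "base.2", ...; we sync each
 * in turn until one is missing.  Returns the number of segments
 * synced, or -1 if an fsync fails.
 */
int fsync_segments(const char *base)
{
    char path[1024];            /* arbitrary size, fine for a sketch */
    int  segno = 0;

    for (;;)
    {
        if (segno == 0)
            snprintf(path, sizeof(path), "%s", base);
        else
            snprintf(path, sizeof(path), "%s.%d", base, segno);

        int fd = open(path, O_RDONLY);
        if (fd < 0)
            break;              /* no more segments */
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        segno++;
    }
    return segno;
}
```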
On Wed, Nov 17, 2010 at 04:05:56PM -0500, Robert Haas wrote:

> On Wed, Nov 17, 2010 at 4:00 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> OK, so we're proposing a hierarchy like this.
>>>
>>> 1. PERMANENT (already exists).
>>> 2. UNLOGGED (what this patch currently implements).
>>> 3. UNSYNCED (future work).
>>> 4. GLOBAL TEMPORARY (future work).
>>> 5. LOCAL TEMPORARY (our current temp tables).
>>
>> All of the above would have real uses in our shop.
>>
>>> It's possible to imagine a few more stops on this hierarchy.
>>
>> Some of these might be slightly preferred over the above in certain
>> circumstances, but that's getting down to fine tuning. I think the
>> five listed above are more important than the "speculative" ones
>> mentioned.
>>
>>> I don't particularly care for the name UNSYNCED
>>
>> EVANESCENT?
>>
>>> I'm starting not to like UNLOGGED much either
>>
>> EPHEMERAL?
>>
>> Actually, UNSYNCED and UNLOGGED seem fairly clear....
>
> I think Andrew's suggestion of VOLATILE is pretty good. It's hard to
> come up with multiple words that express gradations of "we might
> decide to chuck your data if things go south", though. Then again, if
> we go with Andres's suggestion, maybe we can get by with one level.
>
> Or if we still end up with multiple levels, maybe it's best to use
> VOLATILE for everything >1 and <4, and then have a subordinate clause
> to specify gradations:
>
> CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
> THAT EXPLAIN THE DETAILS GO HERE;

How about something like:

OPTIONS (SYNC=no, LOG=no, ... )

Cheers,
David.

-- 
David Fetter <david@fetter.org> http://fetter.org/
Excerpts from Robert Haas's message of mié nov 17 18:05:56 -0300 2010: > CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS > THAT EXPLAIN THE DETAILS GO HERE; What about some reloptions? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 11/17/2010 11:44 AM, Tom Lane wrote: > ...because a backend crash has to be assumed to have corrupted > unlogged tables... > So in a typical use-case, say storing session data on a web-site, one crashed backend could wreck sessions for some or all of the site? Is there a mechanism in the proposal that would allow a client to determine the state of a table (good, truncated, wrecked, etc.)? Cheers, Steve
> > >> I don't particularly care for the name UNSYNCED > > > > > > EVANESCENT? > > > > > >> I'm starting not to like UNLOGGED much either > > > > > > EPHEMERAL? > > > > > > Actually, the UNSYNCED and UNLOGGED seem fairly clear.... Uhhh yeah. Let's not break out the thesaurus for this. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
> CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS
> THAT EXPLAIN THE DETAILS GO HERE;

[ TRUNCATE ON RESTART ]

Your patch implements this option, right?

Regards,
-- 
dim
On Thu, Nov 18, 2010 at 3:07 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > >> CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS >> THAT EXPLAIN THE DETAILS GO HERE; > > [ TRUNCATE ON RESTART ] > > Your patch implements this option, right? Yeah. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I have done a bunch of benchmarking. It was not easy to find consistent numbers, so I picked a job and ran the same thing over and over.

I'm running Slackware 13.1 on a desktop computer.

Linux storm 2.6.35.7-smp #1 SMP Sun Oct 10 21:43:07 CDT 2010 i686 AMD Athlon(tm) 7850 Dual-Core Processor AuthenticAMD GNU/Linux

Database on: /dev/sda2 on /pub type ext4 (rw,noatime)

I started with stock, unpatched PG 9.1 and ran pg_bench. I used several scales and always set the number of connections at half the scale (so scale 20 used 10 connections). I ran all tests for 180 seconds. autovacuum was always off, and I ran "vacuum -z" between each pg_bench run.

Each block of numbers has these columns: scale, test 1, test 2, test 3, avg.

So the first line below:

6, 96, 105, 102, 101

means:

pg_bench -i -s 6
pg_bench -c 3 -T 180
vacuum -z
pg_bench -c 3 -T 180
vacuum -z
pg_bench -c 3 -T 180

with results for the three runs of 96, 105, and 102, averaging 101.

The LOGS test is importing 61+ million rows of apache logs. It's a perl script that uses COPY over many, many files. Each file is committed separately.
checkpoint_segments = 7
shared_buffers = 512MB
effective_cache_size = 1024MB
autovacuum off
fsync on
synchronous_commit on
full_page_writes on
bgwriter_lru_maxpages 100

180 second tests
scale, test 1, test 2, test 3, avg
6, 96, 105, 102, 101
20, 120, 82, 76, 93
40, 73, 42, 43, 53
80, 50, 29, 35, 38

synchronous_commit off
6, 239, 676, 614, 510
20, 78, 47, 56, 60
40, 59, 35, 41, 45
80, 53, 30, 35, 39
LOGS: ~ 3,900 ins/sec (I didn't record this well; it's sort of a guess)

synchronous_commit off
full_page_writes off
6, 1273, 1344, 1287, 1301
20, 1323, 1307, 1313, 1314
40, 1051, 872, 702, 875
80, 551, 206, 245, 334
LOGS (got impatient and killed it)
Total rows: 20,719,095
Total Seconds: 5,279.74
Total ins/sec: 3,924.25

fsync off
synchronous_commit off
full_page_writes off
bgwriter_lru_maxpages 0
6, 3622, 2940, 2879, 3147
20, 2860, 2952, 2939, 2917
40, 2204, 2143, 2349, 2232
80, 1394, 1043, 1085, 1174
LOG (this is a full import)
Total rows: 61,467,489
Total Seconds: 1,240.93
Total ins/sec: 49,533.37

------- Apply unlogged patches and recompile, re-initdb ---

I patched pg_bench to run with either normal or unlogged tables.

fsync on
synchronous_commit on
full_page_writes on
bgwriter_lru_maxpages 100

180 second tests

normal tables
6, 101, 102, 108, 103
20, 110, 71, 90, 90
40, 83, 45, 49, 59
80, 50, 34, 30, 38
LOGS (partial import)
Total rows: 24,754,871
Total Seconds: 6,058.03
Total ins/sec: 4,086.28

unlogged tables
6, 2966, 3047, 3007, 3006
20, 2767, 2515, 2708, 2663
40, 1933, 1311, 1464, 1569
80, 837, 552, 579, 656
LOGS (full import)
Total rows: 61,467,489
Total Seconds: 1,126.75
Total ins/sec: 54,552.60

After all this... there are too many numbers for me. I have no idea what this means.

-Andy
On Sun, Nov 21, 2010 at 11:07 PM, Andy Colson <andy@squeakycode.net> wrote: > After all this... there are too many numbers for me. I have no idea what > this means. I think what it means that is that, for you, unlogged tables were almost as fast as shutting off all of synchronous_commit, full_page_writes, and fsync, and further setting bgwriter_lru_maxpages=0. Now, that seems a little strange, because you'd think if anything it would be faster. I'm not sure what accounts for the difference, although I wonder if checkpoints are part of it. With the current code, which doesn't exclude unlogged table pages from checkpoints, a checkpoint will still be faster with fsync=off than with unlogged tables. It seems like we're agreed that this is a problem to be fixed in phase two, though, either by fsyncing every unlogged table we can find at shutdown time, or else by providing two durability options, one that works as the current code does (but survives clean shutdowns) and another that excludes dirty pages from checkpoints (and does not survive clean shutdowns). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company