Thread: unlogged tables
Here is a series of three patches related to unlogged tables. 1. The first one (relpersistence-v1) is a mostly mechanical patch that replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence (a character), so that we can support more than two values. BE SURE YOU INITDB, since the old catalog format will not work with this patch applied. 2. The second one (unlogged-tables-v1) adds support for unlogged tables by adding a new supported value for relpersistence. I made this work by having backend that creates an unlogged relation write out an "init" fork for that relation. The main fork is nuked and replaced by the contents of the init fork during startup. But I haven't made this code work yet for index types other than btree, so attempting to define a non-btree index on an unlogged relation will currently result in an error. I don't think that's probably too hard to fix, but I haven't done it yet. 3. The third patch (relax-sync-commit-v1) allows asynchronous commit even when synchronous_commit=on if the transaction has not written WAL. Of course, a read-only transaction won't even have an XID and therefore won't need a commit record, so what this is really doing is allowing transactions that have written only to temp - or unlogged - tables to commit asynchronously. This should be OK, because if the system crashes before the commit record hits the disk, we haven't really lost anything we wouldn't lose anyway: the temp tables will disappear on restart, and the unlogged ones will be truncated. This path actually could be applied independently of the first two, if I adjusted the comments a bit. Review and testing would be appreciated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > 2. The second one (unlogged-tables-v1) adds support for unlogged > tables by adding a new supported value for relpersistence. I made this > work by having backend that creates an unlogged relation write out an > "init" fork for that relation. The main fork is nuked and replaced by > the contents of the init fork during startup. But I haven't made this > code work yet for index types other than btree, so attempting to > define a non-btree index on an unlogged relation will currently result > in an error. I don't think that's probably too hard to fix, but I > haven't done it yet. That seems pretty gross. If you're going to have to take a special action at startup anyway, why wouldn't it take the form of "truncate, then if it's an index, call the appropriate ambuild function"? Maybe that's a bit ugly, but at least the ugliness is localized rather than scribbled all over the filesystem. I'm also concerned about possible failure modes having to do with the "init fork" being missing or corrupted. regards, tom lane
On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> 2. The second one (unlogged-tables-v1) adds support for unlogged >> tables by adding a new supported value for relpersistence. I made this >> work by having backend that creates an unlogged relation write out an >> "init" fork for that relation. The main fork is nuked and replaced by >> the contents of the init fork during startup. But I haven't made this >> code work yet for index types other than btree, so attempting to >> define a non-btree index on an unlogged relation will currently result >> in an error. I don't think that's probably too hard to fix, but I >> haven't done it yet. > > That seems pretty gross. If you're going to have to take a special > action at startup anyway, why wouldn't it take the form of "truncate, > then if it's an index, call the appropriate ambuild function"? We've been over this ground before. You can't read from non-shared catalogs without binding to a database, and you must reinitialize all unlogged relations before opening the database for a connection. So what you're proposing would involving launching a worker process for each database in the cluster, having it do its thing and then exit, and only after all that's done opening the database for connections. That seems vastly more complex and less performant than what I've done here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Here is a series of three patches related to unlogged tables. > 1. The first one (relpersistence-v1) is a mostly mechanical patch that > replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence > (a character), so that we can support more than two values. BE SURE > YOU INITDB, since the old catalog format will not work with this patch > applied. While I'm griping ... is there a really good reason to do it that way, rather than adding a new column? This will break clients that are looking at relistemp. Maybe there aren't any, but I wouldn't bet on that, and it doesn't seem like you're buying a lot by creating this incompatibility. I would also argue that temp-ness is a distinct concept from logged-ness. regards, tom lane
On 11/13/2010 07:59 PM, Tom Lane wrote: > I would also argue that temp-ness is a distinct > concept from logged-ness. I agree. cheers andrew
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That seems pretty gross. �If you're going to have to take a special >> action at startup anyway, why wouldn't it take the form of "truncate, >> then if it's an index, call the appropriate ambuild function"? > We've been over this ground before. You can't read from non-shared > catalogs without binding to a database, and you must reinitialize all > unlogged relations before opening the database for a connection. So > what you're proposing would involving launching a worker process for > each database in the cluster, having it do its thing and then exit, > and only after all that's done opening the database for connections. > That seems vastly more complex and less performant than what I've done > here. The fact that it's easy doesn't make it workable. I would point out for starters that AMs might (do) put WAL locations and/or XIDs into indexes. Occasionally copying very old LSNs or XIDs back into active files seems pretty dangerous. Cleanup at first connection is something we've been avoiding for years, but maybe it's time to bite the bullet and do that? BTW, how will all of this activity look to a hot-standby slave? regards, tom lane
On Sat, Nov 13, 2010 at 7:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Here is a series of three patches related to unlogged tables. >> 1. The first one (relpersistence-v1) is a mostly mechanical patch that >> replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence >> (a character), so that we can support more than two values. BE SURE >> YOU INITDB, since the old catalog format will not work with this patch >> applied. > > While I'm griping ... is there a really good reason to do it that way, > rather than adding a new column? This will break clients that are > looking at relistemp. Maybe there aren't any, but I wouldn't bet on > that, and it doesn't seem like you're buying a lot by creating this > incompatibility. I would also argue that temp-ness is a distinct > concept from logged-ness. I think that would be a recipe for bugs. Look at the three new macros I introduced. If you keep relistemp around, then any code which relies on it is likely testing for one of those three things, or maybe even something subtly different from any of them, as in the cases where I needed to add a switch statement. The way I see it, this is ultimately a four-level hierarchy: permanent tables (write WAL, shared buffers, ordinary namespace), unlogged tables (don't write WAL, shared buffers, ordinary namespace), global temporary tables (don't write WAL, local buffers, ordinary namespace), local temporary tables (don't write WAL, local buffers, private namespace). Even if we don't end up implementing global temporary tables in the way I'm envisioning (I know you have an alternate proposal), it seem inevitable that a boolean for relistemp isn't going to be sufficient. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Nov 13, 2010 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That seems pretty gross. If you're going to have to take a special >>> action at startup anyway, why wouldn't it take the form of "truncate, >>> then if it's an index, call the appropriate ambuild function"? > >> We've been over this ground before. You can't read from non-shared >> catalogs without binding to a database, and you must reinitialize all >> unlogged relations before opening the database for a connection. So >> what you're proposing would involving launching a worker process for >> each database in the cluster, having it do its thing and then exit, >> and only after all that's done opening the database for connections. >> That seems vastly more complex and less performant than what I've done >> here. > > The fact that it's easy doesn't make it workable. I would point out for > starters that AMs might (do) put WAL locations and/or XIDs into indexes. > Occasionally copying very old LSNs or XIDs back into active files seems > pretty dangerous. I haven't examined the GIST, GIN, or hash index code in detail so I am not sure whether there are any hazards there; the btree case does not seem to have any issues of this type. Certainly, if an index AM puts an XID into an empty index, that's gonna break. I would consider that a pretty odd thing to do, though. An LSN seems less problematic since the LSN space does not wrap; it should just look like an index that was created a long time ago and never updated (which, in effect, it is). > Cleanup at first connection is something we've been avoiding for years, > but maybe it's time to bite the bullet and do that? There would certainly be some advantage to doing cleanup at first connection even if we stick with the overall approach I've adopted here, because you could avoid the overhead of cleaning up databases that are never actually accessed. There are a few downsides, though. If you happened to leave a large amount of unlogged data on disk after a crash, and then for some reason never connected to that database again, you'd never reclaim that disk space; although perhaps you could somehow arrange for autovacuum to clean up in that case. Also, the first connection to the offending database would need to lock out all other connections until cleanup was completed, although I suppose that's still better than doing the cleanup in the startup process as is presently the case. I guess the main problem is you'd need a reliable and *inexpensive* way of identifying the first connection to each database. Paying something extra at startup time is better than paying even a small penalty on each individual connection; goodness knows our connections are expensive enough already. > BTW, how will all of this activity look to a hot-standby slave? The table will appear to exist but you'll get an error if you try to access it. I think at present it'll complain about the underying files being missing; that could probably be fine-tuned if we're so inclined. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Cleanup at first connection is something we've been avoiding for years, > but maybe it's time to bite the bullet and do that? > Another alternative is to initialize the unlogged tables when you first access them. If you try to open a table and there are no files attached them go ahead and initialize it by creating an empty table and building any indexes. Hm, I had been assuming recovery would be responsible for cleaning up the tables even if the first access is responsible for rebuilding them. But there's a chance there have been no modifications to them since the last checkpoint. But in that case the data in them is fine. It would be a weird interface if it only cleared them out sometimes based on unpredictable timing though. Avoiding that does require some kind of alternate storage scheme other than the WAL to indicate what needs to be cleared out. .init files are as good a mechanism even if they just mean "unlink this file on startup". -- greg
On Sat, Nov 13, 2010 at 9:17 PM, Greg Stark <gsstark@mit.edu> wrote: > On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Cleanup at first connection is something we've been avoiding for years, >> but maybe it's time to bite the bullet and do that? > > Another alternative is to initialize the unlogged tables when you > first access them. If you try to open a table and there are no files > attached them go ahead and initialize it by creating an empty table > and building any indexes. I thought about that (I've thought about a lot of things in regards to this feature...). One problem is that you presumably will need to open the relation before you can decide whether this is the first access since restart. But by the time you've opened them, you've already taken an AccessShareLock, and you'll presumably need something a whole lot stronger than that to do the rebuild. Lock upgrades are usually a good thing to avoid when possible, although maybe it would be OK in this case, not sure. Another problem is that it's not too clear to me where you'd hook in the logic to do the cleanup. The relcache code seems like an awfully low-level place to be trying to perpetrate this sort of monkey business. > Hm, I had been assuming recovery would be responsible for cleaning up > the tables even if the first access is responsible for rebuilding > them. But there's a chance there have been no modifications to them > since the last checkpoint. But in that case the data in them is fine. > It would be a weird interface if it only cleared them out sometimes > based on unpredictable timing though. Avoiding that does require some > kind of alternate storage scheme other than the WAL to indicate what > needs to be cleared out. .init files are as good a mechanism even if > they just mean "unlink this file on startup". One idea I had was to trigger the rebuild when we notice that the main relation fork is missing. Then the startup code can just notice the init fork, annihilate everything else, and call it good. However, this appears to require modifying some fairly fundamental assumptions of the current system. smgr.c/md.c believe that nobody should ever try to read a nonexistent block, and unconditionally throw an error if the caller tries to do so. You could provide a mode where they don't do that, and instead return an error indication to the caller. Then you could add an additional ReadBuffer mode, say RBM_FAIL, to let the error percolate back up through that layer to the index AM or heap code, which could then try to upgrade its lock and recreate the main fork. However, I really couldn't work up much enthusiasm for implementing this feature in a way that requires drilling a hole in the abstraction stack from top to bottom. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 14, 2010 at 02:16, Robert Haas <robertmhaas@gmail.com> wrote: > Here is a series of three patches related to unlogged tables. Just wondering, have you thought of any mechanisms how application code might detect that an unlogged table was truncated due to restart? While polling with something like "SELECT 1 FROM table LIMIT 1" might work, it's an awful hack. One obvious use case for these unlogged tables would be materalized views. I think it would be useful to execute e.g. a TRUNCATE trigger so that an the view could be initialized. If an exclusive lock were passed on to the trigger procedure, this could even be done in a race-condition-free manner as far as I can tell. Would there be a problem with invoking this trigger from the session that first touches the table? Regards, Marti
On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti@juffo.org> wrote: > On Sun, Nov 14, 2010 at 02:16, Robert Haas <robertmhaas@gmail.com> wrote: >> Here is a series of three patches related to unlogged tables. > > Just wondering, have you thought of any mechanisms how application > code might detect that an unlogged table was truncated due to restart? > While polling with something like "SELECT 1 FROM table LIMIT 1" might > work, it's an awful hack. > > One obvious use case for these unlogged tables would be materalized > views. I think it would be useful to execute e.g. a TRUNCATE trigger > so that an the view could be initialized. If an exclusive lock were > passed on to the trigger procedure, this could even be done in a > race-condition-free manner as far as I can tell. > > Would there be a problem with invoking this trigger from the session > that first touches the table? Yeah, this infrastructure doesn't really allow that. The truncate happens way too early on in startup to execute any user-provided code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Marti Raudsepp <marti@juffo.org> writes: > Would there be a problem with invoking this trigger from the session > that first touches the table? Other than security? regards, tom lane
On Mon, Nov 15, 2010 at 18:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marti Raudsepp <marti@juffo.org> writes: >> Would there be a problem with invoking this trigger from the session >> that first touches the table? > > Other than security? Right, I guess that would only make sense with SECURITY DEFINER. On Mon, Nov 15, 2010 at 18:22, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti@juffo.org> wrote: >> Just wondering, have you thought of any mechanisms how application >> code might detect that an unlogged table was truncated due to restart? > Yeah, this infrastructure doesn't really allow that. The truncate > happens way too early on in startup to execute any user-provided code. The truncate itself can be performed early and set a flag somewhere that would invoke a trigger on the first access. I suppose it cannot be called a "truncate trigger" then. Or maybe provide hooks for pgAgent instead? Do you see any alternatives to be notified of unlogged table truncates? Without notification, this feature would seem to have limited usefulness. Regards, Marti
On Mon, Nov 15, 2010 at 12:02 PM, Marti Raudsepp <marti@juffo.org> wrote: > On Mon, Nov 15, 2010 at 18:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Marti Raudsepp <marti@juffo.org> writes: >>> Would there be a problem with invoking this trigger from the session >>> that first touches the table? >> >> Other than security? > > Right, I guess that would only make sense with SECURITY DEFINER. > > On Mon, Nov 15, 2010 at 18:22, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti@juffo.org> wrote: >>> Just wondering, have you thought of any mechanisms how application >>> code might detect that an unlogged table was truncated due to restart? > >> Yeah, this infrastructure doesn't really allow that. The truncate >> happens way too early on in startup to execute any user-provided code. > > The truncate itself can be performed early and set a flag somewhere > that would invoke a trigger on the first access. I suppose it cannot > be called a "truncate trigger" then. > > Or maybe provide hooks for pgAgent instead? > > Do you see any alternatives to be notified of unlogged table > truncates? Without notification, this feature would seem to have > limited usefulness. Well, you're only monitoring for a server restart. That's probably something you need a way to monitor for anyway. I don't think we have a function that exposes the time of the last server restart at the SQL level, but maybe we should. You can monitor for it by watching the logs, of course. This is really intended for things like caches of session information where loss is annoying (because users have to log back into the webapp, or whatever) but not so critical that we want to take a performance penalty to prevent it. It will also be helpful to people who want to make PG run very very quickly even at the risk of data loss, as in the recent discussion on -performance and some conversations I had at PG West; it provides a more structured, and hopefully also more performant, alternative to turning off fsync, full_page_writes, and synchronous commit. For some such apps, it may be sufficient to check for truncating at each reconnect, which will be a whole lot easier than what they have to do now (which is rebuild the entire cluster every time PG restarts). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 15, 2010 at 11:22 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Yeah, this infrastructure doesn't really allow that. The truncate > happens way too early on in startup to execute any user-provided code. But you could use the very feature of unlogged tables to know if you've "initialized" some unlogged table by using an unlogged table to note the initilization. If the value you expect isn't in your "noted" table, you know that it's been truncated... Sure, it's "app side", but the hole point of unlogged tables it to allow optimzations when the "appside" knows the data's dispensable and rebuild-able. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On lör, 2010-11-13 at 19:16 -0500, Robert Haas wrote: > 1. The first one (relpersistence-v1) is a mostly mechanical patch that > replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence > (a character), so that we can support more than two values. BE SURE > YOU INITDB, since the old catalog format will not work with this patch > applied. Btw., I would recommend that even in-progress or proposed patches include catversion updates, which helps communicate the message such as yours in a more robust manner and also reduces the chance of forgetting the catversion change in the final commit.
On Tue, Nov 16, 2010 at 2:49 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On lör, 2010-11-13 at 19:16 -0500, Robert Haas wrote: >> 1. The first one (relpersistence-v1) is a mostly mechanical patch that >> replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence >> (a character), so that we can support more than two values. BE SURE >> YOU INITDB, since the old catalog format will not work with this patch >> applied. > > Btw., I would recommend that even in-progress or proposed patches > include catversion updates, which helps communicate the message such as > yours in a more robust manner and also reduces the chance of forgetting > the catversion change in the final commit. I thought we had a policy of NOT doing that, because of the merge conflicts thereby created. It's also hard to know what value to set it to; whatever you pick will certainly be obsolete by commit time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On tis, 2010-11-16 at 15:08 -0500, Robert Haas wrote: > > Btw., I would recommend that even in-progress or proposed patches > > include catversion updates, which helps communicate the message such > as > > yours in a more robust manner and also reduces the chance of > forgetting > > the catversion change in the final commit. > > I thought we had a policy of NOT doing that, because of the merge > conflicts thereby created. I don't know, but I for one *want* the merge conflict, because if I'm actually merging two diverging lines of system catalog changes, then I had better stop and think about it. > It's also hard to know what value to set > it to; whatever you pick will certainly be obsolete by commit time. Well, the most important thing is that it's different from the last value, but I have occasionally wondered about a way to support tagging branches separately.
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Nov 16, 2010 at 2:49 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> Btw., I would recommend that even in-progress or proposed patches >> include catversion updates, > I thought we had a policy of NOT doing that, because of the merge > conflicts thereby created. It's also hard to know what value to set > it to; whatever you pick will certainly be obsolete by commit time. Well, my expectation would be that the committer would reset the catversion to current date when it goes into master. The question is whether such a practice would be (a) helpful to testers and/or (b) useful to the committer. As for (a), it likely would be, except that a patch that's not very recent is almost certainly going to get a merge failure here when the tester tries to apply it locally. That doesn't really seem like a gain. Still, I can see the point of forcing an initdb when needed. As for (b), I'm not sure I buy Peter's argument about a merge conflict on that being a helpful flag. I don't see any reason to think that system catalog changes are much more (or less) likely to result in hidden merge conflicts than any other type of change. I'm not personally going to rely on a submitter's determination of whether a catversion bump is needed anyhow. So I lean towards being happy with the current approach, though I could be talked into the other given a better argument for it. regards, tom lane
On tis, 2010-11-16 at 16:04 -0500, Tom Lane wrote: > Well, my expectation would be that the committer would reset the > catversion to current date when it goes into master. The question is > whether such a practice would be (a) helpful to testers and/or (b) > useful to the committer. As with most such things, it's a matter of personal preference. I started doing this out of necessity a while ago, and it has turned out to be very helpful. > As for (a), it likely would be, except that a patch that's not very > recent is almost certainly going to get a merge failure here when the > tester tries to apply it locally. That doesn't really seem like a gain. Arguably, it means that the patch should be updated. At least, it's a warning sign to the reviewer. > Still, I can see the point of forcing an initdb when needed. Especially because it prevents novice patch reviewers from mixing mismatching source and data directories and wasting time on obscure "bugs". > As for (b), I'm not sure I buy Peter's argument about a merge conflict > on that being a helpful flag. I don't see any reason to think that > system catalog changes are much more (or less) likely to result in > hidden merge conflicts than any other type of change. Actually, in a recent sample, the likelihood for a hidden merge conflict in near 100% because different patches keep reassigning the same OID for new database objects. In addition, there is the Git philosophy argument that every branch should stand on its own. If more than one person collaborates on a branch for more than one week, all the original reasons for having the catversion in the first place come back into play. And so while I do not wish to be radical about requiring catversion updates in random patches, we should recognize the possibility that catversion updates outside of the mainline are reasonable.
On 14.11.2010 02:16, Robert Haas wrote: > 3. The third patch (relax-sync-commit-v1) allows asynchronous commit > even when synchronous_commit=on if the transaction has not written > WAL. Of course, a read-only transaction won't even have an XID and > therefore won't need a commit record, so what this is really doing is > allowing transactions that have written only to temp - or unlogged - > tables to commit asynchronously. This should be OK, because if the > system crashes before the commit record hits the disk, we haven't > really lost anything we wouldn't lose anyway: the temp tables will > disappear on restart, and the unlogged ones will be truncated. This > path actually could be applied independently of the first two, if I > adjusted the comments a bit. Looks ok. I'd suggest rewording this comment though: /* * Check if we want to commit asynchronously. If we're doing cleanup of * any non-temp rels or committing any commandthat wanted to force sync * commit, then we must flush XLOG immediately. (We must not allow * asynchronous commitif there are any non-temp tables to be deleted, * because we might delete the files before the COMMIT record is flushedto * disk. We do allow asynchronous commit if all to-be-deleted tables are * temporary though, since they are lostanyway if we crash.) Otherwise, * we can defer the flush if either (1) the user has set synchronous_commit * = off, or(2) the current transaction has not performed any WAL-logged * operation. This latter case can arise if the only writesperformed by * the current transaction target temporary or unlogged relations. Loss * of such a transaction won'tmatter anyway, because temp tables will be * lost after a crash anyway, and unlogged ones will be truncated. */ It's a bit hard to follow, as it first lists exceptions on when we must flush XLOG immediately, and then lists conditions on when we can skip it. How about: /* * Check if we can commit asynchronously. We can skip flushing the XLOG * if synchronous_commit=off, or if the currenttransaction has not * performed any WAL-logged operation. The latter case can arise if the * only writes performedby the current transaction target temporary or * unlogged relations. Loss of such a transaction won't matter anyway,* because temp tables will be lost after a crash anyway, and unlogged * ones will be truncated. * * However, if we'redoing cleanup of any non-temp rels or committing * any command that wanted to force sync commit, then we must flush* XLOG immediately anyway. (We must not allow asynchronous commit if * there are any non-temp tables to be deleted,because we might delete * the files before the COMMIT record is flushed to disk. We do allow * asynchronous commitif all to-be-deleted tables are temporary * though, since they are lost anyway if we crash.) */ -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, 2010-11-13 at 20:55 -0500, Robert Haas wrote: > I think that would be a recipe for bugs. Look at the three new macros > I introduced. If you keep relistemp around, then any code which > relies on it is likely testing for one of those three things, or maybe > even something subtly different from any of them, as in the cases > where I needed to add a switch statement. The way I see it, this is > ultimately a four-level hierarchy That argument isn't clear enough to avoid me agreeing so far with Tom and Andrew that logged-ness is separate from temp-ness. As you say though, it might be a recipe for bugs, so please explain a little more. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote: > 3. The third patch (relax-sync-commit-v1) allows asynchronous commit > even when synchronous_commit=on if the transaction has not written > WAL. Of course, a read-only transaction won't even have an XID and > therefore won't need a commit record, so what this is really doing is > allowing transactions that have written only to temp - or unlogged - > tables to commit asynchronously. I like this, great idea. Avoiding the commit record entirely will break Hot Standby though, since we rely on the assumption that all xids that are assigned are also logged. The xids would be "known assigned", yet since they never actually appear they will clog up the machinery (pun unintended). -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Dec 15, 2010 at 4:20 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote: > >> 3. The third patch (relax-sync-commit-v1) allows asynchronous commit >> even when synchronous_commit=on if the transaction has not written >> WAL. Of course, a read-only transaction won't even have an XID and >> therefore won't need a commit record, so what this is really doing is >> allowing transactions that have written only to temp - or unlogged - >> tables to commit asynchronously. > > I like this, great idea. > > Avoiding the commit record entirely will break Hot Standby though, since > we rely on the assumption that all xids that are assigned are also > logged. The xids would be "known assigned", yet since they never > actually appear they will clog up the machinery (pun unintended). Uggh, that's a really, really bad pun. I made the same observation to Tom somewhere-or-other (must have been a different thread because I don't see it on this one), along with the further observation that we actually could suppress the commit record entirely if wal_level < hot_standby, but I'm not sure there's enough benefit to doing that to worry about the additional complexity. Changing it from a foreground flush to a background flush already wins so much that I don't really see the point of doing anything further. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Dec 15, 2010 at 4:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2010-11-13 at 20:55 -0500, Robert Haas wrote: >> I think that would be a recipe for bugs. Look at the three new macros >> I introduced. If you keep relistemp around, then any code which >> relies on it is likely testing for one of those three things, or maybe >> even something subtly different from any of them, as in the cases >> where I needed to add a switch statement. The way I see it, this is >> ultimately a four-level hierarchy > > That argument isn't clear enough to avoid me agreeing so far with Tom > and Andrew that logged-ness is separate from temp-ness. As you say > though, it might be a recipe for bugs, so please explain a little more. Sure. Most of the existing checks for rd_istemp were actually checking whether the relation required WAL-logging. If there's any third-party code out there that is checking rd_istemp, it likely also needs to be revised to check whether WAL-logging is needed, not whether the relation is temp. The way I've coded it, such code will fail to compile, and can be very easily fixed by substituting a call to RelationNeedsWAL() or RelationUsesLocalBuffers() or RelationUsesTempNamespace(), depending on which property the caller actually cares about. That's better than having the code compile, but then not work as expected. As of today, RelationNeedsWAL() always gives an answer which is directly opposite to the answer given by RelationUsesLocalBuffers() and RelationUsesTempNamespace(). But the main unlogged tables patch changes that. RelationNeedsWAL() will return true for permanent tables and false for unlogged and temp tables, while RelationUsesLocalBuffers() and RelationUsesTempNamespace() will return false for permanent and unlogged tables and true for temp tables. When and if we get global temporary tables, there will be a further split between RelationUsesLocalBuffers() and RelationUsesTempNamespace(). The former will return true for both global and local temporary tables, and the latter only for local temporary tables. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company