Thread: (WIP) VACUUM REWRITE - CLUSTER by ctid
I'm working on alternative version of VACUUM FULL, which is like CLUSTER but sort tuples in ctid order without index. The original discussion is here: [HACKERS] Feedback on getting rid of VACUUM FULL http://archives.postgresql.org/pgsql-hackers/2009-09/msg01047.php WIP patch attached. I have some questions over the development: 1. Syntax: I choose "CLUSTER tbl WITHOUT INDEX" for the syntax, but it is debatable. What syntax is the best? VACUUM REWRITE? CLUSTER ORDER BY ctid? or replace VACUUM FULL? 2. Superclass of HeapScanDesc and IndexScanDesc: We don't have an abstraction layer of HeapScanDesc and IndexScanDesc, but the layer is useful for this purpose. Is it reasonable? It is partially implemented as genam_beginscan() family in the patch. 3. Should we allow "ctid" as a default clustered index? We could assume "ctid" as a virtual index. The syntax for it might be "ALTER TABLE tbl CLUSTER ON COLUMN ctid" or so. Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Itagaki Takahiro wrote: > I'm working on alternative version of VACUUM FULL, which is > like CLUSTER but sort tuples in ctid order without index. > The original discussion is here: > [HACKERS] Feedback on getting rid of VACUUM FULL > http://archives.postgresql.org/pgsql-hackers/2009-09/msg01047.php > > WIP patch attached. I have some questions over the development: > > 1. Syntax: I choose "CLUSTER tbl WITHOUT INDEX" for the syntax, > but it is debatable. What syntax is the best? > VACUUM REWRITE? CLUSTER ORDER BY ctid? or replace VACUUM FULL? I got the impression that replacing VACUUM FULL is the most popular opinion. I like VACUUM REWRITE myself, except that it would require making REWRITE a reserved keyword. I don't like tacking this onto CLUSTER, particularly not with "ORDER BY ctid". ctids are an implementation detail most users don't care about, and ORDER BY sounds like it's sorting something, but it's not. > 2. Superclass of HeapScanDesc and IndexScanDesc: > We don't have an abstraction layer of HeapScanDesc and IndexScanDesc, > but the layer is useful for this purpose. Is it reasonable? > It is partially implemented as genam_beginscan() family in the patch. I don't think it's really necessary. You might as well put a few "if (indexOid)" clauses directly into copy_heap_data. > 3. Should we allow "ctid" as a default clustered index? > We could assume "ctid" as a virtual index. The syntax for it > might be "ALTER TABLE tbl CLUSTER ON COLUMN ctid" or so. Isn't that the same as having no clustering index? We already have "ALTER TABLE tbl SET WITHOUT CLUSTER". -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I got the impression that replacing VACUUM FULL is the most popular > opinion. I like VACUUM REWRITE myself, except that it would require > making REWRITE a reserved keyword. My next proposal for the syntex is "VACUUM (options) table_name". Since "options" are quoted by '(' and ')', we can add new options without adding them into reserved keywords. The traditional vacuum syntax: VACUUM FULL FREEZE VERBOSE ANALYZE table_name (columns); will be: VACUUM (FULL, FREEZE, VERBOSE, ANALYZE) table_name (columns); I think the syntax is consistent with existing syntex of "EXPLAIN (...)". We can choose any keyword for the new "rewrite" version. For example: * VACUUM ( REWRITE ) * VACUUM ( FULL [ INPLACE | REPLACE ] ) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Here is a patch to support "rewrite" version of VACUUM FULL. It was called "VACUUM REWRITE" in the past disucussin, but I choose the following syntax for now: VACUUM ( FULL [ INPLACE | REPLACE ] ) [ table_name ] The reason is to contrast the new REPLACE behavior with the old INPLACE behavior. I cannot find any good terms of opposite meaning of REWRITE. The new version works like as CLUSTER USING ctid or rewriting in ALTER TABLE. It must be faster than them if we have many dead tuples and are not interested in physical order of tuples. We still need traditional VACUUM FULL behavior for system catalog because we cannot change relfilenode for them. Also, VACUUM FULL REPLACE is not always better than traditional VACUUM FULL; the new version requires additional disk space and might be slower if we have a few dead tuples. "VACUUM FULL" and "VACUUM (FULL)" are synonyms for "VACUUM (FULL REPLACE)", but if the target table is s system catalog, instead "FULL INPLACE" is used automatically. If this approach is reasonable, I'll go for other supplemental parts. (documentations, vacuumdb, etc.) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Tue, 2009-10-27 at 13:55 +0900, Itagaki Takahiro wrote: > Here is a patch to support "rewrite" version of VACUUM FULL. > It was called "VACUUM REWRITE" in the past disucussin, > but I choose the following syntax for now: > > VACUUM ( FULL [ INPLACE | REPLACE ] ) [ table_name ] > > The reason is to contrast the new REPLACE behavior with the old INPLACE > behavior. I cannot find any good terms of opposite meaning of REWRITE. > > The new version works like as CLUSTER USING ctid or rewriting in > ALTER TABLE. It must be faster than them if we have many dead tuples > and are not interested in physical order of tuples. > > We still need traditional VACUUM FULL behavior for system catalog because > we cannot change relfilenode for them. Also, VACUUM FULL REPLACE is not > always better than traditional VACUUM FULL; the new version requires > additional disk space and might be slower if we have a few dead tuples. > > "VACUUM FULL" and "VACUUM (FULL)" are synonyms for "VACUUM (FULL REPLACE)", > but if the target table is s system catalog, instead "FULL INPLACE" is > used automatically. > If this approach is reasonable, I'll go for other supplemental parts. > (documentations, vacuumdb, etc.) Rough approach looks fine to me. The internal logic is fairly hard to read. I'd suggest you have option flags VACUUM_FULL and VACUUM_REPLACE, rather than VACUUM_INPLACE and VACUUM_REPLACE. So VACUUM_REPLACE can only be set iff VACUUM_FULL. That will reduce much of the logic. -- Simon Riggs www.2ndQuadrant.com
Itagaki Takahiro wrote: > Here is a patch to support "rewrite" version of VACUUM FULL. > It was called "VACUUM REWRITE" in the past disucussin, > but I choose the following syntax for now: > > VACUUM ( FULL [ INPLACE | REPLACE ] ) [ table_name ] > > The reason is to contrast the new REPLACE behavior with the old INPLACE > behavior. I cannot find any good terms of opposite meaning of REWRITE. I thought the idea is to rip out the current implementation altogether. If that's the case, then it doesn't make sense to use a different syntax. Just rip the innards of VACUUM FULL and replace them with your new implementation. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Itagaki Takahiro wrote: > We still need traditional VACUUM FULL behavior for system catalog because > we cannot change relfilenode for them. Also, VACUUM FULL REPLACE is not > always better than traditional VACUUM FULL; the new version requires > additional disk space and might be slower if we have a few dead tuples. Tom was saying that we could fix the problem that relfilenode could not be changed by having a flat file filenode map. It would only be needed for nailed system catalogs (the rest of the tables grab their relfilenode from pg_class as usual) so it wouldn't have the problems that the previous flatfiles had (which was that they could grow too much). I don't recall if this got implemented (I don't think it did). As for it being slower with few dead tuples, I don't think this is a problem -- just use lazy vacuum in that case. I also remember we agreed on something about the need for extra disk space, but I can't remember what it was. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
BTW I think the vacstmt.options change, which seems a reasonable idea to me, could be extracted from the patch and applied separately. That'd reduce the size of the patch somewhat. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote: > BTW I think the vacstmt.options change, which seems a reasonable idea to > me, could be extracted from the patch and applied separately. That'd > reduce the size of the patch somewhat. It's a good idea. I separated the part into the attached patch. It replaces VacuumStmt's "vacuum", "full", "analyze", and "verbose" fields into one "options" flag field. The only user-visible change is to support "VACUUM (options)" syntax: VACUUM ( FULL, FREEZE, VERBOSE, ANALYZE ) table (columns) We don't bother with the order of options in this form :) There is a debatable issue that we can use "VACUUM (VERBOSE) table (col)" in the abobe syntax. Columns are specified but no ANALYZE option there. An ANALYZE option is added automatically in the current implementation, but we should have thrown an syntax error in such case. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Fri, 2009-11-13 at 10:47 +0900, Itagaki Takahiro wrote: > Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > BTW I think the vacstmt.options change, which seems a reasonable idea to > > me, could be extracted from the patch and applied separately. That'd > > reduce the size of the patch somewhat. > > It's a good idea. I separated the part into the attached patch. > It replaces VacuumStmt's "vacuum", "full", "analyze", and "verbose" > fields into one "options" flag field. > > The only user-visible change is to support "VACUUM (options)" syntax: > VACUUM ( FULL, FREEZE, VERBOSE, ANALYZE ) table (columns) > We don't bother with the order of options in this form :) > > There is a debatable issue that we can use "VACUUM (VERBOSE) table (col)" > in the abobe syntax. Columns are specified but no ANALYZE option there. > An ANALYZE option is added automatically in the current implementation, > but we should have thrown an syntax error in such case. I have just begun review by reading some of the previous threads. I'd like to try to summarize the goals we have for VACUUM to put these patches in perspective: 1. Implement a rewrite version of VACUUM FULL, which is closer to CLUSTER. 2. Use the new options syntax, to make the order of vacuum options irrelevant. 3. Implement several flatfile maps to allow the rewrite version of VACUUM FULL to work on catalogs: http://archives.postgresql.org/message-id/19750.1252094460@sss.pgh.pa.us 4. Implement some kind of concurrent tuple mover that can slowly update tuples and lazily VACUUM in such a way that they migrate to lower heap pages (assuming no long-running transactions). This would be done with normal updates (new xmin) and would not remove the old tuple until it was really dead; otherwise there are concurrency problems. Such a utility would be useful in cases where a very small fraction of tuples need to be moved, or you don't have enough space for a rewrite. 5. Remove VACUUM FULL (in it's current form) completely. Some other ideas also came out of the thread, like: * Provide a way to truncate the dead space from the end of a relation in a blocking manner. Right now, lazy vacuum won't shrink the file unless it can acquire an exclusive lock without waiting, and there's no way to actually make it wait. This can be a separate command, not necessarily a part of VACUUM. * Josh Berkus suggested allowing the user to specify a different tablespace in which to rebuild the tablespace. I'll begin looking at the patches themselves now, which implement #1 and #2. If we can implement enough of these features (say, #3 also) to remove the current form of VACUUM FULL, then we can just call the new feature VACUUM FULL, and save ourselves from syntactical headaches. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > I'd like to try to summarize the goals we have for VACUUM to put these > patches in perspective: Good summary, but ... > Some other ideas also came out of the thread, like: > * Provide a way to truncate the dead space from the end of a relation in > a blocking manner. Right now, lazy vacuum won't shrink the file unless > it can acquire an exclusive lock without waiting, and there's no way to > actually make it wait. This can be a separate command, not necessarily a > part of VACUUM. What I remembered was actually closer to the opposite: people are concerned about lazy vac holding the exclusive lock too long once it does acquire it. In a manual vacuum this leads to unexpected blockage of concurrent queries, and in an autovac this leads to a forced cancel preventing autovac from completing the operation (which means no space removal at all, and no stats update either). The code is designed on the assumption that it won't spend very long holding the ex lock, and so I think a reasonable TODO item is to ensure that by having it limit how many blocks it will scan during the shrinkage attempt. FWIW, once we provide the extensible option syntax, it doesn't seem like it'd be hard to have an option to make it wait for the lock, so I'd recommend that approach over anything with a separate command. regards, tom lane
On Fri, 2009-11-13 at 10:47 +0900, Itagaki Takahiro wrote: > Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > BTW I think the vacstmt.options change, which seems a reasonable idea to > > me, could be extracted from the patch and applied separately. That'd > > reduce the size of the patch somewhat. > > It's a good idea. I separated the part into the attached patch. > It replaces VacuumStmt's "vacuum", "full", "analyze", and "verbose" > fields into one "options" flag field. I added a separate commitfest item for this patch to track it separately from the rewrite version of VACUUM: https://commitfest.postgresql.org/action/patch_view?id=222 > The only user-visible change is to support "VACUUM (options)" syntax: > VACUUM ( FULL, FREEZE, VERBOSE, ANALYZE ) table (columns) > We don't bother with the order of options in this form :) I looked at this patch. You left INPLACE in the patch, which looks like an oversight when you were separating the two. Please remove that from this part. > There is a debatable issue that we can use "VACUUM (VERBOSE) table (col)" > in the abobe syntax. Columns are specified but no ANALYZE option there. > An ANALYZE option is added automatically in the current implementation, > but we should have thrown an syntax error in such case. Sounds fine, but worth a mention in the documentation. Just add to the "columns" part of the VACUUM page something like: "If specified, implies ANALYZE". Other than these two minor issues, I don't see any problems with the patch. Please post an updated version to the new commitfest entry. Regards,Jeff Davis
On Tue, 2009-10-27 at 13:55 +0900, Itagaki Takahiro wrote: > Here is a patch to support "rewrite" version of VACUUM FULL. Can you please provide a patch that applies cleanly on top of the vacuum options patch and on top of CVS HEAD (there are a couple minor conflicts with this version). That would make review easier. Initial comments: 1. Do we want to introduce syntax for INPLACE at all, if we are eventually going to remove the current mechanism? If not, then we should probably use REWRITE, because that's a better word than "REPLACE", I think. My opinion is that if we really still need the current in-place mechanism, then VACUUM (FULL) should use the current in-place mechanism; and VACUUM (FULL REWRITE) should use your new rewrite mechanism. That gives us a nice migration path toward always using the rewrite mechanism. 2. Why do all of the following exist: VACOPT_FULL, VACOPT_REPLACE, and VACOPT_INPLACE? Shouldn't VACOPT_FULL be equivalent to one of the other two? This is essentially what Simon was getting at, I think. 3. Some options are being set in vacuum() itself. It looks like the options should already be set in gram.y, so should that be an Assert instead? I think it's cleaner to set all of the options properly early on, rather than waiting until vacuum() to interpret the combinations. I haven't looked at the implementation in detail yet, but at a glance, it seems to be a good approach. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > 3. Some options are being set in vacuum() itself. It looks like the > options should already be set in gram.y, so should that be an Assert > instead? I think it's cleaner to set all of the options properly early > on, rather than waiting until vacuum() to interpret the combinations. As a rule of thumb, I'd recommend keeping as much complexity as possible *out* of gram.y. It's best if that code just reports the facts, ie what options the user entered. Deriving conclusions from that (like what default behaviors should be used) is best done later. One example of why is that if you want the defaults to depend on GUC settings then that logic must *not* happen in gram.y, since the parse tree might outlive the current GUC settings. I haven't read the patch but it sounds like vacuum() is the right place for this type of activity. regards, tom lane
On Sat, 2009-11-14 at 19:25 -0500, Tom Lane wrote: > As a rule of thumb, I'd recommend keeping as much complexity as possible > *out* of gram.y. It's best if that code just reports the facts, ie what > options the user entered. Deriving conclusions from that (like what > default behaviors should be used) is best done later. One example of > why is that if you want the defaults to depend on GUC settings then > that logic must *not* happen in gram.y, since the parse tree might > outlive the current GUC settings. I was referring to (in vacuum()): + if (vacstmt->options & (VACOPT_INPLACE | VACOPT_REPLACE | VACOPT_FREEZE)) + vacstmt->options |= VACOPT_VACUUM; + if (vacstmt->va_cols) + vacstmt->options |= VACOPT_ANALYZE; I think what you say applies to VACOPT_ANALYZE, which is implied when columns are supplied by the user but ANALYZE is not specified explicitly. In that case it should be set in vacuum() but not in gram.y (unless specified by the user). However, for VACOPT_VACUUM, I think that's still in the territory of gram.y. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> wrote: > You left INPLACE in the patch Oops, removed. > Sounds fine, but worth a mention in the documentation. Just add to the > "columns" part of the VACUUM page something like: "If specified, implies > ANALYZE". Added. > Other than these two minor issues, I don't see any problems with the > patch. Please post an updated version to the new commitfest entry. All of the vacuum options are adjusted in gram.y in the current patch. We only check the options with assertions in vacuum(): /* sanity checks */ Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE)); Assert(!(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)) || (vacstmt->options & VACOPT_VACUUM)); Assert(vacstmt->va_cols == NIL || (vacstmt->options & VACOPT_ANALYZE)); Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Mon, 2009-11-16 at 13:37 +0900, Itagaki Takahiro wrote: > Jeff Davis <pgsql@j-davis.com> wrote: > > > You left INPLACE in the patch > > Oops, removed. > > > Sounds fine, but worth a mention in the documentation. Just add to the > > "columns" part of the VACUUM page something like: "If specified, implies > > ANALYZE". > > Added. > Great, I am marking this part "ready for committer". I may be slow to review the remainder of the VACUUM FULL rewrite patch due to the conference in Tokyo, but I will review it soon afterward. Regards,Jeff Davis
Here is an updated patch of rewriting vacuum based on vacuum options patch. Documentations and vacuumdb modification (-i, --inplace) are added. Jeff Davis <pgsql@j-davis.com> wrote: > 1. Do we want to introduce syntax for INPLACE at all, if we are > eventually going to remove the current mechanism? > My opinion is that if we really still need the current in-place > mechanism, then VACUUM (FULL) should use the current in-place mechanism; > and VACUUM (FULL REWRITE) should use your new rewrite mechanism. AFAIK, "VACUUM FULL" should behave as "REWRITE" in the past discussion. Since we don't want users to use in-place FULL vacuum, so we will change the default behavior of VACUUM FULL. There are some choices: <REWRITE version> <in-place version> 1. VACUUM (FULL REPLACE) vs. VACUUM (FULL INPLACE) 2. VACUUM (FULL) vs. VACUUM (FULL INPLACE) 3. VACUUM (REWRITE) vs. VACUUM (FULL) 4. VACUUM (FULL REWRITE) vs. VACUUM (FULL) 5. Don't use SQL and use a GUC instead. (bool inplace_vacuum_full ?) I choose a hybrid syntax of 1 + 2 in the patch, but I'm not particular about it. What is the best? > 2. Why do all of the following exist: VACOPT_FULL, VACOPT_REPLACE, and > VACOPT_INPLACE? Shouldn't VACOPT_FULL be equivalent to one of the other > two? This is essentially what Simon was getting at, I think. * FULL [REPLACE] := VACOPT_FULL * FULL INPLACE := VACOPT_FULL + VACOPT_INPLACE > 3. Some options are being set in vacuum() itself. It looks like the > options should already be set in gram.y, so should that be an Assert > instead? I think it's cleaner to set all of the options properly early > on, rather than waiting until vacuum() to interpret the combinations. I moved all of the logic into gram.y. vacuum() has only assert tests. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Jeff Davis <pgsql@j-davis.com> writes: > On Mon, 2009-11-16 at 13:37 +0900, Itagaki Takahiro wrote: >> [ new options syntax for VACUUM ] > Great, I am marking this part "ready for committer". Applied with very minor editorialization. regards, tom lane
Review comments: * I attached some documentation suggestions. * The patch should be merged with CVS HEAD. The changes required are minor; but notice that there is a minor style difference in the assert in vacuum(). * vacuumdb should reject -i without -f * The replace or inplace option is a "magical" default, because "VACUUM FULL" defaults to "replace" for user tables and "inplace" for system tables. I tried to make that more clear in my documentation suggestions. * "VACUUM (FULL REPLACE) pg_class" should be rejected, not silently turned into "VACUUM (FULL INPLACE) pg_class". * There are some windows line endings in the patch, which should be removed. * In vacuumdb, the code for handling INPLACE is a little ugly. Perhaps it should be changed to always use your new options syntax? That might be more code, but it would be a little more readable. Regards, Jeff Davis
Attachment
Thanks for review! Jeff Davis <pgsql@j-davis.com> wrote: > * "VACUUM (FULL REPLACE) pg_class" should be rejected, not silently > turned into "VACUUM (FULL INPLACE) pg_class". Hmmm, it requires to remember whether REPLACE is specified or not for the non-INPLACE vacuum, but I don't want to add VACOPT_REPLACE only for the purpose. I just removed "FULL REPLACE" syntax; Only "FULL" and "FULL INPLACE" are available. VACUUM FULL without INPLACE behaves as cluster-like rewrites for non-system tables. FULL INPLACE is a traditional vacuum full. System catalogs are always vacuumed with INPLACE version. - VACUUM FULL / VACUUM (FULL) => rewritten version - VACUUM (FULL INPLACE) => traditional version > * In vacuumdb, the code for handling INPLACE is a little ugly. Perhaps > it should be changed to always use your new options syntax? That might > be more code, but it would be a little more readable. It might make the code cleaner, but I want vacuumdb in 8.5 to work on older versions of servers unless we use the new feature. Older servers can only accept older syntax, so I avoided using the new syntax if not needed. (The new patch still uses two versions of syntax.) > * The patch should be merged with CVS HEAD. The changes required are > minor; but notice that there is a minor style difference in the assert > in vacuum(). > * vacuumdb should reject -i without -f > * The replace or inplace option is a "magical" default, because "VACUUM > FULL" defaults to "replace" for user tables and "inplace" for system > tables. I tried to make that more clear in my documentation suggestions. > * There are some windows line endings in the patch, which should be > removed. Done. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Itagaki Takahiro wrote: > Done. (vacuum-full_20091130.patch) > Is this ready for a committer now? Not sure whether Jeff intends to re-review here or not, given that the suggestions and their fixes were pretty straightforward. It looks pretty solid at this point to me. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
On Mon, 2009-11-30 at 15:10 -0500, Greg Smith wrote: > Itagaki Takahiro wrote: > > Done. (vacuum-full_20091130.patch) > > > Is this ready for a committer now? Not sure whether Jeff intends to > re-review here or not, given that the suggestions and their fixes were > pretty straightforward. It looks pretty solid at this point to me. The code is in good shape. I was going to take another stab at the documentation (which needs some rewording after the changes), and maybe look at vacuumdb again, as well. Nothing major, so expect it to be "ready for committer" tonight. Of course, a committer can take a look at it sooner, if they have time. Regards,Jeff Davis
On Mon, 2009-11-30 at 14:38 +0900, Itagaki Takahiro wrote: > > * "VACUUM (FULL REPLACE) pg_class" should be rejected, not silently > > turned into "VACUUM (FULL INPLACE) pg_class". > > Hmmm, it requires to remember whether REPLACE is specified or not > for the non-INPLACE vacuum, but I don't want to add VACOPT_REPLACE > only for the purpose. > > I just removed "FULL REPLACE" syntax; Only "FULL" and "FULL INPLACE" are > available. VACUUM FULL without INPLACE behaves as cluster-like rewrites > for non-system tables. FULL INPLACE is a traditional vacuum full. > System catalogs are always vacuumed with INPLACE version. > - VACUUM FULL / VACUUM (FULL) => rewritten version > - VACUUM (FULL INPLACE) => traditional version Ok, looks good. It's cleaner now, too. > It might make the code cleaner, but I want vacuumdb in 8.5 to work on older > versions of servers unless we use the new feature. Older servers can only > accept older syntax, so I avoided using the new syntax if not needed. > (The new patch still uses two versions of syntax.) Good point. I attached a suggestion of how it might look if you detected the server version explicitly. You don't have to use it, but it's what I had in mind. Also, I think the current version fails if -i is passed and it's connecting to an old server, so explicit server version detection may be required. > > * The patch should be merged with CVS HEAD. The changes required are > > minor; but notice that there is a minor style difference in the assert > > in vacuum(). Very minor style issue: it looks like Tom specifically changed the order of the expression in the Assert() from your first vacuum options patch. I attached a diff to show you what I mean -- the complex boolean expressions are easier to read if the styles match. > > * vacuumdb should reject -i without -f > > * The replace or inplace option is a "magical" default, because "VACUUM > > FULL" defaults to "replace" for user tables and "inplace" for system > > tables. I tried to make that more clear in my documentation suggestions. > > * There are some windows line endings in the patch, which should be > > removed. Great, thank you for the patch! Marking as ready. Regards, Jeff Davis
Attachment
On Tue, 2009-12-01 at 01:43 -0800, Jeff Davis wrote: > Marking as ready. You're saying its ready, yet there are 3 additional suggestions patches attached here. Please can you resolve these and re-submit a single final patch from author and reviewer? I will review and eventually commit this, if appropriate. It is either 1st or 2nd in my queue, unless someone else grabs it first. Review comments * What happens if you issue VACUUM FULL; which we would expect to use the new method of vacuum on all tables in the database. Won't that just fail with an error when it comes to catalog tables? Sounds to me like we should avoid the error and just silently do an INPLACE on catalog tables. * Such a pivotal change to Postgres needs more test coverage than a single line in regression tests. It might have been OK before, but I think we need a few more combinations here, at least in this release: with, without indexes, empty table, clustered, non-clustered etc and of course a full database VACUUM so that we have the catalog table case covered, plus an explicit catalog table vacuum. -- Simon Riggs www.2ndQuadrant.com
On Fri, 2009-12-04 at 09:20 +0000, Simon Riggs wrote: > On Tue, 2009-12-01 at 01:43 -0800, Jeff Davis wrote: > > Marking as ready. > > You're saying its ready, yet there are 3 additional suggestions patches > attached here. Please can you resolve these and re-submit a single final > patch from author and reviewer? My apologies. At the time, I thought a couple days might matter, and the changes are in areas that committers tend to editorialize anyway: docs and a style issue. The only substantial patch was to vacuumdb.c. Complete patch attached including my edits. > * What happens if you issue VACUUM FULL; which we would expect to use > the new method of vacuum on all tables in the database. Won't that just > fail with an error when it comes to catalog tables? Sounds to me like we > should avoid the error and just silently do an INPLACE on catalog > tables. That's how it works. > * Such a pivotal change to Postgres needs more test coverage than a > single line in regression tests. It might have been OK before, but I > think we need a few more combinations here, at least in this release: > with, without indexes, empty table, clustered, non-clustered etc and of > course a full database VACUUM so that we have the catalog table case > covered, plus an explicit catalog table vacuum. It was my impression that the regression tests aren't meant to be exhaustive, but merely exercise a good portion of the code to help detect simple breakage. Also, pg_regress isn't good for detecting a lot of the problems that vacuum might have (how do you even know whether the vacuum happened in-place or not?). We could put a VACUUM FULL; and a VACUUM (FULL INPLACE); somewhere, which will cover a lot of the cases you're talking about. However, that may be a performance consideration especially for people who develop on laptops. In general, though, I think the right place for this is a longer test suite that is meant to be run less frequently. Regards, Jeff Davis
Attachment
On Fri, 2009-12-04 at 09:56 -0800, Jeff Davis wrote: > We could put a VACUUM FULL; and a VACUUM (FULL INPLACE); somewhere, > which will cover a lot of the cases you're talking about. However, that > may be a performance consideration especially for people who develop on > laptops. Let's check it works before worrying about performance. We can take tests out as well as add them once it becomes obvious its working. -- Simon Riggs www.2ndQuadrant.com
On Fri, 2009-12-04 at 18:36 +0000, Simon Riggs wrote: > Let's check it works before worrying about performance. We can take > tests out as well as add them once it becomes obvious its working. Itagaki-san, perhaps you should add a variety of tests, and then Simon can remove extra tests after he's convinced that it works. I tested a variety of situations during my review, and everything worked as I expected. Regards,Jeff Davis
On Dec 4, 2009, at 18:07 , Jeff Davis wrote: > On Fri, 2009-12-04 at 18:36 +0000, Simon Riggs wrote: >> Let's check it works before worrying about performance. We can take >> tests out as well as add them once it becomes obvious its working. > > Itagaki-san, perhaps you should add a variety of tests, and then Simon > can remove extra tests after he's convinced that it works. > > I tested a variety of situations during my review, and everything > worked > as I expected. Would there be a way for you to package the scenarios you tested into a suite? Michael Glaesemann grzm seespotcode net
On Fri, 2009-12-04 at 19:49 -0500, Michael Glaesemann wrote: > > I tested a variety of situations during my review, and everything > > worked > > as I expected. > > Would there be a way for you to package the scenarios you tested into > a suite? Except for the simplest tests, they aren't easily moved to pg_regress. For instance, how do you tell if a user table's relfilenode actually changed? Easy to do manually, but tough to do with pg_regress. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, 2009-12-04 at 18:36 +0000, Simon Riggs wrote: > > Let's check it works before worrying about performance. We can take > > tests out as well as add them once it becomes obvious its working. > > Itagaki-san, perhaps you should add a variety of tests, and then Simon > can remove extra tests after he's convinced that it works. I added regression tests for database-wide vacuums and check changes of relfilenodes in those commands. Only sampled tables are checked in tests -- normal, fundamental and shared catalogs and clusterd, temp and normal tables. Since relfilenodes are unstable between tests, only changes of relfilenodes are compared. Do you think the added tests are enough? Of course we could have cases for serveral updated patterns, but it will be exhaustive. I think checks for relfilenodes are enough in this case. BTW, I needed to add ORDER BY cluase in select_views test. I didn't modify tests in select_views at all, but database-wide vacuum moves tuples in select_views test. I think the fix should be reasonable becausae unsorted result set is always unstable in regression test. ---- added tests ---- CREATE TEMP TABLE vacid ( relid regclass, filenode_0 oid, filenode_1 oid, filenode_2 oid, filenode_3 oid ); INSERT INTO vacid (relid, filenode_0) SELECT oid, relfilenode FROM pg_class WHERE oid::regclass IN ( 'pg_am', -- normal catalog 'pg_class', -- fundamental catalog 'pg_database', -- shared catalog 'vaccluster' , -- clustered table 'vacid', -- temp table 'vactst' -- normal table ); CLUSTER; -- only clusterd table should be changed UPDATE vacid SET filenode_1 = relfilenode FROM pg_class WHERE oid = relid; VACUUM (FULL INPLACE); -- all tables should not be changed UPDATE vacid SET filenode_2 = relfilenode FROM pg_class WHERE oid = relid; VACUUM FULL; -- only non-system tables should be changed UPDATE vacid SET filenode_3 = relfilenode FROM pg_class WHERE oid = relid; SELECT relid, filenode_0 <> filenode_1 AS cluster, filenode_1 <> filenode_2 AS full_inplace, filenode_2 <> filenode_3 AS full FROM vacid ORDER BY relid::text; relid | cluster | full_inplace | full -------------+---------+--------------+------ pg_am | f | f | f pg_class | f | f | f pg_database | f | f | f vaccluster | t | f | t vacid | f | f | t vactst | f | f | t (6 rows) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: > I added regression tests for database-wide vacuums and check changes > of relfilenodes in those commands. > ... > BTW, I needed to add ORDER BY cluase in select_views test. I didn't modify > tests in select_views at all, but database-wide vacuum moves tuples in > select_views test. I think the fix should be reasonable becausae unsorted > result set is always unstable in regression test. You should take those out again; if I am the committer I certainly will. Such a test will guarantee complete instability of every other regression test, and it's not worth it. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > You should take those out again; if I am the committer I certainly will. > Such a test will guarantee complete instability of every other > regression test, and it's not worth it. I read the original comment was saying to add regression tests for database-wide vacuums. But I'll reduce the range of vacuum if they are not acceptable. The new patch contains only table-based vacuum for local tables and some of system tables to test non-INPLACE vacuum are not used for system tables. VACUUM FULL pg_am; VACUUM FULL pg_class; VACUUM FULL pg_database; Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Here is an updated patch rebased to the latest CVS HEAD. One remaining concern is VERBOSE. Log messages by FULL (rewrite) are less verbose than FULL INPLACE. The same can be said for CLUSTER VERBOSE, though. I don't have any plans to make CLUSTER more verbose in the patch, but "more verbose CLUSTER" could be a TODO item. =# VACUUM (FULL, VERBOSE) pgbench_accounts; INFO: vacuuming "public.pgbench_accounts" VACUUM =# VACUUM (FULL INPLACE, VERBOSE) pgbench_accounts; INFO: vacuuming "public.pgbench_accounts" INFO: "pgbench_accounts": found 0 removable, 100000 nonremovable row versions in 1640 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 128 to 128 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 195520 bytes. 0 pages are or will become empty, including 0 at the end of the table. 1 pages containing 5396 free bytes are potential move destinations. CPU 0.00s/0.01u sec elapsed 0.01 sec. INFO: index "pgbench_accounts_pkey" now contains 100000 row versions in 276 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pgbench_accounts": moved 0 row versions, truncated 1640 to 1640 pages DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Regards, --- Takahiro Itagaki NTT Open Source Software Center
Attachment
Takahiro Itagaki wrote: > Here is an updated patch rebased to the latest CVS HEAD. > > One remaining concern is VERBOSE. Log messages by FULL (rewrite) are less > verbose than FULL INPLACE. The same can be said for CLUSTER VERBOSE, though. > I don't have any plans to make CLUSTER more verbose in the patch, but > "more verbose CLUSTER" could be a TODO item. Hmm. With this patch, if I do "vacuumdb -f" it will not vacuum the special system catalogs that can only be vacuumed with INPLACE, correct? If that's correct, I wonder whether it would be good to do a regular not-inplace VF for most relations, and do INPLACE for those catalogs. That way, if a sysadmin has a vacuumdb -f in crontab, it will continue to do what he expects. Maybe this is not important if Simon is able to get inplace working for all catalogs. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 12/15/09 1:05 AM, Takahiro Itagaki wrote: > Here is an updated patch rebased to the latest CVS HEAD. > > One remaining concern is VERBOSE. Log messages by FULL (rewrite) are less > verbose than FULL INPLACE. The same can be said for CLUSTER VERBOSE, though. > I don't have any plans to make CLUSTER more verbose in the patch, but > "more verbose CLUSTER" could be a TODO item. That's of necessity; the new CLUSTER isn't checking the contents of the table. However, it could report: Size of table before VF Size of table after VF Space reclaimed Index space reclaimed (per index) --Josh Berkus
Alvaro Herrera <alvherre@commandprompt.com> wrote: > Hmm. With this patch, if I do "vacuumdb -f" it will not vacuum the > special system catalogs that can only be vacuumed with INPLACE, correct? No. It will vacuum normal tables with FULL (rewrite), and system catalogs with FULL INPLACE. FULL vacuums for system catalogs always fall back to INPLACE vacuum silently. But certainly we cannot recommend to use daily database-wide VACUUM FULLs because they have higher costs than repeated FULL INPLACE vacuums. FULL (rewrite) will not be cheaper for tables that have little dead tuples. Just an idea, something like "vacuumdb -f --threshold=<some baseline>" might be useful for users running daily "vacuumdb -f". Regards, --- Takahiro Itagaki NTT Open Source Software Center
On Mon, 2009-12-07 at 16:55 +0900, Itagaki Takahiro wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > You should take those out again; if I am the committer I certainly will. > > Such a test will guarantee complete instability of every other > > regression test, and it's not worth it. > > I read the original comment was saying to add regression tests for > database-wide vacuums. But I'll reduce the range of vacuum if they > are not acceptable. > > The new patch contains only table-based vacuum for local tables and some of > system tables to test non-INPLACE vacuum are not used for system tables. > VACUUM FULL pg_am; > VACUUM FULL pg_class; > VACUUM FULL pg_database; Thanks for adding those additional tests. I notice that during copy_heap_data() we make no attempt to skip pages that are all visible according to the visibilitymap. It seems like it would be a substantial win to copy whole blocks if all the pre-conditions are met (I see what they are). I'm surprised to see that neither CLUSTER nor VACUUM FULL made use of this previously. I think we either need to implement that or document that vacuum will not skip all-visible pages when running VACUUM FULL. Also, I notice that if we perform new VACUUM FULL on a table it will fully re-write the table and rebuild indexes, even if it hasn't found a single row to remove. Old VACUUM FULL was substantially faster than this on tables that had nothing to remove. We aren't asking users to recode anything, so many people will be performing "VACUUM FULL;" as usual every night or weekend. If they do that it will result in substantially longer run times in many databases, all while holding AccessExclusiveLocks. Please can you arrange for the cluster operation to skip rebuilding indexes if the table is not reduced in size? Part of the reason why these happen is that the code hasn't been refactored much at all from its original use for cluster. There are almost zero comments to explain the additional use of this code for VACUUM, or at least to explain it still all works even when there is no index. e.g. check_index_is_clusterable() ought not to be an important routine when there is no index being clustered. I'm seeing that the code all works but that this patch isn't yet a sufficiently permanent change to the code for me to commit, though it could be soon. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > I notice that during copy_heap_data() we make no attempt to skip pages > that are all visible according to the visibilitymap. It seems like it > would be a substantial win to copy whole blocks if all the > pre-conditions are met (I see what they are). I'm surprised to see that > neither CLUSTER nor VACUUM FULL made use of this previously. I think we > either need to implement that or document that vacuum will not skip > all-visible pages when running VACUUM FULL. Unfortunately the visibility map isn't completely crash-safe at the moment (see comments in visibilitymap.c for details). So it's not safe to use it for such purposes. I was planning to address that in 8.5 but it seems I won't have the time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Dec 21, 2009 at 12:56 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Simon Riggs wrote: >> I notice that during copy_heap_data() we make no attempt to skip pages >> that are all visible according to the visibilitymap. It seems like it >> would be a substantial win to copy whole blocks if all the >> pre-conditions are met (I see what they are). I'm surprised to see that >> neither CLUSTER nor VACUUM FULL made use of this previously. I think we >> either need to implement that or document that vacuum will not skip >> all-visible pages when running VACUUM FULL. > > Unfortunately the visibility map isn't completely crash-safe at the > moment (see comments in visibilitymap.c for details). So it's not safe > to use it for such purposes. I was planning to address that in 8.5 but > it seems I won't have the time. Well since we're going to have to read in the page to do the copy we could just use the page header flag PD_ALL_VISIBLE instead. But sequential scans already use that bit and I'm assuming but haven't checked that these access paths do use the same underlying access path as sequential scans. In which case it won't really save much since the main advantage would be skipping the visibility checks. Saving the actual work to copy tuples retail instead of the whole block wholesale seems unlikely to buy much and would result in us not compacting space on the page and storing accurate free space map values which I think people would expect from both of these commands. If I'm wrong and these commands are not using a sequential scan under the hood or the fact that they're using SNAPSHOT_ANY defeats that optimization then perhaps there is something there. On the third hand presumably all the hint bits will be set if the page bit is set so perhaps there's nothing there even so. -- greg
Simon Riggs <simon@2ndQuadrant.com> wrote: > I think we > either need to implement that or document that vacuum will not skip > all-visible pages when running VACUUM FULL. All-visible does not always mean "filled enough", no? We will need to check both visibility and fillfactor when we optimize copying tuples. > Old VACUUM FULL was substantially faster than this on tables that had > nothing to remove. Yeah, that's why traditional VACUUM FULL is still kept as INPLACE vacuum. > Please can you arrange for the cluster operation to skip rebuilding > indexes if the table is not reduced in size? Do you think we should dispose the rewritten table when we find the VACUUM FULL (or CLUSTER) is useless? We could save the time to reindex, but we've already consumed time to rewrite tables. It will be still slower than VACUUM FULL INPLACE because it is read-only. Instead, I'd suggest to have conditional database-wide maintenance commands, something like: VACUUM FULL WHERE <the table is fragmented> We don't have to support the feature by SQL syntax; it could be done in client tools. How about pg_maintain or pg_ctl maintain that cleans up each relation with appropriate command? We could replace vacuumdb, clusterdb, and reindexdb with it then. > Part of the reason why these happen is that the code hasn't been > refactored much at all from its original use for cluster. There are > almost zero comments to explain the additional use of this code for > VACUUM, or at least to explain it still all works even when there is no > index. Ok, I'll check for additional comments. The flow of code might be still confusable because vacuum() calls cluster(), but we need major replacement of codes to refactor it. I'm not sure we need the code rewrites for it. Also, I think we need additional messages for VACUUM VERBOSE (and CLUSTER VERBOSE), though it might be adjusted in another patch. Regards, --- Takahiro Itagaki NTT Open Source Software Center
On Tue, 2009-12-22 at 18:11 +0900, Takahiro Itagaki wrote: > > > Old VACUUM FULL was substantially faster than this on tables that > had > > nothing to remove. > Yeah, that's why traditional VACUUM FULL is still kept as INPLACE > vacuum. > > > Please can you arrange for the cluster operation to skip rebuilding > > indexes if the table is not reduced in size? > > Do you think we should dispose the rewritten table when we find the > VACUUM FULL (or CLUSTER) is useless? We could save the time to > reindex, > but we've already consumed time to rewrite tables. The main purpose of doing a new VF is to compact space, so its role has slightly changed from earlier versions. We need much clearer docs about this. On a production system, it seems better to skip the re-indexing, which could take a long, long time. I don't see any way to skip completely re-writing the table though, other than scanning the table with a normal VACUUM first, as old VF does. I would be inclined towards the idea that if somebody does a VF of a whole database then we should look out for and optimise for tables with no changes, but when operating on a single table we just do as instructed and rebuild everything, including indexes. That seems like we should do it, but we're running out of time. For now, I think we can easily and should skip the index build, if appropriate. That just takes a little reworking of code, which appears necessary anyway. We should just document that this works a little differently now and a complete db VF is now not either necessary or a good thing. And running VACUUM; REINDEX DATABASE foo; will now be very pointless. > It will be still > slower than VACUUM FULL INPLACE because it is read-only. Understood, lets document that. -- Simon Riggs www.2ndQuadrant.com
On Tue, 2009-12-22 at 18:11 +0900, Takahiro Itagaki wrote: > Instead, I'd suggest to have conditional database-wide maintenance > commands, something like: > VACUUM FULL WHERE <the table is fragmented> > > We don't have to support the feature by SQL syntax; it could be done > in client tools. How about pg_maintain or pg_ctl maintain that cleans > up each relation with appropriate command? We could replace vacuumdb, > clusterdb, and reindexdb with it then. Some broad thoughts... Our perception of acceptable change is much higher than most users. If we tell people "use VACUUM FULL" or vacuumdb -f, then that command should, if possible, continue to work well across many releases. vacuumdb in most people's minds is the command you run to ease maintenance and make everything right, rather than a specific set of features. We have "It just works" as a principle. I think the corollary of that is that we should also have "It just continues to work the same way". -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> wrote: > Our perception of acceptable change is much higher than most users. If > we tell people "use VACUUM FULL" or vacuumdb -f, then that command > should, if possible, continue to work well across many releases. > vacuumdb in most people's minds is the command you run to ease > maintenance and make everything right, rather than a specific set of > features. > > We have "It just works" as a principle. I think the corollary of that is > that we should also have "It just continues to work the same way". I used "VACUUM FULL" because we were discussing to drop VFI completely, but I won't replace the behavior if hot-standby can support VFI. We can use any keywords without making it reserved in "VACUUM (...)" syntax. VACUUM (REWRITE), the first idea, can be used here. We can also take on entirely-different syntax for it -- ex, "ALTER TABLE REWRITE or SHRINK". I think the ALTER TABLE idea is not so bad because it does _not_ support database-wide maintenance. REWRITE is not the best maintenance in normal cases because a database should contain some rarely-updated tables. Regards, --- Takahiro Itagaki NTT Open Source Software Center
On Tue, 2009-12-22 at 19:45 +0900, Takahiro Itagaki wrote: > I used "VACUUM FULL" because we were discussing to drop VFI completely, > but I won't replace the behavior if hot-standby can support VFI. HS can't support VFI now, by definition. We agreed to spend the time getting rid of VFI, which working on this with you is part of. If we can just skip the index rebuild, I think that's all the additional code changes we need. I'll improve the docs as I review-to-commit. -- Simon Riggs www.2ndQuadrant.com
On Tue, Dec 22, 2009 at 7:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2009-12-22 at 19:45 +0900, Takahiro Itagaki wrote: > >> I used "VACUUM FULL" because we were discussing to drop VFI completely, >> but I won't replace the behavior if hot-standby can support VFI. > > HS can't support VFI now, by definition. We agreed to spend the time > getting rid of VFI, which working on this with you is part of. > > If we can just skip the index rebuild, I think that's all the additional > code changes we need. I'll improve the docs as I review-to-commit. So, what is the roadmap for getting this done? It seems like to get rid of VFI completely, we would need to implement something like what Tom described here: http://archives.postgresql.org/pgsql-hackers/2009-09/msg00249.php I'm not sure whether the current patch is a good intermediate step towards that ultimate goal, or whether events have overtaken it. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > So, what is the roadmap for getting this done? It seems like to get > rid of VFI completely, we would need to implement something like what > Tom described here: > > http://archives.postgresql.org/pgsql-hackers/2009-09/msg00249.php > > I'm not sure whether the current patch is a good intermediate step > towards that ultimate goal, or whether events have overtaken it. I think the most desirable roadmap is: 1. Enable CLUSTER to non-critical system catalogs. 2. Also enable CLUSTER andREINDEX to critical system catalogs. 3. Remove VFI and re-implement VACUUM FULL with CLUSTER-based approach. Itshould be also optimized as Simon's suggestion. My patch was intended to do 3, but we should not skip 1 and 2. In the roadmap, we don't have two versions of VACUUM FULL (INPLACE and REWRITE) at a time. I think we can do 1 immediately. The comment in cluster says "might work", and I also think so. CLUSTERable toast tables are obviously useful. /* * Disallow clustering system relations. Thiswill definitely NOT work * for shared relations (we have no way to update pg_class rows in other * databases),nor for nailed-in-cache relations (the relfilenode values * for those are hardwired, see relcache.c). It mightwork for other * system relations, but I ain't gonna risk it. */ For 2, we need some kinds of "relfilenode mapper" for shared relations and critical local tables (pg_class, pg_attribute, pg_proc, and pg_type). I'm thinking that we only store "virtual" relfilenodes for them in pg_class and remember the actual relfilenodes in shared memory. For example, smgropen(1248:pg_database) is redirected to smgropen(mapper[1248]). Since we cannot touch pg_class in non-login databases, we need to avoid updating pg_class when we assign new relfilenodes for shared relations. We also need to store the nodes in additional flat file. There might be another approach to store them in control file for shared relation (ControlFileData.shared_relfilenode_mapper as Oid[]), or pg_database for local tables (pg_database.datclsssnode, datprocnode etc.) What approach would be better? Regards, --- Takahiro Itagaki NTT Open Source Software Center
Happy New Year, On Mon, 2010-01-04 at 11:50 +0900, Takahiro Itagaki wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > > So, what is the roadmap for getting this done? It seems like to get > > rid of VFI completely, we would need to implement something like what > > Tom described here: > > > > http://archives.postgresql.org/pgsql-hackers/2009-09/msg00249.php > > > > I'm not sure whether the current patch is a good intermediate step > > towards that ultimate goal, or whether events have overtaken it. > > I think the most desirable roadmap is: > 1. Enable CLUSTER to non-critical system catalogs. > 2. Also enable CLUSTER and REINDEX to critical system catalogs. > 3. Remove VFI and re-implement VACUUM FULL with CLUSTER-based approach. > It should be also optimized as Simon's suggestion. > > My patch was intended to do 3, but we should not skip 1 and 2. In the roadmap, > we don't have two versions of VACUUM FULL (INPLACE and REWRITE) at a time. > > I think we can do 1 immediately. The comment in cluster says "might work", > and I also think so. CLUSTERable toast tables are obviously useful. You make some good points. I would prefer this slightly modified version 1. Commit your patch, as-is (you/me) 2. Work on infrastructure for VFC (VACUUM FULL using CLUSTER) for system relations (Simon) 3. Enable CLUSTER and REINDEX on critical system catalogs (Itagaki) 4. Optimise VFC, as discussed earlier (Itagaki) I have put names in brackets, but this is just a suggestion. This differs from your sequence in only a few ways * We implement the basic VFC now, so everybody knows what we have * We separate the infrastructure for (2) from the enabling of this infrastructure for CLUSTER and REINDEX. There may be additional issues to consider for those cases and we should think through and test them as a different task * We do not remove VFI in this release This is a more cautious approach. Completely removing VFI in this release is a big risk that we need not take; we have little to gain from doing so and putting it back again will be harder. I am always keen to push forwards when a new feature is worthwhile, but cleaning up code is not an important thing this late in release cycle. -- Simon Riggs www.2ndQuadrant.com
On Mon, Jan 4, 2010 at 3:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > This is a more cautious approach. Completely removing VFI in this > release is a big risk that we need not take; we have little to gain from > doing so and putting it back again will be harder. I am always keen to > push forwards when a new feature is worthwhile, but cleaning up code is > not an important thing this late in release cycle. I don't have a strong opinion one way or the other on whether we should remove VFI this release cycle, but I thought the reason why there was pressure to do that was because we will otherwise need to make changes to Hot Standby to cope with VFI. Or in other words, I thought that in order to wrap a release we would need to do one of (1) remove VFI and (2) fix HS to cope with VFI, and maybe there was a theory that the former was easier than the latter. But it's possible I may have totally misunderstood the situation. What is your thought on how to handle this? ...Robert
On Mon, 2010-01-04 at 10:31 -0500, Robert Haas wrote: > On Mon, Jan 4, 2010 at 3:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > This is a more cautious approach. Completely removing VFI in this > > release is a big risk that we need not take; we have little to gain from > > doing so and putting it back again will be harder. I am always keen to > > push forwards when a new feature is worthwhile, but cleaning up code is > > not an important thing this late in release cycle. > > I don't have a strong opinion one way or the other on whether we > should remove VFI this release cycle, but I thought the reason why > there was pressure to do that was because we will otherwise need to > make changes to Hot Standby to cope with VFI. What I should have said, in addition: VFI will be kept as a non-default option, in case it is required. We will document that use of VFI will not work correctly with HS and that its use is deprecated and should be in emergencies only in any case. I will enjoy removing VFI when that eventually occurs, but its not a priority. (And if you think, why keep it? I'll say - how else can we run a VFI - not by a stored proc, certainly). -- Simon Riggs www.2ndQuadrant.com
On Mon, Jan 4, 2010 at 11:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2010-01-04 at 10:31 -0500, Robert Haas wrote: >> On Mon, Jan 4, 2010 at 3:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > This is a more cautious approach. Completely removing VFI in this >> > release is a big risk that we need not take; we have little to gain from >> > doing so and putting it back again will be harder. I am always keen to >> > push forwards when a new feature is worthwhile, but cleaning up code is >> > not an important thing this late in release cycle. >> >> I don't have a strong opinion one way or the other on whether we >> should remove VFI this release cycle, but I thought the reason why >> there was pressure to do that was because we will otherwise need to >> make changes to Hot Standby to cope with VFI. > > What I should have said, in addition: VFI will be kept as a non-default > option, in case it is required. We will document that use of VFI will > not work correctly with HS and that its use is deprecated and should be > in emergencies only in any case. I will enjoy removing VFI when that > eventually occurs, but its not a priority. (And if you think, why keep > it? I'll say - how else can we run a VFI - not by a stored proc, > certainly). If we go this route, can we make it fail in a relatively detectable way with Hot Standby? Like an error message that says "oh, crap, you did a VFI, you need a new base backup"? Or will it do something goofier than that? I don't have an informed opinion on whether or not we should try to remove VFI in this release, and I leave that discussion to yourself and other people who are more qualified to speak to that issue than I am. I am somewhat dismayed that there are not more people weighing in on this, because it seems to me that this is a critical issue for the forthcoming release, so we really need to make sure we have consensus on the way forward NOW, not a month from now. ...Robert
>> What I should have said, in addition: VFI will be kept as a non-default >> option, in case it is required. We will document that use of VFI will >> not work correctly with HS and that its use is deprecated and should be >> in emergencies only in any case. I will enjoy removing VFI when that >> eventually occurs, but its not a priority. (And if you think, why keep >> it? I'll say - how else can we run a VFI - not by a stored proc, >> certainly). Isn't there some way we can tell if a server is an HS master, and prevent VFI from being run? --Josh Berkus
On Mon, 2010-01-04 at 12:05 -0800, Josh Berkus wrote: > >> What I should have said, in addition: VFI will be kept as a non-default > >> option, in case it is required. We will document that use of VFI will > >> not work correctly with HS and that its use is deprecated and should be > >> in emergencies only in any case. I will enjoy removing VFI when that > >> eventually occurs, but its not a priority. (And if you think, why keep > >> it? I'll say - how else can we run a VFI - not by a stored proc, > >> certainly). > > Isn't there some way we can tell if a server is an HS master, and > prevent VFI from being run? I'm proposing that VFI is only accessible by explicit request using new syntax; no existing code would call VFI. The VFI problems would only apply to system relations anyway, not to all tables. I propose we have a WARNING if VFI being run when recovery_connections = on, since I probably know what I'm doing if I go out of my way to use new syntax after presumably having read the manual. Just as a point of note, I'm worried that the act of removing VFI would introduce more bugs than leaving it alone; if its there we may as well keep it runnable. Changes required to remove it are at least these places * most of vacuum.c * visibility checks * heap tuple flags and xvac * nontransactional validation * minor points and follow up in >7 files, >12 places -- Simon Riggs www.2ndQuadrant.com
On Mon, Jan 4, 2010 at 3:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2010-01-04 at 12:05 -0800, Josh Berkus wrote: >> >> What I should have said, in addition: VFI will be kept as a non-default >> >> option, in case it is required. We will document that use of VFI will >> >> not work correctly with HS and that its use is deprecated and should be >> >> in emergencies only in any case. I will enjoy removing VFI when that >> >> eventually occurs, but its not a priority. (And if you think, why keep >> >> it? I'll say - how else can we run a VFI - not by a stored proc, >> >> certainly). >> >> Isn't there some way we can tell if a server is an HS master, and >> prevent VFI from being run? > > I'm proposing that VFI is only accessible by explicit request using new > syntax; no existing code would call VFI. > > The VFI problems would only apply to system relations anyway, not to all > tables. > > I propose we have a WARNING if VFI being run when recovery_connections = > on, since I probably know what I'm doing if I go out of my way to use > new syntax after presumably having read the manual. I think I'd vote for throwing an ERROR. By the time you see the WARNING it may be too late. Since this is only for emergencies, the user can shut off recovery_connections first if they really need it. > Just as a point of note, I'm worried that the act of removing VFI would > introduce more bugs than leaving it alone; if its there we may as well > keep it runnable. > > Changes required to remove it are at least these places > > * most of vacuum.c > * visibility checks > * heap tuple flags and xvac > * nontransactional validation > * minor points and follow up in >7 files, >12 places Doesn't sound trivial. ...Robert
> I think I'd vote for throwing an ERROR. By the time you see the > WARNING it may be too late. Since this is only for emergencies, the > user can shut off recovery_connections first if they really need it. I'm with Robert on this one. If running VFI will cause unrecoverable failure on the slave, it should be prevented. If this is only an issue with system tables, then we only need to error on system tables. --Josh
On Mon, 2010-01-04 at 16:41 -0500, Robert Haas wrote: > > I propose we have a WARNING if VFI being run when recovery_connections = > > on, since I probably know what I'm doing if I go out of my way to use > > new syntax after presumably having read the manual. > > I think I'd vote for throwing an ERROR. By the time you see the > WARNING it may be too late. Since this is only for emergencies, the > user can shut off recovery_connections first if they really need it. OK -- Simon Riggs www.2ndQuadrant.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 4, 2010 at 3:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Changes required to remove it are at least these places >> >> * most of vacuum.c >> * visibility checks >> * heap tuple flags and xvac >> * nontransactional validation >> * minor points and follow up in >7 files, >12 places > Doesn't sound trivial. The above is a vast overstatement of the problem. Simon is not only talking about removing VACUUM FULL, he's talking about removing every trace that it ever existed, eg deleting support for MOVED_OFF/MOVED_IN tuple status flags. We are *not* doing that, not now nor in the foreseeable future. As long as we have any ambition of having in-place upgrade from pre-8.5 we have to handle the MOVED status bits the same as we do now. AFAICS, ripping out most of the guts of vacuum.c is about all that's likely to happen for 8.5. regards, tom lane
On Mon, Jan 4, 2010 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jan 4, 2010 at 3:51 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Changes required to remove it are at least these places >>> >>> * most of vacuum.c >>> * visibility checks >>> * heap tuple flags and xvac >>> * nontransactional validation >>> * minor points and follow up in >7 files, >12 places > >> Doesn't sound trivial. > > The above is a vast overstatement of the problem. Simon is not only > talking about removing VACUUM FULL, he's talking about removing every > trace that it ever existed, eg deleting support for MOVED_OFF/MOVED_IN > tuple status flags. We are *not* doing that, not now nor in the > foreseeable future. As long as we have any ambition of having in-place > upgrade from pre-8.5 we have to handle the MOVED status bits the same as > we do now. > > AFAICS, ripping out most of the guts of vacuum.c is about all that's > likely to happen for 8.5. Well, it sounds like Simon is saying we shouldn't do even that much. Frankly, I'm less concerned with what we're not doing than with what we are doing. From my point of view this VACUUM FULL patch is the most important patch on the table, because it seems that before we can release (1) it has to be committed and then (2) some more work has to be done and committed. We can decide later exactly how far we want to take it and whether to do what you're suggesting here or not; right now I think we should try to keep the focus on moving the current patch forward, since based on what has been said so far, that seems to be on the critical path for 8.5. ...Robert
On Mon, 2010-01-04 at 08:04 +0000, Simon Riggs wrote: > I would prefer this slightly modified version > > 1. Commit your patch, as-is (you/me) I assume this is OK with you now? -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> wrote: > > 1. Commit your patch, as-is (you/me) > > I assume this is OK with you now? I just applied the patch with a few additional comments. Also, I adjusted some messages for vacuumdb to be look-alike for recently-committed --only-analyze patch. Remaining ToDo items are: >> 2. Work on infrastructure for VFC (VACUUM FULL using CLUSTER) for system >> relations (Simon) >> 3. Enable CLUSTER and REINDEX on critical system catalogs (Itagaki) >> 4. Optimise VFC, as discussed earlier (Itagaki) and we might also need: 5. Make CLUSTER VERBOSE to be more verbose because VACUUM FULL VERBOSE is less verbose thanVFI VERBOSE for now. Regards, --- Takahiro Itagaki NTT Open Source Software Center
On Wed, 2010-01-06 at 14:41 +0900, Takahiro Itagaki wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > > > > 1. Commit your patch, as-is (you/me) > > > > I assume this is OK with you now? > > I just applied the patch with a few additional comments. > Also, I adjusted some messages for vacuumdb to be look-alike > for recently-committed --only-analyze patch. Perfect, thank you. My morning's work is already done, so onto the next item on the list. -- Simon Riggs www.2ndQuadrant.com
On ons, 2010-01-06 at 14:41 +0900, Takahiro Itagaki wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > > > > 1. Commit your patch, as-is (you/me) > > > > I assume this is OK with you now? > > I just applied the patch with a few additional comments. Please clean up the compiler warnings: cluster.c: In function 'cluster_rel': cluster.c:789: warning: 'heapScan' may be used uninitialized in this function cluster.c:789: note: 'heapScan' was declared here cluster.c:788: warning: 'indexScan' may be used uninitialized in this function cluster.c:788: note: 'indexScan' was declared here
On Wed, 2010-01-06 at 14:41 +0900, Takahiro Itagaki wrote: > I just applied the patch with a few additional comments. I just realised that this new feature *removes* any clustering that was previously defined on a table. Many people would see that as a bug and would say that VACUUM FULL should retain the existing clustering, if any exists. Can't remember if that was discussed already? ISTM that it will be slower if we do that, so it should be either an option or just documented as the new behaviour. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Wed, 2010-01-06 at 14:41 +0900, Takahiro Itagaki wrote: > >> I just applied the patch with a few additional comments. > > I just realised that this new feature *removes* any clustering that was > previously defined on a table. Hmm, that's an overstatement. If the table was in order before, it will be in the same order after VACUUM FULL, all empty gaps and dead tuples are just removed. It also doesn't clear the indisclustered field in pg_index, so the next time you run CLUSTER it will cluster the table just fine. I'm guessing that you mean that VACUUM FULL doesn't reorder the table like CLUSTER does. I think that's OK, it has never done that. In fact the current situation is already an improvement, the new VACUUM FULL doesn't reshuffle the table and destroy old ordering like the old one does. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com