Thread: pg_migrator issues
pg_migrator has become more popular recently, so it seems time to look at some enhancements that would improve pg_migrator. None of these are required, but rather changes that would be nice to have: 1) Right now pg_migrator preserves relfilenodes for TOAST files because this is required for proper migration. Now that we have shown that strategically-placed global variables with a server-side function to set them is a viable solution, it would be nice to preserve all relfilenodes from the old server. This would simplify pg_migrator by no long requiring place-holder relfilenodes or the renaming of TOAST files. A simpler solution would just be to allow TOAST table creation to automatically remove placeholder files and create specified relfilenodes via global variables. 2) Right now pg_migrator renames old tablespaces to .old, which fails if the tablespaces are on mount points. I have already received a report of such a failure. $PGDATA also has that issue, but that renaming has to be done by the user before pg_migrator is run, and only if they want to keep the same $PGDATA value after migration, i.e. no version-specific directory path. One idea we floated around was to have tablespaces use major version directory names under the tablespace directory so renaming would not be necessary. I could implement a pg_migrator --delete-old flag to cleanly delete the old 8.4 server files which are not in a version-specific subdirectory. 3) There is no easy way to analyze all databases. vacuumdb --analyze does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an unnecessary vacuum. Right now I recommend ANALYZE in every database, but it would be nice if there were a single command which did this. 4) I have implemented the ability to run pg_migrator --check on a live old server. However, pg_migrator uses information from controldata to check things, and it also needs xid information that is only available via pg_resetxlog -n(no update) to perform the migration. Unfortunately, pg_resetxlog -n cannot be run on a live server, so pg_migrator runs pg_controldata for --check and pg_resetxlog -n for real upgrades. It would simplify pg_migrator if I would run pg_resetxlog -n on a live server, but I can understand if people don't want to do that because the xid information reported on a live server is inaccurate. Comments? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > pg_migrator has become more popular recently, so it seems time to look > at some enhancements that would improve pg_migrator. None of these are > required, but rather changes that would be nice to have: > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > this is required for proper migration. Now that we have shown that > strategically-placed global variables with a server-side function to set > them is a viable solution, it would be nice to preserve all relfilenodes > from the old server. This would simplify pg_migrator by no long > requiring place-holder relfilenodes or the renaming of TOAST files. A > simpler solution would just be to allow TOAST table creation to > automatically remove placeholder files and create specified relfilenodes > via global variables. > > 2) Right now pg_migrator renames old tablespaces to .old, which fails > if the tablespaces are on mount points. I have already received a > report of such a failure. $PGDATA also has that issue, but that > renaming has to be done by the user before pg_migrator is run, and only > if they want to keep the same $PGDATA value after migration, i.e. no > version-specific directory path. One idea we floated around was to have > tablespaces use major version directory names under the tablespace > directory so renaming would not be necessary. I could implement a > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files > which are not in a version-specific subdirectory. > > 3) There is no easy way to analyze all databases. vacuumdb --analyze > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > unnecessary vacuum. Right now I recommend ANALYZE in every database, > but it would be nice if there were a single command which did this. > > 4) I have implemented the ability to run pg_migrator --check on a live > old server. However, pg_migrator uses information from controldata to > check things, and it also needs xid information that is only available > via pg_resetxlog -n(no update) to perform the migration. Unfortunately, > pg_resetxlog -n cannot be run on a live server, so pg_migrator runs > pg_controldata for --check and pg_resetxlog -n for real upgrades. It > would simplify pg_migrator if I would run pg_resetxlog -n on a live > server, but I can understand if people don't want to do that because the > xid information reported on a live server is inaccurate. > > Comments? Having received no replies to my email above, I assume the community would like to review and perhaps approve patches to implement all of these items. I will start working on the patches. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, Sorry for not having replied sooner... On Sun, Jan 3, 2010 at 5:43 PM, Bruce Momjian <bruce@momjian.us> wrote: >> 1) Right now pg_migrator preserves relfilenodes for TOAST files because >> this is required for proper migration. Now that we have shown that >> strategically-placed global variables with a server-side function to set >> them is a viable solution, it would be nice to preserve all relfilenodes >> from the old server. This would simplify pg_migrator by no long >> requiring place-holder relfilenodes or the renaming of TOAST files. A >> simpler solution would just be to allow TOAST table creation to >> automatically remove placeholder files and create specified relfilenodes >> via global variables. I have no opinion on this one way or the other. >> 2) Right now pg_migrator renames old tablespaces to .old, which fails >> if the tablespaces are on mount points. I have already received a >> report of such a failure. $PGDATA also has that issue, but that >> renaming has to be done by the user before pg_migrator is run, and only >> if they want to keep the same $PGDATA value after migration, i.e. no >> version-specific directory path. One idea we floated around was to have >> tablespaces use major version directory names under the tablespace >> directory so renaming would not be necessary. I could implement a >> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files >> which are not in a version-specific subdirectory. I don't really like this. It seems klunky, and it seems like there ought to be a way to avoid needing to rename the tablespace directories at all. >> 3) There is no easy way to analyze all databases. vacuumdb --analyze >> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an >> unnecessary vacuum. Right now I recommend ANALYZE in every database, >> but it would be nice if there were a single command which did this. Something like vacuumdb --analyze-only? It seems like overkill to create a whole new command for this, even though vacuumdb doesn't quite make sense. >> 4) I have implemented the ability to run pg_migrator --check on a live >> old server. However, pg_migrator uses information from controldata to >> check things, and it also needs xid information that is only available >> via pg_resetxlog -n(no update) to perform the migration. Unfortunately, >> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs >> pg_controldata for --check and pg_resetxlog -n for real upgrades. It >> would simplify pg_migrator if I would run pg_resetxlog -n on a live >> server, but I can understand if people don't want to do that because the >> xid information reported on a live server is inaccurate. I don't really have a specific thought on this issue, except that it sounds like you're launching a lot of shell commands, and I wonder whether it would be better to try to do this through either C code or by exposing the appropriate stuff at the SQL level. ...Robert
Bruce Momjian wrote: > pg_migrator has become more popular recently, so it seems time to look > at some enhancements that would improve pg_migrator. None of these are > required, but rather changes that would be nice to have: > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > this is required for proper migration. Now that we have shown that > strategically-placed global variables with a server-side function to set > them is a viable solution, it would be nice to preserve all relfilenodes > from the old server. This would simplify pg_migrator by no long > requiring place-holder relfilenodes or the renaming of TOAST files. A > simpler solution would just be to allow TOAST table creation to > automatically remove placeholder files and create specified relfilenodes > via global variables. Getting rid of the need for placeholders is a good idea. +1 on getting TOAST tables created with the correct relfilenode from the start. I don't know that preserving any other relfilenode is useful; however if it means you no longer have to rename the files underlying each table, it would probably also be a good idea. (I don't know how does pg_migrator deal with such things currently -- does it keep a map of table name to relfilenode?) > 2) Right now pg_migrator renames old tablespaces to .old, which fails > if the tablespaces are on mount points. I have already received a > report of such a failure. I thought it was impossible to use bare mountpoints as tablespaces due to ownership problems ... Is that not the case? -1 for special hacks that work around bogus setups, if that means intrusive changes to the core code. > 3) There is no easy way to analyze all databases. vacuumdb --analyze > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > unnecessary vacuum. Right now I recommend ANALYZE in every database, > but it would be nice if there were a single command which did this. +1 for vacuumdb --analyze-only > 4) I have implemented the ability to run pg_migrator --check on a live > old server. However, pg_migrator uses information from controldata to > check things, and it also needs xid information that is only available > via pg_resetxlog -n(no update) to perform the migration. Unfortunately, > pg_resetxlog -n cannot be run on a live server, so pg_migrator runs > pg_controldata for --check and pg_resetxlog -n for real upgrades. It > would simplify pg_migrator if I would run pg_resetxlog -n on a live > server, but I can understand if people don't want to do that because the > xid information reported on a live server is inaccurate. What xid info does it need? Would it be good enough to use the "next XID" from most recent checkpoint from pg_controldata? It is a bit outdated, but can't you simply add some value to it to have a safety margin? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Getting rid of the need for placeholders is a good idea. +1 on getting > TOAST tables created with the correct relfilenode from the start. I > don't know that preserving any other relfilenode is useful; however if > it means you no longer have to rename the files underlying each table, > it would probably also be a good idea. I think this is an all-or-nothing proposition: if you try to preserve only some relfilenodes, you risk collisions with automatically assigned ones. It's just like the situation with pg_type OIDs. I concur that trying to preserve them looks like it would be less work than the current method. regards, tom lane
Alvaro Herrera wrote: > > 3) There is no easy way to analyze all databases. vacuumdb --analyze > > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > > unnecessary vacuum. Right now I recommend ANALYZE in every database, > > but it would be nice if there were a single command which did this. > > +1 for vacuumdb --analyze-only OK, I have implemented this using --only-analyze to avoid having the '--anal' option spelling be ambiguous, which might confuse/frustrate users. I also moved the --freeze option documention mention into a more logical place. Patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/ref/vacuumdb.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/vacuumdb.sgml,v retrieving revision 1.45 diff -c -c -r1.45 vacuumdb.sgml *** doc/src/sgml/ref/vacuumdb.sgml 27 Nov 2009 17:41:26 -0000 1.45 --- doc/src/sgml/ref/vacuumdb.sgml 4 Jan 2010 17:21:34 -0000 *************** *** 24,32 **** <command>vacuumdb</command> <arg rep="repeat"><replaceable>connection-option</replaceable></arg> <group><arg>--full</arg><arg>-f</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--freeze</arg><arg>-F</arg></group> <arg>--table | -t <replaceable>table</replaceable> <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg> </arg> --- 24,33 ---- <command>vacuumdb</command> <arg rep="repeat"><replaceable>connection-option</replaceable></arg> <group><arg>--full</arg><arg>-f</arg></group> + <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--only-analyze</arg><arg>-o</arg></group> <arg>--table | -t <replaceable>table</replaceable> <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg> </arg> *************** *** 36,44 **** <arg rep="repeat"><replaceable>connection-options</replaceable></arg> <group><arg>--all</arg><arg>-a</arg></group> <group><arg>--full</arg><arg>-f</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--freeze</arg><arg>-F</arg></group> </cmdsynopsis> </refsynopsisdiv> --- 37,46 ---- <arg rep="repeat"><replaceable>connection-options</replaceable></arg> <group><arg>--all</arg><arg>-a</arg></group> <group><arg>--full</arg><arg>-f</arg></group> + <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--only-analyze</arg><arg>-o</arg></group> </cmdsynopsis> </refsynopsisdiv> *************** *** 56,63 **** <para> <application>vacuumdb</application> is a wrapper around the SQL command <xref linkend="SQL-VACUUM" endterm="SQL-VACUUM-title">. ! There is no effective difference between vacuuming databases via ! this utility and via other methods for accessing the server. </para> </refsect1> --- 58,66 ---- <para> <application>vacuumdb</application> is a wrapper around the SQL command <xref linkend="SQL-VACUUM" endterm="SQL-VACUUM-title">. ! There is no effective difference between vacuuming and analyzing ! databases via this utility and via other methods for accessing the ! server. </para> </refsect1> *************** *** 117,122 **** --- 120,145 ---- </varlistentry> <varlistentry> + <term><option>-F</option></term> + <term><option>--freeze</option></term> + <listitem> + <para> + Aggressively <quote>freeze</quote> tuples. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-o</option></term> + <term><option>--only-analyze</option></term> + <listitem> + <para> + Only calculate statistics for use by the optimizer (no vacuum). + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-q</></term> <term><option>--quiet</></term> <listitem> *************** *** 133,139 **** <para> Clean or analyze <replaceable class="parameter">table</replaceable> only. Column names can be specified only in conjunction with ! the <option>--analyze</option> option. </para> <tip> <para> --- 156,162 ---- <para> Clean or analyze <replaceable class="parameter">table</replaceable> only. Column names can be specified only in conjunction with ! the <option>--analyze</option> or <option>--only-analyze</option> options. </para> <tip> <para> *************** *** 164,178 **** </listitem> </varlistentry> - <varlistentry> - <term><option>-F</option></term> - <term><option>--freeze</option></term> - <listitem> - <para> - Aggressively <quote>freeze</quote> tuples. - </para> - </listitem> - </varlistentry> </variablelist> </para> --- 187,192 ---- Index: src/bin/scripts/vacuumdb.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/scripts/vacuumdb.c,v retrieving revision 1.28 diff -c -c -r1.28 vacuumdb.c *** src/bin/scripts/vacuumdb.c 2 Jan 2010 16:58:00 -0000 1.28 --- src/bin/scripts/vacuumdb.c 4 Jan 2010 17:21:34 -0000 *************** *** 14,25 **** #include "common.h" ! static void vacuum_one_database(const char *dbname, bool full, bool verbose, bool analyze, ! bool freeze, const char *table, ! const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo); ! static void vacuum_all_databases(bool full, bool verbose, bool analyze, bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet); --- 14,26 ---- #include "common.h" ! static void vacuum_one_database(const char *dbname, bool full, bool verbose, ! bool and_analyze, bool only_analyze, bool freeze, ! const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo); ! static void vacuum_all_databases(bool full, bool verbose, bool and_analyze, ! bool only_analyze, bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet); *************** *** 40,45 **** --- 41,47 ---- {"quiet", no_argument, NULL, 'q'}, {"dbname", required_argument, NULL, 'd'}, {"analyze", no_argument, NULL, 'z'}, + {"only-analyze", no_argument, NULL, 'o'}, {"freeze", no_argument, NULL, 'F'}, {"all", no_argument, NULL, 'a'}, {"table", required_argument, NULL, 't'}, *************** *** 59,65 **** enum trivalue prompt_password = TRI_DEFAULT; bool echo = false; bool quiet = false; ! bool analyze = false; bool freeze = false; bool alldb = false; char *table = NULL; --- 61,68 ---- enum trivalue prompt_password = TRI_DEFAULT; bool echo = false; bool quiet = false; ! bool and_analyze = false; ! bool only_analyze = false; bool freeze = false; bool alldb = false; char *table = NULL; *************** *** 100,106 **** dbname = optarg; break; case 'z': ! analyze = true; break; case 'F': freeze = true; --- 103,112 ---- dbname = optarg; break; case 'z': ! and_analyze = true; ! break; ! case 'o': ! only_analyze = true; break; case 'F': freeze = true; *************** *** 139,144 **** --- 145,167 ---- setup_cancel_handler(); + if (only_analyze) + { + if (full) + { + fprintf(stderr, _("%s: cannot use the \"full\" option when performing only analyze\n"), + progname); + exit(1); + } + if (freeze) + { + fprintf(stderr, _("%s: cannot use the \"freeze\" option when performing only analyze\n"), + progname); + exit(1); + } + /* ignore 'and_analyze' */ + } + if (alldb) { if (dbname) *************** *** 154,160 **** exit(1); } ! vacuum_all_databases(full, verbose, analyze, freeze, host, port, username, prompt_password, progname, echo, quiet); } --- 177,183 ---- exit(1); } ! vacuum_all_databases(full, verbose, and_analyze, only_analyze, freeze, host, port, username, prompt_password, progname, echo, quiet); } *************** *** 170,176 **** dbname = get_user_name(progname); } ! vacuum_one_database(dbname, full, verbose, analyze, freeze, table, host, port, username, prompt_password, progname, echo); } --- 193,200 ---- dbname = get_user_name(progname); } ! vacuum_one_database(dbname, full, verbose, and_analyze, only_analyze, ! freeze, table, host, port, username, prompt_password, progname, echo); } *************** *** 180,187 **** static void ! vacuum_one_database(const char *dbname, bool full, bool verbose, bool analyze, ! bool freeze, const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo) --- 204,211 ---- static void ! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze, ! bool only_analyze, bool freeze, const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo) *************** *** 192,206 **** initPQExpBuffer(&sql); ! appendPQExpBuffer(&sql, "VACUUM"); ! if (full) ! appendPQExpBuffer(&sql, " FULL"); ! if (freeze) ! appendPQExpBuffer(&sql, " FREEZE"); if (verbose) appendPQExpBuffer(&sql, " VERBOSE"); - if (analyze) - appendPQExpBuffer(&sql, " ANALYZE"); if (table) appendPQExpBuffer(&sql, " %s", table); appendPQExpBuffer(&sql, ";\n"); --- 216,235 ---- initPQExpBuffer(&sql); ! if (only_analyze) ! appendPQExpBuffer(&sql, "ANALYZE"); ! else ! { ! appendPQExpBuffer(&sql, "VACUUM"); ! if (full) ! appendPQExpBuffer(&sql, " FULL"); ! if (freeze) ! appendPQExpBuffer(&sql, " FREEZE"); ! if (and_analyze) ! appendPQExpBuffer(&sql, " ANALYZE"); ! } if (verbose) appendPQExpBuffer(&sql, " VERBOSE"); if (table) appendPQExpBuffer(&sql, " %s", table); appendPQExpBuffer(&sql, ";\n"); *************** *** 223,230 **** static void ! vacuum_all_databases(bool full, bool verbose, bool analyze, bool freeze, ! const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet) { --- 252,259 ---- static void ! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool only_analyze, ! bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet) { *************** *** 246,253 **** fflush(stdout); } ! vacuum_one_database(dbname, full, verbose, analyze, freeze, NULL, ! host, port, username, prompt_password, progname, echo); } --- 275,282 ---- fflush(stdout); } ! vacuum_one_database(dbname, full, verbose, and_analyze, only_analyze, ! freeze, NULL, host, port, username, prompt_password, progname, echo); } *************** *** 267,272 **** --- 296,302 ---- printf(_(" -e, --echo show the commands being sent to the server\n")); printf(_(" -f, --full do full vacuuming\n")); printf(_(" -F, --freeze freeze row transaction information\n")); + printf(_(" -o, --only-analyze only update optimizer hints\n")); printf(_(" -q, --quiet don't write any messages\n")); printf(_(" -t, --table='TABLE[(COLUMNS)]' vacuum specific table only\n")); printf(_(" -v, --verbose write a lot of output\n"));
Robert Haas wrote: > >> 3) ?There is no easy way to analyze all databases. ?vacuumdb --analyze > >> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > >> unnecessary vacuum. ?Right now I recommend ANALYZE in every database, > >> but it would be nice if there were a single command which did this. > > Something like vacuumdb --analyze-only? It seems like overkill to > create a whole new command for this, even though vacuumdb doesn't > quite make sense. Yea, I am not excited about having vacuumdb do only analyze, but it seems the most minimal solution. I spelled it --only-analyze and just posted the reason and patch. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Robert Haas wrote: > >> 4) ?I have implemented the ability to run pg_migrator --check on a live > >> old server. ?However, pg_migrator uses information from controldata to > >> check things, and it also needs xid information that is only available > >> via pg_resetxlog -n(no update) to perform the migration. ?Unfortunately, > >> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs > >> pg_controldata for --check and pg_resetxlog -n for real upgrades. ?It > >> would simplify pg_migrator if I would run pg_resetxlog -n on a live > >> server, but I can understand if people don't want to do that because the > >> xid information reported on a live server is inaccurate. > > I don't really have a specific thought on this issue, except that it > sounds like you're launching a lot of shell commands, and I wonder > whether it would be better to try to do this through either C code or > by exposing the appropriate stuff at the SQL level. I considered that but realize that pg_migrator has to read pg_controldata in both the old and new servers, meaning it would need access to both C structures, and considering they both have the same structure names, that would require some odd C tricks. Add to that you don't know which version of Postgres you are migrating from/to during compile and the idea of using C becomes even less attractive. Doing this in C would require pg_migrator to track all changes in the pg_controldata structure layout, which seems excessively complex/error-prone. Right now I only have to track changes to the naming of the output fields. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian escribió: > I considered that but realize that pg_migrator has to read > pg_controldata in both the old and new servers, meaning it would need > access to both C structures, and considering they both have the same > structure names, that would require some odd C tricks. Add to that you > don't know which version of Postgres you are migrating from/to during > compile and the idea of using C becomes even less attractive. However, keep in mind that this might not be the last time on which we will want to read something from a C struct, so perhaps it would be good to bite the bullet and write the odd tricks. Does it already have access (at compile time) to the old and new source trees? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > Bruce Momjian wrote: > > pg_migrator has become more popular recently, so it seems time to look > > at some enhancements that would improve pg_migrator. None of these are > > required, but rather changes that would be nice to have: > > > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > > this is required for proper migration. Now that we have shown that > > strategically-placed global variables with a server-side function to set > > them is a viable solution, it would be nice to preserve all relfilenodes > > from the old server. This would simplify pg_migrator by no long > > requiring place-holder relfilenodes or the renaming of TOAST files. A > > simpler solution would just be to allow TOAST table creation to > > automatically remove placeholder files and create specified relfilenodes > > via global variables. > > Getting rid of the need for placeholders is a good idea. +1 on getting > TOAST tables created with the correct relfilenode from the start. I > don't know that preserving any other relfilenode is useful; however if > it means you no longer have to rename the files underlying each table, > it would probably also be a good idea. (I don't know how does > pg_migrator deal with such things currently -- does it keep a map of > table name to relfilenode?) Yea, as Tom said later, there are two options. Either we create placeholder files and then remove the place-holders when we create the toast tables or we just preserve all relfilenodes --- I think the later is easier. > > 4) I have implemented the ability to run pg_migrator --check on a live > > old server. However, pg_migrator uses information from controldata to > > check things, and it also needs xid information that is only available > > via pg_resetxlog -n(no update) to perform the migration. Unfortunately, > > pg_resetxlog -n cannot be run on a live server, so pg_migrator runs > > pg_controldata for --check and pg_resetxlog -n for real upgrades. It > > would simplify pg_migrator if I would run pg_resetxlog -n on a live > > server, but I can understand if people don't want to do that because the > > xid information reported on a live server is inaccurate. > > What xid info does it need? Would it be good enough to use the "next > XID" from most recent checkpoint from pg_controldata? It is a bit > outdated, but can't you simply add some value to it to have a safety margin? Well, I am not much into 'safety margins' with pg_migrator, meaning I want to get the most reliable value I can --- I have no idea what that safety margin would be. Right now pg_migrator works fine by calling pg_controldata or pg_resetxlog as appropriate. I was hoping to allow pg_resetxlog -n on a live server. Is that something we should avoid? I really don't need the change --- it would just simplify pg_migrator. I was just really asking if disallowing pg_resetxlog -n on a live server is planned behavior or an oversight. I can see the logic that it should be disallowed but I am just looking for confirmation from someone and I can then drop the issue. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > Bruce Momjian escribi?: > > > I considered that but realize that pg_migrator has to read > > pg_controldata in both the old and new servers, meaning it would need > > access to both C structures, and considering they both have the same > > structure names, that would require some odd C tricks. Add to that you > > don't know which version of Postgres you are migrating from/to during > > compile and the idea of using C becomes even less attractive. > > However, keep in mind that this might not be the last time on which we > will want to read something from a C struct, so perhaps it would be good > to bite the bullet and write the odd tricks. Does it already have > access (at compile time) to the old and new source trees? No, only the new soure tree, or actually any source tree, but ideally the new one. Remember we have Win32 binaries being built, and right now there is limited linkage between pg_migrator and the backend code. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > > 2) Right now pg_migrator renames old tablespaces to .old, which fails > > if the tablespaces are on mount points. I have already received a > > report of such a failure. > > I thought it was impossible to use bare mountpoints as tablespaces due > to ownership problems ... Is that not the case? -1 for special hacks > that work around bogus setups, if that means intrusive changes to the > core code. I talked to the person who reported the problem and he and I confirmed that it is quite easy to make the mount point be owned by the postgres user and have that function as a tablespace. Is that not a supported setup? There is probably a larger problem that the tablespace must be located in a directory that has directory rename permission for postgres. I have updated the pg_migrator INSTALL file to mention this issue. As far as .old, we could create the tablespaces as *.new, but that kind of defeats the existing recommended pg_migrator usage where we tell the user to rename PGDATA to .old before running pg_migrator. It was actually Tom's idea months ago to put a version-specific directory in the tablespace. I don't think it is necessary, and we can live with the mount point limitation. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > I was just really asking if disallowing pg_resetxlog -n on a live server > is planned behavior or an oversight. I can see the logic that it should > be disallowed but I am just looking for confirmation from someone and I > can then drop the issue. Well, it's not only a matter of "are we going to clobber live state", it's also "is the state that we are looking at changing under us?". The -n switch only covers the first point. I think it would require some careful analysis, and testing that's never been done, before having any confidence in the results of pg_resetxlog on a live server. Why should you need this anyway? pg_migrator should not be having to run pg_resetxlog on the old installation, I would think. regards, tom lane
On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote: > Alvaro Herrera wrote: >> > 2) Right now pg_migrator renames old tablespaces to .old, which fails >> > if the tablespaces are on mount points. I have already received a >> > report of such a failure. >> >> I thought it was impossible to use bare mountpoints as tablespaces due >> to ownership problems ... Is that not the case? -1 for special hacks >> that work around bogus setups, if that means intrusive changes to the >> core code. > > I talked to the person who reported the problem and he and I confirmed > that it is quite easy to make the mount point be owned by the postgres > user and have that function as a tablespace. Is that not a supported > setup? There is probably a larger problem that the tablespace must be > located in a directory that has directory rename permission for > postgres. I have updated the pg_migrator INSTALL file to mention this > issue. > > As far as .old, we could create the tablespaces as *.new, but that kind > of defeats the existing recommended pg_migrator usage where we tell the > user to rename PGDATA to .old before running pg_migrator. > > It was actually Tom's idea months ago to put a version-specific > directory in the tablespace. I don't think it is necessary, and we can > live with the mount point limitation. What doesn't work if we just don't rename the tablespace at all? And can't we put some smarts into the backend to handle that thing? ...Robert
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I was just really asking if disallowing pg_resetxlog -n on a live server > > is planned behavior or an oversight. I can see the logic that it should > > be disallowed but I am just looking for confirmation from someone and I > > can then drop the issue. > > Well, it's not only a matter of "are we going to clobber live state", > it's also "is the state that we are looking at changing under us?". > The -n switch only covers the first point. I think it would require > some careful analysis, and testing that's never been done, before having > any confidence in the results of pg_resetxlog on a live server. Yea, that was my analysis too. I will discard the idea and just keep the pg_migrator code that does either. > Why should you need this anyway? pg_migrator should not be having to > run pg_resetxlog on the old installation, I would think. Well, the same code is run on the new and old server. The complex issue is that the same code that checks for matching controldata settings (check mode) is the same that pulls the xid from the old server to set it on the new one. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Robert Haas wrote: > On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Alvaro Herrera wrote: > >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails > >> > if the tablespaces are on mount points. ?I have already received a > >> > report of such a failure. > >> > >> I thought it was impossible to use bare mountpoints as tablespaces due > >> to ownership problems ... Is that not the case? ?-1 for special hacks > >> that work around bogus setups, if that means intrusive changes to the > >> core code. > > > > I talked to the person who reported the problem and he and I confirmed > > that it is quite easy to make the mount point be owned by the postgres > > user and have that function as a tablespace. ?Is that not a supported > > setup? ?There is probably a larger problem that the tablespace must be > > located in a directory that has directory rename permission for > > postgres. ?I have updated the pg_migrator INSTALL file to mention this > > issue. > > > > As far as .old, we could create the tablespaces as *.new, but that kind > > of defeats the existing recommended pg_migrator usage where we tell the > > user to rename PGDATA to .old before running pg_migrator. > > > > It was actually Tom's idea months ago to put a version-specific > > directory in the tablespace. ?I don't think it is necessary, and we can > > live with the mount point limitation. > > What doesn't work if we just don't rename the tablespace at all? And > can't we put some smarts into the backend to handle that thing? Well, when you restore the old dump's schema into the new server, the tablespace directory path will be the same, so we had better not have any directory there. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, Jan 4, 2010 at 2:52 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > Alvaro Herrera wrote: >> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails >> >> > if the tablespaces are on mount points. ?I have already received a >> >> > report of such a failure. >> >> >> >> I thought it was impossible to use bare mountpoints as tablespaces due >> >> to ownership problems ... Is that not the case? ?-1 for special hacks >> >> that work around bogus setups, if that means intrusive changes to the >> >> core code. >> > >> > I talked to the person who reported the problem and he and I confirmed >> > that it is quite easy to make the mount point be owned by the postgres >> > user and have that function as a tablespace. ?Is that not a supported >> > setup? ?There is probably a larger problem that the tablespace must be >> > located in a directory that has directory rename permission for >> > postgres. ?I have updated the pg_migrator INSTALL file to mention this >> > issue. >> > >> > As far as .old, we could create the tablespaces as *.new, but that kind >> > of defeats the existing recommended pg_migrator usage where we tell the >> > user to rename PGDATA to .old before running pg_migrator. >> > >> > It was actually Tom's idea months ago to put a version-specific >> > directory in the tablespace. ?I don't think it is necessary, and we can >> > live with the mount point limitation. >> >> What doesn't work if we just don't rename the tablespace at all? And >> can't we put some smarts into the backend to handle that thing? > > Well, when you restore the old dump's schema into the new server, the > tablespace directory path will be the same, so we had better not have > any directory there. Well that seems like something you could work around by hacking the contents of the dump... ...Robert
Robert Haas wrote: > On Mon, Jan 4, 2010 at 2:52 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > >> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> > Alvaro Herrera wrote: > >> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails > >> >> > if the tablespaces are on mount points. ?I have already received a > >> >> > report of such a failure. > >> >> > >> >> I thought it was impossible to use bare mountpoints as tablespaces due > >> >> to ownership problems ... Is that not the case? ?-1 for special hacks > >> >> that work around bogus setups, if that means intrusive changes to the > >> >> core code. > >> > > >> > I talked to the person who reported the problem and he and I confirmed > >> > that it is quite easy to make the mount point be owned by the postgres > >> > user and have that function as a tablespace. ?Is that not a supported > >> > setup? ?There is probably a larger problem that the tablespace must be > >> > located in a directory that has directory rename permission for > >> > postgres. ?I have updated the pg_migrator INSTALL file to mention this > >> > issue. > >> > > >> > As far as .old, we could create the tablespaces as *.new, but that kind > >> > of defeats the existing recommended pg_migrator usage where we tell the > >> > user to rename PGDATA to .old before running pg_migrator. > >> > > >> > It was actually Tom's idea months ago to put a version-specific > >> > directory in the tablespace. ?I don't think it is necessary, and we can > >> > live with the mount point limitation. > >> > >> What doesn't work if we just don't rename the tablespace at all? ?And > >> can't we put some smarts into the backend to handle that thing? > > > > Well, when you restore the old dump's schema into the new server, the > > tablespace directory path will be the same, so we had better not have > > any directory there. > > Well that seems like something you could work around by hacking the > contents of the dump... True, in --binary-upgrade mode, but what do we make it? *.new? What if they want to have the same tablespace names after the upgrade? It really gets ugly if we are on a mount point because the tablespaces will be in different file systems, which makes --link mode impossible, and might create files in a filesystem that doesn't have enough space. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Alvaro Herrera wrote: > > > 2) Right now pg_migrator renames old tablespaces to .old, which fails > > > if the tablespaces are on mount points. I have already received a > > > report of such a failure. > > > > I thought it was impossible to use bare mountpoints as tablespaces due > > to ownership problems ... Is that not the case? -1 for special hacks > > that work around bogus setups, if that means intrusive changes to the > > core code. > > I talked to the person who reported the problem and he and I confirmed > that it is quite easy to make the mount point be owned by the postgres > user and have that function as a tablespace. Is that not a supported > setup? There is probably a larger problem that the tablespace must be > located in a directory that has directory rename permission for > postgres. I have updated the pg_migrator INSTALL file to mention this > issue. Oh, the actual INSTALL warning is: If you are using tablespaces, there must be sufficient directorypermissions to allow each tablespace directory to be renamedwith a".old" suffix. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, Jan 4, 2010 at 3:33 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Mon, Jan 4, 2010 at 2:52 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > Robert Haas wrote: >> >> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote: >> >> > Alvaro Herrera wrote: >> >> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails >> >> >> > if the tablespaces are on mount points. ?I have already received a >> >> >> > report of such a failure. >> >> >> >> >> >> I thought it was impossible to use bare mountpoints as tablespaces due >> >> >> to ownership problems ... Is that not the case? ?-1 for special hacks >> >> >> that work around bogus setups, if that means intrusive changes to the >> >> >> core code. >> >> > >> >> > I talked to the person who reported the problem and he and I confirmed >> >> > that it is quite easy to make the mount point be owned by the postgres >> >> > user and have that function as a tablespace. ?Is that not a supported >> >> > setup? ?There is probably a larger problem that the tablespace must be >> >> > located in a directory that has directory rename permission for >> >> > postgres. ?I have updated the pg_migrator INSTALL file to mention this >> >> > issue. >> >> > >> >> > As far as .old, we could create the tablespaces as *.new, but that kind >> >> > of defeats the existing recommended pg_migrator usage where we tell the >> >> > user to rename PGDATA to .old before running pg_migrator. >> >> > >> >> > It was actually Tom's idea months ago to put a version-specific >> >> > directory in the tablespace. ?I don't think it is necessary, and we can >> >> > live with the mount point limitation. >> >> >> >> What doesn't work if we just don't rename the tablespace at all? ?And >> >> can't we put some smarts into the backend to handle that thing? >> > >> > Well, when you restore the old dump's schema into the new server, the >> > tablespace directory path will be the same, so we had better not have >> > any directory there. >> >> Well that seems like something you could work around by hacking the >> contents of the dump... > > True, in --binary-upgrade mode, but what do we make it? *.new? What if > they want to have the same tablespace names after the upgrade? It > really gets ugly if we are on a mount point because the tablespaces will > be in different file systems, which makes --link mode impossible, and > might create files in a filesystem that doesn't have enough space. But can't we just call a special function first before running the CREATE TABLESPACE, like: pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()? Sorta like what you did to preserve ENUM OIDs, etc.? ...Robert
Robert Haas wrote: > >> >> What doesn't work if we just don't rename the tablespace at all? ?And > >> >> can't we put some smarts into the backend to handle that thing? > >> > > >> > Well, when you restore the old dump's schema into the new server, the > >> > tablespace directory path will be the same, so we had better not have > >> > any directory there. > >> > >> Well that seems like something you could work around by hacking the > >> contents of the dump... > > > > True, in --binary-upgrade mode, but what do we make it? ?*.new? ?What if > > they want to have the same tablespace names after the upgrade? ?It > > really gets ugly if we are on a mount point because the tablespaces will > > be in different file systems, which makes --link mode impossible, and > > might create files in a filesystem that doesn't have enough space. > > But can't we just call a special function first before running the > CREATE TABLESPACE, like: > > pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()? > > Sorta like what you did to preserve ENUM OIDs, etc.? Well, the problem is that we are creating something in a file system, and the old and new contents of the tablespace directories must exist after the migration (in case the migration is reverted). We were able to get away with this for enum because we were only creating this in the _new_ database. With the file system, we have a resource/namespace shared between the old and new server. What were you thinking this function call would do? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, Jan 4, 2010 at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> >> >> What doesn't work if we just don't rename the tablespace at all? ?And >> >> >> can't we put some smarts into the backend to handle that thing? >> >> > >> >> > Well, when you restore the old dump's schema into the new server, the >> >> > tablespace directory path will be the same, so we had better not have >> >> > any directory there. >> >> >> >> Well that seems like something you could work around by hacking the >> >> contents of the dump... >> > >> > True, in --binary-upgrade mode, but what do we make it? ?*.new? ?What if >> > they want to have the same tablespace names after the upgrade? ?It >> > really gets ugly if we are on a mount point because the tablespaces will >> > be in different file systems, which makes --link mode impossible, and >> > might create files in a filesystem that doesn't have enough space. >> >> But can't we just call a special function first before running the >> CREATE TABLESPACE, like: >> >> pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()? >> >> Sorta like what you did to preserve ENUM OIDs, etc.? > > Well, the problem is that we are creating something in a file system, > and the old and new contents of the tablespace directories must exist > after the migration (in case the migration is reverted). We were able > to get away with this for enum because we were only creating this in the > _new_ database. With the file system, we have a resource/namespace > shared between the old and new server. Oh, I thought you were hard-linking the files, not copying them, so the old directory would be destroyed anyway. > What were you thinking this function call would do? Basically, make PostgreSQL not reinitialize the directory as it normally would when a new tablespace is created. Or make it ignore the existence of the directory until told to stop ignoring it. Or whatever is needed to avoid having to move the thing around. Sorry, I'm hand-waving wildly here; I haven't read the code. Maybe I should shut up. ...Robert
Robert Haas wrote: > >> But can't we just call a special function first before running the > >> CREATE TABLESPACE, like: > >> > >> pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()? > >> > >> Sorta like what you did to preserve ENUM OIDs, etc.? > > > > Well, the problem is that we are creating something in a file system, > > and the old and new contents of the tablespace directories must exist > > after the migration (in case the migration is reverted). ?We were able > > to get away with this for enum because we were only creating this in the > > _new_ database. ?With the file system, we have a resource/namespace > > shared between the old and new server. > > Oh, I thought you were hard-linking the files, not copying them, so > the old directory would be destroyed anyway. The default mode is COPY but there is a --link option. You are right that if we only did linking things would be much simpler. > > What were you thinking this function call would do? > > Basically, make PostgreSQL not reinitialize the directory as it > normally would when a new tablespace is created. Or make it ignore > the existence of the directory until told to stop ignoring it. Or > whatever is needed to avoid having to move the thing around. Sorry, > I'm hand-waving wildly here; I haven't read the code. Maybe I should > shut up. Sorry, I can't figure out how that would work. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > pg_migrator has become more popular recently, so it seems time to look > at some enhancements that would improve pg_migrator. None of these are > required, but rather changes that would be nice to have: > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > this is required for proper migration. Now that we have shown that > strategically-placed global variables with a server-side function to set > them is a viable solution, it would be nice to preserve all relfilenodes > from the old server. This would simplify pg_migrator by no long > requiring place-holder relfilenodes or the renaming of TOAST files. A > simpler solution would just be to allow TOAST table creation to > automatically remove placeholder files and create specified relfilenodes > via global variables. Attached is a patch that implements #1 above by preserving all relfilenodes, with pg_dump support. It uses the same method I used for preserving pg_type/pg_enum. I have tested this on the regression database and it successfully preserved all relfilenodes. This patch also removes the 'force' parameter in toast functions that Tom added for 8.4 --- it is no longer needed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/heap.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/heap.c,v retrieving revision 1.364 diff -c -c -r1.364 heap.c *** src/backend/catalog/heap.c 2 Jan 2010 16:57:36 -0000 1.364 --- src/backend/catalog/heap.c 5 Jan 2010 02:37:21 -0000 *************** *** 96,101 **** --- 96,104 ---- char *relname); static List *insert_ordered_unique_oid(List *list, Oid datum); + Oid binary_upgrade_next_heap_relfilenode = InvalidOid; + Oid binary_upgrade_next_toast_relfilenode = InvalidOid; + /* ---------------------------------------------------------------- * XXX UGLY HARD CODED BADNESS FOLLOWS XXX *************** *** 942,956 **** errmsg("only shared relations can be placed in pg_global tablespace"))); } ! /* ! * Allocate an OID for the relation, unless we were told what to use. ! * ! * The OID will be the relfilenode as well, so make sure it doesn't ! * collide with either pg_class OIDs or existing physical files. ! */ ! if (!OidIsValid(relid)) relid = GetNewRelFileNode(reltablespace, shared_relation, pg_class_desc); /* * Determine the relation's initial permissions. --- 945,973 ---- errmsg("only shared relations can be placed in pg_global tablespace"))); } ! if ((relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE) && ! OidIsValid(binary_upgrade_next_heap_relfilenode)) ! { ! relid = binary_upgrade_next_heap_relfilenode; ! binary_upgrade_next_heap_relfilenode = InvalidOid; ! } ! else if (relkind == RELKIND_TOASTVALUE && ! OidIsValid(binary_upgrade_next_toast_relfilenode)) ! { ! relid = binary_upgrade_next_toast_relfilenode; ! binary_upgrade_next_toast_relfilenode = InvalidOid; ! } ! else if (!OidIsValid(relid)) ! { ! /* ! * Allocate an OID for the relation, unless we were told what to use. ! * ! * The OID will be the relfilenode as well, so make sure it doesn't ! * collide with either pg_class OIDs or existing physical files. ! */ relid = GetNewRelFileNode(reltablespace, shared_relation, pg_class_desc); + } /* * Determine the relation's initial permissions. Index: src/backend/catalog/index.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.328 diff -c -c -r1.328 index.c *** src/backend/catalog/index.c 2 Jan 2010 16:57:36 -0000 1.328 --- src/backend/catalog/index.c 5 Jan 2010 02:37:21 -0000 *************** *** 79,84 **** --- 79,87 ---- tups_inserted; } v_i_state; + /* For simple relation creation, this is the toast index relfilenode */ + Oid binary_upgrade_next_index_relfilenode = InvalidOid; + /* non-export function prototypes */ static TupleDesc ConstructTupleDescriptor(Relation heapRelation, IndexInfo *indexInfo, *************** *** 640,654 **** accessMethodObjectId, classObjectId); ! /* ! * Allocate an OID for the index, unless we were told what to use. ! * ! * The OID will be the relfilenode as well, so make sure it doesn't ! * collide with either pg_class OIDs or existing physical files. ! */ ! if (!OidIsValid(indexRelationId)) indexRelationId = GetNewRelFileNode(tableSpaceId, shared_relation, pg_class); /* * create the index relation's relcache entry and physical disk file. (If --- 643,664 ---- accessMethodObjectId, classObjectId); ! if (OidIsValid(binary_upgrade_next_index_relfilenode)) ! { ! indexRelationId = binary_upgrade_next_index_relfilenode; ! binary_upgrade_next_index_relfilenode = InvalidOid; ! } ! else if (!OidIsValid(indexRelationId)) ! { ! /* ! * Allocate an OID for the index, unless we were told what to use. ! * ! * The OID will be the relfilenode as well, so make sure it doesn't ! * collide with either pg_class OIDs or existing physical files. ! */ indexRelationId = GetNewRelFileNode(tableSpaceId, shared_relation, pg_class); + } /* * create the index relation's relcache entry and physical disk file. (If Index: src/backend/catalog/toasting.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/toasting.c,v retrieving revision 1.26 diff -c -c -r1.26 toasting.c *** src/backend/catalog/toasting.c 2 Jan 2010 16:57:36 -0000 1.26 --- src/backend/catalog/toasting.c 5 Jan 2010 02:37:21 -0000 *************** *** 32,53 **** #include "utils/syscache.h" Oid binary_upgrade_next_pg_type_toast_oid = InvalidOid; static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, ! Datum reloptions, bool force); static bool needs_toast_table(Relation rel); /* * AlterTableCreateToastTable * If the table needs a toast table, and doesn't already have one, ! * then create a toast table for it. (With the force option, make ! * a toast table even if it appears unnecessary.) ! * ! * The caller can also specify the OID to be used for the toast table. ! * Usually, toastOid should be InvalidOid to allow a free OID to be assigned. ! * (This option, as well as the force option, is not used by core Postgres, ! * but is provided to support pg_migrator.) * * reloptions for the toast table can be passed, too. Pass (Datum) 0 * for default reloptions. --- 32,48 ---- #include "utils/syscache.h" Oid binary_upgrade_next_pg_type_toast_oid = InvalidOid; + extern Oid binary_upgrade_next_toast_relfilenode; static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, ! Datum reloptions); static bool needs_toast_table(Relation rel); /* * AlterTableCreateToastTable * If the table needs a toast table, and doesn't already have one, ! * then create a toast table for it. * * reloptions for the toast table can be passed, too. Pass (Datum) 0 * for default reloptions. *************** *** 57,64 **** * to end with CommandCounterIncrement if it makes any changes. */ void ! AlterTableCreateToastTable(Oid relOid, Oid toastOid, ! Datum reloptions, bool force) { Relation rel; --- 52,58 ---- * to end with CommandCounterIncrement if it makes any changes. */ void ! AlterTableCreateToastTable(Oid relOid, Datum reloptions) { Relation rel; *************** *** 70,76 **** rel = heap_open(relOid, AccessExclusiveLock); /* create_toast_table does all the work */ ! (void) create_toast_table(rel, toastOid, InvalidOid, reloptions, force); heap_close(rel, NoLock); } --- 64,70 ---- rel = heap_open(relOid, AccessExclusiveLock); /* create_toast_table does all the work */ ! (void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions); heap_close(rel, NoLock); } *************** *** 96,102 **** relName))); /* create_toast_table does all the work */ ! if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0, false)) elog(ERROR, "\"%s\" does not require a toast table", relName); --- 90,96 ---- relName))); /* create_toast_table does all the work */ ! if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0)) elog(ERROR, "\"%s\" does not require a toast table", relName); *************** *** 108,119 **** * create_toast_table --- internal workhorse * * rel is already opened and exclusive-locked ! * toastOid and toastIndexOid are normally InvalidOid, but ! * either or both can be nonzero to specify caller-assigned OIDs */ static bool ! create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, ! Datum reloptions, bool force) { Oid relOid = RelationGetRelid(rel); HeapTuple reltup; --- 102,112 ---- * create_toast_table --- internal workhorse * * rel is already opened and exclusive-locked ! * toastOid and toastIndexOid are normally InvalidOid, but during ! * bootstrap they can be nonzero to specify hand-assigned OIDs */ static bool ! create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptions) { Oid relOid = RelationGetRelid(rel); HeapTuple reltup; *************** *** 152,163 **** /* * Check to see whether the table actually needs a TOAST table. ! * ! * Caller can optionally override this check. (Note: at present no ! * callers in core Postgres do so, but this option is needed by ! * pg_migrator.) */ ! if (!force && !needs_toast_table(rel)) return false; /* --- 145,154 ---- /* * Check to see whether the table actually needs a TOAST table. ! * If the relfilenode is specified, force toast file creation. */ ! if (!needs_toast_table(rel) && ! !OidIsValid(binary_upgrade_next_toast_relfilenode)) return false; /* Index: src/backend/commands/cluster.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.189 diff -c -c -r1.189 cluster.c *** src/backend/commands/cluster.c 2 Jan 2010 16:57:37 -0000 1.189 --- src/backend/commands/cluster.c 5 Jan 2010 02:37:21 -0000 *************** *** 743,749 **** if (isNull) reloptions = (Datum) 0; } ! AlterTableCreateToastTable(OIDNewHeap, InvalidOid, reloptions, false); if (OidIsValid(toastid)) ReleaseSysCache(tuple); --- 743,749 ---- if (isNull) reloptions = (Datum) 0; } ! AlterTableCreateToastTable(OIDNewHeap, reloptions); if (OidIsValid(toastid)) ReleaseSysCache(tuple); Index: src/backend/commands/tablecmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/tablecmds.c,v retrieving revision 1.313 diff -c -c -r1.313 tablecmds.c *** src/backend/commands/tablecmds.c 2 Jan 2010 16:57:37 -0000 1.313 --- src/backend/commands/tablecmds.c 5 Jan 2010 02:37:21 -0000 *************** *** 2614,2621 **** (tab->subcmds[AT_PASS_ADD_COL] || tab->subcmds[AT_PASS_ALTER_TYPE] || tab->subcmds[AT_PASS_COL_ATTRS])) ! AlterTableCreateToastTable(tab->relid, InvalidOid, ! (Datum) 0, false); } } --- 2614,2620 ---- (tab->subcmds[AT_PASS_ADD_COL] || tab->subcmds[AT_PASS_ALTER_TYPE] || tab->subcmds[AT_PASS_COL_ATTRS])) ! AlterTableCreateToastTable(tab->relid, (Datum) 0); } } Index: src/backend/executor/execMain.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/execMain.c,v retrieving revision 1.339 diff -c -c -r1.339 execMain.c *** src/backend/executor/execMain.c 2 Jan 2010 16:57:40 -0000 1.339 --- src/backend/executor/execMain.c 5 Jan 2010 02:37:22 -0000 *************** *** 2194,2200 **** (void) heap_reloptions(RELKIND_TOASTVALUE, reloptions, true); ! AlterTableCreateToastTable(intoRelationId, InvalidOid, reloptions, false); /* * And open the constructed table for writing. --- 2194,2200 ---- (void) heap_reloptions(RELKIND_TOASTVALUE, reloptions, true); ! AlterTableCreateToastTable(intoRelationId, reloptions); /* * And open the constructed table for writing. Index: src/backend/tcop/utility.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/utility.c,v retrieving revision 1.326 diff -c -c -r1.326 utility.c *** src/backend/tcop/utility.c 2 Jan 2010 16:57:53 -0000 1.326 --- src/backend/tcop/utility.c 5 Jan 2010 02:37:22 -0000 *************** *** 491,504 **** "toast", validnsps, true, false); ! (void) heap_reloptions(RELKIND_TOASTVALUE, ! toast_options, true); ! AlterTableCreateToastTable(relOid, ! InvalidOid, ! toast_options, ! false); } else { --- 491,500 ---- "toast", validnsps, true, false); ! (void) heap_reloptions(RELKIND_TOASTVALUE, toast_options, true); ! AlterTableCreateToastTable(relOid, toast_options); } else { Index: src/bin/pg_dump/pg_dump.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v retrieving revision 1.564 diff -c -c -r1.564 pg_dump.c *** src/bin/pg_dump/pg_dump.c 2 Jan 2010 16:57:59 -0000 1.564 --- src/bin/pg_dump/pg_dump.c 5 Jan 2010 02:37:22 -0000 *************** *** 200,206 **** PQExpBuffer upgrade_buffer, Oid pg_type_oid); static bool binary_upgrade_set_type_oids_by_rel_oid( PQExpBuffer upgrade_buffer, Oid pg_rel_oid); ! static void binary_upgrade_clear_pg_type_toast_oid(PQExpBuffer upgrade_buffer); static const char *getAttrName(int attrnum, TableInfo *tblInfo); static const char *fmtCopyColumnList(const TableInfo *ti); static void do_sql_command(PGconn *conn, const char *query); --- 200,207 ---- PQExpBuffer upgrade_buffer, Oid pg_type_oid); static bool binary_upgrade_set_type_oids_by_rel_oid( PQExpBuffer upgrade_buffer, Oid pg_rel_oid); ! static void binary_upgrade_set_relfilenodes(PQExpBuffer upgrade_buffer, ! Oid pg_class_oid, bool is_index); static const char *getAttrName(int attrnum, TableInfo *tblInfo); static const char *fmtCopyColumnList(const TableInfo *ti); static void do_sql_command(PGconn *conn, const char *query); *************** *** 2289,2309 **** } static void ! binary_upgrade_clear_pg_type_toast_oid(PQExpBuffer upgrade_buffer) { ! /* ! * One complexity is that while the heap might now have a TOAST table, ! * the TOAST table might have been created long after creation when ! * the table was loaded with wide data. For that reason, we clear ! * binary_upgrade_set_next_pg_type_toast_oid so it is not reused ! * by a later table. Logically any later creation that needs a TOAST ! * table should have its own TOAST pg_type oid, but we are cautious. ! */ ! appendPQExpBuffer(upgrade_buffer, ! "\n-- For binary upgrade, clear toast oid because it might not have been needed\n"); appendPQExpBuffer(upgrade_buffer, ! "SELECT binary_upgrade.set_next_pg_type_oid('%u'::pg_catalog.oid);\n\n", ! InvalidOid); } /* --- 2290,2367 ---- } static void ! binary_upgrade_set_relfilenodes(PQExpBuffer upgrade_buffer, Oid pg_class_oid, ! bool is_index) { ! PQExpBuffer upgrade_query = createPQExpBuffer(); ! int ntups; ! PGresult *upgrade_res; ! Oid pg_class_relfilenode; ! Oid pg_class_reltoastrelid; ! Oid pg_class_reltoastidxid; ! ! appendPQExpBuffer(upgrade_query, ! "SELECT c.relfilenode, c.reltoastrelid, t.reltoastidxid " ! "FROM pg_catalog.pg_class c LEFT JOIN " ! "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) " ! "WHERE c.oid = '%u'::pg_catalog.oid;", ! pg_class_oid); ! ! upgrade_res = PQexec(g_conn, upgrade_query->data); ! check_sql_result(upgrade_res, g_conn, upgrade_query->data, PGRES_TUPLES_OK); ! ! /* Expecting a single result only */ ! ntups = PQntuples(upgrade_res); ! if (ntups != 1) ! { ! write_msg(NULL, ngettext("query returned %d row instead of one: %s\n", ! "query returned %d rows instead of one: %s\n", ! ntups), ! ntups, upgrade_query->data); ! exit_nicely(); ! } ! ! pg_class_relfilenode = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "relfilenode"))); ! pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid"))); ! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid"))); ! appendPQExpBuffer(upgrade_buffer, ! "\n-- For binary upgrade, must preserve relfilenodes\n"); ! ! if (!is_index) ! appendPQExpBuffer(upgrade_buffer, ! "SELECT binary_upgrade.set_next_heap_relfilenode('%u'::pg_catalog.oid);\n", ! pg_class_relfilenode); ! else ! appendPQExpBuffer(upgrade_buffer, ! "SELECT binary_upgrade.set_next_index_relfilenode('%u'::pg_catalog.oid);\n", ! pg_class_relfilenode); ! ! if (OidIsValid(pg_class_reltoastrelid)) ! { ! /* ! * One complexity is that the table definition might not require ! * the creation of a TOAST table, and the TOAST table might have ! * been created long after table creation, when the table was ! * loaded with wide data. By setting the TOAST relfilenode we ! * force creation of the TOAST heap and TOAST index by the ! * backend so we can cleanly migrate the files during binary ! * migration. ! */ ! ! appendPQExpBuffer(upgrade_buffer, ! "SELECT binary_upgrade.set_next_toast_relfilenode('%u'::pg_catalog.oid);\n", ! pg_class_reltoastrelid); ! ! /* every toast table has an index */ ! appendPQExpBuffer(upgrade_buffer, ! "SELECT binary_upgrade.set_next_index_relfilenode('%u'::pg_catalog.oid);\n", ! pg_class_reltoastidxid); ! } ! appendPQExpBuffer(upgrade_buffer, "\n"); ! ! PQclear(upgrade_res); ! destroyPQExpBuffer(upgrade_query); } /* *************** *** 10480,10485 **** --- 10538,10546 ---- appendPQExpBuffer(delq, "%s;\n", fmtId(tbinfo->dobj.name)); + if (binary_upgrade) + binary_upgrade_set_relfilenodes(q, tbinfo->dobj.catId.oid, false); + appendPQExpBuffer(q, "CREATE TABLE %s (", fmtId(tbinfo->dobj.name)); actual_atts = 0; *************** *** 10781,10789 **** } } - if (binary_upgrade && toast_set) - binary_upgrade_clear_pg_type_toast_oid(q); - ArchiveEntry(fout, tbinfo->dobj.catId, tbinfo->dobj.dumpId, tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name, --- 10842,10847 ---- *************** *** 10926,10931 **** --- 10984,10992 ---- */ if (indxinfo->indexconstraint == 0) { + if (binary_upgrade) + binary_upgrade_set_relfilenodes(q, indxinfo->dobj.catId.oid, true); + /* Plain secondary index */ appendPQExpBuffer(q, "%s;\n", indxinfo->indexdef); *************** *** 11006,11011 **** --- 11067,11075 ---- exit_nicely(); } + if (binary_upgrade && !coninfo->condef) + binary_upgrade_set_relfilenodes(q, indxinfo->dobj.catId.oid, true); + appendPQExpBuffer(q, "ALTER TABLE ONLY %s\n", fmtId(tbinfo->dobj.name)); appendPQExpBuffer(q, " ADD CONSTRAINT %s ", *************** *** 11416,11422 **** --- 11480,11489 ---- resetPQExpBuffer(query); if (binary_upgrade) + { + binary_upgrade_set_relfilenodes(query, tbinfo->dobj.catId.oid, false); binary_upgrade_set_type_oids_by_rel_oid(query, tbinfo->dobj.catId.oid); + } appendPQExpBuffer(query, "CREATE SEQUENCE %s\n", Index: src/include/catalog/toasting.h =================================================================== RCS file: /cvsroot/pgsql/src/include/catalog/toasting.h,v retrieving revision 1.11 diff -c -c -r1.11 toasting.h *** src/include/catalog/toasting.h 2 Jan 2010 16:58:02 -0000 1.11 --- src/include/catalog/toasting.h 5 Jan 2010 02:37:22 -0000 *************** *** 17,24 **** /* * toasting.c prototypes */ ! extern void AlterTableCreateToastTable(Oid relOid, Oid toastOid, ! Datum reloptions, bool force); extern void BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid); --- 17,23 ---- /* * toasting.c prototypes */ ! extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions); extern void BootstrapToastTable(char *relName, Oid toastOid, Oid toastIndexOid);
Bruce Momjian wrote: > pg_migrator has become more popular recently, so it seems time to look > at some enhancements that would improve pg_migrator. None of these are > required, but rather changes that would be nice to have: > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > this is required for proper migration. Now that we have shown that > strategically-placed global variables with a server-side function to set > them is a viable solution, it would be nice to preserve all relfilenodes > from the old server. This would simplify pg_migrator by no long > requiring place-holder relfilenodes or the renaming of TOAST files. A > simpler solution would just be to allow TOAST table creation to > automatically remove placeholder files and create specified relfilenodes > via global variables. > > 2) Right now pg_migrator renames old tablespaces to .old, which fails > if the tablespaces are on mount points. I have already received a > report of such a failure. $PGDATA also has that issue, but that > renaming has to be done by the user before pg_migrator is run, and only > if they want to keep the same $PGDATA value after migration, i.e. no > version-specific directory path. One idea we floated around was to have > tablespaces use major version directory names under the tablespace > directory so renaming would not be necessary. I could implement a > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files > which are not in a version-specific subdirectory. > > 3) There is no easy way to analyze all databases. vacuumdb --analyze > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > unnecessary vacuum. Right now I recommend ANALYZE in every database, > but it would be nice if there were a single command which did this. > > 4) I have implemented the ability to run pg_migrator --check on a live > old server. However, pg_migrator uses information from controldata to > check things, and it also needs xid information that is only available > via pg_resetxlog -n(no update) to perform the migration. Unfortunately, > pg_resetxlog -n cannot be run on a live server, so pg_migrator runs > pg_controldata for --check and pg_resetxlog -n for real upgrades. It > would simplify pg_migrator if I would run pg_resetxlog -n on a live > server, but I can understand if people don't want to do that because the > xid information reported on a live server is inaccurate. FYI, for those keeping score, I have posted patches for #1 and #3, and we have decided not to implement #2 and #4. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Alvaro Herrera wrote: >> I thought it was impossible to use bare mountpoints as tablespaces due >> to ownership problems ... Is that not the case? -1 for special hacks >> that work around bogus setups, if that means intrusive changes to the >> core code. > I talked to the person who reported the problem and he and I confirmed > that it is quite easy to make the mount point be owned by the postgres > user and have that function as a tablespace. Is that not a supported > setup? It might be *possible*, but that doesn't make it a good idea. The traditional sysadmin advice in this area is that mount points should be owned by root. I don't really remember the reasoning but I'm pretty sure I remember the principle. > It was actually Tom's idea months ago to put a version-specific > directory in the tablespace. I was just about to re-suggest that. Why do you think it's such a bad idea? regards, tom lane
Bruce Momjian <bruce@momjian.us> writes: > Well, when you restore the old dump's schema into the new server, the > tablespace directory path will be the same, so we had better not have > any directory there. If we implicitly added "/8.5" etc at the end of the specified tablespace path, the problems go away. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Alvaro Herrera wrote: > >> I thought it was impossible to use bare mountpoints as tablespaces due > >> to ownership problems ... Is that not the case? -1 for special hacks > >> that work around bogus setups, if that means intrusive changes to the > >> core code. > > > I talked to the person who reported the problem and he and I confirmed > > that it is quite easy to make the mount point be owned by the postgres > > user and have that function as a tablespace. Is that not a supported > > setup? > > It might be *possible*, but that doesn't make it a good idea. The > traditional sysadmin advice in this area is that mount points should > be owned by root. I don't really remember the reasoning but I'm pretty > sure I remember the principle. Yea, I think the logic is that files under that directory disappear after the mount, so you don't want users putting things in there accidentally. In fact, the user said they mount a pg_xlog directory under the $PGDATA directory (rather than use a symlink), which I also thought was an odd approach and prone to problems if the mount failed intermittently. > > It was actually Tom's idea months ago to put a version-specific > > directory in the tablespace. > > I was just about to re-suggest that. Why do you think it's such a > bad idea? I liked the idea, but I listed it as item #2 and no one else said they liked it. The only complexity I can see with the idea is that doing an upgrade from one alpha to another would have the same major version number and would therefore not be possible, so maybe we have to use the catalog version number in there. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > I liked the idea, but I listed it as item #2 and no one else said they > liked it. The only complexity I can see with the idea is that doing an > upgrade from one alpha to another would have the same major version > number and would therefore not be possible, so maybe we have to use the > catalog version number in there. Good point. Using catversion for the purpose seems a bit ugly but I have no better ideas. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I liked the idea, but I listed it as item #2 and no one else said they > > liked it. The only complexity I can see with the idea is that doing an > > upgrade from one alpha to another would have the same major version > > number and would therefore not be possible, so maybe we have to use the > > catalog version number in there. > > Good point. Using catversion for the purpose seems a bit ugly but > I have no better ideas. I thought we had rejected the idea of being able to migrate between alphas. Is migrating between major versions not difficult enough? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Tue, Jan 5, 2010 at 11:06 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >> > I liked the idea, but I listed it as item #2 and no one else said they >> > liked it. The only complexity I can see with the idea is that doing an >> > upgrade from one alpha to another would have the same major version >> > number and would therefore not be possible, so maybe we have to use the >> > catalog version number in there. >> >> Good point. Using catversion for the purpose seems a bit ugly but >> I have no better ideas. > > I thought we had rejected the idea of being able to migrate between > alphas. Is migrating between major versions not difficult enough? We like a challenge. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 5, 2010 at 11:06 AM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: >> Tom Lane wrote: >>> Good point. �Using catversion for the purpose seems a bit ugly but >>> I have no better ideas. >> >> I thought we had rejected the idea of being able to migrate between >> alphas. �Is migrating between major versions not difficult enough? > We like a challenge. The problem with using just major version there is that then we are *wiring into the on-disk representation* the assumption that pg_migrator only goes from one major version to the next. I agree that we're not likely to start supporting cross-alpha-version migration any time soon, but I don't think it's wise to foreclose the possibility of ever doing it. regards, tom lane
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, Jan 5, 2010 at 11:06 AM, Alvaro Herrera > > <alvherre@commandprompt.com> wrote: > >> Tom Lane wrote: > >>> Good point. �Using catversion for the purpose seems a bit ugly but > >>> I have no better ideas. > >> > >> I thought we had rejected the idea of being able to migrate between > >> alphas. �Is migrating between major versions not difficult enough? > > > We like a challenge. > > The problem with using just major version there is that then we are > *wiring into the on-disk representation* the assumption that pg_migrator > only goes from one major version to the next. I agree that we're not > likely to start supporting cross-alpha-version migration any time soon, > but I don't think it's wise to foreclose the possibility of ever doing > it. I know people are trying to make things easier on pg_migrator, but frankly going from alpha to alpha does not require any new code in pg_migrator. And pg_migrator already supports cross-alpha-version migration just using the existing code --- no new code was added to enable this. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > Bruce Momjian wrote: > > pg_migrator has become more popular recently, so it seems time to look > > at some enhancements that would improve pg_migrator. None of these are > > required, but rather changes that would be nice to have: > > > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > > this is required for proper migration. Now that we have shown that > > strategically-placed global variables with a server-side function to set > > them is a viable solution, it would be nice to preserve all relfilenodes > > from the old server. This would simplify pg_migrator by no long > > requiring place-holder relfilenodes or the renaming of TOAST files. A > > simpler solution would just be to allow TOAST table creation to > > automatically remove placeholder files and create specified relfilenodes > > via global variables. > > Getting rid of the need for placeholders is a good idea. +1 on getting > TOAST tables created with the correct relfilenode from the start. I > don't know that preserving any other relfilenode is useful; however if > it means you no longer have to rename the files underlying each table, --> > it would probably also be a good idea. (I don't know how does --> > pg_migrator deal with such things currently -- does it keep a map of --> > table name to relfilenode?) Yes, and it will still need that because we don't want to transfer over any of the system tables or pg_catalog files. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Bruce Momjian wrote: > > pg_migrator has become more popular recently, so it seems time to look > > at some enhancements that would improve pg_migrator. None of these are > > required, but rather changes that would be nice to have: > > > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > > this is required for proper migration. Now that we have shown that > > strategically-placed global variables with a server-side function to set > > them is a viable solution, it would be nice to preserve all relfilenodes > > from the old server. This would simplify pg_migrator by no long > > requiring place-holder relfilenodes or the renaming of TOAST files. A > > simpler solution would just be to allow TOAST table creation to > > automatically remove placeholder files and create specified relfilenodes > > via global variables. > > Attached is a patch that implements #1 above by preserving all > relfilenodes, with pg_dump support. It uses the same method I used for > preserving pg_type/pg_enum. I have tested this on the regression > database and it successfully preserved all relfilenodes. > > This patch also removes the 'force' parameter in toast functions that > Tom added for 8.4 --- it is no longer needed. Applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Alvaro Herrera wrote: > > > 3) There is no easy way to analyze all databases. vacuumdb --analyze > > > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > > > unnecessary vacuum. Right now I recommend ANALYZE in every database, > > > but it would be nice if there were a single command which did this. > > > > +1 for vacuumdb --analyze-only > > OK, I have implemented this using --only-analyze to avoid having the > '--anal' option spelling be ambiguous, which might confuse/frustrate > users. > > I also moved the --freeze option documention mention into a more logical > place. > > Patch attached. Applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote: > 3) There is no easy way to analyze all databases. vacuumdb --analyze > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > unnecessary vacuum. Right now I recommend ANALYZE in every database, > but it would be nice if there were a single command which did this. I actually started on a patch for this (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I just haven'thad time to come back to it for final cleanup and changing the docs as needed. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Jan 6, 2010, at 1:52 AM, decibel wrote: > On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote: >> 3) There is no easy way to analyze all databases. vacuumdb --analyze >> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an >> unnecessary vacuum. Right now I recommend ANALYZE in every database, >> but it would be nice if there were a single command which did this. > > I actually started on a patch for this (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I justhaven't had time to come back to it for final cleanup and changing the docs as needed. Crap, I see I should have read the whole thread before posting. Sorry for the noise (and not getting the patch completed).:( -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Dne 4.01.10 19:28, Alvaro Herrera napsal(a): > Bruce Momjian escribió: > >> I considered that but realize that pg_migrator has to read >> pg_controldata in both the old and new servers, meaning it would need >> access to both C structures, and considering they both have the same >> structure names, that would require some odd C tricks. Add to that you >> don't know which version of Postgres you are migrating from/to during >> compile and the idea of using C becomes even less attractive. > > However, keep in mind that this might not be the last time on which we > will want to read something from a C struct, so perhaps it would be good > to bite the bullet and write the odd tricks. Does it already have > access (at compile time) to the old and new source trees? I have some proof of concept when each control data struct version version have one header file like pg_control_843.h and structure like ControlFileData has name ControlFileData_843. The main pg_control.h defines types without version like typedef ControlFileData_843 ControlFileData; I planed to do it for 8.5 but unfortunately no time :( commit fest is too close. Zdenek
Zdenek Kotala wrote: > Dne 4.01.10 19:28, Alvaro Herrera napsal(a): > > Bruce Momjian escribi?: > > > >> I considered that but realize that pg_migrator has to read > >> pg_controldata in both the old and new servers, meaning it would need > >> access to both C structures, and considering they both have the same > >> structure names, that would require some odd C tricks. Add to that you > >> don't know which version of Postgres you are migrating from/to during > >> compile and the idea of using C becomes even less attractive. > > > > However, keep in mind that this might not be the last time on which we > > will want to read something from a C struct, so perhaps it would be good > > to bite the bullet and write the odd tricks. Does it already have > > access (at compile time) to the old and new source trees? > > I have some proof of concept when each control data struct version > version have one header file like pg_control_843.h and structure like > ControlFileData has name ControlFileData_843. The main pg_control.h > defines types without version like > > typedef ControlFileData_843 ControlFileData; > > I planed to do it for 8.5 but unfortunately no time :( commit fest is > too close. Yea, I think the confusion of doing it isn't worth it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > 2) Right now pg_migrator renames old tablespaces to .old, which fails > if the tablespaces are on mount points. I have already received a > report of such a failure. $PGDATA also has that issue, but that > renaming has to be done by the user before pg_migrator is run, and only > if they want to keep the same $PGDATA value after migration, i.e. no > version-specific directory path. One idea we floated around was to have > tablespaces use major version directory names under the tablespace > directory so renaming would not be necessary. I could implement a > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files > which are not in a version-specific subdirectory. FYI, pg_migrator CVS now uses the relfilenode preservation ability in 8.5 to avoid the creation of placeholder relfilenodes and renaming. It also recommends using vacuumdb --only-analyze. To simplify pg_migrator, you can now only migrate _to_ 8.5 as of today's CVS, not earlier 8.5 CVS trees. One interesting aspect of pg_migrator is that while it has to carry around support for upgrading _from_ many old releases, the target/to version support can stay limited, i.e. I doubt anyone is going to want to use pg_migrator to migrate to 8.4 in a year or two --- they will be migrating to 8.5. We can also consider removing 8.4 target migration support in pg_migrator 8.5 and just tell people they have to use pg_migrator 8.4.X for a migration to PG 8.4.X. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On mån, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote: > Yea, I am not excited about having vacuumdb do only analyze, but it > seems the most minimal solution. I spelled it --only-analyze and just > posted the reason and patch. I can't find the patch and the reason, but note that we already have other options like --data-only, --schema-only, --tuples-only. I personally don't like the spelling of --only-analyze.
Peter Eisentraut <peter_e@gmx.net> writes: > On mån, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote: >> Yea, I am not excited about having vacuumdb do only analyze, but it >> seems the most minimal solution. I spelled it --only-analyze and just >> posted the reason and patch. > I can't find the patch and the reason, but note that we already have > other options like --data-only, --schema-only, --tuples-only. I > personally don't like the spelling of --only-analyze. In particular note that pg_dump has options --schema and --schema-only, and nobody has complained about that. I concur with Peter that this spelling is gratuitously unlike everyplace else. Perhaps we could compromise by calling it "--no-vacuum", instead? regards, tom lane
Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > On mån, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote: > >> Yea, I am not excited about having vacuumdb do only analyze, but it > >> seems the most minimal solution. I spelled it --only-analyze and just > >> posted the reason and patch. > > > I can't find the patch and the reason, but note that we already have > > other options like --data-only, --schema-only, --tuples-only. I > > personally don't like the spelling of --only-analyze. > > In particular note that pg_dump has options --schema and --schema-only, > and nobody has complained about that. I concur with Peter that this > spelling is gratuitously unlike everyplace else. Oh, interesting about pg_dump. Let's just go with --analyze-only. --only-analyze is feeling odd to me too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
> Tom Lane wrote: > > Peter Eisentraut <peter_e@gmx.net> writes: > > > On m<C3><A5>n, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote: > > >> Yea, I am not excited about having vacuumdb do only analyze, but it > > >> seems the most minimal solution. I spelled it --only-analyze and just > > >> posted the reason and patch. > > > > > I can't find the patch and the reason, but note that we already have > > > other options like --data-only, --schema-only, --tuples-only. I > > > personally don't like the spelling of --only-analyze. > > > > In particular note that pg_dump has options --schema and --schema-only, > > and nobody has complained about that. I concur with Peter that this > > spelling is gratuitously unlike everyplace else. > > Oh, interesting about pg_dump. Let's just go with --analyze-only. > --only-analyze is feeling odd to me too. Done, attached and applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/ref/vacuumdb.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/vacuumdb.sgml,v retrieving revision 1.47 diff -c -c -r1.47 vacuumdb.sgml *** doc/src/sgml/ref/vacuumdb.sgml 6 Jan 2010 05:31:13 -0000 1.47 --- doc/src/sgml/ref/vacuumdb.sgml 7 Jan 2010 12:37:08 -0000 *************** *** 28,34 **** <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--only-analyze</arg><arg>-o</arg></group> <arg>--table | -t <replaceable>table</replaceable> <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg> </arg> --- 28,34 ---- <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--analyze-only</arg><arg>-o</arg></group> <arg>--table | -t <replaceable>table</replaceable> <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg> </arg> *************** *** 42,48 **** <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--only-analyze</arg><arg>-o</arg></group> </cmdsynopsis> </refsynopsisdiv> --- 42,48 ---- <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--analyze-only</arg><arg>-o</arg></group> </cmdsynopsis> </refsynopsisdiv> *************** *** 143,149 **** <varlistentry> <term><option>-o</option></term> ! <term><option>--only-analyze</option></term> <listitem> <para> Only calculate statistics for use by the optimizer (no vacuum). --- 143,149 ---- <varlistentry> <term><option>-o</option></term> ! <term><option>--analyze-only</option></term> <listitem> <para> Only calculate statistics for use by the optimizer (no vacuum). *************** *** 168,174 **** <para> Clean or analyze <replaceable class="parameter">table</replaceable> only. Column names can be specified only in conjunction with ! the <option>--analyze</option> or <option>--only-analyze</option> options. </para> <tip> <para> --- 168,174 ---- <para> Clean or analyze <replaceable class="parameter">table</replaceable> only. Column names can be specified only in conjunction with ! the <option>--analyze</option> or <option>--analyze-only</option> options. </para> <tip> <para> Index: src/bin/scripts/vacuumdb.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/scripts/vacuumdb.c,v retrieving revision 1.31 diff -c -c -r1.31 vacuumdb.c *** src/bin/scripts/vacuumdb.c 6 Jan 2010 16:04:05 -0000 1.31 --- src/bin/scripts/vacuumdb.c 7 Jan 2010 12:37:08 -0000 *************** *** 15,26 **** static void vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose, ! bool and_analyze, bool only_analyze, bool freeze, const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo); static void vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze, ! bool only_analyze, bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet); --- 15,26 ---- static void vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose, ! bool and_analyze, bool analyze_only, bool freeze, const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo); static void vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze, ! bool analyze_only, bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet); *************** *** 41,47 **** {"quiet", no_argument, NULL, 'q'}, {"dbname", required_argument, NULL, 'd'}, {"analyze", no_argument, NULL, 'z'}, ! {"only-analyze", no_argument, NULL, 'o'}, {"freeze", no_argument, NULL, 'F'}, {"all", no_argument, NULL, 'a'}, {"table", required_argument, NULL, 't'}, --- 41,47 ---- {"quiet", no_argument, NULL, 'q'}, {"dbname", required_argument, NULL, 'd'}, {"analyze", no_argument, NULL, 'z'}, ! {"analyze-only", no_argument, NULL, 'o'}, {"freeze", no_argument, NULL, 'F'}, {"all", no_argument, NULL, 'a'}, {"table", required_argument, NULL, 't'}, *************** *** 63,69 **** bool echo = false; bool quiet = false; bool and_analyze = false; ! bool only_analyze = false; bool freeze = false; bool alldb = false; char *table = NULL; --- 63,69 ---- bool echo = false; bool quiet = false; bool and_analyze = false; ! bool analyze_only = false; bool freeze = false; bool alldb = false; char *table = NULL; *************** *** 108,114 **** and_analyze = true; break; case 'o': ! only_analyze = true; break; case 'F': freeze = true; --- 108,114 ---- and_analyze = true; break; case 'o': ! analyze_only = true; break; case 'F': freeze = true; *************** *** 155,161 **** exit(1); } ! if (only_analyze) { if (full) { --- 155,161 ---- exit(1); } ! if (analyze_only) { if (full) { *************** *** 169,175 **** progname); exit(1); } ! /* allow 'and_analyze' with 'only_analyze' */ } setup_cancel_handler(); --- 169,175 ---- progname); exit(1); } ! /* allow 'and_analyze' with 'analyze_only' */ } setup_cancel_handler(); *************** *** 189,195 **** exit(1); } ! vacuum_all_databases(full, inplace, verbose, and_analyze, only_analyze, freeze, host, port, username, prompt_password, progname, echo, quiet); } --- 189,195 ---- exit(1); } ! vacuum_all_databases(full, inplace, verbose, and_analyze, analyze_only, freeze, host, port, username, prompt_password, progname, echo, quiet); } *************** *** 205,211 **** dbname = get_user_name(progname); } ! vacuum_one_database(dbname, full, inplace, verbose, and_analyze, only_analyze, freeze, table, host, port, username, prompt_password, progname, echo); --- 205,211 ---- dbname = get_user_name(progname); } ! vacuum_one_database(dbname, full, inplace, verbose, and_analyze, analyze_only, freeze, table, host, port, username, prompt_password, progname, echo); *************** *** 217,223 **** static void vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose, bool and_analyze, ! bool only_analyze, bool freeze, const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo) --- 217,223 ---- static void vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose, bool and_analyze, ! bool analyze_only, bool freeze, const char *table, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo) *************** *** 230,236 **** conn = connectDatabase(dbname, host, port, username, prompt_password, progname); ! if (only_analyze) { appendPQExpBuffer(&sql, "ANALYZE"); if (verbose) --- 230,236 ---- conn = connectDatabase(dbname, host, port, username, prompt_password, progname); ! if (analyze_only) { appendPQExpBuffer(&sql, "ANALYZE"); if (verbose) *************** *** 306,312 **** static void ! vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze, bool only_analyze, bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet) --- 306,312 ---- static void ! vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze, bool analyze_only, bool freeze, const char *host, const char *port, const char *username, enum trivalue prompt_password, const char *progname, bool echo, bool quiet) *************** *** 329,335 **** fflush(stdout); } ! vacuum_one_database(dbname, full, inplace, verbose, and_analyze, only_analyze, freeze, NULL, host, port, username, prompt_password, progname, echo); } --- 329,335 ---- fflush(stdout); } ! vacuum_one_database(dbname, full, inplace, verbose, and_analyze, analyze_only, freeze, NULL, host, port, username, prompt_password, progname, echo); } *************** *** 351,357 **** printf(_(" -f, --full do full vacuuming\n")); printf(_(" -F, --freeze freeze row transaction information\n")); printf(_(" -i, --inplace do full inplace vacuuming\n")); ! printf(_(" -o, --only-analyze only update optimizer hints\n")); printf(_(" -q, --quiet don't write any messages\n")); printf(_(" -t, --table='TABLE[(COLUMNS)]' vacuum specific table only\n")); printf(_(" -v, --verbose write a lot of output\n")); --- 351,357 ---- printf(_(" -f, --full do full vacuuming\n")); printf(_(" -F, --freeze freeze row transaction information\n")); printf(_(" -i, --inplace do full inplace vacuuming\n")); ! printf(_(" -o, --analyze-only only update optimizer hints\n")); printf(_(" -q, --quiet don't write any messages\n")); printf(_(" -t, --table='TABLE[(COLUMNS)]' vacuum specific table only\n")); printf(_(" -v, --verbose write a lot of output\n"));
Bruce Momjian escribió: > > Oh, interesting about pg_dump. Let's just go with --analyze-only. > > --only-analyze is feeling odd to me too. > > Done, attached and applied. <bikeshedding> Why -o and not, say, -Z? I imagine you picked -o for "only" but it seems strange. </> -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > Bruce Momjian escribi?: > > > > Oh, interesting about pg_dump. Let's just go with --analyze-only. > > > --only-analyze is feeling odd to me too. > > > > Done, attached and applied. > > <bikeshedding> > Why -o and not, say, -Z? I imagine you picked -o for "only" but it > seems strange. > </> Hmmm, sure -Z makes sense. Change applied with attached patch. Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/ref/vacuumdb.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/vacuumdb.sgml,v retrieving revision 1.48 diff -c -c -r1.48 vacuumdb.sgml *** doc/src/sgml/ref/vacuumdb.sgml 7 Jan 2010 12:38:55 -0000 1.48 --- doc/src/sgml/ref/vacuumdb.sgml 7 Jan 2010 14:34:39 -0000 *************** *** 28,34 **** <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--analyze-only</arg><arg>-o</arg></group> <arg>--table | -t <replaceable>table</replaceable> <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg> </arg> --- 28,34 ---- <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--analyze-only</arg><arg>-Z</arg></group> <arg>--table | -t <replaceable>table</replaceable> <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg> </arg> *************** *** 42,48 **** <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--analyze-only</arg><arg>-o</arg></group> </cmdsynopsis> </refsynopsisdiv> --- 42,48 ---- <group><arg>--freeze</arg><arg>-F</arg></group> <group><arg>--verbose</arg><arg>-v</arg></group> <group><arg>--analyze</arg><arg>-z</arg></group> ! <group><arg>--analyze-only</arg><arg>-Z</arg></group> </cmdsynopsis> </refsynopsisdiv> *************** *** 142,148 **** </varlistentry> <varlistentry> ! <term><option>-o</option></term> <term><option>--analyze-only</option></term> <listitem> <para> --- 142,148 ---- </varlistentry> <varlistentry> ! <term><option>-Z</option></term> <term><option>--analyze-only</option></term> <listitem> <para> Index: src/bin/scripts/vacuumdb.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/scripts/vacuumdb.c,v retrieving revision 1.32 diff -c -c -r1.32 vacuumdb.c *** src/bin/scripts/vacuumdb.c 7 Jan 2010 12:38:55 -0000 1.32 --- src/bin/scripts/vacuumdb.c 7 Jan 2010 14:34:44 -0000 *************** *** 41,47 **** {"quiet", no_argument, NULL, 'q'}, {"dbname", required_argument, NULL, 'd'}, {"analyze", no_argument, NULL, 'z'}, ! {"analyze-only", no_argument, NULL, 'o'}, {"freeze", no_argument, NULL, 'F'}, {"all", no_argument, NULL, 'a'}, {"table", required_argument, NULL, 't'}, --- 41,47 ---- {"quiet", no_argument, NULL, 'q'}, {"dbname", required_argument, NULL, 'd'}, {"analyze", no_argument, NULL, 'z'}, ! {"analyze-only", no_argument, NULL, 'Z'}, {"freeze", no_argument, NULL, 'F'}, {"all", no_argument, NULL, 'a'}, {"table", required_argument, NULL, 't'}, *************** *** 107,113 **** case 'z': and_analyze = true; break; ! case 'o': analyze_only = true; break; case 'F': --- 107,113 ---- case 'z': and_analyze = true; break; ! case 'Z': analyze_only = true; break; case 'F': *************** *** 351,361 **** printf(_(" -f, --full do full vacuuming\n")); printf(_(" -F, --freeze freeze row transaction information\n")); printf(_(" -i, --inplace do full inplace vacuuming\n")); - printf(_(" -o, --analyze-only only update optimizer hints\n")); printf(_(" -q, --quiet don't write any messages\n")); printf(_(" -t, --table='TABLE[(COLUMNS)]' vacuum specific table only\n")); printf(_(" -v, --verbose write a lot of output\n")); printf(_(" -z, --analyze update optimizer hints\n")); printf(_(" --help show this help, then exit\n")); printf(_(" --version output version information, then exit\n")); printf(_("\nConnection options:\n")); --- 351,361 ---- printf(_(" -f, --full do full vacuuming\n")); printf(_(" -F, --freeze freeze row transaction information\n")); printf(_(" -i, --inplace do full inplace vacuuming\n")); printf(_(" -q, --quiet don't write any messages\n")); printf(_(" -t, --table='TABLE[(COLUMNS)]' vacuum specific table only\n")); printf(_(" -v, --verbose write a lot of output\n")); printf(_(" -z, --analyze update optimizer hints\n")); + printf(_(" -Z, --analyze-only only update optimizer hints\n")); printf(_(" --help show this help, then exit\n")); printf(_(" --version output version information, then exit\n")); printf(_("\nConnection options:\n"));
Bruce Momjian wrote: > 2) Right now pg_migrator renames old tablespaces to .old, which fails > if the tablespaces are on mount points. I have already received a > report of such a failure. $PGDATA also has that issue, but that > renaming has to be done by the user before pg_migrator is run, and only > if they want to keep the same $PGDATA value after migration, i.e. no > version-specific directory path. One idea we floated around was to have > tablespaces use major version directory names under the tablespace > directory so renaming would not be necessary. I could implement a > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files > which are not in a version-specific subdirectory. I have created a patch to implement per-cluster directories in tablespaces. This is for use by pg_migrator so it doesn't have to rename the tablespaces during the migration. Users still need to remove the old cluster's tablespace subdirectory, and I can add a --delete-old option to pg_migrator to do that. The old code used a symlink from pg_tblspc/#### to the location directory specified in CREATE TABLESPACE. During CREATE TABLESPACE, a PG_VERSION file is created containing the major version number. Anytime a database object is created in the tablespace, a per-database directory is created. With the new code in this patch, pg_tblspc/#### points to the CREATE TABLESPACE directory just like before, but a new directory, PG_ + major_version + catalog_version, e.g. PG_8.5_201001061, is created and all per-database directories are created under that directory. This directory has the same purpose as the old PG_VERSION file. One disadvantage of this approach is that functions that need to look inside tablespaces must now also specify the version directory, e.g. pg_tablespace_databases(). An alternative approach would be for the pg_tblspc/#### symbolic link to point to the new version directory, PG_*, but that makes removal of the version directory complicated, particularly during WAL replay where we don't have access to the system catalogs, and readlink() to read the symbolic link target is not supported on all operating systems (particularly Win32). I used the version directory pattern "PG_8.5_201001061" because "PG_" helps people realize the directory is for the use of Postgres (PG_VERSION is gone in tablespaces), and the catalog version number enables alpha migrations. The major version number is not necessary but probably useful for administrators. pg_migrator is going to need to know about the version directory too, and it can't use the C macro --- it has to construct the directory pattern based on the contents of pg_control from the old and new servers. And, it is going to be difficult to run pg_control on the old server for pg_migrator --delete-old after migration because it is renamed to pg_control.old --- I will need to create a symbolic link during the time I run pg_controldata. Also, the contents of the tablespace directory for an 8.4 to 8.5 migration is going to be ugly because there will be many numeric directories (for databases), and PG_VERSION (for 8.4), and the PG_8.5_201001061 directory which should not be touched. Can someone explain why TablespaceCreateDbspace() creates a non-symlink directory during recovery if the symlink is missing? Is it just for robustness? I would like to document that more clearly. Comments? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/catalog/catalog.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/catalog.c,v retrieving revision 1.86 diff -c -c -r1.86 catalog.c *** src/backend/catalog/catalog.c 6 Jan 2010 02:41:37 -0000 1.86 --- src/backend/catalog/catalog.c 8 Jan 2010 01:07:41 -0000 *************** *** 115,130 **** else { /* All other tablespaces are accessed via symlinks */ ! pathlen = 10 + OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS + 1 ! + FORKNAMECHARS + 1; path = (char *) palloc(pathlen); if (forknum != MAIN_FORKNUM) ! snprintf(path, pathlen, "pg_tblspc/%u/%u/%u_%s", ! rnode.spcNode, rnode.dbNode, rnode.relNode, ! forkNames[forknum]); else ! snprintf(path, pathlen, "pg_tblspc/%u/%u/%u", ! rnode.spcNode, rnode.dbNode, rnode.relNode); } return path; } --- 115,131 ---- else { /* All other tablespaces are accessed via symlinks */ ! pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + ! 1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1; path = (char *) palloc(pathlen); if (forknum != MAIN_FORKNUM) ! snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s", ! rnode.spcNode, TABLESPACE_VERSION_DIRECTORY, ! rnode.dbNode, rnode.relNode, forkNames[forknum]); else ! snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u", ! rnode.spcNode, TABLESPACE_VERSION_DIRECTORY, ! rnode.dbNode, rnode.relNode); } return path; } *************** *** 161,170 **** else { /* All other tablespaces are accessed via symlinks */ ! pathlen = 10 + OIDCHARS + 1 + OIDCHARS + 1; path = (char *) palloc(pathlen); ! snprintf(path, pathlen, "pg_tblspc/%u/%u", ! spcNode, dbNode); } return path; } --- 162,172 ---- else { /* All other tablespaces are accessed via symlinks */ ! pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + ! 1 + OIDCHARS + 1; path = (char *) palloc(pathlen); ! snprintf(path, pathlen, "pg_tblspc/%u/%s/%u", ! spcNode, TABLESPACE_VERSION_DIRECTORY, dbNode); } return path; } Index: src/backend/commands/tablespace.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v retrieving revision 1.70 diff -c -c -r1.70 tablespace.c *** src/backend/commands/tablespace.c 7 Jan 2010 04:10:39 -0000 1.70 --- src/backend/commands/tablespace.c 8 Jan 2010 01:07:41 -0000 *************** *** 15,22 **** * To support file access via the information given in RelFileNode, we * maintain a symbolic-link map in $PGDATA/pg_tblspc. The symlinks are * named by tablespace OIDs and point to the actual tablespace directories. * Thus the full path to an arbitrary file is ! * $PGDATA/pg_tblspc/spcoid/dboid/relfilenode * * There are two tablespaces created at initdb time: pg_global (for shared * tables) and pg_default (for everything else). For backwards compatibility --- 15,25 ---- * To support file access via the information given in RelFileNode, we * maintain a symbolic-link map in $PGDATA/pg_tblspc. The symlinks are * named by tablespace OIDs and point to the actual tablespace directories. + * There is also a per-cluster version directory in each tablespace. * Thus the full path to an arbitrary file is ! * $PGDATA/pg_tblspc/spcoid/PG_MAJORVER_CATVER/dboid/relfilenode ! * e.g. ! * $PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814 * * There are two tablespaces created at initdb time: pg_global (for shared * tables) and pg_default (for everything else). For backwards compatibility *************** *** 81,88 **** char *temp_tablespaces = NULL; ! static bool remove_tablespace_directories(Oid tablespaceoid, bool redo); ! static void write_version_file(const char *path); /* --- 84,92 ---- char *temp_tablespaces = NULL; ! static void create_tablespace_directories(const char *location, ! const Oid tablespaceoid); ! static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo); /* *************** *** 146,163 **** { char *parentdir; ! /* Failure other than not exists? */ if (errno != ENOENT || !isRedo) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create directory \"%s\": %m", dir))); ! /* Parent directory must be missing */ parentdir = pstrdup(dir); get_parent_directory(parentdir); ! /* Can't create parent either? */ ! if (mkdir(parentdir, S_IRWXU) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create directory \"%s\": %m", --- 150,185 ---- { char *parentdir; ! /* Failure other than not exists or not in WAL replay? */ if (errno != ENOENT || !isRedo) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create directory \"%s\": %m", dir))); ! /* ! * Parent directories are missing during WAL replay, so ! * continue by creating simple parent directories ! * rather than a symlink. ! */ ! ! /* create two parents up if not exist */ parentdir = pstrdup(dir); get_parent_directory(parentdir); ! get_parent_directory(parentdir); ! /* Can't create parent and it doesn't already exist? */ ! if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST) ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not create directory \"%s\": %m", ! parentdir))); ! pfree(parentdir); ! ! /* create one parent up if not exist */ ! parentdir = pstrdup(dir); ! get_parent_directory(parentdir); ! /* Can't create parent and it doesn't already exist? */ ! if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create directory \"%s\": %m", *************** *** 212,218 **** HeapTuple tuple; Oid tablespaceoid; char *location; - char *linkloc; Oid ownerId; /* Must be super user */ --- 234,239 ---- *************** *** 251,260 **** /* * Check that location isn't too long. Remember that we're going to append ! * '/<dboid>/<relid>.<nnn>' (XXX but do we ever form the whole path ! * explicitly? This may be overly conservative.) */ ! if (strlen(location) >= MAXPGPATH - 1 - OIDCHARS - 1 - OIDCHARS - 1 - OIDCHARS) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location \"%s\" is too long", --- 272,282 ---- /* * Check that location isn't too long. Remember that we're going to append ! * 'PG_XXX/<dboid>/<relid>.<nnn>'. FYI, we never actually reference the ! * whole path, but mkdir() uses the first two parts. */ ! if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + ! OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("tablespace location \"%s\" is too long", *************** *** 311,355 **** /* Record dependency on owner */ recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId); ! /* ! * Attempt to coerce target directory to safe permissions. If this fails, ! * it doesn't exist or has the wrong owner. ! */ ! if (chmod(location, 0700) != 0) ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not set permissions on directory \"%s\": %m", ! location))); ! ! /* ! * Check the target directory is empty. ! */ ! if (!directory_is_empty(location)) ! ereport(ERROR, ! (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), ! errmsg("directory \"%s\" is not empty", ! location))); ! ! /* ! * Create the PG_VERSION file in the target directory. This has several ! * purposes: to make sure we can write in the directory, to prevent ! * someone from creating another tablespace pointing at the same directory ! * (the emptiness check above will fail), and to label tablespace ! * directories by PG version. ! */ ! write_version_file(location); ! ! /* ! * All seems well, create the symlink ! */ ! linkloc = (char *) palloc(OIDCHARS + OIDCHARS + 1); ! sprintf(linkloc, "pg_tblspc/%u", tablespaceoid); ! ! if (symlink(location, linkloc) < 0) ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not create symbolic link \"%s\": %m", ! linkloc))); /* Record the filesystem change in XLOG */ { --- 333,339 ---- /* Record dependency on owner */ recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId); ! create_tablespace_directories(location, tablespaceoid); /* Record the filesystem change in XLOG */ { *************** *** 378,384 **** */ ForceSyncCommit(); - pfree(linkloc); pfree(location); /* We keep the lock on pg_tablespace until commit */ --- 362,367 ---- *************** *** 478,484 **** /* * Try to remove the physical infrastructure. */ ! if (!remove_tablespace_directories(tablespaceoid, false)) { /* * Not all files deleted? However, there can be lingering empty files --- 461,467 ---- /* * Try to remove the physical infrastructure. */ ! if (!destroy_tablespace_directories(tablespaceoid, false)) { /* * Not all files deleted? However, there can be lingering empty files *************** *** 490,496 **** * out any lingering files, and try again. */ RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); ! if (!remove_tablespace_directories(tablespaceoid, false)) { /* Still not empty, the files must be important then */ ereport(ERROR, --- 473,479 ---- * out any lingering files, and try again. */ RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); ! if (!destroy_tablespace_directories(tablespaceoid, false)) { /* Still not empty, the files must be important then */ ereport(ERROR, *************** *** 542,565 **** #endif /* HAVE_SYMLINK */ } /* ! * remove_tablespace_directories: attempt to remove filesystem infrastructure * ! * Returns TRUE if successful, FALSE if some subdirectory is not empty * ! * redo indicates we are redoing a drop from XLOG; okay if nothing there */ static bool ! remove_tablespace_directories(Oid tablespaceoid, bool redo) { ! char *location; DIR *dirdesc; struct dirent *de; char *subfile; struct stat st; ! location = (char *) palloc(OIDCHARS + OIDCHARS + 1); ! sprintf(location, "pg_tblspc/%u", tablespaceoid); /* * Check if the tablespace still contains any files. We try to rmdir each --- 525,621 ---- #endif /* HAVE_SYMLINK */ } + /* ! * create_tablespace_directories * ! * Attempt to create filesystem infrastructure linking $PGDATA/pg_tblspc/ ! * to the specified directory ! */ ! static void ! create_tablespace_directories(const char *location, const Oid tablespaceoid) ! { ! char *linkloc = palloc(OIDCHARS + OIDCHARS + 1); ! char *location_with_version_dir = palloc(strlen(location) + 1 + ! strlen(TABLESPACE_VERSION_DIRECTORY) + 1); ! ! sprintf(linkloc, "pg_tblspc/%u", tablespaceoid); ! sprintf(location_with_version_dir, "%s/%s", location, ! TABLESPACE_VERSION_DIRECTORY); ! ! /* ! * Attempt to coerce target directory to safe permissions. If this fails, ! * it doesn't exist or has the wrong owner. ! */ ! if (chmod(location, 0700) != 0) ! { ! if (errno == ENOENT) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_FILE), ! errmsg("directory \"%s\" does not exist", ! location))); ! else ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not set permissions on directory \"%s\": %m", ! location))); ! } ! ! /* ! * The creation of the version directory prevents more than one ! * tablespace in a single location. ! */ ! if (mkdir(location_with_version_dir, S_IRWXU) < 0) ! { ! if (errno == EEXIST) ! ereport(ERROR, ! (errcode(ERRCODE_OBJECT_IN_USE), ! errmsg("directory \"%s\" already in use as a tablespace", ! location_with_version_dir))); ! else ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not create directory \"%s\": %m", ! location_with_version_dir))); ! } ! ! /* ! * Create the symlink under PGDATA ! */ ! if (symlink(location, linkloc) < 0) ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not create symbolic link \"%s\": %m", ! linkloc))); ! ! pfree(linkloc); ! pfree(location_with_version_dir); ! } ! ! ! /* ! * destroy_tablespace_directories ! * ! * Attempt to remove filesystem infrastructure ! * ! * 'redo' indicates we are redoing a drop from XLOG; okay if nothing there * ! * Returns TRUE if successful, FALSE if some subdirectory is not empty */ static bool ! destroy_tablespace_directories(Oid tablespaceoid, bool redo) { ! char *linkloc; ! char *linkloc_with_version_dir; DIR *dirdesc; struct dirent *de; char *subfile; struct stat st; ! linkloc_with_version_dir = palloc(9 + 1 + OIDCHARS + 1 + ! strlen(TABLESPACE_VERSION_DIRECTORY)); ! sprintf(linkloc_with_version_dir, "pg_tblspc/%u/%s", tablespaceoid, ! TABLESPACE_VERSION_DIRECTORY); /* * Check if the tablespace still contains any files. We try to rmdir each *************** *** 582,588 **** * and symlink. We want to allow a new DROP attempt to succeed at * removing the catalog entries, so we should not give a hard error here. */ ! dirdesc = AllocateDir(location); if (dirdesc == NULL) { if (errno == ENOENT) --- 638,644 ---- * and symlink. We want to allow a new DROP attempt to succeed at * removing the catalog entries, so we should not give a hard error here. */ ! dirdesc = AllocateDir(linkloc_with_version_dir); if (dirdesc == NULL) { if (errno == ENOENT) *************** *** 591,622 **** ereport(WARNING, (errcode_for_file_access(), errmsg("could not open directory \"%s\": %m", ! location))); ! pfree(location); return true; } /* else let ReadDir report the error */ } ! while ((de = ReadDir(dirdesc, location)) != NULL) { - /* Note we ignore PG_VERSION for the nonce */ if (strcmp(de->d_name, ".") == 0 || ! strcmp(de->d_name, "..") == 0 || ! strcmp(de->d_name, "PG_VERSION") == 0) continue; ! subfile = palloc(strlen(location) + 1 + strlen(de->d_name) + 1); ! sprintf(subfile, "%s/%s", location, de->d_name); /* This check is just to deliver a friendlier error message */ if (!directory_is_empty(subfile)) { FreeDir(dirdesc); return false; } ! /* Do the real deed */ if (rmdir(subfile) < 0) ereport(ERROR, (errcode_for_file_access(), --- 647,678 ---- ereport(WARNING, (errcode_for_file_access(), errmsg("could not open directory \"%s\": %m", ! linkloc_with_version_dir))); ! pfree(linkloc_with_version_dir); return true; } /* else let ReadDir report the error */ } ! while ((de = ReadDir(dirdesc, linkloc_with_version_dir)) != NULL) { if (strcmp(de->d_name, ".") == 0 || ! strcmp(de->d_name, "..") == 0) continue; ! subfile = palloc(strlen(linkloc_with_version_dir) + 1 + strlen(de->d_name) + 1); ! sprintf(subfile, "%s/%s", linkloc_with_version_dir, de->d_name); /* This check is just to deliver a friendlier error message */ if (!directory_is_empty(subfile)) { FreeDir(dirdesc); + pfree(subfile); + pfree(linkloc_with_version_dir); return false; } ! /* remove empty directory */ if (rmdir(subfile) < 0) ereport(ERROR, (errcode_for_file_access(), *************** *** 628,706 **** FreeDir(dirdesc); /* ! * Okay, try to unlink PG_VERSION (we allow it to not be there, even in ! * non-REDO case, for robustness). ! */ ! subfile = palloc(strlen(location) + 11 + 1); ! sprintf(subfile, "%s/PG_VERSION", location); ! ! if (unlink(subfile) < 0) ! { ! if (errno != ENOENT) ! ereport(ERROR, ! (errcode_for_file_access(), ! errmsg("could not remove file \"%s\": %m", ! subfile))); ! } ! ! pfree(subfile); ! ! /* ! * Okay, try to remove the symlink. We must however deal with the * possibility that it's a directory instead of a symlink --- this could * happen during WAL replay (see TablespaceCreateDbspace), and it is also ! * the normal case on Windows. */ ! if (lstat(location, &st) == 0 && S_ISDIR(st.st_mode)) { ! if (rmdir(location) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not remove directory \"%s\": %m", ! location))); } else { ! if (unlink(location) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not remove symbolic link \"%s\": %m", ! location))); } ! pfree(location); return true; } - /* - * write out the PG_VERSION file in the specified directory - */ - static void - write_version_file(const char *path) - { - char *fullname; - FILE *version_file; - - /* Now write the file */ - fullname = palloc(strlen(path) + 11 + 1); - sprintf(fullname, "%s/PG_VERSION", path); - - if ((version_file = AllocateFile(fullname, PG_BINARY_W)) == NULL) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not write to file \"%s\": %m", - fullname))); - fprintf(version_file, "%s\n", PG_MAJORVERSION); - if (FreeFile(version_file)) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not write to file \"%s\": %m", - fullname))); - - pfree(fullname); - } /* * Check if a directory is empty. --- 684,727 ---- FreeDir(dirdesc); + /* remove version directory */ + if (rmdir(linkloc_with_version_dir) < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not remove directory \"%s\": %m", + linkloc_with_version_dir))); + /* ! * Try to remove the symlink. We must however deal with the * possibility that it's a directory instead of a symlink --- this could * happen during WAL replay (see TablespaceCreateDbspace), and it is also ! * the case on Windows where junction points lstat() as directories. */ ! linkloc = pstrdup(linkloc_with_version_dir); ! get_parent_directory(linkloc); ! if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode)) { ! if (rmdir(linkloc) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not remove directory \"%s\": %m", ! linkloc))); } else { ! if (unlink(linkloc) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not remove symbolic link \"%s\": %m", ! linkloc))); } ! pfree(linkloc_with_version_dir); ! pfree(linkloc); return true; } /* * Check if a directory is empty. *************** *** 728,733 **** --- 749,755 ---- return true; } + /* * Rename a tablespace */ *************** *** 1336,1370 **** { xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record); char *location = xlrec->ts_path; - char *linkloc; - - /* - * Attempt to coerce target directory to safe permissions. If this - * fails, it doesn't exist or has the wrong owner. - */ - if (chmod(location, 0700) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not set permissions on directory \"%s\": %m", - location))); - - /* Create or re-create the PG_VERSION file in the target directory */ - write_version_file(location); - - /* Create the symlink if not already present */ - linkloc = (char *) palloc(OIDCHARS + OIDCHARS + 1); - sprintf(linkloc, "pg_tblspc/%u", xlrec->ts_id); - - if (symlink(location, linkloc) < 0) - { - if (errno != EEXIST) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not create symbolic link \"%s\": %m", - linkloc))); - } ! pfree(linkloc); } else if (info == XLOG_TBLSPC_DROP) { --- 1358,1365 ---- { xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record); char *location = xlrec->ts_path; ! create_tablespace_directories(location, xlrec->ts_id); } else if (info == XLOG_TBLSPC_DROP) { *************** *** 1380,1386 **** * remove all files then do conflict processing and try again, * if currently enabled. */ ! if (!remove_tablespace_directories(xlrec->ts_id, true)) { VirtualTransactionId *temp_file_users; --- 1375,1381 ---- * remove all files then do conflict processing and try again, * if currently enabled. */ ! if (!destroy_tablespace_directories(xlrec->ts_id, true)) { VirtualTransactionId *temp_file_users; *************** *** 1416,1422 **** * exited by now. So lets recheck before we throw an error. * If !process_conflicts then this will just fail again. */ ! if (!remove_tablespace_directories(xlrec->ts_id, true)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("tablespace %u is not empty", --- 1411,1417 ---- * exited by now. So lets recheck before we throw an error. * If !process_conflicts then this will just fail again. */ ! if (!destroy_tablespace_directories(xlrec->ts_id, true)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("tablespace %u is not empty", Index: src/backend/storage/file/fd.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/file/fd.c,v retrieving revision 1.152 diff -c -c -r1.152 fd.c *** src/backend/storage/file/fd.c 2 Jan 2010 16:57:51 -0000 1.152 --- src/backend/storage/file/fd.c 8 Jan 2010 01:07:41 -0000 *************** *** 51,56 **** --- 51,57 ---- #include "miscadmin.h" #include "access/xact.h" + #include "catalog/catalog.h" #include "catalog/pg_tablespace.h" #include "storage/fd.h" #include "storage/ipc.h" *************** *** 963,970 **** else { /* All other tablespaces are accessed via symlinks */ ! snprintf(tempdirpath, sizeof(tempdirpath), "pg_tblspc/%u/%s", ! tblspcOid, PG_TEMP_FILES_DIR); } /* --- 964,971 ---- else { /* All other tablespaces are accessed via symlinks */ ! snprintf(tempdirpath, sizeof(tempdirpath), "pg_tblspc/%u/%s/%s", ! tblspcOid, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR); } /* *************** *** 1841,1848 **** strcmp(spc_de->d_name, "..") == 0) continue; ! snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s", ! spc_de->d_name, PG_TEMP_FILES_DIR); RemovePgTempFilesInDir(temp_path); } --- 1842,1849 ---- strcmp(spc_de->d_name, "..") == 0) continue; ! snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s/%s", ! spc_de->d_name, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR); RemovePgTempFilesInDir(temp_path); } Index: src/backend/utils/adt/dbsize.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/dbsize.c,v retrieving revision 1.25 diff -c -c -r1.25 dbsize.c *** src/backend/utils/adt/dbsize.c 2 Jan 2010 16:57:53 -0000 1.25 --- src/backend/utils/adt/dbsize.c 8 Jan 2010 01:07:41 -0000 *************** *** 108,115 **** strcmp(direntry->d_name, "..") == 0) continue; ! snprintf(pathname, MAXPGPATH, "pg_tblspc/%s/%u", ! direntry->d_name, dbOid); totalsize += db_dir_size(pathname); } --- 108,115 ---- strcmp(direntry->d_name, "..") == 0) continue; ! snprintf(pathname, MAXPGPATH, "pg_tblspc/%s/%s/%u", ! direntry->d_name, TABLESPACE_VERSION_DIRECTORY, dbOid); totalsize += db_dir_size(pathname); } *************** *** 179,185 **** else if (tblspcOid == GLOBALTABLESPACE_OID) snprintf(tblspcPath, MAXPGPATH, "global"); else ! snprintf(tblspcPath, MAXPGPATH, "pg_tblspc/%u", tblspcOid); dirdesc = AllocateDir(tblspcPath); --- 179,186 ---- else if (tblspcOid == GLOBALTABLESPACE_OID) snprintf(tblspcPath, MAXPGPATH, "global"); else ! snprintf(tblspcPath, MAXPGPATH, "pg_tblspc/%u/%s", tblspcOid, ! TABLESPACE_VERSION_DIRECTORY); dirdesc = AllocateDir(tblspcPath); Index: src/backend/utils/adt/misc.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v retrieving revision 1.73 diff -c -c -r1.73 misc.c *** src/backend/utils/adt/misc.c 2 Jan 2010 16:57:54 -0000 1.73 --- src/backend/utils/adt/misc.c 8 Jan 2010 01:07:41 -0000 *************** *** 20,25 **** --- 20,26 ---- #include <math.h> #include "access/xact.h" + #include "catalog/catalog.h" #include "catalog/pg_type.h" #include "catalog/pg_tablespace.h" #include "commands/dbcommands.h" *************** *** 185,191 **** /* * size = tablespace dirname length + dir sep char + oid + terminator */ ! fctx->location = (char *) palloc(10 + 10 + 1); if (tablespaceOid == GLOBALTABLESPACE_OID) { fctx->dirdesc = NULL; --- 186,193 ---- /* * size = tablespace dirname length + dir sep char + oid + terminator */ ! fctx->location = (char *) palloc(9 + 1 + OIDCHARS + 1 + ! strlen(TABLESPACE_VERSION_DIRECTORY) + 1); if (tablespaceOid == GLOBALTABLESPACE_OID) { fctx->dirdesc = NULL; *************** *** 197,203 **** if (tablespaceOid == DEFAULTTABLESPACE_OID) sprintf(fctx->location, "base"); else ! sprintf(fctx->location, "pg_tblspc/%u", tablespaceOid); fctx->dirdesc = AllocateDir(fctx->location); --- 199,206 ---- if (tablespaceOid == DEFAULTTABLESPACE_OID) sprintf(fctx->location, "base"); else ! sprintf(fctx->location, "pg_tblspc/%u/%s", tablespaceOid, ! TABLESPACE_VERSION_DIRECTORY); fctx->dirdesc = AllocateDir(fctx->location); Index: src/backend/utils/cache/relcache.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v retrieving revision 1.296 diff -c -c -r1.296 relcache.c *** src/backend/utils/cache/relcache.c 5 Jan 2010 01:06:56 -0000 1.296 --- src/backend/utils/cache/relcache.c 8 Jan 2010 01:07:50 -0000 *************** *** 4297,4304 **** if (strspn(de->d_name, "0123456789") == strlen(de->d_name)) { /* Scan the tablespace dir for per-database dirs */ ! snprintf(path, sizeof(path), "%s/%s", ! tblspcdir, de->d_name); RelationCacheInitFileRemoveInDir(path); } } --- 4297,4304 ---- if (strspn(de->d_name, "0123456789") == strlen(de->d_name)) { /* Scan the tablespace dir for per-database dirs */ ! snprintf(path, sizeof(path), "%s/%s/%s", ! tblspcdir, de->d_name, TABLESPACE_VERSION_DIRECTORY); RelationCacheInitFileRemoveInDir(path); } } Index: src/include/catalog/catalog.h =================================================================== RCS file: /cvsroot/pgsql/src/include/catalog/catalog.h,v retrieving revision 1.46 diff -c -c -r1.46 catalog.h *** src/include/catalog/catalog.h 6 Jan 2010 02:41:37 -0000 1.46 --- src/include/catalog/catalog.h 8 Jan 2010 01:07:50 -0000 *************** *** 14,24 **** --- 14,27 ---- #ifndef CATALOG_H #define CATALOG_H + #include "catalog/catversion.h" #include "catalog/pg_class.h" #include "storage/relfilenode.h" #include "utils/relcache.h" #define OIDCHARS 10 /* max chars printed by %u */ + #define TABLESPACE_VERSION_DIRECTORY "PG_" PG_MAJORVERSION "_" \ + CppAsString2(CATALOG_VERSION_NO) extern const char *forkNames[]; extern ForkNumber forkname_to_number(char *forkName); Index: src/test/regress/output/tablespace.source =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/output/tablespace.source,v retrieving revision 1.9 diff -c -c -r1.9 tablespace.source *** src/test/regress/output/tablespace.source 5 Jan 2010 21:54:00 -0000 1.9 --- src/test/regress/output/tablespace.source 8 Jan 2010 01:07:50 -0000 *************** *** 65,71 **** -- Will fail with bad path CREATE TABLESPACE badspace LOCATION '/no/such/location'; ! ERROR: could not set permissions on directory "/no/such/location": No such file or directory -- No such tablespace CREATE TABLE bar (i int) TABLESPACE nosuchspace; ERROR: tablespace "nosuchspace" does not exist --- 65,71 ---- -- Will fail with bad path CREATE TABLESPACE badspace LOCATION '/no/such/location'; ! ERROR: directory "/no/such/location" does not exist -- No such tablespace CREATE TABLE bar (i int) TABLESPACE nosuchspace; ERROR: tablespace "nosuchspace" does not exist
Bruce Momjian wrote: > Bruce Momjian wrote: > > 2) Right now pg_migrator renames old tablespaces to .old, which fails > > if the tablespaces are on mount points. I have already received a > > report of such a failure. $PGDATA also has that issue, but that > > renaming has to be done by the user before pg_migrator is run, and only > > if they want to keep the same $PGDATA value after migration, i.e. no > > version-specific directory path. One idea we floated around was to have > > tablespaces use major version directory names under the tablespace > > directory so renaming would not be necessary. I could implement a > > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files > > which are not in a version-specific subdirectory. > > I have created a patch to implement per-cluster directories in > tablespaces. This is for use by pg_migrator so it doesn't have to > rename the tablespaces during the migration. Users still need to remove > the old cluster's tablespace subdirectory, and I can add a --delete-old > option to pg_migrator to do that. > > The old code used a symlink from pg_tblspc/#### to the location > directory specified in CREATE TABLESPACE. During CREATE TABLESPACE, a > PG_VERSION file is created containing the major version number. Anytime > a database object is created in the tablespace, a per-database directory > is created. > > With the new code in this patch, pg_tblspc/#### points to the CREATE > TABLESPACE directory just like before, but a new directory, PG_ + > major_version + catalog_version, e.g. PG_8.5_201001061, is created and > all per-database directories are created under that directory. This > directory has the same purpose as the old PG_VERSION file. One > disadvantage of this approach is that functions that need to look inside > tablespaces must now also specify the version directory, e.g. > pg_tablespace_databases(). > > An alternative approach would be for the pg_tblspc/#### symbolic link to > point to the new version directory, PG_*, but that makes removal of the > version directory complicated, particularly during WAL replay where we > don't have access to the system catalogs, and readlink() to read the > symbolic link target is not supported on all operating systems > (particularly Win32). > > I used the version directory pattern "PG_8.5_201001061" because "PG_" > helps people realize the directory is for the use of Postgres > (PG_VERSION is gone in tablespaces), and the catalog version number > enables alpha migrations. The major version number is not necessary but > probably useful for administrators. > > pg_migrator is going to need to know about the version directory too, > and it can't use the C macro --- it has to construct the directory > pattern based on the contents of pg_control from the old and new > servers. And, it is going to be difficult to run pg_control on the old > server for pg_migrator --delete-old after migration because it is > renamed to pg_control.old --- I will need to create a symbolic link > during the time I run pg_controldata. Also, the contents of the > tablespace directory for an 8.4 to 8.5 migration is going to be ugly > because there will be many numeric directories (for databases), and > PG_VERSION (for 8.4), and the PG_8.5_201001061 directory which should > not be touched. > > Can someone explain why TablespaceCreateDbspace() creates a non-symlink > directory during recovery if the symlink is missing? Is it just for > robustness? I would like to document that more clearly. Applied. FYI, I decide to create a pg_migrator_remove_old_cluster.sh/.bat file that can be run by the user after the upgrade, instead of adding a --delete-old-cluster option to pg_migrator. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
FYI, I consider all the issues below to be addressed (we did all but #4), and pg_migrator will take advantage of these new facilities for 8.5. --------------------------------------------------------------------------- Bruce Momjian wrote: > pg_migrator has become more popular recently, so it seems time to look > at some enhancements that would improve pg_migrator. None of these are > required, but rather changes that would be nice to have: > > 1) Right now pg_migrator preserves relfilenodes for TOAST files because > this is required for proper migration. Now that we have shown that > strategically-placed global variables with a server-side function to set > them is a viable solution, it would be nice to preserve all relfilenodes > from the old server. This would simplify pg_migrator by no long > requiring place-holder relfilenodes or the renaming of TOAST files. A > simpler solution would just be to allow TOAST table creation to > automatically remove placeholder files and create specified relfilenodes > via global variables. > > 2) Right now pg_migrator renames old tablespaces to .old, which fails > if the tablespaces are on mount points. I have already received a > report of such a failure. $PGDATA also has that issue, but that > renaming has to be done by the user before pg_migrator is run, and only > if they want to keep the same $PGDATA value after migration, i.e. no > version-specific directory path. One idea we floated around was to have > tablespaces use major version directory names under the tablespace > directory so renaming would not be necessary. I could implement a > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files > which are not in a version-specific subdirectory. > > 3) There is no easy way to analyze all databases. vacuumdb --analyze > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an > unnecessary vacuum. Right now I recommend ANALYZE in every database, > but it would be nice if there were a single command which did this. > > 4) I have implemented the ability to run pg_migrator --check on a live > old server. However, pg_migrator uses information from controldata to > check things, and it also needs xid information that is only available > via pg_resetxlog -n(no update) to perform the migration. Unfortunately, > pg_resetxlog -n cannot be run on a live server, so pg_migrator runs > pg_controldata for --check and pg_resetxlog -n for real upgrades. It > would simplify pg_migrator if I would run pg_resetxlog -n on a live > server, but I can understand if people don't want to do that because the > xid information reported on a live server is inaccurate. > > Comments? > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +