Thread: pg_migrator issues

pg_migrator issues

From

Bruce Momjian

Date:

30 December 2009, 23:50:21

pg_migrator has become more popular recently, so it seems time to look
at some enhancements that would improve pg_migrator.  None of these are
required, but rather changes that would be nice to have:

1)  Right now pg_migrator preserves relfilenodes for TOAST files because
this is required for proper migration.  Now that we have shown that
strategically-placed global variables with a server-side function to set
them is a viable solution, it would be nice to preserve all relfilenodes
from the old server.  This would simplify pg_migrator by no long
requiring place-holder relfilenodes or the renaming of TOAST files.  A
simpler solution would just be to allow TOAST table creation to
automatically remove placeholder files and create specified relfilenodes
via global variables.

2)  Right now pg_migrator renames old tablespaces to .old, which fails
if the tablespaces are on mount points.  I have already received a
report of such a failure.  $PGDATA also has that issue, but that
renaming has to be done by the user before pg_migrator is run, and only
if they want to keep the same $PGDATA value after migration, i.e. no
version-specific directory path.  One idea we floated around was to have
tablespaces use major version directory names under the tablespace
directory so renaming would not be necessary.  I could implement a
pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
which are not in a version-specific subdirectory.

3)  There is no easy way to analyze all databases.  vacuumdb --analyze
does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
unnecessary vacuum.  Right now I recommend ANALYZE in every database,
but it would be nice if there were a single command which did this.

4)  I have implemented the ability to run pg_migrator --check on a live
old server.  However, pg_migrator uses information from controldata to
check things, and it also needs xid information that is only available
via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
would simplify pg_migrator if I would run pg_resetxlog -n on a live
server, but I can understand if people don't want to do that because the
xid information reported on a live server is inaccurate.

Comments?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

03 January 2010, 18:43:30

Bruce Momjian wrote:
> pg_migrator has become more popular recently, so it seems time to look
> at some enhancements that would improve pg_migrator.  None of these are
> required, but rather changes that would be nice to have:
> 
> 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> this is required for proper migration.  Now that we have shown that
> strategically-placed global variables with a server-side function to set
> them is a viable solution, it would be nice to preserve all relfilenodes
> from the old server.  This would simplify pg_migrator by no long
> requiring place-holder relfilenodes or the renaming of TOAST files.  A
> simpler solution would just be to allow TOAST table creation to
> automatically remove placeholder files and create specified relfilenodes
> via global variables.
> 
> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> if the tablespaces are on mount points.  I have already received a
> report of such a failure.  $PGDATA also has that issue, but that
> renaming has to be done by the user before pg_migrator is run, and only
> if they want to keep the same $PGDATA value after migration, i.e. no
> version-specific directory path.  One idea we floated around was to have
> tablespaces use major version directory names under the tablespace
> directory so renaming would not be necessary.  I could implement a
> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
> which are not in a version-specific subdirectory.
> 
> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> but it would be nice if there were a single command which did this.
> 
> 4)  I have implemented the ability to run pg_migrator --check on a live
> old server.  However, pg_migrator uses information from controldata to
> check things, and it also needs xid information that is only available
> via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
> pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
> would simplify pg_migrator if I would run pg_resetxlog -n on a live
> server, but I can understand if people don't want to do that because the
> xid information reported on a live server is inaccurate.
> 
> Comments?

Having received no replies to my email above, I assume the community
would like to review and perhaps approve patches to implement all of
these items.  I will start working on the patches.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Robert Haas

Date:

03 January 2010, 23:26:30

Bruce,

Sorry for not having replied sooner...

On Sun, Jan 3, 2010 at 5:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
>> this is required for proper migration.  Now that we have shown that
>> strategically-placed global variables with a server-side function to set
>> them is a viable solution, it would be nice to preserve all relfilenodes
>> from the old server.  This would simplify pg_migrator by no long
>> requiring place-holder relfilenodes or the renaming of TOAST files.  A
>> simpler solution would just be to allow TOAST table creation to
>> automatically remove placeholder files and create specified relfilenodes
>> via global variables.

I have no opinion on this one way or the other.

>> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
>> if the tablespaces are on mount points.  I have already received a
>> report of such a failure.  $PGDATA also has that issue, but that
>> renaming has to be done by the user before pg_migrator is run, and only
>> if they want to keep the same $PGDATA value after migration, i.e. no
>> version-specific directory path.  One idea we floated around was to have
>> tablespaces use major version directory names under the tablespace
>> directory so renaming would not be necessary.  I could implement a
>> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
>> which are not in a version-specific subdirectory.

I don't really like this.  It seems klunky, and it seems like there
ought to be a way to avoid needing to rename the tablespace
directories at all.

>> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
>> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
>> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
>> but it would be nice if there were a single command which did this.

Something like vacuumdb --analyze-only?  It seems like overkill to
create a whole new command for this, even though vacuumdb doesn't
quite make sense.

>> 4)  I have implemented the ability to run pg_migrator --check on a live
>> old server.  However, pg_migrator uses information from controldata to
>> check things, and it also needs xid information that is only available
>> via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
>> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
>> pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
>> would simplify pg_migrator if I would run pg_resetxlog -n on a live
>> server, but I can understand if people don't want to do that because the
>> xid information reported on a live server is inaccurate.

I don't really have a specific thought on this issue, except that it
sounds like you're launching a lot of shell commands, and I wonder
whether it would be better to try to do this through either C code or
by exposing the appropriate stuff at the SQL level.

...Robert

Re: pg_migrator issues

From

Alvaro Herrera

Date:

04 January 2010, 11:14:21

Bruce Momjian wrote:
> pg_migrator has become more popular recently, so it seems time to look
> at some enhancements that would improve pg_migrator.  None of these are
> required, but rather changes that would be nice to have:
> 
> 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> this is required for proper migration.  Now that we have shown that
> strategically-placed global variables with a server-side function to set
> them is a viable solution, it would be nice to preserve all relfilenodes
> from the old server.  This would simplify pg_migrator by no long
> requiring place-holder relfilenodes or the renaming of TOAST files.  A
> simpler solution would just be to allow TOAST table creation to
> automatically remove placeholder files and create specified relfilenodes
> via global variables.

Getting rid of the need for placeholders is a good idea.  +1 on getting
TOAST tables created with the correct relfilenode from the start.  I
don't know that preserving any other relfilenode is useful; however if
it means you no longer have to rename the files underlying each table,
it would probably also be a good idea.  (I don't know how does
pg_migrator deal with such things currently -- does it keep a map of
table name to relfilenode?)

> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> if the tablespaces are on mount points.  I have already received a
> report of such a failure.

I thought it was impossible to use bare mountpoints as tablespaces due
to ownership problems ... Is that not the case?  -1 for special hacks
that work around bogus setups, if that means intrusive changes to the
core code.

> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> but it would be nice if there were a single command which did this.

+1 for vacuumdb --analyze-only

> 4)  I have implemented the ability to run pg_migrator --check on a live
> old server.  However, pg_migrator uses information from controldata to
> check things, and it also needs xid information that is only available
> via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
> pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
> would simplify pg_migrator if I would run pg_resetxlog -n on a live
> server, but I can understand if people don't want to do that because the
> xid information reported on a live server is inaccurate.

What xid info does it need?  Would it be good enough to use the "next
XID" from most recent checkpoint from pg_controldata?  It is a bit
outdated, but can't you simply add some value to it to have a safety margin?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pg_migrator issues

From

Tom Lane

Date:

04 January 2010, 11:19:02

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Getting rid of the need for placeholders is a good idea.  +1 on getting
> TOAST tables created with the correct relfilenode from the start.  I
> don't know that preserving any other relfilenode is useful; however if
> it means you no longer have to rename the files underlying each table,
> it would probably also be a good idea.

I think this is an all-or-nothing proposition: if you try to preserve
only some relfilenodes, you risk collisions with automatically assigned
ones.  It's just like the situation with pg_type OIDs.

I concur that trying to preserve them looks like it would be less work
than the current method.
        regards, tom lane

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 14:06:12

Alvaro Herrera wrote:
> > 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> > unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> > but it would be nice if there were a single command which did this.
>
> +1 for vacuumdb --analyze-only

OK, I have implemented this using --only-analyze to avoid having the
'--anal' option spelling be ambiguous, which might confuse/frustrate
users.

I also moved the --freeze option documention mention into a more logical
place.

Patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/ref/vacuumdb.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/vacuumdb.sgml,v
retrieving revision 1.45
diff -c -c -r1.45 vacuumdb.sgml
*** doc/src/sgml/ref/vacuumdb.sgml    27 Nov 2009 17:41:26 -0000    1.45
--- doc/src/sgml/ref/vacuumdb.sgml    4 Jan 2010 17:21:34 -0000
***************
*** 24,32 ****
     <command>vacuumdb</command>
     <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
     <group><arg>--full</arg><arg>-f</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--freeze</arg><arg>-F</arg></group>
     <arg>--table | -t <replaceable>table</replaceable>
      <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg>
     </arg>
--- 24,33 ----
     <command>vacuumdb</command>
     <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
     <group><arg>--full</arg><arg>-f</arg></group>
+    <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--only-analyze</arg><arg>-o</arg></group>
     <arg>--table | -t <replaceable>table</replaceable>
      <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg>
     </arg>
***************
*** 36,44 ****
     <arg rep="repeat"><replaceable>connection-options</replaceable></arg>
     <group><arg>--all</arg><arg>-a</arg></group>
     <group><arg>--full</arg><arg>-f</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--freeze</arg><arg>-F</arg></group>
    </cmdsynopsis>
   </refsynopsisdiv>

--- 37,46 ----
     <arg rep="repeat"><replaceable>connection-options</replaceable></arg>
     <group><arg>--all</arg><arg>-a</arg></group>
     <group><arg>--full</arg><arg>-f</arg></group>
+    <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--only-analyze</arg><arg>-o</arg></group>
    </cmdsynopsis>
   </refsynopsisdiv>

***************
*** 56,63 ****
    <para>
     <application>vacuumdb</application> is a wrapper around the SQL
     command <xref linkend="SQL-VACUUM" endterm="SQL-VACUUM-title">.
!    There is no effective difference between vacuuming databases via
!    this utility and via other methods for accessing the server.
    </para>

   </refsect1>
--- 58,66 ----
    <para>
     <application>vacuumdb</application> is a wrapper around the SQL
     command <xref linkend="SQL-VACUUM" endterm="SQL-VACUUM-title">.
!    There is no effective difference between vacuuming and analyzing
!    databases via this utility and via other methods for accessing the
!    server.
    </para>

   </refsect1>
***************
*** 117,122 ****
--- 120,145 ----
       </varlistentry>

       <varlistentry>
+       <term><option>-F</option></term>
+       <term><option>--freeze</option></term>
+       <listitem>
+        <para>
+         Aggressively <quote>freeze</quote> tuples.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><option>-o</option></term>
+       <term><option>--only-analyze</option></term>
+       <listitem>
+        <para>
+         Only calculate statistics for use by the optimizer (no vacuum).
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
        <term><option>-q</></term>
        <term><option>--quiet</></term>
        <listitem>
***************
*** 133,139 ****
         <para>
          Clean or analyze <replaceable class="parameter">table</replaceable> only.
          Column names can be specified only in conjunction with
!         the <option>--analyze</option> option.
         </para>
         <tip>
          <para>
--- 156,162 ----
         <para>
          Clean or analyze <replaceable class="parameter">table</replaceable> only.
          Column names can be specified only in conjunction with
!         the <option>--analyze</option> or <option>--only-analyze</option> options.
         </para>
         <tip>
          <para>
***************
*** 164,178 ****
        </listitem>
       </varlistentry>

-      <varlistentry>
-       <term><option>-F</option></term>
-       <term><option>--freeze</option></term>
-       <listitem>
-        <para>
-         Aggressively <quote>freeze</quote> tuples.
-        </para>
-       </listitem>
-      </varlistentry>
      </variablelist>
     </para>

--- 187,192 ----
Index: src/bin/scripts/vacuumdb.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/scripts/vacuumdb.c,v
retrieving revision 1.28
diff -c -c -r1.28 vacuumdb.c
*** src/bin/scripts/vacuumdb.c    2 Jan 2010 16:58:00 -0000    1.28
--- src/bin/scripts/vacuumdb.c    4 Jan 2010 17:21:34 -0000
***************
*** 14,25 ****
  #include "common.h"


! static void vacuum_one_database(const char *dbname, bool full, bool verbose, bool analyze,
!                     bool freeze, const char *table,
!                     const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo);
! static void vacuum_all_databases(bool full, bool verbose, bool analyze, bool freeze,
                       const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet);
--- 14,26 ----
  #include "common.h"


! static void vacuum_one_database(const char *dbname, bool full, bool verbose,
!                     bool and_analyze, bool only_analyze, bool freeze,
!                     const char *table, const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo);
! static void vacuum_all_databases(bool full, bool verbose, bool and_analyze,
!                      bool only_analyze, bool freeze,
                       const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet);
***************
*** 40,45 ****
--- 41,47 ----
          {"quiet", no_argument, NULL, 'q'},
          {"dbname", required_argument, NULL, 'd'},
          {"analyze", no_argument, NULL, 'z'},
+         {"only-analyze", no_argument, NULL, 'o'},
          {"freeze", no_argument, NULL, 'F'},
          {"all", no_argument, NULL, 'a'},
          {"table", required_argument, NULL, 't'},
***************
*** 59,65 ****
      enum trivalue prompt_password = TRI_DEFAULT;
      bool        echo = false;
      bool        quiet = false;
!     bool        analyze = false;
      bool        freeze = false;
      bool        alldb = false;
      char       *table = NULL;
--- 61,68 ----
      enum trivalue prompt_password = TRI_DEFAULT;
      bool        echo = false;
      bool        quiet = false;
!     bool        and_analyze = false;
!     bool        only_analyze = false;
      bool        freeze = false;
      bool        alldb = false;
      char       *table = NULL;
***************
*** 100,106 ****
                  dbname = optarg;
                  break;
              case 'z':
!                 analyze = true;
                  break;
              case 'F':
                  freeze = true;
--- 103,112 ----
                  dbname = optarg;
                  break;
              case 'z':
!                 and_analyze = true;
!                 break;
!             case 'o':
!                 only_analyze = true;
                  break;
              case 'F':
                  freeze = true;
***************
*** 139,144 ****
--- 145,167 ----

      setup_cancel_handler();

+     if (only_analyze)
+     {
+         if (full)
+         {
+             fprintf(stderr, _("%s: cannot use the \"full\" option when performing only analyze\n"),
+                     progname);
+             exit(1);
+         }
+         if (freeze)
+         {
+             fprintf(stderr, _("%s: cannot use the \"freeze\" option when performing only analyze\n"),
+                     progname);
+             exit(1);
+         }
+         /* ignore 'and_analyze' */
+     }
+
      if (alldb)
      {
          if (dbname)
***************
*** 154,160 ****
              exit(1);
          }

!         vacuum_all_databases(full, verbose, analyze, freeze,
                               host, port, username, prompt_password,
                               progname, echo, quiet);
      }
--- 177,183 ----
              exit(1);
          }

!         vacuum_all_databases(full, verbose, and_analyze, only_analyze, freeze,
                               host, port, username, prompt_password,
                               progname, echo, quiet);
      }
***************
*** 170,176 ****
                  dbname = get_user_name(progname);
          }

!         vacuum_one_database(dbname, full, verbose, analyze, freeze, table,
                              host, port, username, prompt_password,
                              progname, echo);
      }
--- 193,200 ----
                  dbname = get_user_name(progname);
          }

!         vacuum_one_database(dbname, full, verbose, and_analyze, only_analyze,
!                             freeze, table,
                              host, port, username, prompt_password,
                              progname, echo);
      }
***************
*** 180,187 ****


  static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool analyze,
!                     bool freeze, const char *table,
                      const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo)
--- 204,211 ----


  static void
! vacuum_one_database(const char *dbname, bool full, bool verbose, bool and_analyze,
!                     bool only_analyze, bool freeze, const char *table,
                      const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo)
***************
*** 192,206 ****

      initPQExpBuffer(&sql);

!     appendPQExpBuffer(&sql, "VACUUM");
!     if (full)
!         appendPQExpBuffer(&sql, " FULL");
!     if (freeze)
!         appendPQExpBuffer(&sql, " FREEZE");
      if (verbose)
          appendPQExpBuffer(&sql, " VERBOSE");
-     if (analyze)
-         appendPQExpBuffer(&sql, " ANALYZE");
      if (table)
          appendPQExpBuffer(&sql, " %s", table);
      appendPQExpBuffer(&sql, ";\n");
--- 216,235 ----

      initPQExpBuffer(&sql);

!     if (only_analyze)
!         appendPQExpBuffer(&sql, "ANALYZE");
!     else
!     {
!         appendPQExpBuffer(&sql, "VACUUM");
!         if (full)
!             appendPQExpBuffer(&sql, " FULL");
!         if (freeze)
!             appendPQExpBuffer(&sql, " FREEZE");
!         if (and_analyze)
!             appendPQExpBuffer(&sql, " ANALYZE");
!     }
      if (verbose)
          appendPQExpBuffer(&sql, " VERBOSE");
      if (table)
          appendPQExpBuffer(&sql, " %s", table);
      appendPQExpBuffer(&sql, ";\n");
***************
*** 223,230 ****


  static void
! vacuum_all_databases(bool full, bool verbose, bool analyze, bool freeze,
!                      const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet)
  {
--- 252,259 ----


  static void
! vacuum_all_databases(bool full, bool verbose, bool and_analyze, bool only_analyze,
!                      bool freeze, const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet)
  {
***************
*** 246,253 ****
              fflush(stdout);
          }

!         vacuum_one_database(dbname, full, verbose, analyze, freeze, NULL,
!                             host, port, username, prompt_password,
                              progname, echo);
      }

--- 275,282 ----
              fflush(stdout);
          }

!         vacuum_one_database(dbname, full, verbose, and_analyze, only_analyze,
!                             freeze, NULL, host, port, username, prompt_password,
                              progname, echo);
      }

***************
*** 267,272 ****
--- 296,302 ----
      printf(_("  -e, --echo                      show the commands being sent to the server\n"));
      printf(_("  -f, --full                      do full vacuuming\n"));
      printf(_("  -F, --freeze                    freeze row transaction information\n"));
+     printf(_("  -o, --only-analyze              only update optimizer hints\n"));
      printf(_("  -q, --quiet                     don't write any messages\n"));
      printf(_("  -t, --table='TABLE[(COLUMNS)]'  vacuum specific table only\n"));
      printf(_("  -v, --verbose                   write a lot of output\n"));

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 14:07:11

Robert Haas wrote:
> >> 3) ?There is no easy way to analyze all databases. ?vacuumdb --analyze
> >> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> >> unnecessary vacuum. ?Right now I recommend ANALYZE in every database,
> >> but it would be nice if there were a single command which did this.
> 
> Something like vacuumdb --analyze-only?  It seems like overkill to
> create a whole new command for this, even though vacuumdb doesn't
> quite make sense.

Yea, I am not excited about having vacuumdb do only analyze, but it
seems the most minimal solution.  I spelled it --only-analyze and just
posted the reason and patch.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 14:12:57

Robert Haas wrote:
> >> 4) ?I have implemented the ability to run pg_migrator --check on a live
> >> old server. ?However, pg_migrator uses information from controldata to
> >> check things, and it also needs xid information that is only available
> >> via pg_resetxlog -n(no update) to perform the migration. ?Unfortunately,
> >> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
> >> pg_controldata for --check and pg_resetxlog -n for real upgrades. ?It
> >> would simplify pg_migrator if I would run pg_resetxlog -n on a live
> >> server, but I can understand if people don't want to do that because the
> >> xid information reported on a live server is inaccurate.
> 
> I don't really have a specific thought on this issue, except that it
> sounds like you're launching a lot of shell commands, and I wonder
> whether it would be better to try to do this through either C code or
> by exposing the appropriate stuff at the SQL level.

I considered that but realize that pg_migrator has to read
pg_controldata in both the old and new servers, meaning it would need
access to both C structures, and considering they both have the same
structure names, that would require some odd C tricks.  Add to that you
don't know which version of Postgres you are migrating from/to during
compile and the idea of using C becomes even less attractive.

Doing this in C would require pg_migrator to track all changes in the
pg_controldata structure layout, which seems excessively
complex/error-prone.  Right now I only have to track changes to the
naming of the output fields.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Alvaro Herrera

Date:

04 January 2010, 14:29:08

Bruce Momjian escribió:

> I considered that but realize that pg_migrator has to read
> pg_controldata in both the old and new servers, meaning it would need
> access to both C structures, and considering they both have the same
> structure names, that would require some odd C tricks.  Add to that you
> don't know which version of Postgres you are migrating from/to during
> compile and the idea of using C becomes even less attractive.

However, keep in mind that this might not be the last time on which we
will want to read something from a C struct, so perhaps it would be good
to bite the bullet and write the odd tricks.  Does it already have
access (at compile time) to the old and new source trees?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 14:52:12

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > pg_migrator has become more popular recently, so it seems time to look
> > at some enhancements that would improve pg_migrator.  None of these are
> > required, but rather changes that would be nice to have:
> > 
> > 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> > this is required for proper migration.  Now that we have shown that
> > strategically-placed global variables with a server-side function to set
> > them is a viable solution, it would be nice to preserve all relfilenodes
> > from the old server.  This would simplify pg_migrator by no long
> > requiring place-holder relfilenodes or the renaming of TOAST files.  A
> > simpler solution would just be to allow TOAST table creation to
> > automatically remove placeholder files and create specified relfilenodes
> > via global variables.
> 
> Getting rid of the need for placeholders is a good idea.  +1 on getting
> TOAST tables created with the correct relfilenode from the start.  I
> don't know that preserving any other relfilenode is useful; however if
> it means you no longer have to rename the files underlying each table,
> it would probably also be a good idea.  (I don't know how does
> pg_migrator deal with such things currently -- does it keep a map of
> table name to relfilenode?)

Yea, as Tom said later, there are two options.  Either we create
placeholder files and then remove the place-holders when we create the
toast tables or we just preserve all relfilenodes --- I think the later
is easier.

> > 4)  I have implemented the ability to run pg_migrator --check on a live
> > old server.  However, pg_migrator uses information from controldata to
> > check things, and it also needs xid information that is only available
> > via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
> > pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
> > pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
> > would simplify pg_migrator if I would run pg_resetxlog -n on a live
> > server, but I can understand if people don't want to do that because the
> > xid information reported on a live server is inaccurate.
> 
> What xid info does it need?  Would it be good enough to use the "next
> XID" from most recent checkpoint from pg_controldata?  It is a bit
> outdated, but can't you simply add some value to it to have a safety margin?

Well, I am not much into 'safety margins' with pg_migrator, meaning I
want to get the most reliable value I can --- I have no idea what that
safety margin would be.  Right now pg_migrator works fine by calling
pg_controldata or pg_resetxlog as appropriate.  I was hoping to allow
pg_resetxlog -n on a live server.  Is that something we should avoid?
I really don't need the change --- it would just simplify pg_migrator.

I was just really asking if disallowing pg_resetxlog -n on a live server
is planned behavior or an oversight.  I can see the logic that it should
be disallowed but I am just looking for confirmation from someone and I
can then drop the issue.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 14:54:32

Alvaro Herrera wrote:
> Bruce Momjian escribi?:
> 
> > I considered that but realize that pg_migrator has to read
> > pg_controldata in both the old and new servers, meaning it would need
> > access to both C structures, and considering they both have the same
> > structure names, that would require some odd C tricks.  Add to that you
> > don't know which version of Postgres you are migrating from/to during
> > compile and the idea of using C becomes even less attractive.
> 
> However, keep in mind that this might not be the last time on which we
> will want to read something from a C struct, so perhaps it would be good
> to bite the bullet and write the odd tricks.  Does it already have
> access (at compile time) to the old and new source trees?

No, only the new soure tree, or actually any source tree, but ideally
the new one.  Remember we have Win32 binaries being built, and right now
there is limited linkage between pg_migrator and the backend code.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 15:29:43

Alvaro Herrera wrote:
> > 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> > if the tablespaces are on mount points.  I have already received a
> > report of such a failure.
> 
> I thought it was impossible to use bare mountpoints as tablespaces due
> to ownership problems ... Is that not the case?  -1 for special hacks
> that work around bogus setups, if that means intrusive changes to the
> core code.

I talked to the person who reported the problem and he and I confirmed
that it is quite easy to make the mount point be owned by the postgres
user and have that function as a tablespace.  Is that not a supported
setup?  There is probably a larger problem that the tablespace must be
located in a directory that has directory rename permission for
postgres.  I have updated the pg_migrator INSTALL file to mention this
issue.

As far as .old, we could create the tablespaces as *.new, but that kind
of defeats the existing recommended pg_migrator usage where we tell the
user to rename PGDATA to .old before running pg_migrator.

It was actually Tom's idea months ago to put a version-specific
directory in the tablespace.  I don't think it is necessary, and we can
live with the mount point limitation.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Tom Lane

Date:

04 January 2010, 15:33:52

Bruce Momjian <bruce@momjian.us> writes:
> I was just really asking if disallowing pg_resetxlog -n on a live server
> is planned behavior or an oversight.  I can see the logic that it should
> be disallowed but I am just looking for confirmation from someone and I
> can then drop the issue.

Well, it's not only a matter of "are we going to clobber live state",
it's also "is the state that we are looking at changing under us?".
The -n switch only covers the first point.  I think it would require
some careful analysis, and testing that's never been done, before having
any confidence in the results of pg_resetxlog on a live server.

Why should you need this anyway?  pg_migrator should not be having to
run pg_resetxlog on the old installation, I would think.
        regards, tom lane

Re: pg_migrator issues

From

Robert Haas

Date:

04 January 2010, 15:40:52

On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Alvaro Herrera wrote:
>> > 2)  Right now pg_migrator renames old tablespaces to .old, which fails
>> > if the tablespaces are on mount points.  I have already received a
>> > report of such a failure.
>>
>> I thought it was impossible to use bare mountpoints as tablespaces due
>> to ownership problems ... Is that not the case?  -1 for special hacks
>> that work around bogus setups, if that means intrusive changes to the
>> core code.
>
> I talked to the person who reported the problem and he and I confirmed
> that it is quite easy to make the mount point be owned by the postgres
> user and have that function as a tablespace.  Is that not a supported
> setup?  There is probably a larger problem that the tablespace must be
> located in a directory that has directory rename permission for
> postgres.  I have updated the pg_migrator INSTALL file to mention this
> issue.
>
> As far as .old, we could create the tablespaces as *.new, but that kind
> of defeats the existing recommended pg_migrator usage where we tell the
> user to rename PGDATA to .old before running pg_migrator.
>
> It was actually Tom's idea months ago to put a version-specific
> directory in the tablespace.  I don't think it is necessary, and we can
> live with the mount point limitation.

What doesn't work if we just don't rename the tablespace at all?  And
can't we put some smarts into the backend to handle that thing?

...Robert

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 15:52:04

Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I was just really asking if disallowing pg_resetxlog -n on a live server
> > is planned behavior or an oversight.  I can see the logic that it should
> > be disallowed but I am just looking for confirmation from someone and I
> > can then drop the issue.
> 
> Well, it's not only a matter of "are we going to clobber live state",
> it's also "is the state that we are looking at changing under us?".
> The -n switch only covers the first point.  I think it would require
> some careful analysis, and testing that's never been done, before having
> any confidence in the results of pg_resetxlog on a live server.

Yea, that was my analysis too.  I will discard the idea and just keep
the pg_migrator code that does either.

> Why should you need this anyway?  pg_migrator should not be having to
> run pg_resetxlog on the old installation, I would think.

Well, the same code is run on the new and old server.  The complex issue
is that the same code that checks for matching controldata settings
(check mode) is the same that pulls the xid from the old server to set
it on the new one.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 15:52:50

Robert Haas wrote:
> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Alvaro Herrera wrote:
> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails
> >> > if the tablespaces are on mount points. ?I have already received a
> >> > report of such a failure.
> >>
> >> I thought it was impossible to use bare mountpoints as tablespaces due
> >> to ownership problems ... Is that not the case? ?-1 for special hacks
> >> that work around bogus setups, if that means intrusive changes to the
> >> core code.
> >
> > I talked to the person who reported the problem and he and I confirmed
> > that it is quite easy to make the mount point be owned by the postgres
> > user and have that function as a tablespace. ?Is that not a supported
> > setup? ?There is probably a larger problem that the tablespace must be
> > located in a directory that has directory rename permission for
> > postgres. ?I have updated the pg_migrator INSTALL file to mention this
> > issue.
> >
> > As far as .old, we could create the tablespaces as *.new, but that kind
> > of defeats the existing recommended pg_migrator usage where we tell the
> > user to rename PGDATA to .old before running pg_migrator.
> >
> > It was actually Tom's idea months ago to put a version-specific
> > directory in the tablespace. ?I don't think it is necessary, and we can
> > live with the mount point limitation.
> 
> What doesn't work if we just don't rename the tablespace at all?  And
> can't we put some smarts into the backend to handle that thing?

Well, when you restore the old dump's schema into the new server, the
tablespace directory path will be the same, so we had better not have
any directory there.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Robert Haas

Date:

04 January 2010, 16:31:01

On Mon, Jan 4, 2010 at 2:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Alvaro Herrera wrote:
>> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails
>> >> > if the tablespaces are on mount points. ?I have already received a
>> >> > report of such a failure.
>> >>
>> >> I thought it was impossible to use bare mountpoints as tablespaces due
>> >> to ownership problems ... Is that not the case? ?-1 for special hacks
>> >> that work around bogus setups, if that means intrusive changes to the
>> >> core code.
>> >
>> > I talked to the person who reported the problem and he and I confirmed
>> > that it is quite easy to make the mount point be owned by the postgres
>> > user and have that function as a tablespace. ?Is that not a supported
>> > setup? ?There is probably a larger problem that the tablespace must be
>> > located in a directory that has directory rename permission for
>> > postgres. ?I have updated the pg_migrator INSTALL file to mention this
>> > issue.
>> >
>> > As far as .old, we could create the tablespaces as *.new, but that kind
>> > of defeats the existing recommended pg_migrator usage where we tell the
>> > user to rename PGDATA to .old before running pg_migrator.
>> >
>> > It was actually Tom's idea months ago to put a version-specific
>> > directory in the tablespace. ?I don't think it is necessary, and we can
>> > live with the mount point limitation.
>>
>> What doesn't work if we just don't rename the tablespace at all?  And
>> can't we put some smarts into the backend to handle that thing?
>
> Well, when you restore the old dump's schema into the new server, the
> tablespace directory path will be the same, so we had better not have
> any directory there.

Well that seems like something you could work around by hacking the
contents of the dump...

...Robert

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 16:33:47

Robert Haas wrote:
> On Mon, Jan 4, 2010 at 2:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > Alvaro Herrera wrote:
> >> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails
> >> >> > if the tablespaces are on mount points. ?I have already received a
> >> >> > report of such a failure.
> >> >>
> >> >> I thought it was impossible to use bare mountpoints as tablespaces due
> >> >> to ownership problems ... Is that not the case? ?-1 for special hacks
> >> >> that work around bogus setups, if that means intrusive changes to the
> >> >> core code.
> >> >
> >> > I talked to the person who reported the problem and he and I confirmed
> >> > that it is quite easy to make the mount point be owned by the postgres
> >> > user and have that function as a tablespace. ?Is that not a supported
> >> > setup? ?There is probably a larger problem that the tablespace must be
> >> > located in a directory that has directory rename permission for
> >> > postgres. ?I have updated the pg_migrator INSTALL file to mention this
> >> > issue.
> >> >
> >> > As far as .old, we could create the tablespaces as *.new, but that kind
> >> > of defeats the existing recommended pg_migrator usage where we tell the
> >> > user to rename PGDATA to .old before running pg_migrator.
> >> >
> >> > It was actually Tom's idea months ago to put a version-specific
> >> > directory in the tablespace. ?I don't think it is necessary, and we can
> >> > live with the mount point limitation.
> >>
> >> What doesn't work if we just don't rename the tablespace at all? ?And
> >> can't we put some smarts into the backend to handle that thing?
> >
> > Well, when you restore the old dump's schema into the new server, the
> > tablespace directory path will be the same, so we had better not have
> > any directory there.
> 
> Well that seems like something you could work around by hacking the
> contents of the dump...

True, in --binary-upgrade mode, but what do we make it?  *.new?  What if
they want to have the same tablespace names after the upgrade?  It
really gets ugly if we are on a mount point because the tablespaces will
be in different file systems, which makes --link mode impossible, and
might create files in a filesystem that doesn't have enough space.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 16:34:27

Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > > 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> > > if the tablespaces are on mount points.  I have already received a
> > > report of such a failure.
> > 
> > I thought it was impossible to use bare mountpoints as tablespaces due
> > to ownership problems ... Is that not the case?  -1 for special hacks
> > that work around bogus setups, if that means intrusive changes to the
> > core code.
> 
> I talked to the person who reported the problem and he and I confirmed
> that it is quite easy to make the mount point be owned by the postgres
> user and have that function as a tablespace.  Is that not a supported
> setup?  There is probably a larger problem that the tablespace must be
> located in a directory that has directory rename permission for
> postgres.  I have updated the pg_migrator INSTALL file to mention this
> issue.

Oh, the actual INSTALL warning is:
If you are using tablespaces, there must be sufficient directorypermissions to allow each tablespace directory to be
renamedwith a".old" suffix.
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Robert Haas

Date:

04 January 2010, 17:43:20

On Mon, Jan 4, 2010 at 3:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Mon, Jan 4, 2010 at 2:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Robert Haas wrote:
>> >> On Mon, Jan 4, 2010 at 2:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> >> > Alvaro Herrera wrote:
>> >> >> > 2) ?Right now pg_migrator renames old tablespaces to .old, which fails
>> >> >> > if the tablespaces are on mount points. ?I have already received a
>> >> >> > report of such a failure.
>> >> >>
>> >> >> I thought it was impossible to use bare mountpoints as tablespaces due
>> >> >> to ownership problems ... Is that not the case? ?-1 for special hacks
>> >> >> that work around bogus setups, if that means intrusive changes to the
>> >> >> core code.
>> >> >
>> >> > I talked to the person who reported the problem and he and I confirmed
>> >> > that it is quite easy to make the mount point be owned by the postgres
>> >> > user and have that function as a tablespace. ?Is that not a supported
>> >> > setup? ?There is probably a larger problem that the tablespace must be
>> >> > located in a directory that has directory rename permission for
>> >> > postgres. ?I have updated the pg_migrator INSTALL file to mention this
>> >> > issue.
>> >> >
>> >> > As far as .old, we could create the tablespaces as *.new, but that kind
>> >> > of defeats the existing recommended pg_migrator usage where we tell the
>> >> > user to rename PGDATA to .old before running pg_migrator.
>> >> >
>> >> > It was actually Tom's idea months ago to put a version-specific
>> >> > directory in the tablespace. ?I don't think it is necessary, and we can
>> >> > live with the mount point limitation.
>> >>
>> >> What doesn't work if we just don't rename the tablespace at all? ?And
>> >> can't we put some smarts into the backend to handle that thing?
>> >
>> > Well, when you restore the old dump's schema into the new server, the
>> > tablespace directory path will be the same, so we had better not have
>> > any directory there.
>>
>> Well that seems like something you could work around by hacking the
>> contents of the dump...
>
> True, in --binary-upgrade mode, but what do we make it?  *.new?  What if
> they want to have the same tablespace names after the upgrade?  It
> really gets ugly if we are on a mount point because the tablespaces will
> be in different file systems, which makes --link mode impossible, and
> might create files in a filesystem that doesn't have enough space.

But can't we just call a special function first before running the
CREATE TABLESPACE, like:

pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()?

Sorta like what you did to preserve ENUM OIDs, etc.?

...Robert

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 17:53:46

Robert Haas wrote:
> >> >> What doesn't work if we just don't rename the tablespace at all? ?And
> >> >> can't we put some smarts into the backend to handle that thing?
> >> >
> >> > Well, when you restore the old dump's schema into the new server, the
> >> > tablespace directory path will be the same, so we had better not have
> >> > any directory there.
> >>
> >> Well that seems like something you could work around by hacking the
> >> contents of the dump...
> >
> > True, in --binary-upgrade mode, but what do we make it? ?*.new? ?What if
> > they want to have the same tablespace names after the upgrade? ?It
> > really gets ugly if we are on a mount point because the tablespaces will
> > be in different file systems, which makes --link mode impossible, and
> > might create files in a filesystem that doesn't have enough space.
> 
> But can't we just call a special function first before running the
> CREATE TABLESPACE, like:
> 
> pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()?
> 
> Sorta like what you did to preserve ENUM OIDs, etc.?

Well, the problem is that we are creating something in a file system,
and the old and new contents of the tablespace directories must exist
after the migration (in case the migration is reverted).  We were able
to get away with this for enum because we were only creating this in the
_new_ database.  With the file system, we have a resource/namespace
shared between the old and new server.

What were you thinking this function call would do?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Robert Haas

Date:

04 January 2010, 18:04:30

On Mon, Jan 4, 2010 at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> >> >> What doesn't work if we just don't rename the tablespace at all? ?And
>> >> >> can't we put some smarts into the backend to handle that thing?
>> >> >
>> >> > Well, when you restore the old dump's schema into the new server, the
>> >> > tablespace directory path will be the same, so we had better not have
>> >> > any directory there.
>> >>
>> >> Well that seems like something you could work around by hacking the
>> >> contents of the dump...
>> >
>> > True, in --binary-upgrade mode, but what do we make it? ?*.new? ?What if
>> > they want to have the same tablespace names after the upgrade? ?It
>> > really gets ugly if we are on a mount point because the tablespaces will
>> > be in different file systems, which makes --link mode impossible, and
>> > might create files in a filesystem that doesn't have enough space.
>>
>> But can't we just call a special function first before running the
>> CREATE TABLESPACE, like:
>>
>> pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()?
>>
>> Sorta like what you did to preserve ENUM OIDs, etc.?
>
> Well, the problem is that we are creating something in a file system,
> and the old and new contents of the tablespace directories must exist
> after the migration (in case the migration is reverted).  We were able
> to get away with this for enum because we were only creating this in the
> _new_ database.  With the file system, we have a resource/namespace
> shared between the old and new server.

Oh, I thought you were hard-linking the files, not copying them, so
the old directory would be destroyed anyway.

> What were you thinking this function call would do?

Basically, make PostgreSQL not reinitialize the directory as it
normally would when a new tablespace is created.  Or make it ignore
the existence of the directory until told to stop ignoring it.  Or
whatever is needed to avoid having to move the thing around.  Sorry,
I'm hand-waving wildly here; I haven't read the code.  Maybe I should
shut up.

...Robert

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 18:14:23

Robert Haas wrote:
> >> But can't we just call a special function first before running the
> >> CREATE TABLESPACE, like:
> >>
> >> pg_tablespace_dont_really_create_this_tablespace_because_we_are_in_pg_migrator()?
> >>
> >> Sorta like what you did to preserve ENUM OIDs, etc.?
> >
> > Well, the problem is that we are creating something in a file system,
> > and the old and new contents of the tablespace directories must exist
> > after the migration (in case the migration is reverted). ?We were able
> > to get away with this for enum because we were only creating this in the
> > _new_ database. ?With the file system, we have a resource/namespace
> > shared between the old and new server.
> 
> Oh, I thought you were hard-linking the files, not copying them, so
> the old directory would be destroyed anyway.

The default mode is COPY but there is a --link option.  You are right
that if we only did linking things would be much simpler.

> > What were you thinking this function call would do?
> 
> Basically, make PostgreSQL not reinitialize the directory as it
> normally would when a new tablespace is created.  Or make it ignore
> the existence of the directory until told to stop ignoring it.  Or
> whatever is needed to avoid having to move the thing around.  Sorry,
> I'm hand-waving wildly here; I haven't read the code.  Maybe I should
> shut up.

Sorry, I can't figure out how that would work.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 23:28:14

Bruce Momjian wrote:
> pg_migrator has become more popular recently, so it seems time to look
> at some enhancements that would improve pg_migrator.  None of these are
> required, but rather changes that would be nice to have:
>
> 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> this is required for proper migration.  Now that we have shown that
> strategically-placed global variables with a server-side function to set
> them is a viable solution, it would be nice to preserve all relfilenodes
> from the old server.  This would simplify pg_migrator by no long
> requiring place-holder relfilenodes or the renaming of TOAST files.  A
> simpler solution would just be to allow TOAST table creation to
> automatically remove placeholder files and create specified relfilenodes
> via global variables.

Attached is a patch that implements #1 above by preserving all
relfilenodes, with pg_dump support.   It uses the same method I used for
preserving pg_type/pg_enum.  I have tested this on the regression
database and it successfully preserved all relfilenodes.

This patch also removes the 'force' parameter in toast functions that
Tom added for 8.4 --- it is no longer needed.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/heap.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/heap.c,v
retrieving revision 1.364
diff -c -c -r1.364 heap.c
*** src/backend/catalog/heap.c    2 Jan 2010 16:57:36 -0000    1.364
--- src/backend/catalog/heap.c    5 Jan 2010 02:37:21 -0000
***************
*** 96,101 ****
--- 96,104 ----
                 char *relname);
  static List *insert_ordered_unique_oid(List *list, Oid datum);

+ Oid binary_upgrade_next_heap_relfilenode = InvalidOid;
+ Oid binary_upgrade_next_toast_relfilenode = InvalidOid;
+

  /* ----------------------------------------------------------------
   *                XXX UGLY HARD CODED BADNESS FOLLOWS XXX
***************
*** 942,956 ****
                       errmsg("only shared relations can be placed in pg_global tablespace")));
      }

!     /*
!      * Allocate an OID for the relation, unless we were told what to use.
!      *
!      * The OID will be the relfilenode as well, so make sure it doesn't
!      * collide with either pg_class OIDs or existing physical files.
!      */
!     if (!OidIsValid(relid))
          relid = GetNewRelFileNode(reltablespace, shared_relation,
                                    pg_class_desc);

      /*
       * Determine the relation's initial permissions.
--- 945,973 ----
                       errmsg("only shared relations can be placed in pg_global tablespace")));
      }

!     if ((relkind == RELKIND_RELATION || relkind == RELKIND_SEQUENCE) &&
!         OidIsValid(binary_upgrade_next_heap_relfilenode))
!     {
!         relid = binary_upgrade_next_heap_relfilenode;
!         binary_upgrade_next_heap_relfilenode = InvalidOid;
!     }
!     else if (relkind == RELKIND_TOASTVALUE &&
!         OidIsValid(binary_upgrade_next_toast_relfilenode))
!     {
!         relid = binary_upgrade_next_toast_relfilenode;
!         binary_upgrade_next_toast_relfilenode = InvalidOid;
!     }
!     else if (!OidIsValid(relid))
!     {
!         /*
!          * Allocate an OID for the relation, unless we were told what to use.
!          *
!          * The OID will be the relfilenode as well, so make sure it doesn't
!          * collide with either pg_class OIDs or existing physical files.
!          */
          relid = GetNewRelFileNode(reltablespace, shared_relation,
                                    pg_class_desc);
+     }

      /*
       * Determine the relation's initial permissions.
Index: src/backend/catalog/index.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.328
diff -c -c -r1.328 index.c
*** src/backend/catalog/index.c    2 Jan 2010 16:57:36 -0000    1.328
--- src/backend/catalog/index.c    5 Jan 2010 02:37:21 -0000
***************
*** 79,84 ****
--- 79,87 ----
                  tups_inserted;
  } v_i_state;

+ /* For simple relation creation, this is the toast index relfilenode */
+ Oid binary_upgrade_next_index_relfilenode = InvalidOid;
+
  /* non-export function prototypes */
  static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
                           IndexInfo *indexInfo,
***************
*** 640,654 ****
                                              accessMethodObjectId,
                                              classObjectId);

!     /*
!      * Allocate an OID for the index, unless we were told what to use.
!      *
!      * The OID will be the relfilenode as well, so make sure it doesn't
!      * collide with either pg_class OIDs or existing physical files.
!      */
!     if (!OidIsValid(indexRelationId))
          indexRelationId = GetNewRelFileNode(tableSpaceId, shared_relation,
                                              pg_class);

      /*
       * create the index relation's relcache entry and physical disk file. (If
--- 643,664 ----
                                              accessMethodObjectId,
                                              classObjectId);

!     if (OidIsValid(binary_upgrade_next_index_relfilenode))
!     {
!         indexRelationId = binary_upgrade_next_index_relfilenode;
!         binary_upgrade_next_index_relfilenode = InvalidOid;
!     }
!     else if (!OidIsValid(indexRelationId))
!     {
!         /*
!          * Allocate an OID for the index, unless we were told what to use.
!          *
!          * The OID will be the relfilenode as well, so make sure it doesn't
!          * collide with either pg_class OIDs or existing physical files.
!          */
          indexRelationId = GetNewRelFileNode(tableSpaceId, shared_relation,
                                              pg_class);
+     }

      /*
       * create the index relation's relcache entry and physical disk file. (If
Index: src/backend/catalog/toasting.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/toasting.c,v
retrieving revision 1.26
diff -c -c -r1.26 toasting.c
*** src/backend/catalog/toasting.c    2 Jan 2010 16:57:36 -0000    1.26
--- src/backend/catalog/toasting.c    5 Jan 2010 02:37:21 -0000
***************
*** 32,53 ****
  #include "utils/syscache.h"

  Oid binary_upgrade_next_pg_type_toast_oid = InvalidOid;

  static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
!                    Datum reloptions, bool force);
  static bool needs_toast_table(Relation rel);


  /*
   * AlterTableCreateToastTable
   *        If the table needs a toast table, and doesn't already have one,
!  *        then create a toast table for it.  (With the force option, make
!  *        a toast table even if it appears unnecessary.)
!  *
!  * The caller can also specify the OID to be used for the toast table.
!  * Usually, toastOid should be InvalidOid to allow a free OID to be assigned.
!  * (This option, as well as the force option, is not used by core Postgres,
!  * but is provided to support pg_migrator.)
   *
   * reloptions for the toast table can be passed, too.  Pass (Datum) 0
   * for default reloptions.
--- 32,48 ----
  #include "utils/syscache.h"

  Oid binary_upgrade_next_pg_type_toast_oid = InvalidOid;
+ extern Oid binary_upgrade_next_toast_relfilenode;

  static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
!                    Datum reloptions);
  static bool needs_toast_table(Relation rel);


  /*
   * AlterTableCreateToastTable
   *        If the table needs a toast table, and doesn't already have one,
!  *        then create a toast table for it.
   *
   * reloptions for the toast table can be passed, too.  Pass (Datum) 0
   * for default reloptions.
***************
*** 57,64 ****
   * to end with CommandCounterIncrement if it makes any changes.
   */
  void
! AlterTableCreateToastTable(Oid relOid, Oid toastOid,
!                            Datum reloptions, bool force)
  {
      Relation    rel;

--- 52,58 ----
   * to end with CommandCounterIncrement if it makes any changes.
   */
  void
! AlterTableCreateToastTable(Oid relOid, Datum reloptions)
  {
      Relation    rel;

***************
*** 70,76 ****
      rel = heap_open(relOid, AccessExclusiveLock);

      /* create_toast_table does all the work */
!     (void) create_toast_table(rel, toastOid, InvalidOid, reloptions, force);

      heap_close(rel, NoLock);
  }
--- 64,70 ----
      rel = heap_open(relOid, AccessExclusiveLock);

      /* create_toast_table does all the work */
!     (void) create_toast_table(rel, InvalidOid, InvalidOid, reloptions);

      heap_close(rel, NoLock);
  }
***************
*** 96,102 ****
                          relName)));

      /* create_toast_table does all the work */
!     if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0, false))
          elog(ERROR, "\"%s\" does not require a toast table",
               relName);

--- 90,96 ----
                          relName)));

      /* create_toast_table does all the work */
!     if (!create_toast_table(rel, toastOid, toastIndexOid, (Datum) 0))
          elog(ERROR, "\"%s\" does not require a toast table",
               relName);

***************
*** 108,119 ****
   * create_toast_table --- internal workhorse
   *
   * rel is already opened and exclusive-locked
!  * toastOid and toastIndexOid are normally InvalidOid, but
!  * either or both can be nonzero to specify caller-assigned OIDs
   */
  static bool
! create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
!                    Datum reloptions, bool force)
  {
      Oid            relOid = RelationGetRelid(rel);
      HeapTuple    reltup;
--- 102,112 ----
   * create_toast_table --- internal workhorse
   *
   * rel is already opened and exclusive-locked
!  * toastOid and toastIndexOid are normally InvalidOid, but during
!  * bootstrap they can be nonzero to specify hand-assigned OIDs
   */
  static bool
! create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptions)
  {
      Oid            relOid = RelationGetRelid(rel);
      HeapTuple    reltup;
***************
*** 152,163 ****

      /*
       * Check to see whether the table actually needs a TOAST table.
!      *
!      * Caller can optionally override this check.  (Note: at present no
!      * callers in core Postgres do so, but this option is needed by
!      * pg_migrator.)
       */
!     if (!force && !needs_toast_table(rel))
          return false;

      /*
--- 145,154 ----

      /*
       * Check to see whether the table actually needs a TOAST table.
!      * If the relfilenode is specified, force toast file creation.
       */
!     if (!needs_toast_table(rel) &&
!         !OidIsValid(binary_upgrade_next_toast_relfilenode))
          return false;

      /*
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.189
diff -c -c -r1.189 cluster.c
*** src/backend/commands/cluster.c    2 Jan 2010 16:57:37 -0000    1.189
--- src/backend/commands/cluster.c    5 Jan 2010 02:37:21 -0000
***************
*** 743,749 ****
          if (isNull)
              reloptions = (Datum) 0;
      }
!     AlterTableCreateToastTable(OIDNewHeap, InvalidOid, reloptions, false);

      if (OidIsValid(toastid))
          ReleaseSysCache(tuple);
--- 743,749 ----
          if (isNull)
              reloptions = (Datum) 0;
      }
!     AlterTableCreateToastTable(OIDNewHeap, reloptions);

      if (OidIsValid(toastid))
          ReleaseSysCache(tuple);
Index: src/backend/commands/tablecmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/tablecmds.c,v
retrieving revision 1.313
diff -c -c -r1.313 tablecmds.c
*** src/backend/commands/tablecmds.c    2 Jan 2010 16:57:37 -0000    1.313
--- src/backend/commands/tablecmds.c    5 Jan 2010 02:37:21 -0000
***************
*** 2614,2621 ****
              (tab->subcmds[AT_PASS_ADD_COL] ||
               tab->subcmds[AT_PASS_ALTER_TYPE] ||
               tab->subcmds[AT_PASS_COL_ATTRS]))
!             AlterTableCreateToastTable(tab->relid, InvalidOid,
!                                        (Datum) 0, false);
      }
  }

--- 2614,2620 ----
              (tab->subcmds[AT_PASS_ADD_COL] ||
               tab->subcmds[AT_PASS_ALTER_TYPE] ||
               tab->subcmds[AT_PASS_COL_ATTRS]))
!             AlterTableCreateToastTable(tab->relid, (Datum) 0);
      }
  }

Index: src/backend/executor/execMain.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.339
diff -c -c -r1.339 execMain.c
*** src/backend/executor/execMain.c    2 Jan 2010 16:57:40 -0000    1.339
--- src/backend/executor/execMain.c    5 Jan 2010 02:37:22 -0000
***************
*** 2194,2200 ****

      (void) heap_reloptions(RELKIND_TOASTVALUE, reloptions, true);

!     AlterTableCreateToastTable(intoRelationId, InvalidOid, reloptions, false);

      /*
       * And open the constructed table for writing.
--- 2194,2200 ----

      (void) heap_reloptions(RELKIND_TOASTVALUE, reloptions, true);

!     AlterTableCreateToastTable(intoRelationId, reloptions);

      /*
       * And open the constructed table for writing.
Index: src/backend/tcop/utility.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/utility.c,v
retrieving revision 1.326
diff -c -c -r1.326 utility.c
*** src/backend/tcop/utility.c    2 Jan 2010 16:57:53 -0000    1.326
--- src/backend/tcop/utility.c    5 Jan 2010 02:37:22 -0000
***************
*** 491,504 ****
                                                              "toast",
                                                              validnsps,
                                                              true, false);
!                         (void) heap_reloptions(RELKIND_TOASTVALUE,
!                                                toast_options,
                                                 true);

!                         AlterTableCreateToastTable(relOid,
!                                                    InvalidOid,
!                                                    toast_options,
!                                                    false);
                      }
                      else
                      {
--- 491,500 ----
                                                              "toast",
                                                              validnsps,
                                                              true, false);
!                         (void) heap_reloptions(RELKIND_TOASTVALUE, toast_options,
                                                 true);

!                         AlterTableCreateToastTable(relOid, toast_options);
                      }
                      else
                      {
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.564
diff -c -c -r1.564 pg_dump.c
*** src/bin/pg_dump/pg_dump.c    2 Jan 2010 16:57:59 -0000    1.564
--- src/bin/pg_dump/pg_dump.c    5 Jan 2010 02:37:22 -0000
***************
*** 200,206 ****
                      PQExpBuffer upgrade_buffer, Oid pg_type_oid);
  static bool binary_upgrade_set_type_oids_by_rel_oid(
                      PQExpBuffer upgrade_buffer, Oid pg_rel_oid);
! static void binary_upgrade_clear_pg_type_toast_oid(PQExpBuffer upgrade_buffer);
  static const char *getAttrName(int attrnum, TableInfo *tblInfo);
  static const char *fmtCopyColumnList(const TableInfo *ti);
  static void do_sql_command(PGconn *conn, const char *query);
--- 200,207 ----
                      PQExpBuffer upgrade_buffer, Oid pg_type_oid);
  static bool binary_upgrade_set_type_oids_by_rel_oid(
                      PQExpBuffer upgrade_buffer, Oid pg_rel_oid);
! static void binary_upgrade_set_relfilenodes(PQExpBuffer upgrade_buffer,
!                     Oid pg_class_oid, bool is_index);
  static const char *getAttrName(int attrnum, TableInfo *tblInfo);
  static const char *fmtCopyColumnList(const TableInfo *ti);
  static void do_sql_command(PGconn *conn, const char *query);
***************
*** 2289,2309 ****
  }

  static void
! binary_upgrade_clear_pg_type_toast_oid(PQExpBuffer upgrade_buffer)
  {
!     /*
!      *    One complexity is that while the heap might now have a TOAST table,
!      *    the TOAST table might have been created long after creation when
!      *    the table was loaded with wide data.  For that reason, we clear
!      *    binary_upgrade_set_next_pg_type_toast_oid so it is not reused
!      *    by a later table.  Logically any later creation that needs a TOAST
!      *    table should have its own TOAST pg_type oid, but we are cautious.
!      */
!     appendPQExpBuffer(upgrade_buffer,
!         "\n-- For binary upgrade, clear toast oid because it might not have been needed\n");
      appendPQExpBuffer(upgrade_buffer,
!         "SELECT binary_upgrade.set_next_pg_type_oid('%u'::pg_catalog.oid);\n\n",
!         InvalidOid);
  }

  /*
--- 2290,2367 ----
  }

  static void
! binary_upgrade_set_relfilenodes(PQExpBuffer upgrade_buffer, Oid pg_class_oid,
!                                 bool is_index)
  {
!     PQExpBuffer upgrade_query = createPQExpBuffer();
!     int            ntups;
!     PGresult   *upgrade_res;
!     Oid            pg_class_relfilenode;
!     Oid            pg_class_reltoastrelid;
!     Oid            pg_class_reltoastidxid;
!
!     appendPQExpBuffer(upgrade_query,
!                       "SELECT c.relfilenode, c.reltoastrelid, t.reltoastidxid "
!                       "FROM pg_catalog.pg_class c LEFT JOIN "
!                       "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
!                       "WHERE c.oid = '%u'::pg_catalog.oid;",
!                       pg_class_oid);
!
!     upgrade_res = PQexec(g_conn, upgrade_query->data);
!     check_sql_result(upgrade_res, g_conn, upgrade_query->data, PGRES_TUPLES_OK);
!
!     /* Expecting a single result only */
!     ntups = PQntuples(upgrade_res);
!     if (ntups != 1)
!     {
!         write_msg(NULL, ngettext("query returned %d row instead of one: %s\n",
!                                "query returned %d rows instead of one: %s\n",
!                                  ntups),
!                   ntups, upgrade_query->data);
!         exit_nicely();
!     }
!
!     pg_class_relfilenode = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "relfilenode")));
!     pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
!     pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
!
      appendPQExpBuffer(upgrade_buffer,
!                         "\n-- For binary upgrade, must preserve relfilenodes\n");
!
!     if (!is_index)
!         appendPQExpBuffer(upgrade_buffer,
!             "SELECT binary_upgrade.set_next_heap_relfilenode('%u'::pg_catalog.oid);\n",
!             pg_class_relfilenode);
!     else
!         appendPQExpBuffer(upgrade_buffer,
!             "SELECT binary_upgrade.set_next_index_relfilenode('%u'::pg_catalog.oid);\n",
!             pg_class_relfilenode);
!
!     if (OidIsValid(pg_class_reltoastrelid))
!     {
!         /*
!          *  One complexity is that the table definition might not require
!          *    the creation of a TOAST table, and the TOAST table might have
!          *    been created long after table creation, when the table was
!          *    loaded with wide data.  By setting the TOAST relfilenode we
!          *    force creation of the TOAST heap and TOAST index by the
!          *    backend so we can cleanly migrate the files during binary
!          *    migration.
!          */
!
!         appendPQExpBuffer(upgrade_buffer,
!             "SELECT binary_upgrade.set_next_toast_relfilenode('%u'::pg_catalog.oid);\n",
!             pg_class_reltoastrelid);
!
!         /* every toast table has an index */
!         appendPQExpBuffer(upgrade_buffer,
!             "SELECT binary_upgrade.set_next_index_relfilenode('%u'::pg_catalog.oid);\n",
!             pg_class_reltoastidxid);
!     }
!     appendPQExpBuffer(upgrade_buffer, "\n");
!
!     PQclear(upgrade_res);
!     destroyPQExpBuffer(upgrade_query);
  }

  /*
***************
*** 10480,10485 ****
--- 10538,10546 ----
          appendPQExpBuffer(delq, "%s;\n",
                            fmtId(tbinfo->dobj.name));

+         if (binary_upgrade)
+             binary_upgrade_set_relfilenodes(q, tbinfo->dobj.catId.oid, false);
+
          appendPQExpBuffer(q, "CREATE TABLE %s (",
                            fmtId(tbinfo->dobj.name));
          actual_atts = 0;
***************
*** 10781,10789 ****
          }
      }

-     if (binary_upgrade && toast_set)
-         binary_upgrade_clear_pg_type_toast_oid(q);
-
      ArchiveEntry(fout, tbinfo->dobj.catId, tbinfo->dobj.dumpId,
                   tbinfo->dobj.name,
                   tbinfo->dobj.namespace->dobj.name,
--- 10842,10847 ----
***************
*** 10926,10931 ****
--- 10984,10992 ----
       */
      if (indxinfo->indexconstraint == 0)
      {
+         if (binary_upgrade)
+             binary_upgrade_set_relfilenodes(q, indxinfo->dobj.catId.oid, true);
+
          /* Plain secondary index */
          appendPQExpBuffer(q, "%s;\n", indxinfo->indexdef);

***************
*** 11006,11011 ****
--- 11067,11075 ----
              exit_nicely();
          }

+         if (binary_upgrade && !coninfo->condef)
+             binary_upgrade_set_relfilenodes(q, indxinfo->dobj.catId.oid, true);
+
          appendPQExpBuffer(q, "ALTER TABLE ONLY %s\n",
                            fmtId(tbinfo->dobj.name));
          appendPQExpBuffer(q, "    ADD CONSTRAINT %s ",
***************
*** 11416,11422 ****
--- 11480,11489 ----
          resetPQExpBuffer(query);

          if (binary_upgrade)
+         {
+             binary_upgrade_set_relfilenodes(query, tbinfo->dobj.catId.oid, false);
              binary_upgrade_set_type_oids_by_rel_oid(query, tbinfo->dobj.catId.oid);
+         }

          appendPQExpBuffer(query,
                            "CREATE SEQUENCE %s\n",
Index: src/include/catalog/toasting.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/toasting.h,v
retrieving revision 1.11
diff -c -c -r1.11 toasting.h
*** src/include/catalog/toasting.h    2 Jan 2010 16:58:02 -0000    1.11
--- src/include/catalog/toasting.h    5 Jan 2010 02:37:22 -0000
***************
*** 17,24 ****
  /*
   * toasting.c prototypes
   */
! extern void AlterTableCreateToastTable(Oid relOid, Oid toastOid,
!                            Datum reloptions, bool force);
  extern void BootstrapToastTable(char *relName,
                      Oid toastOid, Oid toastIndexOid);

--- 17,23 ----
  /*
   * toasting.c prototypes
   */
! extern void AlterTableCreateToastTable(Oid relOid, Datum reloptions);
  extern void BootstrapToastTable(char *relName,
                      Oid toastOid, Oid toastIndexOid);

Re: pg_migrator issues

From

Bruce Momjian

Date:

04 January 2010, 23:30:18

Bruce Momjian wrote:
> pg_migrator has become more popular recently, so it seems time to look
> at some enhancements that would improve pg_migrator.  None of these are
> required, but rather changes that would be nice to have:
> 
> 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> this is required for proper migration.  Now that we have shown that
> strategically-placed global variables with a server-side function to set
> them is a viable solution, it would be nice to preserve all relfilenodes
> from the old server.  This would simplify pg_migrator by no long
> requiring place-holder relfilenodes or the renaming of TOAST files.  A
> simpler solution would just be to allow TOAST table creation to
> automatically remove placeholder files and create specified relfilenodes
> via global variables.
> 
> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> if the tablespaces are on mount points.  I have already received a
> report of such a failure.  $PGDATA also has that issue, but that
> renaming has to be done by the user before pg_migrator is run, and only
> if they want to keep the same $PGDATA value after migration, i.e. no
> version-specific directory path.  One idea we floated around was to have
> tablespaces use major version directory names under the tablespace
> directory so renaming would not be necessary.  I could implement a
> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
> which are not in a version-specific subdirectory.
> 
> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> but it would be nice if there were a single command which did this.
> 
> 4)  I have implemented the ability to run pg_migrator --check on a live
> old server.  However, pg_migrator uses information from controldata to
> check things, and it also needs xid information that is only available
> via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
> pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
> would simplify pg_migrator if I would run pg_resetxlog -n on a live
> server, but I can understand if people don't want to do that because the
> xid information reported on a live server is inaccurate.

FYI, for those keeping score, I have posted patches for #1 and #3, and
we have decided not to implement #2 and #4.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Tom Lane

Date:

05 January 2010, 01:20:49

Bruce Momjian <bruce@momjian.us> writes:
> Alvaro Herrera wrote:
>> I thought it was impossible to use bare mountpoints as tablespaces due
>> to ownership problems ... Is that not the case?  -1 for special hacks
>> that work around bogus setups, if that means intrusive changes to the
>> core code.

> I talked to the person who reported the problem and he and I confirmed
> that it is quite easy to make the mount point be owned by the postgres
> user and have that function as a tablespace.  Is that not a supported
> setup?

It might be *possible*, but that doesn't make it a good idea.  The
traditional sysadmin advice in this area is that mount points should
be owned by root.  I don't really remember the reasoning but I'm pretty
sure I remember the principle.

> It was actually Tom's idea months ago to put a version-specific
> directory in the tablespace.

I was just about to re-suggest that.  Why do you think it's such a
bad idea?
        regards, tom lane

Re: pg_migrator issues

From

Tom Lane

Date:

05 January 2010, 01:25:21

Bruce Momjian <bruce@momjian.us> writes:
> Well, when you restore the old dump's schema into the new server, the
> tablespace directory path will be the same, so we had better not have
> any directory there.

If we implicitly added "/8.5" etc at the end of the specified tablespace
path, the problems go away.
        regards, tom lane

Re: pg_migrator issues

From

Bruce Momjian

Date:

05 January 2010, 11:27:52

Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Alvaro Herrera wrote:
> >> I thought it was impossible to use bare mountpoints as tablespaces due
> >> to ownership problems ... Is that not the case?  -1 for special hacks
> >> that work around bogus setups, if that means intrusive changes to the
> >> core code.
> 
> > I talked to the person who reported the problem and he and I confirmed
> > that it is quite easy to make the mount point be owned by the postgres
> > user and have that function as a tablespace.  Is that not a supported
> > setup?
> 
> It might be *possible*, but that doesn't make it a good idea.  The
> traditional sysadmin advice in this area is that mount points should
> be owned by root.  I don't really remember the reasoning but I'm pretty
> sure I remember the principle.

Yea, I think the logic is that files under that directory disappear
after the mount, so you don't want users putting things in there
accidentally.  In fact, the user said they mount a pg_xlog directory
under the $PGDATA directory (rather than use a symlink), which I also
thought was an odd approach and prone to problems if the mount failed
intermittently.

> > It was actually Tom's idea months ago to put a version-specific
> > directory in the tablespace.
> 
> I was just about to re-suggest that.  Why do you think it's such a
> bad idea?

I liked the idea, but I listed it as item #2 and no one else said they
liked it.  The only complexity I can see with the idea is that doing an
upgrade from one alpha to another would have the same major version
number and would therefore not be possible, so maybe we have to use the
catalog version number in there.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Tom Lane

Date:

05 January 2010, 11:54:04

Bruce Momjian <bruce@momjian.us> writes:
> I liked the idea, but I listed it as item #2 and no one else said they
> liked it.  The only complexity I can see with the idea is that doing an
> upgrade from one alpha to another would have the same major version
> number and would therefore not be possible, so maybe we have to use the
> catalog version number in there.

Good point.  Using catversion for the purpose seems a bit ugly but
I have no better ideas.
        regards, tom lane

Re: pg_migrator issues

From

Alvaro Herrera

Date:

05 January 2010, 12:07:05

Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I liked the idea, but I listed it as item #2 and no one else said they
> > liked it.  The only complexity I can see with the idea is that doing an
> > upgrade from one alpha to another would have the same major version
> > number and would therefore not be possible, so maybe we have to use the
> > catalog version number in there.
> 
> Good point.  Using catversion for the purpose seems a bit ugly but
> I have no better ideas.

I thought we had rejected the idea of being able to migrate between
alphas.  Is migrating between major versions not difficult enough?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pg_migrator issues

From

Robert Haas

Date:

05 January 2010, 12:08:14

On Tue, Jan 5, 2010 at 11:06 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>> > I liked the idea, but I listed it as item #2 and no one else said they
>> > liked it.  The only complexity I can see with the idea is that doing an
>> > upgrade from one alpha to another would have the same major version
>> > number and would therefore not be possible, so maybe we have to use the
>> > catalog version number in there.
>>
>> Good point.  Using catversion for the purpose seems a bit ugly but
>> I have no better ideas.
>
> I thought we had rejected the idea of being able to migrate between
> alphas.  Is migrating between major versions not difficult enough?

We like a challenge.

...Robert

Re: pg_migrator issues

From

Tom Lane

Date:

05 January 2010, 12:11:13

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 5, 2010 at 11:06 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> Tom Lane wrote:
>>> Good point. �Using catversion for the purpose seems a bit ugly but
>>> I have no better ideas.
>> 
>> I thought we had rejected the idea of being able to migrate between
>> alphas. �Is migrating between major versions not difficult enough?

> We like a challenge.

The problem with using just major version there is that then we are
*wiring into the on-disk representation* the assumption that pg_migrator
only goes from one major version to the next.  I agree that we're not
likely to start supporting cross-alpha-version migration any time soon,
but I don't think it's wise to foreclose the possibility of ever doing
it.
        regards, tom lane

Re: pg_migrator issues

From

Bruce Momjian

Date:

05 January 2010, 12:23:17

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, Jan 5, 2010 at 11:06 AM, Alvaro Herrera
> > <alvherre@commandprompt.com> wrote:
> >> Tom Lane wrote:
> >>> Good point. �Using catversion for the purpose seems a bit ugly but
> >>> I have no better ideas.
> >> 
> >> I thought we had rejected the idea of being able to migrate between
> >> alphas. �Is migrating between major versions not difficult enough?
> 
> > We like a challenge.
> 
> The problem with using just major version there is that then we are
> *wiring into the on-disk representation* the assumption that pg_migrator
> only goes from one major version to the next.  I agree that we're not
> likely to start supporting cross-alpha-version migration any time soon,
> but I don't think it's wise to foreclose the possibility of ever doing
> it.

I know people are trying to make things easier on pg_migrator, but
frankly going from alpha to alpha does not require any new code in
pg_migrator.  And pg_migrator already supports cross-alpha-version
migration just using the existing code --- no new code was added to
enable this.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

05 January 2010, 17:00:51

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > pg_migrator has become more popular recently, so it seems time to look
> > at some enhancements that would improve pg_migrator.  None of these are
> > required, but rather changes that would be nice to have:
> > 
> > 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> > this is required for proper migration.  Now that we have shown that
> > strategically-placed global variables with a server-side function to set
> > them is a viable solution, it would be nice to preserve all relfilenodes
> > from the old server.  This would simplify pg_migrator by no long
> > requiring place-holder relfilenodes or the renaming of TOAST files.  A
> > simpler solution would just be to allow TOAST table creation to
> > automatically remove placeholder files and create specified relfilenodes
> > via global variables.
> 
> Getting rid of the need for placeholders is a good idea.  +1 on getting
> TOAST tables created with the correct relfilenode from the start.  I
> don't know that preserving any other relfilenode is useful; however if
> it means you no longer have to rename the files underlying each table,

--> > it would probably also be a good idea.  (I don't know how does
--> > pg_migrator deal with such things currently -- does it keep a map of
--> > table name to relfilenode?)

Yes, and it will still need that because we don't want to transfer over
any of the system tables or pg_catalog files.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

05 January 2010, 23:04:59

Bruce Momjian wrote:
> Bruce Momjian wrote:
> > pg_migrator has become more popular recently, so it seems time to look
> > at some enhancements that would improve pg_migrator.  None of these are
> > required, but rather changes that would be nice to have:
> > 
> > 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> > this is required for proper migration.  Now that we have shown that
> > strategically-placed global variables with a server-side function to set
> > them is a viable solution, it would be nice to preserve all relfilenodes
> > from the old server.  This would simplify pg_migrator by no long
> > requiring place-holder relfilenodes or the renaming of TOAST files.  A
> > simpler solution would just be to allow TOAST table creation to
> > automatically remove placeholder files and create specified relfilenodes
> > via global variables.
> 
> Attached is a patch that implements #1 above by preserving all
> relfilenodes, with pg_dump support.   It uses the same method I used for
> preserving pg_type/pg_enum.  I have tested this on the regression
> database and it successfully preserved all relfilenodes.
> 
> This patch also removes the 'force' parameter in toast functions that
> Tom added for 8.4 --- it is no longer needed.

Applied.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

05 January 2010, 23:05:59

Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > > 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> > > does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> > > unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> > > but it would be nice if there were a single command which did this.
> > 
> > +1 for vacuumdb --analyze-only
> 
> OK, I have implemented this using --only-analyze to avoid having the
> '--anal' option spelling be ambiguous, which might confuse/frustrate
> users.
> 
> I also moved the --freeze option documention mention into a more logical
> place.
> 
> Patch attached.

Applied.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

decibel

Date:

06 January 2010, 03:53:11

On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote:
> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> but it would be nice if there were a single command which did this.

I actually started on a patch for this (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I just
haven'thad time to come back to it for final cleanup and changing the docs as needed. 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: pg_migrator issues

From

decibel

Date:

06 January 2010, 04:10:41

On Jan 6, 2010, at 1:52 AM, decibel wrote:
> On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote:
>> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
>> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
>> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
>> but it would be nice if there were a single command which did this.
>
> I actually started on a patch for this (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I
justhaven't had time to come back to it for final cleanup and changing the docs as needed. 

Crap, I see I should have read the whole thread before posting. Sorry for the noise (and not getting the patch
completed).:( 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: pg_migrator issues

From

Zdenek Kotala

Date:

06 January 2010, 10:59:05

Dne  4.01.10 19:28, Alvaro Herrera napsal(a):
> Bruce Momjian escribió:
> 
>> I considered that but realize that pg_migrator has to read
>> pg_controldata in both the old and new servers, meaning it would need
>> access to both C structures, and considering they both have the same
>> structure names, that would require some odd C tricks.  Add to that you
>> don't know which version of Postgres you are migrating from/to during
>> compile and the idea of using C becomes even less attractive.
> 
> However, keep in mind that this might not be the last time on which we
> will want to read something from a C struct, so perhaps it would be good
> to bite the bullet and write the odd tricks.  Does it already have
> access (at compile time) to the old and new source trees?

I have some proof of concept when each control data struct version 
version have one header file like pg_control_843.h and structure like 
ControlFileData has name ControlFileData_843. The main pg_control.h 
defines types without version like

typedef ControlFileData_843 ControlFileData;

I planed to do it for 8.5 but unfortunately no time :( commit fest is 
too close.
Zdenek

Re: pg_migrator issues

From

Bruce Momjian

Date:

06 January 2010, 13:44:57

Zdenek Kotala wrote:
> Dne  4.01.10 19:28, Alvaro Herrera napsal(a):
> > Bruce Momjian escribi?:
> > 
> >> I considered that but realize that pg_migrator has to read
> >> pg_controldata in both the old and new servers, meaning it would need
> >> access to both C structures, and considering they both have the same
> >> structure names, that would require some odd C tricks.  Add to that you
> >> don't know which version of Postgres you are migrating from/to during
> >> compile and the idea of using C becomes even less attractive.
> > 
> > However, keep in mind that this might not be the last time on which we
> > will want to read something from a C struct, so perhaps it would be good
> > to bite the bullet and write the odd tricks.  Does it already have
> > access (at compile time) to the old and new source trees?
> 
> I have some proof of concept when each control data struct version 
> version have one header file like pg_control_843.h and structure like 
> ControlFileData has name ControlFileData_843. The main pg_control.h 
> defines types without version like
> 
> typedef ControlFileData_843 ControlFileData;
> 
> I planed to do it for 8.5 but unfortunately no time :( commit fest is 
> too close.

Yea, I think the confusion of doing it isn't worth it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

06 January 2010, 14:07:49

Bruce Momjian wrote:
> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> if the tablespaces are on mount points.  I have already received a
> report of such a failure.  $PGDATA also has that issue, but that
> renaming has to be done by the user before pg_migrator is run, and only
> if they want to keep the same $PGDATA value after migration, i.e. no
> version-specific directory path.  One idea we floated around was to have
> tablespaces use major version directory names under the tablespace
> directory so renaming would not be necessary.  I could implement a
> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
> which are not in a version-specific subdirectory.

FYI, pg_migrator CVS now uses the relfilenode preservation ability in
8.5 to avoid the creation of placeholder relfilenodes and renaming.  It
also recommends using vacuumdb --only-analyze.

To simplify pg_migrator, you can now only migrate _to_ 8.5 as of today's
CVS, not earlier 8.5 CVS trees.  One interesting aspect of pg_migrator
is that while it has to carry around support for upgrading _from_ many
old releases, the target/to version support can stay limited, i.e. I
doubt anyone is going to want to use pg_migrator to migrate to 8.4 in a
year or two --- they will be migrating to 8.5.  We can also consider
removing 8.4 target migration support in pg_migrator 8.5 and just tell
people they have to use pg_migrator 8.4.X for a migration to PG 8.4.X.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Peter Eisentraut

Date:

07 January 2010, 01:25:03

On mån, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote:
> Yea, I am not excited about having vacuumdb do only analyze, but it
> seems the most minimal solution.  I spelled it --only-analyze and just
> posted the reason and patch.

I can't find the patch and the reason, but note that we already have
other options like --data-only, --schema-only, --tuples-only.  I
personally don't like the spelling of --only-analyze.

Re: pg_migrator issues

From

Tom Lane

Date:

07 January 2010, 01:32:50

Peter Eisentraut <peter_e@gmx.net> writes:
> On mån, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote:
>> Yea, I am not excited about having vacuumdb do only analyze, but it
>> seems the most minimal solution.  I spelled it --only-analyze and just
>> posted the reason and patch.

> I can't find the patch and the reason, but note that we already have
> other options like --data-only, --schema-only, --tuples-only.  I
> personally don't like the spelling of --only-analyze.

In particular note that pg_dump has options --schema and --schema-only,
and nobody has complained about that.  I concur with Peter that this
spelling is gratuitously unlike everyplace else.

Perhaps we could compromise by calling it "--no-vacuum", instead?
        regards, tom lane

Re: pg_migrator issues

From

Bruce Momjian

Date:

07 January 2010, 08:13:03

Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On mån, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote:
> >> Yea, I am not excited about having vacuumdb do only analyze, but it
> >> seems the most minimal solution.  I spelled it --only-analyze and just
> >> posted the reason and patch.
> 
> > I can't find the patch and the reason, but note that we already have
> > other options like --data-only, --schema-only, --tuples-only.  I
> > personally don't like the spelling of --only-analyze.
> 
> In particular note that pg_dump has options --schema and --schema-only,
> and nobody has complained about that.  I concur with Peter that this
> spelling is gratuitously unlike everyplace else.

Oh, interesting about pg_dump.  Let's just go with --analyze-only. 
--only-analyze is feeling odd to me too.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

07 January 2010, 08:40:19

> Tom Lane wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > On m<C3><A5>n, 2010-01-04 at 13:07 -0500, Bruce Momjian wrote:
> > >> Yea, I am not excited about having vacuumdb do only analyze, but it
> > >> seems the most minimal solution.  I spelled it --only-analyze and just
> > >> posted the reason and patch.
> >
> > > I can't find the patch and the reason, but note that we already have
> > > other options like --data-only, --schema-only, --tuples-only.  I
> > > personally don't like the spelling of --only-analyze.
> >
> > In particular note that pg_dump has options --schema and --schema-only,
> > and nobody has complained about that.  I concur with Peter that this
> > spelling is gratuitously unlike everyplace else.
>
> Oh, interesting about pg_dump.  Let's just go with --analyze-only.
> --only-analyze is feeling odd to me too.

Done, attached and applied.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/ref/vacuumdb.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/vacuumdb.sgml,v
retrieving revision 1.47
diff -c -c -r1.47 vacuumdb.sgml
*** doc/src/sgml/ref/vacuumdb.sgml    6 Jan 2010 05:31:13 -0000    1.47
--- doc/src/sgml/ref/vacuumdb.sgml    7 Jan 2010 12:37:08 -0000
***************
*** 28,34 ****
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--only-analyze</arg><arg>-o</arg></group>
     <arg>--table | -t <replaceable>table</replaceable>
      <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg>
     </arg>
--- 28,34 ----
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--analyze-only</arg><arg>-o</arg></group>
     <arg>--table | -t <replaceable>table</replaceable>
      <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg>
     </arg>
***************
*** 42,48 ****
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--only-analyze</arg><arg>-o</arg></group>
    </cmdsynopsis>
   </refsynopsisdiv>

--- 42,48 ----
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--analyze-only</arg><arg>-o</arg></group>
    </cmdsynopsis>
   </refsynopsisdiv>

***************
*** 143,149 ****

       <varlistentry>
        <term><option>-o</option></term>
!       <term><option>--only-analyze</option></term>
        <listitem>
         <para>
          Only calculate statistics for use by the optimizer (no vacuum).
--- 143,149 ----

       <varlistentry>
        <term><option>-o</option></term>
!       <term><option>--analyze-only</option></term>
        <listitem>
         <para>
          Only calculate statistics for use by the optimizer (no vacuum).
***************
*** 168,174 ****
         <para>
          Clean or analyze <replaceable class="parameter">table</replaceable> only.
          Column names can be specified only in conjunction with
!         the <option>--analyze</option> or <option>--only-analyze</option> options.
         </para>
         <tip>
          <para>
--- 168,174 ----
         <para>
          Clean or analyze <replaceable class="parameter">table</replaceable> only.
          Column names can be specified only in conjunction with
!         the <option>--analyze</option> or <option>--analyze-only</option> options.
         </para>
         <tip>
          <para>
Index: src/bin/scripts/vacuumdb.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/scripts/vacuumdb.c,v
retrieving revision 1.31
diff -c -c -r1.31 vacuumdb.c
*** src/bin/scripts/vacuumdb.c    6 Jan 2010 16:04:05 -0000    1.31
--- src/bin/scripts/vacuumdb.c    7 Jan 2010 12:37:08 -0000
***************
*** 15,26 ****


  static void vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose,
!                     bool and_analyze, bool only_analyze, bool freeze,
                      const char *table, const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo);
  static void vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze,
!                      bool only_analyze, bool freeze,
                       const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet);
--- 15,26 ----


  static void vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose,
!                     bool and_analyze, bool analyze_only, bool freeze,
                      const char *table, const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo);
  static void vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze,
!                      bool analyze_only, bool freeze,
                       const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet);
***************
*** 41,47 ****
          {"quiet", no_argument, NULL, 'q'},
          {"dbname", required_argument, NULL, 'd'},
          {"analyze", no_argument, NULL, 'z'},
!         {"only-analyze", no_argument, NULL, 'o'},
          {"freeze", no_argument, NULL, 'F'},
          {"all", no_argument, NULL, 'a'},
          {"table", required_argument, NULL, 't'},
--- 41,47 ----
          {"quiet", no_argument, NULL, 'q'},
          {"dbname", required_argument, NULL, 'd'},
          {"analyze", no_argument, NULL, 'z'},
!         {"analyze-only", no_argument, NULL, 'o'},
          {"freeze", no_argument, NULL, 'F'},
          {"all", no_argument, NULL, 'a'},
          {"table", required_argument, NULL, 't'},
***************
*** 63,69 ****
      bool        echo = false;
      bool        quiet = false;
      bool        and_analyze = false;
!     bool        only_analyze = false;
      bool        freeze = false;
      bool        alldb = false;
      char       *table = NULL;
--- 63,69 ----
      bool        echo = false;
      bool        quiet = false;
      bool        and_analyze = false;
!     bool        analyze_only = false;
      bool        freeze = false;
      bool        alldb = false;
      char       *table = NULL;
***************
*** 108,114 ****
                  and_analyze = true;
                  break;
              case 'o':
!                 only_analyze = true;
                  break;
              case 'F':
                  freeze = true;
--- 108,114 ----
                  and_analyze = true;
                  break;
              case 'o':
!                 analyze_only = true;
                  break;
              case 'F':
                  freeze = true;
***************
*** 155,161 ****
          exit(1);
      }

!     if (only_analyze)
      {
          if (full)
          {
--- 155,161 ----
          exit(1);
      }

!     if (analyze_only)
      {
          if (full)
          {
***************
*** 169,175 ****
                      progname);
              exit(1);
          }
!         /* allow 'and_analyze' with 'only_analyze' */
      }

      setup_cancel_handler();
--- 169,175 ----
                      progname);
              exit(1);
          }
!         /* allow 'and_analyze' with 'analyze_only' */
      }

      setup_cancel_handler();
***************
*** 189,195 ****
              exit(1);
          }

!         vacuum_all_databases(full, inplace, verbose, and_analyze, only_analyze, freeze,
                               host, port, username, prompt_password,
                               progname, echo, quiet);
      }
--- 189,195 ----
              exit(1);
          }

!         vacuum_all_databases(full, inplace, verbose, and_analyze, analyze_only, freeze,
                               host, port, username, prompt_password,
                               progname, echo, quiet);
      }
***************
*** 205,211 ****
                  dbname = get_user_name(progname);
          }

!         vacuum_one_database(dbname, full, inplace, verbose, and_analyze, only_analyze,
                              freeze, table,
                              host, port, username, prompt_password,
                              progname, echo);
--- 205,211 ----
                  dbname = get_user_name(progname);
          }

!         vacuum_one_database(dbname, full, inplace, verbose, and_analyze, analyze_only,
                              freeze, table,
                              host, port, username, prompt_password,
                              progname, echo);
***************
*** 217,223 ****

  static void
  vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose, bool and_analyze,
!                     bool only_analyze, bool freeze, const char *table,
                      const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo)
--- 217,223 ----

  static void
  vacuum_one_database(const char *dbname, bool full, bool inplace, bool verbose, bool and_analyze,
!                     bool analyze_only, bool freeze, const char *table,
                      const char *host, const char *port,
                      const char *username, enum trivalue prompt_password,
                      const char *progname, bool echo)
***************
*** 230,236 ****

      conn = connectDatabase(dbname, host, port, username, prompt_password, progname);

!     if (only_analyze)
      {
          appendPQExpBuffer(&sql, "ANALYZE");
          if (verbose)
--- 230,236 ----

      conn = connectDatabase(dbname, host, port, username, prompt_password, progname);

!     if (analyze_only)
      {
          appendPQExpBuffer(&sql, "ANALYZE");
          if (verbose)
***************
*** 306,312 ****


  static void
! vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze, bool only_analyze,
                       bool freeze, const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet)
--- 306,312 ----


  static void
! vacuum_all_databases(bool full, bool inplace, bool verbose, bool and_analyze, bool analyze_only,
                       bool freeze, const char *host, const char *port,
                       const char *username, enum trivalue prompt_password,
                       const char *progname, bool echo, bool quiet)
***************
*** 329,335 ****
              fflush(stdout);
          }

!         vacuum_one_database(dbname, full, inplace, verbose, and_analyze, only_analyze,
                              freeze, NULL, host, port, username, prompt_password,
                              progname, echo);
      }
--- 329,335 ----
              fflush(stdout);
          }

!         vacuum_one_database(dbname, full, inplace, verbose, and_analyze, analyze_only,
                              freeze, NULL, host, port, username, prompt_password,
                              progname, echo);
      }
***************
*** 351,357 ****
      printf(_("  -f, --full                      do full vacuuming\n"));
      printf(_("  -F, --freeze                    freeze row transaction information\n"));
      printf(_("  -i, --inplace                   do full inplace vacuuming\n"));
!     printf(_("  -o, --only-analyze              only update optimizer hints\n"));
      printf(_("  -q, --quiet                     don't write any messages\n"));
      printf(_("  -t, --table='TABLE[(COLUMNS)]'  vacuum specific table only\n"));
      printf(_("  -v, --verbose                   write a lot of output\n"));
--- 351,357 ----
      printf(_("  -f, --full                      do full vacuuming\n"));
      printf(_("  -F, --freeze                    freeze row transaction information\n"));
      printf(_("  -i, --inplace                   do full inplace vacuuming\n"));
!     printf(_("  -o, --analyze-only              only update optimizer hints\n"));
      printf(_("  -q, --quiet                     don't write any messages\n"));
      printf(_("  -t, --table='TABLE[(COLUMNS)]'  vacuum specific table only\n"));
      printf(_("  -v, --verbose                   write a lot of output\n"));

Re: pg_migrator issues

From

Alvaro Herrera

Date:

07 January 2010, 10:06:52

Bruce Momjian escribió:

> > Oh, interesting about pg_dump.  Let's just go with --analyze-only.
> > --only-analyze is feeling odd to me too.
> 
> Done, attached and applied.

<bikeshedding>
Why -o and not, say, -Z?  I imagine you picked -o for "only" but it
seems strange.
</>

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pg_migrator issues

From

Bruce Momjian

Date:

07 January 2010, 10:36:19

Alvaro Herrera wrote:
> Bruce Momjian escribi?:
>
> > > Oh, interesting about pg_dump.  Let's just go with --analyze-only.
> > > --only-analyze is feeling odd to me too.
> >
> > Done, attached and applied.
>
> <bikeshedding>
> Why -o and not, say, -Z?  I imagine you picked -o for "only" but it
> seems strange.
> </>

Hmmm, sure -Z makes sense.  Change applied with attached patch. Thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/ref/vacuumdb.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/vacuumdb.sgml,v
retrieving revision 1.48
diff -c -c -r1.48 vacuumdb.sgml
*** doc/src/sgml/ref/vacuumdb.sgml    7 Jan 2010 12:38:55 -0000    1.48
--- doc/src/sgml/ref/vacuumdb.sgml    7 Jan 2010 14:34:39 -0000
***************
*** 28,34 ****
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--analyze-only</arg><arg>-o</arg></group>
     <arg>--table | -t <replaceable>table</replaceable>
      <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg>
     </arg>
--- 28,34 ----
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--analyze-only</arg><arg>-Z</arg></group>
     <arg>--table | -t <replaceable>table</replaceable>
      <arg>( <replaceable class="parameter">column</replaceable> [,...] )</arg>
     </arg>
***************
*** 42,48 ****
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--analyze-only</arg><arg>-o</arg></group>
    </cmdsynopsis>
   </refsynopsisdiv>

--- 42,48 ----
     <group><arg>--freeze</arg><arg>-F</arg></group>
     <group><arg>--verbose</arg><arg>-v</arg></group>
     <group><arg>--analyze</arg><arg>-z</arg></group>
!    <group><arg>--analyze-only</arg><arg>-Z</arg></group>
    </cmdsynopsis>
   </refsynopsisdiv>

***************
*** 142,148 ****
       </varlistentry>

       <varlistentry>
!       <term><option>-o</option></term>
        <term><option>--analyze-only</option></term>
        <listitem>
         <para>
--- 142,148 ----
       </varlistentry>

       <varlistentry>
!       <term><option>-Z</option></term>
        <term><option>--analyze-only</option></term>
        <listitem>
         <para>
Index: src/bin/scripts/vacuumdb.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/scripts/vacuumdb.c,v
retrieving revision 1.32
diff -c -c -r1.32 vacuumdb.c
*** src/bin/scripts/vacuumdb.c    7 Jan 2010 12:38:55 -0000    1.32
--- src/bin/scripts/vacuumdb.c    7 Jan 2010 14:34:44 -0000
***************
*** 41,47 ****
          {"quiet", no_argument, NULL, 'q'},
          {"dbname", required_argument, NULL, 'd'},
          {"analyze", no_argument, NULL, 'z'},
!         {"analyze-only", no_argument, NULL, 'o'},
          {"freeze", no_argument, NULL, 'F'},
          {"all", no_argument, NULL, 'a'},
          {"table", required_argument, NULL, 't'},
--- 41,47 ----
          {"quiet", no_argument, NULL, 'q'},
          {"dbname", required_argument, NULL, 'd'},
          {"analyze", no_argument, NULL, 'z'},
!         {"analyze-only", no_argument, NULL, 'Z'},
          {"freeze", no_argument, NULL, 'F'},
          {"all", no_argument, NULL, 'a'},
          {"table", required_argument, NULL, 't'},
***************
*** 107,113 ****
              case 'z':
                  and_analyze = true;
                  break;
!             case 'o':
                  analyze_only = true;
                  break;
              case 'F':
--- 107,113 ----
              case 'z':
                  and_analyze = true;
                  break;
!             case 'Z':
                  analyze_only = true;
                  break;
              case 'F':
***************
*** 351,361 ****
      printf(_("  -f, --full                      do full vacuuming\n"));
      printf(_("  -F, --freeze                    freeze row transaction information\n"));
      printf(_("  -i, --inplace                   do full inplace vacuuming\n"));
-     printf(_("  -o, --analyze-only              only update optimizer hints\n"));
      printf(_("  -q, --quiet                     don't write any messages\n"));
      printf(_("  -t, --table='TABLE[(COLUMNS)]'  vacuum specific table only\n"));
      printf(_("  -v, --verbose                   write a lot of output\n"));
      printf(_("  -z, --analyze                   update optimizer hints\n"));
      printf(_("  --help                          show this help, then exit\n"));
      printf(_("  --version                       output version information, then exit\n"));
      printf(_("\nConnection options:\n"));
--- 351,361 ----
      printf(_("  -f, --full                      do full vacuuming\n"));
      printf(_("  -F, --freeze                    freeze row transaction information\n"));
      printf(_("  -i, --inplace                   do full inplace vacuuming\n"));
      printf(_("  -q, --quiet                     don't write any messages\n"));
      printf(_("  -t, --table='TABLE[(COLUMNS)]'  vacuum specific table only\n"));
      printf(_("  -v, --verbose                   write a lot of output\n"));
      printf(_("  -z, --analyze                   update optimizer hints\n"));
+     printf(_("  -Z, --analyze-only              only update optimizer hints\n"));
      printf(_("  --help                          show this help, then exit\n"));
      printf(_("  --version                       output version information, then exit\n"));
      printf(_("\nConnection options:\n"));

Re: pg_migrator issues

From

Bruce Momjian

Date:

07 January 2010, 21:14:55

Bruce Momjian wrote:
> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> if the tablespaces are on mount points.  I have already received a
> report of such a failure.  $PGDATA also has that issue, but that
> renaming has to be done by the user before pg_migrator is run, and only
> if they want to keep the same $PGDATA value after migration, i.e. no
> version-specific directory path.  One idea we floated around was to have
> tablespaces use major version directory names under the tablespace
> directory so renaming would not be necessary.  I could implement a
> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
> which are not in a version-specific subdirectory.

I have created a patch to implement per-cluster directories in
tablespaces.  This is for use by pg_migrator so it doesn't have to
rename the tablespaces during the migration.  Users still need to remove
the old cluster's tablespace subdirectory, and I can add a --delete-old
option to pg_migrator to do that.

The old code used a symlink from pg_tblspc/#### to the location
directory specified in CREATE TABLESPACE.  During CREATE TABLESPACE, a
PG_VERSION file is created containing the major version number.  Anytime
a database object is created in the tablespace, a per-database directory
is created.

With the new code in this patch, pg_tblspc/#### points to the CREATE
TABLESPACE directory just like before, but a new directory, PG_ +
major_version + catalog_version, e.g. PG_8.5_201001061, is created and
all per-database directories are created under that directory.  This
directory has the same purpose as the old PG_VERSION file.  One
disadvantage of this approach is that functions that need to look inside
tablespaces must now also specify the version directory, e.g.
pg_tablespace_databases().

An alternative approach would be for the pg_tblspc/#### symbolic link to
point to the new version directory, PG_*, but that makes removal of the
version directory complicated, particularly during WAL replay where we
don't have access to the system catalogs, and readlink() to read the
symbolic link target is not supported on all operating systems
(particularly Win32).

I used the version directory pattern "PG_8.5_201001061" because "PG_"
helps people realize the directory is for the use of Postgres
(PG_VERSION is gone in tablespaces), and the catalog version number
enables alpha migrations.  The major version number is not necessary but
probably useful for administrators.

pg_migrator is going to need to know about the version directory too,
and it can't use the C macro --- it has to construct the directory
pattern based on the contents of pg_control from the old and new
servers. And, it is going to be difficult to run pg_control on the old
server for pg_migrator --delete-old after migration because it is
renamed to pg_control.old --- I will need to create a symbolic link
during the time I run pg_controldata.  Also, the contents of the
tablespace directory for an 8.4 to 8.5 migration is going to be ugly
because there will be many numeric directories (for databases), and
PG_VERSION (for 8.4), and the PG_8.5_201001061 directory which should
not be touched.

Can someone explain why TablespaceCreateDbspace() creates a non-symlink
directory during recovery if the symlink is missing?  Is it just for
robustness?  I would like to document that more clearly.

Comments?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/catalog/catalog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/catalog.c,v
retrieving revision 1.86
diff -c -c -r1.86 catalog.c
*** src/backend/catalog/catalog.c    6 Jan 2010 02:41:37 -0000    1.86
--- src/backend/catalog/catalog.c    8 Jan 2010 01:07:41 -0000
***************
*** 115,130 ****
      else
      {
          /* All other tablespaces are accessed via symlinks */
!         pathlen = 10 + OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS + 1
!             + FORKNAMECHARS + 1;
          path = (char *) palloc(pathlen);
          if (forknum != MAIN_FORKNUM)
!             snprintf(path, pathlen, "pg_tblspc/%u/%u/%u_%s",
!                      rnode.spcNode, rnode.dbNode, rnode.relNode,
!                      forkNames[forknum]);
          else
!             snprintf(path, pathlen, "pg_tblspc/%u/%u/%u",
!                      rnode.spcNode, rnode.dbNode, rnode.relNode);
      }
      return path;
  }
--- 115,131 ----
      else
      {
          /* All other tablespaces are accessed via symlinks */
!         pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) +
!             1 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
          path = (char *) palloc(pathlen);
          if (forknum != MAIN_FORKNUM)
!             snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
!                      rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
!                      rnode.dbNode, rnode.relNode, forkNames[forknum]);
          else
!             snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
!                      rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
!                      rnode.dbNode, rnode.relNode);
      }
      return path;
  }
***************
*** 161,170 ****
      else
      {
          /* All other tablespaces are accessed via symlinks */
!         pathlen = 10 + OIDCHARS + 1 + OIDCHARS + 1;
          path = (char *) palloc(pathlen);
!         snprintf(path, pathlen, "pg_tblspc/%u/%u",
!                  spcNode, dbNode);
      }
      return path;
  }
--- 162,172 ----
      else
      {
          /* All other tablespaces are accessed via symlinks */
!         pathlen = 9 + 1 + OIDCHARS + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) +
!                   1 + OIDCHARS + 1;
          path = (char *) palloc(pathlen);
!         snprintf(path, pathlen, "pg_tblspc/%u/%s/%u",
!                  spcNode, TABLESPACE_VERSION_DIRECTORY, dbNode);
      }
      return path;
  }
Index: src/backend/commands/tablespace.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.70
diff -c -c -r1.70 tablespace.c
*** src/backend/commands/tablespace.c    7 Jan 2010 04:10:39 -0000    1.70
--- src/backend/commands/tablespace.c    8 Jan 2010 01:07:41 -0000
***************
*** 15,22 ****
   * To support file access via the information given in RelFileNode, we
   * maintain a symbolic-link map in $PGDATA/pg_tblspc. The symlinks are
   * named by tablespace OIDs and point to the actual tablespace directories.
   * Thus the full path to an arbitrary file is
!  *            $PGDATA/pg_tblspc/spcoid/dboid/relfilenode
   *
   * There are two tablespaces created at initdb time: pg_global (for shared
   * tables) and pg_default (for everything else).  For backwards compatibility
--- 15,25 ----
   * To support file access via the information given in RelFileNode, we
   * maintain a symbolic-link map in $PGDATA/pg_tblspc. The symlinks are
   * named by tablespace OIDs and point to the actual tablespace directories.
+  * There is also a per-cluster version directory in each tablespace.
   * Thus the full path to an arbitrary file is
!  *            $PGDATA/pg_tblspc/spcoid/PG_MAJORVER_CATVER/dboid/relfilenode
!  * e.g.
!  *            $PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
   *
   * There are two tablespaces created at initdb time: pg_global (for shared
   * tables) and pg_default (for everything else).  For backwards compatibility
***************
*** 81,88 ****
  char       *temp_tablespaces = NULL;


! static bool remove_tablespace_directories(Oid tablespaceoid, bool redo);
! static void write_version_file(const char *path);


  /*
--- 84,92 ----
  char       *temp_tablespaces = NULL;


! static void create_tablespace_directories(const char *location,
!                                          const Oid tablespaceoid);
! static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo);


  /*
***************
*** 146,163 ****
                  {
                      char       *parentdir;

!                     /* Failure other than not exists? */
                      if (errno != ENOENT || !isRedo)
                          ereport(ERROR,
                                  (errcode_for_file_access(),
                                errmsg("could not create directory \"%s\": %m",
                                       dir)));

!                     /* Parent directory must be missing */
                      parentdir = pstrdup(dir);
                      get_parent_directory(parentdir);
!                     /* Can't create parent either? */
!                     if (mkdir(parentdir, S_IRWXU) < 0)
                          ereport(ERROR,
                                  (errcode_for_file_access(),
                                errmsg("could not create directory \"%s\": %m",
--- 150,185 ----
                  {
                      char       *parentdir;

!                     /* Failure other than not exists or not in WAL replay? */
                      if (errno != ENOENT || !isRedo)
                          ereport(ERROR,
                                  (errcode_for_file_access(),
                                errmsg("could not create directory \"%s\": %m",
                                       dir)));

!                     /*
!                      * Parent directories are missing during WAL replay, so
!                      * continue by creating simple parent directories
!                      * rather than a symlink.
!                      */
!
!                     /* create two parents up if not exist */
                      parentdir = pstrdup(dir);
                      get_parent_directory(parentdir);
!                     get_parent_directory(parentdir);
!                     /* Can't create parent and it doesn't already exist? */
!                     if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST)
!                         ereport(ERROR,
!                                 (errcode_for_file_access(),
!                               errmsg("could not create directory \"%s\": %m",
!                                      parentdir)));
!                     pfree(parentdir);
!
!                     /* create one parent up if not exist */
!                     parentdir = pstrdup(dir);
!                     get_parent_directory(parentdir);
!                     /* Can't create parent and it doesn't already exist? */
!                     if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST)
                          ereport(ERROR,
                                  (errcode_for_file_access(),
                                errmsg("could not create directory \"%s\": %m",
***************
*** 212,218 ****
      HeapTuple    tuple;
      Oid            tablespaceoid;
      char       *location;
-     char       *linkloc;
      Oid            ownerId;

      /* Must be super user */
--- 234,239 ----
***************
*** 251,260 ****

      /*
       * Check that location isn't too long. Remember that we're going to append
!      * '/<dboid>/<relid>.<nnn>'  (XXX but do we ever form the whole path
!      * explicitly?    This may be overly conservative.)
       */
!     if (strlen(location) >= MAXPGPATH - 1 - OIDCHARS - 1 - OIDCHARS - 1 - OIDCHARS)
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                   errmsg("tablespace location \"%s\" is too long",
--- 272,282 ----

      /*
       * Check that location isn't too long. Remember that we're going to append
!      * 'PG_XXX/<dboid>/<relid>.<nnn>'.  FYI, we never actually reference the
!      * whole path, but mkdir() uses the first two parts.
       */
!     if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
!         OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
          ereport(ERROR,
                  (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                   errmsg("tablespace location \"%s\" is too long",
***************
*** 311,355 ****
      /* Record dependency on owner */
      recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId);

!     /*
!      * Attempt to coerce target directory to safe permissions.    If this fails,
!      * it doesn't exist or has the wrong owner.
!      */
!     if (chmod(location, 0700) != 0)
!         ereport(ERROR,
!                 (errcode_for_file_access(),
!                  errmsg("could not set permissions on directory \"%s\": %m",
!                         location)));
!
!     /*
!      * Check the target directory is empty.
!      */
!     if (!directory_is_empty(location))
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("directory \"%s\" is not empty",
!                         location)));
!
!     /*
!      * Create the PG_VERSION file in the target directory.    This has several
!      * purposes: to make sure we can write in the directory, to prevent
!      * someone from creating another tablespace pointing at the same directory
!      * (the emptiness check above will fail), and to label tablespace
!      * directories by PG version.
!      */
!     write_version_file(location);
!
!     /*
!      * All seems well, create the symlink
!      */
!     linkloc = (char *) palloc(OIDCHARS + OIDCHARS + 1);
!     sprintf(linkloc, "pg_tblspc/%u", tablespaceoid);
!
!     if (symlink(location, linkloc) < 0)
!         ereport(ERROR,
!                 (errcode_for_file_access(),
!                  errmsg("could not create symbolic link \"%s\": %m",
!                         linkloc)));

      /* Record the filesystem change in XLOG */
      {
--- 333,339 ----
      /* Record dependency on owner */
      recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId);

!     create_tablespace_directories(location, tablespaceoid);

      /* Record the filesystem change in XLOG */
      {
***************
*** 378,384 ****
       */
      ForceSyncCommit();

-     pfree(linkloc);
      pfree(location);

      /* We keep the lock on pg_tablespace until commit */
--- 362,367 ----
***************
*** 478,484 ****
      /*
       * Try to remove the physical infrastructure.
       */
!     if (!remove_tablespace_directories(tablespaceoid, false))
      {
          /*
           * Not all files deleted?  However, there can be lingering empty files
--- 461,467 ----
      /*
       * Try to remove the physical infrastructure.
       */
!     if (!destroy_tablespace_directories(tablespaceoid, false))
      {
          /*
           * Not all files deleted?  However, there can be lingering empty files
***************
*** 490,496 ****
           * out any lingering files, and try again.
           */
          RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
!         if (!remove_tablespace_directories(tablespaceoid, false))
          {
              /* Still not empty, the files must be important then */
              ereport(ERROR,
--- 473,479 ----
           * out any lingering files, and try again.
           */
          RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
!         if (!destroy_tablespace_directories(tablespaceoid, false))
          {
              /* Still not empty, the files must be important then */
              ereport(ERROR,
***************
*** 542,565 ****
  #endif   /* HAVE_SYMLINK */
  }

  /*
!  * remove_tablespace_directories: attempt to remove filesystem infrastructure
   *
!  * Returns TRUE if successful, FALSE if some subdirectory is not empty
   *
!  * redo indicates we are redoing a drop from XLOG; okay if nothing there
   */
  static bool
! remove_tablespace_directories(Oid tablespaceoid, bool redo)
  {
!     char       *location;
      DIR           *dirdesc;
      struct dirent *de;
      char       *subfile;
      struct stat st;

!     location = (char *) palloc(OIDCHARS + OIDCHARS + 1);
!     sprintf(location, "pg_tblspc/%u", tablespaceoid);

      /*
       * Check if the tablespace still contains any files.  We try to rmdir each
--- 525,621 ----
  #endif   /* HAVE_SYMLINK */
  }

+
  /*
!  * create_tablespace_directories
   *
!  *    Attempt to create filesystem infrastructure linking $PGDATA/pg_tblspc/
!  *    to the specified directory
!  */
! static void
! create_tablespace_directories(const char *location, const Oid tablespaceoid)
! {
!     char *linkloc = palloc(OIDCHARS + OIDCHARS + 1);
!     char *location_with_version_dir = palloc(strlen(location) + 1 +
!                                         strlen(TABLESPACE_VERSION_DIRECTORY) + 1);
!
!     sprintf(linkloc, "pg_tblspc/%u", tablespaceoid);
!     sprintf(location_with_version_dir, "%s/%s", location,
!                                         TABLESPACE_VERSION_DIRECTORY);
!
!     /*
!      * Attempt to coerce target directory to safe permissions.    If this fails,
!      * it doesn't exist or has the wrong owner.
!      */
!     if (chmod(location, 0700) != 0)
!     {
!         if (errno == ENOENT)
!             ereport(ERROR,
!                     (errcode(ERRCODE_UNDEFINED_FILE),
!                      errmsg("directory \"%s\" does not exist",
!                             location)));
!         else
!             ereport(ERROR,
!                 (errcode_for_file_access(),
!                  errmsg("could not set permissions on directory \"%s\": %m",
!                         location)));
!     }
!
!     /*
!      * The creation of the version directory prevents more than one
!      *     tablespace in a single location.
!      */
!     if (mkdir(location_with_version_dir, S_IRWXU) < 0)
!     {
!         if (errno == EEXIST)
!             ereport(ERROR,
!                     (errcode(ERRCODE_OBJECT_IN_USE),
!                      errmsg("directory \"%s\" already in use as a tablespace",
!                             location_with_version_dir)));
!         else
!             ereport(ERROR,
!                     (errcode_for_file_access(),
!                   errmsg("could not create directory \"%s\": %m",
!                          location_with_version_dir)));
!     }
!
!     /*
!      * Create the symlink under PGDATA
!      */
!     if (symlink(location, linkloc) < 0)
!         ereport(ERROR,
!                 (errcode_for_file_access(),
!                  errmsg("could not create symbolic link \"%s\": %m",
!                         linkloc)));
!
!     pfree(linkloc);
!     pfree(location_with_version_dir);
! }
!
!
! /*
!  * destroy_tablespace_directories
!  *
!  * Attempt to remove filesystem infrastructure
!  *
!  * 'redo' indicates we are redoing a drop from XLOG; okay if nothing there
   *
!  * Returns TRUE if successful, FALSE if some subdirectory is not empty
   */
  static bool
! destroy_tablespace_directories(Oid tablespaceoid, bool redo)
  {
!     char       *linkloc;
!     char       *linkloc_with_version_dir;
      DIR           *dirdesc;
      struct dirent *de;
      char       *subfile;
      struct stat st;

!     linkloc_with_version_dir = palloc(9 + 1 + OIDCHARS + 1 +
!                                     strlen(TABLESPACE_VERSION_DIRECTORY));
!     sprintf(linkloc_with_version_dir, "pg_tblspc/%u/%s", tablespaceoid,
!                                     TABLESPACE_VERSION_DIRECTORY);

      /*
       * Check if the tablespace still contains any files.  We try to rmdir each
***************
*** 582,588 ****
       * and symlink.  We want to allow a new DROP attempt to succeed at
       * removing the catalog entries, so we should not give a hard error here.
       */
!     dirdesc = AllocateDir(location);
      if (dirdesc == NULL)
      {
          if (errno == ENOENT)
--- 638,644 ----
       * and symlink.  We want to allow a new DROP attempt to succeed at
       * removing the catalog entries, so we should not give a hard error here.
       */
!     dirdesc = AllocateDir(linkloc_with_version_dir);
      if (dirdesc == NULL)
      {
          if (errno == ENOENT)
***************
*** 591,622 ****
                  ereport(WARNING,
                          (errcode_for_file_access(),
                           errmsg("could not open directory \"%s\": %m",
!                                 location)));
!             pfree(location);
              return true;
          }
          /* else let ReadDir report the error */
      }

!     while ((de = ReadDir(dirdesc, location)) != NULL)
      {
-         /* Note we ignore PG_VERSION for the nonce */
          if (strcmp(de->d_name, ".") == 0 ||
!             strcmp(de->d_name, "..") == 0 ||
!             strcmp(de->d_name, "PG_VERSION") == 0)
              continue;

!         subfile = palloc(strlen(location) + 1 + strlen(de->d_name) + 1);
!         sprintf(subfile, "%s/%s", location, de->d_name);

          /* This check is just to deliver a friendlier error message */
          if (!directory_is_empty(subfile))
          {
              FreeDir(dirdesc);
              return false;
          }

!         /* Do the real deed */
          if (rmdir(subfile) < 0)
              ereport(ERROR,
                      (errcode_for_file_access(),
--- 647,678 ----
                  ereport(WARNING,
                          (errcode_for_file_access(),
                           errmsg("could not open directory \"%s\": %m",
!                                 linkloc_with_version_dir)));
!             pfree(linkloc_with_version_dir);
              return true;
          }
          /* else let ReadDir report the error */
      }

!     while ((de = ReadDir(dirdesc, linkloc_with_version_dir)) != NULL)
      {
          if (strcmp(de->d_name, ".") == 0 ||
!             strcmp(de->d_name, "..") == 0)
              continue;

!         subfile = palloc(strlen(linkloc_with_version_dir) + 1 + strlen(de->d_name) + 1);
!         sprintf(subfile, "%s/%s", linkloc_with_version_dir, de->d_name);

          /* This check is just to deliver a friendlier error message */
          if (!directory_is_empty(subfile))
          {
              FreeDir(dirdesc);
+             pfree(subfile);
+             pfree(linkloc_with_version_dir);
              return false;
          }

!         /* remove empty directory */
          if (rmdir(subfile) < 0)
              ereport(ERROR,
                      (errcode_for_file_access(),
***************
*** 628,706 ****

      FreeDir(dirdesc);

      /*
!      * Okay, try to unlink PG_VERSION (we allow it to not be there, even in
!      * non-REDO case, for robustness).
!      */
!     subfile = palloc(strlen(location) + 11 + 1);
!     sprintf(subfile, "%s/PG_VERSION", location);
!
!     if (unlink(subfile) < 0)
!     {
!         if (errno != ENOENT)
!             ereport(ERROR,
!                     (errcode_for_file_access(),
!                      errmsg("could not remove file \"%s\": %m",
!                             subfile)));
!     }
!
!     pfree(subfile);
!
!     /*
!      * Okay, try to remove the symlink.  We must however deal with the
       * possibility that it's a directory instead of a symlink --- this could
       * happen during WAL replay (see TablespaceCreateDbspace), and it is also
!      * the normal case on Windows.
       */
!     if (lstat(location, &st) == 0 && S_ISDIR(st.st_mode))
      {
!         if (rmdir(location) < 0)
              ereport(ERROR,
                      (errcode_for_file_access(),
                       errmsg("could not remove directory \"%s\": %m",
!                             location)));
      }
      else
      {
!         if (unlink(location) < 0)
              ereport(ERROR,
                      (errcode_for_file_access(),
                       errmsg("could not remove symbolic link \"%s\": %m",
!                             location)));
      }

!     pfree(location);

      return true;
  }

- /*
-  * write out the PG_VERSION file in the specified directory
-  */
- static void
- write_version_file(const char *path)
- {
-     char       *fullname;
-     FILE       *version_file;
-
-     /* Now write the file */
-     fullname = palloc(strlen(path) + 11 + 1);
-     sprintf(fullname, "%s/PG_VERSION", path);
-
-     if ((version_file = AllocateFile(fullname, PG_BINARY_W)) == NULL)
-         ereport(ERROR,
-                 (errcode_for_file_access(),
-                  errmsg("could not write to file \"%s\": %m",
-                         fullname)));
-     fprintf(version_file, "%s\n", PG_MAJORVERSION);
-     if (FreeFile(version_file))
-         ereport(ERROR,
-                 (errcode_for_file_access(),
-                  errmsg("could not write to file \"%s\": %m",
-                         fullname)));
-
-     pfree(fullname);
- }

  /*
   * Check if a directory is empty.
--- 684,727 ----

      FreeDir(dirdesc);

+     /* remove version directory */
+     if (rmdir(linkloc_with_version_dir) < 0)
+         ereport(ERROR,
+                 (errcode_for_file_access(),
+                  errmsg("could not remove directory \"%s\": %m",
+                         linkloc_with_version_dir)));
+
      /*
!      * Try to remove the symlink.  We must however deal with the
       * possibility that it's a directory instead of a symlink --- this could
       * happen during WAL replay (see TablespaceCreateDbspace), and it is also
!      * the case on Windows where junction points lstat() as directories.
       */
!     linkloc = pstrdup(linkloc_with_version_dir);
!     get_parent_directory(linkloc);
!     if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
      {
!         if (rmdir(linkloc) < 0)
              ereport(ERROR,
                      (errcode_for_file_access(),
                       errmsg("could not remove directory \"%s\": %m",
!                             linkloc)));
      }
      else
      {
!         if (unlink(linkloc) < 0)
              ereport(ERROR,
                      (errcode_for_file_access(),
                       errmsg("could not remove symbolic link \"%s\": %m",
!                             linkloc)));
      }

!     pfree(linkloc_with_version_dir);
!     pfree(linkloc);

      return true;
  }


  /*
   * Check if a directory is empty.
***************
*** 728,733 ****
--- 749,755 ----
      return true;
  }

+
  /*
   * Rename a tablespace
   */
***************
*** 1336,1370 ****
      {
          xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
          char       *location = xlrec->ts_path;
-         char       *linkloc;
-
-         /*
-          * Attempt to coerce target directory to safe permissions.    If this
-          * fails, it doesn't exist or has the wrong owner.
-          */
-         if (chmod(location, 0700) != 0)
-             ereport(ERROR,
-                     (errcode_for_file_access(),
-                   errmsg("could not set permissions on directory \"%s\": %m",
-                          location)));
-
-         /* Create or re-create the PG_VERSION file in the target directory */
-         write_version_file(location);
-
-         /* Create the symlink if not already present */
-         linkloc = (char *) palloc(OIDCHARS + OIDCHARS + 1);
-         sprintf(linkloc, "pg_tblspc/%u", xlrec->ts_id);
-
-         if (symlink(location, linkloc) < 0)
-         {
-             if (errno != EEXIST)
-                 ereport(ERROR,
-                         (errcode_for_file_access(),
-                          errmsg("could not create symbolic link \"%s\": %m",
-                                 linkloc)));
-         }

!         pfree(linkloc);
      }
      else if (info == XLOG_TBLSPC_DROP)
      {
--- 1358,1365 ----
      {
          xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
          char       *location = xlrec->ts_path;

!         create_tablespace_directories(location, xlrec->ts_id);
      }
      else if (info == XLOG_TBLSPC_DROP)
      {
***************
*** 1380,1386 ****
           * remove all files then do conflict processing and try again,
           * if currently enabled.
           */
!         if (!remove_tablespace_directories(xlrec->ts_id, true))
          {
              VirtualTransactionId *temp_file_users;

--- 1375,1381 ----
           * remove all files then do conflict processing and try again,
           * if currently enabled.
           */
!         if (!destroy_tablespace_directories(xlrec->ts_id, true))
          {
              VirtualTransactionId *temp_file_users;

***************
*** 1416,1422 ****
               * exited by now. So lets recheck before we throw an error.
               * If !process_conflicts then this will just fail again.
               */
!             if (!remove_tablespace_directories(xlrec->ts_id, true))
                  ereport(ERROR,
                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                       errmsg("tablespace %u is not empty",
--- 1411,1417 ----
               * exited by now. So lets recheck before we throw an error.
               * If !process_conflicts then this will just fail again.
               */
!             if (!destroy_tablespace_directories(xlrec->ts_id, true))
                  ereport(ERROR,
                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                       errmsg("tablespace %u is not empty",
Index: src/backend/storage/file/fd.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.152
diff -c -c -r1.152 fd.c
*** src/backend/storage/file/fd.c    2 Jan 2010 16:57:51 -0000    1.152
--- src/backend/storage/file/fd.c    8 Jan 2010 01:07:41 -0000
***************
*** 51,56 ****
--- 51,57 ----

  #include "miscadmin.h"
  #include "access/xact.h"
+ #include "catalog/catalog.h"
  #include "catalog/pg_tablespace.h"
  #include "storage/fd.h"
  #include "storage/ipc.h"
***************
*** 963,970 ****
      else
      {
          /* All other tablespaces are accessed via symlinks */
!         snprintf(tempdirpath, sizeof(tempdirpath), "pg_tblspc/%u/%s",
!                  tblspcOid, PG_TEMP_FILES_DIR);
      }

      /*
--- 964,971 ----
      else
      {
          /* All other tablespaces are accessed via symlinks */
!         snprintf(tempdirpath, sizeof(tempdirpath), "pg_tblspc/%u/%s/%s",
!                  tblspcOid, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR);
      }

      /*
***************
*** 1841,1848 ****
              strcmp(spc_de->d_name, "..") == 0)
              continue;

!         snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
!                  spc_de->d_name, PG_TEMP_FILES_DIR);
          RemovePgTempFilesInDir(temp_path);
      }

--- 1842,1849 ----
              strcmp(spc_de->d_name, "..") == 0)
              continue;

!         snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s/%s",
!                  spc_de->d_name, TABLESPACE_VERSION_DIRECTORY, PG_TEMP_FILES_DIR);
          RemovePgTempFilesInDir(temp_path);
      }

Index: src/backend/utils/adt/dbsize.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/dbsize.c,v
retrieving revision 1.25
diff -c -c -r1.25 dbsize.c
*** src/backend/utils/adt/dbsize.c    2 Jan 2010 16:57:53 -0000    1.25
--- src/backend/utils/adt/dbsize.c    8 Jan 2010 01:07:41 -0000
***************
*** 108,115 ****
              strcmp(direntry->d_name, "..") == 0)
              continue;

!         snprintf(pathname, MAXPGPATH, "pg_tblspc/%s/%u",
!                  direntry->d_name, dbOid);
          totalsize += db_dir_size(pathname);
      }

--- 108,115 ----
              strcmp(direntry->d_name, "..") == 0)
              continue;

!         snprintf(pathname, MAXPGPATH, "pg_tblspc/%s/%s/%u",
!                  direntry->d_name, TABLESPACE_VERSION_DIRECTORY, dbOid);
          totalsize += db_dir_size(pathname);
      }

***************
*** 179,185 ****
      else if (tblspcOid == GLOBALTABLESPACE_OID)
          snprintf(tblspcPath, MAXPGPATH, "global");
      else
!         snprintf(tblspcPath, MAXPGPATH, "pg_tblspc/%u", tblspcOid);

      dirdesc = AllocateDir(tblspcPath);

--- 179,186 ----
      else if (tblspcOid == GLOBALTABLESPACE_OID)
          snprintf(tblspcPath, MAXPGPATH, "global");
      else
!         snprintf(tblspcPath, MAXPGPATH, "pg_tblspc/%u/%s", tblspcOid,
!                                          TABLESPACE_VERSION_DIRECTORY);

      dirdesc = AllocateDir(tblspcPath);

Index: src/backend/utils/adt/misc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v
retrieving revision 1.73
diff -c -c -r1.73 misc.c
*** src/backend/utils/adt/misc.c    2 Jan 2010 16:57:54 -0000    1.73
--- src/backend/utils/adt/misc.c    8 Jan 2010 01:07:41 -0000
***************
*** 20,25 ****
--- 20,26 ----
  #include <math.h>

  #include "access/xact.h"
+ #include "catalog/catalog.h"
  #include "catalog/pg_type.h"
  #include "catalog/pg_tablespace.h"
  #include "commands/dbcommands.h"
***************
*** 185,191 ****
          /*
           * size = tablespace dirname length + dir sep char + oid + terminator
           */
!         fctx->location = (char *) palloc(10 + 10 + 1);
          if (tablespaceOid == GLOBALTABLESPACE_OID)
          {
              fctx->dirdesc = NULL;
--- 186,193 ----
          /*
           * size = tablespace dirname length + dir sep char + oid + terminator
           */
!         fctx->location = (char *) palloc(9 + 1 + OIDCHARS + 1 +
!                          strlen(TABLESPACE_VERSION_DIRECTORY) + 1);
          if (tablespaceOid == GLOBALTABLESPACE_OID)
          {
              fctx->dirdesc = NULL;
***************
*** 197,203 ****
              if (tablespaceOid == DEFAULTTABLESPACE_OID)
                  sprintf(fctx->location, "base");
              else
!                 sprintf(fctx->location, "pg_tblspc/%u", tablespaceOid);

              fctx->dirdesc = AllocateDir(fctx->location);

--- 199,206 ----
              if (tablespaceOid == DEFAULTTABLESPACE_OID)
                  sprintf(fctx->location, "base");
              else
!                 sprintf(fctx->location, "pg_tblspc/%u/%s", tablespaceOid,
!                         TABLESPACE_VERSION_DIRECTORY);

              fctx->dirdesc = AllocateDir(fctx->location);

Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.296
diff -c -c -r1.296 relcache.c
*** src/backend/utils/cache/relcache.c    5 Jan 2010 01:06:56 -0000    1.296
--- src/backend/utils/cache/relcache.c    8 Jan 2010 01:07:50 -0000
***************
*** 4297,4304 ****
          if (strspn(de->d_name, "0123456789") == strlen(de->d_name))
          {
              /* Scan the tablespace dir for per-database dirs */
!             snprintf(path, sizeof(path), "%s/%s",
!                      tblspcdir, de->d_name);
              RelationCacheInitFileRemoveInDir(path);
          }
      }
--- 4297,4304 ----
          if (strspn(de->d_name, "0123456789") == strlen(de->d_name))
          {
              /* Scan the tablespace dir for per-database dirs */
!             snprintf(path, sizeof(path), "%s/%s/%s",
!                      tblspcdir, de->d_name, TABLESPACE_VERSION_DIRECTORY);
              RelationCacheInitFileRemoveInDir(path);
          }
      }
Index: src/include/catalog/catalog.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/catalog.h,v
retrieving revision 1.46
diff -c -c -r1.46 catalog.h
*** src/include/catalog/catalog.h    6 Jan 2010 02:41:37 -0000    1.46
--- src/include/catalog/catalog.h    8 Jan 2010 01:07:50 -0000
***************
*** 14,24 ****
--- 14,27 ----
  #ifndef CATALOG_H
  #define CATALOG_H

+ #include "catalog/catversion.h"
  #include "catalog/pg_class.h"
  #include "storage/relfilenode.h"
  #include "utils/relcache.h"

  #define OIDCHARS        10        /* max chars printed by %u */
+ #define TABLESPACE_VERSION_DIRECTORY    "PG_" PG_MAJORVERSION "_" \
+                                     CppAsString2(CATALOG_VERSION_NO)

  extern const char *forkNames[];
  extern ForkNumber forkname_to_number(char *forkName);
Index: src/test/regress/output/tablespace.source
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/output/tablespace.source,v
retrieving revision 1.9
diff -c -c -r1.9 tablespace.source
*** src/test/regress/output/tablespace.source    5 Jan 2010 21:54:00 -0000    1.9
--- src/test/regress/output/tablespace.source    8 Jan 2010 01:07:50 -0000
***************
*** 65,71 ****

  -- Will fail with bad path
  CREATE TABLESPACE badspace LOCATION '/no/such/location';
! ERROR:  could not set permissions on directory "/no/such/location": No such file or directory
  -- No such tablespace
  CREATE TABLE bar (i int) TABLESPACE nosuchspace;
  ERROR:  tablespace "nosuchspace" does not exist
--- 65,71 ----

  -- Will fail with bad path
  CREATE TABLESPACE badspace LOCATION '/no/such/location';
! ERROR:  directory "/no/such/location" does not exist
  -- No such tablespace
  CREATE TABLE bar (i int) TABLESPACE nosuchspace;
  ERROR:  tablespace "nosuchspace" does not exist

Re: pg_migrator issues

From

Bruce Momjian

Date:

11 January 2010, 22:46:34

Bruce Momjian wrote:
> Bruce Momjian wrote:
> > 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> > if the tablespaces are on mount points.  I have already received a
> > report of such a failure.  $PGDATA also has that issue, but that
> > renaming has to be done by the user before pg_migrator is run, and only
> > if they want to keep the same $PGDATA value after migration, i.e. no
> > version-specific directory path.  One idea we floated around was to have
> > tablespaces use major version directory names under the tablespace
> > directory so renaming would not be necessary.  I could implement a
> > pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
> > which are not in a version-specific subdirectory.
> 
> I have created a patch to implement per-cluster directories in
> tablespaces.  This is for use by pg_migrator so it doesn't have to
> rename the tablespaces during the migration.  Users still need to remove
> the old cluster's tablespace subdirectory, and I can add a --delete-old
> option to pg_migrator to do that.
> 
> The old code used a symlink from pg_tblspc/#### to the location
> directory specified in CREATE TABLESPACE.  During CREATE TABLESPACE, a
> PG_VERSION file is created containing the major version number.  Anytime
> a database object is created in the tablespace, a per-database directory
> is created.
> 
> With the new code in this patch, pg_tblspc/#### points to the CREATE
> TABLESPACE directory just like before, but a new directory, PG_ +
> major_version + catalog_version, e.g. PG_8.5_201001061, is created and
> all per-database directories are created under that directory.  This
> directory has the same purpose as the old PG_VERSION file.  One
> disadvantage of this approach is that functions that need to look inside
> tablespaces must now also specify the version directory, e.g.
> pg_tablespace_databases().
> 
> An alternative approach would be for the pg_tblspc/#### symbolic link to
> point to the new version directory, PG_*, but that makes removal of the
> version directory complicated, particularly during WAL replay where we
> don't have access to the system catalogs, and readlink() to read the
> symbolic link target is not supported on all operating systems
> (particularly Win32).
> 
> I used the version directory pattern "PG_8.5_201001061" because "PG_"
> helps people realize the directory is for the use of Postgres
> (PG_VERSION is gone in tablespaces), and the catalog version number
> enables alpha migrations.  The major version number is not necessary but
> probably useful for administrators.
> 
> pg_migrator is going to need to know about the version directory too,
> and it can't use the C macro --- it has to construct the directory
> pattern based on the contents of pg_control from the old and new
> servers. And, it is going to be difficult to run pg_control on the old
> server for pg_migrator --delete-old after migration because it is
> renamed to pg_control.old --- I will need to create a symbolic link
> during the time I run pg_controldata.  Also, the contents of the
> tablespace directory for an 8.4 to 8.5 migration is going to be ugly
> because there will be many numeric directories (for databases), and
> PG_VERSION (for 8.4), and the PG_8.5_201001061 directory which should
> not be touched.
> 
> Can someone explain why TablespaceCreateDbspace() creates a non-symlink
> directory during recovery if the symlink is missing?  Is it just for
> robustness?  I would like to document that more clearly.

Applied.

FYI, I decide to create a pg_migrator_remove_old_cluster.sh/.bat file
that can be run by the user after the upgrade, instead of adding a
--delete-old-cluster option to pg_migrator.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: pg_migrator issues

From

Bruce Momjian

Date:

11 January 2010, 22:47:26

FYI, I consider all the issues below to be addressed (we did all but
#4), and pg_migrator will take advantage of these new facilities for 8.5.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> pg_migrator has become more popular recently, so it seems time to look
> at some enhancements that would improve pg_migrator.  None of these are
> required, but rather changes that would be nice to have:
> 
> 1)  Right now pg_migrator preserves relfilenodes for TOAST files because
> this is required for proper migration.  Now that we have shown that
> strategically-placed global variables with a server-side function to set
> them is a viable solution, it would be nice to preserve all relfilenodes
> from the old server.  This would simplify pg_migrator by no long
> requiring place-holder relfilenodes or the renaming of TOAST files.  A
> simpler solution would just be to allow TOAST table creation to
> automatically remove placeholder files and create specified relfilenodes
> via global variables.
> 
> 2)  Right now pg_migrator renames old tablespaces to .old, which fails
> if the tablespaces are on mount points.  I have already received a
> report of such a failure.  $PGDATA also has that issue, but that
> renaming has to be done by the user before pg_migrator is run, and only
> if they want to keep the same $PGDATA value after migration, i.e. no
> version-specific directory path.  One idea we floated around was to have
> tablespaces use major version directory names under the tablespace
> directory so renaming would not be necessary.  I could implement a
> pg_migrator --delete-old flag to cleanly delete the old 8.4 server files
> which are not in a version-specific subdirectory.
> 
> 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
> does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
> unnecessary vacuum.  Right now I recommend ANALYZE in every database,
> but it would be nice if there were a single command which did this.
> 
> 4)  I have implemented the ability to run pg_migrator --check on a live
> old server.  However, pg_migrator uses information from controldata to
> check things, and it also needs xid information that is only available
> via pg_resetxlog -n(no update) to perform the migration.  Unfortunately,
> pg_resetxlog -n cannot be run on a live server, so pg_migrator runs
> pg_controldata for --check and pg_resetxlog -n for real upgrades.  It
> would simplify pg_migrator if I would run pg_resetxlog -n on a live
> server, but I can understand if people don't want to do that because the
> xid information reported on a live server is inaccurate.
> 
> Comments?
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +