Re: BUG #9616: Materialized view with indexes unable to load from pg_dump - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #9616: Materialized view with indexes unable to load from pg_dump
Date
Msg-id 3867.1395250733@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #9616: Materialized view with indexes unable to load from pg_dump  (Jesse Denardo <jesse.denardo@myfarms.com>)
List pgsql-bugs
Jesse Denardo <jesse.denardo@myfarms.com> writes:
> [ pg_dump fails to dump this sanely: ]

> CREATE SCHEMA test_mat;
> CREATE TABLE test_mat.a (
>     id integer,
>     name character varying
> );
> ALTER TABLE ONLY test_mat.a ADD CONSTRAINT a_pkey PRIMARY KEY (id);
> CREATE MATERIALIZED VIEW test_mat.mat AS (
>     SELECT id, name FROM test_mat.a GROUP BY id
> );

OK, the reason why this is odd is that the view definition requires that
a.id be a primary key.  Because primary keys are supposed to be emitted
in the post-data section of the dump, while the CREATE MAT VIEW command
will be emitted pre-data, we have a dump-object sorting failure, which
manifests as a dependency loop involving the matview, its rule, a's
primary key index, and the section boundary pseudo-objects.

The code in pg_dump_sort.c assumes that any relation with an ON SELECT
rule must be a regular view, and so it tries to break the loop by
splitting the view into a CREATE TABLE command followed by a CREATE RULE
command.  Of course, that results in SQL that recreates a plain view not a
matview; not to mention your original complaint that it fails when there
are indexes on the matview.

I've developed the attached draft patch that fixes this by allowing the
CREATE MATERIALIZED VIEW command to be postponed into the post-data
section.  I think this is pretty darn ugly, since people won't expect such
a categorization, but I'm not sure we have any choice so far as existing
branches are concerned.

If we had a CREATE OR REPLACE MATERIALIZED VIEW type of command, we could
imagine fixing this by emitting a dummy create command in the pre-data
section, along the lines of

     CREATE MATERIALIZED VIEW mv AS SELECT
         NULL::coltype1 AS colname1,
         NULL::coltype2 AS colname2,
        ...
     WITH NO DATA;

and then overwriting that with CREATE OR REPLACE MATERIALIZED VIEW at the
point where it's safe to give the real view definition.  (If we were to
write code for that, I'd be pretty inclined to change the dumping of
separated views to look similar, instead of the hack they are now.)

Of course, you could argue that that wouldn't be materially less confusing
than the other way.  But I suspect we are going to end up doing something
of the sort anyhow, because I have little faith that the technique in this
patch will fix every case of circularities involving matviews.

Comments?  Anybody want to try to fix this in another way?

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4001f3f..2653ef0 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** getTables(Archive *fout, int *numTables)
*** 4796,4801 ****
--- 4796,4803 ----
              selectDumpableTable(&tblinfo[i]);
          tblinfo[i].interesting = tblinfo[i].dobj.dump;

+         tblinfo[i].postponed_def = false; /* might get set during sort */
+
          /*
           * Read-lock target tables to make sure they aren't DROPPED or altered
           * in schema before we get around to dumping them.
*************** dumpTableSchema(Archive *fout, TableInfo
*** 13611,13617 ****
              (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
                   tbinfo->rolname,
                 (strcmp(reltypename, "TABLE") == 0) ? tbinfo->hasoids : false,
!                  reltypename, SECTION_PRE_DATA,
                   q->data, delq->data, NULL,
                   NULL, 0,
                   NULL, NULL);
--- 13613,13620 ----
              (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
                   tbinfo->rolname,
                 (strcmp(reltypename, "TABLE") == 0) ? tbinfo->hasoids : false,
!                  reltypename,
!                  tbinfo->postponed_def ? SECTION_POST_DATA : SECTION_PRE_DATA,
                   q->data, delq->data, NULL,
                   NULL, 0,
                   NULL, NULL);
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 398eca1..e01015e 100644
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
*************** typedef struct _tableInfo
*** 241,247 ****
      char       *reltablespace;    /* relation tablespace */
      char       *reloptions;        /* options specified by WITH (...) */
      char       *checkoption;    /* WITH CHECK OPTION */
!     char       *toast_reloptions;        /* ditto, for the TOAST table */
      bool        hasindex;        /* does it have any indexes? */
      bool        hasrules;        /* does it have any rules? */
      bool        hastriggers;    /* does it have any triggers? */
--- 241,247 ----
      char       *reltablespace;    /* relation tablespace */
      char       *reloptions;        /* options specified by WITH (...) */
      char       *checkoption;    /* WITH CHECK OPTION */
!     char       *toast_reloptions;        /* WITH options for the TOAST table */
      bool        hasindex;        /* does it have any indexes? */
      bool        hasrules;        /* does it have any rules? */
      bool        hastriggers;    /* does it have any triggers? */
*************** typedef struct _tableInfo
*** 254,262 ****
      /* these two are set only if table is a sequence owned by a column: */
      Oid            owning_tab;        /* OID of table owning sequence */
      int            owning_col;        /* attr # of column owning sequence */
!     int            relpages;

      bool        interesting;    /* true if need to collect more data */

      /*
       * These fields are computed only if we decide the table is interesting
--- 254,263 ----
      /* these two are set only if table is a sequence owned by a column: */
      Oid            owning_tab;        /* OID of table owning sequence */
      int            owning_col;        /* attr # of column owning sequence */
!     int            relpages;        /* table's size in pages (from pg_class) */

      bool        interesting;    /* true if need to collect more data */
+     bool        postponed_def;    /* matview must be postponed into post-data */

      /*
       * These fields are computed only if we decide the table is interesting
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index bda3535..3590932 100644
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
*************** repairTypeFuncLoop(DumpableObject *typeo
*** 798,803 ****
--- 798,804 ----
   * will be an implicit dependency in the other direction, we need to break
   * the loop.  If there are no other objects in the loop then we can remove
   * the implicit dependency and leave the ON SELECT rule non-separate.
+  * This applies to matviews, as well.
   */
  static void
  repairViewRuleLoop(DumpableObject *viewobj,
*************** repairViewRuleLoop(DumpableObject *viewo
*** 814,820 ****
   * Because findLoop() finds shorter cycles before longer ones, it's likely
   * that we will have previously fired repairViewRuleLoop() and removed the
   * rule's dependency on the view.  Put it back to ensure the rule won't be
!  * emitted before the view...
   */
  static void
  repairViewRuleMultiLoop(DumpableObject *viewobj,
--- 815,823 ----
   * Because findLoop() finds shorter cycles before longer ones, it's likely
   * that we will have previously fired repairViewRuleLoop() and removed the
   * rule's dependency on the view.  Put it back to ensure the rule won't be
!  * emitted before the view.
!  *
!  * Note: this approach does *not* work for matviews, at the moment.
   */
  static void
  repairViewRuleMultiLoop(DumpableObject *viewobj,
*************** repairViewRuleMultiLoop(DumpableObject *
*** 842,847 ****
--- 845,874 ----
  }

  /*
+  * If a matview is involved in a multi-object loop, we can't currently fix
+  * that by splitting off the rule.    As a stopgap, we try to fix it by
+  * dropping the constraint that the matview be dumped in the pre-data section.
+  * This is sufficient to handle cases where a matview depends on some unique
+  * index, as can happen if it has a GROUP BY for example.
+  *
+  * Note that the "next object" is not necessarily the matview itself;
+  * it could be the matview's rowtype, for example.  We may come through here
+  * several times while removing all the pre-data linkages.
+  */
+ static void
+ repairMatViewBoundaryMultiLoop(DumpableObject *matviewobj,
+                                DumpableObject *boundaryobj,
+                                DumpableObject *nextobj)
+ {
+     TableInfo  *matviewinfo = (TableInfo *) matviewobj;
+
+     /* remove boundary's dependency on object after it in loop */
+     removeObjectDependency(boundaryobj, nextobj->dumpId);
+     /* mark matview as postponed into post-data section */
+     matviewinfo->postponed_def = true;
+ }
+
+ /*
   * Because we make tables depend on their CHECK constraints, while there
   * will be an automatic dependency in the other direction, we need to break
   * the loop.  If there are no other objects in the loop then we can remove
*************** repairDependencyLoop(DumpableObject **lo
*** 956,965 ****
          return;
      }

!     /* View and its ON SELECT rule */
      if (nLoop == 2 &&
          loop[0]->objType == DO_TABLE &&
          loop[1]->objType == DO_RULE &&
          ((RuleInfo *) loop[1])->ev_type == '1' &&
          ((RuleInfo *) loop[1])->is_instead &&
          ((RuleInfo *) loop[1])->ruletable == (TableInfo *) loop[0])
--- 983,994 ----
          return;
      }

!     /* View (including matview) and its ON SELECT rule */
      if (nLoop == 2 &&
          loop[0]->objType == DO_TABLE &&
          loop[1]->objType == DO_RULE &&
+         (((TableInfo *) loop[0])->relkind == 'v' ||        /* RELKIND_VIEW */
+          ((TableInfo *) loop[0])->relkind == 'm') &&    /* RELKIND_MATVIEW */
          ((RuleInfo *) loop[1])->ev_type == '1' &&
          ((RuleInfo *) loop[1])->is_instead &&
          ((RuleInfo *) loop[1])->ruletable == (TableInfo *) loop[0])
*************** repairDependencyLoop(DumpableObject **lo
*** 970,975 ****
--- 999,1006 ----
      if (nLoop == 2 &&
          loop[1]->objType == DO_TABLE &&
          loop[0]->objType == DO_RULE &&
+         (((TableInfo *) loop[1])->relkind == 'v' ||        /* RELKIND_VIEW */
+          ((TableInfo *) loop[1])->relkind == 'm') &&    /* RELKIND_MATVIEW */
          ((RuleInfo *) loop[0])->ev_type == '1' &&
          ((RuleInfo *) loop[0])->is_instead &&
          ((RuleInfo *) loop[0])->ruletable == (TableInfo *) loop[1])
*************** repairDependencyLoop(DumpableObject **lo
*** 978,989 ****
          return;
      }

!     /* Indirect loop involving view and ON SELECT rule */
      if (nLoop > 2)
      {
          for (i = 0; i < nLoop; i++)
          {
!             if (loop[i]->objType == DO_TABLE)
              {
                  for (j = 0; j < nLoop; j++)
                  {
--- 1009,1021 ----
          return;
      }

!     /* Indirect loop involving view (but not matview) and ON SELECT rule */
      if (nLoop > 2)
      {
          for (i = 0; i < nLoop; i++)
          {
!             if (loop[i]->objType == DO_TABLE &&
!                 ((TableInfo *) loop[i])->relkind == 'v')        /* RELKIND_VIEW */
              {
                  for (j = 0; j < nLoop; j++)
                  {
*************** repairDependencyLoop(DumpableObject **lo
*** 1000,1005 ****
--- 1032,1061 ----
          }
      }

+     /* Indirect loop involving matview and data boundary */
+     if (nLoop > 2)
+     {
+         for (i = 0; i < nLoop; i++)
+         {
+             if (loop[i]->objType == DO_TABLE &&
+                 ((TableInfo *) loop[i])->relkind == 'm')        /* RELKIND_MATVIEW */
+             {
+                 for (j = 0; j < nLoop; j++)
+                 {
+                     if (loop[j]->objType == DO_PRE_DATA_BOUNDARY)
+                     {
+                         DumpableObject *nextobj;
+
+                         nextobj = (j < nLoop - 1) ? loop[j + 1] : loop[0];
+                         repairMatViewBoundaryMultiLoop(loop[i], loop[j],
+                                                        nextobj);
+                         return;
+                     }
+                 }
+             }
+         }
+     }
+
      /* Table and CHECK constraint */
      if (nLoop == 2 &&
          loop[0]->objType == DO_TABLE &&

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Re: Leaking regexp_replace in 9.3.1 ? (was: [HACKERSUninterruptable regexp_replace in 9.3.1 ?)
Next
From: jkoceniak@mediamath.com
Date:
Subject: BUG #9635: Wal sender process is using 100% CPU