Re: refresh materialized view concurrently - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: refresh materialized view concurrently
Msg-id 1373399447.2804.YahooMailNeo@web162904.mail.bf1.yahoo.com
In response to Re: refresh materialized view concurrently  (Hitoshi Harada <umi.tanuki@gmail.com>)
Responses Re: refresh materialized view concurrently
List pgsql-hackers
Hitoshi Harada <umi.tanuki@gmail.com> wrote:

> I think the point is not checking for duplicate rows.  It is about a
> unique key constraint violation.  So, if you change the original
> table foo to values (1, 10), (1, 20), the issue is still reproduced.
> In the non-concurrent operation it is checked by reindex_index when
> swapping the heap, but apparently we are not doing the constraint
> check in concurrent mode.

The check for duplicate rows is necessary to allow the FULL join to
work correctly, but it is not sufficient without this:

@@ -654,7 +668,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid)
                 errhint("Create a UNIQUE index with no WHERE clause on one or more columns of the materialized view.")));
 
    appendStringInfoString(&querybuf,
-                          ") WHERE (y.*) IS DISTINCT FROM (x.*)"
+                          " AND y = x) WHERE (y.*) IS DISTINCT FROM (x.*)"
                           " ORDER BY tid");
 
    /* Create the temporary "diff" table. */
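
Hitoshi's (1, 10), (1, 20) scenario is easiest to see in a greatly
simplified sketch of the diff logic.  This is a hypothetical Python
simulation, not the real code (which builds a SQL FULL JOIN run through
SPI): without the whole-row test in the join condition, the
duplicate-keyed new row pairs with the existing row on the key alone and
never surfaces as a plain INSERT, so the matview's unique index never
gets the chance to raise the constraint violation.

```python
# Hypothetical simulation of refresh_by_match_merge's diff (NOT the
# actual SPI query).  "mv" holds the matview's current rows, "new" the
# freshly generated data; the unique index is on column 0.
mv = [(1, 10)]
new = [(1, 10), (1, 20)]

def diff(mv, new, match_whole_row):
    # FULL JOIN on the unique-index column, optionally also requiring
    # whole-row equality (the "AND y = x" added by the patch above).
    pairs = [(x, y) for x in mv for y in new
             if x[0] == y[0] and (not match_whole_row or x == y)]
    matched_mv = {x for x, _ in pairs}
    matched_new = {y for _, y in pairs}
    # Rows appearing on only one side of the join become the DELETE
    # and INSERT lists applied to the matview.
    deletes = [x for x in mv if x not in matched_mv]
    inserts = [y for y in new if y not in matched_new]
    return deletes, inserts

print(diff(mv, new, match_whole_row=False))  # ([], []) -- (1, 20) silently absorbed
print(diff(mv, new, match_whole_row=True))   # ([], [(1, 20)])
```

With the whole-row test, (1, 20) comes out as an INSERT, and applying
it to the matview is what raises the duplicate key error.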

The sad thing is that I had that extra test in much earlier versions
because the relational algebra seemed to require it, but I convinced
myself that it wasn't really needed because I couldn't think of a
test that caused a failure without it.  It turns out that was a
failure of imagination on my part.  I guess I should trust the math.

The only thing I don't like about this is that the duplicate key
errors report the SPI query doing the UPDATE or INSERT as their
context.  I'm not sure whether it's worth the extra processing time
to avoid that.

> One thing I need to apologize for is make_temptable_name_n: I
> pointed out that the previous coding was a potential memory overrun,
> but actually the relation name is defined by make_new_heap, so
> unless that function generates a stupidly long name, there is no
> possibility of a name big enough to overrun NAMEDATALEN.  So +1 for
> reverting make_temptable_name_n, which was also needlessly invented.

I knew that, but didn't point it out since I was switching to the
existing functions, which use palloc() -- so it became immaterial.

> I've found another issue, which regards matview_maintenance_depth.
> If an SPI call fails (pg_cancel_backend is quite possible) and
> errors out in the middle of processing, this value stays above
> zero, so subsequently we can issue DML on the matview.  We should
> make sure the value becomes zero before jumping out of this
> function.

Good point.  There's more than one way to do that, but this seemed
cleanest to me because it avoids allowing any modification of
matview_maintenance_depth outside of matview.c:

@@ -251,7 +251,21 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
    /* Make the matview match the newly generated data. */
    if (concurrent)
-       refresh_by_match_merge(matviewOid, OIDNewHeap);
+   {
+       int     old_depth = matview_maintenance_depth;
+
+       PG_TRY();
+       {
+           refresh_by_match_merge(matviewOid, OIDNewHeap);
+       }
+       PG_CATCH();
+       {
+           matview_maintenance_depth = old_depth;
+           PG_RE_THROW();
+       }
+       PG_END_TRY();
+       Assert(matview_maintenance_depth == old_depth);
+   }
    else
        refresh_by_heap_swap(matviewOid, OIDNewHeap);
 }
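
The PG_TRY/PG_CATCH pattern above boils down to: save the depth,
restore it on any error, re-throw.  A minimal Python sketch of that
behavior (hypothetical stand-in names; in the real code the counter is
a static variable in matview.c and the error is a PostgreSQL ereport):

```python
# Hypothetical sketch of the save/restore-on-error pattern above.
matview_maintenance_depth = 0

def refresh_by_match_merge_stub():
    """Stand-in for refresh_by_match_merge(): bumps the depth, then
    fails part-way through, as a cancelled SPI query would."""
    global matview_maintenance_depth
    matview_maintenance_depth += 1
    raise RuntimeError("simulated query cancel during SPI call")

def exec_refresh_matview_concurrent():
    global matview_maintenance_depth
    old_depth = matview_maintenance_depth  # saved before PG_TRY()
    try:
        refresh_by_match_merge_stub()
    except BaseException:                  # PG_CATCH()
        matview_maintenance_depth = old_depth
        raise                              # PG_RE_THROW()
    assert matview_maintenance_depth == old_depth

try:
    exec_refresh_matview_concurrent()
except RuntimeError:
    pass

# Despite the mid-refresh error, the depth is back to zero, so later
# direct DML on the matview is still rejected as it should be.
assert matview_maintenance_depth == 0
```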

Thanks again!  New patch attached.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
