Re: Materialized views WIP patch - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Materialized views WIP patch
Date
Msg-id 1362073978.4025.YahooMailNeo@web162905.mail.bf1.yahoo.com
Whole thread Raw
In response to Re: Materialized views WIP patch  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Materialized views WIP patch
List pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 28.02.2013 16:55, Robert Haas wrote:

>> Well, personally, I'm in favor of either TRUNCATE or ALTER
>> MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous
>> precedent to suppose that we're going to start using DISCARD for
>> things that have nothing to do with the existing meanings of
>> DISCARD.  Number one, I think it's confusing.  Number two, it's
>> currently possible to determine whether something is DDL, DML,
>> or other by looking at the first word of the command.  If we
>> throw that out the window we may cause performance issues for
>> connection pooling software that tries to be clever like that.
>
> FWIW, I totally agree with that. From that point of view, the
> best thing would be to tack this onto the REFRESH command,
> perhaps something like:
>
> REFRESH matview INVALIDATE;
> REFRESH matview UNREFRESH;
> REFRESH matview DISCARD;
>
> It's a bit weird that the command is called REFRESH, if the
> effect is the exact opposite of refreshing it. And we usually do
> have two separate commands for doing something and undoing the
> same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and
> so forth.
>
> I think we're being too hung up on avoiding new (unreserved)
> keywords. Yes, the grammar is large because of so many keywords,
> but surely there's some better solution to that than adopt syntax
> that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?),
> and deal with the grammar bloat separately.

I'm OK with any grammar that we can reach consensus on, but that
seems elusive and I don't want to hold up getting the meat of the
patch committed while we haggle out this syntax detail.  Votes have
shifted frequently, but as I make out the latest opinion of people
making a statement that looks like an explicit vote, dividing the
value of a split vote between the choices, I get:

Kevin Grittner: TRUNCATE
Stephen Frost: TRUNCATE
Peter Eisentraut: ALTER
Andres Freund: RESET or DISCARD
Josh Berkus: RESET
Greg Stark: TRUNCATE
Michael Paquier: DISCARD
Robert Haas: TRUNCATE or ALTER
Tom Lane: TRUNCATE or ALTER
Heikki Linnakangas: REFRESH?

Vote totals:

TRUNCATE:  4.0
ALTER:     2.0
DISCARD:   1.5
RESET:     1.5
REFRESH    1.0?

Barring a sudden confluence of opinion, I will go with TRUNCATE for
the initial spelling.  I tend to favor that spelling for several
reasons.  One was the size of the patch needed to add the opposite
of REFRESH to the backend code:

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..eb7a14f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1234,11 +1234,12 @@ truncate_check_rel(Relation rel)
 {
    AclResult   aclresult;
 
-   /* Only allow truncate on regular tables */
-   if (rel->rd_rel->relkind != RELKIND_RELATION)
+   /* Only allow truncate on regular tables and materialized views. */
+   if (rel->rd_rel->relkind != RELKIND_RELATION &&
+       rel->rd_rel->relkind != RELKIND_MATVIEW)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                errmsg("\"%s\" is not a table",
+                errmsg("\"%s\" is not a table or materialized view",
                        RelationGetRelationName(rel))));
 
    /* Permissions checks */

That's it.  That takes it from "no way to release the space held by
the current contents of a materialized view and render it
unscannable until its rule's query is used to populate it again" to
"working".  Now, there are docs and psql support needed on top of
that, and the regression tests use the verb, and bikeshedding led
to a minor tweak of TRUNCATE so that you could specify TRUNCATE
MATERIALIZED VIEW -- but to get working backend code, the above is
sufficient.  That strikes me a prima facie evidence that it's not a
horribly off-target verb.

In terms of things to consider in choosing a verb are that, while
complete absence of data in the MV's backing table is the only
thing which will render it unscannable in this initial version,
there is clear demand to eventually track information on how
up-to-date data is, and treat the MV as unscannable for less
extreme conditions, such as the asynchronous application of changes
from backing tables having fallen behind some configurable
threshold.  In essence, people want the error instead of scanning
the relation if the generated data is not within a range of time
that is "fresh" enough, and truncating the backing relation (or
creating the MV WITH NO DATA) is a special case of that since the
data is not representing the results of the MV's query as of *any*
known point in time.  (As previously discussed, that is distinct
from an empty relation which is known to represent a fresh view of
the results of that query.)

If we pick a new pair of verbs, the connotations I think we should
be looking for include the ability to deal with at least these:

* Make the MV represent the result set of its query as of this
moment.

* Declare the MV to be too stale for the contents of its backing
table to be useful, and free the space held by the current
contents.

Other operations will be needed, for which syntax is not settled.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: sql_drop Event Trigger
Next
From: Jim Nasby
Date:
Subject: Re: autovacuum not prioritising for-wraparound tables