Materialized views WIP patch - Mailing list pgsql-hackers

From Kevin Grittner
Subject Materialized views WIP patch
Date
Msg-id 20121115022819.90140@gmx.com
Whole thread Raw
Responses Re: Materialized views WIP patch  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Materialized views WIP patch  (Greg Smith <greg@2ndQuadrant.com>)
Re: Materialized views WIP patch  (Josh Berkus <josh@agliodbs.com>)
Re: Materialized views WIP patch  (Jeff Davis <pgsql@j-davis.com>)
Re: Materialized views WIP patch  (Thom Brown <thom@linux.com>)
Re: Materialized views WIP patch  (Marko Tiikkaja <pgmail@joh.to>)
Re: Materialized views WIP patch  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Attached is a patch that is still WIP but that I think is getting
pretty close to completion. It is not intended to be the be-all and
end-all for materialized views, but the minimum useful feature set --
which is all that I've had time to do for this release. In
particular, the view is only updated on demand by a complete rebuild.
For the next release, I hope to build on this base to allow more
eager and incremental updates, and perhaps a concurrent batch update.

1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
    TABLE AS, with all the same clauses supported. That includes
    declaring a materialized view to be temporary or unlogged.
2.  MVs don't support inheritance.
3.  MVs can't define foreign keys.
4.  MVs can't be the target of foreign keys.
5.  MVs can't have triggers.
6.  Users can't create rules which reference MVs (although MVs
    [ab]use the rules mechanism internally, similar to how views do).
7.  MVs can't be converted to views, nor vice versa.
8.  Users may not directly use INSERT/UPDATE/DELETE on an MV.
9.  MVs can't directly be used in a COPY statement, but can be the
    source of data using a SELECT.
10. MVs can't own sequences.
11. MVs can't be the target of LOCK statements, although other
    statements get locks just like a table.
12. MVs can't use data modifying CTEs in their definitions.
13. pg_class now has a relisvalid column, which is true if an MV is
    truncated or created WITH NO DATA. You can not scan a relation
    flagged as invalid.
14. ALTER MATERIALIZED VIEW is supported for the options that seemed
    to make sense. For example, you can change the tablespace or
    schema, but you cannot add or drop column with ALTER.
15. The SELECT query used to define the MV may not contain a
    data-modifying CTE.
16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction, it didn't seem to make any sense to try to
    apply SERIALIZABLE transaction semantics to queries looking at
    the contents of an MV, although if LMV is run in a SERIALIZABLE
    transaction the MV data is guaranteed to be free of serialization
    anomalies. This does leave the transaction running the LOAD
    command vulnerable to serialization failures unless it is also
    READ ONLY DEFERRABLE.
18. Bound parameters are not supported for the CREATE MATERIALIZED
    VIEW statement.
19. LMV doesn't show a row count. It wouldn't be hard to add, it just
    seemed a little out of place to do that, when CLUSTER, etc.,
    don't.

I wasn't able to wrap up a few things in time for this commitfest:

 - Documentation is incomplete.
 - pg_dump support needs addtional dependencies added to properly
   handle MVs which are defined using other MVs.
 - pg_dump binary hasn't had a lot of attention yet.
 - There are no regression tests yet.
 - I ran into problems getting the validity check working right, so I
   have disabled it by commenting out the function body in this
   patch.
 - TRUNCATE should probably support a MATERIALIZED VIEW clause.

It would be good to have some discussion to try to reach a consensus
about whether we need to differentiate between *missing* datat (where
a materialized view which has been loaded WITH NO DATA or TRUNCATEd
and has not been subsequently LOADed) and potentially *stale* data.
If we don't care to distinguish between a view which generated no
rows when it ran and a one for which the query has not been run, we
can avoid adding the relisvalid flag, and we could support UNLOGGED
MVs. Perhaps someone can come up with a better solution to that
problem.

In the long term, we will probably need to separate the
implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
for now there is so little that they need to do differently it seemed
less evil to have a few "if" clauses that that much duplicated code.

The paint is pretty wet still, but hopefully people can evaluate the
approach and work out any issues with the design choices in the CF so
that it can be wrapped up nicely for the next one.

 92 files changed, 2377 insertions(+), 440 deletions(-)

-Kevin

Attachment

pgsql-hackers by date:

Previous
From: "David Rowley"
Date:
Subject: Re: Doc patch making firm recommendation for setting the value of commit_delay
Next
From: Craig Ringer
Date:
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY