Thread: Materialized views WIP patch

Materialized views WIP patch

From
"Kevin Grittner"
Date:
Attached is a patch that is still WIP but that I think is getting
pretty close to completion. It is not intended to be the be-all and
end-all for materialized views, but the minimum useful feature set --
which is all that I've had time to do for this release. In
particular, the view is only updated on demand by a complete rebuild.
For the next release, I hope to build on this base to allow more
eager and incremental updates, and perhaps a concurrent batch update.

1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
    TABLE AS, with all the same clauses supported. That includes
    declaring a materialized view to be temporary or unlogged.
2.  MVs don't support inheritance.
3.  MVs can't define foreign keys.
4.  MVs can't be the target of foreign keys.
5.  MVs can't have triggers.
6.  Users can't create rules which reference MVs (although MVs
    [ab]use the rules mechanism internally, similar to how views do).
7.  MVs can't be converted to views, nor vice versa.
8.  Users may not directly use INSERT/UPDATE/DELETE on an MV.
9.  MVs can't directly be used in a COPY statement, but can be the
    source of data using a SELECT.
10. MVs can't own sequences.
11. MVs can't be the target of LOCK statements, although other
    statements get locks just like a table.
12. MVs can't use data modifying CTEs in their definitions.
13. pg_class now has a relisvalid column, which is true if an MV is
    truncated or created WITH NO DATA. You can not scan a relation
    flagged as invalid.
14. ALTER MATERIALIZED VIEW is supported for the options that seemed
    to make sense. For example, you can change the tablespace or
    schema, but you cannot add or drop column with ALTER.
15. The SELECT query used to define the MV may not contain a
    data-modifying CTE.
16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction, it didn't seem to make any sense to try to
    apply SERIALIZABLE transaction semantics to queries looking at
    the contents of an MV, although if LMV is run in a SERIALIZABLE
    transaction the MV data is guaranteed to be free of serialization
    anomalies. This does leave the transaction running the LOAD
    command vulnerable to serialization failures unless it is also
    READ ONLY DEFERRABLE.
18. Bound parameters are not supported for the CREATE MATERIALIZED
    VIEW statement.
19. LMV doesn't show a row count. It wouldn't be hard to add, it just
    seemed a little out of place to do that, when CLUSTER, etc.,
    don't.

I wasn't able to wrap up a few things in time for this commitfest:

 - Documentation is incomplete.
 - pg_dump support needs addtional dependencies added to properly
   handle MVs which are defined using other MVs.
 - pg_dump binary hasn't had a lot of attention yet.
 - There are no regression tests yet.
 - I ran into problems getting the validity check working right, so I
   have disabled it by commenting out the function body in this
   patch.
 - TRUNCATE should probably support a MATERIALIZED VIEW clause.

It would be good to have some discussion to try to reach a consensus
about whether we need to differentiate between *missing* datat (where
a materialized view which has been loaded WITH NO DATA or TRUNCATEd
and has not been subsequently LOADed) and potentially *stale* data.
If we don't care to distinguish between a view which generated no
rows when it ran and a one for which the query has not been run, we
can avoid adding the relisvalid flag, and we could support UNLOGGED
MVs. Perhaps someone can come up with a better solution to that
problem.

In the long term, we will probably need to separate the
implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
for now there is so little that they need to do differently it seemed
less evil to have a few "if" clauses that that much duplicated code.

The paint is pretty wet still, but hopefully people can evaluate the
approach and work out any issues with the design choices in the CF so
that it can be wrapped up nicely for the next one.

 92 files changed, 2377 insertions(+), 440 deletions(-)

-Kevin

Attachment

Re: Materialized views WIP patch

From
Alvaro Herrera
Date:
Kevin Grittner wrote:

Interesting stuff.

>   /*
> +  * SetRelationIsValid
> +  *        Set the value of the relation's relisvalid field in pg_class.
> +  *
> +  * NOTE: caller must be holding an appropriate lock on the relation.
> +  * ShareUpdateExclusiveLock is sufficient.
> +  *
> +  * NOTE: an important side-effect of this operation is that an SI invalidation
> +  * message is sent out to all backends --- including me --- causing plans
> +  * referencing the relation to be rebuilt with the new list of children.
> +  * This must happen even if we find that no change is needed in the pg_class
> +  * row.
> +  */
> + void
> + SetRelationIsValid(Oid relationId, bool relisvalid)
> + {

It's not clear to me that it's right to do this by doing regular heap
updates here instead of heap_inplace_update.  Also, I think this might
end up causing a lot of pg_class tuple churn (at least for matviews that
delete rows at xact end), which would be nice to avoid.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Greg Smith
Date:
On 11/14/12 6:28 PM, Kevin Grittner wrote:
>   - Documentation is incomplete.
>   ...
>   - There are no regression tests yet.

Do you have any simple test cases you've been using you could attach? 
With epic new features like this, when things don't work it's hard to 
distinguish between "that just isn't implemented yet" and "the author 
never tested that".  Having some known good samples you have tested, 
even if they're not proper regression tests, would be helpful for 
establishing the code baseline works.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: Materialized views WIP patch

From
Josh Berkus
Date:
Kevin,

> Attached is a patch that is still WIP but that I think is getting
> pretty close to completion. It is not intended to be the be-all and
> end-all for materialized views, but the minimum useful feature set --
> which is all that I've had time to do for this release. In
> particular, the view is only updated on demand by a complete rebuild.
> For the next release, I hope to build on this base to allow more
> eager and incremental updates, and perhaps a concurrent batch update.

Nice to see this come in!

> 1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>     TABLE AS, with all the same clauses supported. That includes
>     declaring a materialized view to be temporary or unlogged.

What use would a temporary matview be?

Unlogged is good.

> 2.  MVs don't support inheritance.

In which direction?  Can't inherit, or can't be inherited from?

> 3.  MVs can't define foreign keys.
> 4.  MVs can't be the target of foreign keys.
> 5.  MVs can't have triggers.

Makes sense.

> 9.  MVs can't directly be used in a COPY statement, but can be the
>     source of data using a SELECT.

Hmmm? I don't understand the reason for this.

> 13. pg_class now has a relisvalid column, which is true if an MV is
>     truncated or created WITH NO DATA. You can not scan a relation
>     flagged as invalid.

What error would a user see?

> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>     to make sense. For example, you can change the tablespace or
>     schema, but you cannot add or drop column with ALTER.

How would you change the definition of an MV then?

> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>     VIEW mat view_name. This seemed more descriptive to me that the
>     alternatives and avoids declaring any new keywords beyond
>     MATERIALIZED. If the MV is flagged as relisvalid == false, this
>     will change it to true.

UPDATE MATERIALIZED VIEW was problematic?

Does LOAD automatically TRUNCATE the view before reloading it?  If not,
why not?

> It would be good to have some discussion to try to reach a consensus
> about whether we need to differentiate between *missing* datat (where
> a materialized view which has been loaded WITH NO DATA or TRUNCATEd
> and has not been subsequently LOADed) and potentially *stale* data.
> If we don't care to distinguish between a view which generated no
> rows when it ran and a one for which the query has not been run, we
> can avoid adding the relisvalid flag, and we could support UNLOGGED
> MVs. Perhaps someone can come up with a better solution to that
> problem.

Hmmm.  I understand the distinction you're making here, but I'm not sure
it actually matters to the user.  MVs, by their nature, always have
potentially stale data.  Being empty (in an inaccurate way) is just one
kind of stale data.

It would be nice for the user to have some way to know that a matview is
empty due to never being LOADed or recently being TRUNCATEd.  However, I
don't think that relisvalid flag -- and preventing scanning the relation
-- is a good solution.  What I'd rather have instead is a timestamp of
when the MV was last LOADed.  If the MV was never loaded (or was
truncated) that timestamp would be NULL.  Such a timestamp would allow
users to construct all kinds of ad-hoc refresh schemes for MVs which
would not be possible without it.

I don't see how this relates to UNLOGGED matviews either way.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Jeff Davis
Date:
On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote:
> Attached is a patch that is still WIP but that I think is getting
> pretty close to completion. It is not intended to be the be-all and
> end-all for materialized views, but the minimum useful feature set --
> which is all that I've had time to do for this release. In
> particular, the view is only updated on demand by a complete rebuild.
> For the next release, I hope to build on this base to allow more
> eager and incremental updates, and perhaps a concurrent batch update.

The documentation says that a materialized view is basically a
create-table-as-select except that it remembers the query. Would you say
that there is a compelling use case for this alone, or is this a
building block for more sophisticated materialized view support (e.g.
eager updating) later?

Regards,Jeff Davis




Re: Materialized views WIP patch

From
Dimitri Fontaine
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> The documentation says that a materialized view is basically a
> create-table-as-select except that it remembers the query. Would you say
> that there is a compelling use case for this alone, or is this a
> building block for more sophisticated materialized view support (e.g.
> eager updating) later?

The implementation of the re-LOAD'ing command makes it already
worthwile. Bonus point if locking is limited to when the new content is
all computer and ready, but even without that, I want to have it. ;)

I'd bikeshed and prefer the UPDATE MATERIALIZED VIEW nsp.foo; of course.

The alternative is creating a view, a matching table and a stored
procedure that will implement the rebuilding, for each mat view you want
to have. So that's already a big step forward in my eyes.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Greg Smith wrote:
> On 11/14/12 6:28 PM, Kevin Grittner wrote:
>> - Documentation is incomplete.
>> ...
>> - There are no regression tests yet.
> 
> Do you have any simple test cases you've been using you could attach? 
> With epic new features like this, when things don't work it's hard to 
> distinguish between "that just isn't implemented yet" and "the author 
> never tested that". Having some known good samples you have tested, 
> even if they're not proper regression tests, would be helpful for 
> establishing the code baseline works.

I can probably post something along those lines Monday. Sorry for the
delay. Basically, though, I tried to state everything that I know of
which is not yet done and working; so if you find something which
doesn't behave as you would expect, please let me know. It's well
within the realm of possibility that there are issues that I didn't
think of. I certainly can fall  into the tendency of programmers to
think about testing those things which I thought to cover in the code.

-Kevin



Re: Materialized views WIP patch

From
Merlin Moncure
Date:
On Fri, Nov 16, 2012 at 7:13 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> The documentation says that a materialized view is basically a
>> create-table-as-select except that it remembers the query. Would you say
>> that there is a compelling use case for this alone, or is this a
>> building block for more sophisticated materialized view support (e.g.
>> eager updating) later?
>
> The implementation of the re-LOAD'ing command makes it already
> worthwile. Bonus point if locking is limited to when the new content is
> all computer and ready, but even without that, I want to have it. ;)

Seconded.  Background lock free refresh of materialization table would
be wonderful.  But moving dependency between source and materialized
table out of plpgsql function and into defined schema justifies
feature on its own merits.

merlin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Jeff Davis wrote:
> On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote:
> > Attached is a patch that is still WIP but that I think is getting
> > pretty close to completion. It is not intended to be the be-all and
> > end-all for materialized views, but the minimum useful feature set --
> > which is all that I've had time to do for this release. In
> > particular, the view is only updated on demand by a complete rebuild.
> > For the next release, I hope to build on this base to allow more
> > eager and incremental updates, and perhaps a concurrent batch update.
> 
> The documentation says that a materialized view is basically a
> create-table-as-select except that it remembers the query. Would you say
> that there is a compelling use case for this alone, or is this a
> building block for more sophisticated materialized view support (e.g.
> eager updating) later?

IMV, this has some slight value as it stands, although perhaps not enough
to justify a patch this big. The idea is that with this much in place,
patches to implement more aggressive and incremental maintenance of the
MV data become possible. So I think the bar it should pass for commit is
that it seems a sane basis for that, while providing some functionality
which people will find useful.

-Kevin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Alvaro Herrera wrote:

> It's not clear to me that it's right to do this by doing regular heap
> updates here instead of heap_inplace_update. Also, I think this might
> end up causing a lot of pg_class tuple churn (at least for matviews that
> delete rows at xact end), which would be nice to avoid.

If we keep the flag, I will look into heap_inplace_update.

Thanks!

-Kevin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Josh Berkus wrote:

>> 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>> TABLE AS, with all the same clauses supported. That includes
>> declaring a materialized view to be temporary or unlogged.
> 
> What use would a temporary matview be?

It would be essentially like a temporary table, with all the same
persistence options. I'm not really sure how often it will be more
useful than a temporary table before we have incremental maintenance
of materialized views; once we have that, though, it seems likely
that there could be reasonable use cases.

> Unlogged is good.

I agree that there are likely to be more use cases for this than
temp MVs. Unfortunately, I've had a hard time figuring out how to
flag an MV which is empty because its contents were lost after a
crash with preventing people from using an MV which hasn't been
populated, which has the potential to silently return incorrect
results.

>> 2. MVs don't support inheritance.
> 
> In which direction? Can't inherit, or can't be inherited from?

The table inheritance has not been implemented in either direction
for MVs. It didn't seem clear to me that there were reasonable use
cases. Do you see any?

>> 9. MVs can't directly be used in a COPY statement, but can be the
>> source of data using a SELECT.
> 
> Hmmm? I don't understand the reason for this.

Consistency. There are other object types which seem to enforce this
rule for no reason that I can see beyond maybe a desire to have both
directions of COPY work with the same set of objects. If I remember
correctly, allowing this would eliminate one line of code from the
patch, so if sentiment is in favor of it, it is very easily done.

>> 13. pg_class now has a relisvalid column, which is true if an MV is
>> truncated or created WITH NO DATA. You can not scan a relation
>> flagged as invalid.
> 
> What error would a user see?

I can more directly answer that on Monday. If you enable the body of
the function which makes the relisvalid check you can see the messages.
I commented it out because I have not yet figured out how to suppress
the check for a LOAD MV command.

>> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>> to make sense. For example, you can change the tablespace or
>> schema, but you cannot add or drop column with ALTER.
> 
> How would you change the definition of an MV then?

At this point you would need to drop and re-create the MV. If we
want to add columns to an MV or change what an existing column holds,
perhaps we could have an ALTER MV which changed the SELECT statement
that populates the MV? I would prefer to leave that to a later patch,
though -- it seems like a bit of a minefield compared to what is
being implemented in this patch.

>> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>> VIEW mat view_name. This seemed more descriptive to me that the
>> alternatives and avoids declaring any new keywords beyond
>> MATERIALIZED. If the MV is flagged as relisvalid == false, this
>> will change it to true.
> 
> UPDATE MATERIALIZED VIEW was problematic?

Not technically, really, but I saw two reasons that I preferred LOAD MV:

1. It seems to me to better convey that the entire contents of the MV  will be built from scratch, rather than
incrementallyadjusted.
 
2. We haven't hashed out the syntax for more aggressive maintenance of  an MV, and it seemed like UPDATE MV might be
syntaxwe would want to  use for something which updated selected parts of an MV when we do.
 

> Does LOAD automatically TRUNCATE the view before reloading it? If not,
> why not?

It builds a new heap and moves it into place. When the transaction
running LMV commits, the old heap is deleted. In implementation it is
closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
creating a new table. This allows all permissions, etc., to stay in
place.

>> It would be good to have some discussion to try to reach a consensus
>> about whether we need to differentiate between *missing* datat (where
>> a materialized view which has been loaded WITH NO DATA or TRUNCATEd
>> and has not been subsequently LOADed) and potentially *stale* data.
>> If we don't care to distinguish between a view which generated no
>> rows when it ran and a one for which the query has not been run, we
>> can avoid adding the relisvalid flag, and we could support UNLOGGED
>> MVs. Perhaps someone can come up with a better solution to that
>> problem.
> 
> Hmmm. I understand the distinction you're making here, but I'm not sure
> it actually matters to the user. MVs, by their nature, always have
> potentially stale data. Being empty (in an inaccurate way) is just one
> kind of stale data.

Robert feels the same way, but I disagree. Some MVs will not be terribly
volatile. In my view there is a big difference between having a "top ten"
list which might be based on yesterday's base tables rather than the base
table states as of this moment, and having a "top ten" list with no
entries. If you want to, for example, take some action if an order comes
in for one of your top customers, and a different path for other
customers, suddenly treating all of your long-time top customers as not
being so, without any squawk from the database, seems dangerous.

> It would be nice for the user to have some way to know that a matview is
> empty due to never being LOADed or recently being TRUNCATEd. However, I
> don't think that relisvalid flag -- and preventing scanning the relation
> -- is a good solution. What I'd rather have instead is a timestamp of
> when the MV was last LOADed. If the MV was never loaded (or was
> truncated) that timestamp would be NULL. Such a timestamp would allow
> users to construct all kinds of ad-hoc refresh schemes for MVs which
> would not be possible without it.

I see your point there; I'll think about that. My take was more that MVs
would often be refreshed by crontab, and that you would want to keep
subsequent steps from running and generating potentially plausible but
completely inaccurate results if the LMV failed.

> I don't see how this relates to UNLOGGED matviews either way.

UNLOGGED tables and indexes are made empty on crash recovery by copying
the initialization fork over the "normal" relations. Care was taken to
avoid needing to connect to each database in turn to complete that
recovery. This style of recovery can't really set the relisvalid flag, as
far as I can see; which leaves us choosing between unlogged MVs and
knowing whether they hold valid data -- unless someone has a better idea.

-Kevin



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Thu, Nov 15, 2012 at 1:36 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Hmmm.  I understand the distinction you're making here, but I'm not sure
> it actually matters to the user.  MVs, by their nature, always have
> potentially stale data.  Being empty (in an inaccurate way) is just one
> kind of stale data.

This is my feeling also.

> I don't see how this relates to UNLOGGED matviews either way.

Right now, Kevin has things set up so that when you do "TRUNCATE mv",
it clears the relisvalid flag.  If we allowed unlogged materialized
views, the table would be automatically truncated on a crash, but
there wouldn't be any way to clear relisvalid in that situation, so
Kevin felt we should simply disable unlogged MVs.  Personally, I'm not
excited about having a relisvalid flag at all, and doubly not excited
if it means we can't have unlogged MVs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 16 November 2012 16:25, Kevin Grittner <kgrittn@mail.com> wrote:
Josh Berkus wrote:

> Unlogged is good.

I agree that there are likely to be more use cases for this than
temp MVs. Unfortunately, I've had a hard time figuring out how to
flag an MV which is empty because its contents were lost after a
crash with preventing people from using an MV which hasn't been
populated, which has the potential to silently return incorrect
results.

pg_class.relisvalid = false.. getting rid of the use for truncated MVs?

--
Thom

Re: Materialized views WIP patch

From
Tom Lane
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
> Josh Berkus wrote:
>> What use would a temporary matview be?

> It would be essentially like a temporary table, with all the same
> persistence options. I'm not really sure how often it will be more
> useful than a temporary table before we have incremental maintenance
> of materialized views; once we have that, though, it seems likely
> that there could be reasonable use cases.

One of the principal attributes of a temp table is that its contents
aren't (reliably) accessible from anywhere except the owning backend.
Not sure where you're going to hide the incremental maintenance in
that scenario.

> The table inheritance has not been implemented in either direction
> for MVs. It didn't seem clear to me that there were reasonable use
> cases. Do you see any?

We don't have inheritance for views, so how would we have it for
materialized views?
        regards, tom lane



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 15 November 2012 02:28, Kevin Grittner <kgrittn@mail.com> wrote:
Attached is a patch that...

 Got this error:

postgres=# create view v_test as select 1;
CREATE VIEW
postgres=# create materialized view mv_test as select * from v_test;
ERROR:  could not open file "base/12064/16425": No such file or directory

--
Thom

Re: Materialized views WIP patch

From
Antonin Houska
Date:
By chance (?) I got similar one today too, when dropping extension:

ERROR:  could not open file "base/12623/12548": No such file or directory

I thought something might have gone wrong during Linux upgrade 2 days ago, but it's not likely that we both have the issue.

I wonder if something is broken in the catalog. The last commit I have in my environment is

commit 4af3dda13601d859a20425e3554533fde0549056
Author: Peter Eisentraut <peter_e@gmx.net>
Date:   Sun Oct 28 10:35:46 2012 -0400

Kind regards,
Tony.

On 11/16/2012 06:14 PM, Thom Brown wrote:
On 15 November 2012 02:28, Kevin Grittner <kgrittn@mail.com> wrote:
Attached is a patch that...

 Got this error:

postgres=# create view v_test as select 1;
CREATE VIEW
postgres=# create materialized view mv_test as select * from v_test;
ERROR:  could not open file "base/12064/16425": No such file or directory

--
Thom

Re: Materialized views WIP patch

From
Josh Berkus
Date:
Kevin,

> I agree that there are likely to be more use cases for this than
> temp MVs. Unfortunately, I've had a hard time figuring out how to
> flag an MV which is empty because its contents were lost after a
> crash with preventing people from using an MV which hasn't been
> populated, which has the potential to silently return incorrect
> results.

See below.

>>> 2. MVs don't support inheritance.
>>
>> In which direction? Can't inherit, or can't be inherited from?
> 
> The table inheritance has not been implemented in either direction
> for MVs. It didn't seem clear to me that there were reasonable use
> cases. Do you see any?

No, I just wanted clarity on this.  I can see a strong case for
eventually supporting CREATE MATERIALIZED VIEW matview_1 LIKE matview,
in order to "copy" mativews, though.

> Consistency. There are other object types which seem to enforce this
> rule for no reason that I can see beyond maybe a desire to have both
> directions of COPY work with the same set of objects. If I remember
> correctly, allowing this would eliminate one line of code from the
> patch, so if sentiment is in favor of it, it is very easily done.

There's going to be a pretty strong demand for COPY FROM matviews.
Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
arbitrary and unintuitive.

>> How would you change the definition of an MV then?
> 
> At this point you would need to drop and re-create the MV. If we
> want to add columns to an MV or change what an existing column holds,
> perhaps we could have an ALTER MV which changed the SELECT statement
> that populates the MV? I would prefer to leave that to a later patch,
> though -- it seems like a bit of a minefield compared to what is
> being implemented in this patch.

I agree that it should be a later patch.

> Not technically, really, but I saw two reasons that I preferred LOAD MV:
> 
> 1. It seems to me to better convey that the entire contents of the MV
>    will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
>    an MV, and it seemed like UPDATE MV might be syntax we would want to
>    use for something which updated selected parts of an MV when we do.

Hmmm, I see your point.  So "LOAD" would recreate, and (when supported)
UPDATE would incrementally update?

> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

OK, so same effect as a truncate.

> Robert feels the same way, but I disagree. Some MVs will not be terribly
> volatile. In my view there is a big difference between having a "top ten"
> list which might be based on yesterday's base tables rather than the base
> table states as of this moment, and having a "top ten" list with no
> entries. If you want to, for example, take some action if an order comes
> in for one of your top customers, and a different path for other
> customers, suddenly treating all of your long-time top customers as not
> being so, without any squawk from the database, seems dangerous.

Right, but a relisvalid flag just tells me that the matview was updated
at sometime in the past, and not *when* it was updated.  It could have
been 3 years ago.  The fact that it was updated at some indefinite time
is fairly valueless information.

There's a rule in data warehousing which says that it's better to have
no data (and know that you have no data) than to have incorrect data.

> I see your point there; I'll think about that. My take was more that MVs
> would often be refreshed by crontab, and that you would want to keep
> subsequent steps from running and generating potentially plausible but
> completely inaccurate results if the LMV failed.

Yeah, that too.  Also, a timestamp it would make it easy to double-check
if the cron job was failing or had been disabled.

> UNLOGGED tables and indexes are made empty on crash recovery by copying
> the initialization fork over the "normal" relations. Care was taken to
> avoid needing to connect to each database in turn to complete that
> recovery. This style of recovery can't really set the relisvalid flag, as
> far as I can see; which leaves us choosing between unlogged MVs and
> knowing whether they hold valid data -- unless someone has a better idea.

Yeah, well, whether we have relisvalid or mvlastupdate, we're going to
have to work out some way to have that field react to changes to the
table overall.  I don't know *how*, but it's something we'll have to solve.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Dimitri Fontaine
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
>> UPDATE MATERIALIZED VIEW was problematic?
>
> Not technically, really, but I saw two reasons that I preferred LOAD MV:
>
> 1. It seems to me to better convey that the entire contents of the MV
>    will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
>    an MV, and it seemed like UPDATE MV might be syntax we would want to
>    use for something which updated selected parts of an MV when we do.

Good point, and while I'm in the mood for some grammar input, here's a
try:
 ALTER MATERIALIZED VIEW foo RESET; ALTER MATERIALIZED VIEW foo UPDATE;

I think such wholesale operations make more sense as ALTER statement
than as UPDATE statements.

> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

When you say closer to CLUSTER, do you include the Access Exclusive Lock
that forbids reading the previous version's data while you prepare the
new one? That would be very bad and I wouldn't understand the need to,
in the scope of MATERIALIZED VIEWs which are by definition lagging
behind…

If as I think you don't have that limitation in your implementation,
it's awesome and just what I was hoping to read :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support




Re: Materialized views WIP patch

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> There's going to be a pretty strong demand for COPY FROM matviews.
> Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
> arbitrary and unintuitive.

You could make that same claim about plain views, but in point of
fact the demand for making them work in COPY has been minimal.
So I'm not convinced this is an essential first-cut feature.
We can always add it later.
        regards, tom lane



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Thom Brown wrote:

> postgres=# create view v_test as select 1;
> CREATE VIEW
> postgres=# create materialized view mv_test as select * from v_test;
> ERROR: could not open file "base/12064/16425": No such file or directory

Thanks for the report; will investigate.

-Kevin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Robert Haas wrote:
> Josh Berkus wrote:

>> Being empty (in an inaccurate way) is just one kind of stale data.
> 
> This is my feeling also.

If you had an MV summarizing Wisconsin courts cumulative case counts
by case type, "empty" would not have been a valid "stale" state for
over 150 years. That is a degree of staleness that IMV is not just a
quantitative degree of staleness, as if a nightly recalculation had
failed to occur, but a qualitatively different state entirely. While
you may or may not want to use the stale data if last night's regen
failed, and so it should be under application control, I can't
imagine a situation where you would want to proceed if the MV didn't
have data that had at some time been correct -- preferrably at some
time since the invention of digital electronic computers. Could you
provide an example where it would be a good thing to do so?

-Kevin



Re: Materialized views WIP patch

From
Josh Berkus
Date:
> You could make that same claim about plain views, but in point of
> fact the demand for making them work in COPY has been minimal.
> So I'm not convinced this is an essential first-cut feature.
> We can always add it later.

Of course.  I just had the impression that we could support COPY FROM by
*deleting* a couple lines from Kevin's patch, rather than it being extra
work.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> You could make that same claim about plain views, but in point of
>> fact the demand for making them work in COPY has been minimal.
>> So I'm not convinced this is an essential first-cut feature.
>> We can always add it later.

> Of course.  I just had the impression that we could support COPY FROM by
> *deleting* a couple lines from Kevin's patch, rather than it being extra
> work.

Even if it happens to be trivial in the current patch, it's an added
functional requirement that we might later regret having cavalierly
signed up for.  And, as noted upthread, relations that support only
one direction of COPY don't exist at the moment; that would be adding
an asymmetry that we might later regret, too.
        regards, tom lane



Re: Materialized views WIP patch

From
Simon Riggs
Date:
On 16 November 2012 11:25, Kevin Grittner <kgrittn@mail.com> wrote:

>>> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>>> VIEW mat view_name. This seemed more descriptive to me that the
>>> alternatives and avoids declaring any new keywords beyond
>>> MATERIALIZED. If the MV is flagged as relisvalid == false, this
>>> will change it to true.
>>
>> UPDATE MATERIALIZED VIEW was problematic?
>
> Not technically, really, but I saw two reasons that I preferred LOAD MV:
>
> 1. It seems to me to better convey that the entire contents of the MV
>    will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
>    an MV, and it seemed like UPDATE MV might be syntax we would want to
>    use for something which updated selected parts of an MV when we do.
>
>> Does LOAD automatically TRUNCATE the view before reloading it? If not,
>> why not?
>
> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

This seems very similar to the REPLACE command we discussed earlier,
except this is restricted to Mat Views.

If we're going to have this, I would prefer a whole command.

e.g. REPLACE matviewname REFRESH

that would also allow

REPLACE tablename AS query

Same thing under the covers, just more widely applicable and thus more useful.



Either way, I don't much like overloading the use of LOAD, which
already has a very different meaning.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
"Albe Laurenz"
Date:
Josh Berkus wrote:
>> It would be nice for the user to have some way to know that a matview is
>> empty due to never being LOADed or recently being TRUNCATEd. However, I
>> don't think that relisvalid flag -- and preventing scanning the relation
>> -- is a good solution. What I'd rather have instead is a timestamp of
>> when the MV was last LOADed. If the MV was never loaded (or was
>> truncated) that timestamp would be NULL. Such a timestamp would allow
>> users to construct all kinds of ad-hoc refresh schemes for MVs which
>> would not be possible without it.

+1

Kevin Grittner wrote:
> I see your point there; I'll think about that. My take was more that MVs
> would often be refreshed by crontab, and that you would want to keep
> subsequent steps from running and generating potentially plausible but
> completely inaccurate results if the LMV failed.

If one of these subsequent steps doesn't care if refresh
failed once, it shouldn't be forced to fail.  I imagine
that for many applications yesterday's data can be good enough.

Those that care should check the timestamp.

Yours,
Laurenz Albe

Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Simon Riggs wrote:

> This seems very similar to the REPLACE command we discussed
> earlier, except this is restricted to Mat Views.

I don't remember that discussion -- do you have a reference?

> If we're going to have this, I would prefer a whole command.
> 
> e.g. REPLACE matviewname REFRESH
> 
> that would also allow
> 
> REPLACE tablename AS query
> 
> Same thing under the covers, just more widely applicable and thus
> more useful.

An interesting throught. I would have thought that if we were going
to allow changing the definition of an existing MV, we would be
better off with CREATE OR REPLACE MATERIALIZED VIEW. Either way, if
you allow the column types or the number of columns to be changed,
you do tend to run into issues if there are other MVs, views,
triggers, rules, etc., which depend on the MV, so I don't think it's
material for an initial patch. But it is worth considering which way
we might want to extend it.

> Either way, I don't much like overloading the use of LOAD, which
> already has a very different meaning.

Well, it's hard to avoid creating new keywords without overloading
the meaning of exsiting ones. Personally I didn't find
 LOAD MATERIALIZED VIEW matview_name;

to be very easy to confuse with
 LOAD 'filename';

But that's a subjective thing. If too many people find that
confusing, it may be worth creating a new keyword; but I wanted to
see whether it was really necessary first.

-Kevin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Albe Laurenz wrote:
> Kevin Grittner wrote:
> 
>> My take was more that MVs would often be refreshed by crontab, and
>> that you would want to keep subsequent steps from running and
>> generating potentially plausible but completely inaccurate results
>> if the LMV failed.
>
> If one of these subsequent steps doesn't care if refresh
> failed once, it shouldn't be forced to fail. I imagine
> that for many applications yesterday's data can be good enough.
> 
> Those that care should check the timestamp.

It sounds like you and I are in agreement on this; I just didn't
state it very precisely. If a LMV on a MV which already has data
fails, the relisvalid would not prevent it from being used -- it
would be stale, but still valid data from *some* point in time. The
point is that if an MV is created WITH NO DATA or has been TRUNCATEd
and there has not been a subsequent LMV, what it contains may not
represent any state which was *ever* valid, or it may represent a
state which would only have been valid hundreds of years in the past,
had the system been computerized at that time. To me, that does not
seem like the same thing as a simple "stale" state.

I'm looking at whether there is some reasonable way to detect invalid
data as well as capture age of data. Every solution I've thought of
so far has at least one hard-to-solve race condition, but I have
hopes that I can either solve that for one of the ideas, or come up
with an idea which falls more gracefully under MVCC management.

-Kevin



Re: Materialized views WIP patch

From
Tom Lane
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
> Simon Riggs wrote:
>> Either way, I don't much like overloading the use of LOAD, which
>> already has a very different meaning.

> Well, it's hard to avoid creating new keywords without overloading
> the meaning of exsiting ones.

FWIW, I'd much rather see us overload LOAD (which is seldom used)
than REPLACE (which might in the future become a widely-used DML
command).
        regards, tom lane



Re: Materialized views WIP patch

From
Josh Berkus
Date:
Kevin,

> I'm looking at whether there is some reasonable way to detect invalid
> data as well as capture age of data. Every solution I've thought of
> so far has at least one hard-to-solve race condition, but I have
> hopes that I can either solve that for one of the ideas, or come up
> with an idea which falls more gracefully under MVCC management.

What's the race condition?  I'd think that LOAD would take an exclusive
lock on the matview involved.

>   LOAD MATERIALIZED VIEW matview_name;
>
> to be very easy to confuse with
>
>   LOAD 'filename';
>
> But that's a subjective thing. If too many people find that
> confusing, it may be worth creating a new keyword; but I wanted to
> see whether it was really necessary first.

I do not find them confusing.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Josh Berkus
Date:
On 11/19/12 9:57 AM, Josh Berkus wrote:
> Kevin,
> 
>> I'm looking at whether there is some reasonable way to detect invalid
>> data as well as capture age of data. Every solution I've thought of
>> so far has at least one hard-to-solve race condition, but I have
>> hopes that I can either solve that for one of the ideas, or come up
>> with an idea which falls more gracefully under MVCC management.
> 
> What's the race condition?  I'd think that LOAD would take an exclusive
> lock on the matview involved.

BTW, another thought on the timestamp: while it would be better to have
a lastrefresh timestamp in pg_class, the other option is to have an
extra column in the matview (pg_last_update).  While that would involve
some redundant storage, it would neatly solve the issues around unlogged
matviews; the timestamp and the data would vanish at the same time.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Tom Lane wrote:
> "Kevin Grittner" <kgrittn@mail.com> writes:
>> Josh Berkus wrote:
>>> What use would a temporary matview be?
> 
>> It would be essentially like a temporary table, with all the same
>> persistence options. I'm not really sure how often it will be more
>> useful than a temporary table before we have incremental
>> maintenance of materialized views; once we have that, though, it
>> seems likely that there could be reasonable use cases.
> 
> One of the principal attributes of a temp table is that its
> contents aren't (reliably) accessible from anywhere except the
> owning backend. Not sure where you're going to hide the incremental
> maintenance in that scenario.

The more I think about that, the less sensible temporary MVs seem.
Unless I can figure out some reasonable use case, I'll diable that in
the next version of the patch.

-Kevin



Re: Materialized views WIP patch

From
Marko Tiikkaja
Date:
Hi Kevin,

On 15/11/2012 03:28, Kevin Grittner wrote:
> Attached is a patch that is still WIP but that I think is getting
> pretty close to completion.

I've been looking at this, but I unfortunately haven't had as much time 
as I had hoped for, and have not looked at the code in detail yet.  It's 
also a relatively big patch, so I wouldn't mind another pair of eyes on it.


I have been testing the patch a bit, and I'm slightly disappointed by 
the fact that it still doesn't solve this problem (and I apologize if I 
have missed discussion about this in the docs or in this thread):

<assume "foo" is a non-empty materialized view>

T1: BEGIN;
T1: LOAD MATERIALIZED VIEW foo;

T2: SELECT * FROM foo;

T1: COMMIT;

<T2 sees an empty table>


As others have pointed out, replacing the contents of a table is 
something which people have been wanting to do for a long time, and I 
think having this ability would make this patch a lot better; now it 
just feels like syntactic sugar.

> 1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>      TABLE AS, with all the same clauses supported. That includes
>      declaring a materialized view to be temporary or unlogged.
> 2.  MVs don't support inheritance.
> 3.  MVs can't define foreign keys.
> 4.  MVs can't be the target of foreign keys.
> 5.  MVs can't have triggers.
> 6.  Users can't create rules which reference MVs (although MVs
>      [ab]use the rules mechanism internally, similar to how views do).
> 7.  MVs can't be converted to views, nor vice versa.
> 8.  Users may not directly use INSERT/UPDATE/DELETE on an MV.
> 9.  MVs can't directly be used in a COPY statement, but can be the
>      source of data using a SELECT.
> 10. MVs can't own sequences.
> 11. MVs can't be the target of LOCK statements, although other
>      statements get locks just like a table.
> 12. MVs can't use data modifying CTEs in their definitions.
> 13. pg_class now has a relisvalid column, which is true if an MV is
>      truncated or created WITH NO DATA. You can not scan a relation
>      flagged as invalid.
> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>      to make sense. For example, you can change the tablespace or
>      schema, but you cannot add or drop column with ALTER.
> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>      VIEW mat view_name. This seemed more descriptive to me that the
>      alternatives and avoids declaring any new keywords beyond
>      MATERIALIZED. If the MV is flagged as relisvalid == false, this
>      will change it to true.
> 17. Since the data viewed in an MV is not up-to-date with the latest
>      committed transaction, it didn't seem to make any sense to try to
>      apply SERIALIZABLE transaction semantics to queries looking at
>      the contents of an MV, although if LMV is run in a SERIALIZABLE
>      transaction the MV data is guaranteed to be free of serialization
>      anomalies. This does leave the transaction running the LOAD
>      command vulnerable to serialization failures unless it is also
>      READ ONLY DEFERRABLE.
> 18. Bound parameters are not supported for the CREATE MATERIALIZED
>      VIEW statement.

I believe all of these points have been under discussion, and I don't 
have anything to add to the ongoing discussions.

> 19. LMV doesn't show a row count. It wouldn't be hard to add, it just
>      seemed a little out of place to do that, when CLUSTER, etc.,
>      don't.

This sounds like a useful feature, but your point about CLUSTER and 
friends still stands.

> In the long term, we will probably need to separate the
> implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
> for now there is so little that they need to do differently it seemed
> less evil to have a few "if" clauses that that much duplicated code.

Seems sensible.

I'll get back when I manage to get a better grasp of the code.


Regards,
Marko Tiikkaja



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja <pgmail@joh.to> wrote:
> As others have pointed out, replacing the contents of a table is something
> which people have been wanting to do for a long time, and I think having
> this ability would make this patch a lot better; now it just feels like
> syntactic sugar.

I agree that it's mostly syntactic sugar, but I think we need to have
realistic expectations for what is possible in an initial patch.  When
I committed the first patch for foreign data wrappers, it didn't work
at all: it was just syntax support.  Tom later committed a follow-on
patch that made them work.  Similarly, I split the event trigger patch
into two halves, one of which added the syntax support and the other
of which made them functional: and even with both commits in, I think
it's fair to say that event triggers are still in a fairly primitive
state.

None of those patches were small patches.  It's going to take multiple
years to get materialized views up to a state where they're really
useful to a broad audience in production applications, but I don't
think we should sneer at anyone for writing a patch that is "just
syntactic sugar".  As it turns out, adding a whole new object type is
a lot of work and generates a big patch even if it doesn't do much
just yet.  Rejecting such patches on the grounds that they aren't
comprehensive enough is, IMHO, extremely unwise; we'll either end up
landing even larger patches that are almost impossible to review
comprehensively and therefore more likely to break something, or else
we'll kill the projects outright and end up with nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Marko Tiikkaja
Date:
On 11/26/12 2:07 PM, Robert Haas wrote:
> On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja <pgmail@joh.to> wrote:
>> As others have pointed out, replacing the contents of a table is something
>> which people have been wanting to do for a long time, and I think having
>> this ability would make this patch a lot better; now it just feels like
>> syntactic sugar.
>
> I agree that it's mostly syntactic sugar, but I think we need to have
> realistic expectations for what is possible in an initial patch.  When
> I committed the first patch for foreign data wrappers, it didn't work
> at all: it was just syntax support.  Tom later committed a follow-on
> patch that made them work.  Similarly, I split the event trigger patch
> into two halves, one of which added the syntax support and the other
> of which made them functional: and even with both commits in, I think
> it's fair to say that event triggers are still in a fairly primitive
> state.
>
> None of those patches were small patches.  It's going to take multiple
> years to get materialized views up to a state where they're really
> useful to a broad audience in production applications, but I don't
> think we should sneer at anyone for writing a patch that is "just
> syntactic sugar".  As it turns out, adding a whole new object type is
> a lot of work and generates a big patch even if it doesn't do much
> just yet.  Rejecting such patches on the grounds that they aren't
> comprehensive enough is, IMHO, extremely unwise; we'll either end up
> landing even larger patches that are almost impossible to review
> comprehensively and therefore more likely to break something, or else
> we'll kill the projects outright and end up with nothing.

First of all, I have to apologize.  Re-reading the email I sent out last 
night, it does indeed feel a bit harsh and I can understand your reaction.

At no point did I mean to belittle Kevin's efforts or the patch itself.  I was mostly looking for Kevin's input on how
hardit would be to 
 
solve the particular problem and whether it would be possible to do so 
for 9.3.

While I feel like the problem I pointed out is a small caveat and should 
be at least documented for 9.3, I think this patch has merits of its own 
even if that problem never gets fixed, and I will continue to review 
this patch.


Regards,
Marko Tiikkaja



Re: Materialized views WIP patch

From
Peter Geoghegan
Date:
On 26 November 2012 13:07, Robert Haas <robertmhaas@gmail.com> wrote:
> None of those patches were small patches.  It's going to take multiple
> years to get materialized views up to a state where they're really
> useful to a broad audience in production applications, but I don't
> think we should sneer at anyone for writing a patch that is "just
> syntactic sugar".

+1. I have a sweet tooth. I don't like it when people criticise
patches on the basis of "obviously you could achieve the same effect
with $CONVOLUTION". Making things simpler is a desirable outcome. Now,
that isn't to say that we should disregard everything or even anything
else in pursuit of simplicity; just that "needing a Ph.D is
Postgresology", as you once put it, to do something routine to many is
really hard to defend.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 11/14/12 9:28 PM, Kevin Grittner wrote:
> 17. Since the data viewed in an MV is not up-to-date with the latest
>     committed transaction,

So, the way I understand it, in Oracle terms, this feature is a
"snapshot", not a materialized view.  Maybe that's what it should be
called then.




Re: Materialized views WIP patch

From
Robert Haas
Date:
On Mon, Nov 26, 2012 at 8:14 AM, Marko Tiikkaja <pgmail@joh.to> wrote:
> First of all, I have to apologize.  Re-reading the email I sent out last
> night, it does indeed feel a bit harsh and I can understand your reaction.
>
> At no point did I mean to belittle Kevin's efforts or the patch itself.  I
> was mostly looking for Kevin's input on how hard it would be to solve the
> particular problem and whether it would be possible to do so for 9.3.
>
> While I feel like the problem I pointed out is a small caveat and should be
> at least documented for 9.3, I think this patch has merits of its own even
> if that problem never gets fixed, and I will continue to review this patch.

OK, no worries.  I didn't really interpret your email as belittling; I
just want to make sure this feature doesn't get feature-creeped to
death.  I think everyone, including Kevin, understands that the
real-world applicability of v1 is going to be limited and many people
will choose alternative techniques rather than relying on this new
feature.  But I also think that we'll never get to a really awesome,
kick-ass feature unless we're willing to commit an initial version
that isn't all that awesome or kick-ass.  If I understand Kevin's
goals correctly, the plan is to get this basic version committed for
9.3, and then to try to expand the capability during the 9.4 release
cycle (and maybe 9.5, too, there's a lot of work to do here).  I think
that's a pretty sound plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Andrew Dunstan
Date:
On 11/26/2012 09:46 AM, Peter Eisentraut wrote:
> On 11/14/12 9:28 PM, Kevin Grittner wrote:
>> 17. Since the data viewed in an MV is not up-to-date with the latest
>>      committed transaction,
> So, the way I understand it, in Oracle terms, this feature is a
> "snapshot", not a materialized view.  Maybe that's what it should be
> called then.
>
>


If you use Jonathan Gardner's taxonomy at 
<http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views>, 
snapshots are a subclass of materialized views.

cheers

andrew




Re: Materialized views WIP patch

From
David Fetter
Date:
On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote:
> On 11/14/12 9:28 PM, Kevin Grittner wrote:
> > 17. Since the data viewed in an MV is not up-to-date with the
> > latest committed transaction,
> 
> So, the way I understand it, in Oracle terms, this feature is a
> "snapshot", not a materialized view.  Maybe that's what it should be
> called then.

"Snapshot" is one of the options for refreshing an Oracle materialized
view.  There are others, which we'll eventually add if past is any
prologue :)

I hate to add to the bike-shedding, but we should probably add REFRESH
SNAPSHOT as an optional piece of the grammar, with more REFRESH
options to come.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Materialized views WIP patch

From
Peter Geoghegan
Date:
On 26 November 2012 15:24, David Fetter <david@fetter.org> wrote:
> I hate to add to the bike-shedding, but we should probably add REFRESH
> SNAPSHOT as an optional piece of the grammar, with more REFRESH
> options to come.

I don't know that they should be called materialised views, but do we
really need to overload the word snapshot? I'd just as soon invent a
new word as use the Oracle one, since I don't think the term snapshot
is widely recognised as referring to anything other than snapshot
isolation.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: Materialized views WIP patch

From
David Fetter
Date:
On Mon, Nov 26, 2012 at 04:02:17PM +0000, Peter Geoghegan wrote:
> On 26 November 2012 15:24, David Fetter <david@fetter.org> wrote:
> > I hate to add to the bike-shedding, but we should probably add
> > REFRESH SNAPSHOT as an optional piece of the grammar, with more
> > REFRESH options to come.
> 
> I don't know that they should be called materialised views, but do
> we really need to overload the word snapshot? I'd just as soon
> invent a new word as use the Oracle one, since I don't think the
> term snapshot is widely recognised as referring to anything other
> than snapshot isolation.

I believe that the meaning here is unambiguous, and is used in other
descriptions than Oracle's, including the one on our wiki.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Marko Tiikkaja wrote:
> On 15/11/2012 03:28, Kevin Grittner wrote:

> I have been testing the patch a bit

Thanks!

> and I'm slightly disappointed by the fact that it still doesn't
> solve this problem (and I apologize if I have missed discussion
> about this in the docs or in this thread):
> 
> <assume "foo" is a non-empty materialized view>
> 
> T1: BEGIN;
> T1: LOAD MATERIALIZED VIEW foo;
> 
> T2: SELECT * FROM foo;
> 
> T1: COMMIT;
> 
> <T2 sees an empty table>

As far as I know you are the first to notice this behavior. Thanks
for pointing it out.

I agree with Robert that we have to be careful about scope creep,
but this one looks to me like it should be considered a bug. IMO,
LOAD for a populated MV should move it from one state which
reflects the captured state of a previous point in time to a
captured state which is later, with no invalid or inconsistent
states visible in between.

I will take a look at the issue; I don't know whether it's
something small I can address in this CF or whether it will need to
be in the next CF, but I will fix it.

>> 19. LMV doesn't show a row count. It wouldn't be hard to add, it
>>     just seemed a little out of place to do that, when CLUSTER,
>>     etc., don't.
> 
> This sounds like a useful feature, but your point about CLUSTER
> and friends still stands.

Other possible arguments against providing a count are:
(1) For a populated MV, the LOAD might be replacing the contents
with fewer rows than were there before.
(2) Once we have incremental updates of the MV, this count is only
one of the ways to update the view -- and the others won't show
counts. Showing it here might be considered inconsistent.

I don't feel strongly about it, and I don't think it's a big change
either way; just explaining what got me off the fence when I had to
pick one behavior or the other to post the WIP patch.

> I'll get back when I manage to get a better grasp of the code.

Thanks.

Keep in mind that the current behavior of behaving like a regular
view when the contents are invalid is not what I had in mind, that
was an accidental effect of commenting out the body of the
ExecCheckRelationsValid() function right before posting the patch
because I noticed a regression. When I noticed current behavior, it
struck me that someone might prefer it to the intended behavior of
showing an error like this:
 ereport(ERROR,   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),    errmsg("materialized view \"%s\" has not been
populated",          get_rel_name(rte->relid)),    errhint("Use the LOAD MATERIALIZED VIEW command.")));
 

I mention it in case someone wants to argue for silently behaving
as a regular view when the MV is not populated.

-Kevin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
David Fetter wrote:
> On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote:

>> So, the way I understand it, in Oracle terms, this feature is a
>> "snapshot", not a materialized view. Maybe that's what it should
>> be called then.
> 
> "Snapshot" is one of the options for refreshing an Oracle
> materialized view. There are others, which we'll eventually add
> if past is any prologue :)

That's the way I understand it, too.

> I hate to add to the bike-shedding, but we should probably add
> REFRESH SNAPSHOT as an optional piece of the grammar, with more
> REFRESH options to come.

I would prefer to leave the syntax for refreshing MVs to a later
patch. I'm kind of assuming that SNAPSHOT would be the default if
no other options are specified, but I would prefer not to get into
too much speculation about add-on patches for fear of derailing
this initial effort.

-Kevin



Re: Materialized views WIP patch

From
David Fetter
Date:
On Mon, Nov 26, 2012 at 04:34:36PM -0500, Kevin Grittner wrote:
> David Fetter wrote:
> > On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote:
> 
> >> So, the way I understand it, in Oracle terms, this feature is a
> >> "snapshot", not a materialized view. Maybe that's what it should
> >> be called then.
> > 
> > "Snapshot" is one of the options for refreshing an Oracle
> > materialized view. There are others, which we'll eventually add
> > if past is any prologue :)
> 
> That's the way I understand it, too.

Great :)

> > I hate to add to the bike-shedding, but we should probably add
> > REFRESH SNAPSHOT as an optional piece of the grammar, with more
> > REFRESH options to come.
> 
> I would prefer to leave the syntax for refreshing MVs to a later
> patch. I'm kind of assuming that SNAPSHOT would be the default if
> no other options are specified, but I would prefer not to get into
> too much speculation about add-on patches for fear of derailing
> this initial effort.

You're right.  I withdraw my suggestion until such time as this patch
(or descendent) is committed and actual working code implementing
other refresh strategies is written.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On Mon, 2012-11-26 at 09:46 -0500, Peter Eisentraut wrote:
> On 11/14/12 9:28 PM, Kevin Grittner wrote:
> > 17. Since the data viewed in an MV is not up-to-date with the latest
> >     committed transaction,
> 
> So, the way I understand it, in Oracle terms, this feature is a
> "snapshot", not a materialized view.  Maybe that's what it should be
> called then.

OK, I take everything back and claim the opposite.

In current Oracle, SNAPSHOT is an obsolete alias for MATERIALIZED VIEW.
Materialized views have the option of REFRESH ON DEMAND and REFRESH ON
COMMIT, with the former being the default.  So it seems that the syntax
of what you are proposing is in line with Oracle.

I'm not fond of overloading LOAD as the refresh command.  Maybe you
could go the Oracle route here as well and use a stored procedure.  That
would also allow things like SELECT pg_refresh_mv(oid) FROM ... more
easily.





Re: Materialized views WIP patch

From
"David Rowley"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Peter Eisentraut
> Sent: 27 November 2012 13:35
> To: Kevin Grittner
> Cc: Pgsql Hackers
> Subject: Re: [HACKERS] Materialized views WIP patch
>
> On Mon, 2012-11-26 at 09:46 -0500, Peter Eisentraut wrote:
> > On 11/14/12 9:28 PM, Kevin Grittner wrote:
> > > 17. Since the data viewed in an MV is not up-to-date with the latest
> > >     committed transaction,
> >
> > So, the way I understand it, in Oracle terms, this feature is a
> > "snapshot", not a materialized view.  Maybe that's what it should be
> > called then.
>
> OK, I take everything back and claim the opposite.
>
> In current Oracle, SNAPSHOT is an obsolete alias for MATERIALIZED VIEW.
> Materialized views have the option of REFRESH ON DEMAND and REFRESH
> ON COMMIT, with the former being the default.  So it seems that the syntax
> of what you are proposing is in line with Oracle.
>
> I'm not fond of overloading LOAD as the refresh command.  Maybe you could
> go the Oracle route here as well and use a stored procedure.  That would also
> allow things like SELECT pg_refresh_mv(oid) FROM ... more easily.
>
>
+1 to this.
I can see a use case where you might want to refresh all MVs that are X number of days/hours old. Rather than having to
executestatements for each one. Something like pg_refresh_mv() within a query would allow this. 

Pretty exciting work Kevin, I understand what Robert said about feature creep etc and agree 100%, but I'm really
lookingforward to when we can *one day* have the planner make use of an eager MV to optimise a query! 

Regards

David Rowley

>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers




Re: Materialized views WIP patch

From
Dimitri Fontaine
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I'm not fond of overloading LOAD as the refresh command.  Maybe you
> could go the Oracle route here as well and use a stored procedure.  That
> would also allow things like SELECT pg_refresh_mv(oid) FROM ... more
> easily.

I would like that we have a way to refresh a Materialized View by
calling a stored procedure, but I don't think it should be the main UI.

The wholesale refreshing of a matview appears to me to be comparable to
TRUNCATE is that it's both a DDL and a DML. The incremental refreshing
modes we want to have later are clearly DML only, either on commit
refresh or incrementally on demand.

I would then propose that we use ALTER MATERIALIZED VIEW as the UI for
the wholesale on demand refreshing operation, and UPDATE MATERIALIZED
VIEW as the incremental command (to come later).

So my proposal for the current feature would be:
 ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ]; UPDATE MATERIALIZED VIEW mv;

The choice of keywords and syntax here hopefully clearly hint the user
about the locking behavior of the commands, too. And as we said, the
bare minimum for this patch does *not* include the CONCURRENTLY option,
which we still all want to have (someday). :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support




Re: Materialized views WIP patch

From
Pavel Stehule
Date:
2012/11/27 Dimitri Fontaine <dimitri@2ndquadrant.fr>:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> I'm not fond of overloading LOAD as the refresh command.  Maybe you
>> could go the Oracle route here as well and use a stored procedure.  That
>> would also allow things like SELECT pg_refresh_mv(oid) FROM ... more
>> easily.
>
> I would like that we have a way to refresh a Materialized View by
> calling a stored procedure, but I don't think it should be the main UI.
>
> The wholesale refreshing of a matview appears to me to be comparable to
> TRUNCATE is that it's both a DDL and a DML. The incremental refreshing
> modes we want to have later are clearly DML only, either on commit
> refresh or incrementally on demand.
>
> I would then propose that we use ALTER MATERIALIZED VIEW as the UI for
> the wholesale on demand refreshing operation, and UPDATE MATERIALIZED
> VIEW as the incremental command (to come later).
>
> So my proposal for the current feature would be:
>
>   ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
>   UPDATE MATERIALIZED VIEW mv;
>
> The choice of keywords and syntax here hopefully clearly hint the user
> about the locking behavior of the commands, too. And as we said, the
> bare minimum for this patch does *not* include the CONCURRENTLY option,
> which we still all want to have (someday). :)
>

+1

Pavel

> Regards,
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Pavel Stehule wrote:
> 2012/11/27 Dimitri Fontaine <dimitri@2ndquadrant.fr>:

>> I would like that we have a way to refresh a Materialized View
>> by calling a stored procedure, but I don't think it should be
>> the main UI.

I agree. I saw that Oracle uses a function for that without any
statement-level support, and that would probably be easier to
implement; but it felt wrong to do it that way. I couldn't think of
any other cases where similar action is taken without statement
syntax for it.

>> The wholesale refreshing of a matview appears to me to be
>> comparable to TRUNCATE is that it's both a DDL and a DML. The
>> incremental refreshing modes we want to have later are clearly
>> DML only, either on commit refresh or incrementally on demand.

Personally, I expect the most popular update method to eventually
become a queued update. I've looked ahead far enough to see that I
want to structure the incremental updates to be controlled through
an API where changes to supporting tables produce records saying
what was done which are fed to consumers which do the updating.
Then it becomes a matter of whether that consumer performs the
related updates to the MV during commit processing of the producer,
by pulling from a queue, or by reading accumulated rows when the MV
is referenced.

But I'm getting ahead of things with such speculation...

>> I would then propose that we use ALTER MATERIALIZED VIEW as the
>> UI for the wholesale on demand refreshing operation, and UPDATE
>> MATERIALIZED VIEW as the incremental command (to come later).

Honestly, I have managed to keep myself from speculating on syntax
for incremental updates. There will be enough complexity involved
that I expect months of bikeshedding. :-/

>> So my proposal for the current feature would be:
>>
>> ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
>> UPDATE MATERIALIZED VIEW mv;

An ALTER MATERIALIZED VIEW option was my first thought on syntax to
do what LOAD does in the current patch. But it bothered me that I
couldn't think of any other cases where ALTER <some-object-type>
only changed the data contained within the object and had no other
impact. Are you both really comfortable with an ALTER MATERIALIZED
VIEW which has no effect other than to update the data? It seems
wrong to me.

>> The choice of keywords and syntax here hopefully clearly hint
>> the user about the locking behavior of the commands, too. And as
>> we said, the bare minimum for this patch does *not* include the
>> CONCURRENTLY option, which we still all want to have (someday).
>> :)
> 
> +1

Sure -- a CONCURRENTLY option for LMV (or AMVU) seems like one of
the next steps. I'll feel more confident about implementing that
when it appears that we have shaken the last bugs out of
CREATE/DROP INDEX CONCURRENTLY, since anything which affects those
statements will probably also matter here.

-Kevin



Re: Materialized views WIP patch

From
David Johnston
Date:
On Nov 27, 2012, at 5:25, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:

>
> So my proposal for the current feature would be:
>
>  ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
>  UPDATE MATERIALIZED VIEW mv;
>
> The choice of keywords and syntax here hopefully clearly hint the user
> about the locking behavior of the commands, too. And as we said, the
> bare minimum for this patch does *not* include the CONCURRENTLY option,
> which we still all want to have (someday). :)
>

I dislike using ALTER syntax to perform a data-only action.

The other advantage of non-functional syntax is that you could more easily supply some form of where clause should you
onlywant to perform a partial refresh.  With a function call that becomes more obtuse. 

David J.


Re: Materialized views WIP patch

From
Dimitri Fontaine
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
> An ALTER MATERIALIZED VIEW option was my first thought on syntax to
> do what LOAD does in the current patch. But it bothered me that I
> couldn't think of any other cases where ALTER <some-object-type>
> only changed the data contained within the object and had no other
> impact. Are you both really comfortable with an ALTER MATERIALIZED
> VIEW which has no effect other than to update the data? It seems
> wrong to me.

I think you can already do that with some clever use of alter table ...
type using, or alter table set default.

> Sure -- a CONCURRENTLY option for LMV (or AMVU) seems like one of
> the next steps. I'll feel more confident about implementing that
> when it appears that we have shaken the last bugs out of
> CREATE/DROP INDEX CONCURRENTLY, since anything which affects those
> statements will probably also matter here.

Sure.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Dimitri Fontaine wrote:
> "Kevin Grittner" <kgrittn@mail.com> writes:
>> An ALTER MATERIALIZED VIEW option was my first thought on syntax
>> to do what LOAD does in the current patch. But it bothered me
>> that I couldn't think of any other cases where ALTER
>> <some-object-type> only changed the data contained within the
>> object and had no other impact. Are you both really comfortable
>> with an ALTER MATERIALIZED VIEW which has no effect other than
>> to update the data? It seems wrong to me.
> 
> I think you can already do that with some clever use of alter
> table ... type using, or alter table set default.

You mean, specifying an ALTER TABLE which appears to specify a
change to some non-data aspect of a table but which is really
setting it to the existing state? And it therefore rewrites the
table? I suppose that with USING you could actually even have it
rewritten with data different from what was there before without
changing the structure of the table. Somehow I don't find that
pursuasive as an argument for what ALTER MATERIALIZED VIEW should
rescan the source relations and build a whole new set of data for
exactly the same MV definition.

Consider that in relational theory a table is considered a relation
variable. ALTER is supposed to change the definition of the
variable in some way. Other statements are used to change the value
contained in the variable. Sure there are some grey areas already,
but I don't see where we need to muddy the waters in this case.

-Kevin



Re: Materialized views WIP patch

From
Dimitri Fontaine
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
> changing the structure of the table. Somehow I don't find that
> pursuasive as an argument for what ALTER MATERIALIZED VIEW should
> rescan the source relations and build a whole new set of data for
> exactly the same MV definition.

Fair enough.

> Consider that in relational theory a table is considered a relation
> variable. ALTER is supposed to change the definition of the
> variable in some way. Other statements are used to change the value
> contained in the variable. Sure there are some grey areas already,
> but I don't see where we need to muddy the waters in this case.

Under that light, using ALTER is strange indeed. I still don't like
using LOAD that much, allow me to try a last syntax proposal. Well all I
can find just now would be:
 UPDATE MATERIALIZED VIEW mv FOR EACH ROW; UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ];

The only value of such a proposal is that it's not LOAD and it's still
not introducing any new keyword. Oh it's also avoiding to overload the
SNAPSHOT keyword. Well, it still does not look like the best candidate.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Materialized views WIP patch

From
Tom Lane
Date:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Under that light, using ALTER is strange indeed.

Agreed, seems like a poor choice.

> I still don't like
> using LOAD that much, allow me to try a last syntax proposal. Well all I
> can find just now would be:

>   UPDATE MATERIALIZED VIEW mv FOR EACH ROW;
>   UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ];

> The only value of such a proposal is that it's not LOAD and it's still
> not introducing any new keyword. Oh it's also avoiding to overload the
> SNAPSHOT keyword. Well, it still does not look like the best candidate.

I think this syntax would require making MATERIALIZED (and possibly also
VIEW) fully reserved keywords, which would be better avoided.
        regards, tom lane



Re: Materialized views WIP patch

From
"David Johnston"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Dimitri Fontaine
> Sent: Tuesday, November 27, 2012 10:03 AM
> To: Kevin Grittner
> Cc: Pavel Stehule; Peter Eisentraut; Pgsql Hackers
> Subject: Re: [HACKERS] Materialized views WIP patch
> 
> "Kevin Grittner" <kgrittn@mail.com> writes:
> > changing the structure of the table. Somehow I don't find that
> > pursuasive as an argument for what ALTER MATERIALIZED VIEW should
> > rescan the source relations and build a whole new set of data for
> > exactly the same MV definition.
> 
> Fair enough.
> 
> > Consider that in relational theory a table is considered a relation
> > variable. ALTER is supposed to change the definition of the variable
> > in some way. Other statements are used to change the value contained
> > in the variable. Sure there are some grey areas already, but I don't
> > see where we need to muddy the waters in this case.
> 
> Under that light, using ALTER is strange indeed. I still don't like using
LOAD
> that much, allow me to try a last syntax proposal. Well all I can find
just now
> would be:
> 
>   UPDATE MATERIALIZED VIEW mv FOR EACH ROW;
>   UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ];
> 
> The only value of such a proposal is that it's not LOAD and it's still not
> introducing any new keyword. Oh it's also avoiding to overload the
> SNAPSHOT keyword. Well, it still does not look like the best candidate.
> 
> Regards,

Just a thought but how about something like:

DO REFRESH OF MATERIALIZED VIEW mat_view;

In effect we begin overloading the meaning of "DO" to not only mean
anonymous code blocks but to also call pre-defined internal routines that
can be executed without having to use function-call syntax.  "MATERIALIZED
VIEW" can be more generic "e.g., TABLE" if the need arises, the REFRESH
"Action" is generic, and additional clauses can be added after the object
name (FOR, CONCURRENTLY, WHERE, etc...)

David J.











Re: Materialized views WIP patch

From
Pavel Stehule
Date:
2012/11/27 David Johnston <polobo@yahoo.com>:
>> -----Original Message-----
>> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
>> owner@postgresql.org] On Behalf Of Dimitri Fontaine
>> Sent: Tuesday, November 27, 2012 10:03 AM
>> To: Kevin Grittner
>> Cc: Pavel Stehule; Peter Eisentraut; Pgsql Hackers
>> Subject: Re: [HACKERS] Materialized views WIP patch
>>
>> "Kevin Grittner" <kgrittn@mail.com> writes:
>> > changing the structure of the table. Somehow I don't find that
>> > pursuasive as an argument for what ALTER MATERIALIZED VIEW should
>> > rescan the source relations and build a whole new set of data for
>> > exactly the same MV definition.
>>
>> Fair enough.
>>
>> > Consider that in relational theory a table is considered a relation
>> > variable. ALTER is supposed to change the definition of the variable
>> > in some way. Other statements are used to change the value contained
>> > in the variable. Sure there are some grey areas already, but I don't
>> > see where we need to muddy the waters in this case.
>>
>> Under that light, using ALTER is strange indeed. I still don't like using
> LOAD
>> that much, allow me to try a last syntax proposal. Well all I can find
> just now
>> would be:
>>
>>   UPDATE MATERIALIZED VIEW mv FOR EACH ROW;
>>   UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ];
>>
>> The only value of such a proposal is that it's not LOAD and it's still not
>> introducing any new keyword. Oh it's also avoiding to overload the
>> SNAPSHOT keyword. Well, it still does not look like the best candidate.
>>
>> Regards,
>
> Just a thought but how about something like:
>
> DO REFRESH OF MATERIALIZED VIEW mat_view;
>
> In effect we begin overloading the meaning of "DO" to not only mean
> anonymous code blocks but to also call pre-defined internal routines that
> can be executed without having to use function-call syntax.  "MATERIALIZED
> VIEW" can be more generic "e.g., TABLE" if the need arises, the REFRESH
> "Action" is generic, and additional clauses can be added after the object
> name (FOR, CONCURRENTLY, WHERE, etc...)

-1

I  unlike using keywords DO for this purpose - when we use it for
anonymous blocks

Regards

Pavel

>
> David J.
>
>
>
>
>
>
>
>



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Tue, Nov 27, 2012 at 10:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I  unlike using keywords DO for this purpose - when we use it for
> anonymous blocks

Yeah, I don't much like that either.  My original suggestion when
Kevin and I discussed this over voice was ALTER MATERIALIZED VIEW ..
REFRESH or ALTER MATERIALIZED VIEW .. UPDATE.  I don't particularly
like syntaxes involving DO or LOAD because those words already have
strong associations with completely unrelated features.  Now, if we
don't want to do that and we don't want to use ALTER for a
data-modifying command either, another option would be to invent a new
toplevel command:

REFRESH <view_name>;

Of course, that does introduce another keyword, but the penalty for a
new unreserved keyword is pretty small.  It seems like a rough
analogue of CLUSTER, which could be spelled ALTER TABLE <table_name>
UPDATE TABLE ORDER TO if keyword minimization trumped both concision
and clarity, but it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Robert Haas wrote:

> I don't particularly like syntaxes involving DO or LOAD because
> those words already have strong associations with completely
> unrelated features. Now, if we don't want to do that and we don't
> want to use ALTER for a data-modifying command either, another
> option would be to invent a new toplevel command:
> 
> REFRESH <view_name>;
> 
> Of course, that does introduce another keyword, but the penalty
> for a new unreserved keyword is pretty small.

Of the alternatives to LOAD MATERIALIZED VIEW, something involving
REFRESH seems the best to me. The question is whether REFRESH
MATERIALIZED VIEW (or just REFRESH) is more clear, and whether it
is so by enough to merit another keyword. Of course, there is a
chance that we may wind up needing that keyword for declaring
incremental updates anyway, so it might be a matter of *when* we do
it rather than *whether* we do it -- depending on the yet-to-be-
determined syntax for specifying incremental updates.

My personal preference is still for LOAD MATERIALIZED VIEW because
it implies a complete regeneration rather than something more
incremental, but I realize that is subjective.

-Kevin



Re: Materialized views WIP patch

From
Pavel Stehule
Date:
2012/11/28 Kevin Grittner <kgrittn@mail.com>:
> Robert Haas wrote:
>
>> I don't particularly like syntaxes involving DO or LOAD because
>> those words already have strong associations with completely
>> unrelated features. Now, if we don't want to do that and we don't
>> want to use ALTER for a data-modifying command either, another
>> option would be to invent a new toplevel command:
>>
>> REFRESH <view_name>;
>>
>> Of course, that does introduce another keyword, but the penalty
>> for a new unreserved keyword is pretty small.
>
> Of the alternatives to LOAD MATERIALIZED VIEW, something involving
> REFRESH seems the best to me. The question is whether REFRESH
> MATERIALIZED VIEW (or just REFRESH) is more clear, and whether it
> is so by enough to merit another keyword. Of course, there is a
> chance that we may wind up needing that keyword for declaring
> incremental updates anyway, so it might be a matter of *when* we do
> it rather than *whether* we do it -- depending on the yet-to-be-
> determined syntax for specifying incremental updates.
>
> My personal preference is still for LOAD MATERIALIZED VIEW because
> it implies a complete regeneration rather than something more
> incremental, but I realize that is subjective.

In this context I prefer REFRESH keyword - I have a LOAD associated
with BULKLOAD, a this is different

Regards

Pavel

>
> -Kevin



Re: Materialized views WIP patch

From
"Marko Tiikkaja"
Date:
Hi Kevin,

On Mon, 26 Nov 2012 22:24:33 +0100, Kevin Grittner <kgrittn@mail.com>  
wrote:
> Marko Tiikkaja wrote:
>> <T2 sees an empty table>
>
> As far as I know you are the first to notice this behavior. Thanks
> for pointing it out.
>
> I will take a look at the issue; I don't know whether it's
> something small I can address in this CF or whether it will need to
> be in the next CF, but I will fix it.

Any news on this front?

>> I'll get back when I manage to get a better grasp of the code.

The code looks relatively straightforward and good to my eyes.  It passes  
my testing and looks to be changing all the necessary parts of the code.

> Keep in mind that the current behavior of behaving like a regular
> view when the contents are invalid is not what I had in mind, that
> was an accidental effect of commenting out the body of the
> ExecCheckRelationsValid() function right before posting the patch
> because I noticed a regression. When I noticed current behavior, it
> struck me that someone might prefer it to the intended behavior of
> showing an error like this:
>
>   ereport(ERROR,
>     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>      errmsg("materialized view \"%s\" has not been populated",
>             get_rel_name(rte->relid)),
>      errhint("Use the LOAD MATERIALIZED VIEW command.")));
>
> I mention it in case someone wants to argue for silently behaving
> as a regular view when the MV is not populated.

FWIW, I'd prefer an error in this case, but I don't feel strongly about it.


Regards,
Marko Tiikkaja



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Marko Tiikkaja wrote:
> Kevin Grittner <kgrittn@mail.com> wrote:
>> Marko Tiikkaja wrote:
>>> <T2 sees an empty table>
>>
>> As far as I know you are the first to notice this behavior.
>> Thanks for pointing it out.
>>
>> I will take a look at the issue; I don't know whether it's
>> something small I can address in this CF or whether it will need
>> to be in the next CF, but I will fix it.
> 
> Any news on this front?

On a preliminary look, I think that it's because when the new heap
is visible, it has been populated with rows containing an xmin too
new to be visible to the transaction which was blocked. I'll see if
I can't populate the new heap with tuples which are already frozen
and hinted, which will not only fix this bug, but improve
performance.

>>> I'll get back when I manage to get a better grasp of the code.
> 
> The code looks relatively straightforward and good to my eyes. It
> passes my testing and looks to be changing all the necessary
> parts of the code.

Thanks. I'll put together a new version of the patch based on
feedback so far. I need to finish my testing of the fklocks patch
and post a review first, so it will be a few days.

>> Keep in mind that the current behavior of behaving like a
>> regular view when the contents are invalid is not what I had in
>> mind, that was an accidental effect of commenting out the body
>> of the ExecCheckRelationsValid() function right before posting
>> the patch because I noticed a regression. When I noticed current
>> behavior, it struck me that someone might prefer it to the
>> intended behavior of showing an error like this:
>>
>> ereport(ERROR,
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> errmsg("materialized view \"%s\" has not been populated",
>> get_rel_name(rte->relid)),
>> errhint("Use the LOAD MATERIALIZED VIEW command.")));
>>
>> I mention it in case someone wants to argue for silently
>> behaving as a regular view when the MV is not populated.
> 
> FWIW, I'd prefer an error in this case, but I don't feel strongly
> about it.

Me, too. Since nobody else has spoken up, that's what I'll do.

I haven't heard anyone argue for keeping temporary materialized
views, so those will be dropped from the next version of the patch.

It sounds like most people prefer REFRESH to LOAD. If there is no
further discussion of that, I will make that change in the next
version of the patch; so if you want to argue against adding a new
unreserved keyword for that, now is the time to say so. Also,
should it be just?:

REFRESH matview_name;

or?:

REFRESH MATERIALIZED VIEW matview_name;

I will look at heap_inplace_update() where appropriate for the
pg_class changes.

On the issue of UHLOGGED MVs, I think it's pretty clear that the
right way to handle this is to have something in the init fork of
an unlogged MV that indicates that it is in an invalid state, which
is checked when the MV is opened. I'm not sure of the best way to
store that, but my first instinct would be to put it inot the
"special space" of the initial page, which would then be removed
after forcing the relisvalid flag to false. That feels pretty
kludgy, but its the best idea I've had so far. Anyone else want to
suggest a better way?

I will add regression tests in the next version of the patch.

While I agree that tracking one or more timestamps related to MV
"freshness" is a good idea, that seems like material for a
follow-on patch.

I don't agree that never having been loaded is a form of "stale",
in spite of multiple people whom I hold in high regard expressing
that opinion, so barring strong opposition I intend to keep the
relisvalid column. I just don't feel comfortable about the
fundamental correctness of the feature without it. If I'm reading
things correctly, so far the opposition has been based on being
lukewarm on the value of the column and wanting other features
which might be harder with this column (i.e., UNLOGGED) or which
are seen as alternatives to this column (i.e., a "freshness"
timestamp) rather than actual opposition to throwing an error on an
attempt to use an MV which has not been loaded with data.

I will fix the commented-out test of relisvalid.

I still haven't quite figured out what to do about the case Thom
uncovered where the MV's SELECT was just selecting a literal. I
haven't established what the most general case of that is, nor what
a sensible behavior is. Any opinions or insights welcome.

I didn't see any rebuttal to Tom's concerns about adding
asymetrical COPY support, so I will leave that part as is.

I will try to fill out the dependencies in pg_dump so that MVs
which depend on other MVs will be dumped in the right order.

I think that will probably be enough for the next version.

-Kevin



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

 - pg_dump sorting for MVs which depend on other MVs
 - proper handling of the relisvalid flag for unlogged MVs after recovery

I've been hacking at the code in those areas without success;
what's here is the least broken form I have, but work is still
needed for these cases. Any other problems are news to me.

In addition, the docs need another pass, and there is an open
question about what is the right thing to use for TRUNCATE syntax.

-Kevin

Attachment

Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
[resending with patch compressed, since original post didn't make
it through to the list]

Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

 - pg_dump sorting for MVs which depend on other MVs
 - proper handling of the relisvalid flag for unlogged MVs after recovery

I've been hacking at the code in those areas without success;
what's here is the least broken form I have, but work is still
needed for these cases. Any other problems are news to me.

In addition, the docs need another pass, and there is an open
question about what is the right thing to use for TRUNCATE syntax.

-Kevin

Attachment

Re: Materialized views WIP patch

From
Tom Lane
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
> I've been struggling with two areas:
>  - pg_dump sorting for MVs which depend on other MVs

Surely that should fall out automatically given that the dependency is
properly expressed in pg_depend?

If you mean you're trying to get it to cope with circular dependencies
between MVs, it might take some work on the pg_dump side, but plain
ordering shouldn't require new code.
        regards, tom lane



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Tom Lane wrote:
> "Kevin Grittner" <kgrittn@mail.com> writes:
>> I've been struggling with two areas:
>> - pg_dump sorting for MVs which depend on other MVs
> 
> Surely that should fall out automatically given that the
> dependency is properly expressed in pg_depend?
> 
> If you mean you're trying to get it to cope with circular
> dependencies between MVs, it might take some work on the pg_dump
> side, but plain ordering shouldn't require new code.

The *definitions* sort properly, but what I'm trying to do is
define them WITH NO DATA and load data after all the COPY
statements for tables. If mva is referenced by mvb, the goal is the
REFRESH mva, build its indexes before running REFRESH for mvb and
building its indexes. To do things in any other order does't seem
to me to leave things after restore in the same state they were in
at the time of the dump.

So I should have been a little more verbose describing the problem:
pg_dump sorting of REFRESH and CREATE INDEX steps for MVs which
depend on other MVs.

Last night I found why my previous attempts had been failing -- I
was trying to build the dependencies at the wrong point in the dump
process, after the sorts had already been done.  Now that I've
spotted that fundamental flaw, I think I can get this out of the
way without too much more fanfare. I kept thinking I had something
wrong in the detail of my approach, while the problem was at a much
higher level.

Where I really need someone to hit me upside the head with a
clue-stick is the code I added to the bottom of RelationBuildDesc()
in relcache.c. The idea is that on first access to an unlogged MV,
to detect that the heap has been replaced by the init fork, set
relisvalid to false, and make the heap look normal again. I
couldn't see any way to do that which wasn't a kludge, and I can't
figure out how to deal with relcache properly in implementing that
kludge. Either a tip about the right way to work the kludge, or a
suggestion for a less kludgy alternative would be welcome.

-Kevin



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 16 January 2013 05:40, Kevin Grittner <kgrittn@mail.com> wrote:
Here is a new version of the patch, with most issues discussed in
previous posts fixed.

I've been struggling with two areas:

 - pg_dump sorting for MVs which depend on other MVs
 - proper handling of the relisvalid flag for unlogged MVs after recovery

Some weirdness:

postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
                  Materialized view "public.mv_test2"
  Column  |  Type   | Modifiers | Storage | Stats target | Description 
----------+---------+-----------+---------+--------------+-------------
 moo      | integer |           | plain   |              | 
 ?column? | integer |           | plain   |              | 
View definition:
 SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Has OIDs: no

The "weirdness" I refer you to is the view definition.  This does not occur with a straightforward UNION.

This does not occur with a regular view:

postgres=# CREATE VIEW v_test3 AS SELECT moo, 2*moo FROM v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
CREATE VIEW
postgres=# \d+ v_test3
                 View "public.v_test3"
  Column  |  Type   | Modifiers | Storage | Description 
----------+---------+-----------+---------+-------------
 moo      | integer |           | plain   | 
 ?column? | integer |           | plain   | 
View definition:
         SELECT v_test2.moo, 2 * v_test2.moo
           FROM v_test2
UNION ALL 
         SELECT v_test2.moo, 3 * v_test2.moo
           FROM v_test2;

--
Thom

Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Thom Brown wrote:

> Some weirdness:
> 
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
>  Materialized view "public.mv_test2"
>  Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
>  moo | integer | | plain | |
>  ?column? | integer | | plain | |
> View definition:
>  SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Thanks,

-Kevin



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:
Thom Brown wrote:

> Some weirdness:
>
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
>  Materialized view "public.mv_test2"
>  Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
>  moo | integer | | plain | |
>  ?column? | integer | | plain | |
> View definition:
>  SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Yes, nothing wrong with using the MV, or refreshing it:

postgres=# TABLE mv_test2;
 moo | ?column? 
-----+----------
   1 |        2
   1 |        3
(2 rows)

postgres=# SELECT * FROM mv_test2;
 moo | ?column? 
-----+----------
   1 |        2
   1 |        3
(2 rows)

postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW

But a pg_dump of the MV has the same issue as the view definition:

--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom; Tablespace: 
--

CREATE MATERIALIZED VIEW mv_test2 (
    moo,
    "?column?"
) AS
    SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
  WITH NO DATA;
 
--
Thom

Re: Materialized views WIP patch

From
Tom Lane
Date:
"Kevin Grittner" <kgrittn@mail.com> writes:
> Tom Lane wrote:
>> Surely that should fall out automatically given that the
>> dependency is properly expressed in pg_depend?

> The *definitions* sort properly, but what I'm trying to do is
> define them WITH NO DATA and load data after all the COPY
> statements for tables. If mva is referenced by mvb, the goal is the
> REFRESH mva, build its indexes before running REFRESH for mvb and
> building its indexes. To do things in any other order does't seem
> to me to leave things after restore in the same state they were in
> at the time of the dump.

Ah.  Can't you treat this using the same pg_dump infrastructure as
for the data for an ordinary table?  The dependencies made for the
TableDataInfo object might be a bit different, but after that it
seems like the sort logic ought to be happy.

> Where I really need someone to hit me upside the head with a
> clue-stick is the code I added to the bottom of RelationBuildDesc()
> in relcache.c. The idea is that on first access to an unlogged MV,
> to detect that the heap has been replaced by the init fork, set
> relisvalid to false, and make the heap look normal again.

Hmm.  I agree that relcache.c has absolutely no business doing that,
but not sure what else to do instead.  Seems like it might be better
done at first touch of the MV in the parser, rewriter, or planner ---
but the fact that I can't immediately decide which of those is right
makes me feel that it's still too squishy.

I'm also wondering about locking issues there.  Obviously you don't
want more than one backend trying to rebuild the MV.

Do we really need unlogged MVs in the first iteration?  Seems like
that's adding a whole bunch of new issues, when you have quite enough
already without that.
        regards, tom lane



Re: Materialized views WIP patch

From
Josh Berkus
Date:
> Do we really need unlogged MVs in the first iteration?  Seems like
> that's adding a whole bunch of new issues, when you have quite enough
> already without that.

While I think there is strong user demand for unlogged MVs, if we can
get MVs without unlogged ones for 9.3, I say go for that.  We'll add
unlogged in 9.4.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Where I really need someone to hit me upside the head with a
>> clue-stick is the code I added to the bottom of RelationBuildDesc()
>> in relcache.c. The idea is that on first access to an unlogged MV,
>> to detect that the heap has been replaced by the init fork, set
>> relisvalid to false, and make the heap look normal again.
>
> Hmm.  I agree that relcache.c has absolutely no business doing that,
> but not sure what else to do instead.  Seems like it might be better
> done at first touch of the MV in the parser, rewriter, or planner ---
> but the fact that I can't immediately decide which of those is right
> makes me feel that it's still too squishy.

I think we shouldn't be doing that at all.  The whole business of
transferring the relation-is-invalid information from the relation to
a pg_class flag seems like a Rube Goldberg device to me.  I'm still
not convinced that we should have a relation-is-invalid flag at all,
but can we at least not have two?

It seems perfectly adequate to detect that the MV is invalid when we
actually try to execute a plan - that is, when we first access the
heap or one of its indexes.  So the bit can just live in the
file-on-disk, and there's no need to have a second copy of it in
pg_class.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 16 January 2013 17:25, Thom Brown <thom@linux.com> wrote:
On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:
Thom Brown wrote:

> Some weirdness:
>
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
>  Materialized view "public.mv_test2"
>  Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
>  moo | integer | | plain | |
>  ?column? | integer | | plain | |
> View definition:
>  SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Yes, nothing wrong with using the MV, or refreshing it:

postgres=# TABLE mv_test2;
 moo | ?column? 
-----+----------
   1 |        2
   1 |        3
(2 rows)

postgres=# SELECT * FROM mv_test2;
 moo | ?column? 
-----+----------
   1 |        2
   1 |        3
(2 rows)

postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW

But a pg_dump of the MV has the same issue as the view definition:

--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom; Tablespace: 
--

CREATE MATERIALIZED VIEW mv_test2 (
    moo,
    "?column?"
) AS
    SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
  WITH NO DATA;

A separate issue is with psql tab-completion:

postgres=# COMMENT ON MATERIALIZED VIEW ^IIS

This should be offering MV names instead of prematurely providing the "IS" keyword.

--
Thom

Re: Materialized views WIP patch

From
Simon Riggs
Date:
On 16 January 2013 05:40, Kevin Grittner <kgrittn@mail.com> wrote:

> Here is a new version of the patch, with most issues discussed in
> previous posts fixed.

Looks good.

The patch implements one kind of MV. In the future, we hope to have
other features or other kinds of MV alongside this:
* Snapshot MV - built once at start and then refreshed by explicit command only
* Snapshot MV with fast refresh
* Maintained MV (lazy) - changes trickle continuously into lazy MVs
* Maintained MV (transactional) - changes applied as part of write transactions
and or others

So I think we should agree now some aspects of those other options so
we can decide syntax. Otherwise we'll be left in the situation that
what we implement in 9.3 becomes the default for all time and/or we
have difficulties adding things later. e.g.
REFRESH ON COMMAND

Also, since there is no optimizer linkage between these MVs and the
tables they cover, I think we need to have that explicitly as a
command option, e.g.
DISABLE OPTIMIZATION

That way in the future we can implement "ENABLE OPTIMIZATION" mode and
REFRESH TRANSACTIONAL mode as separate items.

So all I am requesting is that we add additional syntax now, so that
future additional features are clear.

Please suggest syntax, not wedded to those... and we may want to use
more compatible syntax also.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 17 January 2013 16:03, Thom Brown <thom@linux.com> wrote:
On 16 January 2013 17:25, Thom Brown <thom@linux.com> wrote:
On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:
Thom Brown wrote:

> Some weirdness:
>
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
>  Materialized view "public.mv_test2"
>  Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
>  moo | integer | | plain | |
>  ?column? | integer | | plain | |
> View definition:
>  SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";

You are very good at coming up with these, Thom!

Will investigate.

Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?

Yes, nothing wrong with using the MV, or refreshing it:

postgres=# TABLE mv_test2;
 moo | ?column? 
-----+----------
   1 |        2
   1 |        3
(2 rows)

postgres=# SELECT * FROM mv_test2;
 moo | ?column? 
-----+----------
   1 |        2
   1 |        3
(2 rows)

postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW

But a pg_dump of the MV has the same issue as the view definition:

--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom; Tablespace: 
--

CREATE MATERIALIZED VIEW mv_test2 (
    moo,
    "?column?"
) AS
    SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
  WITH NO DATA;

A separate issue is with psql tab-completion:

postgres=# COMMENT ON MATERIALIZED VIEW ^IIS

This should be offering MV names instead of prematurely providing the "IS" keyword.

Also in doc/src/sgml/ref/alter_materialized_view.sgml:

s/materailized/materialized/


In src/backend/executor/execMain.c:

s/referrenced/referenced/
 
--
Thom

Re: Materialized views WIP patch

From
Noah Misch
Date:
On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote:
> On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Where I really need someone to hit me upside the head with a
> >> clue-stick is the code I added to the bottom of RelationBuildDesc()
> >> in relcache.c. The idea is that on first access to an unlogged MV,
> >> to detect that the heap has been replaced by the init fork, set
> >> relisvalid to false, and make the heap look normal again.
> >
> > Hmm.  I agree that relcache.c has absolutely no business doing that,
> > but not sure what else to do instead.  Seems like it might be better
> > done at first touch of the MV in the parser, rewriter, or planner ---
> > but the fact that I can't immediately decide which of those is right
> > makes me feel that it's still too squishy.
> 
> I think we shouldn't be doing that at all.  The whole business of
> transferring the relation-is-invalid information from the relation to
> a pg_class flag seems like a Rube Goldberg device to me.  I'm still
> not convinced that we should have a relation-is-invalid flag at all,
> but can we at least not have two?
> 
> It seems perfectly adequate to detect that the MV is invalid when we
> actually try to execute a plan - that is, when we first access the
> heap or one of its indexes.  So the bit can just live in the
> file-on-disk, and there's no need to have a second copy of it in
> pg_class.

Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
genuinely yielded the empty set at last refresh.  I agree that there's no
particular need to store that fact in pg_class, and I would much prefer only
storing it one way in any case.  A user-visible disadvantage of the current
implementation is that relisvalid remains stale until something opens the rel.
That's fine for the system itself, but it can deceive user-initiated catalog
queries.  Imagine a check_postgres action that looks for invalid MVs to
complain about.  It couldn't just scan pg_class; it would need to first do
something that opens every MV.

I suggest the following:

1. Let an invalid MV have a zero-length heap.  Distinguish a valid, empty MV  by giving it a page with no tuples.  This
entailsVACUUM[1] not truncating  MVs below one page and the refresh operation, where necessary, extending  the relation
fromzero pages to one.
 
2. Remove pg_class.relisvalid.
3. Add a bool field to RelationData.  The word "valid" is used in that context  to refer to the validity of the
structureitself, so perhaps call the new  field rd_scannable.  RelationIsFlaggedAsValid() can become a macro;  consider
changingits name for consistency with the field name.
 
4. During relcache build, set the field to "RelationGetNumberBlocks(rel) != 0"  for MVs, fixed "true" for everyone
else. Any operation that changes the  field must, and probably would anyway, instigate a relcache invalidation.
 
5. Expose a database function, say pg_relation_scannable(), for querying the  current state of a relation.  This
supportsuser-level monitoring.
 

Does that seem reasonable?  One semantic difference to keep in mind is that
unlogged MVs will be considered invalid on the standby while valid on the
master.  That's essentially an accurate report, so I won't mind it.

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery.  MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode.  When a crash occurs with no refresh in flight,
the latest refresh had been safely synced.  When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt.  However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here.  It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

I'm going to follow this with a review covering a broader range of topics.

Thanks,
nm

[1] For the time being, it's unfortunate to VACUUM materialized views at all;
they only ever bear frozen tuples.



Re: Materialized views WIP patch

From
Noah Misch
Date:
Hi Kevin,

The patch conflicts with git master; I tested against master@{2013-01-20}.

On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote:
> I've been struggling with two areas:
> 
>  - pg_dump sorting for MVs which depend on other MVs

From your later messages, I understand that you have a way forward on this.

>  - proper handling of the relisvalid flag for unlogged MVs after recovery

I have discussed this in a separate email.  While reading the patch to assess
that topic, I found a few more things:

> *** a/contrib/pg_upgrade/version_old_8_3.c
> --- b/contrib/pg_upgrade/version_old_8_3.c
> ***************
> *** 145,151 **** old_8_3_check_for_tsquery_usage(ClusterInfo *cluster)
>                                   "FROM    pg_catalog.pg_class c, "
>                                   "        pg_catalog.pg_namespace n, "
>                                   "        pg_catalog.pg_attribute a "
> !                                 "WHERE    c.relkind = 'r' AND "
>                                   "        c.oid = a.attrelid AND "
>                                   "        NOT a.attisdropped AND "
>                                   "        a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "
> --- 145,151 ----
>                                   "FROM    pg_catalog.pg_class c, "
>                                   "        pg_catalog.pg_namespace n, "
>                                   "        pg_catalog.pg_attribute a "
> !                                 "WHERE    c.relkind in ('r', 'm') AND "
>                                   "        c.oid = a.attrelid AND "
>                                   "        NOT a.attisdropped AND "
>                                   "        a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
matter whether this change happens or not.  I suggest adding a comment,
whether or not you keep the code change.

> *** a/contrib/sepgsql/sepgsql.h
> --- b/contrib/sepgsql/sepgsql.h
> ***************
> *** 32,37 ****
> --- 32,39 ----
>   
>   /*
>    * Internally used code of object classes
> +  *
> +  * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control.  Unless you've
determined that this is correct within the sepgsql security model, I suggest
starting with a draconian policy, like simply crippling MVs.  Even if you have
determined that, separating out the nontrivial sepgsql support might be good.
The set of ideal reviewers is quite different.

>    */
>   #define SEPG_CLASS_PROCESS            0
>   #define SEPG_CLASS_FILE                1
> *** a/contrib/vacuumlo/vacuumlo.c
> --- b/contrib/vacuumlo/vacuumlo.c
> ***************
> *** 209,215 **** vacuumlo(const char *database, const struct _param * param)
>       strcat(buf, "      AND a.atttypid = t.oid ");
>       strcat(buf, "      AND c.relnamespace = s.oid ");
>       strcat(buf, "      AND t.typname in ('oid', 'lo') ");
> !     strcat(buf, "      AND c.relkind = 'r'");
>       strcat(buf, "      AND s.nspname !~ '^pg_'");
>       res = PQexec(conn, buf);
>       if (PQresultStatus(res) != PGRES_TUPLES_OK)
> --- 209,215 ----
>       strcat(buf, "      AND a.atttypid = t.oid ");
>       strcat(buf, "      AND c.relnamespace = s.oid ");
>       strcat(buf, "      AND t.typname in ('oid', 'lo') ");
> !     strcat(buf, "      AND c.relkind in ('r', 'm')");

It concerns me slightly that older vacuumlo could silently remove large
objects still referenced by MVs.  Only slightly, though, because the next MV
refresh would remove those references anyway.  Is there anything we should do
to help that situation?  If nothing else, perhaps backpatch this patch hunk.

> +    <varlistentry>
> +     <term><literal>WITH OIDS</></term>
> +     <term><literal>WITHOUT OIDS</></term>
> +     <listitem>
> +      <para>
> +       These are obsolescent syntaxes equivalent to <literal>WITH (OIDS)</>
> +       and <literal>WITH (OIDS=FALSE)</>, respectively.  If you wish to give
> +       both an <literal>OIDS</> setting and storage parameters, you must use
> +       the <literal>WITH ( ... )</> syntax; see above.
> +      </para>
> +     </listitem>
> +    </varlistentry>

Let's not support OIDs on MVs.  They'll be regenerated on every refresh.

> ***************
> *** 336,342 **** ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
>    */
>   void
>   ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
> !                   const char *queryString, ParamListInfo params)
>   {
>       if (utilityStmt == NULL)
>           return;
> --- 338,345 ----
>    */
>   void
>   ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
> !                   const char *queryString, DestReceiver *dest,
> !                   ParamListInfo params)
>   {
>       if (utilityStmt == NULL)
>           return;
> ***************
> *** 349,361 **** ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
>            * contained parsetree another time, but let's be safe.
>            */
>           CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
> !         List       *rewritten;
>   
>           Assert(IsA(ctas->query, Query));
> !         rewritten = QueryRewrite((Query *) copyObject(ctas->query));
> !         Assert(list_length(rewritten) == 1);
> !         ExplainOneQuery((Query *) linitial(rewritten), ctas->into, es,
> !                         queryString, params);
>       }
>       else if (IsA(utilityStmt, ExecuteStmt))
>           ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es,
> --- 352,366 ----
>            * contained parsetree another time, but let's be safe.
>            */
>           CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
> !         Query       *query = (Query *) ctas->query;
> ! 
> !         dest = CreateIntoRelDestReceiver(into);
>   
>           Assert(IsA(ctas->query, Query));
> ! 
> !         query = SetupForCreateTableAs(query, ctas->into, queryString, params, dest);
> ! 
> !         ExplainOneQuery(query, ctas->into, es, queryString, dest, params);
>       }
>       else if (IsA(utilityStmt, ExecuteStmt))
>           ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es,

If I'm reading this right, you always overwrite the passed-in dest without
looking at it.  What's the intent here?

> +     /*
> +      * Kludge here to allow refresh of a materialized view which is invalid
> +      * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
> +      * first two RangeTblEntry list elements, which were added to the front
> +      * of the rewritten Query to keep the rules system happy, with the
> +      * isResultRel flag to indicate that it is OK if they are flagged as
> +      * invalid.
> +      */
> +     rtable = dataQuery->rtable;
> +     ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
> +     ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;

Is it safe to assume that the first two RTEs are the correct ones to flag?

> +     /*
> +      * Swap the physical files of the target and transient tables, then
> +      * rebuild the target's indexes and throw away the transient table.
> +      */
> +     finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin);

The check_constraints argument should be "true", because the refresh could
have invalidated a UNIQUE index.

> ***************
> *** 3049,3055 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
>               break;
>           case AT_ClusterOn:        /* CLUSTER ON */
>           case AT_DropCluster:    /* SET WITHOUT CLUSTER */
> !             ATSimplePermissions(rel, ATT_TABLE);
>               /* These commands never recurse */
>               /* No command-specific prep needed */
>               pass = AT_PASS_MISC;
> --- 3104,3110 ----
>               break;
>           case AT_ClusterOn:        /* CLUSTER ON */
>           case AT_DropCluster:    /* SET WITHOUT CLUSTER */
> !             ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW);

If the user desires an actually-clustered MV, he must re-CLUSTER it after each
refresh.  That deserves a documentation mention.

> ***************
> *** 724,729 **** InitPlan(QueryDesc *queryDesc, int eflags)
> --- 765,775 ----
>       ExecCheckRTPerms(rangeTable, true);
>   
>       /*
> +      * Ensure that all referrenced relations are flagged as valid.

Typo.

> +      */
> +     ExecCheckRelationsValid(rangeTable);

I believe this ought to happen after the executor lock acquisitions, perhaps
right in ExecOpenScanRelation().  Since you'll then have an open Relation,
RelationIsFlaggedAsValid() can use the relcache.

> ***************
> *** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
> --- 1592,1607 ----
>           rel = heap_open(rte->relid, NoLock);
>   
>           /*
> +          * Skip materialized view expansion when resultRelation is set.
> +          */
> +         if (rel->rd_rel->relkind == RELKIND_MATVIEW &&
> +             rel->rd_rel->relisvalid)
> +         {
> +             heap_close(rel, NoLock);
> +             break;
> +         }

Would you elaborate on this?

> +     /* Strip off the trailing semicolon so that other things may follow. */
> +     appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);

I suggest verifying that the last character is indeed a semicolon.

>   /*
> +  * dumpMatViewIndex
> +  *      write out to fout a user-defined index
> +  */
> + static void
> + dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo)

This is so similar to dumpIndex(); can we avoid this level of duplication?

> *** /dev/null
> --- b/src/test/regress/sql/matview.sql

> + -- test diemv when the mv does exist
> + DROP MATERIALIZED VIEW IF EXISTS tum;
> + 
> + -- make sure that dependencies are reported properly when they block the drop
> + DROP TABLE t;
> + 
> + -- make sure dependencies are dropped and reported
> + DROP TABLE t CASCADE;

Please retain an interesting sample of materialized views in the regression
database.  Among other benefits, the pg_upgrade test suite exercises pg_dump
and pg_upgrade for all object types retained in the regression database.

The regression tests should probably include a few other wrinkles, like an
index on a MV.


Creating a RULE on an MV succeeds, but refreshing the view then fails:

[local] test=# create rule mvrule as on insert to mymv where 1 = 0 do also select 1;
CREATE RULE
[local] test=# REFRESH MATERIALIZED VIEW mymv;
ERROR:  materialized view "mymv" has too many rules


The documentation is a good start.  I would expect a brief introduction in
Tutorial -> Advanced Features and possibly a deeper discussion under The SQL
Language.  I suggest updating Explicit Locking to mention the new commands;
users will be interested in the lock level of a refresh.

You have chosen to make pg_dump preserve the valid-or-invalid state of each
MV.  That seems reasonable, though I'm slightly concerned about the case of a
dump taken from a standby.


We support ALTER TABLE against regular views for historical reasons.  When we
added foreign tables, we did not extend that permissiveness; one can only use
ALTER FOREIGN TABLE on foreign tables.  Please do the same for materialized
views.  See RangeVarCallbackForAlterRelation().  Note that "ALTER TABLE
... RENAME colname TO newname" and "ALTER TABLE ... RENAME CONSTRAINT" are
currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW.

There's no documented support for table constraints on MVs, but UNIQUE
constraints are permitted:

[local] test=# alter materialized view mymv add unique (c);
ALTER MATERIALIZED VIEW
[local] test=# alter materialized view mymv add check (c > 0);
ERROR:  "mymv" is not a table
[local] test=# alter materialized view mymv add primary key (c);
ERROR:  "mymv" is not a table or foreign table

Some of the ALTER TABLE variants would make plenty of sense for MVs:
   ALTER [ COLUMN ] column_name SET STATISTICS integer   ALTER [ COLUMN ] column_name SET ( attribute_option = value [,
...] )   ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )   ALTER [ COLUMN ] column_name SET STORAGE {
PLAIN| EXTERNAL | EXTENDED | MAIN }
 

It wouldn't be a problem to skip those for the first patch, though.
Conversely, this syntax is accepted:
   ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )

But there are no available options.  The only option accepted for regular
views, security_barrier, is rejected.  MVs always have security_barrier
semantics, in any event.

Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
determine which ones make sense for MVs.  For each one in the sensible set,
either allow it or add a comment indicating that it could reasonably be
allowed in the future.  For each one outside the set, forbid it.  Verify that
the documentation, the results of your evaluation, and the actual allowed
operations are all consistent.

Thanks,
nm



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Thanks for looking at this!

Noah Misch wrote:

> For the benefit of the archives, I note that we almost need not truncate an
> unlogged materialized view during crash recovery. MVs are refreshed in a
> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
> pg_class to that relfilenode. When a crash occurs with no refresh in flight,
> the latest refresh had been safely synced. When a crash cuts short a refresh,
> the pg_class update will not stick, and the durability of the old heap is not
> in doubt. However, non-btree index builds don't have the same property; we
> would need to force an immediate sync of the indexes to be safe here. It
> would remain necessary to truncate unlogged MVs when recovering a base backup,
> which may contain a partially-written refresh that did eventually commit.
> Future MV variants that modify the MV in place would also need the usual
> truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be
punted to a future release where we start adding more incremental
updates to them. I'll think on that, but on the face of it, it
sounds like the best choice.

> I'm going to follow this with a review covering a broader range
> of topics.

I'll need time to digest the rest of it. As you note, recent
commits conflict with the last patch. Please look at the github
repo where I've been working on this. I'll post an updated patch
later today.

https://github.com/kgrittn/postgres/tree/matview

You might want to ignore the interim work on detecting the new
pg_dump dependencies through walking the internal structures. I
decided that was heading in a direction which might be
unnecessarily fragile and slow; so I tried writing it as a query
against the system tables. I'm pretty happy with the results.
Here's the query:

with recursive w as
(
select   d1.objid,   d1.objid as wrkid,   d2.refobjid,   c2.relkind as refrelkind from pg_depend d1 join pg_class c1 on
c1.oid= d1.objid                 and c1.relkind = 'm'                 and c1.relisvalid join pg_rewrite r1 on
r1.ev_class= d1.objid join pg_depend d2 on d2.classid = 'pg_rewrite'::regclass                  and d2.objid = r1.oid 
               and d2.refobjid <> d1.objid join pg_class c2 on c2.oid = d2.refobjid                 and c2.relkind in
('m','v')                and c2.relisvalid where d1.classid = 'pg_class'::regclass
 
union
select   w.objid,   w.refobjid as wrkid,   d3.refobjid,   c3.relkind as refrelkind from w join pg_rewrite r3 on
r3.ev_class= w.refobjid join pg_depend d3 on d3.classid = 'pg_rewrite'::regclass                  and d3.objid =
r3.oid                 and d3.refobjid <> w.refobjid join pg_class c3 on c3.oid = d3.refobjid                 and
c3.relkindin ('m','v')                 and c3.relisvalid where w.refrelkind <> 'm'
 
),
x as
(
select objid::regclass, refobjid::regclass from w where refrelkind = 'm'
)
select 'm'::text as type, x.objid, x.refobjid from x
union all
select   'i'::text as type,   x.objid,   i.indexrelid as refobjid from x join pg_index i on i.indrelid = x.refobjid   
           and i.indisvalid
 
;

If we bail on having pg_class.relisvalid, then it will obviously
need adjustment.

-Kevin



Re: Materialized views WIP patch

From
Noah Misch
Date:
On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
> > For the benefit of the archives, I note that we almost need not truncate an
> > unlogged materialized view during crash recovery. MVs are refreshed in a
> > VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
> > pg_class to that relfilenode. When a crash occurs with no refresh in flight,
> > the latest refresh had been safely synced. When a crash cuts short a refresh,
> > the pg_class update will not stick, and the durability of the old heap is not
> > in doubt. However, non-btree index builds don't have the same property; we
> > would need to force an immediate sync of the indexes to be safe here. It
> > would remain necessary to truncate unlogged MVs when recovering a base backup,
> > which may contain a partially-written refresh that did eventually commit.
> > Future MV variants that modify the MV in place would also need the usual
> > truncate on crash.
> 
> Hmm. That's a very good observation. Perhaps the issue can be
> punted to a future release where we start adding more incremental
> updates to them. I'll think on that, but on the face of it, it
> sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard
to implement for unlogged relations.  The startup process doesn't know the
relkind of the unlogged-relation relfilenodes it cleans.  If you can work
through all that, it's certainly a nice endpoint to not lose unlogged snapshot
MVs on crash.  But I intended the first half of my message as the
recommendation and the above as a wish for the future.

> You might want to ignore the interim work on detecting the new
> pg_dump dependencies through walking the internal structures. I
> decided that was heading in a direction which might be
> unnecessarily fragile and slow; so I tried writing it as a query
> against the system tables. I'm pretty happy with the results.
> Here's the query:
> 
> with recursive w as
[snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so
different from existing pg_dump dependency problems?

> If we bail on having pg_class.relisvalid, then it will obviously
> need adjustment.

Even if we don't have the column, we can have the fact of an MV's validity
SQL-visible in some other way.



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Noah Misch wrote:
> On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:
>> Noah Misch wrote:
>>> For the benefit of the archives, I note that we almost need not truncate an
>>> unlogged materialized view during crash recovery. MVs are refreshed in a
>>> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
>>> pg_class to that relfilenode. When a crash occurs with no refresh in flight,
>>> the latest refresh had been safely synced. When a crash cuts short a refresh,
>>> the pg_class update will not stick, and the durability of the old heap is not
>>> in doubt. However, non-btree index builds don't have the same property; we
>>> would need to force an immediate sync of the indexes to be safe here. It
>>> would remain necessary to truncate unlogged MVs when recovering a base backup,
>>> which may contain a partially-written refresh that did eventually commit.
>>> Future MV variants that modify the MV in place would also need the usual
>>> truncate on crash.
>> 
>> Hmm. That's a very good observation. Perhaps the issue can be
>> punted to a future release where we start adding more incremental
>> updates to them. I'll think on that, but on the face of it, it
>> sounds like the best choice.
> 
> That situation is challenging for the same reason pg_class.relisvalid was hard
> to implement for unlogged relations. The startup process doesn't know the
> relkind of the unlogged-relation relfilenodes it cleans. If you can work
> through all that, it's certainly a nice endpoint to not lose unlogged snapshot
> MVs on crash. But I intended the first half of my message as the
> recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as
they were on recovery, aren't they? So for 9.3, that solves that
issue, I think. pg_class.relisvald is a separate issue.

>> You might want to ignore the interim work on detecting the new
>> pg_dump dependencies through walking the internal structures. I
>> decided that was heading in a direction which might be
>> unnecessarily fragile and slow; so I tried writing it as a query
>> against the system tables. I'm pretty happy with the results.
>> Here's the query:
>> 
>> with recursive w as
> [snip]
> 
> Why is the dependency problem of ordering MV refreshes and MV index builds so
> different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements
are all properly ordered, but you want mva populated and indexed
before you attempt to populate mvb. (Populated to get correct
results, indexed to get them quickly.) We don't have anything else
like that.

>> If we bail on having pg_class.relisvalid, then it will obviously
>> need adjustment.
> 
> Even if we don't have the column, we can have the fact of an MV's validity
> SQL-visible in some other way.

Sure, I didn't say we had to abandon the query -- probably just
replace the relisvalid tests with a function call using the oid of
the MV.

-Kevin



Re: Materialized views WIP patch

From
Noah Misch
Date:
On Thu, Jan 24, 2013 at 03:14:15PM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
> > On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:
> >> Noah Misch wrote:
> >>> For the benefit of the archives, I note that we almost need not truncate an
> >>> unlogged materialized view during crash recovery. MVs are refreshed in a
> >>> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
> >>> pg_class to that relfilenode. When a crash occurs with no refresh in flight,
> >>> the latest refresh had been safely synced. When a crash cuts short a refresh,
> >>> the pg_class update will not stick, and the durability of the old heap is not
> >>> in doubt. However, non-btree index builds don't have the same property; we
> >>> would need to force an immediate sync of the indexes to be safe here. It
> >>> would remain necessary to truncate unlogged MVs when recovering a base backup,
> >>> which may contain a partially-written refresh that did eventually commit.
> >>> Future MV variants that modify the MV in place would also need the usual
> >>> truncate on crash.
> >> 
> >> Hmm. That's a very good observation. Perhaps the issue can be
> >> punted to a future release where we start adding more incremental
> >> updates to them. I'll think on that, but on the face of it, it
> >> sounds like the best choice.
> > 
> > That situation is challenging for the same reason pg_class.relisvalid was hard
> > to implement for unlogged relations. The startup process doesn't know the
> > relkind of the unlogged-relation relfilenodes it cleans. If you can work
> > through all that, it's certainly a nice endpoint to not lose unlogged snapshot
> > MVs on crash. But I intended the first half of my message as the
> > recommendation and the above as a wish for the future.
> 
> Well, if I just don't create an init fork for MVs, they are left as
> they were on recovery, aren't they? So for 9.3, that solves that
> issue, I think. pg_class.relisvald is a separate issue.

The startup process just looks for init forks, yes.  But it's acceptable to
leave the unlogged MV materials alone during *crash* recovery only.  When
recovering from a base backup, we once again need an init fork to refresh the
unlogged-MV relations.  In turn, we would still need a relisvalid
implementation that copes.  This is all solvable, sure, but it looks like a
trip off into the weeds relative to the core aim of this patch.

> > Why is the dependency problem of ordering MV refreshes and MV index builds so
> > different from existing pg_dump dependency problems?
> 
> If mva has indexes and is referenced by mvb, the CREATE statements
> are all properly ordered, but you want mva populated and indexed
> before you attempt to populate mvb. (Populated to get correct
> results, indexed to get them quickly.) We don't have anything else
> like that.

Is the REFRESH order just a replay of the CREATE order (with index builds
interspersed), or can it differ?

nm



Re: Materialized views WIP patch

From
"Kevin Grittner"
Date:
Noah Misch wrote:

> The patch conflicts with git master; I tested against master@{2013-01-20}.

New patch rebased, fixes issues raised by Thom Brown, and addresses
some of your points.

> PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
> matter whether this change happens or not. I suggest adding a comment,
> whether or not you keep the code change.

Reverted code changes to version_old_8_3.c; added comments.

> > *** a/contrib/sepgsql/sepgsql.h
> > --- b/contrib/sepgsql/sepgsql.h
> > ***************
> > *** 32,37 ****
> > --- 32,39 ----
> >
> > /*
> > * Internally used code of object classes
> > + *
> > + * NOTE: Materialized views are treated as tables for now.
>
> This smells like a bypass of mandatory access control. Unless you've
> determined that this is correct within the sepgsql security model, I suggest
> starting with a draconian policy, like simply crippling MVs. Even if you have
> determined that, separating out the nontrivial sepgsql support might be good.
> The set of ideal reviewers is quite different.

Robert suggested this way of coping for now. Will post just the
sepgsql separately to try to attract the right crowd to confirm.

> It concerns me slightly that older vacuumlo could silently remove large
> objects still referenced by MVs. Only slightly, though, because the next MV
> refresh would remove those references anyway. Is there anything we should do
> to help that situation? If nothing else, perhaps backpatch this patch hunk.

Defensive backpatching of this code sounds like a good idea to me.
I'm open to other opinions on whether we need to defend 9.3 and
later against earler versions of vacuumlo being run against them.

> Let's not support OIDs on MVs. They'll be regenerated on every refresh.

Do they have any value for people who might want to use cursors? If
nobody speaks up for them, I will drop OID support for materialized
views.

> If I'm reading this right, you always overwrite the passed-in dest without
> looking at it. What's the intent here?

Let me get back to you on that one.

> > + /*
> > + * Kludge here to allow refresh of a materialized view which is invalid
> > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
> > + * first two RangeTblEntry list elements, which were added to the front
> > + * of the rewritten Query to keep the rules system happy, with the
> > + * isResultRel flag to indicate that it is OK if they are flagged as
> > + * invalid.
> > + */
> > + rtable = dataQuery->rtable;
> > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
>> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;
>
> Is it safe to assume that the first two RTEs are the correct ones to flag?

I'm trying to play along with UpdateRangeTableOfViewParse() in
view.c. See the comment in front of that function for details.

>> + finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin);
>
> The check_constraints argument should be "true", because the refresh could
> have invalidated a UNIQUE index.

Fixed.

> If the user desires an actually-clustered MV, he must re-CLUSTER it after each
> refresh. That deserves a documentation mention.

That point had not occurred to me. Let me see if I can fix that before changing docs.

> > + * Ensure that all referrenced relations are flagged as valid.
>
> Typo.

Fixed.

>> + ExecCheckRelationsValid(rangeTable);
>
> I believe this ought to happen after the executor lock acquisitions, perhaps
> right in ExecOpenScanRelation(). Since you'll then have an open Relation,
> RelationIsFlaggedAsValid() can use the relcache.

That would break MVs entirely. This probably deserves more
comments. It's a little fragile, but was the best way I found to
handle things. An MV has a rule associated with it, just like a
"regular" view, which is parse-analyzed but not rewritten or
planned. We need to allow the rewrite and planning for statements
which populate the view, but suppress expansion of the rule for
simple references. It is OK for an MV to be invalid if it is being
populated, but not if it is being referenced. Long story short,
this call helps determine which relations will be opened.

If someone can suggest a better alternative, I'll see what I can
do; otherwise I guess I should add comments around the key places.

>> ***************
>> *** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
>> --- 1592,1607 ----
>> rel = heap_open(rte->relid, NoLock);
>>
>> /*
>> + * Skip materialized view expansion when resultRelation is set.
>> + */
>> + if (rel->rd_rel->relkind == RELKIND_MATVIEW &&
>> + rel->rd_rel->relisvalid)
>> + {
>> + heap_close(rel, NoLock);
>> + break;
>> + }
>
> Would you elaborate on this?

It's diretly related to the point immediately preceding. At this
point we have thrown an error if the MV is invalid and being used
as a source of data. If it is the target of data it is flagged as
invalid so that it will not be expanded. Maybe we need a better way
to determine this, but I'm not sure just what to use.

>> + /* Strip off the trailing semicolon so that other things may follow. */
>> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);
>
> I suggest verifying that the last character is indeed a semicolon.

How about if I have it exit_horribly if the semicolon added 21
lines up has disappeared? Or use Assert if we have that for the
frontend now?

> > + static void
> > + dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo)
>
> This is so similar to dumpIndex(); can we avoid this level of duplication?

It is identical except for name. I can only assume that I thought I
needed a modified version and changed my mind. Removed.

> Please retain an interesting sample of materialized views in the regression
> database. Among other benefits, the pg_upgrade test suite exercises pg_dump
> and pg_upgrade for all object types retained in the regression database.

OK

> The regression tests should probably include a few other wrinkles, like an
> index on a MV.

Yeah. Will do.

> Creating a RULE on an MV succeeds, but refreshing the view then fails:

Fixed by prohibiting CREATE RULE on an MV.

> The documentation is a good start. I would expect a brief introduction in
> Tutorial -> Advanced Features and possibly a deeper discussion under The SQL
> Language. I suggest updating Explicit Locking to mention the new commands;
> users will be interested in the lock level of a refresh.

Yeah, the docs need another pass. It seemed prudent to make sure of
what I was documenting first.

> You have chosen to make pg_dump preserve the valid-or-invalid state of each
> MV. That seems reasonable, though I'm slightly concerned about the case of a
> dump taken from a standby.

I'm not clear on the problem. Could you explain?

> Some of the ALTER TABLE variants would make plenty of sense for MVs:
>
>  ALTER [ COLUMN ] column_name SET STATISTICS integer
>  ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
>  ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
>  ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
>
> It wouldn't be a problem to skip those for the first patch, though.
> Conversely, this syntax is accepted:
>
>  ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )
>
> But there are no available options. The only option accepted for regular
> views, security_barrier, is rejected. MVs always have security_barrier
> semantics, in any event.

I think those are doc problems, not implementation of the
functionality. Will double-check and fix where needed.

> Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
> determine which ones make sense for MVs. For each one in the sensible set,
> either allow it or add a comment indicating that it could reasonably be
> allowed in the future. For each one outside the set, forbid it. Verify that
> the documentation, the results of your evaluation, and the actual allowed
> operations are all consistent.

I have already tried to do that in the coding, although maybe you
think more comments are needed there? The docs definitely need to
catch up. This part isn't in flux, so I'll fix that part of the
docs in the next day or two.

Thanks for the review!

-Kevin

Attachment

Re: Materialized views WIP patch

From
Thom Brown
Date:
On 25 January 2013 06:00, Kevin Grittner <kgrittn@mail.com> wrote:
> Noah Misch wrote:
>
>> The patch conflicts with git master; I tested against master@{2013-01-20}.
>
> New patch rebased, fixes issues raised by Thom Brown, and addresses
> some of your points.

Thanks for the new version.  All previous issues I raised have been resolved.

I have an inconsistency to note between VIEWs and MATERIALIZED VIEWs:

CREATE VIEW v_test8 (meow, "?column?") AS SELECT 1 bark, 2;
CREATE MATERIALIZED VIEW mv_test8 (meow, "?column?") AS SELECT 1 bark, 2;

pg_dump output:

CREATE VIEW v_test8 AS   SELECT 1 AS meow, 2;

CREATE MATERIALIZED VIEW mv_test8 (   meow,   "?column?"
) AS   SELECT 1 AS meow, 2 WITH NO DATA;

The materialized view adds in column name parameters, whereas the
standard view doesn't.  But it seems to add column parameters
regardless:

CREATE VIEW v_test9 AS SELECT 1;
CREATE MATERIALIZED VIEW v_test9 AS SELECT 1;

CREATE VIEW v_test9 AS   SELECT 1;

CREATE MATERIALIZED VIEW mv_test9 (   "?column?"
) AS   SELECT 1 WITH NO DATA;

VIEWs never seem to use column parameters, MATERIALIZED VIEWs always
appear to use them.

--
Thom



Re: Materialized views WIP patch

From
Noah Misch
Date:
On Fri, Jan 25, 2013 at 01:00:59AM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
> > > *** a/contrib/sepgsql/sepgsql.h
> > > --- b/contrib/sepgsql/sepgsql.h
> > > ***************
> > > *** 32,37 ****
> > > --- 32,39 ----
> > > 
> > > /*
> > > * Internally used code of object classes
> > > + *
> > > + * NOTE: Materialized views are treated as tables for now.
> > 
> > This smells like a bypass of mandatory access control. Unless you've
> > determined that this is correct within the sepgsql security model, I suggest
> > starting with a draconian policy, like simply crippling MVs. Even if you have
> > determined that, separating out the nontrivial sepgsql support might be good.
> > The set of ideal reviewers is quite different.
> 
> Robert suggested this way of coping for now. Will post just the
> sepgsql separately to try to attract the right crowd to confirm.

Sounds good.

> > Let's not support OIDs on MVs. They'll be regenerated on every refresh.
> 
> Do they have any value for people who might want to use cursors?

Not that I have heard, for whatever that's worth.

> > > + /*
> > > + * Kludge here to allow refresh of a materialized view which is invalid
> > > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the
> > > + * first two RangeTblEntry list elements, which were added to the front
> > > + * of the rewritten Query to keep the rules system happy, with the
> > > + * isResultRel flag to indicate that it is OK if they are flagged as
> > > + * invalid.
> > > + */
> > > + rtable = dataQuery->rtable;
> > > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true;
> >> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true;
> > 
> > Is it safe to assume that the first two RTEs are the correct ones to flag?
> 
> I'm trying to play along with UpdateRangeTableOfViewParse() in
> view.c. See the comment in front of that function for details.

Ah.  Perhaps assert that those RTEs have the aliases "old" and "new"?

> >> + ExecCheckRelationsValid(rangeTable);
> > 
> > I believe this ought to happen after the executor lock acquisitions, perhaps
> > right in ExecOpenScanRelation(). Since you'll then have an open Relation,
> > RelationIsFlaggedAsValid() can use the relcache.
> 
> That would break MVs entirely. This probably deserves more
> comments. It's a little fragile, but was the best way I found to
> handle things. An MV has a rule associated with it, just like a
> "regular" view, which is parse-analyzed but not rewritten or
> planned. We need to allow the rewrite and planning for statements
> which populate the view, but suppress expansion of the rule for
> simple references. It is OK for an MV to be invalid if it is being
> populated, but not if it is being referenced. Long story short,
> this call helps determine which relations will be opened.
> 
> If someone can suggest a better alternative, I'll see what I can
> do; otherwise I guess I should add comments around the key places.

I see.  It seems wrong to check MV validity before the executor locks a table;
if the executor lock were in fact the first lock on the relation, then the
view could become invalid again before we lock it.  I don't know a way to
actually make the executor's lock be the first lock; some parser or planner
lock invariably seems to precede it.  Is that proper to rely on?

> >> + /* Strip off the trailing semicolon so that other things may follow. */
> >> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1);
> > 
> > I suggest verifying that the last character is indeed a semicolon.
> 
> How about if I have it exit_horribly if the semicolon added 21
> lines up has disappeared? Or use Assert if we have that for the
> frontend now?

The code 21 lines back is adding a semicolon to a query being sent to the
server to retrieve the view's definition.  Here you're removing the trailing
semicolon from a column of the server's response.  Granted, there's not much
reason we'd ever change the server to omit the trailing semicolon, and the
breakage would be relatively obvious even without an explicit test here.

> > You have chosen to make pg_dump preserve the valid-or-invalid state of each
> > MV. That seems reasonable, though I'm slightly concerned about the case of a
> > dump taken from a standby.
> 
> I'm not clear on the problem. Could you explain?

Currently none.  If you took my suggestion regarding relisvalid, then MVs
would be considered invalid on the standby.  That is one disadvantage of the
suggestion, perhaps.

> > Some of the ALTER TABLE variants would make plenty of sense for MVs:
> > 
> > ??ALTER [ COLUMN ] column_name SET STATISTICS integer
> > ??ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
> > ??ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
> > ??ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
> > 
> > It wouldn't be a problem to skip those for the first patch, though.
> > Conversely, this syntax is accepted:
> > 
> > ??ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] )
> > 
> > But there are no available options. The only option accepted for regular
> > views, security_barrier, is rejected. MVs always have security_barrier
> > semantics, in any event.
> 
> I think those are doc problems, not implementation of the
> functionality. Will double-check and fix where needed.

Looks so.

> > Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to
> > determine which ones make sense for MVs. For each one in the sensible set,
> > either allow it or add a comment indicating that it could reasonably be
> > allowed in the future. For each one outside the set, forbid it. Verify that
> > the documentation, the results of your evaluation, and the actual allowed
> > operations are all consistent.
> 
> I have already tried to do that in the coding, although maybe you
> think more comments are needed there? The docs definitely need to
> catch up. This part isn't in flux, so I'll fix that part of the
> docs in the next day or two.

I won't worry about such comments for now; it looks like you're targeting most
of the reasonable-to-support ALTER operations.  I just didn't realize that the
docs were out of date in this respect.


By the way, your mailer omits References: and In-Reply-To: headers, breaking
the thread.

nm



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 1/25/13 1:00 AM, Kevin Grittner wrote:
> New patch rebased, fixes issues raised by Thom Brown, and addresses
> some of your points.

This patch doesn't apply anymore, so I just took a superficial look.  I
think the intended functionality and the interfaces look pretty good.
Documentation looks complete, tests are there.

I have a couple of notes:

* What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED.
It might be better to use that as well then.

* You use fields named relkind in the parse nodes, but they don't
actually contain relkind values, which is confusing.  I'd just name the
field is_matview or something.

* More generally, I wouldn't be so fond of combining the parse handling
of CREATE TABLE AS and CREATE MATERIALIZED VIEW.  They are similar, but
then again so are a lot of other things.

* Some of the terminology is inconsistent.  A materialized view is
sometimes called valid, populated, or built, with approximately the same
meaning.  Personally, I would settle on "built", as per above, but it
should be one term only.

* I find the name of the relisvalid column a bit confusing.  Especially
because it only applies to materialized views, and there is already a
meaning of "valid" for indexes.  (Recall that indexes are also stored in
pg_class, but they are concerned about indisvalid.)  I would name it
something like relmvbuilt.


Btw., half of the patch seems to consist of updating places referring to
relkind.  Is something wrong with the meaning of relkind that this sort
of thing is required?  Maybe these places should be operating in terms
of features, not accessing relkind directly.




Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Noah Misch <noah@leadboat.com> wrote:

> Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
> genuinely yielded the empty set at last refresh.  I agree that there's no
> particular need to store that fact in pg_class, and I would much prefer only
> storing it one way in any case.  A user-visible disadvantage of the current
> implementation is that relisvalid remains stale until something opens the rel.
> That's fine for the system itself, but it can deceive user-initiated catalog
> queries.  Imagine a check_postgres action that looks for invalid MVs to
> complain about.  It couldn't just scan pg_class; it would need to first do
> something that opens every MV.
>
> I suggest the following:
>
> 1. Let an invalid MV have a zero-length heap.  Distinguish a valid, empty MV
>    by giving it a page with no tuples.  This entails VACUUM[1] not truncating
>    MVs below one page and the refresh operation, where necessary, extending
>    the relation from zero pages to one.
> 2. Remove pg_class.relisvalid.
> 3. Add a bool field to RelationData.  The word "valid" is used in that context
>    to refer to the validity of the structure itself, so perhaps call the new
>    field rd_scannable.  RelationIsFlaggedAsValid() can become a macro;
>    consider changing its name for consistency with the field name.
> 4. During relcache build, set the field to "RelationGetNumberBlocks(rel) !=> 0"
>    for MVs, fixed "true" for everyone else.  Any operation that changes the
>    field must, and probably would anyway, instigate a relcache invalidation.
> 5. Expose a database function, say pg_relation_scannable(), for querying the
>    current state of a relation.  This supports user-level monitoring.
>
> Does that seem reasonable?  One semantic difference to keep in mind is that
> unlogged MVs will be considered invalid on the standby while valid on the
> master.  That's essentially an accurate report, so I won't mind it.

Changed to work pretty much as you suggested.

> I'm going to follow this with a review covering a broader range of topics.

Those issues addressed, too.  That includes the most egregious doc
problems you pointed out, but there still needs to be a thorough
review, and I expect to find a few more doc cleanup issues.

There was one minor syntax issue not addressed by Noah, nor much
discussed in general that I didn't want to just unilaterally
choose; but given that nobody seems to care that much I will put
forward a proposal and do it that way tomorrow if nobody objects.
Before this patch tables were the only things subject to
truncation, but now materialized views can also be truncated.  So
far we have been treating TABLE as a noise word in the truncate
command.  I assume we still want to allow tables to be truncated
with or without the word.  The question is what to do about
materialized views, and wheter both can be specified on a single
TRUNCATE statement.  I propose that we allow TABLE or MATERIALIZED
VIEW to be specified, or that part of the statement to be left out.
I propose that either type of object be allowed unless one or the
other is specified and the object to be truncated is not of that
kind.  So you could mix both kinds on one statement, so long as you
didn't specify either kind.

There is one odd aspect to pg_dump, but I think the way it is
behaving is the best way to handle it, although I invite other
opinions.  If you load from pg_dump output, it will try to
populated materialized views which were populated on dump, and
leave the ones which were not scannable because the contents had
not been generated in an empty and unscannable state on restore.
That much seems pretty obvious.  Where it gets  a little tricky is
if mva is generated with data, and mvb is generated based on mva.
Then mva is truncated.  Then you dump.  mvb was populated at the
time of the dump, but its contents can't be regenerated on restore
because mva is not scannable.  As the patch currently stands, you
get an error on the attempt to REFRESH mvb.  I think that's a good
thing, but I'm open to arguments to the contrary.

I don't have any handling in pg_dump for circular references among
materialized views, because I couldn't see how that could happen.
I'm not 100% sure that isn't just a failure of imagination on my
part, though.

The only other comment I know of that hasn't been addressed is
Simon's suggestion that I put in syntax for features which we might
implement in future releases.  I don't want to do that without the
usual community design and bike-shedding process, so syntax is only
implemented for implemented features.

I'm still waiting for final word (and a small patch?) from KaiGai
Kohei for the sepgsql part.

This patch does require an initdb because of a new function.

Unless something else comes up in review or I get feedback to the
contrary I plan to deal with the above-mentioned issues and commit
this within a week or two.

Thanks to Marko Tiikkaja, Robert Haas, Thom Brown, Simon Riggs,
KaiGai Kohei, and Noah Misch for the reviews and suggestions so
far, thanks to Robert for the initial cut at the docs, and big
thanks to Noah for helping me track down an elusive bug.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Materialized views WIP patch

From
Robert Haas
Date:
On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> There is one odd aspect to pg_dump, but I think the way it is
> behaving is the best way to handle it, although I invite other
> opinions.  If you load from pg_dump output, it will try to
> populated materialized views which were populated on dump, and
> leave the ones which were not scannable because the contents had
> not been generated in an empty and unscannable state on restore.
> That much seems pretty obvious.  Where it gets  a little tricky is
> if mva is generated with data, and mvb is generated based on mva.
> Then mva is truncated.  Then you dump.  mvb was populated at the
> time of the dump, but its contents can't be regenerated on restore
> because mva is not scannable.  As the patch currently stands, you
> get an error on the attempt to REFRESH mvb.  I think that's a good
> thing, but I'm open to arguments to the contrary.

Hmm, anything that means a dump-and-restore can fail seems like a bad
thing to me.  There's nothing outrageous about that scenario.  It's
arguable what state dump-and-restore should leave the new database in,
but I don't see why it shouldn't work.  I predict we'll end up with
unhappy users if we leave it like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Bruce Momjian
Date:
On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote:
> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> > There is one odd aspect to pg_dump, but I think the way it is
> > behaving is the best way to handle it, although I invite other
> > opinions.  If you load from pg_dump output, it will try to
> > populated materialized views which were populated on dump, and
> > leave the ones which were not scannable because the contents had
> > not been generated in an empty and unscannable state on restore.
> > That much seems pretty obvious.  Where it gets  a little tricky is
> > if mva is generated with data, and mvb is generated based on mva.
> > Then mva is truncated.  Then you dump.  mvb was populated at the
> > time of the dump, but its contents can't be regenerated on restore
> > because mva is not scannable.  As the patch currently stands, you
> > get an error on the attempt to REFRESH mvb.  I think that's a good
> > thing, but I'm open to arguments to the contrary.
> 
> Hmm, anything that means a dump-and-restore can fail seems like a bad
> thing to me.  There's nothing outrageous about that scenario.  It's
> arguable what state dump-and-restore should leave the new database in,
> but I don't see why it shouldn't work.  I predict we'll end up with
> unhappy users if we leave it like this.

pg_upgrade is going to fail on that pg_restore error.  :-(

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> There is one odd aspect to pg_dump, but I think the way it is
>> behaving is the best way to handle it, although I invite other
>> opinions.  If you load from pg_dump output, it will try to
>> populated materialized views which were populated on dump, and
>> leave the ones which were not scannable because the contents had
>> not been generated in an empty and unscannable state on restore.
>> That much seems pretty obvious.  Where it gets  a little tricky is
>> if mva is generated with data, and mvb is generated based on mva.
>> Then mva is truncated.  Then you dump.  mvb was populated at the
>> time of the dump, but its contents can't be regenerated on restore
>> because mva is not scannable.  As the patch currently stands, you
>> get an error on the attempt to REFRESH mvb.  I think that's a good
>> thing, but I'm open to arguments to the contrary.
>
> Hmm, anything that means a dump-and-restore can fail seems like a bad
> thing to me.  There's nothing outrageous about that scenario.  It's
> arguable what state dump-and-restore should leave the new database in,
> but I don't see why it shouldn't work.  I predict we'll end up with
> unhappy users if we leave it like this.

Keeping in mind that mva may take hours to refresh, and mvb may
take only minutes once you have the data from mva, what behavior do
you think is preferable?

The alternatives I can think of are:

(1)  Force mva to refresh on restore, even though it was empty and
not scannable on dump.  This may delay completion of the restore
for an extended time.  It would leave both mva and mvb populated.

(2)  Populate mvb by using mva's query as a regular view.  This
would leave things in the same state as they were on dump, and
might possibly optimized to something faster than generating mva
and then mvb; but probably would not be much faster in most cases.

(3)  Change the failure to generate data for mvb in this case as a
WARNING rather than an ERROR.

(4)  Actually dump and restore data with COPY statements for
materialized views, rather than having the dump create REFRESH
statements.  The main down side of this, it seems to me, is that it
opens up materialized views to direct tinkering of contents via SQL
statements, which I was hoping to avoid.  Perhaps this can be
mitigated in some way.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote:
>> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> There is one odd aspect to pg_dump, but I think the way it is
>>> behaving is the best way to handle it, although I invite other
>>> opinions.  If you load from pg_dump output, it will try to
>>> populated materialized views which were populated on dump, and
>>> leave the ones which were not scannable because the contents had
>>> not been generated in an empty and unscannable state on restore.
>>> That much seems pretty obvious.  Where it gets  a little tricky is
>>> if mva is generated with data, and mvb is generated based on mva.
>>> Then mva is truncated.  Then you dump.  mvb was populated at the
>>> time of the dump, but its contents can't be regenerated on restore
>>> because mva is not scannable.  As the patch currently stands, you
>>> get an error on the attempt to REFRESH mvb.  I think that's a good
>>> thing, but I'm open to arguments to the contrary.
>>
>> Hmm, anything that means a dump-and-restore can fail seems like a bad
>> thing to me.  There's nothing outrageous about that scenario.  It's
>> arguable what state dump-and-restore should leave the new database in,
>> but I don't see why it shouldn't work.  I predict we'll end up with
>> unhappy users if we leave it like this.
>
> pg_upgrade is going to fail on that pg_restore error.  :-(

With the hard link option it should succeed, I would think.  If we
arranged for the check option, when run without the hard link
option, to report such cases so that the user could choose to
either truncate mvb or refresh mva before the upgrade, would that
satisfy this concern?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Noah Misch
Date:
On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> >> There is one odd aspect to pg_dump, but I think the way it is
> >> behaving is the best way to handle it, although I invite other
> >> opinions.  If you load from pg_dump output, it will try to
> >> populated materialized views which were populated on dump, and
> >> leave the ones which were not scannable because the contents had
> >> not been generated in an empty and unscannable state on restore.
> >> That much seems pretty obvious.  Where it gets  a little tricky is
> >> if mva is generated with data, and mvb is generated based on mva.
> >> Then mva is truncated.  Then you dump.  mvb was populated at the
> >> time of the dump, but its contents can't be regenerated on restore
> >> because mva is not scannable.  As the patch currently stands, you
> >> get an error on the attempt to REFRESH mvb.  I think that's a good
> >> thing, but I'm open to arguments to the contrary.
> >
> > Hmm, anything that means a dump-and-restore can fail seems like a bad
> > thing to me.  There's nothing outrageous about that scenario.  It's
> > arguable what state dump-and-restore should leave the new database in,
> > but I don't see why it shouldn't work.  I predict we'll end up with
> > unhappy users if we leave it like this.

I agree that making the dump fail on this account is bad.

> Keeping in mind that mva may take hours to refresh, and mvb may
> take only minutes once you have the data from mva, what behavior do
> you think is preferable?
> 
> The alternatives I can think of are:
> 
> (1)  Force mva to refresh on restore, even though it was empty and
> not scannable on dump.  This may delay completion of the restore
> for an extended time.  It would leave both mva and mvb populated.

This is reasonable.  If the user doesn't like it, he can presumably use an
edited dump list to remove the REFRESHes.

> (2)  Populate mvb by using mva's query as a regular view.  This
> would leave things in the same state as they were on dump, and
> might possibly optimized to something faster than generating mva
> and then mvb; but probably would not be much faster in most cases.

Interesting idea, but I don't think adding novel server behavior is justified
to achieve this.

> (3)  Change the failure to generate data for mvb in this case as a
> WARNING rather than an ERROR.

This is also fair.  However, I think it's better to restore more valid MVs
(option 1) than fewer.

> (4)  Actually dump and restore data with COPY statements for
> materialized views, rather than having the dump create REFRESH
> statements.  The main down side of this, it seems to me, is that it
> opens up materialized views to direct tinkering of contents via SQL
> statements, which I was hoping to avoid.  Perhaps this can be
> mitigated in some way.

This is a door better left closed.


Overall, I recommend option 1.  None of the options will furnish the desire of
every database, but the DBA can always tailor the outcome by replacing the
dumped REFRESH statements with his own.  I'm not envisioning that MVs left
invalid for the long term will be a typical thing, anyway.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Noah Misch <noah@leadboat.com> wrote:
> On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote:

> I agree that making the dump fail on this account is bad.

I would argue that this is an overstatement of the issue except for
restores that use the single-transaction switch and pg_upgrade
without the hard link option.  In all other cases, one could just
issue REFRESH statements after the dump successfully completed all
the other work.  But those two cases are important enough that the
issue must be seriously considered.

>> (1)  Force mva to refresh on restore, even though it was empty
>> and not scannable on dump.  This may delay completion of the
>> restore for an extended time.  It would leave both mva and mvb
>> populated.
>
> This is reasonable.  If the user doesn't like it, he can
> presumably use an edited dump list to remove the REFRESHes.

> Overall, I recommend option 1.

I'm OK with that approach, and in the absence of anyone pushing for
another direction, will make that change to pg_dump.  I'm thinking
I would only do this for materialized views which were not
scannable, but which cause REFRESH failures on other materialized
views if not refreshed first (recursively evaluated), rather than
just automatically refreshing all MVs on restore.  The reason this
seems important is that some MVs may take a long time to refresh,
and a user might want a dump/restore to get to a point where they
can use the rest of the database while building the contents of
some MVs in the background or during off hours.

> None of the options will furnish the desire of every database,

Agreed.

> but the DBA can always tailor the outcome by replacing the dumped
> REFRESH statements with his own.

... or by issuing TRUNCATE or REFRESH statements before the dump to
avoid the issue.

> I'm not envisioning that MVs left invalid for the long term will
> be a typical thing, anyway.

Agreed.  I think this will be an infrequent issue caused by unusual
user actions; but it would be bound to come up occasionally.

-Kevin



Re: Materialized views WIP patch

From
Thom Brown
Date:
On 16 February 2013 01:01, Kevin Grittner <kgrittn@ymail.com> wrote:
> Unless something else comes up in review or I get feedback to the
> contrary I plan to deal with the above-mentioned issues and commit
> this within a week or two.

At the moment it's not possible to rename a column without using ALTER
TABLE on an MV.

Also, shouldn't we have the ability to set the collation on a column of the MV?

And the inconsistency between regular views and MVs is still present,
where MVs always dump with column parameters in their definition, and
views never do.  Not a show-stopper, but curious nonetheless.

--
Thom



Re: Materialized views WIP patch

From
Alvaro Herrera
Date:
Kevin Grittner escribió:

> I'm OK with that approach, and in the absence of anyone pushing for
> another direction, will make that change to pg_dump.  I'm thinking
> I would only do this for materialized views which were not
> scannable, but which cause REFRESH failures on other materialized
> views if not refreshed first (recursively evaluated), rather than
> just automatically refreshing all MVs on restore.  The reason this
> seems important is that some MVs may take a long time to refresh,
> and a user might want a dump/restore to get to a point where they
> can use the rest of the database while building the contents of
> some MVs in the background or during off hours.

Maybe it would be a good idea to try to put such commands at the very
end of the dump, if possible.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Kevin Grittner escribió:
>
>> I'm OK with that approach, and in the absence of anyone pushing
>> for another direction, will make that change to pg_dump.  I'm
>> thinking I would only do this for materialized views which were
>> not scannable, but which cause REFRESH failures on other
>> materialized views if not refreshed first (recursively
>> evaluated), rather than just automatically refreshing all MVs on
>> restore.  The reason this seems important is that some MVs may
>> take a long time to refresh, and a user might want a
>> dump/restore to get to a point where they can use the rest of
>> the database while building the contents of some MVs in the
>> background or during off hours.
>
> Maybe it would be a good idea to try to put such commands at the
> very end of the dump, if possible.

Here is the dump order as currently implemented in that patch.  MVs
are created at the same priority as tables and views.  MV REFRESH
and MV index builds obviously need to follow population of table
data. These are at the same priority because it makes the most
sense to populated an MV without any indexes and then build them
before the MV is used to populate some other MV.  Dependency
information is used to get that to sort properly within the
priority level.

    1,                            /* DO_NAMESPACE */
    2,                            /* DO_PROCLANG */
    3,                            /* DO_COLLATION */
    4,                            /* DO_EXTENSION */
    5,                            /* DO_TYPE */
    5,                            /* DO_SHELL_TYPE */
    6,                            /* DO_FUNC */
    7,                            /* DO_AGG */
    8,                            /* DO_OPERATOR */
    9,                            /* DO_OPCLASS */
    9,                            /* DO_OPFAMILY */
    10,                            /* DO_CAST */
    11,                            /* DO_CONVERSION */
    12,                            /* DO_TSPARSER */
    13,                            /* DO_TSTEMPLATE */
    14,                            /* DO_TSDICT */
    15,                            /* DO_TSCONFIG */
    16,                            /* DO_FDW */
    17,                            /* DO_FOREIGN_SERVER */
    18,                            /* DO_TABLE */
    19,                            /* DO_DUMMY_TYPE */
    20,                            /* DO_ATTRDEF */
    21,                            /* DO_BLOB */
    22,                            /* DO_PRE_DATA_BOUNDARY */
    23,                            /* DO_TABLE_DATA */
    24,                            /* DO_BLOB_DATA */
    25,                            /* DO_POST_DATA_BOUNDARY */
    26,                            /* DO_CONSTRAINT */
    27,                            /* DO_INDEX */
    28,                            /* DO_REFRESH_MATVIEW */
    28                             /* DO_MATVIEW_INDEX */
    29,                            /* DO_RULE */
    30,                            /* DO_TRIGGER */
    31,                            /* DO_FK_CONSTRAINT */
    32,                            /* DO_DEFAULT_ACL */
    33,                            /* DO_EVENT_TRIGGER */

I don't think that pushing MV refreshes and index creation farther
down the list should require anything beyond adjusting the priority
numbers.  I don't see a problem pushing them to the end.  Does
anyone else see anything past priority 28 that MV population should
*not* follow?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Thom Brown <thom@linux.com> wrote:
> On 16 February 2013 01:01, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Unless something else comes up in review or I get feedback to
>> the contrary I plan to deal with the above-mentioned issues and
>> commit this within a week or two.
>
> At the moment it's not possible to rename a column without using
> ALTER TABLE on an MV.
>
> Also, shouldn't we have the ability to set the collation on a
> column of the MV?

Will fix.

> And the inconsistency between regular views and MVs is still
> present, where MVs always dump with column parameters in their
> definition, and views never do.  Not a show-stopper, but curious
> nonetheless.

I haven't worried about this because current behavior generates
correct results -- this seems like a micro-optimization.  The
explanation for why it wound up that way is that creating a
materialized view is in many ways more like creating a table than
like creating a view -- it seemed safer and less invasive to modify
the CREATE TABLE code than the CREATE VIEW code, and specifying
column names just fell out of that as part of the minimal change.
In looking at the pg_dump output, though, I see that the CMV AS
clause also is getting the names right with column-level aliases,
so it should be pretty simple and safe to leave off the
column-list section for MVs.  I guess it's worth it just to
forestall further questions on the topic.

Thanks!

-Kevin



Re: Materialized views WIP patch

From
Noah Misch
Date:
On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Maybe it would be a good idea to try to put such commands at the
> > very end of the dump, if possible.

>     25,                            /* DO_POST_DATA_BOUNDARY */
>     26,                            /* DO_CONSTRAINT */
>     27,                            /* DO_INDEX */
>     28,                            /* DO_REFRESH_MATVIEW */
>     28                             /* DO_MATVIEW_INDEX */
>     29,                            /* DO_RULE */
>     30,                            /* DO_TRIGGER */
>     31,                            /* DO_FK_CONSTRAINT */
>     32,                            /* DO_DEFAULT_ACL */
>     33,                            /* DO_EVENT_TRIGGER */
> 
> I don't think that pushing MV refreshes and index creation farther
> down the list should require anything beyond adjusting the priority
> numbers.  I don't see a problem pushing them to the end.  Does
> anyone else see anything past priority 28 that MV population should
> *not* follow?

DO_EVENT_TRIGGER should remain last; it may change the behavior of nearly any
other command.

Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the outcome when the MV
calls functions that ultimately trip triggers or rules.  Currently, the
behavior will be the same as for CHECK constraints: the rules and triggers
don't exist yet.  This may also affect, for the better, MVs referencing views
that need the CREATE TABLE ... CREATE RULE _RETURN restoration pathway.  It
looks like a positive change.  On the flip side, I wonder if there's some case
I'm not considering where it's important to delay restoring rules and/or
triggers until after restoring objects for which restoration can entail calls
to arbitrary user functions.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Noah Misch <noah@leadboat.com> wrote:
> On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Maybe it would be a good idea to try to put such commands at
>>> the very end of the dump, if possible.
>
>>     25,                            /* DO_POST_DATA_BOUNDARY */
>>     26,                            /* DO_CONSTRAINT */
>>     27,                            /* DO_INDEX */
>>     28,                            /* DO_REFRESH_MATVIEW */
>>     28                             /* DO_MATVIEW_INDEX */
>>     29,                            /* DO_RULE */
>>     30,                            /* DO_TRIGGER */
>>     31,                            /* DO_FK_CONSTRAINT */
>>     32,                            /* DO_DEFAULT_ACL */
>>     33,                            /* DO_EVENT_TRIGGER */
>>
>> I don't think that pushing MV refreshes and index creation
>> farther down the list should require anything beyond adjusting
>> the priority numbers.  I don't see a problem pushing them to the
>> end.  Does anyone else see anything past priority 28 that MV
>> population should *not* follow?
>
> DO_EVENT_TRIGGER should remain last; it may change the behavior
> of nearly any other command.
>
> Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the
> outcome when the MV calls functions that ultimately trip triggers
> or rules. Currently, the behavior will be the same as for CHECK
> constraints: the rules and triggers don't exist yet.  This may
> also affect, for the better, MVs referencing views that need the
> CREATE TABLE ... CREATE RULE _RETURN restoration pathway.  It
> looks like a positive change.  On the flip side, I wonder if
> there's some case I'm not considering where it's important to
> delay restoring rules and/or triggers until after restoring
> objects for which restoration can entail calls to arbitrary user
> functions.

I didn't quite follow all of Noah's points or their implications,
so we chatted off-list.  He made a couple additional observations
which allow some simplification of the patch, and allow MV REFRESH
to be moved to the very end of the priority list without ill
effect.

(1)  While it might be incorrect for the CREATE INDEX on a
materialized view to come after event triggers are set up, REFRESH
can be expected to be a routine action in the presence of such
triggers, and it might actually be incorrect to REFRESH when the
triggers are not present.

(2)  REFRESH MATERIALIZED VIEW creates and builds a new heap, and
reindexes it after the data has been loaded, so the timing of the
CREATE INDEX statements on MVs is not critical, as long as they are
done after the CREATE and before the REFRESH.  We could drop them
into the same priority as the other CREATE INDEX statements, and it
would not be a big deal because the MVs would be empty.

This should allow me to simplify the code a little bit and move the
RMV step to the very end.  That may have some advantages when users
want to start using the database while MVs are being populated.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Mon, Feb 18, 2013 at 4:48 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> This should allow me to simplify the code a little bit and move the
> RMV step to the very end.  That may have some advantages when users
> want to start using the database while MVs are being populated.

In the department of crazy ideas, what about having pg_dump NEVER
refresh ANY materialized views?

It's true that the job of pg_dump and pg_restore is to put the new
database in the same state that the old database was in, but I think
you could make a reasonable case that materialized views ought to be
an exception.  After all, even with all of this infrastructure,
chances are pretty good that the new MV contents won't end up being
the same as the old MV contents on the old server - because the old
MVs could easily have been stale.  So why not just get the restore
over with as fast as possible, and then let the user refresh the MVs
that they think need refreshing (perhaps after getting the portions of
their system that don't rely on MVs up and running)?

At the very least, I think we ought to have an option for this
behavior.  But the more I think about it, the more I think maybe it
ought to be the default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Nicolas Barbier
Date:
2013/2/19 Robert Haas <robertmhaas@gmail.com>:

> In the department of crazy ideas, what about having pg_dump NEVER
> refresh ANY materialized views?
>
> It's true that the job of pg_dump and pg_restore is to put the new
> database in the same state that the old database was in, but I think
> you could make a reasonable case that materialized views ought to be
> an exception.  After all, even with all of this infrastructure,
> chances are pretty good that the new MV contents won't end up being
> the same as the old MV contents on the old server - because the old
> MVs could easily have been stale.  So why not just get the restore
> over with as fast as possible, and then let the user refresh the MVs
> that they think need refreshing (perhaps after getting the portions of
> their system that don't rely on MVs up and running)?
>
> At the very least, I think we ought to have an option for this
> behavior.  But the more I think about it, the more I think maybe it
> ought to be the default.

+1 from me from a minimalist point of view.

I think of a matview of the manually refreshed kind as “can contain
stale contents (or be invalid) unless someone manually makes sure it
is up to date (or valid)”. Making any matviews invalid by default upon
restoring (itself being a manual action) would be consistent with that
definition. Additionally, ISTM to be the least arbitrary (and hence
most elegant) choice, and even more so in the context of
matviews-depending-on-matviews.

Spamming some more craziness:

Another (more elaborate) suggestion could be: Store for each matview
whether it is to be rebuilt upon restore or not. Using this setting
would intuitively mean something like “I consider this matview being
valid a precondition for considering the database state valid.”
Setting this to true for a matview would only be allowed when any
other matviews on which it depends also have this setting set to true.

Just my €0.02 of course.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/19/13 8:54 AM, Robert Haas wrote:
> In the department of crazy ideas, what about having pg_dump NEVER
> refresh ANY materialized views?

It might be useful to have an option for this, but I don't think it
should be the default.  The default should be that the new database is
"ready to go".

Then again, when would you ever actually use that option?

This might be different if there were a command to refresh all
materialized views, because you don't want to have to go around and type
separate commands 47 times after a restore.




Re: Materialized views WIP patch

From
"Erik Rijkers"
Date:
On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
> matview-v4.patch.gz

Hi,

I was wondering if material views should not go into information_schema.  I was thinking either
.views or .tables.  Have you considered this?

I ask because as far as I can see querying for mv's has to go like this:

SELECT n.nspname, c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('m','')
  and n.nspname = 'myschema'

which seems rather ugly.


Also, some documentation typos: please see attached.


Thanks,

Erik Rijkers




Attachment

Re: Materialized views WIP patch

From
Josh Berkus
Date:
On 02/19/2013 02:09 PM, Erik Rijkers wrote:
> On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
>> matview-v4.patch.gz
> 
> Hi,
> 
> I was wondering if material views should not go into information_schema.  I was thinking either
> .views or .tables.  Have you considered this?
> 
> I ask because as far as I can see querying for mv's has to go like this:
> 
> SELECT n.nspname, c.relname
> FROM pg_catalog.pg_class c
> LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
> WHERE c.relkind IN ('m','')
>   and n.nspname = 'myschema'

Well, I'm not sure about information_schema, but we'll definitely want a
pg_matviews system view. Also a \dM.  That could wait until 9.4, though.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
David Fetter
Date:
On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
> On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
> > matview-v4.patch.gz
> 
> Hi,
> 
> I was wondering if material views should not go into information_schema.  I was thinking either
> .views or .tables.  Have you considered this?

I'm guessing it'd be .views if anything.  Haven't been able to
decipher from section 11 of the standard (Schemata) whether the
standard has anything to say on the matter.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Erik Rijkers <er@xs4all.nl> wrote:

> I was wondering if material views should not go into
> information_schema.  I was thinking either .views or .tables.
> Have you considered this?

I had not considered this to be a good idea because
information_schema is defined by the standard, and materialized
views are an extension to the standard.  Someone using these views
to identify either tables or views might make a bad choice based on
this.  I'm open to arguments for inclusion, if you think it would
not violate the standard.  Which would be safe?

> Also, some documentation typos: please see attached.

Will apply.  Thanks.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Josh Berkus <josh@agliodbs.com> wrote:

> Well, I'm not sure about information_schema, but we'll definitely
> want a pg_matviews system view.

> That could wait until 9.4, though.

That I could probably do.  Do you think they should have a separate
pg_stat_user_matviews table, etc., or do you think it would be
better to include them in with tables there?

> Also a \dM.

I already added it as \dm in the current patch.  Does that conflict
with something else that's pending?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Josh Berkus
Date:
> That I could probably do.  Do you think they should have a separate
> pg_stat_user_matviews table, etc., or do you think it would be
> better to include them in with tables there?

Well, ideally pg_matviews would have matview definitions, and
pg_stat_matviews would have stats on matview usage and rows.  But see
what you can get done; I imagine we'll overhaul it for 9.4 anyway once
we've had a chance to use the feature.

> 
>> Also a \dM.
> 
> I already added it as \dm in the current patch.  Does that conflict
> with something else that's pending?

Oh, no, I thought \dm was *already* in use, but apparently not.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com>

> There was one minor syntax issue not addressed by Noah, nor much
> discussed in general that I didn't want to just unilaterally
> choose; but given that nobody seems to care that much I will put
> forward a proposal and do it that way tomorrow if nobody objects.
> Before this patch tables were the only things subject to
> truncation, but now materialized views can also be truncated.  So
> far we have been treating TABLE as a noise word in the truncate
> command.  I assume we still want to allow tables to be truncated
> with or without the word.  The question is what to do about
> materialized views, and wheter both can be specified on a single
> TRUNCATE statement.  I propose that we allow TABLE or MATERIALIZED
> VIEW to be specified, or that part of the statement to be left out.
> I propose that either type of object be allowed unless one or the
> other is specified and the object to be truncated is not of that
> kind.  So you could mix both kinds on one statement, so long as you
> didn't specify either kind.

When I went to do this, I hit a shift/reduce conflict, because with
TABLE being optional it couldn't tell whether:

TRUNCATE MATERIALIZED VIEW x, y, z;

... was looking for five relations or three.  That goes away with
MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Josh Berkus <josh@agliodbs.com> wrote:

>> That I could probably do.  Do you think they should have a separate
>> pg_stat_user_matviews table, etc., or do you think it would be
>> better to include them in with tables there?
>
> Well, ideally pg_matviews would have matview definitions, and
> pg_stat_matviews would have stats on matview usage and rows.  But see
> what you can get done; I imagine we'll overhaul it for 9.4 anyway once
> we've had a chance to use the feature.

I agree on pg_matviews, but after looking over the existing views
and thinking about what I would use them for as a DBA, I'm inclined
to fold the backing tables for MVs into the _stat_ and _statio_
views -- especially since we already include the backing tables and
indexes for TOAST.  There is a precident for including
implementation details at that level.  The only difference from
TOAST, is that I include the heap and indexes for MVs in the _user_
views.  I'm attaching the patch for just the system_views.sql file
for discussion before I go write docs for this part.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com> wrote:

> I'm attaching the patch for just the system_views.sql file
> for discussion before I go write docs for this part.

Meh.  If I'm gonna have pg_matviews I might as well include an
isscannable column.  v2 attached.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Materialized views WIP patch

From
Josh Berkus
Date:
On 02/19/2013 03:41 PM, Kevin Grittner wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
> 
>> I'm attaching the patch for just the system_views.sql file
>> for discussion before I go write docs for this part.
> 
> Meh.  If I'm gonna have pg_matviews I might as well include an
> isscannable column.  v2 attached.

pg_get_viewdef() will work on Matviews?  Great.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> When I went to do this, I hit a shift/reduce conflict, because with
> TABLE being optional it couldn't tell whether:

> TRUNCATE MATERIALIZED VIEW x, y, z;

> ... was looking for five relations or three.� That goes away with
> MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.� Is that OK?

Not really.  I would much rather see us not bother with this pedantic
syntax than introduce an even-partially-reserved word.

Having said that, I don't think I believe your analysis of why this
doesn't work.  The presence or absence of commas ought to make the
syntax non-ambiguous, I would think.  Maybe you just factored the
grammar wrong.
        regards, tom lane



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 2/19/13 8:54 AM, Robert Haas wrote:
>> In the department of crazy ideas, what about having pg_dump NEVER
>> refresh ANY materialized views?
>
> It might be useful to have an option for this, but I don't think it
> should be the default.  The default should be that the new database is
> "ready to go".
>
> Then again, when would you ever actually use that option?

You'd use that option if you'd rather get the database mostly-up as
soon as possible, and then worry about the materialized views
afterwards.

> This might be different if there were a command to refresh all
> materialized views, because you don't want to have to go around and type
> separate commands 47 times after a restore.

Well, it's pretty easy to do:

SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
pg_class WHERE relkind = 'm';

...but we could also add explicit syntax for it, perhaps along the
lines of what we have for CLUSTER and VACUUM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> This might be different if there were a command to refresh all
>> materialized views, because you don't want to have to go around and type
>> separate commands 47 times after a restore.

> Well, it's pretty easy to do:

> SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
> pg_class WHERE relkind = 'm';

> ...but we could also add explicit syntax for it, perhaps along the
> lines of what we have for CLUSTER and VACUUM.

It's not really that easy, because of the likelihood that MVs have to be
refreshed in a specific order.  The SELECT you suggest definitely seems
too simplistic.  A dedicated command could perhaps be built to pay
attention to dependencies ... but if we're still coding such things now,
it seems a bit late for 9.3.
        regards, tom lane



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/19/13 5:22 PM, David Fetter wrote:
> On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
>> On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
>>> matview-v4.patch.gz
>>
>> Hi,
>>
>> I was wondering if material views should not go into information_schema.  I was thinking either
>> .views or .tables.  Have you considered this?
> 
> I'm guessing it'd be .views if anything.  Haven't been able to
> decipher from section 11 of the standard (Schemata) whether the
> standard has anything to say on the matter.

I suppose one should be able to expect that if one finds a view in the
information schema, then one should be able to use DROP VIEW to remove
it.  Which in this case wouldn't work.  So I don't think including a
materialized view under views or tables is appropriate.





Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> When I went to do this, I hit a shift/reduce conflict, because
>> with TABLE being optional it couldn't tell whether:
>
>> TRUNCATE MATERIALIZED VIEW x, y, z;
>
>> ... was looking for five relations or three.  That goes away
>> with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
>> OK?
>
> Not really.  I would much rather see us not bother with this
> pedantic syntax than introduce an even-partially-reserved word.

I'm not sure it's worth it either; but two people requested it and
I didn't forsee this shift/reduce conflict, so I took a shot at it.
 If we can't eliminate the conflict, I'm fine with leaving things
as they are in the latest posted patch.

> Having said that, I don't think I believe your analysis of why
> this doesn't work.  The presence or absence of commas ought to
> make the syntax non-ambiguous, I would think.  Maybe you just
> factored the grammar wrong.

Well, it wouldn't be the first time you've seen a better way to do
something in flex than I was able to see.  Taking just the gram.y
part of the change which implemented this, and omitting the change
in reservedness of MATERIALIZED, I have:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 820cb41..1d393c5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -394,6 +394,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);

 %type <ival>   opt_column event cursor_options opt_hold opt_set_data
 %type <objtype>    reindex_type drop_type comment_type security_label_type
+               trunc_type

 %type <node>   fetch_args limit_clause select_limit_value
                offset_clause select_offset_value
@@ -5172,9 +5173,10 @@ attrs:       '.' attr_name
  *****************************************************************************/

 TruncateStmt:
-           TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior
+           TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior
                {
                    TruncateStmt *n = makeNode(TruncateStmt);
+                   n->objtype = $2;
                    n->relations = $3;
                    n->restart_seqs = $4;
                    n->behavior = $5;
@@ -5182,6 +5184,12 @@ TruncateStmt:
                }
        ;

+trunc_type:
+           TABLE                       { $$ = OBJECT_TABLE; }
+           | MATERIALIZED VIEW         { $$ = OBJECT_MATVIEW; }
+           | /*EMPTY*/                 { $$ = OBJECT_UNSPECIFIED; }
+       ;
+
 opt_restart_seqs:
            CONTINUE_P IDENTITY_P       { $$ = false; }
            | RESTART IDENTITY_P        { $$ = true; }

I'm open to suggestions on a better way.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Peter Eisentraut <peter_e@gmx.net> wrote:

> I suppose one should be able to expect that if one finds a view
> in the information schema, then one should be able to use DROP
> VIEW to remove it.  Which in this case wouldn't work.  So I don't
> think including a materialized view under views or tables is
> appropriate.

Right.  I think adding pg_matviews covers the stated use-case
enough to answer Erik's concern.  I'm not going to mess with adding
non-standard stuff to the standard views.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Having said that, I don't think I believe your analysis of why
>> this doesn't work.

> Well, it wouldn't be the first time you've seen a better way to do
> something in flex than I was able to see.  Taking just the gram.y
> part of the change which implemented this, and omitting the change
> in reservedness of MATERIALIZED, I have:

> -           TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior
> +           TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior

> +trunc_type:
> +           TABLE                       { $$ = OBJECT_TABLE; }
> +           | MATERIALIZED VIEW         { $$ = OBJECT_MATVIEW; }
> +           | /*EMPTY*/                 { $$ = OBJECT_UNSPECIFIED; }
> +       ;

Yeah, this is a standard gotcha when working with unreserved keywords.
You can't factor it like that because then the parser is required to
make a shift-reduce decision (on whether to reduce trunc_type to empty)
before it can "see past" the first word.  So for instance given
TRUNCATE MATERIALIZED ...        ^

the parser has to make that decision when it can't see past the word
"MATERIALIZED" and so doesn't know what comes after it.

The way to fix it is to not try to use the sub-production but spell it
all out:
  TRUNCATE TABLE relation_expr_list ...| TRUNCATE MATERIALIZED VIEW relation_expr_list ...| TRUNCATE relation_expr_list
...

Now the parser doesn't have to make any shift-reduce decision until
after it can "see past" the first identifier.  It's a bit tedious
but beats making a word more reserved than it has to be.
        regards, tom lane



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:


> The way to fix it is to not try to use the sub-production but spell it
> all out:
>
>       TRUNCATE TABLE relation_expr_list ...
>     | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
>     | TRUNCATE relation_expr_list ...
>
> Now the parser doesn't have to make any shift-reduce decision until
> after it can "see past" the first identifier.  It's a bit tedious
> but beats making a word more reserved than it has to be.

Thanks!  Will do.


--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/20/13 6:13 AM, Robert Haas wrote:
>> It might be useful to have an option for this, but I don't think it
>> > should be the default.  The default should be that the new database is
>> > "ready to go".
>> >
>> > Then again, when would you ever actually use that option?
> You'd use that option if you'd rather get the database mostly-up as
> soon as possible, and then worry about the materialized views
> afterwards.

Since the proposed materialized views are not available for implicit use
in query optimization, the only way an application would make use of
them is to access them directly.  And if it accesses an unpopulated
materialized view, it would fail.  So I don't think in the current state
a database is mostly-up without the materialized views filled in.

I can see the value in having a restore mode that postpones certain
nonessential operations, such as creating indexes or certain constraints
or even materialized views.  But I think the boundaries and expectations
for that need to be defined more precisely.  For example, a database
without constraints might be considered "ready for read-only use",
without secondary indexes it might be "ready for use but slow".




Re: Materialized views WIP patch

From
"Erik Rijkers"
Date:
On Wed, February 20, 2013 16:28, Kevin Grittner wrote:
> Peter Eisentraut <peter_e@gmx.net> wrote:
>
>> I suppose one should be able to expect that if one finds a view
>> in the information schema, then one should be able to use DROP
>> VIEW to remove it.  Which in this case wouldn't work.  So I don't
>> think including a materialized view under views or tables is
>> appropriate.
>
> Right.  I think adding pg_matviews covers the stated use-case
> enough to answer Erik's concern. 

Absolutely - I agree pg_matviews is much better than adding deviating information_schema stuff.

Thank you,

Erik Rijkers




Re: Materialized views WIP patch

From
Greg Stark
Date:
On Wed, Feb 20, 2013 at 4:26 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>
>> The way to fix it is to not try to use the sub-production but spell it
>> all out:
>>
>>       TRUNCATE TABLE relation_expr_list ...
>>     | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
>>     | TRUNCATE relation_expr_list ...
>>
>> Now the parser doesn't have to make any shift-reduce decision until
>> after it can "see past" the first identifier.  It's a bit tedious
>> but beats making a word more reserved than it has to be.
>
> Thanks!  Will do.

Fwiw I think worrying about stuff like this at this point is probably
a waste of time. There'll be a period of bike-shedding where people
debate what the command should be called so worrying about parser
conflicts before there's a consensus is kind pointless.

I would like to know what operations you plan to support independently
of the command names. I may have missed much earlier in the discussion
but then I suspect things have evolved since then.

It sounds like you want to support:

1) Selecting from materialized viws
2) Manually refreshing materialized views
3) Manually truncating materialized views

And explicitly not support

1) Automatically rewriting queries to select from matching views
2) Incrementally refreshing materialized views
3) Manual DML against data in materialized views (except truncate
which is kind of DDL)
4) Keeping track of whether the data in the materialized view is up to date

I have to say I find this model a bit odd. It seems the UI you're
presenting is that they're basically read-only tables that the
database will fill in the data for automatically. My mental model of
materialized views is that they're basically views that the database
guarantees a different performance characteristic for.

I would expect a materialized view to be up to date all the time. If
we don't support incremental updates (which seems like a fine thing
not to support in a first cut) then I would expect any DML against the
table to mark the view invalid and any queries against it to produce
an error (or possibly go to the source tables using the view
definition but that's probably a bad idea for most use cases). Ie.
they should behave like a view at all times and have up to date
information or fail entirely.

I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
I would expect it to be called something like INVALIDATE rather than
TRUNCATE and dropping the storage is a side effect of simply telling
the database that it doesn't need to maintain this materialized view.
Though I could be convinced "truncate" is a good name as long as it's
documented well.

-- 
greg



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Greg Stark <stark@mit.edu> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:

>>> The way to fix it is to not try to use the sub-production but
>>> spell it all out:
>>>
>>>       TRUNCATE TABLE relation_expr_list ...
>>>     | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
>>>     | TRUNCATE relation_expr_list ...
>>>
>>> Now the parser doesn't have to make any shift-reduce decision
>>> until after it can "see past" the first identifier.  It's a bit
>>> tedious but beats making a word more reserved than it has to
>>> be.
>>
>> Thanks!  Will do.
>
> Fwiw I think worrying about stuff like this at this point is
> probably a waste of time. There'll be a period of bike-shedding
> where people debate what the command should be called so worrying
> about parser conflicts before there's a consensus is kind
> pointless.

That sort of bikeshedding already happened three months ago.  Too
late now.

> I would like to know what operations you plan to support
> independently of the command names. I may have missed much
> earlier in the discussion but then I suspect things have evolved
> since then.
>
> It sounds like you want to support:
>
> 1) Selecting from materialized viws
> 2) Manually refreshing materialized views
> 3) Manually truncating materialized views
>
> And explicitly not support
>
> 1) Automatically rewriting queries to select from matching views
> 2) Incrementally refreshing materialized views

Those are material for later releases, building on the base of what
goes into this release.

> 3) Manual DML against data in materialized views (except truncate
> which is kind of DDL)

There is quite a lot of DML allowed -- changing tablespace,
changing schema, changing name of the MV or of individual columns
in it, changing statistics targets, creating indexes, and other
operations are supported.

> 4) Keeping track of whether the data in the materialized view is
> up to date

Only keeping track of whether data has been populated or not, for
now.  There has been agreement that one or more timestamps relating
to freshness will make sense, but these are not in the initial
patch.

> I have to say I find this model a bit odd.

It's not a model, it's a starting point.  Several people have
already said that even this much is useful and they expect to take
advantage of it.  I'm doing what I can to not paint us into a
corner where it's hard to extend to all the features everyone
dreams of, but if we waited for that to commit something, it will
never happen.

> I would expect a materialized view to be up to date all the time.

I expect that this will eventually be an option, but I expect that
is will be a seldom-used one.  Most cases that I've seen, people
want summary data that is reasonably up-to-date without unduly
affecting the performance of incremental changes to the underlying
data.  I've sketched out the roadmap from this patch to all of
these options in a vauge, handwavy fashion, and don't have a lot of
interest in taking it farther until we're past 9.3 beta.

> If we don't support incremental updates (which seems like a fine
> thing not to support in a first cut) then I would expect any DML
> against the table to mark the view invalid and any queries
> against it to produce an error (or possibly go to the source
> tables using the view definition but that's probably a bad idea
> for most use cases). Ie. they should behave like a view at all
> times and have up to date information or fail entirely.

That would render them completely useless for the use-cases I've
seen.  If you want to offer a patch to do that as an option, feel
free, but I will strongly argue against that as unconditional
behavior.

> I would expect a command like TRUNCATE MATERIALIZED VIEW to exist
> but I would expect it to be called something like INVALIDATE
> rather than TRUNCATE and dropping the storage is a side effect of
> simply telling the database that it doesn't need to maintain this
> materialized view. Though I could be convinced "truncate" is a
> good name as long as it's documented well.

I'm trying to minimize the number of new keywords.  The initial
patch only added MATERIALIZED.  I added REFRESH due to
near-universal demand for something other than the LOAD
MATERIALIZED VIEW I initially used.  Have you seen the statistics
Tom gave out on how much the size of the executable bloats with
every new keyword?  Until now nobody has expressed concern about
TRUNCATE MATERIALIZED VIEW, so it would take quite a groundswell of
concern at this point to even consider a new keyword for this
functionality this late in the game.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Josh Berkus
Date:
> And explicitly not support
> 
> 1) Automatically rewriting queries to select from matching views
> 2) Incrementally refreshing materialized views
> 3) Manual DML against data in materialized views (except truncate
> which is kind of DDL)
> 4) Keeping track of whether the data in the materialized view is up to date

The idea is to add the above features over the next few versions of
Postgres.

> I have to say I find this model a bit odd. It seems the UI you're
> presenting is that they're basically read-only tables that the
> database will fill in the data for automatically. 

This is what matviews are in other DBMSes.

> My mental model of
> materialized views is that they're basically views that the database
> guarantees a different performance characteristic for.

How would we do that, exactly?  That would be lovely, but unless you
have a way to accomplish it ...

> I would expect a materialized view to be up to date all the time. 

Actually, there's a huge use case for asynchronously updated matviews,
so we would not want an implementation which ruled them out.  Also
there's the argument that synchronously updated matviews have little
actual performance advantage over regular dynamic views.

Or to put it another way: I could use this feature, as it is, in about 8
different projects I'm currently supporting.  I personally can't think
of a single project where I need synchronously updated matviews,
currently.  I have in the past, but it's a LOT less frequent that the
desire for async, just as the desire for async replication is more
common than the desire for syncrep.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/19/13 5:47 PM, Kevin Grittner wrote:
> When I went to do this, I hit a shift/reduce conflict, because with
> TABLE being optional it couldn't tell whether:
> 
> TRUNCATE MATERIALIZED VIEW x, y, z;
> 
> ... was looking for five relations or three.  That goes away with
> MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?

Is TRUNCATE even the right command here?  For regular tables TRUNCATE is
a fast DELETE, which logically empties the table.  For materialized
views, there is no deleting, so this command (I suppose?) just
invalidates the materalized view.  That's not the same thing.

Are there TRUNCATE triggers on materialized views?




Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Peter Eisentraut <peter_e@gmx.net> wrote:
> On 2/19/13 5:47 PM, Kevin Grittner wrote:
>> When I went to do this, I hit a shift/reduce conflict, because
>> with TABLE being optional it couldn't tell whether:
>>
>> TRUNCATE MATERIALIZED VIEW x, y, z;
>>
>> ... was looking for five relations or three.  That goes away
>> with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
>> OK?
>
> Is TRUNCATE even the right command here?  For regular tables
> TRUNCATE is a fast DELETE, which logically empties the table.
> For materialized views, there is no deleting, so this command (I
> suppose?) just invalidates the materalized view.  That's not the
> same thing.

Hmm.  That's what Greg Stark just said, and I replied that nobody
else had raised the issue in over three months.  With Greg, that's
two now.

TRUNCATE MATERIALIZED VIEW discards any data which has been loaded
into the MV, rendering it unavailable for scanning.  Internally, it
does do a truncate, exactly as truncate table.  The resulting
zero-length heap file is what is used to determine whether a
materialized view is "scannable".  When a CREATE WITH DATA or a
REFRESH generates zero rows, an empty single page is created to
indicate that it is scannable (valid to use in queries) but
contains no rows.

I agree that INVALIDATE is probably more descriptive, although it
seems that there might be some even better word if we bikeshed
enough.  The question is, is it worth creating a new keyword to
call the internal truncate function for materialized views, versus
documenting that truncating a materialized view renders it invalid?
Again, given the numbers that Tom presented a while back about the
space requirements of every new keyword, I don't think this is
enough of a gain to justify that.  I still squirm a little about
having used REFRESH, even though demand for that was overwhelming.

> Are there TRUNCATE triggers on materialized views?

No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
create a trigger of any type on a materialized view.  I don't think
that would interfere with event triggers, though.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Stephen Frost
Date:
* Kevin Grittner (kgrittn@ymail.com) wrote:
> Peter Eisentraut <peter_e@gmx.net> wrote:
> > Is TRUNCATE even the right command here?  For regular tables
> > TRUNCATE is a fast DELETE, which logically empties the table.
> > For materialized views, there is no deleting, so this command (I
> > suppose?) just invalidates the materalized view.  That's not the
> > same thing.
>
> Hmm.  That's what Greg Stark just said, and I replied that nobody
> else had raised the issue in over three months.  With Greg, that's
> two now.

TRUNCATE MAT VIEW seems like the right command to me.  Just my 2c.
Thanks,
    Stephen

Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/20/13 2:30 PM, Kevin Grittner wrote:
>> Are there TRUNCATE triggers on materialized views?
> No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
> create a trigger of any type on a materialized view.  I don't think
> that would interfere with event triggers, though.

More generally, I would consider the invalidation of a materialized view
a DDL command, whereas truncating a table is a DML command.  This has
various implications with triggers, logging, permissions.  I think it's
not good to mix those two.

Also note that un-invalidating==refreshing a materialized view is
already a DDL command.




Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Peter Eisentraut <peter_e@gmx.net> wrote:
> On 2/20/13 2:30 PM, Kevin Grittner wrote:
>>> Are there TRUNCATE triggers on materialized views?
>> No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
>> create a trigger of any type on a materialized view.  I don't
>> think that would interfere with event triggers, though.
>
> More generally, I would consider the invalidation of a
> materialized view a DDL command, whereas truncating a table is a
> DML command.

The force of that assertion is somewhat undercut by the fact that
the ExecuteTruncate() function does exactly what needs to be done
to discard the data in a materialized view and make it appear as
non-scannable.  Even if we dress it up with different syntax, it's
not clear that we wouldn't build a TruncateStmt in the parser and
pass it through exactly the same execution path.  We would just
need to look at the relkind to generate a different command tag.

> This has various implications with triggers, logging,
> permissions.  I think it's not good to mix those two.

Could you give a more concrete example of where you see a problem?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Tatsuo Ishii
Date:
> I would like to know what operations you plan to support independently
> of the command names. I may have missed much earlier in the discussion
> but then I suspect things have evolved since then.
> 
> It sounds like you want to support:
> 
> 1) Selecting from materialized viws
> 2) Manually refreshing materialized views
> 3) Manually truncating materialized views

Maybe plus?

4) Automatically dropping materialized views if underlying table(s)
are dropped/altered

Or this has to be done manually?

> And explicitly not support
> 
> 1) Automatically rewriting queries to select from matching views
> 2) Incrementally refreshing materialized views
> 3) Manual DML against data in materialized views (except truncate
> which is kind of DDL)
> 4) Keeping track of whether the data in the materialized view is up to date
> 
> I have to say I find this model a bit odd. It seems the UI you're
> presenting is that they're basically read-only tables that the
> database will fill in the data for automatically. My mental model of
> materialized views is that they're basically views that the database
> guarantees a different performance characteristic for.
> 
> I would expect a materialized view to be up to date all the time. If
> we don't support incremental updates (which seems like a fine thing
> not to support in a first cut) then I would expect any DML against the
> table to mark the view invalid and any queries against it to produce
> an error (or possibly go to the source tables using the view
> definition but that's probably a bad idea for most use cases). Ie.
> they should behave like a view at all times and have up to date
> information or fail entirely.
> 
> I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
> I would expect it to be called something like INVALIDATE rather than
> TRUNCATE and dropping the storage is a side effect of simply telling
> the database that it doesn't need to maintain this materialized view.
> Though I could be convinced "truncate" is a good name as long as it's
> documented well.
> 
> -- 
> greg
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: Materialized views WIP patch

From
Greg Stark
Date:
On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> More generally, I would consider the invalidation of a materialized view
> a DDL command, whereas truncating a table is a DML command.

That's not entirely true. From the database's point of view, TRUNCATE
is in many ways actually DDL.

I actually don't really dislike using "TRUNCATE" for this command. I
was more asking about whether this meant people were thinking of the
view as a thing where you could control the data in it by hand and
could have the view be "empty" rather than just "not valid".

The way I was thinking about it, whatever the command is named, you
might be able to tell the database to drop the storage associated with
the view but that would make the view invalid until it was refreshed.
It wouldn't make it appear to be empty.




-- 
greg



Re: Materialized views WIP patch

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> The way I was thinking about it, whatever the command is named, you
> might be able to tell the database to drop the storage associated with
> the view but that would make the view invalid until it was refreshed.
> It wouldn't make it appear to be empty.

Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
results in a table that is perfectly valid, you just deleted all the
rows that used to be in it.  Throwing away an MV's contents should
not result in an MV that is considered valid.  That being the case,
lumping them as being the "same" operation feels like the wrong thing,
and so we should choose a different name for the MV operation.
        regards, tom lane



Re: Materialized views WIP patch

From
Andres Freund
Date:
On 2013-02-21 04:14:09 +0000, Greg Stark wrote:
> On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > More generally, I would consider the invalidation of a materialized view
> > a DDL command, whereas truncating a table is a DML command.
> 
> That's not entirely true. From the database's point of view, TRUNCATE
> is in many ways actually DDL.
> 
> I actually don't really dislike using "TRUNCATE" for this command. I
> was more asking about whether this meant people were thinking of the
> view as a thing where you could control the data in it by hand and
> could have the view be "empty" rather than just "not valid".

It also might get confusing when we get materialized views that are
auto-updateable. I am not suggesting to forward TRUNCATE to the internal
storage in that case but giving an error so its an easy to find
distinction to a normal table seems like a good idea.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Wed, Feb 20, 2013 at 11:14 PM, Greg Stark <stark@mit.edu> wrote:
> On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> More generally, I would consider the invalidation of a materialized view
>> a DDL command, whereas truncating a table is a DML command.
>
> That's not entirely true. From the database's point of view, TRUNCATE
> is in many ways actually DDL.
>
> I actually don't really dislike using "TRUNCATE" for this command.

Me neither.  I'm astonished that we're seriously considering adding
new keywords for this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/20/13 5:03 PM, Kevin Grittner wrote:
> Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 2/20/13 2:30 PM, Kevin Grittner wrote:
>>>> Are there TRUNCATE triggers on materialized views?
>>> No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
>>> create a trigger of any type on a materialized view.  I don't
>>> think that would interfere with event triggers, though.
>>
>> More generally, I would consider the invalidation of a
>> materialized view a DDL command, whereas truncating a table is a
>> DML command.
> 
> The force of that assertion is somewhat undercut by the fact that
> the ExecuteTruncate() function does exactly what needs to be done
> to discard the data in a materialized view and make it appear as
> non-scannable.  Even if we dress it up with different syntax, it's
> not clear that we wouldn't build a TruncateStmt in the parser and
> pass it through exactly the same execution path.  We would just
> need to look at the relkind to generate a different command tag.

This is a fall-out of the implementation, and that's fine (although I'd
personally still be in favor of putting that state in the catalog, not
into the block count on disk, effectively), but I'm talking about the
external interfaces we present.

>> This has various implications with triggers, logging,
>> permissions.  I think it's not good to mix those two.
> 
> Could you give a more concrete example of where you see a problem?

* Logging: You can set things to log DDL commands only.  I would want a
MV invalidation to be logged.

* Permissions: There is a TRUNCATE permission, would that apply here?
There is no refresh permission.

* Triggers: There are TRUNCATE triggers, but they don't apply here.

* Triggers: I don't know how event triggers work, but I'd like
materialized view events to be grouped together somehow.

* Don't know the opinion of sepgsql on all this.

I think what this all comes down to, as I've mentioned before, is that
the opposite of this proposed truncate operation is the refresh
operation, and that is a DDL command under ALTER MATERIALIZED VIEW.
Both of these fundamental operations -- truncate/refresh,
invalidate/validate, empty/refill, whatever -- should be grouped
together somehow, as far as syntax, as well logging, permissions,
trigger handling, and so on are concerned.  You don't need a new command
or key word for this.  How about ALTER MATERIALIZED VIEW DISCARD?



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/20/13 11:14 PM, Greg Stark wrote:
> That's not entirely true. From the database's point of view, TRUNCATE
> is in many ways actually DDL.

Whether something is DDL or DML or a read operation (query) is not an
implementation detail, it's a user-exposed category.  Since TRUNCATE is
logically equivalent to DELETE, it's a DML operation, as far as the user
is concerned.




Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Peter Eisentraut <peter_e@gmx.net> wrote:
> On 2/20/13 11:14 PM, Greg Stark wrote:
>> That's not entirely true. From the database's point of view,
>> TRUNCATE is in many ways actually DDL.
>
> Whether something is DDL or DML or a read operation (query) is
> not an implementation detail, it's a user-exposed category.
> Since TRUNCATE is logically equivalent to DELETE, it's a DML
> operation, as far as the user is concerned.

Not really.  It doesn't follow the same MVCC behavior as DML.  This
is user-visible, documented behavior.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> The way I was thinking about it, whatever the command is named, you
>> might be able to tell the database to drop the storage associated with
>> the view but that would make the view invalid until it was refreshed.
>> It wouldn't make it appear to be empty.
>
> Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
> results in a table that is perfectly valid, you just deleted all the
> rows that used to be in it.  Throwing away an MV's contents should
> not result in an MV that is considered valid.

It doesn't.  That was one of the more contentious points in the
earlier bikeshedding phases.  Some felt that throwing away the
contents was a form of making the MV "out of date" and as such
didn't require any special handling.  Others, including myself,
felt that "data not present" was a distinct state from "generated
zero rows" and that attempting to scan a materialized view for
which data had not been generated must result in an error.  The
latter property has been maintained from the beginning -- or at
least that has been the intent.

test=# CREATE TABLE t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL);
CREATE TABLE
test=# CREATE MATERIALIZED VIEW tm AS SELECT type, sum(amt) AS totamt FROM t GROUP BY type WITH NO DATA;
SELECT 0
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--------------------------
 f
(1 row)

test=# SELECT * FROM tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
test=# REFRESH MATERIALIZED VIEW tm;
REFRESH MATERIALIZED VIEW
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--------------------------
 t
(1 row)

test=# TRUNCATE MATERIALIZED VIEW tm;
TRUNCATE TABLE
test=# SELECT * FROM tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--------------------------
 f
(1 row)

> That being the case, lumping them as being the "same" operation
> feels like the wrong thing, and so we should choose a different
> name for the MV operation.

There is currently no truncation of MV data without rendering the
MV unscannable.  Do you still feel it needs a different command
name?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Greg Stark <stark@mit.edu> wrote:

> I actually don't really dislike using "TRUNCATE" for this
> command.  I was more asking about whether this meant people were
> thinking of the view as a thing where you could control the data
> in it by hand and could have the view be "empty" rather than just
> "not valid".

You can either populate the MV in the CREATE command or by REFRESH,
and it will be scannable.  If it is created WITH NO DATA or
TRUNCATEd it is not scannable, generating an error on an attempt to
reference it.

test=# select * from tm;
 type | totamt
------+--------
 y    |     12
 z    |     24
 x    |      5
(3 rows)

test=# truncate tm;
TRUNCATE TABLE
test=# select * from tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

> The way I was thinking about it, whatever the command is named,
> you might be able to tell the database to drop the storage
> associated with the view but that would make the view invalid
> until it was refreshed.  It wouldn't make it appear to be empty.

I think we're on the same page after all.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Peter Eisentraut <peter_e@gmx.net> wrote:

I'll need more time to ponder your other points, but...

> You don't need a new command or key word for this.  How about
> ALTER MATERIALIZED VIEW DISCARD?

I don't like this because we don't have ALTER TABLE REINDEX.  But
the fact that DISCARD is a keyword does open up some interesting
syntax possibilities without adding more keywords.  Maybe:

DISCARD MATERIALIZED VIEW DATA tm;

Or something like that.  Paint buckets are over there by the
bikeshed.  Have at it.  :-)
 
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Heikki Linnakangas
Date:
On 21.02.2013 16:38, Kevin Grittner wrote:
> Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Greg Stark<stark@mit.edu>  writes:
>>> The way I was thinking about it, whatever the command is named, you
>>> might be able to tell the database to drop the storage associated with
>>> the view but that would make the view invalid until it was refreshed.
>>> It wouldn't make it appear to be empty.
>>
>> Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
>> results in a table that is perfectly valid, you just deleted all the
>> rows that used to be in it.  Throwing away an MV's contents should
>> not result in an MV that is considered valid.
>
> It doesn't.  That was one of the more contentious points in the
> earlier bikeshedding phases.  Some felt that throwing away the
> contents was a form of making the MV "out of date" and as such
> didn't require any special handling.  Others, including myself,
> felt that "data not present" was a distinct state from "generated
> zero rows" and that attempting to scan a materialized view for
> which data had not been generated must result in an error.  The
> latter property has been maintained from the beginning -- or at
> least that has been the intent.

Yeah, "data not present" is clearly different from "0 rows". I'm 
surprised there isn't an explicit boolean column somewhere for that, but 
I guess you can use the size of the heap for that too, as long as you're 
careful to not truncate it to 0 blocks when it's empty but scannable.

There's at least one bug left in that area:

postgres=# create table t (id int4);
CREATE TABLE
postgres=# create materialized view tm as select * from t where id < 
0;SELECT 0
postgres=# select * from tm; id
----
(0 rows)

postgres=# create index i_tm on tm(id);CREATE INDEX
postgres=# cluster tm using i_tm;
CLUSTER
postgres=# select * from tm;
ERROR:  materialized view "tm" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

Clustering a materialized view invalidates it.

- Heikki



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:
> giving an error so its an easy to find distinction to a normal
> table seems like a good idea.

I'm not sure I understood your concerns entirely, but wonder
whether this helps?:

test=# \d
              List of relations
 Schema | Name  |       Type        |  Owner
--------+-------+-------------------+---------
 public | bb    | materialized view | kgrittn
 public | t     | table             | kgrittn
 public | tm    | materialized view | kgrittn
 public | tmm   | materialized view | kgrittn
 public | tv    | view              | kgrittn
 public | tvmm  | materialized view | kgrittn
 public | tvv   | view              | kgrittn
 public | tvvm  | materialized view | kgrittn
 public | tvvmv | view              | kgrittn
(9 rows)

test=# truncate table tm;
ERROR:  "tm" is not a table
test=# truncate materialized view t;
ERROR:  "t" is not a materialized view
test=# truncate materialized view tm;
TRUNCATE TABLE
test=# truncate table t;
TRUNCATE TABLE

Well, maybe those command tags could use a tweak.

Then there's this, if you don't specify an object type:

test=# truncate t, tm;
TRUNCATE TABLE

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That being the case, lumping them as being the "same" operation
>> feels like the wrong thing, and so we should choose a different
>> name for the MV operation.

> There is currently no truncation of MV data without rendering the
> MV unscannable.� Do you still feel it needs a different command
> name?

You didn't say anything that changed my opinion: it doesn't feel like
a TRUNCATE to me.  It's not changing the object to a different but
entirely valid state, which is what TRUNCATE does.

Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE
VIEW and that this operation should be another one.  That sounds pretty
reasonable from here.
        regards, tom lane



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 21.02.2013 16:38, Kevin Grittner wrote:
>> Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> Greg Stark<stark@mit.edu>  writes:
>>>> The way I was thinking about it, whatever the command is named, you
>>>> might be able to tell the database to drop the storage associated with
>>>> the view but that would make the view invalid until it was refreshed.
>>>> It wouldn't make it appear to be empty.
>>>
>>> Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
>>> results in a table that is perfectly valid, you just deleted all the
>>> rows that used to be in it.  Throwing away an MV's contents should
>>> not result in an MV that is considered valid.
>>
>> It doesn't.  That was one of the more contentious points in the
>> earlier bikeshedding phases.  Some felt that throwing away the
>> contents was a form of making the MV "out of date" and as such
>> didn't require any special handling.  Others, including myself,
>> felt that "data not present" was a distinct state from "generated
>> zero rows" and that attempting to scan a materialized view for
>> which data had not been generated must result in an error.  The
>> latter property has been maintained from the beginning -- or at
>> least that has been the intent.
>
> Yeah, "data not present" is clearly different from "0 rows".
> I'm surprised there isn't an explicit boolean column somewhere
> for that,

There was, in earlier versions of the patch: pg_class.relisvald.
The problem is that we needed some way to determine from the heap
that it was invalid to support UNLOGGED MVs.  Several people were
offended by my attempt to use relisvald as the primary indicator
and transfer the information from the heap state to pg_class and
relcache.  There were some pretty big technical challenges to that.
So I caved on that one and went with the pg_relation_is_scannable()
function based on the heap as reported by relcache.  That being one
of the newer parts of the patch, it is probably not as solid as the
parts which haven't changed much in the last three months.

> but I guess you can use the size of the heap for that too, as
> long as you're careful to not truncate it to 0 blocks when it's
> empty but scannable.
>
> There's at least one bug left in that area:
>
> postgres=# create table t (id int4);
> CREATE TABLE
> postgres=# create materialized view tm as select * from t where id < 0;SELECT
> 0
> postgres=# select * from tm;
> id
> ----
> (0 rows)
>
> postgres=# create index i_tm on tm(id);CREATE INDEX
> postgres=# cluster tm using i_tm;
> CLUSTER
> postgres=# select * from tm;
> ERROR:  materialized view "tm" has not been populated
> HINT:  Use the REFRESH MATERIALIZED VIEW command.
>
> Clustering a materialized view invalidates it.

Good spot.  That should be easy enough to fix.

Thanks.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Andres Freund
Date:
On 2013-02-21 07:10:09 -0800, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > giving an error so its an easy to find distinction to a normal
> > table seems like a good idea.
>
> I'm not sure I understood your concerns entirely, but wonder
> whether this helps?:

To explain it a bit:

I assume that at some point matviews will get (auto-)updateable, just as
normal views recently got. In that case application programmers might
not be aware anymore that something is a view either because they just
don't know or because a table got converted into a matview after the
code was written.

Because of the potential wish for transparency (with security views as a
potential user) at least normal views might get the capability to be
TRUNCATEd directly, so it might be that matviews do as well.

> test=# \d
>               List of relations
>  Schema | Name  |       Type        |  Owner
> --------+-------+-------------------+---------
>  public | bb    | materialized view | kgrittn
>  public | t     | table             | kgrittn
>  public | tm    | materialized view | kgrittn
>  public | tmm   | materialized view | kgrittn
>  public | tv    | view              | kgrittn
>  public | tvmm  | materialized view | kgrittn
>  public | tvv   | view              | kgrittn
>  public | tvvm  | materialized view | kgrittn
>  public | tvvmv | view              | kgrittn
> (9 rows)
>
> test=# truncate table tm;
> ERROR:  "tm" is not a table
> test=# truncate materialized view t;
> ERROR:  "t" is not a materialized view
> test=# truncate materialized view tm;
> TRUNCATE TABLE
> test=# truncate table t;
> TRUNCATE TABLE

Thats not bad.

But what if we allow TRUNCATE on views someday (possibly only if a
truncate trigger is defined). For consistency we might also want that on
matvies. Having a difference between TRUNCATE view; and TRUNCATE
MATERIALIZED VIEW; in that case sounds ugly to me.

What about DISABLE? DISCARD or DEALLOCATE would also be nice but it
seems hard to fit that into existing syntax.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That being the case, lumping them as being the "same"
>>> operation feels like the wrong thing, and so we should choose a
>>> different name for the MV operation.
>>
>> There is currently no truncation of MV data without rendering
>> the MV unscannable.  Do you still feel it needs a different
>> command name?
>
> You didn't say anything that changed my opinion: it doesn't feel
> like a TRUNCATE to me.  It's not changing the object to a
> different but entirely valid state, which is what TRUNCATE does.
>
> Peter claimed upthread that REFRESH is a subcommand of ALTER
> MATERIALIZE VIEW

It's not, nor do I think it should be.

> and that this operation should be another one.  That sounds
> pretty reasonable from here.

That feels completely wrong to me.  For one thing, I can't think of
any ALTER commands to populate or remove data.  What did you think
of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
new statment?  Or maybe RESET MATERIALIZED VIEW?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:

> I assume that at some point matviews will get (auto-)updateable,
> just as normal views recently got.

I'm dubious about that.  Every use case I've seen for MVs involves
aggregation, although they are a generalized feature, so that won't
always be true.  But if you have a view like:

CREATE MATERIALIZED VIEW tm AS
 SELECT t.type,
    sum(t.amt) AS totamt
   FROM t
  GROUP BY t.type;

... I don't see how that can be updateable.  If I add 5 to totamt
for some row, what do you do?  I expect that 99% of MVs will be
updated asynchronously from changes to the underlying data -- what
do you do if someone updates a row that no longer exists in the
underlying data.  This are just seems fraught with peril and out of
sync with the usual uses of MVs.

> What about DISABLE? DISCARD or DEALLOCATE would also be nice but
> it seems hard to fit that into existing syntax.

Thanks for the suggestions.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Peter claimed upthread that REFRESH is a subcommand of ALTER
>> MATERIALIZE VIEW

> It's not, nor do I think it should be.

Oh, never mind then.

>> and that this operation should be another one.� That sounds
>> pretty reasonable from here.

> That feels completely wrong to me.� For one thing, I can't think of
> any ALTER commands to populate or remove data.� What did you think
> of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
> new statment?� Or maybe RESET MATERIALIZED VIEW?

I could live with either DISCARD or RESET.
        regards, tom lane



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
On 2/21/13 9:25 AM, Kevin Grittner wrote:
> Peter Eisentraut <peter_e@gmx.net> wrote:
>> On 2/20/13 11:14 PM, Greg Stark wrote:
>>> That's not entirely true. From the database's point of view,
>>> TRUNCATE is in many ways actually DDL.
>>
>> Whether something is DDL or DML or a read operation (query) is
>> not an implementation detail, it's a user-exposed category. 
>> Since TRUNCATE is logically equivalent to DELETE, it's a DML
>> operation, as far as the user is concerned.
> 
> Not really.  It doesn't follow the same MVCC behavior as DML.  This
> is user-visible, documented behavior.

MVCC behavior does not determine whether something is considered DDL or DML.




Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:

>> What did you think of the idea of something like DISCARD
>> MATERIALIZED VIEW DATA as a new statment?  Or maybe RESET
>> MATERIALIZED VIEW?
>
> I could live with either DISCARD or RESET.

I figured this was worth a pass through the keyword list to look
for all imperative verbs suitable for this, which could support the
needed syntax without adding a new keyword.  Here are the
possibilities I came up with, along with a note about why they are
keywords already.

DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
DISCARD MATERIALIZED VIEW DATA mv;  -- session state
RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter

I think any of these could work.  I'm personally most inclined
toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
better, especially if you leave out DATA as an additonal word.
Given that a materialized view will retain its query, tablespace,
indexes, statistics targets, etc. with this operation, and will
just not be available for scanning, some of the above seem
downright misleading without DATA thrown in.

Opinions?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Greg Stark
Date:
On Thu, Feb 21, 2013 at 2:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> It doesn't.  That was one of the more contentious points in the
> earlier bikeshedding phases.  Some felt that throwing away the
> contents was a form of making the MV "out of date" and as such
> didn't require any special handling.  Others, including myself,
> felt that "data not present" was a distinct state from "generated
> zero rows" and that attempting to scan a materialized view for
> which data had not been generated must result in an error.  The
> latter property has been maintained from the beginning -- or at
> least that has been the intent.

Actually this sounds like exactly what I was saying. I withdraw my
concern entirely.

-- 
greg



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Greg Stark <stark@mit.edu> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> It doesn't.  That was one of the more contentious points in the
>> earlier bikeshedding phases.  Some felt that throwing away the
>> contents was a form of making the MV "out of date" and as such
>> didn't require any special handling.  Others, including myself,
>> felt that "data not present" was a distinct state from
>> "generated zero rows" and that attempting to scan a materialized
>> view for which data had not been generated must result in an
>> error.  The latter property has been maintained from the
>> beginning -- or at least that has been the intent.
>
> Actually this sounds like exactly what I was saying. I withdraw
> my concern entirely.

Reviewing your concerns and discussions of "freshness" in general
got me thinking -- while it is clear that not having generated
values in the MV based on its query clearly should make the view
non-scannable, and will be the only criterion for that in this
initial patch; later versions will almost certainly support
age-based conditions for whether the MV is scannable.  So in the
next release the MV may become non-scannable based on the passage
of time since DML was run on a source table without the MV having
been refreshed or incrementally updated to reflect that DML.  Which
makes me wonder why DML making the MV non-scannable is such a bad
thing in the case of TRUNCATE.  Granted there is a difference in
that it is run *on the MV* rather than *on a source relation*; but
still, I'm having some second thoughts about that being a problem.

The problem with TRUNCATE MATERIALIZED VIEW from a logical
perspective doesn't seem to me so much that it makes the MV
non-scannable, as that it is the only DML which would be allowed
directly on an MV -- which is kind of a weird exception.  It is
pretty much a given that when we can get to implementing it, DML
statements will render MVs unscannable under various conditions.
Josh Berkus and Greg Stark have been the most explicit about that,
but I think most of us who are interested in the feature take it as
a given.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Andres Freund
Date:
On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Kevin Grittner <kgrittn@ymail.com> writes:
>
> >> What did you think of the idea of something like DISCARD
> >> MATERIALIZED VIEW DATA as a new statment?  Or maybe RESET
> >> MATERIALIZED VIEW?
> >
> > I could live with either DISCARD or RESET.
>
> I figured this was worth a pass through the keyword list to look
> for all imperative verbs suitable for this, which could support the
> needed syntax without adding a new keyword.  Here are the
> possibilities I came up with, along with a note about why they are
> keywords already.
>
> DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
> DISCARD MATERIALIZED VIEW DATA mv;  -- session state
> RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
> RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter
>
> I think any of these could work.  I'm personally most inclined
> toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
> better, especially if you leave out DATA as an additonal word.

> Given that a materialized view will retain its query, tablespace,
> indexes, statistics targets, etc. with this operation, and will
> just not be available for scanning, some of the above seem
> downright misleading without DATA thrown in.

I vote for RESET or DISCARD. DISABLE sounds more like you disable
automatic refreshes or somesuch.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Materialized views WIP patch

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote:
>> DISABLE MATERIALIZED VIEW mv;� -- ALTER clause for constraints
>> DISCARD MATERIALIZED VIEW DATA mv;� -- session state
>> RELEASE MATERIALIZED VIEW DATA mv;� -- savepoint
>> RESET MATERIALIZED VIEW DATA mv;� -- run-time parameter
>> 
>> I think any of these could work.� I'm personally most inclined
>> toward DISABLE MATERIALIZED VIEW.� It seems to convey the semantics
>> better, especially if you leave out DATA as an additonal word. 

> I vote for RESET or DISCARD. DISABLE sounds more like you disable
> automatic refreshes or somesuch.

Yeah, I don't much like DISABLE either.  I'm also concerned about
overloading RESET this way --- that statement has complicated-enough
syntax already, not to mention way too many shades of meaning.  So that
leaves me voting for DISCARD M.V. DATA, which seems pretty precise.
It's a bit verbose, but since when has SQL been succinct?
        regards, tom lane



Re: Materialized views WIP patch

From
Josh Berkus
Date:
> That feels completely wrong to me.  For one thing, I can't think of
> any ALTER commands to populate or remove data.  What did you think
> of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
> new statment?  Or maybe RESET MATERIALIZED VIEW?

I prefer RESET, especially since it could eventually support RESET ALL
MATERIALIZED VIEWS if that turns out to be useful.  How does the parser
like that?

BTW, to contradict Peter E., for my part I would NOT want matview resets
to be logged as DDL.  I would only want matview definitition changes to
be so logged.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Greg Stark
Date:
On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus <josh@agliodbs.com> wrote:
> I prefer RESET, especially since it could eventually support RESET ALL
> MATERIALIZED VIEWS if that turns out to be useful.  How does the parser
> like that?

Isn't reset currently only used for GUCs? I think that makes for a
strange crossover.

Really, I'm sorry for bringing this up. I really don't think
"truncate" is a bad way to spell it.


-- 
greg



Re: Materialized views WIP patch

From
Michael Paquier
Date:


On Sat, Feb 23, 2013 at 9:55 PM, Greg Stark <stark@mit.edu> wrote:
On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus <josh@agliodbs.com> wrote:
> I prefer RESET, especially since it could eventually support RESET ALL
> MATERIALIZED VIEWS if that turns out to be useful.  How does the parser
> like that?

Isn't reset currently only used for GUCs? I think that makes for a
strange crossover.
--
Michael

Re: Materialized views WIP patch

From
"Erik Rijkers"
Date:
2013-02-19  Kevin Grittner wrote:
[matview-system_views-v2.diff]

I assumed the patches matview-v4.patch and matview-system_views-v2.diff
were to be applied together.

They do apply correctly but during tests, the "test rules     ... FAILED".

Perhaps it is solved already but I thought I'd mention it in case it is overlooked.

Thanks,

Erik Rijkers





Re: Materialized views WIP patch

From
Robert Haas
Date:
On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
> DISCARD would be better.

Well, personally, I'm in favor of either TRUNCATE or ALTER
MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
suppose that we're going to start using DISCARD for things that have
nothing to do with the existing meanings of DISCARD.  Number one, I
think it's confusing.  Number two, it's currently possible to
determine whether something is DDL, DML, or other by looking at the
first word of the command.  If we throw that out the window we may
cause performance issues for connection pooling software that tries to
be clever like that.  Mind you, I'm not aware of any connection
pooling software that actually does this today, because I think pgpool
includes a full parser and pgbouncer includes no parser at all, but it
still seems like a possibly-useful trick.  And number three, the
possibility of grammar conflicts with things we might want to do in
the future seems unnecessarily high.  If we use ALTER MATERIALIZED
VIEW ... WHATEVER, we only have to worry about grammar conflicts with
other ALTER MATERIALIZED VIEW commands; if we reuse DISCARD or RESET,
we've got potential conflicts with completely unrelated syntax.  That
consideration alone would be sufficient reason for me to choose to
stick everything under ALTER MATERIALIZED VIEW, no matter how poor a
semantic fit it seems otherwise.  Of course, if we stick with
TRUNCATE, this becomes a non-issue.

All that having been said, I'm not in favor of pushing this patch out
to 9.4 because we can't agree on minor syntax details.  In the absence
of consensus, my feeling is that Kevin should exercise committer's
prerogative and commit this in the way that seems best to him.  If a
clear contrary consensus subsequently emerges, we can always change
it.  It is not as if the particular choice of SQL syntax is hard to
whack around.  Once we release it we're stuck with it, but certainly
between now and beta we can change it whenever we like.  I'd rather
have the core part of the feature committed and tinker with the syntax
than wait longer to land the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
>> DISCARD would be better.

> Well, personally, I'm in favor of either TRUNCATE or ALTER
> MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
> suppose that we're going to start using DISCARD for things that have
> nothing to do with the existing meanings of DISCARD.

Yeah, there's actually a serious problem with choosing DISCARD:
surely we are not going to include "trash all MVs" in the behavior
of DISCARD ALL.  So unless you would like to say that DISCARD ALL
doesn't mean what it appears to mean, we can't make MV reset be
one of the sub-flavors of DISCARD.

So that seems to leave us with either TRUNCATE or an ALTER sub-syntax.
Personally I'd prefer the latter but it's surely debatable.
        regards, tom lane



Re: Materialized views WIP patch

From
Heikki Linnakangas
Date:
On 28.02.2013 16:55, Robert Haas wrote:
> On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
> <michael.paquier@gmail.com>  wrote:
>> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
>> DISCARD would be better.
>
> Well, personally, I'm in favor of either TRUNCATE or ALTER
> MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
> suppose that we're going to start using DISCARD for things that have
> nothing to do with the existing meanings of DISCARD.  Number one, I
> think it's confusing.  Number two, it's currently possible to
> determine whether something is DDL, DML, or other by looking at the
> first word of the command.  If we throw that out the window we may
> cause performance issues for connection pooling software that tries to
> be clever like that.

FWIW, I totally agree with that. From that point of view, the best thing 
would be to tack this onto the REFRESH command, perhaps something like:

REFRESH matview INVALIDATE;
REFRESH matview UNREFRESH;
REFRESH matview DISCARD;

It's a bit weird that the command is called REFRESH, if the effect is 
the exact opposite of refreshing it. And we usually do have two separate 
commands for doing something and undoing the same; CREATE - DROP, 
PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and so forth.

I think we're being too hung up on avoiding new (unreserved) keywords. 
Yes, the grammar is large because of so many keywords, but surely 
there's some better solution to that than adopt syntax that sucks. Let's 
invent a new keyword (INVALIDATE? UNREFRESH?), and deal with the grammar 
bloat separately.

- Heikki



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Thu, Feb 28, 2013 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, personally, I'm in favor of either TRUNCATE or ALTER
>> MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
>> suppose that we're going to start using DISCARD for things that have
>> nothing to do with the existing meanings of DISCARD.
>
> Yeah, there's actually a serious problem with choosing DISCARD:
> surely we are not going to include "trash all MVs" in the behavior
> of DISCARD ALL.  So unless you would like to say that DISCARD ALL
> doesn't mean what it appears to mean, we can't make MV reset be
> one of the sub-flavors of DISCARD.

Good point.

> So that seems to leave us with either TRUNCATE or an ALTER sub-syntax.
> Personally I'd prefer the latter but it's surely debatable.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 28.02.2013 16:55, Robert Haas wrote:

>> Well, personally, I'm in favor of either TRUNCATE or ALTER
>> MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous
>> precedent to suppose that we're going to start using DISCARD for
>> things that have nothing to do with the existing meanings of
>> DISCARD.  Number one, I think it's confusing.  Number two, it's
>> currently possible to determine whether something is DDL, DML,
>> or other by looking at the first word of the command.  If we
>> throw that out the window we may cause performance issues for
>> connection pooling software that tries to be clever like that.
>
> FWIW, I totally agree with that. From that point of view, the
> best thing would be to tack this onto the REFRESH command,
> perhaps something like:
>
> REFRESH matview INVALIDATE;
> REFRESH matview UNREFRESH;
> REFRESH matview DISCARD;
>
> It's a bit weird that the command is called REFRESH, if the
> effect is the exact opposite of refreshing it. And we usually do
> have two separate commands for doing something and undoing the
> same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and
> so forth.
>
> I think we're being too hung up on avoiding new (unreserved)
> keywords. Yes, the grammar is large because of so many keywords,
> but surely there's some better solution to that than adopt syntax
> that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?),
> and deal with the grammar bloat separately.

I'm OK with any grammar that we can reach consensus on, but that
seems elusive and I don't want to hold up getting the meat of the
patch committed while we haggle out this syntax detail.  Votes have
shifted frequently, but as I make out the latest opinion of people
making a statement that looks like an explicit vote, dividing the
value of a split vote between the choices, I get:

Kevin Grittner: TRUNCATE
Stephen Frost: TRUNCATE
Peter Eisentraut: ALTER
Andres Freund: RESET or DISCARD
Josh Berkus: RESET
Greg Stark: TRUNCATE
Michael Paquier: DISCARD
Robert Haas: TRUNCATE or ALTER
Tom Lane: TRUNCATE or ALTER
Heikki Linnakangas: REFRESH?

Vote totals:

TRUNCATE:  4.0
ALTER:     2.0
DISCARD:   1.5
RESET:     1.5
REFRESH    1.0?

Barring a sudden confluence of opinion, I will go with TRUNCATE for
the initial spelling.  I tend to favor that spelling for several
reasons.  One was the size of the patch needed to add the opposite
of REFRESH to the backend code:

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..eb7a14f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1234,11 +1234,12 @@ truncate_check_rel(Relation rel)
 {
    AclResult   aclresult;
 
-   /* Only allow truncate on regular tables */
-   if (rel->rd_rel->relkind != RELKIND_RELATION)
+   /* Only allow truncate on regular tables and materialized views. */
+   if (rel->rd_rel->relkind != RELKIND_RELATION &&
+       rel->rd_rel->relkind != RELKIND_MATVIEW)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                errmsg("\"%s\" is not a table",
+                errmsg("\"%s\" is not a table or materialized view",
                        RelationGetRelationName(rel))));
 
    /* Permissions checks */

That's it.  That takes it from "no way to release the space held by
the current contents of a materialized view and render it
unscannable until its rule's query is used to populate it again" to
"working".  Now, there are docs and psql support needed on top of
that, and the regression tests use the verb, and bikeshedding led
to a minor tweak of TRUNCATE so that you could specify TRUNCATE
MATERIALIZED VIEW -- but to get working backend code, the above is
sufficient.  That strikes me a prima facie evidence that it's not a
horribly off-target verb.

In terms of things to consider in choosing a verb are that, while
complete absence of data in the MV's backing table is the only
thing which will render it unscannable in this initial version,
there is clear demand to eventually track information on how
up-to-date data is, and treat the MV as unscannable for less
extreme conditions, such as the asynchronous application of changes
from backing tables having fallen behind some configurable
threshold.  In essence, people want the error instead of scanning
the relation if the generated data is not within a range of time
that is "fresh" enough, and truncating the backing relation (or
creating the MV WITH NO DATA) is a special case of that since the
data is not representing the results of the MV's query as of *any*
known point in time.  (As previously discussed, that is distinct
from an empty relation which is known to represent a fresh view of
the results of that query.)

If we pick a new pair of verbs, the connotations I think we should
be looking for include the ability to deal with at least these:

* Make the MV represent the result set of its query as of this
moment.

* Declare the MV to be too stale for the contents of its backing
table to be useful, and free the space held by the current
contents.

Other operations will be needed, for which syntax is not settled.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Ants Aasma
Date:
On Thu, Feb 28, 2013 at 7:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Barring a sudden confluence of opinion, I will go with TRUNCATE for
> the initial spelling.  I tend to favor that spelling for several
> reasons.  One was the size of the patch needed to add the opposite
> of REFRESH to the backend code:

FWIW, I found Andres's point about closing the door on updatable views
quite convincing. If at any point we want to add updatable
materialized views, it seems like a bad inconsistency to have TRUNCATE
mean something completely different from DELETE. While update
capability for materialized views might not be a common use case, I
don't think it's fair to completely shut the door on it to have easier
implementation and shorter syntax. Especially as the shorter syntax is
semantically inconsistent - normal truncate removes the data,
materialized view just makes the data inaccessible until the next
refresh.

Sorry for weighing in late, but it seemed to me that this point didn't
get enough consideration.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Ants Aasma <ants@cybertec.at> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> Barring a sudden confluence of opinion, I will go with TRUNCATE
>> for the initial spelling.  I tend to favor that spelling for
>> several reasons.  One was the size of the patch needed to add
>> the opposite of REFRESH to the backend code:
>
> FWIW, I found Andres's point about closing the door on updatable
> views quite convincing. If at any point we want to add updatable
> materialized views, it seems like a bad inconsistency to have
> TRUNCATE mean something completely different from DELETE. While
> update capability for materialized views might not be a common
> use case, I don't think it's fair to completely shut the door on
> it to have easier implementation and shorter syntax. Especially
> as the shorter syntax is semantically inconsistent - normal
> truncate removes the data, materialized view just makes the data
> inaccessible until the next refresh.
>
> Sorry for weighing in late, but it seemed to me that this point
> didn't get enough consideration.

Personally, I don't understand why anyone would want updateable
materialized views.  That's probably because 99% of the cases where
I've seen that someone wanted them, they wanted them updated to
match the underlying data using some technique that didn't require
the modification or commit of the underlying data to carry the
overhead of maintaining the MV.  In other words, they do not want
the MV to be up-to-date for performance reasons.  That is a big
part of the reason for *wanting* to use an MV.  How do you make an
asynchronously-maintained view updateable?

In addtion, at least 80% of the cases I've seen where people want
an MV it is summary information, which does not tie a single MV row
to a single underlying row.  If someone updates an aggregate number
like an average, I see no reasonable way to map that to the
underlying data in a meaningful way.

I see the contract of a materialized view as providing a
table-backed relation which shows the result set of a query as of
some point in time.  Perhaps it is a failure of imagination, but I
don't see where modifying that relation directly is compatible with
that contract.

Can you describe a meaningful use cases for an udpateable
materialized view?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Ants Aasma
Date:
On Fri, Mar 1, 2013 at 4:18 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Personally, I don't understand why anyone would want updateable
> materialized views.  That's probably because 99% of the cases where
> I've seen that someone wanted them, they wanted them updated to
> match the underlying data using some technique that didn't require
> the modification or commit of the underlying data to carry the
> overhead of maintaining the MV.  In other words, they do not want
> the MV to be up-to-date for performance reasons.  That is a big
> part of the reason for *wanting* to use an MV.  How do you make an
> asynchronously-maintained view updateable?
>
> In addtion, at least 80% of the cases I've seen where people want
> an MV it is summary information, which does not tie a single MV row
> to a single underlying row.  If someone updates an aggregate number
> like an average, I see no reasonable way to map that to the
> underlying data in a meaningful way.
>
> I see the contract of a materialized view as providing a
> table-backed relation which shows the result set of a query as of
> some point in time.  Perhaps it is a failure of imagination, but I
> don't see where modifying that relation directly is compatible with
> that contract.
>
> Can you describe a meaningful use cases for an udpateable
> materialized view?

I actually agree that overwhelming majority of users don't need or
want updateable materialized views. My point was that we can't at this
point rule out that people will think of a good use for this. I don't
have any real use cases for this, but I can imagine a few situations
where updateable materialized views wouldn't be nonsensical.

One case would be if the underlying data is bulkloaded and is
subsetted into smaller materialized views for processing using
off-the-shelf tools that expect tables. One might want to propagate
changes from those applications to the base data.

The other case would be the theoretical future where materialized
views can be incrementally and transactionally maintained, in that
case being able to express modifications on the views could actually
make sense.

I understand that the examples are completely hypothetical and could
be solved by using regular tables. I just have feeling that will
regret conflating TRUNCATE semantics for slight implementation and
notation convenience. To give another example of potential future
update semantics, if we were to allow users manually maintaining
materialized view contents using DML commands, one would expect
TRUNCATE to mean "make this matview empty", not "make this matview
unavailable".

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: Materialized views WIP patch

From
Greg Stark
Date:
On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma <ants@cybertec.at> wrote:
> . To give another example of potential future
> update semantics, if we were to allow users manually maintaining
> materialized view contents using DML commands, one would expect
> TRUNCATE to mean "make this matview empty", not "make this matview
> unavailable".

Wouldn't that just be a regular table then though? How is that a
materialized view?

If anything someone might expect truncate to delete any rows from the
source table that appear in the view. But I think it's likely that
even if materialized views were updateable truncate wouldn't be one of
the updateable operations.


-- 
greg



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Greg Stark <stark@mit.edu> wrote:
> Ants Aasma <ants@cybertec.at> wrote:

>> To give another example of potential future update semantics, if
>> we were to allow users manually maintaining materialized view
>> contents using DML commands, one would expect TRUNCATE to mean
>> "make this matview empty", not "make this matview unavailable".
>
> Wouldn't that just be a regular table then though? How is that a
> materialized view?
>
> If anything someone might expect truncate to delete any rows from
> the source table that appear in the view. But I think it's likely
> that even if materialized views were updateable truncate wouldn't
> be one of the updateable operations.

Yeah, the only way it would make sense to truncate MV contents from
a user-written maintenance trigger (assuming we might have such a
thing some day) would be if you decided that the change to the
underlying data was so severe that you effectively needed to
REFRESH, and there would probably be a better way to go about
dealing with that.

Two other reasons that this might not be a problem are:

(1)  Any DML against the MV would need to be limited to some
context fired by the underlying changes.  If we allow changes to
the MV outside of that without it being part of some "updateable
MV" feature (reversing the direction of flow of changes), the MV
could not be trusted at all.  If you're going to do that, just use
a table.

(2)  I can think of a couple not-too-horrible syntax tricks we
could use to escape from the corner we're worried about painting
ourselves into.

All of that said, some combination of Heikki's previous suggestion
that maybe REFRESH could be used and my noticing that both TRUNCATE
and REFRESH create a new heap for an MV, it's just a question of
whether we then run the MV's query to fill it with data, led to
this thought:

REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

This sort of mirrors the CREATE MATERIALIZED VIEW style (which was
based on CREATE TABLE AS) and WITH NO DATA puts the MV into the
unscannable state either way.  I can change the parser to make this
literally just the new spelling of TRUNCATE MATERIALIZED VIEW
without dashing my hopes of pushing the patch tomorrow.  (The patch
has been ready to go for weeks other than this syntax issue and
documentation which needs to refer to whatever syntax is chosen.)

Barring objections, I will use the above and push tomorrow.

The only issues which have been raised which will not be addressed
at that point are:

(1)  The suggestion that ALTER MATERIALIZED VIEW name ALTER column
support changing the collation isn't something I can see how to do
without complication and risk beyond what is appropriate at this
point in the release cycle.  It will be left off, at least for now.
 To get a new collation, you will need to drop the MV and re-create
it with a query which specifies the collation for the result
column.

(2)  The sepgsql changes are still waiting for a decision from
security focused folks.  I have two patches for that contrib module
ready based on my best reading of things -- one which uses table
security labels and one which instead uses new matview labels.
When we get a call on which is preferred, I suspect that one patch
or the other will be good as-is or with minimal change.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Dean Rasheed
Date:
On 2 March 2013 15:06, Kevin Grittner <kgrittn@ymail.com> wrote:
> [ ... ] led to
> this thought:
>
> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>

[Sorry to join this discussion so late]

FWIW I had a quick look at other DBs to see if there were any other
precedents out there. Oracle was the only one I could find with
anything similar. They use the same creation syntax:
 CREATE MATERIALIZED VIEW name [options] AS SELECT ...

and they use ALTER for everything else, such as refreshing the MV:
 ALTER MATERIALIZED VIEW name REFRESH [options];

AFAICT the nearest thing they have to TRUNCATE/DISCARD is:
 ALTER MATERIALIZED VIEW name CONSIDER FRESH;

They do also support updateable materialized views with standard DML,
but it doesn't look as though they allow TRUNCATE to operate directly
on a MV (although it can be made to propagate from a base table to a
MV, in which case allowing TRUNCATE on the MV itself with a different
meaning would likely be confusing).

Oracle's MVs have lots of options, all of which hang off the 2 basic
CREATE and ALTER commands. There's a certain appeal to that, rather
than inventing or overloading a bunch of other commands as more
options get added. The proposed REFRESH command is OK for today's
options, but I think it might be overly limiting in the future.

Of course, since this isn't in the SQL standard, we are free to use
any syntax we like. We don't have to follow Oracle, but having a
common syntax might make some people's lives easier, and I haven't
seen a convincing argument as to why any alternative syntax is better.

Regards,
Dean



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:

>> [ ... ] led to this thought:
>>
>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>
> [Sorry to join this discussion so late]
>
> FWIW I had a quick look at other DBs to see if there were any
> other precedents out there. Oracle was the only one I could find
> with anything similar. They use the same creation syntax:
>
>   CREATE MATERIALIZED VIEW name [options] AS SELECT ...

It is a pretty obvious choice when you look at other SQL
statements.

> and they use ALTER for everything else, such as refreshing the
> MV:
>
>   ALTER MATERIALIZED VIEW name REFRESH [options];

No, that is for specifiying when and under what conditions an
automatic refresh is done.  To do an immediate action which is
equivalent to what I have for the REFRESH statement, they use a
REFRESH() function.  That seemed too incompatible with how we've
done everything else in PostgreSQL -- I felt that a statement would
make more sense.  Consider REINDEX, CLUSTER, and VACUUM FULL for
example.

> AFAICT the nearest thing they have to TRUNCATE/DISCARD is:
>
>   ALTER MATERIALIZED VIEW name CONSIDER FRESH;

No, that doesn't rebuild or discard data -- if the MV is
out-of-date and therefore unscannable according to the how the MV
has been set up, this overrides that indication and allows scanning
in spite of that.

> They do also support updateable materialized views with standard
> DML, but it doesn't look as though they allow TRUNCATE to operate
> directly on a MV (although it can be made to propagate from a
> base table to a MV, in which case allowing TRUNCATE on the MV
> itself with a different meaning would likely be confusing).

They allow DML on the MV in order to update it.  The default
REFRESH() function executes a TRUNCATE statement followed by INSERT
/ SELECT using the MV's query.

> Oracle's MVs have lots of options, all of which hang off the 2
> basic CREATE and ALTER commands. There's a certain appeal to
> that, rather than inventing or overloading a bunch of other
> commands as more options get added. The proposed REFRESH command
> is OK for today's options, but I think it might be overly
> limiting in the future.

For what ALTER MATERIALIZED VIEW in Oracle does, I think it makes
sense to use ALTER.  I don't think this feature should use
functions for REFRESH.  Why Oracle can get away with functions for
it is that they allow DML on an MV, which seems to me to compromise
the integrity of the feature, at least as default behavior.

I see us supporting automatic incremental updates of progressively
more complex queries, and we may at some point want to add a
trigger-based maintenance option; but the functionality available
with a trigger-based approach is almost entirely availaable in
PostgreSQL today without this feature.  Rewriting queries using
expressions which match the MV's query to pull from the MV instead
of the underlying tables is the exception.  While that is a "sexy"
feature, and I'm sure one can construct examples where it helps
performance, it seems to me unlikely to be very generally useful.
I suspect that it exists mostly so that people who want to write an
RFP to pick a particular product can include that as a requirement.
 In other words, I think the main benefit of automatic rewrite
using an MV is marketing, not technical or performance.  That's
important, too; but let's focus first on getting what is most
useful.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com> wrote:

> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

Given the short time, I left out the "[, ...]".  If people think
that is important to get into this release, a follow-on patch might
be possible.

> Barring objections, I will use the above and push tomorrow.

I'm still working on docs, and the changes related to the syntax
change are still only lightly tested, but as far as I know, all is
complete except for the docs.  I'm still working on those and
expect to have them completed late today.  I'm posting this patch
to allow a chance for final review of the code changes before I
push.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Materialized views WIP patch

From
Dean Rasheed
Date:
On 3 March 2013 13:12, Kevin Grittner <kgrittn@ymail.com> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> Kevin Grittner <kgrittn@ymail.com> wrote:
>
>>> [ ... ] led to this thought:
>>>
>>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>>
>> [Sorry to join this discussion so late]
>>
>> FWIW I had a quick look at other DBs to see if there were any
>> other precedents out there. Oracle was the only one I could find
>> with anything similar. They use the same creation syntax:
>>
>>   CREATE MATERIALIZED VIEW name [options] AS SELECT ...
>
> It is a pretty obvious choice when you look at other SQL
> statements.
>
>> and they use ALTER for everything else, such as refreshing the
>> MV:
>>
>>   ALTER MATERIALIZED VIEW name REFRESH [options];
>
> No, that is for specifiying when and under what conditions an
> automatic refresh is done.  To do an immediate action which is
> equivalent to what I have for the REFRESH statement, they use a
> REFRESH() function.  That seemed too incompatible with how we've
> done everything else in PostgreSQL -- I felt that a statement would
> make more sense.  Consider REINDEX, CLUSTER, and VACUUM FULL for
> example.
>
>> AFAICT the nearest thing they have to TRUNCATE/DISCARD is:
>>
>>   ALTER MATERIALIZED VIEW name CONSIDER FRESH;
>
> No, that doesn't rebuild or discard data -- if the MV is
> out-of-date and therefore unscannable according to the how the MV
> has been set up, this overrides that indication and allows scanning
> in spite of that.
>

Ah, OK I see.

I misunderstood what the Oracle docs were saying. ALTER only changes
the MV's definition, whereas their REFRESH() function and your REFRESH
statement updates the data in the MV. That makes much more sense.

Regards,
Dean



Re: Materialized views WIP patch

From
Nicolas Barbier
Date:
2013/3/3 Kevin Grittner <kgrittn@ymail.com>:

> Rewriting queries using
> expressions which match the MV's query to pull from the MV instead
> of the underlying tables is the exception.  While that is a "sexy"
> feature, and I'm sure one can construct examples where it helps
> performance, it seems to me unlikely to be very generally useful.
> I suspect that it exists mostly so that people who want to write an
> RFP to pick a particular product can include that as a requirement.
>  In other words, I think the main benefit of automatic rewrite
> using an MV is marketing, not technical or performance.

I think that automatically using materialized views even when the
query doesn’t mention them directly, is akin to automatically using
indexes without having to mention them in the query. That way, queries
can be written the natural way, and “creating materialized views” is
an optimization that can be applied by a DBA without having to update
the application queries to use them.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>:

>> Rewriting queries using expressions which match the MV's query
>> to pull from the MV instead of the underlying tables is the
>> exception.  While that is a "sexy" feature, and I'm sure one can
>> construct examples where it helps performance, it seems to me
>> unlikely to be very generally useful.  I suspect that it exists
>> mostly so that people who want to write an RFP to pick a
>> particular product can include that as a requirement.  In other
>> words, I think the main benefit of automatic rewrite using an MV
>> is marketing, not technical or performance.
>
> I think that automatically using materialized views even when the
> query doesn’t mention them directly, is akin to automatically
> using indexes without having to mention them in the query. That
> way, queries can be written the natural way, and “creating
> materialized views” is an optimization that can be applied by a
> DBA without having to update the application queries to use them.

Oh, I understand that concept perfectly well, I just wonder how
often it is useful in practice.  The cost of planning with indexes
tends to go up dramatically the planner needs to evaluate all
possible combinations of access paths.  We've devoted quite a bit
of energy keeping that from being something like the factorial of
the number of indexes.  If you now need to find all materialized
views which could substitute for parts of a query, and look at all
permutations of how those could be used, and which indexes can be
used for each of those combinations, you have planning time which
can explode to extreme levels.

Where the number of database objects are small and their sizes are
large (like some data warehouse situations), you could come out
ahead; and if I wanted to showcase the capability you describe
that's what I would use.  With a large number of database objects
with only a few tens of millions of rows per table, I doubt you
will come out ahead.

Granted, you could say the same thing about indexes, and they are
very often useful.  I'm saying that I expect the usefulness of the
technique you describe is generally very low, but not zero.  Except
for marketing, where it's a flashy feature.  I would be interested
in seeing information to show where it works well, though.  There
is probably something to be learned by looking at the details of
the environment and workload of such a site.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Greg Stark
Date:
On Sat, Mar 2, 2013 at 3:06 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
> (1)  Any DML against the MV would need to be limited to some
> context fired by the underlying changes.  If we allow changes to
> the MV outside of that without it being part of some "updateable
> MV" feature (reversing the direction of flow of changes), the MV
> could not be trusted at all.  If you're going to do that, just use
> a table.

Oh! I misunderstood what you were suggesting. I think we were talking
at cross-purposes.

You're imagining a user issues truncate against the underlying
table(s) and the code that handles updating the materialized view will
need to issue a truncate against the MV to update it.

I was imagining that you wanted to be able to issue DML against the MV
just as one can against an updateable view. That DML should propagate
to the underlying table(s) through various magic.

It's a pretty theoretical fear now but one day it may be important to
avoid confusion between these two.

-- 
greg



Re: Materialized views WIP patch

From
Nicolas Barbier
Date:
2013/3/3 Kevin Grittner <kgrittn@ymail.com>:

> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>
>> I think that automatically using materialized views even when the
>> query doesn’t mention them directly, is akin to automatically
>> using indexes without having to mention them in the query. That
>> way, queries can be written the natural way, and “creating
>> materialized views” is an optimization that can be applied by a
>> DBA without having to update the application queries to use them.
>
> Oh, I understand that concept perfectly well, I just wonder how
> often it is useful in practice.  The cost of planning with indexes
> tends to go up dramatically the planner needs to evaluate all
> possible combinations of access paths.  We've devoted quite a bit
> of energy keeping that from being something like the factorial of
> the number of indexes.  If you now need to find all materialized
> views which could substitute for parts of a query, and look at all
> permutations of how those could be used, and which indexes can be
> used for each of those combinations, you have planning time which
> can explode to extreme levels.

I guess that a basic version of such a feature would do something like this:

(1) Check for each matview whether it simply consists of an optional
bunch of joins + optional aggregation + optional general filter
conditions (to support something akin to a partial index). If not, the
optimization doesn’t take this matview into account. This step can be
done beforehand.
(2) Check for each (sub)query in the query-to-optimize whether it does
the following (at a smart point in the optimization phase). If any of
these conditions are not met, don’t use this matview: - Joins at least the tables that are joined in the matview. -
Containsjoin conditions and general filter conditions that are at 
least as strict. - Doesn’t refer elsewhere to any attributes that the matview doesn’t contain.
(3) Always replace the corresponding query parts with the matview
(i.e., assume that the cost will always be lower than performing the
original query).
(4) If multiple matviews fit in step 3, try them all (and use the one
that yields the lower total cost).
(5) Always replace any aggregation with the corresponding
aggregation-results (if they exist) from the matview.

That doesn’t sound as if it would make planning time explode that much
(except, because of step 4, when there are many matviews that contain
overlapping sets of joined tables, and a query joins over the union of
those sets, *and* the replaceable-by-matviews parts are in
subqueries). It could even decrease it significantly (e.g., in the
case where a bunch of joins would be replaced with a scan of a
matview).

Also, I suppose that once such functionality exists, application
writers would be more inclined to write “heavy” queries that do lots
of aggregation even in an OLTP environment, of the kind that is these
days typically only done in OLAP environments.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Materialized views WIP patch

From
Tom Lane
Date:
Nicolas Barbier <nicolas.barbier@gmail.com> writes:
> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>:
>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>> I think that automatically using materialized views even when the
>>> query doesn’t mention them directly, is akin to automatically
>>> using indexes without having to mention them in the query.

>> Oh, I understand that concept perfectly well, I just wonder how
>> often it is useful in practice.

There's a much more fundamental reason why this will never happen, which
is that the query planner is not licensed to decide that you only want
an approximate and not an exact answer to your query.

If MVs were guaranteed always up-to-date, maybe we could think about
automatic use of them --- but that's a far different feature from what
Kevin has here.
        regards, tom lane



Re: Materialized views WIP patch

From
Josh Berkus
Date:
> There's a much more fundamental reason why this will never happen, which
> is that the query planner is not licensed to decide that you only want
> an approximate and not an exact answer to your query.

I think it would be worth talking about when someone wants to implement
it.  I'd imagine it would require setting a GUC, though, which would be
off by default for obvious reasosn.

> If MVs were guaranteed always up-to-date, maybe we could think about
> automatic use of them --- but that's a far different feature from what
> Kevin has here.

And of limited utility, as mentioned.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com> wrote:

> I'm still working on docs, and the changes related to the syntax
> change are still only lightly tested, but as far as I know, all is
> complete except for the docs.  I'm still working on those and
> expect to have them completed late today.  I'm posting this patch
> to allow a chance for final review of the code changes before I
> push.

Pushed.

I'll be keeping an eye on the buildfarm.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Craig Ringer
Date:
On 03/04/2013 08:27 AM, Josh Berkus wrote:
>> There's a much more fundamental reason why this will never happen, which
>> is that the query planner is not licensed to decide that you only want
>> an approximate and not an exact answer to your query.
> I think it would be worth talking about when someone wants to implement
> it.  I'd imagine it would require setting a GUC, though, which would be
> off by default for obvious reasosn.
I'm not a fan of this, even with a GUC. Imagine doing remote debugging
by email/phone. There are enough things to check already ("does your
application REALLY commit that transaction?") without also having to
deal with settings that can cause a potentially out of date view of the
data to be used without it being visible in the query its self.

I hate to even say it, but this is where a per-query [redacted] would be
good, so we could say in the query text that this query may use matviews
that are not perfectly up to date.

At this point it's all hand-waving anyway, since no feature to allow the
planner to automatically rewrite a subtree of a query to use a matview
instead exists.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services




Re: Materialized views WIP patch

From
Euler Taveira
Date:
On 03-03-2013 21:27, Josh Berkus wrote:
> I think it would be worth talking about when someone wants to implement
> it.  I'd imagine it would require setting a GUC, though, which would be
> off by default for obvious reasosn.
> 
-1. Why not adding another storage_parameter, say auto_refresh=on? Also,
that's another feature.

> And of limited utility, as mentioned.
> 
For a first release, that is fine as is. Let's not complicate a feature that
has been widely discussed in this development cycle.


--   Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/  PostgreSQL: Consultoria, Desenvolvimento,
Suporte24x7 e Treinamento
 



Re: Materialized views WIP patch

From
Simon Riggs
Date:
On 3 March 2013 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Nicolas Barbier <nicolas.barbier@gmail.com> writes:
>> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>:
>>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>>> I think that automatically using materialized views even when the
>>>> query doesn’t mention them directly, is akin to automatically
>>>> using indexes without having to mention them in the query.
>
>>> Oh, I understand that concept perfectly well, I just wonder how
>>> often it is useful in practice.
>
> There's a much more fundamental reason why this will never happen, which
> is that the query planner is not licensed to decide that you only want
> an approximate and not an exact answer to your query.
>
> If MVs were guaranteed always up-to-date, maybe we could think about
> automatic use of them --- but that's a far different feature from what
> Kevin has here.

Its not a different feature, its what most people expect a feature
called MV to deliver. That's not a matter of opinion, its simply how
every other database works currently - Oracle, Teradata, SQLServer at
least. The fact that we don't allow MVs to automatically optimize
queries is acceptable, as long as that is clearly marked in some way
to avoid confusion, and I don't mean buried on p5 of the docs. What we
have here is a partial implementation that can be improved upon over
next few releases. I hope anyone isn't going to claim that
"Materialized Views" have been implemented in the release notes in
this release, because unqualified that would be seriously misleading
and might even stifle further funding to improve things to the level
already implemented elsewhere. Just to reiterate, I fully support the
committing of this partial feature into Postgres in this release
because it will be a long haul to complete the full feature and what
we have here is a reasonable stepping stone to get there.

Transactionally up-yo-date MVs can be used like indexes in the
planner. The idea that this is impossible because of the permutations
involved is somewhat ridiculous; there is much published work on
optimising that and some obvious optimisations. Clearly that varies
according to the number of MVs and the number of tables they touch,
not the overall complexity of the query. The overhead is probably same
or less as partial indexes, which we currently think is acceptable. In
any case, if you don't wish that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in
the planner, if we gave the planner the "licence" it (clearly) lacks.
If using MV makes a two-hour query return in 1 minute, then using an
MV that is 15 minutes out of date is likely to be a win. The "licence"
is some kind of user parameter/option that specifies how stale an
answer a query can return. For many queries that involve averages and
sums, a stale or perhaps an approximate answer would hardly differ
anyway. So I think there is room somewhere there for a "staleness"
time specification by the user, allowing approximation.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> On 3 March 2013 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Nicolas Barbier <nicolas.barbier@gmail.com> writes:
>>> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>:
>>>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>>>>> I think that automatically using materialized views even when
>>>>> the query doesn’t mention them directly, is akin to
>>>>> automatically using indexes without having to mention them in
>>>>> the query.
>>
>>>> Oh, I understand that concept perfectly well, I just wonder
>>>> how often it is useful in practice.
>>
>> There's a much more fundamental reason why this will never
>> happen, which is that the query planner is not licensed to
>> decide that you only want an approximate and not an exact answer
>> to your query.
>>
>> If MVs were guaranteed always up-to-date, maybe we could think
>> about automatic use of them --- but that's a far different
>> feature from what Kevin has here.
>
> Its not a different feature, its what most people expect a
> feature called MV to deliver. That's not a matter of opinion, its
> simply how every other database works currently - Oracle,
> Teradata, SQLServer at least. The fact that we don't allow MVs to
> automatically optimize queries is acceptable, as long as that is
> clearly marked in some way to avoid confusion, and I don't mean
> buried on p5 of the docs. What we have here is a partial
> implementation that can be improved upon over next few releases.
> I hope anyone isn't going to claim that "Materialized Views" have
> been implemented in the release notes in this release, because
> unqualified that would be seriously misleading and might even
> stifle further funding to improve things to the level already
> implemented elsewhere. Just to reiterate, I fully support the
> committing of this partial feature into Postgres in this release
> because it will be a long haul to complete the full feature and
> what we have here is a reasonable stepping stone to get there.
>
> Transactionally up-yo-date MVs can be used like indexes in the
> planner. The idea that this is impossible because of the
> permutations involved is somewhat ridiculous; there is much
> published work on optimising that and some obvious optimisations.
> Clearly that varies according to the number of MVs and the number
> of tables they touch, not the overall complexity of the query.
> The overhead is probably same or less as partial indexes, which
> we currently think is acceptable. In any case, if you don't wish
> that overhead, don't use MVs.
>
> Non-transactionally up-to-date MVs could also be used like
> indexes in the planner, if we gave the planner the "licence" it
> (clearly) lacks. If using MV makes a two-hour query return in 1
> minute, then using an MV that is 15 minutes out of date is likely
> to be a win. The "licence" is some kind of user parameter/option
> that specifies how stale an answer a query can return. For many
> queries that involve averages and sums, a stale or perhaps an
> approximate answer would hardly differ anyway. So I think there
> is room somewhere there for a "staleness" time specification by
> the user, allowing approximation.

I don't think I disagree with any of what Simon says other than his
feelings about the planning cost.  Imagine that there are ten MVs
that might apply to a complex query, but some of them are mutually
exclusive, so there are a large number of permutations of MVs which
could be used to replace parts of the original query.  And maybe
some of base tables have indexes which could reduce execution cost
which aren't present in some or all of the MVs.  And each MV has a
number of indexes.  The combinatorial explosion of possible plans
would make it hard to constrain plan time without resorting to some
crude rules about what to choose.  That's not an unsolvable
problem, but I see it as a pretty big problem.

I have no doubt that someone could take a big data warehouse and
add one or two MVs and show a dramatic improvement in the run time
of a query.  Almost as big as if the query were rewritten to usee
the MV directly.  It would make for a very nice presentation, and
as long as they are used sparingly this could be a useful tool for
a data warehouse environment which is playing with alternative ways
to optimize slow queries which pass a lot of data.  In other
environments, I feel that it gets a lot harder to show a big win.

The good news is that it sounds like we agree on the ideal
long-term feature set.  I'm just a lot more excited, based on the
use-cases I've seen, about the addition of incremental updates than
substituting MVs into query plans which reference the underlying
tables.  Perhaps that indicates a chance to the final feature set
sooner, through everyone scratching their own itches.  :-)

And we both seem to feel that some system for managing acceptable
levels of MV "freshness" is a necessary feature in order to go very
much further.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Simon Riggs
Date:
On 5 March 2013 12:15, Kevin Grittner <kgrittn@ymail.com> wrote:

> I don't think I disagree with any of what Simon says other than his
> feelings about the planning cost.  Imagine that there are ten MVs
> that might apply to a complex query, but some of them are mutually
> exclusive, so there are a large number of permutations of MVs which
> could be used to replace parts of the original query.  And maybe
> some of base tables have indexes which could reduce execution cost
> which aren't present in some or all of the MVs.  And each MV has a
> number of indexes.  The combinatorial explosion of possible plans
> would make it hard to constrain plan time without resorting to some
> crude rules about what to choose.  That's not an unsolvable
> problem, but I see it as a pretty big problem.

If we are proposing that we shouldn't try to optimise because its not
usefully solvable, then I would disagree.

If we are saying that more plans are possible with MVs, then I'd say,
yes there *could* be - that's the one of the purposes. That represents
more options for optimisation and we should be happy, not sad about
that. Yes, we would need some thought to how those potential
optimisations can be applied without additional planning cost, but I
see that as a long term task, not a problem. The question is will the
potential for additional planning time actually materialise into a
planning problem? (See below).

> I have no doubt that someone could take a big data warehouse and
> add one or two MVs and show a dramatic improvement in the run time
> of a query.  Almost as big as if the query were rewritten to usee
> the MV directly.  It would make for a very nice presentation, and
> as long as they are used sparingly this could be a useful tool for
> a data warehouse environment which is playing with alternative ways
> to optimize slow queries which pass a lot of data.  In other
> environments, I feel that it gets a lot harder to show a big win.

Are there realistically going to be more options to consider? In
practice, no, because in most cases we won't be considering both MVs
and indexes.

Splatting MVs on randomly won't show any more improvement than
splatting indexes on randomly helps. Specific optimisations help in
specific cases only. And of course, adding extra data structures that
have no value certainly does increase planning time. Presumably we'd
need some way of seeing how frequently MVs were picked, so we could
drop unused MVs just like we can indexes.

* Indexes are great at speeding up various kinds of search queries. If
you don't have any queries like that, they help very little.

* MVs help in specific and restricted use cases, but can otherwise be
thought of as a new kind of index structure. MVs help with joins and
aggregations, so if you don't do much of that, you'll see no benefit.

That knowledge also allows us to develop heuristics for sane
optimisation. If the MV has a GROUP BY in it, and a query doesn't,
then it probably won't help much to improve query times. If it
involves a join you aren't using, then that won't help either. My
first guess would be that we don't even bother looking for MV plans
unless it has an aggregated result, or a large scale join. We do
something similar when we look for plans that use indexes when we have
appropriate quals - no quals, no indexes.

As a result, I don't see MVs increasing planning times for requests
that would make little or no use of them. There will be more planning
time on queries that could make use of them and that is good because
we really care about that.

Sure, a badly written planner might cost more time than it saves. All
of this work requires investment from someone with the time and/or
experience to make a good go at it. I'm not pushing Tom towards it,
nor anyone else, but I do want to see the door kept open for someone
to do this when possible (i.e. not GSoC).

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Josh Berkus
Date:
Simon, Kevin, all:

Actually, there was already an attempt at automated MV query planning as
a prior university project.  We could mine that for ideas.

Hmmm.  I thought it was on pgfoundry, but it's not.  Does anyone have
access to ACM databases etc. so they could search for this?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I don't think I disagree with any of what Simon says other than his
> feelings about the planning cost.  Imagine that there are ten MVs
> that might apply to a complex query, but some of them are mutually
> exclusive, so there are a large number of permutations of MVs which
> could be used to replace parts of the original query.  And maybe
> some of base tables have indexes which could reduce execution cost
> which aren't present in some or all of the MVs.  And each MV has a
> number of indexes.  The combinatorial explosion of possible plans
> would make it hard to constrain plan time without resorting to some
> crude rules about what to choose.  That's not an unsolvable
> problem, but I see it as a pretty big problem.

I'm not sure I agree.  Suppose you have a query like SELECT * FROM a
INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE <some
stuff>.  The query planner will construct paths for scans on a, b, and
c.  Then it will construct joinrels for (a b), (a c), (b c), and
eventually (a b c) and calculate a set of promising paths for each of
them.  If there is a materialized view available for one of those
joinrels, all we really need to do is add the possible paths for
scanning that materialized view to the joinrel.  They'll either be
better than the existing paths, or they won't.  If they're better, the
paths that otherwise would have gotten chosen will get kicked out.  If
they're worse, the materialized-view paths will get kicked out.
Either way, we don't have any more paths than we would have otherwise
- so no combinatorial explosion.

There is one case where we might end up with more paths than we had
before.  Suppose there's a materialized view on the query SELECT a.x,
a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the
users enters just that query.  Suppose further that the materialized
view has an index on column z, but table a does not.  In that case,
the best paths for the joinrel (a b) not involving the materialized
view will be an unordered path, but we could scan the materialized
view using the index and so will have a path that is ordered by a.z to
add to the joinrel.  This path will stick around even if it's more
expensive than the unordered path because we know it avoids a final
sort.  So in that case we do have more paths, but they are potentially
useful paths, so I don't see that as a problem.

It seems to me that the tricky part of this is not that it might add a
lot more paths (because I don't think it will, and if it does I think
they're useful paths), but that figuring out whether or not a
materialized view matches any particular joinrel might be expensive.
I think we're going to need a pretty accurate heuristic for quickly
discarding materialized views that can't possibly be relevant to the
query as a whole, or to particular joinrels.  There are a couple of
possible ways to approach that.  The most manual of these is probably
to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE
MATERIALIZED y, where the user has to explicitly ask for materialized
views to be considered, or else they aren't.  That sort of
fine-grained control might have other benefits as well.

I think a more automated solution is also possible, if we want it.
For a materialized view to match a query, all of the baserels in the
materialized view must also be present in the query.  (Actually, there
are situations where this isn't true; e.g. the materialized view has
an extra table, but it's joined in a way that could be pruned away by
the join removal code, but I think we could ignore such
somewhat-pathological cases at least initially.)  It seems to me that
if we could figure out a very-cheap way to throw away all of the
materialized views that don't pass that basic test, we'd be reasonably
close to a workable solution.  A database with tons of materialized
views defined on the same set of target relations might still have
some planning time problems, but so might a database with tons of
indexes.

In that regard, what I was thinking about is to use something sort of
like a Bloom filter.  Essentially, produce a "fingerprint" for each
materialized view.  For the sake of argument, let's say that the
fingerprint is a 64-bit integer, although it could be a bit string of
any length.  To construct the fingerprint, hash the OID of each
relation involved in the view onto a bit position between 0 and 63.
Set the bits for all relations involved in the materialized view.
Then, construct a fingerprint for the query in the same way.  Any
materialized view where (matview_fingerprint & query_fingerprint) !=
matview_fingerprint needn't be considered; moreover, for a given
joinrel, any materialized view where matview_fingerprint !=
joinrel_fingerprint (computed using the same scheme) needn't be
considered.  Of course, a matching fingerprint doesn't mean that the
materialized view matches, or even necessarily that it involves the
correct set of relations, so there's a lot more double-checking that
has to be done afterwards before you can decide that the view is in
fact a match, but at least it's a fast way to throw away some large
percentage of materialized views that are definitely NOT relevant to
the query, and only do more detailed evaluation on the ones that might
be matches.  The chances of false positives can be reduced if you care
to make the fingerprints bigger (and thus more expensive to compare).

All that having been said, it's hard for me to imagine that anyone
really cares about any of this until we have an incremental update
feature, which right now we don't.  Actually, I'm betting that's going
to be significantly harder than automatic-query-rewrite, when all is
said and done.  Automatic-query-rewrite, if and when we get it, will
not be easy and will require a bunch of work from someone with a good
understanding of the planner, but it strikes me as the sort of thing
that might work out to one large project and then it's done.  Whereas,
incremental update sounds to me like a series of projects over a
series of releases targeting various special cases, where we can
always point to some improvements vs. release N-1 but we're never
actually done and able to move on to the next thing.  As a roadmap
goes, I think that's OK.  Even a reasonably simplistic and partial
implementation of incremental update will benefit a lot of users.  But
in terms of relative difficulty, it's not at all obvious to me that
that's the easier part of the project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Josh Berkus <josh@agliodbs.com> wrote:

There is no shortage of literature on the topic, although any
papers from the ACM could certainly be of interest due to the
general high quality of papers published there.  Adding anything
like this to 9.3 is clearly out of the question, though, so I
really don't want to spend time researching this now, or
encouraging others to do so until after we have a 9.3 release
candidate.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> All that having been said, it's hard for me to imagine that
> anyone really cares about any of this until we have an
> incremental update feature, which right now we don't.

These are actually independent of one another, as long as we nail
down how we're determining "freshness" -- which is probably needed
for either.  Someone who's immersed in tuning long-running DW
queries might be interested in this before incremental update.
(They might load the data once per month, so refreshing the MVs as
a step in that process might be cheaper than incrementally
maintaining them.)  Someone could base "freshness" on
pg_relation_is_scannable() and start working on automatic query
rewrite right now, if they wanted to.

> Actually, I'm betting that's going to be significantly harder
> than automatic-query-rewrite, when all is said and done.
> Automatic-query-rewrite, if and when we get it, will not be easy
> and will require a bunch of work from someone with a good
> understanding of the planner, but it strikes me as the sort of
> thing that might work out to one large project and then it's
> done.

I still think we're going to hit the wall on planning time under
certain circumstances and need to tweak that over the course of
several releases, but now is not the time to get into the details
of why I think that.  We've spent way too much time on it already
for the point we're at in the 9.3 cycle.  I've kept my concerns
hand-wavy on purpose, and am trying hard to resist the temptation
to spend a lot of time demonstrating the problems.

> Whereas, incremental update sounds to me like a series of
> projects over a series of releases targeting various special
> cases, where we can always point to some improvements vs. release
> N-1 but we're never actually done

Exactly.  I predict that we will eventually have some special sort
of trigger for maintaining MVs based on base table changes to
handle the ones that are just too expensive (in developer time or
run time) to fully automate.  But there is a lot of low-hanging
fruit for automation.

> Even a reasonably simplistic and partial implementation of
> incremental update will benefit a lot of users.

Agreed.

> But in terms of relative difficulty, it's not at all obvious to
> me that that's the easier part of the project.

I totally agree that getting something working to use MVs in place
of underlying tables is not all that different or more difficult
than using partial indexes.  I just predict that we'll get a lot of
complaints about cases where it results in worse performance and
we'll need to deal with those issues.  I don't seem that as being
brain surgery; just a messy matter of trying to get this pretty
theory to work well in the real world -- probably using a bunch of
not-so-elegant heuristics.  And in the end, the best you can hope
for is performance not noticeably worse than you would get if you
modified your query to explicitly use the MV(s) -- you're just
saving yourself the rewrite.  Well, OK, there is the point that,
(like indexes) if you run the query which hits the base tables with
different parameters, and a new plan is generated each time, it
might pick different MVs or exclude them as is most efficient for
the given parameters.  That's the Holy Grail of all this.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Nicolas Barbier
Date:
2013/3/5 Robert Haas <robertmhaas@gmail.com>:

> All that having been said, it's hard for me to imagine that anyone
> really cares about any of this until we have an incremental update
> feature, which right now we don't.  Actually, I'm betting that's going
> to be significantly harder than automatic-query-rewrite, when all is
> said and done.

I agree.

E.g., things such as keeping a matview consistent relative to changes
applied to the base tables during the same transaction, might be
mightily difficult to implement in a performant way. OTOH, matviews
that can only be used for optimization if their base tables were not
changed “too recently” (e.g., by transactions that are still in
flight, including the current transaction), are probably kind of
useful in themselves as long as those base tables are not updated all
the time.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Materialized views WIP patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> I don't think I disagree with any of what Simon says other than his
>> feelings about the planning cost.

> I'm not sure I agree.  Suppose you have a query like SELECT * FROM a
> INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE <some
> stuff>.  The query planner will construct paths for scans on a, b, and
> c.  Then it will construct joinrels for (a b), (a c), (b c), and
> eventually (a b c) and calculate a set of promising paths for each of
> them.  If there is a materialized view available for one of those
> joinrels, all we really need to do is add the possible paths for
> scanning that materialized view to the joinrel.

That only works to the extent that a materialized view can be described
by a path.  My impression is that most of the use-cases for MVs will
involve aggregates or similar data reduction operators, and we don't
currently implement anything about aggregates at the Path level.
Arguably it would be useful to do so; in particular, we could get rid
of the currently hard-wired mechanism for choosing between sorted and
hashed aggregation, and perhaps there'd be a less grotty way to deal
with index-optimized MIN/MAX aggregates.  But there's a great deal to do
to make that happen, and up to now I haven't seen any indication that it
would do much except add overhead.

FWIW, my opinion is that doing anything like this in the planner is
going to be enormously expensive.  Index matching is already pretty
expensive, and that has the saving grace that we only do it once per
base relation.  Your sketch above implies trying to match to MVs once
per considered join relation, which will be combinatorially worse.
Even with a lot of sweat over reducing the cost of the matching, it
will hurt.

> All that having been said, it's hard for me to imagine that anyone
> really cares about any of this until we have an incremental update
> feature, which right now we don't.

Agreed.  Even if we're willing to have an "approximate results are OK"
GUC (which frankly strikes me as a horrid idea), people would certainly
not be willing to turn it on without some guarantee as to how stale the
results could be.
        regards, tom lane



Re: Materialized views WIP patch

From
Nicolas Barbier
Date:
2013/3/5 Kevin Grittner <kgrittn@ymail.com>:

> Exactly.  I predict that we will eventually have some special sort
> of trigger for maintaining MVs based on base table changes to
> handle the ones that are just too expensive (in developer time or
> run time) to fully automate.  But there is a lot of low-hanging
> fruit for automation.

I think it would be totally OK to restrict the possible definitions
for matviews that can be maintained fully incrementally to something
like:

SELECT attributes and aggregations FROM trivial joins WHERE trivial
condition GROUP BY attributes;

Those definitions are the most useful for optimizing the things that
matviews are good at (joins and aggregation).

Nicolas

PS. Sorry for having fired off this discussion that obviously doesn’t
really relate to the current patch.

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:

> PS. Sorry for having fired off this discussion that obviously
> doesn’t really relate to the current patch.

I know it's hard to resist.  While I think there will be a number
of people for whom the current patch will be a convenience and will
therefore use it, it is hard to look at what's there and *not* go
"if only it also..."

Perhaps it would be worth looking for anything in the patch that
you think might be painting us into a corner where it would be hard
to do all the other cool things.  While it's late enough in the
process that changing anything like that which you find would be
painful, it might be a lot more painful later if we release without
doing something about it.  My hope, of course, is that you won't
find any such thing.  With this patch I've tried to provide a
minimal framework onto which these other things can be bolted.
I've tried hard not to do anything which would make it hard to
extend, but new eyes may see something I missed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Josh Berkus
Date:
On 03/05/2013 01:09 PM, Kevin Grittner wrote:
> Josh Berkus <josh@agliodbs.com> wrote:
> 
> There is no shortage of literature on the topic, although any
> papers from the ACM could certainly be of interest due to the
> general high quality of papers published there.  Adding anything
> like this to 9.3 is clearly out of the question, though, so I
> really don't want to spend time researching this now, or
> encouraging others to do so until after we have a 9.3 release
> candidate.

Good point.

Just FYI: once we start work on 9.4, some university team got planner
stuff for matviews working with postgres, using version 8.0 or something.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Materialized views WIP patch

From
Tatsuo Ishii
Date:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>
>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
>
> Given the short time, I left out the "[, ...]".  If people think
> that is important to get into this release, a follow-on patch might
> be possible.
>
>> Barring objections, I will use the above and push tomorrow.
>
> I'm still working on docs, and the changes related to the syntax
> change are still only lightly tested, but as far as I know, all is
> complete except for the docs.  I'm still working on those and
> expect to have them completed late today.  I'm posting this patch
> to allow a chance for final review of the code changes before I
> push.

Was the remaining work on docs done? I would like to test MVs and am
waiting for the docs completed.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: Materialized views WIP patch

From
Simon Riggs
Date:
On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> FWIW, my opinion is that doing anything like this in the planner is
> going to be enormously expensive.  Index matching is already pretty
> expensive, and that has the saving grace that we only do it once per
> base relation.  Your sketch above implies trying to match to MVs once
> per considered join relation, which will be combinatorially worse.
> Even with a lot of sweat over reducing the cost of the matching, it
> will hurt.

As we already said: no MVs => zero overhead => no problem. It costs in
the cases where time savings are possible and not otherwise. The type
of queries and their typical run times are different to the OLTP case,
so any additional planning time is likely to be acceptable. I'm sure
we can have a deep disussion about how to optimise the planner for
this; I'm pretty sure there are reasonable answers to the not-small
difficulties along that road.

Most importantly, I want to make sure we don't swing the door shut on
improvements here, especially if you (Tom) are not personally
convinced enough to do the work yourself.

Making transactional MVs work would be in part justified by the
possible existence of automatic lookaside planning, so yes, the two
things are not linked but certainly closely related.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, my opinion is that doing anything like this in the planner is
>> going to be enormously expensive.

> As we already said: no MVs => zero overhead => no problem.

Well, in the first place that statement is false on its face: we'll
still spend cycles looking for relevant MVs, or at least maintaining a
complexly-indexed cache that helps us find out that there are none in
a reasonable amount of time.  In the second place, even if it were
approximately true it wouldn't help the people who were using MVs.

> It costs in
> the cases where time savings are possible and not otherwise.

And that is just complete nonsense: matching costs whether you find a
match or not.  Could we have a little less Pollyanna-ish optimism and
a bit more realism about the likely cost of such a feature?
        regards, tom lane



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Tatsuo Ishii <ishii@postgresql.org> wrote:

> Was the remaining work on docs done? I would like to test MVs and
> am waiting for the docs completed.

I think they are done.  If you notice anything missing or in need
of clarification please let me know.  At this point missing docs
would be a bug in need of a fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Peter Eisentraut
Date:
Kevin,

I haven't seen a reply to this.  Were you able to give my notes below
any consideration?


On 2/15/13 12:44 PM, Peter Eisentraut wrote:
> On 1/25/13 1:00 AM, Kevin Grittner wrote:
>> New patch rebased, fixes issues raised by Thom Brown, and addresses
>> some of your points.
> 
> This patch doesn't apply anymore, so I just took a superficial look.  I
> think the intended functionality and the interfaces look pretty good.
> Documentation looks complete, tests are there.
> 
> I have a couple of notes:
> 
> * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED.
> It might be better to use that as well then.
> 
> * You use fields named relkind in the parse nodes, but they don't
> actually contain relkind values, which is confusing.  I'd just name the
> field is_matview or something.
> 
> * More generally, I wouldn't be so fond of combining the parse handling
> of CREATE TABLE AS and CREATE MATERIALIZED VIEW.  They are similar, but
> then again so are a lot of other things.
> 
> * Some of the terminology is inconsistent.  A materialized view is
> sometimes called valid, populated, or built, with approximately the same
> meaning.  Personally, I would settle on "built", as per above, but it
> should be one term only.
> 
> * I find the name of the relisvalid column a bit confusing.  Especially
> because it only applies to materialized views, and there is already a
> meaning of "valid" for indexes.  (Recall that indexes are also stored in
> pg_class, but they are concerned about indisvalid.)  I would name it
> something like relmvbuilt.
> 
> 
> Btw., half of the patch seems to consist of updating places referring to
> relkind.  Is something wrong with the meaning of relkind that this sort
> of thing is required?  Maybe these places should be operating in terms
> of features, not accessing relkind directly.
> 
> 
> 




Re: Materialized views WIP patch

From
Greg Stark
Date:
On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> All that having been said, it's hard for me to imagine that anyone
> really cares about any of this until we have an incremental update
> feature, which right now we don't.  Actually, I'm betting that's going
> to be significantly harder than automatic-query-rewrite, when all is
> said and done.  Automatic-query-rewrite, if and when we get it, will
> not be easy and will require a bunch of work from someone with a good
> understanding of the planner, but it strikes me as the sort of thing
> that might work out to one large project and then it's done.  Whereas,
> incremental update sounds to me like a series of projects over a
> series of releases targeting various special cases, where we can
> always point to some improvements vs. release N-1 but we're never
> actually done and able to move on to the next thing.  As a roadmap
> goes, I think that's OK.  Even a reasonably simplistic and partial
> implementation of incremental update will benefit a lot of users.  But
> in terms of relative difficulty, it's not at all obvious to me that
> that's the easier part of the project.

While true that's true for a lot of Postgres features. The only ones
that are one-shot projects are buried deep in the internals. Anything
with UI implications inevitably has limitations and then other people
come along and and work on removing or extending those features.

I do agree with Tom though -- the most frequently asked for
materialized view in the past has always been "select count(*) from
tab". People assume we already do this and are surprised when we
don't. The cookie cutter solution for it is basically exactly what a
incrementally updated materialized view  solution would look like
(with the queue of updates with transacion information that are
periodically flattened into the aggregate). Rewriting this might be a
bit tricky and require heuristics to determine just how much work to
expend trying to match materialized views, this type of view would be
where most of the win would be.

I also can't see implementing query rewriting for
non-transactionally-accurate materialized views. If people want a
snapshot of the data that may be out of date that's great. I can tons
of use cases for that. But then surely they won't be surprised to have
to query the snapshot explicitly. If can't see going to all this
trouble to implement transactions and snapshots and wal logging and so
on and then silently rewriting queries to produce data that is not up
to date. I think users would be surprised to find bog-standard SQL
occasionally producing "incorrect" results.

That said, there are cases where snapshots might be up to date even
though we don't implement any incremental updates. If the underlying
data is read-only or hasn't received any update commits since the
snapshot was taken then it might still be useful. There are tons of
ETL applications where you load the data once and then build MVs for
it and  never touch the underlying data again.

-- 
greg



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com> wrote:
> Tatsuo Ishii <ishii@postgresql.org> wrote:
>
>> Was the remaining work on docs done? I would like to test MVs and
>> am waiting for the docs completed.
>
> I think they are done.  If you notice anything missing or in need
> of clarification please let me know.  At this point missing docs
> would be a bug in need of a fix.

I decided to take another pass through to try to spot anyplace I
might have missed.  I found that I had missed documenting the new
pg_matviews system view, so I have just pushed that.

I also think that something should be done about the documentation
for indexes.  Right now that always refers to a "table".  It would
clearly be awkward to change that to "table or materialized view"
everywhere.  I wonder if most of thosse should be changed to
"relation" with a few mentions that the relation could be a table
or a materialized view, or whether some less intrusive change would
be better.  Opinions welcome.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
"David E. Wheeler"
Date:
On Mar 6, 2013, at 1:51 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

> I also think that something should be done about the documentation
> for indexes.  Right now that always refers to a "table".  It would
> clearly be awkward to change that to "table or materialized view"
> everywhere.  I wonder if most of thosse should be changed to
> "relation" with a few mentions that the relation could be a table
> or a materialized view, or whether some less intrusive change would
> be better.  Opinions welcome.

Isn’t a materialized view really just a table that gets updated periodically? And isn’t a non-matierialized view also
thoughtof as a “relation”? 

If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the
termincludes materialized views (and excludes foreign tables). 

Best,

David


Re: Materialized views WIP patch

From
Tatsuo Ishii
Date:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> Tatsuo Ishii <ishii@postgresql.org> wrote:
>>
>>> Was the remaining work on docs done? I would like to test MVs and
>>> am waiting for the docs completed.
>>
>> I think they are done.  If you notice anything missing or in need
>> of clarification please let me know.  At this point missing docs
>> would be a bug in need of a fix.

Ok.

> I decided to take another pass through to try to spot anyplace I
> might have missed.  I found that I had missed documenting the new
> pg_matviews system view, so I have just pushed that.
>
> I also think that something should be done about the documentation
> for indexes.  Right now that always refers to a "table".  It would
> clearly be awkward to change that to "table or materialized view"
> everywhere.  I wonder if most of thosse should be changed to
> "relation" with a few mentions that the relation could be a table
> or a materialized view, or whether some less intrusive change would
> be better.  Opinions welcome.

You might want to add small description about MVs to Tutorial
documentation "3.2 Views". Here is the current description of views in
the doc.

---------------------------------------------------------------------
3.2. Views

Refer back to the queries in Section 2.6. Suppose the combined listing
of weather records and city location is of particular interest to your
application, but you do not want to type the query each time you need
it. You can create a view over the query, which gives a name to the
query that you can refer to like an ordinary table:

CREATE VIEW myview AS   SELECT city, temp_lo, temp_hi, prcp, date, location       FROM weather, cities       WHERE city
=name; 

SELECT * FROM myview;

Making liberal use of views is a key aspect of good SQL database
design. Views allow you to encapsulate the details of the structure of
your tables, which might change as your application evolves, behind
consistent interfaces.

Views can be used in almost any place a real table can be
used. Building views upon other views is not uncommon.
---------------------------------------------------------------------
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: Materialized views WIP patch

From
Simon Riggs
Date:
On 6 March 2013 14:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> FWIW, my opinion is that doing anything like this in the planner is
>>> going to be enormously expensive.
>
>> As we already said: no MVs => zero overhead => no problem.
>
> Well, in the first place that statement is false on its face: we'll
> still spend cycles looking for relevant MVs, or at least maintaining a
> complexly-indexed cache that helps us find out that there are none in
> a reasonable amount of time.  In the second place, even if it were
> approximately true it wouldn't help the people who were using MVs.

We can store info in the relcache, and reduce such a lookup to a
simple if test in the planner. Populating the cache would be easy
enough, approx same overhead as deriving list of constraints for the
relcache.

If you were using MVs, there are further heuristics to apply. MVs come
in various shapes, so we can assess whether they use aggregates,
joins, filters etc and use that for a general match against a query. I
don't see the need for complex assessments in every case.


>> It costs in
>> the cases where time savings are possible and not otherwise.
>
> And that is just complete nonsense: matching costs whether you find a
> match or not.  Could we have a little less Pollyanna-ish optimism and
> a bit more realism about the likely cost of such a feature?

It's not a trivial feature; this is a lot of work. But it can be done
efficiently, without significant effect on other workloads. If that
really were to be true, then enable_lookaside = off can be the
default, just as we have for another costly planning feature,
constraint_exclusion.

Matview lookaside is the next-best-action for further work on the
planner, AFAICS. Correctly optimised query parallelism is harder,
IMHO.

What I'm hearing at the moment is "please don't make any changes in my
area" or "don't climb the North face". Considering the rather high bar
to being able to do this effectively, I do understand your interest in
not having your/our time wasted by casual attempts to approach the
problem, but I don't want to slam the door on a serious attempt (2
year project, 1+ man year effort).

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
David E. Wheeler <david@justatheory.com> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>
>> I also think that something should be done about the
>> documentation for indexes.  Right now that always refers to a
>> "table".  It would clearly be awkward to change that to "table
>> or materialized view" everywhere.  I wonder if most of thosse
>> should be changed to "relation" with a few mentions that the
>> relation could be a table or a materialized view, or whether
>> some less intrusive change would be better.  Opinions welcome.
>
> Isn’t a materialized view really just a table that gets updated
> periodically?

Not exactly.  It is a relation which has some characteristics of a
view (including an entry in pg_rewrite exactly like that for a
view) and some characteristics of a table (including a heap and
optional indexes).  Whether it looks more like a table or more like
a view depends on how you tilt your head.  You could just as easily
say that it is really just a view which periodically caches its
results on disk.  They really are "their own thing".  As future
releases add more subtle "freshness" concepts, incremental updates,
and query rewrite that unique identity will become even more
conspicuous, I think.

> And isn’t a non-matierialized view also thought of as a
> “relation”?

Yes.  Tables, views, and materialized views are all relations.

> If the answer to both those questions is “yes,” I think the term
> should remain “table,” with a few mentions that the term includes
> materialized views (and excludes foreign tables).

And if the answers are "not exactly" and "yes"?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
"David E. Wheeler"
Date:
On Mar 7, 2013, at 7:55 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> If the answer to both those questions is “yes,” I think the term
>> should remain “table,” with a few mentions that the term includes
>> materialized views (and excludes foreign tables).
>
> And if the answers are "not exactly" and "yes"?

I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should
beassumed to represent tables and matviews, and otherwise required disambiguations. 

Trying to make the least possible work for you here. :-)

David




Re: Materialized views WIP patch

From
Nicolas Barbier
Date:
2013/3/5 Kevin Grittner <kgrittn@ymail.com>:

> Perhaps it would be worth looking for anything in the patch that
> you think might be painting us into a corner where it would be hard
> to do all the other cool things.  While it's late enough in the
> process that changing anything like that which you find would be
> painful, it might be a lot more painful later if we release without
> doing something about it.  My hope, of course, is that you won't
> find any such thing.  With this patch I've tried to provide a
> minimal framework onto which these other things can be bolted.
> I've tried hard not to do anything which would make it hard to
> extend, but new eyes may see something I missed.

(Without having looked at the patch, or even the documentation :-/.)

I think that something that might prove useful is the following:
Keeping in mind the possibility of storing something in the matview’s
heap that doesn’t correspond to what a SELECT * FROM matview would
yield (i.e., the “logical content”). The transformation could be
performed by an INSTEAD rule (similar to how a view is expanded to its
definition, a reference to a matview would expand to its heap content
transformed to the “logical content”).

(Note that I don’t have any reason to believe that the current
implementation would make this more difficult than it should be.)

<ridiculously long rationale for the previous>

(All the following requires making matviews (inter- and intra-)
transactionally up-to-date w.r.t. their base tables at the moment of
querying them. I don’t deal with approximate results, however useful
that might be.)

I think that the possibility of optimizing COUNT(*) (see mail by Greg
Stark in this thread with “the queue of updates with transacion
information that are periodically flattened into the aggregate”) can
be generalized to generic aggregation that way. The idea would be that
a transaction that adds (or deletes or updates) a row in a base table
causes a “delta” row version in the matview. Selecting from the
matview then merges these deltas into one value (for each row that is
logically present in the matview). Every once in a while (or rather
quite often, if the base tables change often), a VACUUM-like clean-up
operations must be run to merge all rows that are “old enough” (i.e.,
whose transactions are not in flight anymore).

Example of trivial aggregation matview weight_per_kind defined as:

SELECT kind, SUM(weight) FROM fruit GROUP BY kind;

The matview would then physically contain rows such as:

xmin, xmax, kind, weight
1000, 0, 'banana', 123
1000, 0, 'apple', 1
1001, 0, 'banana', 2
1002, 0, 'banana', -3

Which means:

* tx 1000 probably performed a clean-up operation and merged a bunch
of banana rows together to yield 123; it also inserted an apple of
weight 1.
* tx 1001 inserted a banana of weight 2. Any clean-up operation coming
by could not merge the 2 into the first row, as long as tx 1000 is in
flight. Otherwise, it would yield 125; physically this would mean
adding a 125 row, marking the 123 and 2 rows as deleted, and then
waiting for VACUUM to remove them).
* tx 1002 deleted a banana with weight 3.

The result of a SELECT * FROM weight_per_kind; would actually execute
SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY
kind;

This would result, for tx 1001 (assuming tx 1000 committed and our
snapshot can see it), in:

kind, weight
'banana', 125
'apple', 1

(The -3 is not taken into account, because it is not visible to tx 1001.)

The operator to use at the location of SUM_merge is something that
merges multiple aggregation results (plus results that represent some
kind of “negation”) together. For SUM, it would be SUM itself and the
negation would be numerical negation. There might also be some kind of
“difference” concept used for UPDATE: When updating a weight from 4 to
3, the difference would be -1. Those additional properties could
optionally be added to the definition of each aggregation function; It
must be done for each function that you want to use in such a way.

Other aggregation functions such as AVG would require storing the SUM
+ number of rows in the matview (otherwise two AVGs could not be
merged); again a difference between the heap and the logical content.
Functions such as MIN and MAX are more difficult to fit in this
framework: I can only see how it would work if row deletion were not
allowed (which might still be a valid use-case); luckily, I think MIN
and MAX are not the typical things for which you would want to use
matviews, because quick computation can typically be done directly
using the base tables.

This whole thing would result in incremental updates of
aggregation-matviews that don’t require physical serialization of the
transactions that update the base tables and that query the matview,
which other models (that depend on correspondence between the heap and
logical content of matviews) would probably require.

And that’s where I stop rambling because nobody gets this far anyway,
and I urgently need some sleep :-).

</ridiculously long rationale>

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Materialized views WIP patch

From
Bruce Momjian
Date:
On Wed, Mar  6, 2013 at 09:16:59AM -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> FWIW, my opinion is that doing anything like this in the planner is
> >> going to be enormously expensive.
> 
> > As we already said: no MVs => zero overhead => no problem.
> 
> Well, in the first place that statement is false on its face: we'll
> still spend cycles looking for relevant MVs, or at least maintaining a
> complexly-indexed cache that helps us find out that there are none in
> a reasonable amount of time.  In the second place, even if it were
> approximately true it wouldn't help the people who were using MVs.
> 
> > It costs in
> > the cases where time savings are possible and not otherwise.
> 
> And that is just complete nonsense: matching costs whether you find a
> match or not.  Could we have a little less Pollyanna-ish optimism and
> a bit more realism about the likely cost of such a feature?

Should we add this to the TODO list as a possibility?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Materialized views WIP patch

From
Bruce Momjian
Date:
On Tue, Mar  5, 2013 at 08:50:39AM +0000, Simon Riggs wrote:
> Its not a different feature, its what most people expect a feature
> called MV to deliver. That's not a matter of opinion, its simply how
> every other database works currently - Oracle, Teradata, SQLServer at
> least. The fact that we don't allow MVs to automatically optimize

Good points.

> queries is acceptable, as long as that is clearly marked in some way
> to avoid confusion, and I don't mean buried on p5 of the docs. What we
> have here is a partial implementation that can be improved upon over
> next few releases. I hope anyone isn't going to claim that
> "Materialized Views" have been implemented in the release notes in
> this release, because unqualified that would be seriously misleading
> and might even stifle further funding to improve things to the level
> already implemented elsewhere. Just to reiterate, I fully support the
> committing of this partial feature into Postgres in this release
> because it will be a long haul to complete the full feature and what
> we have here is a reasonable stepping stone to get there.
> 
> Transactionally up-yo-date MVs can be used like indexes in the
> planner. The idea that this is impossible because of the permutations
> involved is somewhat ridiculous; there is much published work on
> optimising that and some obvious optimisations. Clearly that varies
> according to the number of MVs and the number of tables they touch,
> not the overall complexity of the query. The overhead is probably same
> or less as partial indexes, which we currently think is acceptable. In
> any case, if you don't wish that overhead, don't use MVs.

While you are right that automatically using materialized views is like
the optimizer choosing partial indexes, we actually already have
auto-selection of row-level materialized views with expression indexes
and index-only scans.  When you do the insert or update, the indexed
function is called and the value stored in the index.  If you later
query the function call, we can pull the value right from the index. 
This, of course, is a very crude definition of materialized view, but it
seems relevant.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Materialized views WIP patch

From
Robert Haas
Date:
On Thu, Mar 7, 2013 at 12:14 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Mar 7, 2013, at 7:55 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> If the answer to both those questions is “yes,” I think the term
>>> should remain “table,” with a few mentions that the term includes
>>> materialized views (and excludes foreign tables).
>>
>> And if the answers are "not exactly" and "yes"?
>
> I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term
shouldbe assumed to represent tables and matviews, and otherwise required disambiguations. 

This ship has already sailed.  There are plenty of places where
operations apply to a subset of the relation types that exist today,
and we either list them out or refer to "relations" generically.
Changing that would require widespread changes to both the
documentation and existing error message text.  We cannot decide that
"table" now means "table or materialized view" any more than we can
decide that it means "table or foreign table", as was proposed around
the time those changes went in.  Yeah, it's more work, and it's a
little annoying, but it's also clear.  Nothing else is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Noah Misch
Date:
While doing some post-commit review of the 9.3 materialized view feature, I
noticed a few loose ends:

On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote:
> Note that [...] "ALTER TABLE ... RENAME CONSTRAINT" [is]
> currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW.
> 
> There's no documented support for table constraints on MVs, but UNIQUE
> constraints are permitted:
> 
> [local] test=# alter materialized view mymv add unique (c);
> ALTER MATERIALIZED VIEW
> [local] test=# alter materialized view mymv add check (c > 0);
> ERROR:  "mymv" is not a table
> [local] test=# alter materialized view mymv add primary key (c);
> ERROR:  "mymv" is not a table or foreign table

The above points still apply.

Also, could you explain the use of RelationCacheInvalidateEntry() in
ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid() followed by
CommandCounterIncrement() is the typical pattern; this is novel.  I suspect,
though, neither is necessary now that the relcache does not maintain populated
status based on a fork size reading.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Apologies, but this sub-thread got lost when I changed email
accounts.  I found it in a final review to make sure nothing had
fallen through the cracks.

Noah Misch <noah@leadboat.com> wrote:
> On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote:

>> There's no documented support for table constraints on MVs, but
>> UNIQUE constraints are permitted:
>>
>> [local] test=# alter materialized view mymv add unique (c);
>> ALTER MATERIALIZED VIEW

Fix pushed.

> Also, could you explain the use of RelationCacheInvalidateEntry()
> in ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid()
> followed by CommandCounterIncrement() is the typical pattern;
> this is novel. I suspect, though, neither is necessary now that
> the relcache does not maintain populated status based on a fork
> size reading.

Yeah, that was part of the attempt to support unlogged materialized
views while also not returning bogus results if the view had not
been populated, using heap file size.  I agree that this line can
just come out.  If there are no objections real soon now, I will
remove it in master and the 9.3 branch before the release
candidate.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Materialized views WIP patch

From
Kevin Grittner
Date:
Kevin Grittner <kgrittn@ymail.com> wrote:
> Noah Misch <noah@leadboat.com> wrote:

>> Also, could you explain the use of RelationCacheInvalidateEntry()
>> in ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid()
>> followed by CommandCounterIncrement() is the typical pattern;
>> this is novel. I suspect, though, neither is necessary now that
>> the relcache does not maintain populated status based on a fork
>> size reading.
>
> Yeah, that was part of the attempt to support unlogged materialized
> views while also not returning bogus results if the view had not
> been populated, using heap file size.  I agree that this line can
> just come out.  If there are no objections real soon now, I will
> remove it in master and the 9.3 branch before the release
> candidate.

Done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company