Re: Materialized views WIP patch - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: Materialized views WIP patch |
Date | |
Msg-id | 20121116162558.90150@gmx.com Whole thread Raw |
In response to | Materialized views WIP patch ("Kevin Grittner" <kgrittn@mail.com>) |
Responses |
Re: Materialized views WIP patch
Re: Materialized views WIP patch Re: Materialized views WIP patch Re: Materialized views WIP patch Re: Materialized views WIP patch Re: Materialized views WIP patch |
List | pgsql-hackers |
Josh Berkus wrote: >> 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE >> TABLE AS, with all the same clauses supported. That includes >> declaring a materialized view to be temporary or unlogged. > > What use would a temporary matview be? It would be essentially like a temporary table, with all the same persistence options. I'm not really sure how often it will be more useful than a temporary table before we have incremental maintenance of materialized views; once we have that, though, it seems likely that there could be reasonable use cases. > Unlogged is good. I agree that there are likely to be more use cases for this than temp MVs. Unfortunately, I've had a hard time figuring out how to flag an MV which is empty because its contents were lost after a crash with preventing people from using an MV which hasn't been populated, which has the potential to silently return incorrect results. >> 2. MVs don't support inheritance. > > In which direction? Can't inherit, or can't be inherited from? The table inheritance has not been implemented in either direction for MVs. It didn't seem clear to me that there were reasonable use cases. Do you see any? >> 9. MVs can't directly be used in a COPY statement, but can be the >> source of data using a SELECT. > > Hmmm? I don't understand the reason for this. Consistency. There are other object types which seem to enforce this rule for no reason that I can see beyond maybe a desire to have both directions of COPY work with the same set of objects. If I remember correctly, allowing this would eliminate one line of code from the patch, so if sentiment is in favor of it, it is very easily done. >> 13. pg_class now has a relisvalid column, which is true if an MV is >> truncated or created WITH NO DATA. You can not scan a relation >> flagged as invalid. > > What error would a user see? I can more directly answer that on Monday. If you enable the body of the function which makes the relisvalid check you can see the messages. I commented it out because I have not yet figured out how to suppress the check for a LOAD MV command. >> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed >> to make sense. For example, you can change the tablespace or >> schema, but you cannot add or drop column with ALTER. > > How would you change the definition of an MV then? At this point you would need to drop and re-create the MV. If we want to add columns to an MV or change what an existing column holds, perhaps we could have an ALTER MV which changed the SELECT statement that populates the MV? I would prefer to leave that to a later patch, though -- it seems like a bit of a minefield compared to what is being implemented in this patch. >> 16. To get new data into the MV, the command is LOAD MATERIALIZED >> VIEW mat view_name. This seemed more descriptive to me that the >> alternatives and avoids declaring any new keywords beyond >> MATERIALIZED. If the MV is flagged as relisvalid == false, this >> will change it to true. > > UPDATE MATERIALIZED VIEW was problematic? Not technically, really, but I saw two reasons that I preferred LOAD MV: 1. It seems to me to better convey that the entire contents of the MV will be built from scratch, rather than incrementallyadjusted. 2. We haven't hashed out the syntax for more aggressive maintenance of an MV, and it seemed like UPDATE MV might be syntaxwe would want to use for something which updated selected parts of an MV when we do. > Does LOAD automatically TRUNCATE the view before reloading it? If not, > why not? It builds a new heap and moves it into place. When the transaction running LMV commits, the old heap is deleted. In implementation it is closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by creating a new table. This allows all permissions, etc., to stay in place. >> It would be good to have some discussion to try to reach a consensus >> about whether we need to differentiate between *missing* datat (where >> a materialized view which has been loaded WITH NO DATA or TRUNCATEd >> and has not been subsequently LOADed) and potentially *stale* data. >> If we don't care to distinguish between a view which generated no >> rows when it ran and a one for which the query has not been run, we >> can avoid adding the relisvalid flag, and we could support UNLOGGED >> MVs. Perhaps someone can come up with a better solution to that >> problem. > > Hmmm. I understand the distinction you're making here, but I'm not sure > it actually matters to the user. MVs, by their nature, always have > potentially stale data. Being empty (in an inaccurate way) is just one > kind of stale data. Robert feels the same way, but I disagree. Some MVs will not be terribly volatile. In my view there is a big difference between having a "top ten" list which might be based on yesterday's base tables rather than the base table states as of this moment, and having a "top ten" list with no entries. If you want to, for example, take some action if an order comes in for one of your top customers, and a different path for other customers, suddenly treating all of your long-time top customers as not being so, without any squawk from the database, seems dangerous. > It would be nice for the user to have some way to know that a matview is > empty due to never being LOADed or recently being TRUNCATEd. However, I > don't think that relisvalid flag -- and preventing scanning the relation > -- is a good solution. What I'd rather have instead is a timestamp of > when the MV was last LOADed. If the MV was never loaded (or was > truncated) that timestamp would be NULL. Such a timestamp would allow > users to construct all kinds of ad-hoc refresh schemes for MVs which > would not be possible without it. I see your point there; I'll think about that. My take was more that MVs would often be refreshed by crontab, and that you would want to keep subsequent steps from running and generating potentially plausible but completely inaccurate results if the LMV failed. > I don't see how this relates to UNLOGGED matviews either way. UNLOGGED tables and indexes are made empty on crash recovery by copying the initialization fork over the "normal" relations. Care was taken to avoid needing to connect to each database in turn to complete that recovery. This style of recovery can't really set the relisvalid flag, as far as I can see; which leaves us choosing between unlogged MVs and knowing whether they hold valid data -- unless someone has a better idea. -Kevin
pgsql-hackers by date: