Re: Materialized views WIP patch - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: Materialized views WIP patch |
Date | |
Msg-id | 50A6AE02.7000102@agliodbs.com Whole thread Raw |
In response to | Re: Materialized views WIP patch ("Kevin Grittner" <kgrittn@mail.com>) |
Responses |
Re: Materialized views WIP patch
|
List | pgsql-hackers |
Kevin, > I agree that there are likely to be more use cases for this than > temp MVs. Unfortunately, I've had a hard time figuring out how to > flag an MV which is empty because its contents were lost after a > crash with preventing people from using an MV which hasn't been > populated, which has the potential to silently return incorrect > results. See below. >>> 2. MVs don't support inheritance. >> >> In which direction? Can't inherit, or can't be inherited from? > > The table inheritance has not been implemented in either direction > for MVs. It didn't seem clear to me that there were reasonable use > cases. Do you see any? No, I just wanted clarity on this. I can see a strong case for eventually supporting CREATE MATERIALIZED VIEW matview_1 LIKE matview, in order to "copy" mativews, though. > Consistency. There are other object types which seem to enforce this > rule for no reason that I can see beyond maybe a desire to have both > directions of COPY work with the same set of objects. If I remember > correctly, allowing this would eliminate one line of code from the > patch, so if sentiment is in favor of it, it is very easily done. There's going to be a pretty strong demand for COPY FROM matviews. Forcing the user to use COPY FROM ( SELECT ... ) will be seen as arbitrary and unintuitive. >> How would you change the definition of an MV then? > > At this point you would need to drop and re-create the MV. If we > want to add columns to an MV or change what an existing column holds, > perhaps we could have an ALTER MV which changed the SELECT statement > that populates the MV? I would prefer to leave that to a later patch, > though -- it seems like a bit of a minefield compared to what is > being implemented in this patch. I agree that it should be a later patch. > Not technically, really, but I saw two reasons that I preferred LOAD MV: > > 1. It seems to me to better convey that the entire contents of the MV > will be built from scratch, rather than incrementally adjusted. > 2. We haven't hashed out the syntax for more aggressive maintenance of > an MV, and it seemed like UPDATE MV might be syntax we would want to > use for something which updated selected parts of an MV when we do. Hmmm, I see your point. So "LOAD" would recreate, and (when supported) UPDATE would incrementally update? > It builds a new heap and moves it into place. When the transaction > running LMV commits, the old heap is deleted. In implementation it is > closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by > creating a new table. This allows all permissions, etc., to stay in > place. OK, so same effect as a truncate. > Robert feels the same way, but I disagree. Some MVs will not be terribly > volatile. In my view there is a big difference between having a "top ten" > list which might be based on yesterday's base tables rather than the base > table states as of this moment, and having a "top ten" list with no > entries. If you want to, for example, take some action if an order comes > in for one of your top customers, and a different path for other > customers, suddenly treating all of your long-time top customers as not > being so, without any squawk from the database, seems dangerous. Right, but a relisvalid flag just tells me that the matview was updated at sometime in the past, and not *when* it was updated. It could have been 3 years ago. The fact that it was updated at some indefinite time is fairly valueless information. There's a rule in data warehousing which says that it's better to have no data (and know that you have no data) than to have incorrect data. > I see your point there; I'll think about that. My take was more that MVs > would often be refreshed by crontab, and that you would want to keep > subsequent steps from running and generating potentially plausible but > completely inaccurate results if the LMV failed. Yeah, that too. Also, a timestamp it would make it easy to double-check if the cron job was failing or had been disabled. > UNLOGGED tables and indexes are made empty on crash recovery by copying > the initialization fork over the "normal" relations. Care was taken to > avoid needing to connect to each database in turn to complete that > recovery. This style of recovery can't really set the relisvalid flag, as > far as I can see; which leaves us choosing between unlogged MVs and > knowing whether they hold valid data -- unless someone has a better idea. Yeah, well, whether we have relisvalid or mvlastupdate, we're going to have to work out some way to have that field react to changes to the table overall. I don't know *how*, but it's something we'll have to solve. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
pgsql-hackers by date: