Thread: Materialized views WIP patch
Attached is a patch that is still WIP but that I think is getting pretty close to completion. It is not intended to be the be-all and end-all for materialized views, but the minimum useful feature set -- which is all that I've had time to do for this release. In particular, the view is only updated on demand by a complete rebuild. For the next release, I hope to build on this base to allow more eager and incremental updates, and perhaps a concurrent batch update. 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE TABLE AS, with all the same clauses supported. That includes declaring a materialized view to be temporary or unlogged. 2. MVs don't support inheritance. 3. MVs can't define foreign keys. 4. MVs can't be the target of foreign keys. 5. MVs can't have triggers. 6. Users can't create rules which reference MVs (although MVs [ab]use the rules mechanism internally, similar to how views do). 7. MVs can't be converted to views, nor vice versa. 8. Users may not directly use INSERT/UPDATE/DELETE on an MV. 9. MVs can't directly be used in a COPY statement, but can be the source of data using a SELECT. 10. MVs can't own sequences. 11. MVs can't be the target of LOCK statements, although other statements get locks just like a table. 12. MVs can't use data modifying CTEs in their definitions. 13. pg_class now has a relisvalid column, which is true if an MV is truncated or created WITH NO DATA. You can not scan a relation flagged as invalid. 14. ALTER MATERIALIZED VIEW is supported for the options that seemed to make sense. For example, you can change the tablespace or schema, but you cannot add or drop column with ALTER. 15. The SELECT query used to define the MV may not contain a data-modifying CTE. 16. To get new data into the MV, the command is LOAD MATERIALIZED VIEW mat view_name. This seemed more descriptive to me that the alternatives and avoids declaring any new keywords beyond MATERIALIZED. If the MV is flagged as relisvalid == false, this will change it to true. 17. Since the data viewed in an MV is not up-to-date with the latest committed transaction, it didn't seem to make any sense to try to apply SERIALIZABLE transaction semantics to queries looking at the contents of an MV, although if LMV is run in a SERIALIZABLE transaction the MV data is guaranteed to be free of serialization anomalies. This does leave the transaction running the LOAD command vulnerable to serialization failures unless it is also READ ONLY DEFERRABLE. 18. Bound parameters are not supported for the CREATE MATERIALIZED VIEW statement. 19. LMV doesn't show a row count. It wouldn't be hard to add, it just seemed a little out of place to do that, when CLUSTER, etc., don't. I wasn't able to wrap up a few things in time for this commitfest: - Documentation is incomplete. - pg_dump support needs addtional dependencies added to properly handle MVs which are defined using other MVs. - pg_dump binary hasn't had a lot of attention yet. - There are no regression tests yet. - I ran into problems getting the validity check working right, so I have disabled it by commenting out the function body in this patch. - TRUNCATE should probably support a MATERIALIZED VIEW clause. It would be good to have some discussion to try to reach a consensus about whether we need to differentiate between *missing* datat (where a materialized view which has been loaded WITH NO DATA or TRUNCATEd and has not been subsequently LOADed) and potentially *stale* data. If we don't care to distinguish between a view which generated no rows when it ran and a one for which the query has not been run, we can avoid adding the relisvalid flag, and we could support UNLOGGED MVs. Perhaps someone can come up with a better solution to that problem. In the long term, we will probably need to separate the implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but for now there is so little that they need to do differently it seemed less evil to have a few "if" clauses that that much duplicated code. The paint is pretty wet still, but hopefully people can evaluate the approach and work out any issues with the design choices in the CF so that it can be wrapped up nicely for the next one. 92 files changed, 2377 insertions(+), 440 deletions(-) -Kevin
Attachment
Kevin Grittner wrote: Interesting stuff. > /* > + * SetRelationIsValid > + * Set the value of the relation's relisvalid field in pg_class. > + * > + * NOTE: caller must be holding an appropriate lock on the relation. > + * ShareUpdateExclusiveLock is sufficient. > + * > + * NOTE: an important side-effect of this operation is that an SI invalidation > + * message is sent out to all backends --- including me --- causing plans > + * referencing the relation to be rebuilt with the new list of children. > + * This must happen even if we find that no change is needed in the pg_class > + * row. > + */ > + void > + SetRelationIsValid(Oid relationId, bool relisvalid) > + { It's not clear to me that it's right to do this by doing regular heap updates here instead of heap_inplace_update. Also, I think this might end up causing a lot of pg_class tuple churn (at least for matviews that delete rows at xact end), which would be nice to avoid. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/14/12 6:28 PM, Kevin Grittner wrote: > - Documentation is incomplete. > ... > - There are no regression tests yet. Do you have any simple test cases you've been using you could attach? With epic new features like this, when things don't work it's hard to distinguish between "that just isn't implemented yet" and "the author never tested that". Having some known good samples you have tested, even if they're not proper regression tests, would be helpful for establishing the code baseline works. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Kevin, > Attached is a patch that is still WIP but that I think is getting > pretty close to completion. It is not intended to be the be-all and > end-all for materialized views, but the minimum useful feature set -- > which is all that I've had time to do for this release. In > particular, the view is only updated on demand by a complete rebuild. > For the next release, I hope to build on this base to allow more > eager and incremental updates, and perhaps a concurrent batch update. Nice to see this come in! > 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE > TABLE AS, with all the same clauses supported. That includes > declaring a materialized view to be temporary or unlogged. What use would a temporary matview be? Unlogged is good. > 2. MVs don't support inheritance. In which direction? Can't inherit, or can't be inherited from? > 3. MVs can't define foreign keys. > 4. MVs can't be the target of foreign keys. > 5. MVs can't have triggers. Makes sense. > 9. MVs can't directly be used in a COPY statement, but can be the > source of data using a SELECT. Hmmm? I don't understand the reason for this. > 13. pg_class now has a relisvalid column, which is true if an MV is > truncated or created WITH NO DATA. You can not scan a relation > flagged as invalid. What error would a user see? > 14. ALTER MATERIALIZED VIEW is supported for the options that seemed > to make sense. For example, you can change the tablespace or > schema, but you cannot add or drop column with ALTER. How would you change the definition of an MV then? > 16. To get new data into the MV, the command is LOAD MATERIALIZED > VIEW mat view_name. This seemed more descriptive to me that the > alternatives and avoids declaring any new keywords beyond > MATERIALIZED. If the MV is flagged as relisvalid == false, this > will change it to true. UPDATE MATERIALIZED VIEW was problematic? Does LOAD automatically TRUNCATE the view before reloading it? If not, why not? > It would be good to have some discussion to try to reach a consensus > about whether we need to differentiate between *missing* datat (where > a materialized view which has been loaded WITH NO DATA or TRUNCATEd > and has not been subsequently LOADed) and potentially *stale* data. > If we don't care to distinguish between a view which generated no > rows when it ran and a one for which the query has not been run, we > can avoid adding the relisvalid flag, and we could support UNLOGGED > MVs. Perhaps someone can come up with a better solution to that > problem. Hmmm. I understand the distinction you're making here, but I'm not sure it actually matters to the user. MVs, by their nature, always have potentially stale data. Being empty (in an inaccurate way) is just one kind of stale data. It would be nice for the user to have some way to know that a matview is empty due to never being LOADed or recently being TRUNCATEd. However, I don't think that relisvalid flag -- and preventing scanning the relation -- is a good solution. What I'd rather have instead is a timestamp of when the MV was last LOADed. If the MV was never loaded (or was truncated) that timestamp would be NULL. Such a timestamp would allow users to construct all kinds of ad-hoc refresh schemes for MVs which would not be possible without it. I don't see how this relates to UNLOGGED matviews either way. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote: > Attached is a patch that is still WIP but that I think is getting > pretty close to completion. It is not intended to be the be-all and > end-all for materialized views, but the minimum useful feature set -- > which is all that I've had time to do for this release. In > particular, the view is only updated on demand by a complete rebuild. > For the next release, I hope to build on this base to allow more > eager and incremental updates, and perhaps a concurrent batch update. The documentation says that a materialized view is basically a create-table-as-select except that it remembers the query. Would you say that there is a compelling use case for this alone, or is this a building block for more sophisticated materialized view support (e.g. eager updating) later? Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > The documentation says that a materialized view is basically a > create-table-as-select except that it remembers the query. Would you say > that there is a compelling use case for this alone, or is this a > building block for more sophisticated materialized view support (e.g. > eager updating) later? The implementation of the re-LOAD'ing command makes it already worthwile. Bonus point if locking is limited to when the new content is all computer and ready, but even without that, I want to have it. ;) I'd bikeshed and prefer the UPDATE MATERIALIZED VIEW nsp.foo; of course. The alternative is creating a view, a matching table and a stored procedure that will implement the rebuilding, for each mat view you want to have. So that's already a big step forward in my eyes. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Greg Smith wrote: > On 11/14/12 6:28 PM, Kevin Grittner wrote: >> - Documentation is incomplete. >> ... >> - There are no regression tests yet. > > Do you have any simple test cases you've been using you could attach? > With epic new features like this, when things don't work it's hard to > distinguish between "that just isn't implemented yet" and "the author > never tested that". Having some known good samples you have tested, > even if they're not proper regression tests, would be helpful for > establishing the code baseline works. I can probably post something along those lines Monday. Sorry for the delay. Basically, though, I tried to state everything that I know of which is not yet done and working; so if you find something which doesn't behave as you would expect, please let me know. It's well within the realm of possibility that there are issues that I didn't think of. I certainly can fall into the tendency of programmers to think about testing those things which I thought to cover in the code. -Kevin
On Fri, Nov 16, 2012 at 7:13 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Jeff Davis <pgsql@j-davis.com> writes: >> The documentation says that a materialized view is basically a >> create-table-as-select except that it remembers the query. Would you say >> that there is a compelling use case for this alone, or is this a >> building block for more sophisticated materialized view support (e.g. >> eager updating) later? > > The implementation of the re-LOAD'ing command makes it already > worthwile. Bonus point if locking is limited to when the new content is > all computer and ready, but even without that, I want to have it. ;) Seconded. Background lock free refresh of materialization table would be wonderful. But moving dependency between source and materialized table out of plpgsql function and into defined schema justifies feature on its own merits. merlin
Jeff Davis wrote: > On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote: > > Attached is a patch that is still WIP but that I think is getting > > pretty close to completion. It is not intended to be the be-all and > > end-all for materialized views, but the minimum useful feature set -- > > which is all that I've had time to do for this release. In > > particular, the view is only updated on demand by a complete rebuild. > > For the next release, I hope to build on this base to allow more > > eager and incremental updates, and perhaps a concurrent batch update. > > The documentation says that a materialized view is basically a > create-table-as-select except that it remembers the query. Would you say > that there is a compelling use case for this alone, or is this a > building block for more sophisticated materialized view support (e.g. > eager updating) later? IMV, this has some slight value as it stands, although perhaps not enough to justify a patch this big. The idea is that with this much in place, patches to implement more aggressive and incremental maintenance of the MV data become possible. So I think the bar it should pass for commit is that it seems a sane basis for that, while providing some functionality which people will find useful. -Kevin
Alvaro Herrera wrote: > It's not clear to me that it's right to do this by doing regular heap > updates here instead of heap_inplace_update. Also, I think this might > end up causing a lot of pg_class tuple churn (at least for matviews that > delete rows at xact end), which would be nice to avoid. If we keep the flag, I will look into heap_inplace_update. Thanks! -Kevin
Josh Berkus wrote: >> 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE >> TABLE AS, with all the same clauses supported. That includes >> declaring a materialized view to be temporary or unlogged. > > What use would a temporary matview be? It would be essentially like a temporary table, with all the same persistence options. I'm not really sure how often it will be more useful than a temporary table before we have incremental maintenance of materialized views; once we have that, though, it seems likely that there could be reasonable use cases. > Unlogged is good. I agree that there are likely to be more use cases for this than temp MVs. Unfortunately, I've had a hard time figuring out how to flag an MV which is empty because its contents were lost after a crash with preventing people from using an MV which hasn't been populated, which has the potential to silently return incorrect results. >> 2. MVs don't support inheritance. > > In which direction? Can't inherit, or can't be inherited from? The table inheritance has not been implemented in either direction for MVs. It didn't seem clear to me that there were reasonable use cases. Do you see any? >> 9. MVs can't directly be used in a COPY statement, but can be the >> source of data using a SELECT. > > Hmmm? I don't understand the reason for this. Consistency. There are other object types which seem to enforce this rule for no reason that I can see beyond maybe a desire to have both directions of COPY work with the same set of objects. If I remember correctly, allowing this would eliminate one line of code from the patch, so if sentiment is in favor of it, it is very easily done. >> 13. pg_class now has a relisvalid column, which is true if an MV is >> truncated or created WITH NO DATA. You can not scan a relation >> flagged as invalid. > > What error would a user see? I can more directly answer that on Monday. If you enable the body of the function which makes the relisvalid check you can see the messages. I commented it out because I have not yet figured out how to suppress the check for a LOAD MV command. >> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed >> to make sense. For example, you can change the tablespace or >> schema, but you cannot add or drop column with ALTER. > > How would you change the definition of an MV then? At this point you would need to drop and re-create the MV. If we want to add columns to an MV or change what an existing column holds, perhaps we could have an ALTER MV which changed the SELECT statement that populates the MV? I would prefer to leave that to a later patch, though -- it seems like a bit of a minefield compared to what is being implemented in this patch. >> 16. To get new data into the MV, the command is LOAD MATERIALIZED >> VIEW mat view_name. This seemed more descriptive to me that the >> alternatives and avoids declaring any new keywords beyond >> MATERIALIZED. If the MV is flagged as relisvalid == false, this >> will change it to true. > > UPDATE MATERIALIZED VIEW was problematic? Not technically, really, but I saw two reasons that I preferred LOAD MV: 1. It seems to me to better convey that the entire contents of the MV will be built from scratch, rather than incrementallyadjusted. 2. We haven't hashed out the syntax for more aggressive maintenance of an MV, and it seemed like UPDATE MV might be syntaxwe would want to use for something which updated selected parts of an MV when we do. > Does LOAD automatically TRUNCATE the view before reloading it? If not, > why not? It builds a new heap and moves it into place. When the transaction running LMV commits, the old heap is deleted. In implementation it is closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by creating a new table. This allows all permissions, etc., to stay in place. >> It would be good to have some discussion to try to reach a consensus >> about whether we need to differentiate between *missing* datat (where >> a materialized view which has been loaded WITH NO DATA or TRUNCATEd >> and has not been subsequently LOADed) and potentially *stale* data. >> If we don't care to distinguish between a view which generated no >> rows when it ran and a one for which the query has not been run, we >> can avoid adding the relisvalid flag, and we could support UNLOGGED >> MVs. Perhaps someone can come up with a better solution to that >> problem. > > Hmmm. I understand the distinction you're making here, but I'm not sure > it actually matters to the user. MVs, by their nature, always have > potentially stale data. Being empty (in an inaccurate way) is just one > kind of stale data. Robert feels the same way, but I disagree. Some MVs will not be terribly volatile. In my view there is a big difference between having a "top ten" list which might be based on yesterday's base tables rather than the base table states as of this moment, and having a "top ten" list with no entries. If you want to, for example, take some action if an order comes in for one of your top customers, and a different path for other customers, suddenly treating all of your long-time top customers as not being so, without any squawk from the database, seems dangerous. > It would be nice for the user to have some way to know that a matview is > empty due to never being LOADed or recently being TRUNCATEd. However, I > don't think that relisvalid flag -- and preventing scanning the relation > -- is a good solution. What I'd rather have instead is a timestamp of > when the MV was last LOADed. If the MV was never loaded (or was > truncated) that timestamp would be NULL. Such a timestamp would allow > users to construct all kinds of ad-hoc refresh schemes for MVs which > would not be possible without it. I see your point there; I'll think about that. My take was more that MVs would often be refreshed by crontab, and that you would want to keep subsequent steps from running and generating potentially plausible but completely inaccurate results if the LMV failed. > I don't see how this relates to UNLOGGED matviews either way. UNLOGGED tables and indexes are made empty on crash recovery by copying the initialization fork over the "normal" relations. Care was taken to avoid needing to connect to each database in turn to complete that recovery. This style of recovery can't really set the relisvalid flag, as far as I can see; which leaves us choosing between unlogged MVs and knowing whether they hold valid data -- unless someone has a better idea. -Kevin
On Thu, Nov 15, 2012 at 1:36 PM, Josh Berkus <josh@agliodbs.com> wrote: > Hmmm. I understand the distinction you're making here, but I'm not sure > it actually matters to the user. MVs, by their nature, always have > potentially stale data. Being empty (in an inaccurate way) is just one > kind of stale data. This is my feeling also. > I don't see how this relates to UNLOGGED matviews either way. Right now, Kevin has things set up so that when you do "TRUNCATE mv", it clears the relisvalid flag. If we allowed unlogged materialized views, the table would be automatically truncated on a crash, but there wouldn't be any way to clear relisvalid in that situation, so Kevin felt we should simply disable unlogged MVs. Personally, I'm not excited about having a relisvalid flag at all, and doubly not excited if it means we can't have unlogged MVs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 16 November 2012 16:25, Kevin Grittner <kgrittn@mail.com> wrote:
Josh Berkus wrote:> Unlogged is good.
I agree that there are likely to be more use cases for this than
temp MVs. Unfortunately, I've had a hard time figuring out how to
flag an MV which is empty because its contents were lost after a
crash with preventing people from using an MV which hasn't been
populated, which has the potential to silently return incorrect
results.
Thom
"Kevin Grittner" <kgrittn@mail.com> writes: > Josh Berkus wrote: >> What use would a temporary matview be? > It would be essentially like a temporary table, with all the same > persistence options. I'm not really sure how often it will be more > useful than a temporary table before we have incremental maintenance > of materialized views; once we have that, though, it seems likely > that there could be reasonable use cases. One of the principal attributes of a temp table is that its contents aren't (reliably) accessible from anywhere except the owning backend. Not sure where you're going to hide the incremental maintenance in that scenario. > The table inheritance has not been implemented in either direction > for MVs. It didn't seem clear to me that there were reasonable use > cases. Do you see any? We don't have inheritance for views, so how would we have it for materialized views? regards, tom lane
On 15 November 2012 02:28, Kevin Grittner <kgrittn@mail.com> wrote:
Attached is a patch that...
Got this error:
postgres=# create view v_test as select 1;
CREATE VIEW
postgres=# create materialized view mv_test as select * from v_test;
ERROR: could not open file "base/12064/16425": No such file or directory
Thom
By chance (?) I got similar one today too, when dropping extension:
ERROR: could not open file "base/12623/12548": No such file or directory
I thought something might have gone wrong during Linux upgrade 2 days ago, but it's not likely that we both have the issue.
I wonder if something is broken in the catalog. The last commit I have in my environment is
commit 4af3dda13601d859a20425e3554533fde0549056
Author: Peter Eisentraut <peter_e@gmx.net>
Date: Sun Oct 28 10:35:46 2012 -0400
Kind regards,
Tony.
On 11/16/2012 06:14 PM, Thom Brown wrote:
ERROR: could not open file "base/12623/12548": No such file or directory
I thought something might have gone wrong during Linux upgrade 2 days ago, but it's not likely that we both have the issue.
I wonder if something is broken in the catalog. The last commit I have in my environment is
commit 4af3dda13601d859a20425e3554533fde0549056
Author: Peter Eisentraut <peter_e@gmx.net>
Date: Sun Oct 28 10:35:46 2012 -0400
Kind regards,
Tony.
On 11/16/2012 06:14 PM, Thom Brown wrote:
On 15 November 2012 02:28, Kevin Grittner <kgrittn@mail.com> wrote:Attached is a patch that...Got this error:postgres=# create view v_test as select 1;CREATE VIEWpostgres=# create materialized view mv_test as select * from v_test;ERROR: could not open file "base/12064/16425": No such file or directory--
Thom
Kevin, > I agree that there are likely to be more use cases for this than > temp MVs. Unfortunately, I've had a hard time figuring out how to > flag an MV which is empty because its contents were lost after a > crash with preventing people from using an MV which hasn't been > populated, which has the potential to silently return incorrect > results. See below. >>> 2. MVs don't support inheritance. >> >> In which direction? Can't inherit, or can't be inherited from? > > The table inheritance has not been implemented in either direction > for MVs. It didn't seem clear to me that there were reasonable use > cases. Do you see any? No, I just wanted clarity on this. I can see a strong case for eventually supporting CREATE MATERIALIZED VIEW matview_1 LIKE matview, in order to "copy" mativews, though. > Consistency. There are other object types which seem to enforce this > rule for no reason that I can see beyond maybe a desire to have both > directions of COPY work with the same set of objects. If I remember > correctly, allowing this would eliminate one line of code from the > patch, so if sentiment is in favor of it, it is very easily done. There's going to be a pretty strong demand for COPY FROM matviews. Forcing the user to use COPY FROM ( SELECT ... ) will be seen as arbitrary and unintuitive. >> How would you change the definition of an MV then? > > At this point you would need to drop and re-create the MV. If we > want to add columns to an MV or change what an existing column holds, > perhaps we could have an ALTER MV which changed the SELECT statement > that populates the MV? I would prefer to leave that to a later patch, > though -- it seems like a bit of a minefield compared to what is > being implemented in this patch. I agree that it should be a later patch. > Not technically, really, but I saw two reasons that I preferred LOAD MV: > > 1. It seems to me to better convey that the entire contents of the MV > will be built from scratch, rather than incrementally adjusted. > 2. We haven't hashed out the syntax for more aggressive maintenance of > an MV, and it seemed like UPDATE MV might be syntax we would want to > use for something which updated selected parts of an MV when we do. Hmmm, I see your point. So "LOAD" would recreate, and (when supported) UPDATE would incrementally update? > It builds a new heap and moves it into place. When the transaction > running LMV commits, the old heap is deleted. In implementation it is > closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by > creating a new table. This allows all permissions, etc., to stay in > place. OK, so same effect as a truncate. > Robert feels the same way, but I disagree. Some MVs will not be terribly > volatile. In my view there is a big difference between having a "top ten" > list which might be based on yesterday's base tables rather than the base > table states as of this moment, and having a "top ten" list with no > entries. If you want to, for example, take some action if an order comes > in for one of your top customers, and a different path for other > customers, suddenly treating all of your long-time top customers as not > being so, without any squawk from the database, seems dangerous. Right, but a relisvalid flag just tells me that the matview was updated at sometime in the past, and not *when* it was updated. It could have been 3 years ago. The fact that it was updated at some indefinite time is fairly valueless information. There's a rule in data warehousing which says that it's better to have no data (and know that you have no data) than to have incorrect data. > I see your point there; I'll think about that. My take was more that MVs > would often be refreshed by crontab, and that you would want to keep > subsequent steps from running and generating potentially plausible but > completely inaccurate results if the LMV failed. Yeah, that too. Also, a timestamp it would make it easy to double-check if the cron job was failing or had been disabled. > UNLOGGED tables and indexes are made empty on crash recovery by copying > the initialization fork over the "normal" relations. Care was taken to > avoid needing to connect to each database in turn to complete that > recovery. This style of recovery can't really set the relisvalid flag, as > far as I can see; which leaves us choosing between unlogged MVs and > knowing whether they hold valid data -- unless someone has a better idea. Yeah, well, whether we have relisvalid or mvlastupdate, we're going to have to work out some way to have that field react to changes to the table overall. I don't know *how*, but it's something we'll have to solve. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
"Kevin Grittner" <kgrittn@mail.com> writes: >> UPDATE MATERIALIZED VIEW was problematic? > > Not technically, really, but I saw two reasons that I preferred LOAD MV: > > 1. It seems to me to better convey that the entire contents of the MV > will be built from scratch, rather than incrementally adjusted. > 2. We haven't hashed out the syntax for more aggressive maintenance of > an MV, and it seemed like UPDATE MV might be syntax we would want to > use for something which updated selected parts of an MV when we do. Good point, and while I'm in the mood for some grammar input, here's a try: ALTER MATERIALIZED VIEW foo RESET; ALTER MATERIALIZED VIEW foo UPDATE; I think such wholesale operations make more sense as ALTER statement than as UPDATE statements. > It builds a new heap and moves it into place. When the transaction > running LMV commits, the old heap is deleted. In implementation it is > closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by > creating a new table. This allows all permissions, etc., to stay in > place. When you say closer to CLUSTER, do you include the Access Exclusive Lock that forbids reading the previous version's data while you prepare the new one? That would be very bad and I wouldn't understand the need to, in the scope of MATERIALIZED VIEWs which are by definition lagging behind… If as I think you don't have that limitation in your implementation, it's awesome and just what I was hoping to read :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Josh Berkus <josh@agliodbs.com> writes: > There's going to be a pretty strong demand for COPY FROM matviews. > Forcing the user to use COPY FROM ( SELECT ... ) will be seen as > arbitrary and unintuitive. You could make that same claim about plain views, but in point of fact the demand for making them work in COPY has been minimal. So I'm not convinced this is an essential first-cut feature. We can always add it later. regards, tom lane
Thom Brown wrote: > postgres=# create view v_test as select 1; > CREATE VIEW > postgres=# create materialized view mv_test as select * from v_test; > ERROR: could not open file "base/12064/16425": No such file or directory Thanks for the report; will investigate. -Kevin
Robert Haas wrote: > Josh Berkus wrote: >> Being empty (in an inaccurate way) is just one kind of stale data. > > This is my feeling also. If you had an MV summarizing Wisconsin courts cumulative case counts by case type, "empty" would not have been a valid "stale" state for over 150 years. That is a degree of staleness that IMV is not just a quantitative degree of staleness, as if a nightly recalculation had failed to occur, but a qualitatively different state entirely. While you may or may not want to use the stale data if last night's regen failed, and so it should be under application control, I can't imagine a situation where you would want to proceed if the MV didn't have data that had at some time been correct -- preferrably at some time since the invention of digital electronic computers. Could you provide an example where it would be a good thing to do so? -Kevin
> You could make that same claim about plain views, but in point of > fact the demand for making them work in COPY has been minimal. > So I'm not convinced this is an essential first-cut feature. > We can always add it later. Of course. I just had the impression that we could support COPY FROM by *deleting* a couple lines from Kevin's patch, rather than it being extra work. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> You could make that same claim about plain views, but in point of >> fact the demand for making them work in COPY has been minimal. >> So I'm not convinced this is an essential first-cut feature. >> We can always add it later. > Of course. I just had the impression that we could support COPY FROM by > *deleting* a couple lines from Kevin's patch, rather than it being extra > work. Even if it happens to be trivial in the current patch, it's an added functional requirement that we might later regret having cavalierly signed up for. And, as noted upthread, relations that support only one direction of COPY don't exist at the moment; that would be adding an asymmetry that we might later regret, too. regards, tom lane
On 16 November 2012 11:25, Kevin Grittner <kgrittn@mail.com> wrote: >>> 16. To get new data into the MV, the command is LOAD MATERIALIZED >>> VIEW mat view_name. This seemed more descriptive to me that the >>> alternatives and avoids declaring any new keywords beyond >>> MATERIALIZED. If the MV is flagged as relisvalid == false, this >>> will change it to true. >> >> UPDATE MATERIALIZED VIEW was problematic? > > Not technically, really, but I saw two reasons that I preferred LOAD MV: > > 1. It seems to me to better convey that the entire contents of the MV > will be built from scratch, rather than incrementally adjusted. > 2. We haven't hashed out the syntax for more aggressive maintenance of > an MV, and it seemed like UPDATE MV might be syntax we would want to > use for something which updated selected parts of an MV when we do. > >> Does LOAD automatically TRUNCATE the view before reloading it? If not, >> why not? > > It builds a new heap and moves it into place. When the transaction > running LMV commits, the old heap is deleted. In implementation it is > closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by > creating a new table. This allows all permissions, etc., to stay in > place. This seems very similar to the REPLACE command we discussed earlier, except this is restricted to Mat Views. If we're going to have this, I would prefer a whole command. e.g. REPLACE matviewname REFRESH that would also allow REPLACE tablename AS query Same thing under the covers, just more widely applicable and thus more useful. Either way, I don't much like overloading the use of LOAD, which already has a very different meaning. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Josh Berkus wrote: >> It would be nice for the user to have some way to know that a matview is >> empty due to never being LOADed or recently being TRUNCATEd. However, I >> don't think that relisvalid flag -- and preventing scanning the relation >> -- is a good solution. What I'd rather have instead is a timestamp of >> when the MV was last LOADed. If the MV was never loaded (or was >> truncated) that timestamp would be NULL. Such a timestamp would allow >> users to construct all kinds of ad-hoc refresh schemes for MVs which >> would not be possible without it. +1 Kevin Grittner wrote: > I see your point there; I'll think about that. My take was more that MVs > would often be refreshed by crontab, and that you would want to keep > subsequent steps from running and generating potentially plausible but > completely inaccurate results if the LMV failed. If one of these subsequent steps doesn't care if refresh failed once, it shouldn't be forced to fail. I imagine that for many applications yesterday's data can be good enough. Those that care should check the timestamp. Yours, Laurenz Albe
Simon Riggs wrote: > This seems very similar to the REPLACE command we discussed > earlier, except this is restricted to Mat Views. I don't remember that discussion -- do you have a reference? > If we're going to have this, I would prefer a whole command. > > e.g. REPLACE matviewname REFRESH > > that would also allow > > REPLACE tablename AS query > > Same thing under the covers, just more widely applicable and thus > more useful. An interesting throught. I would have thought that if we were going to allow changing the definition of an existing MV, we would be better off with CREATE OR REPLACE MATERIALIZED VIEW. Either way, if you allow the column types or the number of columns to be changed, you do tend to run into issues if there are other MVs, views, triggers, rules, etc., which depend on the MV, so I don't think it's material for an initial patch. But it is worth considering which way we might want to extend it. > Either way, I don't much like overloading the use of LOAD, which > already has a very different meaning. Well, it's hard to avoid creating new keywords without overloading the meaning of exsiting ones. Personally I didn't find LOAD MATERIALIZED VIEW matview_name; to be very easy to confuse with LOAD 'filename'; But that's a subjective thing. If too many people find that confusing, it may be worth creating a new keyword; but I wanted to see whether it was really necessary first. -Kevin
Albe Laurenz wrote: > Kevin Grittner wrote: > >> My take was more that MVs would often be refreshed by crontab, and >> that you would want to keep subsequent steps from running and >> generating potentially plausible but completely inaccurate results >> if the LMV failed. > > If one of these subsequent steps doesn't care if refresh > failed once, it shouldn't be forced to fail. I imagine > that for many applications yesterday's data can be good enough. > > Those that care should check the timestamp. It sounds like you and I are in agreement on this; I just didn't state it very precisely. If a LMV on a MV which already has data fails, the relisvalid would not prevent it from being used -- it would be stale, but still valid data from *some* point in time. The point is that if an MV is created WITH NO DATA or has been TRUNCATEd and there has not been a subsequent LMV, what it contains may not represent any state which was *ever* valid, or it may represent a state which would only have been valid hundreds of years in the past, had the system been computerized at that time. To me, that does not seem like the same thing as a simple "stale" state. I'm looking at whether there is some reasonable way to detect invalid data as well as capture age of data. Every solution I've thought of so far has at least one hard-to-solve race condition, but I have hopes that I can either solve that for one of the ideas, or come up with an idea which falls more gracefully under MVCC management. -Kevin
"Kevin Grittner" <kgrittn@mail.com> writes: > Simon Riggs wrote: >> Either way, I don't much like overloading the use of LOAD, which >> already has a very different meaning. > Well, it's hard to avoid creating new keywords without overloading > the meaning of exsiting ones. FWIW, I'd much rather see us overload LOAD (which is seldom used) than REPLACE (which might in the future become a widely-used DML command). regards, tom lane
Kevin, > I'm looking at whether there is some reasonable way to detect invalid > data as well as capture age of data. Every solution I've thought of > so far has at least one hard-to-solve race condition, but I have > hopes that I can either solve that for one of the ideas, or come up > with an idea which falls more gracefully under MVCC management. What's the race condition? I'd think that LOAD would take an exclusive lock on the matview involved. > LOAD MATERIALIZED VIEW matview_name; > > to be very easy to confuse with > > LOAD 'filename'; > > But that's a subjective thing. If too many people find that > confusing, it may be worth creating a new keyword; but I wanted to > see whether it was really necessary first. I do not find them confusing. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/19/12 9:57 AM, Josh Berkus wrote: > Kevin, > >> I'm looking at whether there is some reasonable way to detect invalid >> data as well as capture age of data. Every solution I've thought of >> so far has at least one hard-to-solve race condition, but I have >> hopes that I can either solve that for one of the ideas, or come up >> with an idea which falls more gracefully under MVCC management. > > What's the race condition? I'd think that LOAD would take an exclusive > lock on the matview involved. BTW, another thought on the timestamp: while it would be better to have a lastrefresh timestamp in pg_class, the other option is to have an extra column in the matview (pg_last_update). While that would involve some redundant storage, it would neatly solve the issues around unlogged matviews; the timestamp and the data would vanish at the same time. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Tom Lane wrote: > "Kevin Grittner" <kgrittn@mail.com> writes: >> Josh Berkus wrote: >>> What use would a temporary matview be? > >> It would be essentially like a temporary table, with all the same >> persistence options. I'm not really sure how often it will be more >> useful than a temporary table before we have incremental >> maintenance of materialized views; once we have that, though, it >> seems likely that there could be reasonable use cases. > > One of the principal attributes of a temp table is that its > contents aren't (reliably) accessible from anywhere except the > owning backend. Not sure where you're going to hide the incremental > maintenance in that scenario. The more I think about that, the less sensible temporary MVs seem. Unless I can figure out some reasonable use case, I'll diable that in the next version of the patch. -Kevin
Hi Kevin, On 15/11/2012 03:28, Kevin Grittner wrote: > Attached is a patch that is still WIP but that I think is getting > pretty close to completion. I've been looking at this, but I unfortunately haven't had as much time as I had hoped for, and have not looked at the code in detail yet. It's also a relatively big patch, so I wouldn't mind another pair of eyes on it. I have been testing the patch a bit, and I'm slightly disappointed by the fact that it still doesn't solve this problem (and I apologize if I have missed discussion about this in the docs or in this thread): <assume "foo" is a non-empty materialized view> T1: BEGIN; T1: LOAD MATERIALIZED VIEW foo; T2: SELECT * FROM foo; T1: COMMIT; <T2 sees an empty table> As others have pointed out, replacing the contents of a table is something which people have been wanting to do for a long time, and I think having this ability would make this patch a lot better; now it just feels like syntactic sugar. > 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE > TABLE AS, with all the same clauses supported. That includes > declaring a materialized view to be temporary or unlogged. > 2. MVs don't support inheritance. > 3. MVs can't define foreign keys. > 4. MVs can't be the target of foreign keys. > 5. MVs can't have triggers. > 6. Users can't create rules which reference MVs (although MVs > [ab]use the rules mechanism internally, similar to how views do). > 7. MVs can't be converted to views, nor vice versa. > 8. Users may not directly use INSERT/UPDATE/DELETE on an MV. > 9. MVs can't directly be used in a COPY statement, but can be the > source of data using a SELECT. > 10. MVs can't own sequences. > 11. MVs can't be the target of LOCK statements, although other > statements get locks just like a table. > 12. MVs can't use data modifying CTEs in their definitions. > 13. pg_class now has a relisvalid column, which is true if an MV is > truncated or created WITH NO DATA. You can not scan a relation > flagged as invalid. > 14. ALTER MATERIALIZED VIEW is supported for the options that seemed > to make sense. For example, you can change the tablespace or > schema, but you cannot add or drop column with ALTER. > 16. To get new data into the MV, the command is LOAD MATERIALIZED > VIEW mat view_name. This seemed more descriptive to me that the > alternatives and avoids declaring any new keywords beyond > MATERIALIZED. If the MV is flagged as relisvalid == false, this > will change it to true. > 17. Since the data viewed in an MV is not up-to-date with the latest > committed transaction, it didn't seem to make any sense to try to > apply SERIALIZABLE transaction semantics to queries looking at > the contents of an MV, although if LMV is run in a SERIALIZABLE > transaction the MV data is guaranteed to be free of serialization > anomalies. This does leave the transaction running the LOAD > command vulnerable to serialization failures unless it is also > READ ONLY DEFERRABLE. > 18. Bound parameters are not supported for the CREATE MATERIALIZED > VIEW statement. I believe all of these points have been under discussion, and I don't have anything to add to the ongoing discussions. > 19. LMV doesn't show a row count. It wouldn't be hard to add, it just > seemed a little out of place to do that, when CLUSTER, etc., > don't. This sounds like a useful feature, but your point about CLUSTER and friends still stands. > In the long term, we will probably need to separate the > implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but > for now there is so little that they need to do differently it seemed > less evil to have a few "if" clauses that that much duplicated code. Seems sensible. I'll get back when I manage to get a better grasp of the code. Regards, Marko Tiikkaja
On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja <pgmail@joh.to> wrote: > As others have pointed out, replacing the contents of a table is something > which people have been wanting to do for a long time, and I think having > this ability would make this patch a lot better; now it just feels like > syntactic sugar. I agree that it's mostly syntactic sugar, but I think we need to have realistic expectations for what is possible in an initial patch. When I committed the first patch for foreign data wrappers, it didn't work at all: it was just syntax support. Tom later committed a follow-on patch that made them work. Similarly, I split the event trigger patch into two halves, one of which added the syntax support and the other of which made them functional: and even with both commits in, I think it's fair to say that event triggers are still in a fairly primitive state. None of those patches were small patches. It's going to take multiple years to get materialized views up to a state where they're really useful to a broad audience in production applications, but I don't think we should sneer at anyone for writing a patch that is "just syntactic sugar". As it turns out, adding a whole new object type is a lot of work and generates a big patch even if it doesn't do much just yet. Rejecting such patches on the grounds that they aren't comprehensive enough is, IMHO, extremely unwise; we'll either end up landing even larger patches that are almost impossible to review comprehensively and therefore more likely to break something, or else we'll kill the projects outright and end up with nothing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/26/12 2:07 PM, Robert Haas wrote: > On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja <pgmail@joh.to> wrote: >> As others have pointed out, replacing the contents of a table is something >> which people have been wanting to do for a long time, and I think having >> this ability would make this patch a lot better; now it just feels like >> syntactic sugar. > > I agree that it's mostly syntactic sugar, but I think we need to have > realistic expectations for what is possible in an initial patch. When > I committed the first patch for foreign data wrappers, it didn't work > at all: it was just syntax support. Tom later committed a follow-on > patch that made them work. Similarly, I split the event trigger patch > into two halves, one of which added the syntax support and the other > of which made them functional: and even with both commits in, I think > it's fair to say that event triggers are still in a fairly primitive > state. > > None of those patches were small patches. It's going to take multiple > years to get materialized views up to a state where they're really > useful to a broad audience in production applications, but I don't > think we should sneer at anyone for writing a patch that is "just > syntactic sugar". As it turns out, adding a whole new object type is > a lot of work and generates a big patch even if it doesn't do much > just yet. Rejecting such patches on the grounds that they aren't > comprehensive enough is, IMHO, extremely unwise; we'll either end up > landing even larger patches that are almost impossible to review > comprehensively and therefore more likely to break something, or else > we'll kill the projects outright and end up with nothing. First of all, I have to apologize. Re-reading the email I sent out last night, it does indeed feel a bit harsh and I can understand your reaction. At no point did I mean to belittle Kevin's efforts or the patch itself. I was mostly looking for Kevin's input on how hardit would be to solve the particular problem and whether it would be possible to do so for 9.3. While I feel like the problem I pointed out is a small caveat and should be at least documented for 9.3, I think this patch has merits of its own even if that problem never gets fixed, and I will continue to review this patch. Regards, Marko Tiikkaja
On 26 November 2012 13:07, Robert Haas <robertmhaas@gmail.com> wrote: > None of those patches were small patches. It's going to take multiple > years to get materialized views up to a state where they're really > useful to a broad audience in production applications, but I don't > think we should sneer at anyone for writing a patch that is "just > syntactic sugar". +1. I have a sweet tooth. I don't like it when people criticise patches on the basis of "obviously you could achieve the same effect with $CONVOLUTION". Making things simpler is a desirable outcome. Now, that isn't to say that we should disregard everything or even anything else in pursuit of simplicity; just that "needing a Ph.D is Postgresology", as you once put it, to do something routine to many is really hard to defend. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 11/14/12 9:28 PM, Kevin Grittner wrote: > 17. Since the data viewed in an MV is not up-to-date with the latest > committed transaction, So, the way I understand it, in Oracle terms, this feature is a "snapshot", not a materialized view. Maybe that's what it should be called then.
On Mon, Nov 26, 2012 at 8:14 AM, Marko Tiikkaja <pgmail@joh.to> wrote: > First of all, I have to apologize. Re-reading the email I sent out last > night, it does indeed feel a bit harsh and I can understand your reaction. > > At no point did I mean to belittle Kevin's efforts or the patch itself. I > was mostly looking for Kevin's input on how hard it would be to solve the > particular problem and whether it would be possible to do so for 9.3. > > While I feel like the problem I pointed out is a small caveat and should be > at least documented for 9.3, I think this patch has merits of its own even > if that problem never gets fixed, and I will continue to review this patch. OK, no worries. I didn't really interpret your email as belittling; I just want to make sure this feature doesn't get feature-creeped to death. I think everyone, including Kevin, understands that the real-world applicability of v1 is going to be limited and many people will choose alternative techniques rather than relying on this new feature. But I also think that we'll never get to a really awesome, kick-ass feature unless we're willing to commit an initial version that isn't all that awesome or kick-ass. If I understand Kevin's goals correctly, the plan is to get this basic version committed for 9.3, and then to try to expand the capability during the 9.4 release cycle (and maybe 9.5, too, there's a lot of work to do here). I think that's a pretty sound plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/26/2012 09:46 AM, Peter Eisentraut wrote: > On 11/14/12 9:28 PM, Kevin Grittner wrote: >> 17. Since the data viewed in an MV is not up-to-date with the latest >> committed transaction, > So, the way I understand it, in Oracle terms, this feature is a > "snapshot", not a materialized view. Maybe that's what it should be > called then. > > If you use Jonathan Gardner's taxonomy at <http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views>, snapshots are a subclass of materialized views. cheers andrew
On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote: > On 11/14/12 9:28 PM, Kevin Grittner wrote: > > 17. Since the data viewed in an MV is not up-to-date with the > > latest committed transaction, > > So, the way I understand it, in Oracle terms, this feature is a > "snapshot", not a materialized view. Maybe that's what it should be > called then. "Snapshot" is one of the options for refreshing an Oracle materialized view. There are others, which we'll eventually add if past is any prologue :) I hate to add to the bike-shedding, but we should probably add REFRESH SNAPSHOT as an optional piece of the grammar, with more REFRESH options to come. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 26 November 2012 15:24, David Fetter <david@fetter.org> wrote: > I hate to add to the bike-shedding, but we should probably add REFRESH > SNAPSHOT as an optional piece of the grammar, with more REFRESH > options to come. I don't know that they should be called materialised views, but do we really need to overload the word snapshot? I'd just as soon invent a new word as use the Oracle one, since I don't think the term snapshot is widely recognised as referring to anything other than snapshot isolation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Nov 26, 2012 at 04:02:17PM +0000, Peter Geoghegan wrote: > On 26 November 2012 15:24, David Fetter <david@fetter.org> wrote: > > I hate to add to the bike-shedding, but we should probably add > > REFRESH SNAPSHOT as an optional piece of the grammar, with more > > REFRESH options to come. > > I don't know that they should be called materialised views, but do > we really need to overload the word snapshot? I'd just as soon > invent a new word as use the Oracle one, since I don't think the > term snapshot is widely recognised as referring to anything other > than snapshot isolation. I believe that the meaning here is unambiguous, and is used in other descriptions than Oracle's, including the one on our wiki. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Marko Tiikkaja wrote: > On 15/11/2012 03:28, Kevin Grittner wrote: > I have been testing the patch a bit Thanks! > and I'm slightly disappointed by the fact that it still doesn't > solve this problem (and I apologize if I have missed discussion > about this in the docs or in this thread): > > <assume "foo" is a non-empty materialized view> > > T1: BEGIN; > T1: LOAD MATERIALIZED VIEW foo; > > T2: SELECT * FROM foo; > > T1: COMMIT; > > <T2 sees an empty table> As far as I know you are the first to notice this behavior. Thanks for pointing it out. I agree with Robert that we have to be careful about scope creep, but this one looks to me like it should be considered a bug. IMO, LOAD for a populated MV should move it from one state which reflects the captured state of a previous point in time to a captured state which is later, with no invalid or inconsistent states visible in between. I will take a look at the issue; I don't know whether it's something small I can address in this CF or whether it will need to be in the next CF, but I will fix it. >> 19. LMV doesn't show a row count. It wouldn't be hard to add, it >> just seemed a little out of place to do that, when CLUSTER, >> etc., don't. > > This sounds like a useful feature, but your point about CLUSTER > and friends still stands. Other possible arguments against providing a count are: (1) For a populated MV, the LOAD might be replacing the contents with fewer rows than were there before. (2) Once we have incremental updates of the MV, this count is only one of the ways to update the view -- and the others won't show counts. Showing it here might be considered inconsistent. I don't feel strongly about it, and I don't think it's a big change either way; just explaining what got me off the fence when I had to pick one behavior or the other to post the WIP patch. > I'll get back when I manage to get a better grasp of the code. Thanks. Keep in mind that the current behavior of behaving like a regular view when the contents are invalid is not what I had in mind, that was an accidental effect of commenting out the body of the ExecCheckRelationsValid() function right before posting the patch because I noticed a regression. When I noticed current behavior, it struck me that someone might prefer it to the intended behavior of showing an error like this: ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("materialized view \"%s\" has not been populated", get_rel_name(rte->relid)), errhint("Use the LOAD MATERIALIZED VIEW command."))); I mention it in case someone wants to argue for silently behaving as a regular view when the MV is not populated. -Kevin
David Fetter wrote: > On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote: >> So, the way I understand it, in Oracle terms, this feature is a >> "snapshot", not a materialized view. Maybe that's what it should >> be called then. > > "Snapshot" is one of the options for refreshing an Oracle > materialized view. There are others, which we'll eventually add > if past is any prologue :) That's the way I understand it, too. > I hate to add to the bike-shedding, but we should probably add > REFRESH SNAPSHOT as an optional piece of the grammar, with more > REFRESH options to come. I would prefer to leave the syntax for refreshing MVs to a later patch. I'm kind of assuming that SNAPSHOT would be the default if no other options are specified, but I would prefer not to get into too much speculation about add-on patches for fear of derailing this initial effort. -Kevin
On Mon, Nov 26, 2012 at 04:34:36PM -0500, Kevin Grittner wrote: > David Fetter wrote: > > On Mon, Nov 26, 2012 at 09:46:33AM -0500, Peter Eisentraut wrote: > > >> So, the way I understand it, in Oracle terms, this feature is a > >> "snapshot", not a materialized view. Maybe that's what it should > >> be called then. > > > > "Snapshot" is one of the options for refreshing an Oracle > > materialized view. There are others, which we'll eventually add > > if past is any prologue :) > > That's the way I understand it, too. Great :) > > I hate to add to the bike-shedding, but we should probably add > > REFRESH SNAPSHOT as an optional piece of the grammar, with more > > REFRESH options to come. > > I would prefer to leave the syntax for refreshing MVs to a later > patch. I'm kind of assuming that SNAPSHOT would be the default if > no other options are specified, but I would prefer not to get into > too much speculation about add-on patches for fear of derailing > this initial effort. You're right. I withdraw my suggestion until such time as this patch (or descendent) is committed and actual working code implementing other refresh strategies is written. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, 2012-11-26 at 09:46 -0500, Peter Eisentraut wrote: > On 11/14/12 9:28 PM, Kevin Grittner wrote: > > 17. Since the data viewed in an MV is not up-to-date with the latest > > committed transaction, > > So, the way I understand it, in Oracle terms, this feature is a > "snapshot", not a materialized view. Maybe that's what it should be > called then. OK, I take everything back and claim the opposite. In current Oracle, SNAPSHOT is an obsolete alias for MATERIALIZED VIEW. Materialized views have the option of REFRESH ON DEMAND and REFRESH ON COMMIT, with the former being the default. So it seems that the syntax of what you are proposing is in line with Oracle. I'm not fond of overloading LOAD as the refresh command. Maybe you could go the Oracle route here as well and use a stored procedure. That would also allow things like SELECT pg_refresh_mv(oid) FROM ... more easily.
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Peter Eisentraut > Sent: 27 November 2012 13:35 > To: Kevin Grittner > Cc: Pgsql Hackers > Subject: Re: [HACKERS] Materialized views WIP patch > > On Mon, 2012-11-26 at 09:46 -0500, Peter Eisentraut wrote: > > On 11/14/12 9:28 PM, Kevin Grittner wrote: > > > 17. Since the data viewed in an MV is not up-to-date with the latest > > > committed transaction, > > > > So, the way I understand it, in Oracle terms, this feature is a > > "snapshot", not a materialized view. Maybe that's what it should be > > called then. > > OK, I take everything back and claim the opposite. > > In current Oracle, SNAPSHOT is an obsolete alias for MATERIALIZED VIEW. > Materialized views have the option of REFRESH ON DEMAND and REFRESH > ON COMMIT, with the former being the default. So it seems that the syntax > of what you are proposing is in line with Oracle. > > I'm not fond of overloading LOAD as the refresh command. Maybe you could > go the Oracle route here as well and use a stored procedure. That would also > allow things like SELECT pg_refresh_mv(oid) FROM ... more easily. > > +1 to this. I can see a use case where you might want to refresh all MVs that are X number of days/hours old. Rather than having to executestatements for each one. Something like pg_refresh_mv() within a query would allow this. Pretty exciting work Kevin, I understand what Robert said about feature creep etc and agree 100%, but I'm really lookingforward to when we can *one day* have the planner make use of an eager MV to optimise a query! Regards David Rowley > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Peter Eisentraut <peter_e@gmx.net> writes: > I'm not fond of overloading LOAD as the refresh command. Maybe you > could go the Oracle route here as well and use a stored procedure. That > would also allow things like SELECT pg_refresh_mv(oid) FROM ... more > easily. I would like that we have a way to refresh a Materialized View by calling a stored procedure, but I don't think it should be the main UI. The wholesale refreshing of a matview appears to me to be comparable to TRUNCATE is that it's both a DDL and a DML. The incremental refreshing modes we want to have later are clearly DML only, either on commit refresh or incrementally on demand. I would then propose that we use ALTER MATERIALIZED VIEW as the UI for the wholesale on demand refreshing operation, and UPDATE MATERIALIZED VIEW as the incremental command (to come later). So my proposal for the current feature would be: ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ]; UPDATE MATERIALIZED VIEW mv; The choice of keywords and syntax here hopefully clearly hint the user about the locking behavior of the commands, too. And as we said, the bare minimum for this patch does *not* include the CONCURRENTLY option, which we still all want to have (someday). :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
2012/11/27 Dimitri Fontaine <dimitri@2ndquadrant.fr>: > Peter Eisentraut <peter_e@gmx.net> writes: >> I'm not fond of overloading LOAD as the refresh command. Maybe you >> could go the Oracle route here as well and use a stored procedure. That >> would also allow things like SELECT pg_refresh_mv(oid) FROM ... more >> easily. > > I would like that we have a way to refresh a Materialized View by > calling a stored procedure, but I don't think it should be the main UI. > > The wholesale refreshing of a matview appears to me to be comparable to > TRUNCATE is that it's both a DDL and a DML. The incremental refreshing > modes we want to have later are clearly DML only, either on commit > refresh or incrementally on demand. > > I would then propose that we use ALTER MATERIALIZED VIEW as the UI for > the wholesale on demand refreshing operation, and UPDATE MATERIALIZED > VIEW as the incremental command (to come later). > > So my proposal for the current feature would be: > > ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ]; > UPDATE MATERIALIZED VIEW mv; > > The choice of keywords and syntax here hopefully clearly hint the user > about the locking behavior of the commands, too. And as we said, the > bare minimum for this patch does *not* include the CONCURRENTLY option, > which we still all want to have (someday). :) > +1 Pavel > Regards, > -- > Dimitri Fontaine > http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Pavel Stehule wrote: > 2012/11/27 Dimitri Fontaine <dimitri@2ndquadrant.fr>: >> I would like that we have a way to refresh a Materialized View >> by calling a stored procedure, but I don't think it should be >> the main UI. I agree. I saw that Oracle uses a function for that without any statement-level support, and that would probably be easier to implement; but it felt wrong to do it that way. I couldn't think of any other cases where similar action is taken without statement syntax for it. >> The wholesale refreshing of a matview appears to me to be >> comparable to TRUNCATE is that it's both a DDL and a DML. The >> incremental refreshing modes we want to have later are clearly >> DML only, either on commit refresh or incrementally on demand. Personally, I expect the most popular update method to eventually become a queued update. I've looked ahead far enough to see that I want to structure the incremental updates to be controlled through an API where changes to supporting tables produce records saying what was done which are fed to consumers which do the updating. Then it becomes a matter of whether that consumer performs the related updates to the MV during commit processing of the producer, by pulling from a queue, or by reading accumulated rows when the MV is referenced. But I'm getting ahead of things with such speculation... >> I would then propose that we use ALTER MATERIALIZED VIEW as the >> UI for the wholesale on demand refreshing operation, and UPDATE >> MATERIALIZED VIEW as the incremental command (to come later). Honestly, I have managed to keep myself from speculating on syntax for incremental updates. There will be enough complexity involved that I expect months of bikeshedding. :-/ >> So my proposal for the current feature would be: >> >> ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ]; >> UPDATE MATERIALIZED VIEW mv; An ALTER MATERIALIZED VIEW option was my first thought on syntax to do what LOAD does in the current patch. But it bothered me that I couldn't think of any other cases where ALTER <some-object-type> only changed the data contained within the object and had no other impact. Are you both really comfortable with an ALTER MATERIALIZED VIEW which has no effect other than to update the data? It seems wrong to me. >> The choice of keywords and syntax here hopefully clearly hint >> the user about the locking behavior of the commands, too. And as >> we said, the bare minimum for this patch does *not* include the >> CONCURRENTLY option, which we still all want to have (someday). >> :) > > +1 Sure -- a CONCURRENTLY option for LMV (or AMVU) seems like one of the next steps. I'll feel more confident about implementing that when it appears that we have shaken the last bugs out of CREATE/DROP INDEX CONCURRENTLY, since anything which affects those statements will probably also matter here. -Kevin
On Nov 27, 2012, at 5:25, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > > So my proposal for the current feature would be: > > ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ]; > UPDATE MATERIALIZED VIEW mv; > > The choice of keywords and syntax here hopefully clearly hint the user > about the locking behavior of the commands, too. And as we said, the > bare minimum for this patch does *not* include the CONCURRENTLY option, > which we still all want to have (someday). :) > I dislike using ALTER syntax to perform a data-only action. The other advantage of non-functional syntax is that you could more easily supply some form of where clause should you onlywant to perform a partial refresh. With a function call that becomes more obtuse. David J.
"Kevin Grittner" <kgrittn@mail.com> writes: > An ALTER MATERIALIZED VIEW option was my first thought on syntax to > do what LOAD does in the current patch. But it bothered me that I > couldn't think of any other cases where ALTER <some-object-type> > only changed the data contained within the object and had no other > impact. Are you both really comfortable with an ALTER MATERIALIZED > VIEW which has no effect other than to update the data? It seems > wrong to me. I think you can already do that with some clever use of alter table ... type using, or alter table set default. > Sure -- a CONCURRENTLY option for LMV (or AMVU) seems like one of > the next steps. I'll feel more confident about implementing that > when it appears that we have shaken the last bugs out of > CREATE/DROP INDEX CONCURRENTLY, since anything which affects those > statements will probably also matter here. Sure. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine wrote: > "Kevin Grittner" <kgrittn@mail.com> writes: >> An ALTER MATERIALIZED VIEW option was my first thought on syntax >> to do what LOAD does in the current patch. But it bothered me >> that I couldn't think of any other cases where ALTER >> <some-object-type> only changed the data contained within the >> object and had no other impact. Are you both really comfortable >> with an ALTER MATERIALIZED VIEW which has no effect other than >> to update the data? It seems wrong to me. > > I think you can already do that with some clever use of alter > table ... type using, or alter table set default. You mean, specifying an ALTER TABLE which appears to specify a change to some non-data aspect of a table but which is really setting it to the existing state? And it therefore rewrites the table? I suppose that with USING you could actually even have it rewritten with data different from what was there before without changing the structure of the table. Somehow I don't find that pursuasive as an argument for what ALTER MATERIALIZED VIEW should rescan the source relations and build a whole new set of data for exactly the same MV definition. Consider that in relational theory a table is considered a relation variable. ALTER is supposed to change the definition of the variable in some way. Other statements are used to change the value contained in the variable. Sure there are some grey areas already, but I don't see where we need to muddy the waters in this case. -Kevin
"Kevin Grittner" <kgrittn@mail.com> writes: > changing the structure of the table. Somehow I don't find that > pursuasive as an argument for what ALTER MATERIALIZED VIEW should > rescan the source relations and build a whole new set of data for > exactly the same MV definition. Fair enough. > Consider that in relational theory a table is considered a relation > variable. ALTER is supposed to change the definition of the > variable in some way. Other statements are used to change the value > contained in the variable. Sure there are some grey areas already, > but I don't see where we need to muddy the waters in this case. Under that light, using ALTER is strange indeed. I still don't like using LOAD that much, allow me to try a last syntax proposal. Well all I can find just now would be: UPDATE MATERIALIZED VIEW mv FOR EACH ROW; UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ]; The only value of such a proposal is that it's not LOAD and it's still not introducing any new keyword. Oh it's also avoiding to overload the SNAPSHOT keyword. Well, it still does not look like the best candidate. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: > Under that light, using ALTER is strange indeed. Agreed, seems like a poor choice. > I still don't like > using LOAD that much, allow me to try a last syntax proposal. Well all I > can find just now would be: > UPDATE MATERIALIZED VIEW mv FOR EACH ROW; > UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ]; > The only value of such a proposal is that it's not LOAD and it's still > not introducing any new keyword. Oh it's also avoiding to overload the > SNAPSHOT keyword. Well, it still does not look like the best candidate. I think this syntax would require making MATERIALIZED (and possibly also VIEW) fully reserved keywords, which would be better avoided. regards, tom lane
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Dimitri Fontaine > Sent: Tuesday, November 27, 2012 10:03 AM > To: Kevin Grittner > Cc: Pavel Stehule; Peter Eisentraut; Pgsql Hackers > Subject: Re: [HACKERS] Materialized views WIP patch > > "Kevin Grittner" <kgrittn@mail.com> writes: > > changing the structure of the table. Somehow I don't find that > > pursuasive as an argument for what ALTER MATERIALIZED VIEW should > > rescan the source relations and build a whole new set of data for > > exactly the same MV definition. > > Fair enough. > > > Consider that in relational theory a table is considered a relation > > variable. ALTER is supposed to change the definition of the variable > > in some way. Other statements are used to change the value contained > > in the variable. Sure there are some grey areas already, but I don't > > see where we need to muddy the waters in this case. > > Under that light, using ALTER is strange indeed. I still don't like using LOAD > that much, allow me to try a last syntax proposal. Well all I can find just now > would be: > > UPDATE MATERIALIZED VIEW mv FOR EACH ROW; > UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ]; > > The only value of such a proposal is that it's not LOAD and it's still not > introducing any new keyword. Oh it's also avoiding to overload the > SNAPSHOT keyword. Well, it still does not look like the best candidate. > > Regards, Just a thought but how about something like: DO REFRESH OF MATERIALIZED VIEW mat_view; In effect we begin overloading the meaning of "DO" to not only mean anonymous code blocks but to also call pre-defined internal routines that can be executed without having to use function-call syntax. "MATERIALIZED VIEW" can be more generic "e.g., TABLE" if the need arises, the REFRESH "Action" is generic, and additional clauses can be added after the object name (FOR, CONCURRENTLY, WHERE, etc...) David J.
2012/11/27 David Johnston <polobo@yahoo.com>: >> -----Original Message----- >> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- >> owner@postgresql.org] On Behalf Of Dimitri Fontaine >> Sent: Tuesday, November 27, 2012 10:03 AM >> To: Kevin Grittner >> Cc: Pavel Stehule; Peter Eisentraut; Pgsql Hackers >> Subject: Re: [HACKERS] Materialized views WIP patch >> >> "Kevin Grittner" <kgrittn@mail.com> writes: >> > changing the structure of the table. Somehow I don't find that >> > pursuasive as an argument for what ALTER MATERIALIZED VIEW should >> > rescan the source relations and build a whole new set of data for >> > exactly the same MV definition. >> >> Fair enough. >> >> > Consider that in relational theory a table is considered a relation >> > variable. ALTER is supposed to change the definition of the variable >> > in some way. Other statements are used to change the value contained >> > in the variable. Sure there are some grey areas already, but I don't >> > see where we need to muddy the waters in this case. >> >> Under that light, using ALTER is strange indeed. I still don't like using > LOAD >> that much, allow me to try a last syntax proposal. Well all I can find > just now >> would be: >> >> UPDATE MATERIALIZED VIEW mv FOR EACH ROW; >> UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ]; >> >> The only value of such a proposal is that it's not LOAD and it's still not >> introducing any new keyword. Oh it's also avoiding to overload the >> SNAPSHOT keyword. Well, it still does not look like the best candidate. >> >> Regards, > > Just a thought but how about something like: > > DO REFRESH OF MATERIALIZED VIEW mat_view; > > In effect we begin overloading the meaning of "DO" to not only mean > anonymous code blocks but to also call pre-defined internal routines that > can be executed without having to use function-call syntax. "MATERIALIZED > VIEW" can be more generic "e.g., TABLE" if the need arises, the REFRESH > "Action" is generic, and additional clauses can be added after the object > name (FOR, CONCURRENTLY, WHERE, etc...) -1 I unlike using keywords DO for this purpose - when we use it for anonymous blocks Regards Pavel > > David J. > > > > > > > >
On Tue, Nov 27, 2012 at 10:58 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I unlike using keywords DO for this purpose - when we use it for > anonymous blocks Yeah, I don't much like that either. My original suggestion when Kevin and I discussed this over voice was ALTER MATERIALIZED VIEW .. REFRESH or ALTER MATERIALIZED VIEW .. UPDATE. I don't particularly like syntaxes involving DO or LOAD because those words already have strong associations with completely unrelated features. Now, if we don't want to do that and we don't want to use ALTER for a data-modifying command either, another option would be to invent a new toplevel command: REFRESH <view_name>; Of course, that does introduce another keyword, but the penalty for a new unreserved keyword is pretty small. It seems like a rough analogue of CLUSTER, which could be spelled ALTER TABLE <table_name> UPDATE TABLE ORDER TO if keyword minimization trumped both concision and clarity, but it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > I don't particularly like syntaxes involving DO or LOAD because > those words already have strong associations with completely > unrelated features. Now, if we don't want to do that and we don't > want to use ALTER for a data-modifying command either, another > option would be to invent a new toplevel command: > > REFRESH <view_name>; > > Of course, that does introduce another keyword, but the penalty > for a new unreserved keyword is pretty small. Of the alternatives to LOAD MATERIALIZED VIEW, something involving REFRESH seems the best to me. The question is whether REFRESH MATERIALIZED VIEW (or just REFRESH) is more clear, and whether it is so by enough to merit another keyword. Of course, there is a chance that we may wind up needing that keyword for declaring incremental updates anyway, so it might be a matter of *when* we do it rather than *whether* we do it -- depending on the yet-to-be- determined syntax for specifying incremental updates. My personal preference is still for LOAD MATERIALIZED VIEW because it implies a complete regeneration rather than something more incremental, but I realize that is subjective. -Kevin
2012/11/28 Kevin Grittner <kgrittn@mail.com>: > Robert Haas wrote: > >> I don't particularly like syntaxes involving DO or LOAD because >> those words already have strong associations with completely >> unrelated features. Now, if we don't want to do that and we don't >> want to use ALTER for a data-modifying command either, another >> option would be to invent a new toplevel command: >> >> REFRESH <view_name>; >> >> Of course, that does introduce another keyword, but the penalty >> for a new unreserved keyword is pretty small. > > Of the alternatives to LOAD MATERIALIZED VIEW, something involving > REFRESH seems the best to me. The question is whether REFRESH > MATERIALIZED VIEW (or just REFRESH) is more clear, and whether it > is so by enough to merit another keyword. Of course, there is a > chance that we may wind up needing that keyword for declaring > incremental updates anyway, so it might be a matter of *when* we do > it rather than *whether* we do it -- depending on the yet-to-be- > determined syntax for specifying incremental updates. > > My personal preference is still for LOAD MATERIALIZED VIEW because > it implies a complete regeneration rather than something more > incremental, but I realize that is subjective. In this context I prefer REFRESH keyword - I have a LOAD associated with BULKLOAD, a this is different Regards Pavel > > -Kevin
Hi Kevin, On Mon, 26 Nov 2012 22:24:33 +0100, Kevin Grittner <kgrittn@mail.com> wrote: > Marko Tiikkaja wrote: >> <T2 sees an empty table> > > As far as I know you are the first to notice this behavior. Thanks > for pointing it out. > > I will take a look at the issue; I don't know whether it's > something small I can address in this CF or whether it will need to > be in the next CF, but I will fix it. Any news on this front? >> I'll get back when I manage to get a better grasp of the code. The code looks relatively straightforward and good to my eyes. It passes my testing and looks to be changing all the necessary parts of the code. > Keep in mind that the current behavior of behaving like a regular > view when the contents are invalid is not what I had in mind, that > was an accidental effect of commenting out the body of the > ExecCheckRelationsValid() function right before posting the patch > because I noticed a regression. When I noticed current behavior, it > struck me that someone might prefer it to the intended behavior of > showing an error like this: > > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("materialized view \"%s\" has not been populated", > get_rel_name(rte->relid)), > errhint("Use the LOAD MATERIALIZED VIEW command."))); > > I mention it in case someone wants to argue for silently behaving > as a regular view when the MV is not populated. FWIW, I'd prefer an error in this case, but I don't feel strongly about it. Regards, Marko Tiikkaja
Marko Tiikkaja wrote: > Kevin Grittner <kgrittn@mail.com> wrote: >> Marko Tiikkaja wrote: >>> <T2 sees an empty table> >> >> As far as I know you are the first to notice this behavior. >> Thanks for pointing it out. >> >> I will take a look at the issue; I don't know whether it's >> something small I can address in this CF or whether it will need >> to be in the next CF, but I will fix it. > > Any news on this front? On a preliminary look, I think that it's because when the new heap is visible, it has been populated with rows containing an xmin too new to be visible to the transaction which was blocked. I'll see if I can't populate the new heap with tuples which are already frozen and hinted, which will not only fix this bug, but improve performance. >>> I'll get back when I manage to get a better grasp of the code. > > The code looks relatively straightforward and good to my eyes. It > passes my testing and looks to be changing all the necessary > parts of the code. Thanks. I'll put together a new version of the patch based on feedback so far. I need to finish my testing of the fklocks patch and post a review first, so it will be a few days. >> Keep in mind that the current behavior of behaving like a >> regular view when the contents are invalid is not what I had in >> mind, that was an accidental effect of commenting out the body >> of the ExecCheckRelationsValid() function right before posting >> the patch because I noticed a regression. When I noticed current >> behavior, it struck me that someone might prefer it to the >> intended behavior of showing an error like this: >> >> ereport(ERROR, >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >> errmsg("materialized view \"%s\" has not been populated", >> get_rel_name(rte->relid)), >> errhint("Use the LOAD MATERIALIZED VIEW command."))); >> >> I mention it in case someone wants to argue for silently >> behaving as a regular view when the MV is not populated. > > FWIW, I'd prefer an error in this case, but I don't feel strongly > about it. Me, too. Since nobody else has spoken up, that's what I'll do. I haven't heard anyone argue for keeping temporary materialized views, so those will be dropped from the next version of the patch. It sounds like most people prefer REFRESH to LOAD. If there is no further discussion of that, I will make that change in the next version of the patch; so if you want to argue against adding a new unreserved keyword for that, now is the time to say so. Also, should it be just?: REFRESH matview_name; or?: REFRESH MATERIALIZED VIEW matview_name; I will look at heap_inplace_update() where appropriate for the pg_class changes. On the issue of UHLOGGED MVs, I think it's pretty clear that the right way to handle this is to have something in the init fork of an unlogged MV that indicates that it is in an invalid state, which is checked when the MV is opened. I'm not sure of the best way to store that, but my first instinct would be to put it inot the "special space" of the initial page, which would then be removed after forcing the relisvalid flag to false. That feels pretty kludgy, but its the best idea I've had so far. Anyone else want to suggest a better way? I will add regression tests in the next version of the patch. While I agree that tracking one or more timestamps related to MV "freshness" is a good idea, that seems like material for a follow-on patch. I don't agree that never having been loaded is a form of "stale", in spite of multiple people whom I hold in high regard expressing that opinion, so barring strong opposition I intend to keep the relisvalid column. I just don't feel comfortable about the fundamental correctness of the feature without it. If I'm reading things correctly, so far the opposition has been based on being lukewarm on the value of the column and wanting other features which might be harder with this column (i.e., UNLOGGED) or which are seen as alternatives to this column (i.e., a "freshness" timestamp) rather than actual opposition to throwing an error on an attempt to use an MV which has not been loaded with data. I will fix the commented-out test of relisvalid. I still haven't quite figured out what to do about the case Thom uncovered where the MV's SELECT was just selecting a literal. I haven't established what the most general case of that is, nor what a sensible behavior is. Any opinions or insights welcome. I didn't see any rebuttal to Tom's concerns about adding asymetrical COPY support, so I will leave that part as is. I will try to fill out the dependencies in pg_dump so that MVs which depend on other MVs will be dumped in the right order. I think that will probably be enough for the next version. -Kevin
Here is a new version of the patch, with most issues discussed in previous posts fixed. I've been struggling with two areas: - pg_dump sorting for MVs which depend on other MVs - proper handling of the relisvalid flag for unlogged MVs after recovery I've been hacking at the code in those areas without success; what's here is the least broken form I have, but work is still needed for these cases. Any other problems are news to me. In addition, the docs need another pass, and there is an open question about what is the right thing to use for TRUNCATE syntax. -Kevin
Attachment
[resending with patch compressed, since original post didn't make it through to the list] Here is a new version of the patch, with most issues discussed in previous posts fixed. I've been struggling with two areas: - pg_dump sorting for MVs which depend on other MVs - proper handling of the relisvalid flag for unlogged MVs after recovery I've been hacking at the code in those areas without success; what's here is the least broken form I have, but work is still needed for these cases. Any other problems are news to me. In addition, the docs need another pass, and there is an open question about what is the right thing to use for TRUNCATE syntax. -Kevin
Attachment
"Kevin Grittner" <kgrittn@mail.com> writes: > I've been struggling with two areas: > - pg_dump sorting for MVs which depend on other MVs Surely that should fall out automatically given that the dependency is properly expressed in pg_depend? If you mean you're trying to get it to cope with circular dependencies between MVs, it might take some work on the pg_dump side, but plain ordering shouldn't require new code. regards, tom lane
Tom Lane wrote: > "Kevin Grittner" <kgrittn@mail.com> writes: >> I've been struggling with two areas: >> - pg_dump sorting for MVs which depend on other MVs > > Surely that should fall out automatically given that the > dependency is properly expressed in pg_depend? > > If you mean you're trying to get it to cope with circular > dependencies between MVs, it might take some work on the pg_dump > side, but plain ordering shouldn't require new code. The *definitions* sort properly, but what I'm trying to do is define them WITH NO DATA and load data after all the COPY statements for tables. If mva is referenced by mvb, the goal is the REFRESH mva, build its indexes before running REFRESH for mvb and building its indexes. To do things in any other order does't seem to me to leave things after restore in the same state they were in at the time of the dump. So I should have been a little more verbose describing the problem: pg_dump sorting of REFRESH and CREATE INDEX steps for MVs which depend on other MVs. Last night I found why my previous attempts had been failing -- I was trying to build the dependencies at the wrong point in the dump process, after the sorts had already been done. Now that I've spotted that fundamental flaw, I think I can get this out of the way without too much more fanfare. I kept thinking I had something wrong in the detail of my approach, while the problem was at a much higher level. Where I really need someone to hit me upside the head with a clue-stick is the code I added to the bottom of RelationBuildDesc() in relcache.c. The idea is that on first access to an unlogged MV, to detect that the heap has been replaced by the init fork, set relisvalid to false, and make the heap look normal again. I couldn't see any way to do that which wasn't a kludge, and I can't figure out how to deal with relcache properly in implementing that kludge. Either a tip about the right way to work the kludge, or a suggestion for a less kludgy alternative would be welcome. -Kevin
On 16 January 2013 05:40, Kevin Grittner <kgrittn@mail.com> wrote:
--
Thom
Here is a new version of the patch, with most issues discussed in
previous posts fixed.
I've been struggling with two areas:
- pg_dump sorting for MVs which depend on other MVs
- proper handling of the relisvalid flag for unlogged MVs after recovery
Some weirdness:
postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
SELECT 2
postgres=# \d+ mv_test2
Materialized view "public.mv_test2"
Column | Type | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
moo | integer | | plain | |
?column? | integer | | plain | |
View definition:
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Has OIDs: no
The "weirdness" I refer you to is the view definition. This does not occur with a straightforward UNION.
This does not occur with a regular view:
postgres=# CREATE VIEW v_test3 AS SELECT moo, 2*moo FROM v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
CREATE VIEW
postgres=# \d+ v_test3
View "public.v_test3"
Column | Type | Modifiers | Storage | Description
----------+---------+-----------+---------+-------------
moo | integer | | plain |
?column? | integer | | plain |
View definition:
SELECT v_test2.moo, 2 * v_test2.moo
FROM v_test2
UNION ALL
SELECT v_test2.moo, 3 * v_test2.moo
FROM v_test2;
Thom
Thom Brown wrote: > Some weirdness: > > postgres=# CREATE VIEW v_test2 AS SELECT 1 moo; > CREATE VIEW > postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM > v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2; > SELECT 2 > postgres=# \d+ mv_test2 > Materialized view "public.mv_test2" > Column | Type | Modifiers | Storage | Stats target | Description > ----------+---------+-----------+---------+--------------+------------- > moo | integer | | plain | | > ?column? | integer | | plain | | > View definition: > SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"; You are very good at coming up with these, Thom! Will investigate. Can you confirm that *selecting* from the MV works as you would expect; it is just the presentation in \d+ that's a problem? Thanks, -Kevin
On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:
--
Thom
Thom Brown wrote:You are very good at coming up with these, Thom!
> Some weirdness:
>
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
> Materialized view "public.mv_test2"
> Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
> moo | integer | | plain | |
> ?column? | integer | | plain | |
> View definition:
> SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Will investigate.
Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?
Yes, nothing wrong with using the MV, or refreshing it:
postgres=# TABLE mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)
postgres=# SELECT * FROM mv_test2;
moo | ?column?
-----+----------
1 | 2
1 | 3
(2 rows)
postgres=# REFRESH MATERIALIZED VIEW mv_test2;
REFRESH MATERIALIZED VIEW
But a pg_dump of the MV has the same issue as the view definition:
--
-- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom; Tablespace:
--
CREATE MATERIALIZED VIEW mv_test2 (
moo,
"?column?"
) AS
SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"
WITH NO DATA;
Thom
"Kevin Grittner" <kgrittn@mail.com> writes: > Tom Lane wrote: >> Surely that should fall out automatically given that the >> dependency is properly expressed in pg_depend? > The *definitions* sort properly, but what I'm trying to do is > define them WITH NO DATA and load data after all the COPY > statements for tables. If mva is referenced by mvb, the goal is the > REFRESH mva, build its indexes before running REFRESH for mvb and > building its indexes. To do things in any other order does't seem > to me to leave things after restore in the same state they were in > at the time of the dump. Ah. Can't you treat this using the same pg_dump infrastructure as for the data for an ordinary table? The dependencies made for the TableDataInfo object might be a bit different, but after that it seems like the sort logic ought to be happy. > Where I really need someone to hit me upside the head with a > clue-stick is the code I added to the bottom of RelationBuildDesc() > in relcache.c. The idea is that on first access to an unlogged MV, > to detect that the heap has been replaced by the init fork, set > relisvalid to false, and make the heap look normal again. Hmm. I agree that relcache.c has absolutely no business doing that, but not sure what else to do instead. Seems like it might be better done at first touch of the MV in the parser, rewriter, or planner --- but the fact that I can't immediately decide which of those is right makes me feel that it's still too squishy. I'm also wondering about locking issues there. Obviously you don't want more than one backend trying to rebuild the MV. Do we really need unlogged MVs in the first iteration? Seems like that's adding a whole bunch of new issues, when you have quite enough already without that. regards, tom lane
> Do we really need unlogged MVs in the first iteration? Seems like > that's adding a whole bunch of new issues, when you have quite enough > already without that. While I think there is strong user demand for unlogged MVs, if we can get MVs without unlogged ones for 9.3, I say go for that. We'll add unlogged in 9.4. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Where I really need someone to hit me upside the head with a >> clue-stick is the code I added to the bottom of RelationBuildDesc() >> in relcache.c. The idea is that on first access to an unlogged MV, >> to detect that the heap has been replaced by the init fork, set >> relisvalid to false, and make the heap look normal again. > > Hmm. I agree that relcache.c has absolutely no business doing that, > but not sure what else to do instead. Seems like it might be better > done at first touch of the MV in the parser, rewriter, or planner --- > but the fact that I can't immediately decide which of those is right > makes me feel that it's still too squishy. I think we shouldn't be doing that at all. The whole business of transferring the relation-is-invalid information from the relation to a pg_class flag seems like a Rube Goldberg device to me. I'm still not convinced that we should have a relation-is-invalid flag at all, but can we at least not have two? It seems perfectly adequate to detect that the MV is invalid when we actually try to execute a plan - that is, when we first access the heap or one of its indexes. So the bit can just live in the file-on-disk, and there's no need to have a second copy of it in pg_class. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 16 January 2013 17:25, Thom Brown <thom@linux.com> wrote:
--
Thom
On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:Thom Brown wrote:You are very good at coming up with these, Thom!
> Some weirdness:
>
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
> Materialized view "public.mv_test2"
> Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
> moo | integer | | plain | |
> ?column? | integer | | plain | |
> View definition:
> SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Will investigate.
Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?Yes, nothing wrong with using the MV, or refreshing it:postgres=# TABLE mv_test2;moo | ?column?-----+----------1 | 21 | 3(2 rows)postgres=# SELECT * FROM mv_test2;moo | ?column?-----+----------1 | 21 | 3(2 rows)postgres=# REFRESH MATERIALIZED VIEW mv_test2;REFRESH MATERIALIZED VIEWBut a pg_dump of the MV has the same issue as the view definition:---- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom; Tablespace:--CREATE MATERIALIZED VIEW mv_test2 (moo,"?column?") ASSELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"WITH NO DATA;
A separate issue is with psql tab-completion:
postgres=# COMMENT ON MATERIALIZED VIEW ^IIS
This should be offering MV names instead of prematurely providing the "IS" keyword.
Thom
On 16 January 2013 05:40, Kevin Grittner <kgrittn@mail.com> wrote: > Here is a new version of the patch, with most issues discussed in > previous posts fixed. Looks good. The patch implements one kind of MV. In the future, we hope to have other features or other kinds of MV alongside this: * Snapshot MV - built once at start and then refreshed by explicit command only * Snapshot MV with fast refresh * Maintained MV (lazy) - changes trickle continuously into lazy MVs * Maintained MV (transactional) - changes applied as part of write transactions and or others So I think we should agree now some aspects of those other options so we can decide syntax. Otherwise we'll be left in the situation that what we implement in 9.3 becomes the default for all time and/or we have difficulties adding things later. e.g. REFRESH ON COMMAND Also, since there is no optimizer linkage between these MVs and the tables they cover, I think we need to have that explicitly as a command option, e.g. DISABLE OPTIMIZATION That way in the future we can implement "ENABLE OPTIMIZATION" mode and REFRESH TRANSACTIONAL mode as separate items. So all I am requesting is that we add additional syntax now, so that future additional features are clear. Please suggest syntax, not wedded to those... and we may want to use more compatible syntax also. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 17 January 2013 16:03, Thom Brown <thom@linux.com> wrote:
--
Thom
On 16 January 2013 17:25, Thom Brown <thom@linux.com> wrote:On 16 January 2013 17:20, Kevin Grittner <kgrittn@mail.com> wrote:Thom Brown wrote:You are very good at coming up with these, Thom!
> Some weirdness:
>
> postgres=# CREATE VIEW v_test2 AS SELECT 1 moo;
> CREATE VIEW
> postgres=# CREATE MATERIALIZED VIEW mv_test2 AS SELECT moo, 2*moo FROM
> v_test2 UNION ALL SELECT moo, 3*moo FROM v_test2;
> SELECT 2
> postgres=# \d+ mv_test2
> Materialized view "public.mv_test2"
> Column | Type | Modifiers | Storage | Stats target | Description
> ----------+---------+-----------+---------+--------------+-------------
> moo | integer | | plain | |
> ?column? | integer | | plain | |
> View definition:
> SELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?";
Will investigate.
Can you confirm that *selecting* from the MV works as you would
expect; it is just the presentation in \d+ that's a problem?Yes, nothing wrong with using the MV, or refreshing it:postgres=# TABLE mv_test2;moo | ?column?-----+----------1 | 21 | 3(2 rows)postgres=# SELECT * FROM mv_test2;moo | ?column?-----+----------1 | 21 | 3(2 rows)postgres=# REFRESH MATERIALIZED VIEW mv_test2;REFRESH MATERIALIZED VIEWBut a pg_dump of the MV has the same issue as the view definition:---- Name: mv_test2; Type: MATERIALIZED VIEW; Schema: public; Owner: thom; Tablespace:--CREATE MATERIALIZED VIEW mv_test2 (moo,"?column?") ASSELECT "*SELECT* 1".moo, "*SELECT* 1"."?column?"WITH NO DATA;A separate issue is with psql tab-completion:postgres=# COMMENT ON MATERIALIZED VIEW ^IISThis should be offering MV names instead of prematurely providing the "IS" keyword.
Also in doc/src/sgml/ref/alter_materialized_view.sgml:
s/materailized/materialized/
In src/backend/executor/execMain.c:
s/referrenced/referenced/
Thom
On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote: > On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Where I really need someone to hit me upside the head with a > >> clue-stick is the code I added to the bottom of RelationBuildDesc() > >> in relcache.c. The idea is that on first access to an unlogged MV, > >> to detect that the heap has been replaced by the init fork, set > >> relisvalid to false, and make the heap look normal again. > > > > Hmm. I agree that relcache.c has absolutely no business doing that, > > but not sure what else to do instead. Seems like it might be better > > done at first touch of the MV in the parser, rewriter, or planner --- > > but the fact that I can't immediately decide which of those is right > > makes me feel that it's still too squishy. > > I think we shouldn't be doing that at all. The whole business of > transferring the relation-is-invalid information from the relation to > a pg_class flag seems like a Rube Goldberg device to me. I'm still > not convinced that we should have a relation-is-invalid flag at all, > but can we at least not have two? > > It seems perfectly adequate to detect that the MV is invalid when we > actually try to execute a plan - that is, when we first access the > heap or one of its indexes. So the bit can just live in the > file-on-disk, and there's no need to have a second copy of it in > pg_class. Like Kevin, I want a way to distinguish unpopulated MVs from MVs that genuinely yielded the empty set at last refresh. I agree that there's no particular need to store that fact in pg_class, and I would much prefer only storing it one way in any case. A user-visible disadvantage of the current implementation is that relisvalid remains stale until something opens the rel. That's fine for the system itself, but it can deceive user-initiated catalog queries. Imagine a check_postgres action that looks for invalid MVs to complain about. It couldn't just scan pg_class; it would need to first do something that opens every MV. I suggest the following: 1. Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV by giving it a page with no tuples. This entailsVACUUM[1] not truncating MVs below one page and the refresh operation, where necessary, extending the relation fromzero pages to one. 2. Remove pg_class.relisvalid. 3. Add a bool field to RelationData. The word "valid" is used in that context to refer to the validity of the structureitself, so perhaps call the new field rd_scannable. RelationIsFlaggedAsValid() can become a macro; consider changingits name for consistency with the field name. 4. During relcache build, set the field to "RelationGetNumberBlocks(rel) != 0" for MVs, fixed "true" for everyone else. Any operation that changes the field must, and probably would anyway, instigate a relcache invalidation. 5. Expose a database function, say pg_relation_scannable(), for querying the current state of a relation. This supportsuser-level monitoring. Does that seem reasonable? One semantic difference to keep in mind is that unlogged MVs will be considered invalid on the standby while valid on the master. That's essentially an accurate report, so I won't mind it. For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash. I'm going to follow this with a review covering a broader range of topics. Thanks, nm [1] For the time being, it's unfortunate to VACUUM materialized views at all; they only ever bear frozen tuples.
Hi Kevin, The patch conflicts with git master; I tested against master@{2013-01-20}. On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote: > I've been struggling with two areas: > > - pg_dump sorting for MVs which depend on other MVs From your later messages, I understand that you have a way forward on this. > - proper handling of the relisvalid flag for unlogged MVs after recovery I have discussed this in a separate email. While reading the patch to assess that topic, I found a few more things: > *** a/contrib/pg_upgrade/version_old_8_3.c > --- b/contrib/pg_upgrade/version_old_8_3.c > *************** > *** 145,151 **** old_8_3_check_for_tsquery_usage(ClusterInfo *cluster) > "FROM pg_catalog.pg_class c, " > " pg_catalog.pg_namespace n, " > " pg_catalog.pg_attribute a " > ! "WHERE c.relkind = 'r' AND " > " c.oid = a.attrelid AND " > " NOT a.attisdropped AND " > " a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND " > --- 145,151 ---- > "FROM pg_catalog.pg_class c, " > " pg_catalog.pg_namespace n, " > " pg_catalog.pg_attribute a " > ! "WHERE c.relkind in ('r', 'm') AND " > " c.oid = a.attrelid AND " > " NOT a.attisdropped AND " > " a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND " PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really matter whether this change happens or not. I suggest adding a comment, whether or not you keep the code change. > *** a/contrib/sepgsql/sepgsql.h > --- b/contrib/sepgsql/sepgsql.h > *************** > *** 32,37 **** > --- 32,39 ---- > > /* > * Internally used code of object classes > + * > + * NOTE: Materialized views are treated as tables for now. This smells like a bypass of mandatory access control. Unless you've determined that this is correct within the sepgsql security model, I suggest starting with a draconian policy, like simply crippling MVs. Even if you have determined that, separating out the nontrivial sepgsql support might be good. The set of ideal reviewers is quite different. > */ > #define SEPG_CLASS_PROCESS 0 > #define SEPG_CLASS_FILE 1 > *** a/contrib/vacuumlo/vacuumlo.c > --- b/contrib/vacuumlo/vacuumlo.c > *************** > *** 209,215 **** vacuumlo(const char *database, const struct _param * param) > strcat(buf, " AND a.atttypid = t.oid "); > strcat(buf, " AND c.relnamespace = s.oid "); > strcat(buf, " AND t.typname in ('oid', 'lo') "); > ! strcat(buf, " AND c.relkind = 'r'"); > strcat(buf, " AND s.nspname !~ '^pg_'"); > res = PQexec(conn, buf); > if (PQresultStatus(res) != PGRES_TUPLES_OK) > --- 209,215 ---- > strcat(buf, " AND a.atttypid = t.oid "); > strcat(buf, " AND c.relnamespace = s.oid "); > strcat(buf, " AND t.typname in ('oid', 'lo') "); > ! strcat(buf, " AND c.relkind in ('r', 'm')"); It concerns me slightly that older vacuumlo could silently remove large objects still referenced by MVs. Only slightly, though, because the next MV refresh would remove those references anyway. Is there anything we should do to help that situation? If nothing else, perhaps backpatch this patch hunk. > + <varlistentry> > + <term><literal>WITH OIDS</></term> > + <term><literal>WITHOUT OIDS</></term> > + <listitem> > + <para> > + These are obsolescent syntaxes equivalent to <literal>WITH (OIDS)</> > + and <literal>WITH (OIDS=FALSE)</>, respectively. If you wish to give > + both an <literal>OIDS</> setting and storage parameters, you must use > + the <literal>WITH ( ... )</> syntax; see above. > + </para> > + </listitem> > + </varlistentry> Let's not support OIDs on MVs. They'll be regenerated on every refresh. > *************** > *** 336,342 **** ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es, > */ > void > ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es, > ! const char *queryString, ParamListInfo params) > { > if (utilityStmt == NULL) > return; > --- 338,345 ---- > */ > void > ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es, > ! const char *queryString, DestReceiver *dest, > ! ParamListInfo params) > { > if (utilityStmt == NULL) > return; > *************** > *** 349,361 **** ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es, > * contained parsetree another time, but let's be safe. > */ > CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt; > ! List *rewritten; > > Assert(IsA(ctas->query, Query)); > ! rewritten = QueryRewrite((Query *) copyObject(ctas->query)); > ! Assert(list_length(rewritten) == 1); > ! ExplainOneQuery((Query *) linitial(rewritten), ctas->into, es, > ! queryString, params); > } > else if (IsA(utilityStmt, ExecuteStmt)) > ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es, > --- 352,366 ---- > * contained parsetree another time, but let's be safe. > */ > CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt; > ! Query *query = (Query *) ctas->query; > ! > ! dest = CreateIntoRelDestReceiver(into); > > Assert(IsA(ctas->query, Query)); > ! > ! query = SetupForCreateTableAs(query, ctas->into, queryString, params, dest); > ! > ! ExplainOneQuery(query, ctas->into, es, queryString, dest, params); > } > else if (IsA(utilityStmt, ExecuteStmt)) > ExplainExecuteQuery((ExecuteStmt *) utilityStmt, into, es, If I'm reading this right, you always overwrite the passed-in dest without looking at it. What's the intent here? > + /* > + * Kludge here to allow refresh of a materialized view which is invalid > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the > + * first two RangeTblEntry list elements, which were added to the front > + * of the rewritten Query to keep the rules system happy, with the > + * isResultRel flag to indicate that it is OK if they are flagged as > + * invalid. > + */ > + rtable = dataQuery->rtable; > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true; > + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true; Is it safe to assume that the first two RTEs are the correct ones to flag? > + /* > + * Swap the physical files of the target and transient tables, then > + * rebuild the target's indexes and throw away the transient table. > + */ > + finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin); The check_constraints argument should be "true", because the refresh could have invalidated a UNIQUE index. > *************** > *** 3049,3055 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd, > break; > case AT_ClusterOn: /* CLUSTER ON */ > case AT_DropCluster: /* SET WITHOUT CLUSTER */ > ! ATSimplePermissions(rel, ATT_TABLE); > /* These commands never recurse */ > /* No command-specific prep needed */ > pass = AT_PASS_MISC; > --- 3104,3110 ---- > break; > case AT_ClusterOn: /* CLUSTER ON */ > case AT_DropCluster: /* SET WITHOUT CLUSTER */ > ! ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW); If the user desires an actually-clustered MV, he must re-CLUSTER it after each refresh. That deserves a documentation mention. > *************** > *** 724,729 **** InitPlan(QueryDesc *queryDesc, int eflags) > --- 765,775 ---- > ExecCheckRTPerms(rangeTable, true); > > /* > + * Ensure that all referrenced relations are flagged as valid. Typo. > + */ > + ExecCheckRelationsValid(rangeTable); I believe this ought to happen after the executor lock acquisitions, perhaps right in ExecOpenScanRelation(). Since you'll then have an open Relation, RelationIsFlaggedAsValid() can use the relcache. > *************** > *** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown) > --- 1592,1607 ---- > rel = heap_open(rte->relid, NoLock); > > /* > + * Skip materialized view expansion when resultRelation is set. > + */ > + if (rel->rd_rel->relkind == RELKIND_MATVIEW && > + rel->rd_rel->relisvalid) > + { > + heap_close(rel, NoLock); > + break; > + } Would you elaborate on this? > + /* Strip off the trailing semicolon so that other things may follow. */ > + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1); I suggest verifying that the last character is indeed a semicolon. > /* > + * dumpMatViewIndex > + * write out to fout a user-defined index > + */ > + static void > + dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo) This is so similar to dumpIndex(); can we avoid this level of duplication? > *** /dev/null > --- b/src/test/regress/sql/matview.sql > + -- test diemv when the mv does exist > + DROP MATERIALIZED VIEW IF EXISTS tum; > + > + -- make sure that dependencies are reported properly when they block the drop > + DROP TABLE t; > + > + -- make sure dependencies are dropped and reported > + DROP TABLE t CASCADE; Please retain an interesting sample of materialized views in the regression database. Among other benefits, the pg_upgrade test suite exercises pg_dump and pg_upgrade for all object types retained in the regression database. The regression tests should probably include a few other wrinkles, like an index on a MV. Creating a RULE on an MV succeeds, but refreshing the view then fails: [local] test=# create rule mvrule as on insert to mymv where 1 = 0 do also select 1; CREATE RULE [local] test=# REFRESH MATERIALIZED VIEW mymv; ERROR: materialized view "mymv" has too many rules The documentation is a good start. I would expect a brief introduction in Tutorial -> Advanced Features and possibly a deeper discussion under The SQL Language. I suggest updating Explicit Locking to mention the new commands; users will be interested in the lock level of a refresh. You have chosen to make pg_dump preserve the valid-or-invalid state of each MV. That seems reasonable, though I'm slightly concerned about the case of a dump taken from a standby. We support ALTER TABLE against regular views for historical reasons. When we added foreign tables, we did not extend that permissiveness; one can only use ALTER FOREIGN TABLE on foreign tables. Please do the same for materialized views. See RangeVarCallbackForAlterRelation(). Note that "ALTER TABLE ... RENAME colname TO newname" and "ALTER TABLE ... RENAME CONSTRAINT" are currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW. There's no documented support for table constraints on MVs, but UNIQUE constraints are permitted: [local] test=# alter materialized view mymv add unique (c); ALTER MATERIALIZED VIEW [local] test=# alter materialized view mymv add check (c > 0); ERROR: "mymv" is not a table [local] test=# alter materialized view mymv add primary key (c); ERROR: "mymv" is not a table or foreign table Some of the ALTER TABLE variants would make plenty of sense for MVs: ALTER [ COLUMN ] column_name SET STATISTICS integer ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ...] ) ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] ) ALTER [ COLUMN ] column_name SET STORAGE { PLAIN| EXTERNAL | EXTENDED | MAIN } It wouldn't be a problem to skip those for the first patch, though. Conversely, this syntax is accepted: ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] ) But there are no available options. The only option accepted for regular views, security_barrier, is rejected. MVs always have security_barrier semantics, in any event. Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to determine which ones make sense for MVs. For each one in the sensible set, either allow it or add a comment indicating that it could reasonably be allowed in the future. For each one outside the set, forbid it. Verify that the documentation, the results of your evaluation, and the actual allowed operations are all consistent. Thanks, nm
Thanks for looking at this! Noah Misch wrote: > For the benefit of the archives, I note that we almost need not truncate an > unlogged materialized view during crash recovery. MVs are refreshed in a > VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's > pg_class to that relfilenode. When a crash occurs with no refresh in flight, > the latest refresh had been safely synced. When a crash cuts short a refresh, > the pg_class update will not stick, and the durability of the old heap is not > in doubt. However, non-btree index builds don't have the same property; we > would need to force an immediate sync of the indexes to be safe here. It > would remain necessary to truncate unlogged MVs when recovering a base backup, > which may contain a partially-written refresh that did eventually commit. > Future MV variants that modify the MV in place would also need the usual > truncate on crash. Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice. > I'm going to follow this with a review covering a broader range > of topics. I'll need time to digest the rest of it. As you note, recent commits conflict with the last patch. Please look at the github repo where I've been working on this. I'll post an updated patch later today. https://github.com/kgrittn/postgres/tree/matview You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. Here's the query: with recursive w as ( select d1.objid, d1.objid as wrkid, d2.refobjid, c2.relkind as refrelkind from pg_depend d1 join pg_class c1 on c1.oid= d1.objid and c1.relkind = 'm' and c1.relisvalid join pg_rewrite r1 on r1.ev_class= d1.objid join pg_depend d2 on d2.classid = 'pg_rewrite'::regclass and d2.objid = r1.oid and d2.refobjid <> d1.objid join pg_class c2 on c2.oid = d2.refobjid and c2.relkind in ('m','v') and c2.relisvalid where d1.classid = 'pg_class'::regclass union select w.objid, w.refobjid as wrkid, d3.refobjid, c3.relkind as refrelkind from w join pg_rewrite r3 on r3.ev_class= w.refobjid join pg_depend d3 on d3.classid = 'pg_rewrite'::regclass and d3.objid = r3.oid and d3.refobjid <> w.refobjid join pg_class c3 on c3.oid = d3.refobjid and c3.relkindin ('m','v') and c3.relisvalid where w.refrelkind <> 'm' ), x as ( select objid::regclass, refobjid::regclass from w where refrelkind = 'm' ) select 'm'::text as type, x.objid, x.refobjid from x union all select 'i'::text as type, x.objid, i.indexrelid as refobjid from x join pg_index i on i.indrelid = x.refobjid and i.indisvalid ; If we bail on having pg_class.relisvalid, then it will obviously need adjustment. -Kevin
On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: > Noah Misch wrote: > > For the benefit of the archives, I note that we almost need not truncate an > > unlogged materialized view during crash recovery. MVs are refreshed in a > > VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's > > pg_class to that relfilenode. When a crash occurs with no refresh in flight, > > the latest refresh had been safely synced. When a crash cuts short a refresh, > > the pg_class update will not stick, and the durability of the old heap is not > > in doubt. However, non-btree index builds don't have the same property; we > > would need to force an immediate sync of the indexes to be safe here. It > > would remain necessary to truncate unlogged MVs when recovering a base backup, > > which may contain a partially-written refresh that did eventually commit. > > Future MV variants that modify the MV in place would also need the usual > > truncate on crash. > > Hmm. That's a very good observation. Perhaps the issue can be > punted to a future release where we start adding more incremental > updates to them. I'll think on that, but on the face of it, it > sounds like the best choice. That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future. > You might want to ignore the interim work on detecting the new > pg_dump dependencies through walking the internal structures. I > decided that was heading in a direction which might be > unnecessarily fragile and slow; so I tried writing it as a query > against the system tables. I'm pretty happy with the results. > Here's the query: > > with recursive w as [snip] Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems? > If we bail on having pg_class.relisvalid, then it will obviously > need adjustment. Even if we don't have the column, we can have the fact of an MV's validity SQL-visible in some other way.
Noah Misch wrote: > On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: >> Noah Misch wrote: >>> For the benefit of the archives, I note that we almost need not truncate an >>> unlogged materialized view during crash recovery. MVs are refreshed in a >>> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's >>> pg_class to that relfilenode. When a crash occurs with no refresh in flight, >>> the latest refresh had been safely synced. When a crash cuts short a refresh, >>> the pg_class update will not stick, and the durability of the old heap is not >>> in doubt. However, non-btree index builds don't have the same property; we >>> would need to force an immediate sync of the indexes to be safe here. It >>> would remain necessary to truncate unlogged MVs when recovering a base backup, >>> which may contain a partially-written refresh that did eventually commit. >>> Future MV variants that modify the MV in place would also need the usual >>> truncate on crash. >> >> Hmm. That's a very good observation. Perhaps the issue can be >> punted to a future release where we start adding more incremental >> updates to them. I'll think on that, but on the face of it, it >> sounds like the best choice. > > That situation is challenging for the same reason pg_class.relisvalid was hard > to implement for unlogged relations. The startup process doesn't know the > relkind of the unlogged-relation relfilenodes it cleans. If you can work > through all that, it's certainly a nice endpoint to not lose unlogged snapshot > MVs on crash. But I intended the first half of my message as the > recommendation and the above as a wish for the future. Well, if I just don't create an init fork for MVs, they are left as they were on recovery, aren't they? So for 9.3, that solves that issue, I think. pg_class.relisvald is a separate issue. >> You might want to ignore the interim work on detecting the new >> pg_dump dependencies through walking the internal structures. I >> decided that was heading in a direction which might be >> unnecessarily fragile and slow; so I tried writing it as a query >> against the system tables. I'm pretty happy with the results. >> Here's the query: >> >> with recursive w as > [snip] > > Why is the dependency problem of ordering MV refreshes and MV index builds so > different from existing pg_dump dependency problems? If mva has indexes and is referenced by mvb, the CREATE statements are all properly ordered, but you want mva populated and indexed before you attempt to populate mvb. (Populated to get correct results, indexed to get them quickly.) We don't have anything else like that. >> If we bail on having pg_class.relisvalid, then it will obviously >> need adjustment. > > Even if we don't have the column, we can have the fact of an MV's validity > SQL-visible in some other way. Sure, I didn't say we had to abandon the query -- probably just replace the relisvalid tests with a function call using the oid of the MV. -Kevin
On Thu, Jan 24, 2013 at 03:14:15PM -0500, Kevin Grittner wrote: > Noah Misch wrote: > > On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: > >> Noah Misch wrote: > >>> For the benefit of the archives, I note that we almost need not truncate an > >>> unlogged materialized view during crash recovery. MVs are refreshed in a > >>> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's > >>> pg_class to that relfilenode. When a crash occurs with no refresh in flight, > >>> the latest refresh had been safely synced. When a crash cuts short a refresh, > >>> the pg_class update will not stick, and the durability of the old heap is not > >>> in doubt. However, non-btree index builds don't have the same property; we > >>> would need to force an immediate sync of the indexes to be safe here. It > >>> would remain necessary to truncate unlogged MVs when recovering a base backup, > >>> which may contain a partially-written refresh that did eventually commit. > >>> Future MV variants that modify the MV in place would also need the usual > >>> truncate on crash. > >> > >> Hmm. That's a very good observation. Perhaps the issue can be > >> punted to a future release where we start adding more incremental > >> updates to them. I'll think on that, but on the face of it, it > >> sounds like the best choice. > > > > That situation is challenging for the same reason pg_class.relisvalid was hard > > to implement for unlogged relations. The startup process doesn't know the > > relkind of the unlogged-relation relfilenodes it cleans. If you can work > > through all that, it's certainly a nice endpoint to not lose unlogged snapshot > > MVs on crash. But I intended the first half of my message as the > > recommendation and the above as a wish for the future. > > Well, if I just don't create an init fork for MVs, they are left as > they were on recovery, aren't they? So for 9.3, that solves that > issue, I think. pg_class.relisvald is a separate issue. The startup process just looks for init forks, yes. But it's acceptable to leave the unlogged MV materials alone during *crash* recovery only. When recovering from a base backup, we once again need an init fork to refresh the unlogged-MV relations. In turn, we would still need a relisvalid implementation that copes. This is all solvable, sure, but it looks like a trip off into the weeds relative to the core aim of this patch. > > Why is the dependency problem of ordering MV refreshes and MV index builds so > > different from existing pg_dump dependency problems? > > If mva has indexes and is referenced by mvb, the CREATE statements > are all properly ordered, but you want mva populated and indexed > before you attempt to populate mvb. (Populated to get correct > results, indexed to get them quickly.) We don't have anything else > like that. Is the REFRESH order just a replay of the CREATE order (with index builds interspersed), or can it differ? nm
Noah Misch wrote: > The patch conflicts with git master; I tested against master@{2013-01-20}. New patch rebased, fixes issues raised by Thom Brown, and addresses some of your points. > PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really > matter whether this change happens or not. I suggest adding a comment, > whether or not you keep the code change. Reverted code changes to version_old_8_3.c; added comments. > > *** a/contrib/sepgsql/sepgsql.h > > --- b/contrib/sepgsql/sepgsql.h > > *************** > > *** 32,37 **** > > --- 32,39 ---- > > > > /* > > * Internally used code of object classes > > + * > > + * NOTE: Materialized views are treated as tables for now. > > This smells like a bypass of mandatory access control. Unless you've > determined that this is correct within the sepgsql security model, I suggest > starting with a draconian policy, like simply crippling MVs. Even if you have > determined that, separating out the nontrivial sepgsql support might be good. > The set of ideal reviewers is quite different. Robert suggested this way of coping for now. Will post just the sepgsql separately to try to attract the right crowd to confirm. > It concerns me slightly that older vacuumlo could silently remove large > objects still referenced by MVs. Only slightly, though, because the next MV > refresh would remove those references anyway. Is there anything we should do > to help that situation? If nothing else, perhaps backpatch this patch hunk. Defensive backpatching of this code sounds like a good idea to me. I'm open to other opinions on whether we need to defend 9.3 and later against earler versions of vacuumlo being run against them. > Let's not support OIDs on MVs. They'll be regenerated on every refresh. Do they have any value for people who might want to use cursors? If nobody speaks up for them, I will drop OID support for materialized views. > If I'm reading this right, you always overwrite the passed-in dest without > looking at it. What's the intent here? Let me get back to you on that one. > > + /* > > + * Kludge here to allow refresh of a materialized view which is invalid > > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the > > + * first two RangeTblEntry list elements, which were added to the front > > + * of the rewritten Query to keep the rules system happy, with the > > + * isResultRel flag to indicate that it is OK if they are flagged as > > + * invalid. > > + */ > > + rtable = dataQuery->rtable; > > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true; >> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true; > > Is it safe to assume that the first two RTEs are the correct ones to flag? I'm trying to play along with UpdateRangeTableOfViewParse() in view.c. See the comment in front of that function for details. >> + finish_heap_swap(matviewOid, OIDNewHeap, false, false, false, RecentXmin); > > The check_constraints argument should be "true", because the refresh could > have invalidated a UNIQUE index. Fixed. > If the user desires an actually-clustered MV, he must re-CLUSTER it after each > refresh. That deserves a documentation mention. That point had not occurred to me. Let me see if I can fix that before changing docs. > > + * Ensure that all referrenced relations are flagged as valid. > > Typo. Fixed. >> + ExecCheckRelationsValid(rangeTable); > > I believe this ought to happen after the executor lock acquisitions, perhaps > right in ExecOpenScanRelation(). Since you'll then have an open Relation, > RelationIsFlaggedAsValid() can use the relcache. That would break MVs entirely. This probably deserves more comments. It's a little fragile, but was the best way I found to handle things. An MV has a rule associated with it, just like a "regular" view, which is parse-analyzed but not rewritten or planned. We need to allow the rewrite and planning for statements which populate the view, but suppress expansion of the rule for simple references. It is OK for an MV to be invalid if it is being populated, but not if it is being referenced. Long story short, this call helps determine which relations will be opened. If someone can suggest a better alternative, I'll see what I can do; otherwise I guess I should add comments around the key places. >> *************** >> *** 1591,1596 **** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown) >> --- 1592,1607 ---- >> rel = heap_open(rte->relid, NoLock); >> >> /* >> + * Skip materialized view expansion when resultRelation is set. >> + */ >> + if (rel->rd_rel->relkind == RELKIND_MATVIEW && >> + rel->rd_rel->relisvalid) >> + { >> + heap_close(rel, NoLock); >> + break; >> + } > > Would you elaborate on this? It's diretly related to the point immediately preceding. At this point we have thrown an error if the MV is invalid and being used as a source of data. If it is the target of data it is flagged as invalid so that it will not be expanded. Maybe we need a better way to determine this, but I'm not sure just what to use. >> + /* Strip off the trailing semicolon so that other things may follow. */ >> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1); > > I suggest verifying that the last character is indeed a semicolon. How about if I have it exit_horribly if the semicolon added 21 lines up has disappeared? Or use Assert if we have that for the frontend now? > > + static void > > + dumpMatViewIndex(Archive *fout, IndxInfo *indxinfo) > > This is so similar to dumpIndex(); can we avoid this level of duplication? It is identical except for name. I can only assume that I thought I needed a modified version and changed my mind. Removed. > Please retain an interesting sample of materialized views in the regression > database. Among other benefits, the pg_upgrade test suite exercises pg_dump > and pg_upgrade for all object types retained in the regression database. OK > The regression tests should probably include a few other wrinkles, like an > index on a MV. Yeah. Will do. > Creating a RULE on an MV succeeds, but refreshing the view then fails: Fixed by prohibiting CREATE RULE on an MV. > The documentation is a good start. I would expect a brief introduction in > Tutorial -> Advanced Features and possibly a deeper discussion under The SQL > Language. I suggest updating Explicit Locking to mention the new commands; > users will be interested in the lock level of a refresh. Yeah, the docs need another pass. It seemed prudent to make sure of what I was documenting first. > You have chosen to make pg_dump preserve the valid-or-invalid state of each > MV. That seems reasonable, though I'm slightly concerned about the case of a > dump taken from a standby. I'm not clear on the problem. Could you explain? > Some of the ALTER TABLE variants would make plenty of sense for MVs: > > ALTER [ COLUMN ] column_name SET STATISTICS integer > ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] ) > ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] ) > ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } > > It wouldn't be a problem to skip those for the first patch, though. > Conversely, this syntax is accepted: > > ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] ) > > But there are no available options. The only option accepted for regular > views, security_barrier, is rejected. MVs always have security_barrier > semantics, in any event. I think those are doc problems, not implementation of the functionality. Will double-check and fix where needed. > Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to > determine which ones make sense for MVs. For each one in the sensible set, > either allow it or add a comment indicating that it could reasonably be > allowed in the future. For each one outside the set, forbid it. Verify that > the documentation, the results of your evaluation, and the actual allowed > operations are all consistent. I have already tried to do that in the coding, although maybe you think more comments are needed there? The docs definitely need to catch up. This part isn't in flux, so I'll fix that part of the docs in the next day or two. Thanks for the review! -Kevin
Attachment
On 25 January 2013 06:00, Kevin Grittner <kgrittn@mail.com> wrote: > Noah Misch wrote: > >> The patch conflicts with git master; I tested against master@{2013-01-20}. > > New patch rebased, fixes issues raised by Thom Brown, and addresses > some of your points. Thanks for the new version. All previous issues I raised have been resolved. I have an inconsistency to note between VIEWs and MATERIALIZED VIEWs: CREATE VIEW v_test8 (meow, "?column?") AS SELECT 1 bark, 2; CREATE MATERIALIZED VIEW mv_test8 (meow, "?column?") AS SELECT 1 bark, 2; pg_dump output: CREATE VIEW v_test8 AS SELECT 1 AS meow, 2; CREATE MATERIALIZED VIEW mv_test8 ( meow, "?column?" ) AS SELECT 1 AS meow, 2 WITH NO DATA; The materialized view adds in column name parameters, whereas the standard view doesn't. But it seems to add column parameters regardless: CREATE VIEW v_test9 AS SELECT 1; CREATE MATERIALIZED VIEW v_test9 AS SELECT 1; CREATE VIEW v_test9 AS SELECT 1; CREATE MATERIALIZED VIEW mv_test9 ( "?column?" ) AS SELECT 1 WITH NO DATA; VIEWs never seem to use column parameters, MATERIALIZED VIEWs always appear to use them. -- Thom
On Fri, Jan 25, 2013 at 01:00:59AM -0500, Kevin Grittner wrote: > Noah Misch wrote: > > > *** a/contrib/sepgsql/sepgsql.h > > > --- b/contrib/sepgsql/sepgsql.h > > > *************** > > > *** 32,37 **** > > > --- 32,39 ---- > > > > > > /* > > > * Internally used code of object classes > > > + * > > > + * NOTE: Materialized views are treated as tables for now. > > > > This smells like a bypass of mandatory access control. Unless you've > > determined that this is correct within the sepgsql security model, I suggest > > starting with a draconian policy, like simply crippling MVs. Even if you have > > determined that, separating out the nontrivial sepgsql support might be good. > > The set of ideal reviewers is quite different. > > Robert suggested this way of coping for now. Will post just the > sepgsql separately to try to attract the right crowd to confirm. Sounds good. > > Let's not support OIDs on MVs. They'll be regenerated on every refresh. > > Do they have any value for people who might want to use cursors? Not that I have heard, for whatever that's worth. > > > + /* > > > + * Kludge here to allow refresh of a materialized view which is invalid > > > + * (that is, it was created WITH NO DATA or was TRUNCATED). We flag the > > > + * first two RangeTblEntry list elements, which were added to the front > > > + * of the rewritten Query to keep the rules system happy, with the > > > + * isResultRel flag to indicate that it is OK if they are flagged as > > > + * invalid. > > > + */ > > > + rtable = dataQuery->rtable; > > > + ((RangeTblEntry *) linitial(rtable))->isResultRel = true; > >> + ((RangeTblEntry *) lsecond(rtable))->isResultRel = true; > > > > Is it safe to assume that the first two RTEs are the correct ones to flag? > > I'm trying to play along with UpdateRangeTableOfViewParse() in > view.c. See the comment in front of that function for details. Ah. Perhaps assert that those RTEs have the aliases "old" and "new"? > >> + ExecCheckRelationsValid(rangeTable); > > > > I believe this ought to happen after the executor lock acquisitions, perhaps > > right in ExecOpenScanRelation(). Since you'll then have an open Relation, > > RelationIsFlaggedAsValid() can use the relcache. > > That would break MVs entirely. This probably deserves more > comments. It's a little fragile, but was the best way I found to > handle things. An MV has a rule associated with it, just like a > "regular" view, which is parse-analyzed but not rewritten or > planned. We need to allow the rewrite and planning for statements > which populate the view, but suppress expansion of the rule for > simple references. It is OK for an MV to be invalid if it is being > populated, but not if it is being referenced. Long story short, > this call helps determine which relations will be opened. > > If someone can suggest a better alternative, I'll see what I can > do; otherwise I guess I should add comments around the key places. I see. It seems wrong to check MV validity before the executor locks a table; if the executor lock were in fact the first lock on the relation, then the view could become invalid again before we lock it. I don't know a way to actually make the executor's lock be the first lock; some parser or planner lock invariably seems to precede it. Is that proper to rely on? > >> + /* Strip off the trailing semicolon so that other things may follow. */ > >> + appendBinaryPQExpBuffer(result, PQgetvalue(res, 0, 0), len - 1); > > > > I suggest verifying that the last character is indeed a semicolon. > > How about if I have it exit_horribly if the semicolon added 21 > lines up has disappeared? Or use Assert if we have that for the > frontend now? The code 21 lines back is adding a semicolon to a query being sent to the server to retrieve the view's definition. Here you're removing the trailing semicolon from a column of the server's response. Granted, there's not much reason we'd ever change the server to omit the trailing semicolon, and the breakage would be relatively obvious even without an explicit test here. > > You have chosen to make pg_dump preserve the valid-or-invalid state of each > > MV. That seems reasonable, though I'm slightly concerned about the case of a > > dump taken from a standby. > > I'm not clear on the problem. Could you explain? Currently none. If you took my suggestion regarding relisvalid, then MVs would be considered invalid on the standby. That is one disadvantage of the suggestion, perhaps. > > Some of the ALTER TABLE variants would make plenty of sense for MVs: > > > > ??ALTER [ COLUMN ] column_name SET STATISTICS integer > > ??ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] ) > > ??ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] ) > > ??ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } > > > > It wouldn't be a problem to skip those for the first patch, though. > > Conversely, this syntax is accepted: > > > > ??ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ] ) > > > > But there are no available options. The only option accepted for regular > > views, security_barrier, is rejected. MVs always have security_barrier > > semantics, in any event. > > I think those are doc problems, not implementation of the > functionality. Will double-check and fix where needed. Looks so. > > Overall, I recommend auditing all the ALTER TABLE and ALTER VIEW options to > > determine which ones make sense for MVs. For each one in the sensible set, > > either allow it or add a comment indicating that it could reasonably be > > allowed in the future. For each one outside the set, forbid it. Verify that > > the documentation, the results of your evaluation, and the actual allowed > > operations are all consistent. > > I have already tried to do that in the coding, although maybe you > think more comments are needed there? The docs definitely need to > catch up. This part isn't in flux, so I'll fix that part of the > docs in the next day or two. I won't worry about such comments for now; it looks like you're targeting most of the reasonable-to-support ALTER operations. I just didn't realize that the docs were out of date in this respect. By the way, your mailer omits References: and In-Reply-To: headers, breaking the thread. nm
On 1/25/13 1:00 AM, Kevin Grittner wrote: > New patch rebased, fixes issues raised by Thom Brown, and addresses > some of your points. This patch doesn't apply anymore, so I just took a superficial look. I think the intended functionality and the interfaces look pretty good. Documentation looks complete, tests are there. I have a couple of notes: * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED. It might be better to use that as well then. * You use fields named relkind in the parse nodes, but they don't actually contain relkind values, which is confusing. I'd just name the field is_matview or something. * More generally, I wouldn't be so fond of combining the parse handling of CREATE TABLE AS and CREATE MATERIALIZED VIEW. They are similar, but then again so are a lot of other things. * Some of the terminology is inconsistent. A materialized view is sometimes called valid, populated, or built, with approximately the same meaning. Personally, I would settle on "built", as per above, but it should be one term only. * I find the name of the relisvalid column a bit confusing. Especially because it only applies to materialized views, and there is already a meaning of "valid" for indexes. (Recall that indexes are also stored in pg_class, but they are concerned about indisvalid.) I would name it something like relmvbuilt. Btw., half of the patch seems to consist of updating places referring to relkind. Is something wrong with the meaning of relkind that this sort of thing is required? Maybe these places should be operating in terms of features, not accessing relkind directly.
Noah Misch <noah@leadboat.com> wrote: > Like Kevin, I want a way to distinguish unpopulated MVs from MVs that > genuinely yielded the empty set at last refresh. I agree that there's no > particular need to store that fact in pg_class, and I would much prefer only > storing it one way in any case. A user-visible disadvantage of the current > implementation is that relisvalid remains stale until something opens the rel. > That's fine for the system itself, but it can deceive user-initiated catalog > queries. Imagine a check_postgres action that looks for invalid MVs to > complain about. It couldn't just scan pg_class; it would need to first do > something that opens every MV. > > I suggest the following: > > 1. Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV > by giving it a page with no tuples. This entails VACUUM[1] not truncating > MVs below one page and the refresh operation, where necessary, extending > the relation from zero pages to one. > 2. Remove pg_class.relisvalid. > 3. Add a bool field to RelationData. The word "valid" is used in that context > to refer to the validity of the structure itself, so perhaps call the new > field rd_scannable. RelationIsFlaggedAsValid() can become a macro; > consider changing its name for consistency with the field name. > 4. During relcache build, set the field to "RelationGetNumberBlocks(rel) !=> 0" > for MVs, fixed "true" for everyone else. Any operation that changes the > field must, and probably would anyway, instigate a relcache invalidation. > 5. Expose a database function, say pg_relation_scannable(), for querying the > current state of a relation. This supports user-level monitoring. > > Does that seem reasonable? One semantic difference to keep in mind is that > unlogged MVs will be considered invalid on the standby while valid on the > master. That's essentially an accurate report, so I won't mind it. Changed to work pretty much as you suggested. > I'm going to follow this with a review covering a broader range of topics. Those issues addressed, too. That includes the most egregious doc problems you pointed out, but there still needs to be a thorough review, and I expect to find a few more doc cleanup issues. There was one minor syntax issue not addressed by Noah, nor much discussed in general that I didn't want to just unilaterally choose; but given that nobody seems to care that much I will put forward a proposal and do it that way tomorrow if nobody objects. Before this patch tables were the only things subject to truncation, but now materialized views can also be truncated. So far we have been treating TABLE as a noise word in the truncate command. I assume we still want to allow tables to be truncated with or without the word. The question is what to do about materialized views, and wheter both can be specified on a single TRUNCATE statement. I propose that we allow TABLE or MATERIALIZED VIEW to be specified, or that part of the statement to be left out. I propose that either type of object be allowed unless one or the other is specified and the object to be truncated is not of that kind. So you could mix both kinds on one statement, so long as you didn't specify either kind. There is one odd aspect to pg_dump, but I think the way it is behaving is the best way to handle it, although I invite other opinions. If you load from pg_dump output, it will try to populated materialized views which were populated on dump, and leave the ones which were not scannable because the contents had not been generated in an empty and unscannable state on restore. That much seems pretty obvious. Where it gets a little tricky is if mva is generated with data, and mvb is generated based on mva. Then mva is truncated. Then you dump. mvb was populated at the time of the dump, but its contents can't be regenerated on restore because mva is not scannable. As the patch currently stands, you get an error on the attempt to REFRESH mvb. I think that's a good thing, but I'm open to arguments to the contrary. I don't have any handling in pg_dump for circular references among materialized views, because I couldn't see how that could happen. I'm not 100% sure that isn't just a failure of imagination on my part, though. The only other comment I know of that hasn't been addressed is Simon's suggestion that I put in syntax for features which we might implement in future releases. I don't want to do that without the usual community design and bike-shedding process, so syntax is only implemented for implemented features. I'm still waiting for final word (and a small patch?) from KaiGai Kohei for the sepgsql part. This patch does require an initdb because of a new function. Unless something else comes up in review or I get feedback to the contrary I plan to deal with the above-mentioned issues and commit this within a week or two. Thanks to Marko Tiikkaja, Robert Haas, Thom Brown, Simon Riggs, KaiGai Kohei, and Noah Misch for the reviews and suggestions so far, thanks to Robert for the initial cut at the docs, and big thanks to Noah for helping me track down an elusive bug. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > There is one odd aspect to pg_dump, but I think the way it is > behaving is the best way to handle it, although I invite other > opinions. If you load from pg_dump output, it will try to > populated materialized views which were populated on dump, and > leave the ones which were not scannable because the contents had > not been generated in an empty and unscannable state on restore. > That much seems pretty obvious. Where it gets a little tricky is > if mva is generated with data, and mvb is generated based on mva. > Then mva is truncated. Then you dump. mvb was populated at the > time of the dump, but its contents can't be regenerated on restore > because mva is not scannable. As the patch currently stands, you > get an error on the attempt to REFRESH mvb. I think that's a good > thing, but I'm open to arguments to the contrary. Hmm, anything that means a dump-and-restore can fail seems like a bad thing to me. There's nothing outrageous about that scenario. It's arguable what state dump-and-restore should leave the new database in, but I don't see why it shouldn't work. I predict we'll end up with unhappy users if we leave it like this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote: > On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > > There is one odd aspect to pg_dump, but I think the way it is > > behaving is the best way to handle it, although I invite other > > opinions. If you load from pg_dump output, it will try to > > populated materialized views which were populated on dump, and > > leave the ones which were not scannable because the contents had > > not been generated in an empty and unscannable state on restore. > > That much seems pretty obvious. Where it gets a little tricky is > > if mva is generated with data, and mvb is generated based on mva. > > Then mva is truncated. Then you dump. mvb was populated at the > > time of the dump, but its contents can't be regenerated on restore > > because mva is not scannable. As the patch currently stands, you > > get an error on the attempt to REFRESH mvb. I think that's a good > > thing, but I'm open to arguments to the contrary. > > Hmm, anything that means a dump-and-restore can fail seems like a bad > thing to me. There's nothing outrageous about that scenario. It's > arguable what state dump-and-restore should leave the new database in, > but I don't see why it shouldn't work. I predict we'll end up with > unhappy users if we leave it like this. pg_upgrade is going to fail on that pg_restore error. :-( -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote: >> There is one odd aspect to pg_dump, but I think the way it is >> behaving is the best way to handle it, although I invite other >> opinions. If you load from pg_dump output, it will try to >> populated materialized views which were populated on dump, and >> leave the ones which were not scannable because the contents had >> not been generated in an empty and unscannable state on restore. >> That much seems pretty obvious. Where it gets a little tricky is >> if mva is generated with data, and mvb is generated based on mva. >> Then mva is truncated. Then you dump. mvb was populated at the >> time of the dump, but its contents can't be regenerated on restore >> because mva is not scannable. As the patch currently stands, you >> get an error on the attempt to REFRESH mvb. I think that's a good >> thing, but I'm open to arguments to the contrary. > > Hmm, anything that means a dump-and-restore can fail seems like a bad > thing to me. There's nothing outrageous about that scenario. It's > arguable what state dump-and-restore should leave the new database in, > but I don't see why it shouldn't work. I predict we'll end up with > unhappy users if we leave it like this. Keeping in mind that mva may take hours to refresh, and mvb may take only minutes once you have the data from mva, what behavior do you think is preferable? The alternatives I can think of are: (1) Force mva to refresh on restore, even though it was empty and not scannable on dump. This may delay completion of the restore for an extended time. It would leave both mva and mvb populated. (2) Populate mvb by using mva's query as a regular view. This would leave things in the same state as they were on dump, and might possibly optimized to something faster than generating mva and then mvb; but probably would not be much faster in most cases. (3) Change the failure to generate data for mvb in this case as a WARNING rather than an ERROR. (4) Actually dump and restore data with COPY statements for materialized views, rather than having the dump create REFRESH statements. The main down side of this, it seems to me, is that it opens up materialized views to direct tinkering of contents via SQL statements, which I was hoping to avoid. Perhaps this can be mitigated in some way. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Bruce Momjian <bruce@momjian.us> wrote: > On Fri, Feb 15, 2013 at 08:24:16PM -0500, Robert Haas wrote: >> On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote: >>> There is one odd aspect to pg_dump, but I think the way it is >>> behaving is the best way to handle it, although I invite other >>> opinions. If you load from pg_dump output, it will try to >>> populated materialized views which were populated on dump, and >>> leave the ones which were not scannable because the contents had >>> not been generated in an empty and unscannable state on restore. >>> That much seems pretty obvious. Where it gets a little tricky is >>> if mva is generated with data, and mvb is generated based on mva. >>> Then mva is truncated. Then you dump. mvb was populated at the >>> time of the dump, but its contents can't be regenerated on restore >>> because mva is not scannable. As the patch currently stands, you >>> get an error on the attempt to REFRESH mvb. I think that's a good >>> thing, but I'm open to arguments to the contrary. >> >> Hmm, anything that means a dump-and-restore can fail seems like a bad >> thing to me. There's nothing outrageous about that scenario. It's >> arguable what state dump-and-restore should leave the new database in, >> but I don't see why it shouldn't work. I predict we'll end up with >> unhappy users if we leave it like this. > > pg_upgrade is going to fail on that pg_restore error. :-( With the hard link option it should succeed, I would think. If we arranged for the check option, when run without the hard link option, to report such cases so that the user could choose to either truncate mvb or refresh mva before the upgrade, would that satisfy this concern? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Feb 15, 2013 at 8:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > >> There is one odd aspect to pg_dump, but I think the way it is > >> behaving is the best way to handle it, although I invite other > >> opinions. If you load from pg_dump output, it will try to > >> populated materialized views which were populated on dump, and > >> leave the ones which were not scannable because the contents had > >> not been generated in an empty and unscannable state on restore. > >> That much seems pretty obvious. Where it gets a little tricky is > >> if mva is generated with data, and mvb is generated based on mva. > >> Then mva is truncated. Then you dump. mvb was populated at the > >> time of the dump, but its contents can't be regenerated on restore > >> because mva is not scannable. As the patch currently stands, you > >> get an error on the attempt to REFRESH mvb. I think that's a good > >> thing, but I'm open to arguments to the contrary. > > > > Hmm, anything that means a dump-and-restore can fail seems like a bad > > thing to me. There's nothing outrageous about that scenario. It's > > arguable what state dump-and-restore should leave the new database in, > > but I don't see why it shouldn't work. I predict we'll end up with > > unhappy users if we leave it like this. I agree that making the dump fail on this account is bad. > Keeping in mind that mva may take hours to refresh, and mvb may > take only minutes once you have the data from mva, what behavior do > you think is preferable? > > The alternatives I can think of are: > > (1) Force mva to refresh on restore, even though it was empty and > not scannable on dump. This may delay completion of the restore > for an extended time. It would leave both mva and mvb populated. This is reasonable. If the user doesn't like it, he can presumably use an edited dump list to remove the REFRESHes. > (2) Populate mvb by using mva's query as a regular view. This > would leave things in the same state as they were on dump, and > might possibly optimized to something faster than generating mva > and then mvb; but probably would not be much faster in most cases. Interesting idea, but I don't think adding novel server behavior is justified to achieve this. > (3) Change the failure to generate data for mvb in this case as a > WARNING rather than an ERROR. This is also fair. However, I think it's better to restore more valid MVs (option 1) than fewer. > (4) Actually dump and restore data with COPY statements for > materialized views, rather than having the dump create REFRESH > statements. The main down side of this, it seems to me, is that it > opens up materialized views to direct tinkering of contents via SQL > statements, which I was hoping to avoid. Perhaps this can be > mitigated in some way. This is a door better left closed. Overall, I recommend option 1. None of the options will furnish the desire of every database, but the DBA can always tailor the outcome by replacing the dumped REFRESH statements with his own. I'm not envisioning that MVs left invalid for the long term will be a typical thing, anyway. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Noah Misch <noah@leadboat.com> wrote: > On Sat, Feb 16, 2013 at 09:53:14AM -0800, Kevin Grittner wrote: > I agree that making the dump fail on this account is bad. I would argue that this is an overstatement of the issue except for restores that use the single-transaction switch and pg_upgrade without the hard link option. In all other cases, one could just issue REFRESH statements after the dump successfully completed all the other work. But those two cases are important enough that the issue must be seriously considered. >> (1) Force mva to refresh on restore, even though it was empty >> and not scannable on dump. This may delay completion of the >> restore for an extended time. It would leave both mva and mvb >> populated. > > This is reasonable. If the user doesn't like it, he can > presumably use an edited dump list to remove the REFRESHes. > Overall, I recommend option 1. I'm OK with that approach, and in the absence of anyone pushing for another direction, will make that change to pg_dump. I'm thinking I would only do this for materialized views which were not scannable, but which cause REFRESH failures on other materialized views if not refreshed first (recursively evaluated), rather than just automatically refreshing all MVs on restore. The reason this seems important is that some MVs may take a long time to refresh, and a user might want a dump/restore to get to a point where they can use the rest of the database while building the contents of some MVs in the background or during off hours. > None of the options will furnish the desire of every database, Agreed. > but the DBA can always tailor the outcome by replacing the dumped > REFRESH statements with his own. ... or by issuing TRUNCATE or REFRESH statements before the dump to avoid the issue. > I'm not envisioning that MVs left invalid for the long term will > be a typical thing, anyway. Agreed. I think this will be an infrequent issue caused by unusual user actions; but it would be bound to come up occasionally. -Kevin
On 16 February 2013 01:01, Kevin Grittner <kgrittn@ymail.com> wrote: > Unless something else comes up in review or I get feedback to the > contrary I plan to deal with the above-mentioned issues and commit > this within a week or two. At the moment it's not possible to rename a column without using ALTER TABLE on an MV. Also, shouldn't we have the ability to set the collation on a column of the MV? And the inconsistency between regular views and MVs is still present, where MVs always dump with column parameters in their definition, and views never do. Not a show-stopper, but curious nonetheless. -- Thom
Kevin Grittner escribió: > I'm OK with that approach, and in the absence of anyone pushing for > another direction, will make that change to pg_dump. I'm thinking > I would only do this for materialized views which were not > scannable, but which cause REFRESH failures on other materialized > views if not refreshed first (recursively evaluated), rather than > just automatically refreshing all MVs on restore. The reason this > seems important is that some MVs may take a long time to refresh, > and a user might want a dump/restore to get to a point where they > can use the rest of the database while building the contents of > some MVs in the background or during off hours. Maybe it would be a good idea to try to put such commands at the very end of the dump, if possible. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Kevin Grittner escribió: > >> I'm OK with that approach, and in the absence of anyone pushing >> for another direction, will make that change to pg_dump. I'm >> thinking I would only do this for materialized views which were >> not scannable, but which cause REFRESH failures on other >> materialized views if not refreshed first (recursively >> evaluated), rather than just automatically refreshing all MVs on >> restore. The reason this seems important is that some MVs may >> take a long time to refresh, and a user might want a >> dump/restore to get to a point where they can use the rest of >> the database while building the contents of some MVs in the >> background or during off hours. > > Maybe it would be a good idea to try to put such commands at the > very end of the dump, if possible. Here is the dump order as currently implemented in that patch. MVs are created at the same priority as tables and views. MV REFRESH and MV index builds obviously need to follow population of table data. These are at the same priority because it makes the most sense to populated an MV without any indexes and then build them before the MV is used to populate some other MV. Dependency information is used to get that to sort properly within the priority level. 1, /* DO_NAMESPACE */ 2, /* DO_PROCLANG */ 3, /* DO_COLLATION */ 4, /* DO_EXTENSION */ 5, /* DO_TYPE */ 5, /* DO_SHELL_TYPE */ 6, /* DO_FUNC */ 7, /* DO_AGG */ 8, /* DO_OPERATOR */ 9, /* DO_OPCLASS */ 9, /* DO_OPFAMILY */ 10, /* DO_CAST */ 11, /* DO_CONVERSION */ 12, /* DO_TSPARSER */ 13, /* DO_TSTEMPLATE */ 14, /* DO_TSDICT */ 15, /* DO_TSCONFIG */ 16, /* DO_FDW */ 17, /* DO_FOREIGN_SERVER */ 18, /* DO_TABLE */ 19, /* DO_DUMMY_TYPE */ 20, /* DO_ATTRDEF */ 21, /* DO_BLOB */ 22, /* DO_PRE_DATA_BOUNDARY */ 23, /* DO_TABLE_DATA */ 24, /* DO_BLOB_DATA */ 25, /* DO_POST_DATA_BOUNDARY */ 26, /* DO_CONSTRAINT */ 27, /* DO_INDEX */ 28, /* DO_REFRESH_MATVIEW */ 28 /* DO_MATVIEW_INDEX */ 29, /* DO_RULE */ 30, /* DO_TRIGGER */ 31, /* DO_FK_CONSTRAINT */ 32, /* DO_DEFAULT_ACL */ 33, /* DO_EVENT_TRIGGER */ I don't think that pushing MV refreshes and index creation farther down the list should require anything beyond adjusting the priority numbers. I don't see a problem pushing them to the end. Does anyone else see anything past priority 28 that MV population should *not* follow? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Thom Brown <thom@linux.com> wrote: > On 16 February 2013 01:01, Kevin Grittner <kgrittn@ymail.com> wrote: >> Unless something else comes up in review or I get feedback to >> the contrary I plan to deal with the above-mentioned issues and >> commit this within a week or two. > > At the moment it's not possible to rename a column without using > ALTER TABLE on an MV. > > Also, shouldn't we have the ability to set the collation on a > column of the MV? Will fix. > And the inconsistency between regular views and MVs is still > present, where MVs always dump with column parameters in their > definition, and views never do. Not a show-stopper, but curious > nonetheless. I haven't worried about this because current behavior generates correct results -- this seems like a micro-optimization. The explanation for why it wound up that way is that creating a materialized view is in many ways more like creating a table than like creating a view -- it seemed safer and less invasive to modify the CREATE TABLE code than the CREATE VIEW code, and specifying column names just fell out of that as part of the minimal change. In looking at the pg_dump output, though, I see that the CMV AS clause also is getting the names right with column-level aliases, so it should be pretty simple and safe to leave off the column-list section for MVs. I guess it's worth it just to forestall further questions on the topic. Thanks! -Kevin
On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Maybe it would be a good idea to try to put such commands at the > > very end of the dump, if possible. > 25, /* DO_POST_DATA_BOUNDARY */ > 26, /* DO_CONSTRAINT */ > 27, /* DO_INDEX */ > 28, /* DO_REFRESH_MATVIEW */ > 28 /* DO_MATVIEW_INDEX */ > 29, /* DO_RULE */ > 30, /* DO_TRIGGER */ > 31, /* DO_FK_CONSTRAINT */ > 32, /* DO_DEFAULT_ACL */ > 33, /* DO_EVENT_TRIGGER */ > > I don't think that pushing MV refreshes and index creation farther > down the list should require anything beyond adjusting the priority > numbers. I don't see a problem pushing them to the end. Does > anyone else see anything past priority 28 that MV population should > *not* follow? DO_EVENT_TRIGGER should remain last; it may change the behavior of nearly any other command. Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the outcome when the MV calls functions that ultimately trip triggers or rules. Currently, the behavior will be the same as for CHECK constraints: the rules and triggers don't exist yet. This may also affect, for the better, MVs referencing views that need the CREATE TABLE ... CREATE RULE _RETURN restoration pathway. It looks like a positive change. On the flip side, I wonder if there's some case I'm not considering where it's important to delay restoring rules and/or triggers until after restoring objects for which restoration can entail calls to arbitrary user functions. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Noah Misch <noah@leadboat.com> wrote: > On Mon, Feb 18, 2013 at 06:49:14AM -0800, Kevin Grittner wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> Maybe it would be a good idea to try to put such commands at >>> the very end of the dump, if possible. > >> 25, /* DO_POST_DATA_BOUNDARY */ >> 26, /* DO_CONSTRAINT */ >> 27, /* DO_INDEX */ >> 28, /* DO_REFRESH_MATVIEW */ >> 28 /* DO_MATVIEW_INDEX */ >> 29, /* DO_RULE */ >> 30, /* DO_TRIGGER */ >> 31, /* DO_FK_CONSTRAINT */ >> 32, /* DO_DEFAULT_ACL */ >> 33, /* DO_EVENT_TRIGGER */ >> >> I don't think that pushing MV refreshes and index creation >> farther down the list should require anything beyond adjusting >> the priority numbers. I don't see a problem pushing them to the >> end. Does anyone else see anything past priority 28 that MV >> population should *not* follow? > > DO_EVENT_TRIGGER should remain last; it may change the behavior > of nearly any other command. > > Moving DO_REFRESH_MATVIEW past DO_TRIGGER would affect the > outcome when the MV calls functions that ultimately trip triggers > or rules. Currently, the behavior will be the same as for CHECK > constraints: the rules and triggers don't exist yet. This may > also affect, for the better, MVs referencing views that need the > CREATE TABLE ... CREATE RULE _RETURN restoration pathway. It > looks like a positive change. On the flip side, I wonder if > there's some case I'm not considering where it's important to > delay restoring rules and/or triggers until after restoring > objects for which restoration can entail calls to arbitrary user > functions. I didn't quite follow all of Noah's points or their implications, so we chatted off-list. He made a couple additional observations which allow some simplification of the patch, and allow MV REFRESH to be moved to the very end of the priority list without ill effect. (1) While it might be incorrect for the CREATE INDEX on a materialized view to come after event triggers are set up, REFRESH can be expected to be a routine action in the presence of such triggers, and it might actually be incorrect to REFRESH when the triggers are not present. (2) REFRESH MATERIALIZED VIEW creates and builds a new heap, and reindexes it after the data has been loaded, so the timing of the CREATE INDEX statements on MVs is not critical, as long as they are done after the CREATE and before the REFRESH. We could drop them into the same priority as the other CREATE INDEX statements, and it would not be a big deal because the MVs would be empty. This should allow me to simplify the code a little bit and move the RMV step to the very end. That may have some advantages when users want to start using the database while MVs are being populated. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 18, 2013 at 4:48 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > This should allow me to simplify the code a little bit and move the > RMV step to the very end. That may have some advantages when users > want to start using the database while MVs are being populated. In the department of crazy ideas, what about having pg_dump NEVER refresh ANY materialized views? It's true that the job of pg_dump and pg_restore is to put the new database in the same state that the old database was in, but I think you could make a reasonable case that materialized views ought to be an exception. After all, even with all of this infrastructure, chances are pretty good that the new MV contents won't end up being the same as the old MV contents on the old server - because the old MVs could easily have been stale. So why not just get the restore over with as fast as possible, and then let the user refresh the MVs that they think need refreshing (perhaps after getting the portions of their system that don't rely on MVs up and running)? At the very least, I think we ought to have an option for this behavior. But the more I think about it, the more I think maybe it ought to be the default. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/2/19 Robert Haas <robertmhaas@gmail.com>: > In the department of crazy ideas, what about having pg_dump NEVER > refresh ANY materialized views? > > It's true that the job of pg_dump and pg_restore is to put the new > database in the same state that the old database was in, but I think > you could make a reasonable case that materialized views ought to be > an exception. After all, even with all of this infrastructure, > chances are pretty good that the new MV contents won't end up being > the same as the old MV contents on the old server - because the old > MVs could easily have been stale. So why not just get the restore > over with as fast as possible, and then let the user refresh the MVs > that they think need refreshing (perhaps after getting the portions of > their system that don't rely on MVs up and running)? > > At the very least, I think we ought to have an option for this > behavior. But the more I think about it, the more I think maybe it > ought to be the default. +1 from me from a minimalist point of view. I think of a matview of the manually refreshed kind as “can contain stale contents (or be invalid) unless someone manually makes sure it is up to date (or valid)”. Making any matviews invalid by default upon restoring (itself being a manual action) would be consistent with that definition. Additionally, ISTM to be the least arbitrary (and hence most elegant) choice, and even more so in the context of matviews-depending-on-matviews. Spamming some more craziness: Another (more elaborate) suggestion could be: Store for each matview whether it is to be rebuilt upon restore or not. Using this setting would intuitively mean something like “I consider this matview being valid a precondition for considering the database state valid.” Setting this to true for a matview would only be allowed when any other matviews on which it depends also have this setting set to true. Just my €0.02 of course. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
On 2/19/13 8:54 AM, Robert Haas wrote: > In the department of crazy ideas, what about having pg_dump NEVER > refresh ANY materialized views? It might be useful to have an option for this, but I don't think it should be the default. The default should be that the new database is "ready to go". Then again, when would you ever actually use that option? This might be different if there were a command to refresh all materialized views, because you don't want to have to go around and type separate commands 47 times after a restore.
On Sat, February 16, 2013 02:01, Kevin Grittner wrote: > matview-v4.patch.gz Hi, I was wondering if material views should not go into information_schema. I was thinking either .views or .tables. Have you considered this? I ask because as far as I can see querying for mv's has to go like this: SELECT n.nspname, c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('m','') and n.nspname = 'myschema' which seems rather ugly. Also, some documentation typos: please see attached. Thanks, Erik Rijkers
Attachment
On 02/19/2013 02:09 PM, Erik Rijkers wrote: > On Sat, February 16, 2013 02:01, Kevin Grittner wrote: >> matview-v4.patch.gz > > Hi, > > I was wondering if material views should not go into information_schema. I was thinking either > .views or .tables. Have you considered this? > > I ask because as far as I can see querying for mv's has to go like this: > > SELECT n.nspname, c.relname > FROM pg_catalog.pg_class c > LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace > WHERE c.relkind IN ('m','') > and n.nspname = 'myschema' Well, I'm not sure about information_schema, but we'll definitely want a pg_matviews system view. Also a \dM. That could wait until 9.4, though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote: > On Sat, February 16, 2013 02:01, Kevin Grittner wrote: > > matview-v4.patch.gz > > Hi, > > I was wondering if material views should not go into information_schema. I was thinking either > .views or .tables. Have you considered this? I'm guessing it'd be .views if anything. Haven't been able to decipher from section 11 of the standard (Schemata) whether the standard has anything to say on the matter. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Erik Rijkers <er@xs4all.nl> wrote: > I was wondering if material views should not go into > information_schema. I was thinking either .views or .tables. > Have you considered this? I had not considered this to be a good idea because information_schema is defined by the standard, and materialized views are an extension to the standard. Someone using these views to identify either tables or views might make a bad choice based on this. I'm open to arguments for inclusion, if you think it would not violate the standard. Which would be safe? > Also, some documentation typos: please see attached. Will apply. Thanks. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Josh Berkus <josh@agliodbs.com> wrote: > Well, I'm not sure about information_schema, but we'll definitely > want a pg_matviews system view. > That could wait until 9.4, though. That I could probably do. Do you think they should have a separate pg_stat_user_matviews table, etc., or do you think it would be better to include them in with tables there? > Also a \dM. I already added it as \dm in the current patch. Does that conflict with something else that's pending? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> That I could probably do. Do you think they should have a separate > pg_stat_user_matviews table, etc., or do you think it would be > better to include them in with tables there? Well, ideally pg_matviews would have matview definitions, and pg_stat_matviews would have stats on matview usage and rows. But see what you can get done; I imagine we'll overhaul it for 9.4 anyway once we've had a chance to use the feature. > >> Also a \dM. > > I already added it as \dm in the current patch. Does that conflict > with something else that's pending? Oh, no, I thought \dm was *already* in use, but apparently not. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Kevin Grittner <kgrittn@ymail.com> > There was one minor syntax issue not addressed by Noah, nor much > discussed in general that I didn't want to just unilaterally > choose; but given that nobody seems to care that much I will put > forward a proposal and do it that way tomorrow if nobody objects. > Before this patch tables were the only things subject to > truncation, but now materialized views can also be truncated. So > far we have been treating TABLE as a noise word in the truncate > command. I assume we still want to allow tables to be truncated > with or without the word. The question is what to do about > materialized views, and wheter both can be specified on a single > TRUNCATE statement. I propose that we allow TABLE or MATERIALIZED > VIEW to be specified, or that part of the statement to be left out. > I propose that either type of object be allowed unless one or the > other is specified and the object to be truncated is not of that > kind. So you could mix both kinds on one statement, so long as you > didn't specify either kind. When I went to do this, I hit a shift/reduce conflict, because with TABLE being optional it couldn't tell whether: TRUNCATE MATERIALIZED VIEW x, y, z; ... was looking for five relations or three. That goes away with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Josh Berkus <josh@agliodbs.com> wrote: >> That I could probably do. Do you think they should have a separate >> pg_stat_user_matviews table, etc., or do you think it would be >> better to include them in with tables there? > > Well, ideally pg_matviews would have matview definitions, and > pg_stat_matviews would have stats on matview usage and rows. But see > what you can get done; I imagine we'll overhaul it for 9.4 anyway once > we've had a chance to use the feature. I agree on pg_matviews, but after looking over the existing views and thinking about what I would use them for as a DBA, I'm inclined to fold the backing tables for MVs into the _stat_ and _statio_ views -- especially since we already include the backing tables and indexes for TOAST. There is a precident for including implementation details at that level. The only difference from TOAST, is that I include the heap and indexes for MVs in the _user_ views. I'm attaching the patch for just the system_views.sql file for discussion before I go write docs for this part. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Kevin Grittner <kgrittn@ymail.com> wrote: > I'm attaching the patch for just the system_views.sql file > for discussion before I go write docs for this part. Meh. If I'm gonna have pg_matviews I might as well include an isscannable column. v2 attached. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 02/19/2013 03:41 PM, Kevin Grittner wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: > >> I'm attaching the patch for just the system_views.sql file >> for discussion before I go write docs for this part. > > Meh. If I'm gonna have pg_matviews I might as well include an > isscannable column. v2 attached. pg_get_viewdef() will work on Matviews? Great. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Kevin Grittner <kgrittn@ymail.com> writes: > When I went to do this, I hit a shift/reduce conflict, because with > TABLE being optional it couldn't tell whether: > TRUNCATE MATERIALIZED VIEW x, y, z; > ... was looking for five relations or three.� That goes away with > MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.� Is that OK? Not really. I would much rather see us not bother with this pedantic syntax than introduce an even-partially-reserved word. Having said that, I don't think I believe your analysis of why this doesn't work. The presence or absence of commas ought to make the syntax non-ambiguous, I would think. Maybe you just factored the grammar wrong. regards, tom lane
On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 2/19/13 8:54 AM, Robert Haas wrote: >> In the department of crazy ideas, what about having pg_dump NEVER >> refresh ANY materialized views? > > It might be useful to have an option for this, but I don't think it > should be the default. The default should be that the new database is > "ready to go". > > Then again, when would you ever actually use that option? You'd use that option if you'd rather get the database mostly-up as soon as possible, and then worry about the materialized views afterwards. > This might be different if there were a command to refresh all > materialized views, because you don't want to have to go around and type > separate commands 47 times after a restore. Well, it's pretty easy to do: SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM pg_class WHERE relkind = 'm'; ...but we could also add explicit syntax for it, perhaps along the lines of what we have for CLUSTER and VACUUM. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> This might be different if there were a command to refresh all >> materialized views, because you don't want to have to go around and type >> separate commands 47 times after a restore. > Well, it's pretty easy to do: > SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM > pg_class WHERE relkind = 'm'; > ...but we could also add explicit syntax for it, perhaps along the > lines of what we have for CLUSTER and VACUUM. It's not really that easy, because of the likelihood that MVs have to be refreshed in a specific order. The SELECT you suggest definitely seems too simplistic. A dedicated command could perhaps be built to pay attention to dependencies ... but if we're still coding such things now, it seems a bit late for 9.3. regards, tom lane
On 2/19/13 5:22 PM, David Fetter wrote: > On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote: >> On Sat, February 16, 2013 02:01, Kevin Grittner wrote: >>> matview-v4.patch.gz >> >> Hi, >> >> I was wondering if material views should not go into information_schema. I was thinking either >> .views or .tables. Have you considered this? > > I'm guessing it'd be .views if anything. Haven't been able to > decipher from section 11 of the standard (Schemata) whether the > standard has anything to say on the matter. I suppose one should be able to expect that if one finds a view in the information schema, then one should be able to use DROP VIEW to remove it. Which in this case wouldn't work. So I don't think including a materialized view under views or tables is appropriate.
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> When I went to do this, I hit a shift/reduce conflict, because >> with TABLE being optional it couldn't tell whether: > >> TRUNCATE MATERIALIZED VIEW x, y, z; > >> ... was looking for five relations or three. That goes away >> with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that >> OK? > > Not really. I would much rather see us not bother with this > pedantic syntax than introduce an even-partially-reserved word. I'm not sure it's worth it either; but two people requested it and I didn't forsee this shift/reduce conflict, so I took a shot at it. If we can't eliminate the conflict, I'm fine with leaving things as they are in the latest posted patch. > Having said that, I don't think I believe your analysis of why > this doesn't work. The presence or absence of commas ought to > make the syntax non-ambiguous, I would think. Maybe you just > factored the grammar wrong. Well, it wouldn't be the first time you've seen a better way to do something in flex than I was able to see. Taking just the gram.y part of the change which implemented this, and omitting the change in reservedness of MATERIALIZED, I have: diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 820cb41..1d393c5 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -394,6 +394,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <ival> opt_column event cursor_options opt_hold opt_set_data %type <objtype> reindex_type drop_type comment_type security_label_type + trunc_type %type <node> fetch_args limit_clause select_limit_value offset_clause select_offset_value @@ -5172,9 +5173,10 @@ attrs: '.' attr_name *****************************************************************************/ TruncateStmt: - TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior + TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior { TruncateStmt *n = makeNode(TruncateStmt); + n->objtype = $2; n->relations = $3; n->restart_seqs = $4; n->behavior = $5; @@ -5182,6 +5184,12 @@ TruncateStmt: } ; +trunc_type: + TABLE { $$ = OBJECT_TABLE; } + | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; } + | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; } + ; + opt_restart_seqs: CONTINUE_P IDENTITY_P { $$ = false; } | RESTART IDENTITY_P { $$ = true; } I'm open to suggestions on a better way. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Eisentraut <peter_e@gmx.net> wrote: > I suppose one should be able to expect that if one finds a view > in the information schema, then one should be able to use DROP > VIEW to remove it. Which in this case wouldn't work. So I don't > think including a materialized view under views or tables is > appropriate. Right. I think adding pg_matviews covers the stated use-case enough to answer Erik's concern. I'm not going to mess with adding non-standard stuff to the standard views. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Having said that, I don't think I believe your analysis of why >> this doesn't work. > Well, it wouldn't be the first time you've seen a better way to do > something in flex than I was able to see. Taking just the gram.y > part of the change which implemented this, and omitting the change > in reservedness of MATERIALIZED, I have: > - TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior > + TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior > +trunc_type: > + TABLE { $$ = OBJECT_TABLE; } > + | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; } > + | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; } > + ; Yeah, this is a standard gotcha when working with unreserved keywords. You can't factor it like that because then the parser is required to make a shift-reduce decision (on whether to reduce trunc_type to empty) before it can "see past" the first word. So for instance given TRUNCATE MATERIALIZED ... ^ the parser has to make that decision when it can't see past the word "MATERIALIZED" and so doesn't know what comes after it. The way to fix it is to not try to use the sub-production but spell it all out: TRUNCATE TABLE relation_expr_list ...| TRUNCATE MATERIALIZED VIEW relation_expr_list ...| TRUNCATE relation_expr_list ... Now the parser doesn't have to make any shift-reduce decision until after it can "see past" the first identifier. It's a bit tedious but beats making a word more reserved than it has to be. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > The way to fix it is to not try to use the sub-production but spell it > all out: > > TRUNCATE TABLE relation_expr_list ... > | TRUNCATE MATERIALIZED VIEW relation_expr_list ... > | TRUNCATE relation_expr_list ... > > Now the parser doesn't have to make any shift-reduce decision until > after it can "see past" the first identifier. It's a bit tedious > but beats making a word more reserved than it has to be. Thanks! Will do. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/20/13 6:13 AM, Robert Haas wrote: >> It might be useful to have an option for this, but I don't think it >> > should be the default. The default should be that the new database is >> > "ready to go". >> > >> > Then again, when would you ever actually use that option? > You'd use that option if you'd rather get the database mostly-up as > soon as possible, and then worry about the materialized views > afterwards. Since the proposed materialized views are not available for implicit use in query optimization, the only way an application would make use of them is to access them directly. And if it accesses an unpopulated materialized view, it would fail. So I don't think in the current state a database is mostly-up without the materialized views filled in. I can see the value in having a restore mode that postpones certain nonessential operations, such as creating indexes or certain constraints or even materialized views. But I think the boundaries and expectations for that need to be defined more precisely. For example, a database without constraints might be considered "ready for read-only use", without secondary indexes it might be "ready for use but slow".
On Wed, February 20, 2013 16:28, Kevin Grittner wrote: > Peter Eisentraut <peter_e@gmx.net> wrote: > >> I suppose one should be able to expect that if one finds a view >> in the information schema, then one should be able to use DROP >> VIEW to remove it. Which in this case wouldn't work. So I don't >> think including a materialized view under views or tables is >> appropriate. > > Right. I think adding pg_matviews covers the stated use-case > enough to answer Erik's concern. Absolutely - I agree pg_matviews is much better than adding deviating information_schema stuff. Thank you, Erik Rijkers
On Wed, Feb 20, 2013 at 4:26 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> The way to fix it is to not try to use the sub-production but spell it >> all out: >> >> TRUNCATE TABLE relation_expr_list ... >> | TRUNCATE MATERIALIZED VIEW relation_expr_list ... >> | TRUNCATE relation_expr_list ... >> >> Now the parser doesn't have to make any shift-reduce decision until >> after it can "see past" the first identifier. It's a bit tedious >> but beats making a word more reserved than it has to be. > > Thanks! Will do. Fwiw I think worrying about stuff like this at this point is probably a waste of time. There'll be a period of bike-shedding where people debate what the command should be called so worrying about parser conflicts before there's a consensus is kind pointless. I would like to know what operations you plan to support independently of the command names. I may have missed much earlier in the discussion but then I suspect things have evolved since then. It sounds like you want to support: 1) Selecting from materialized viws 2) Manually refreshing materialized views 3) Manually truncating materialized views And explicitly not support 1) Automatically rewriting queries to select from matching views 2) Incrementally refreshing materialized views 3) Manual DML against data in materialized views (except truncate which is kind of DDL) 4) Keeping track of whether the data in the materialized view is up to date I have to say I find this model a bit odd. It seems the UI you're presenting is that they're basically read-only tables that the database will fill in the data for automatically. My mental model of materialized views is that they're basically views that the database guarantees a different performance characteristic for. I would expect a materialized view to be up to date all the time. If we don't support incremental updates (which seems like a fine thing not to support in a first cut) then I would expect any DML against the table to mark the view invalid and any queries against it to produce an error (or possibly go to the source tables using the view definition but that's probably a bad idea for most use cases). Ie. they should behave like a view at all times and have up to date information or fail entirely. I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but I would expect it to be called something like INVALIDATE rather than TRUNCATE and dropping the storage is a side effect of simply telling the database that it doesn't need to maintain this materialized view. Though I could be convinced "truncate" is a good name as long as it's documented well. -- greg
Greg Stark <stark@mit.edu> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The way to fix it is to not try to use the sub-production but >>> spell it all out: >>> >>> TRUNCATE TABLE relation_expr_list ... >>> | TRUNCATE MATERIALIZED VIEW relation_expr_list ... >>> | TRUNCATE relation_expr_list ... >>> >>> Now the parser doesn't have to make any shift-reduce decision >>> until after it can "see past" the first identifier. It's a bit >>> tedious but beats making a word more reserved than it has to >>> be. >> >> Thanks! Will do. > > Fwiw I think worrying about stuff like this at this point is > probably a waste of time. There'll be a period of bike-shedding > where people debate what the command should be called so worrying > about parser conflicts before there's a consensus is kind > pointless. That sort of bikeshedding already happened three months ago. Too late now. > I would like to know what operations you plan to support > independently of the command names. I may have missed much > earlier in the discussion but then I suspect things have evolved > since then. > > It sounds like you want to support: > > 1) Selecting from materialized viws > 2) Manually refreshing materialized views > 3) Manually truncating materialized views > > And explicitly not support > > 1) Automatically rewriting queries to select from matching views > 2) Incrementally refreshing materialized views Those are material for later releases, building on the base of what goes into this release. > 3) Manual DML against data in materialized views (except truncate > which is kind of DDL) There is quite a lot of DML allowed -- changing tablespace, changing schema, changing name of the MV or of individual columns in it, changing statistics targets, creating indexes, and other operations are supported. > 4) Keeping track of whether the data in the materialized view is > up to date Only keeping track of whether data has been populated or not, for now. There has been agreement that one or more timestamps relating to freshness will make sense, but these are not in the initial patch. > I have to say I find this model a bit odd. It's not a model, it's a starting point. Several people have already said that even this much is useful and they expect to take advantage of it. I'm doing what I can to not paint us into a corner where it's hard to extend to all the features everyone dreams of, but if we waited for that to commit something, it will never happen. > I would expect a materialized view to be up to date all the time. I expect that this will eventually be an option, but I expect that is will be a seldom-used one. Most cases that I've seen, people want summary data that is reasonably up-to-date without unduly affecting the performance of incremental changes to the underlying data. I've sketched out the roadmap from this patch to all of these options in a vauge, handwavy fashion, and don't have a lot of interest in taking it farther until we're past 9.3 beta. > If we don't support incremental updates (which seems like a fine > thing not to support in a first cut) then I would expect any DML > against the table to mark the view invalid and any queries > against it to produce an error (or possibly go to the source > tables using the view definition but that's probably a bad idea > for most use cases). Ie. they should behave like a view at all > times and have up to date information or fail entirely. That would render them completely useless for the use-cases I've seen. If you want to offer a patch to do that as an option, feel free, but I will strongly argue against that as unconditional behavior. > I would expect a command like TRUNCATE MATERIALIZED VIEW to exist > but I would expect it to be called something like INVALIDATE > rather than TRUNCATE and dropping the storage is a side effect of > simply telling the database that it doesn't need to maintain this > materialized view. Though I could be convinced "truncate" is a > good name as long as it's documented well. I'm trying to minimize the number of new keywords. The initial patch only added MATERIALIZED. I added REFRESH due to near-universal demand for something other than the LOAD MATERIALIZED VIEW I initially used. Have you seen the statistics Tom gave out on how much the size of the executable bloats with every new keyword? Until now nobody has expressed concern about TRUNCATE MATERIALIZED VIEW, so it would take quite a groundswell of concern at this point to even consider a new keyword for this functionality this late in the game. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> And explicitly not support > > 1) Automatically rewriting queries to select from matching views > 2) Incrementally refreshing materialized views > 3) Manual DML against data in materialized views (except truncate > which is kind of DDL) > 4) Keeping track of whether the data in the materialized view is up to date The idea is to add the above features over the next few versions of Postgres. > I have to say I find this model a bit odd. It seems the UI you're > presenting is that they're basically read-only tables that the > database will fill in the data for automatically. This is what matviews are in other DBMSes. > My mental model of > materialized views is that they're basically views that the database > guarantees a different performance characteristic for. How would we do that, exactly? That would be lovely, but unless you have a way to accomplish it ... > I would expect a materialized view to be up to date all the time. Actually, there's a huge use case for asynchronously updated matviews, so we would not want an implementation which ruled them out. Also there's the argument that synchronously updated matviews have little actual performance advantage over regular dynamic views. Or to put it another way: I could use this feature, as it is, in about 8 different projects I'm currently supporting. I personally can't think of a single project where I need synchronously updated matviews, currently. I have in the past, but it's a LOT less frequent that the desire for async, just as the desire for async replication is more common than the desire for syncrep. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2/19/13 5:47 PM, Kevin Grittner wrote: > When I went to do this, I hit a shift/reduce conflict, because with > TABLE being optional it couldn't tell whether: > > TRUNCATE MATERIALIZED VIEW x, y, z; > > ... was looking for five relations or three. That goes away with > MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK? Is TRUNCATE even the right command here? For regular tables TRUNCATE is a fast DELETE, which logically empties the table. For materialized views, there is no deleting, so this command (I suppose?) just invalidates the materalized view. That's not the same thing. Are there TRUNCATE triggers on materialized views?
Peter Eisentraut <peter_e@gmx.net> wrote: > On 2/19/13 5:47 PM, Kevin Grittner wrote: >> When I went to do this, I hit a shift/reduce conflict, because >> with TABLE being optional it couldn't tell whether: >> >> TRUNCATE MATERIALIZED VIEW x, y, z; >> >> ... was looking for five relations or three. That goes away >> with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that >> OK? > > Is TRUNCATE even the right command here? For regular tables > TRUNCATE is a fast DELETE, which logically empties the table. > For materialized views, there is no deleting, so this command (I > suppose?) just invalidates the materalized view. That's not the > same thing. Hmm. That's what Greg Stark just said, and I replied that nobody else had raised the issue in over three months. With Greg, that's two now. TRUNCATE MATERIALIZED VIEW discards any data which has been loaded into the MV, rendering it unavailable for scanning. Internally, it does do a truncate, exactly as truncate table. The resulting zero-length heap file is what is used to determine whether a materialized view is "scannable". When a CREATE WITH DATA or a REFRESH generates zero rows, an empty single page is created to indicate that it is scannable (valid to use in queries) but contains no rows. I agree that INVALIDATE is probably more descriptive, although it seems that there might be some even better word if we bikeshed enough. The question is, is it worth creating a new keyword to call the internal truncate function for materialized views, versus documenting that truncating a materialized view renders it invalid? Again, given the numbers that Tom presented a while back about the space requirements of every new keyword, I don't think this is enough of a gain to justify that. I still squirm a little about having used REFRESH, even though demand for that was overwhelming. > Are there TRUNCATE triggers on materialized views? No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't create a trigger of any type on a materialized view. I don't think that would interfere with event triggers, though. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kevin Grittner (kgrittn@ymail.com) wrote: > Peter Eisentraut <peter_e@gmx.net> wrote: > > Is TRUNCATE even the right command here? For regular tables > > TRUNCATE is a fast DELETE, which logically empties the table. > > For materialized views, there is no deleting, so this command (I > > suppose?) just invalidates the materalized view. That's not the > > same thing. > > Hmm. That's what Greg Stark just said, and I replied that nobody > else had raised the issue in over three months. With Greg, that's > two now. TRUNCATE MAT VIEW seems like the right command to me. Just my 2c. Thanks, Stephen
On 2/20/13 2:30 PM, Kevin Grittner wrote: >> Are there TRUNCATE triggers on materialized views? > No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't > create a trigger of any type on a materialized view. I don't think > that would interfere with event triggers, though. More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. This has various implications with triggers, logging, permissions. I think it's not good to mix those two. Also note that un-invalidating==refreshing a materialized view is already a DDL command.
Peter Eisentraut <peter_e@gmx.net> wrote: > On 2/20/13 2:30 PM, Kevin Grittner wrote: >>> Are there TRUNCATE triggers on materialized views? >> No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't >> create a trigger of any type on a materialized view. I don't >> think that would interfere with event triggers, though. > > More generally, I would consider the invalidation of a > materialized view a DDL command, whereas truncating a table is a > DML command. The force of that assertion is somewhat undercut by the fact that the ExecuteTruncate() function does exactly what needs to be done to discard the data in a materialized view and make it appear as non-scannable. Even if we dress it up with different syntax, it's not clear that we wouldn't build a TruncateStmt in the parser and pass it through exactly the same execution path. We would just need to look at the relkind to generate a different command tag. > This has various implications with triggers, logging, > permissions. I think it's not good to mix those two. Could you give a more concrete example of where you see a problem? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I would like to know what operations you plan to support independently > of the command names. I may have missed much earlier in the discussion > but then I suspect things have evolved since then. > > It sounds like you want to support: > > 1) Selecting from materialized viws > 2) Manually refreshing materialized views > 3) Manually truncating materialized views Maybe plus? 4) Automatically dropping materialized views if underlying table(s) are dropped/altered Or this has to be done manually? > And explicitly not support > > 1) Automatically rewriting queries to select from matching views > 2) Incrementally refreshing materialized views > 3) Manual DML against data in materialized views (except truncate > which is kind of DDL) > 4) Keeping track of whether the data in the materialized view is up to date > > I have to say I find this model a bit odd. It seems the UI you're > presenting is that they're basically read-only tables that the > database will fill in the data for automatically. My mental model of > materialized views is that they're basically views that the database > guarantees a different performance characteristic for. > > I would expect a materialized view to be up to date all the time. If > we don't support incremental updates (which seems like a fine thing > not to support in a first cut) then I would expect any DML against the > table to mark the view invalid and any queries against it to produce > an error (or possibly go to the source tables using the view > definition but that's probably a bad idea for most use cases). Ie. > they should behave like a view at all times and have up to date > information or fail entirely. > > I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but > I would expect it to be called something like INVALIDATE rather than > TRUNCATE and dropping the storage is a side effect of simply telling > the database that it doesn't need to maintain this materialized view. > Though I could be convinced "truncate" is a good name as long as it's > documented well. > > -- > greg > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > More generally, I would consider the invalidation of a materialized view > a DDL command, whereas truncating a table is a DML command. That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. I actually don't really dislike using "TRUNCATE" for this command. I was more asking about whether this meant people were thinking of the view as a thing where you could control the data in it by hand and could have the view be "empty" rather than just "not valid". The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. -- greg
Greg Stark <stark@mit.edu> writes: > The way I was thinking about it, whatever the command is named, you > might be able to tell the database to drop the storage associated with > the view but that would make the view invalid until it was refreshed. > It wouldn't make it appear to be empty. Actually, that seems like a pretty key point to me. TRUNCATE TABLE results in a table that is perfectly valid, you just deleted all the rows that used to be in it. Throwing away an MV's contents should not result in an MV that is considered valid. That being the case, lumping them as being the "same" operation feels like the wrong thing, and so we should choose a different name for the MV operation. regards, tom lane
On 2013-02-21 04:14:09 +0000, Greg Stark wrote: > On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > More generally, I would consider the invalidation of a materialized view > > a DDL command, whereas truncating a table is a DML command. > > That's not entirely true. From the database's point of view, TRUNCATE > is in many ways actually DDL. > > I actually don't really dislike using "TRUNCATE" for this command. I > was more asking about whether this meant people were thinking of the > view as a thing where you could control the data in it by hand and > could have the view be "empty" rather than just "not valid". It also might get confusing when we get materialized views that are auto-updateable. I am not suggesting to forward TRUNCATE to the internal storage in that case but giving an error so its an easy to find distinction to a normal table seems like a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 20, 2013 at 11:14 PM, Greg Stark <stark@mit.edu> wrote: > On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut <peter_e@gmx.net> wrote: >> More generally, I would consider the invalidation of a materialized view >> a DDL command, whereas truncating a table is a DML command. > > That's not entirely true. From the database's point of view, TRUNCATE > is in many ways actually DDL. > > I actually don't really dislike using "TRUNCATE" for this command. Me neither. I'm astonished that we're seriously considering adding new keywords for this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/20/13 5:03 PM, Kevin Grittner wrote: > Peter Eisentraut <peter_e@gmx.net> wrote: >> On 2/20/13 2:30 PM, Kevin Grittner wrote: >>>> Are there TRUNCATE triggers on materialized views? >>> No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't >>> create a trigger of any type on a materialized view. I don't >>> think that would interfere with event triggers, though. >> >> More generally, I would consider the invalidation of a >> materialized view a DDL command, whereas truncating a table is a >> DML command. > > The force of that assertion is somewhat undercut by the fact that > the ExecuteTruncate() function does exactly what needs to be done > to discard the data in a materialized view and make it appear as > non-scannable. Even if we dress it up with different syntax, it's > not clear that we wouldn't build a TruncateStmt in the parser and > pass it through exactly the same execution path. We would just > need to look at the relkind to generate a different command tag. This is a fall-out of the implementation, and that's fine (although I'd personally still be in favor of putting that state in the catalog, not into the block count on disk, effectively), but I'm talking about the external interfaces we present. >> This has various implications with triggers, logging, >> permissions. I think it's not good to mix those two. > > Could you give a more concrete example of where you see a problem? * Logging: You can set things to log DDL commands only. I would want a MV invalidation to be logged. * Permissions: There is a TRUNCATE permission, would that apply here? There is no refresh permission. * Triggers: There are TRUNCATE triggers, but they don't apply here. * Triggers: I don't know how event triggers work, but I'd like materialized view events to be grouped together somehow. * Don't know the opinion of sepgsql on all this. I think what this all comes down to, as I've mentioned before, is that the opposite of this proposed truncate operation is the refresh operation, and that is a DDL command under ALTER MATERIALIZED VIEW. Both of these fundamental operations -- truncate/refresh, invalidate/validate, empty/refill, whatever -- should be grouped together somehow, as far as syntax, as well logging, permissions, trigger handling, and so on are concerned. You don't need a new command or key word for this. How about ALTER MATERIALIZED VIEW DISCARD?
On 2/20/13 11:14 PM, Greg Stark wrote: > That's not entirely true. From the database's point of view, TRUNCATE > is in many ways actually DDL. Whether something is DDL or DML or a read operation (query) is not an implementation detail, it's a user-exposed category. Since TRUNCATE is logically equivalent to DELETE, it's a DML operation, as far as the user is concerned.
Peter Eisentraut <peter_e@gmx.net> wrote: > On 2/20/13 11:14 PM, Greg Stark wrote: >> That's not entirely true. From the database's point of view, >> TRUNCATE is in many ways actually DDL. > > Whether something is DDL or DML or a read operation (query) is > not an implementation detail, it's a user-exposed category. > Since TRUNCATE is logically equivalent to DELETE, it's a DML > operation, as far as the user is concerned. Not really. It doesn't follow the same MVCC behavior as DML. This is user-visible, documented behavior. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@mit.edu> writes: >> The way I was thinking about it, whatever the command is named, you >> might be able to tell the database to drop the storage associated with >> the view but that would make the view invalid until it was refreshed. >> It wouldn't make it appear to be empty. > > Actually, that seems like a pretty key point to me. TRUNCATE TABLE > results in a table that is perfectly valid, you just deleted all the > rows that used to be in it. Throwing away an MV's contents should > not result in an MV that is considered valid. It doesn't. That was one of the more contentious points in the earlier bikeshedding phases. Some felt that throwing away the contents was a form of making the MV "out of date" and as such didn't require any special handling. Others, including myself, felt that "data not present" was a distinct state from "generated zero rows" and that attempting to scan a materialized view for which data had not been generated must result in an error. The latter property has been maintained from the beginning -- or at least that has been the intent. test=# CREATE TABLE t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL); CREATE TABLE test=# CREATE MATERIALIZED VIEW tm AS SELECT type, sum(amt) AS totamt FROM t GROUP BY type WITH NO DATA; SELECT 0 test=# SELECT pg_relation_is_scannable('tm'::regclass); pg_relation_is_scannable -------------------------- f (1 row) test=# SELECT * FROM tm; ERROR: materialized view "tm" has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. test=# REFRESH MATERIALIZED VIEW tm; REFRESH MATERIALIZED VIEW test=# SELECT pg_relation_is_scannable('tm'::regclass); pg_relation_is_scannable -------------------------- t (1 row) test=# TRUNCATE MATERIALIZED VIEW tm; TRUNCATE TABLE test=# SELECT * FROM tm; ERROR: materialized view "tm" has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. test=# SELECT pg_relation_is_scannable('tm'::regclass); pg_relation_is_scannable -------------------------- f (1 row) > That being the case, lumping them as being the "same" operation > feels like the wrong thing, and so we should choose a different > name for the MV operation. There is currently no truncation of MV data without rendering the MV unscannable. Do you still feel it needs a different command name? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Greg Stark <stark@mit.edu> wrote: > I actually don't really dislike using "TRUNCATE" for this > command. I was more asking about whether this meant people were > thinking of the view as a thing where you could control the data > in it by hand and could have the view be "empty" rather than just > "not valid". You can either populate the MV in the CREATE command or by REFRESH, and it will be scannable. If it is created WITH NO DATA or TRUNCATEd it is not scannable, generating an error on an attempt to reference it. test=# select * from tm; type | totamt ------+-------- y | 12 z | 24 x | 5 (3 rows) test=# truncate tm; TRUNCATE TABLE test=# select * from tm; ERROR: materialized view "tm" has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. > The way I was thinking about it, whatever the command is named, > you might be able to tell the database to drop the storage > associated with the view but that would make the view invalid > until it was refreshed. It wouldn't make it appear to be empty. I think we're on the same page after all. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Eisentraut <peter_e@gmx.net> wrote: I'll need more time to ponder your other points, but... > You don't need a new command or key word for this. How about > ALTER MATERIALIZED VIEW DISCARD? I don't like this because we don't have ALTER TABLE REINDEX. But the fact that DISCARD is a keyword does open up some interesting syntax possibilities without adding more keywords. Maybe: DISCARD MATERIALIZED VIEW DATA tm; Or something like that. Paint buckets are over there by the bikeshed. Have at it. :-) -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 21.02.2013 16:38, Kevin Grittner wrote: > Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Greg Stark<stark@mit.edu> writes: >>> The way I was thinking about it, whatever the command is named, you >>> might be able to tell the database to drop the storage associated with >>> the view but that would make the view invalid until it was refreshed. >>> It wouldn't make it appear to be empty. >> >> Actually, that seems like a pretty key point to me. TRUNCATE TABLE >> results in a table that is perfectly valid, you just deleted all the >> rows that used to be in it. Throwing away an MV's contents should >> not result in an MV that is considered valid. > > It doesn't. That was one of the more contentious points in the > earlier bikeshedding phases. Some felt that throwing away the > contents was a form of making the MV "out of date" and as such > didn't require any special handling. Others, including myself, > felt that "data not present" was a distinct state from "generated > zero rows" and that attempting to scan a materialized view for > which data had not been generated must result in an error. The > latter property has been maintained from the beginning -- or at > least that has been the intent. Yeah, "data not present" is clearly different from "0 rows". I'm surprised there isn't an explicit boolean column somewhere for that, but I guess you can use the size of the heap for that too, as long as you're careful to not truncate it to 0 blocks when it's empty but scannable. There's at least one bug left in that area: postgres=# create table t (id int4); CREATE TABLE postgres=# create materialized view tm as select * from t where id < 0;SELECT 0 postgres=# select * from tm; id ---- (0 rows) postgres=# create index i_tm on tm(id);CREATE INDEX postgres=# cluster tm using i_tm; CLUSTER postgres=# select * from tm; ERROR: materialized view "tm" has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. Clustering a materialized view invalidates it. - Heikki
Andres Freund <andres@2ndquadrant.com> wrote: > giving an error so its an easy to find distinction to a normal > table seems like a good idea. I'm not sure I understood your concerns entirely, but wonder whether this helps?: test=# \d List of relations Schema | Name | Type | Owner --------+-------+-------------------+--------- public | bb | materialized view | kgrittn public | t | table | kgrittn public | tm | materialized view | kgrittn public | tmm | materialized view | kgrittn public | tv | view | kgrittn public | tvmm | materialized view | kgrittn public | tvv | view | kgrittn public | tvvm | materialized view | kgrittn public | tvvmv | view | kgrittn (9 rows) test=# truncate table tm; ERROR: "tm" is not a table test=# truncate materialized view t; ERROR: "t" is not a materialized view test=# truncate materialized view tm; TRUNCATE TABLE test=# truncate table t; TRUNCATE TABLE Well, maybe those command tags could use a tweak. Then there's this, if you don't specify an object type: test=# truncate t, tm; TRUNCATE TABLE -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That being the case, lumping them as being the "same" operation >> feels like the wrong thing, and so we should choose a different >> name for the MV operation. > There is currently no truncation of MV data without rendering the > MV unscannable.� Do you still feel it needs a different command > name? You didn't say anything that changed my opinion: it doesn't feel like a TRUNCATE to me. It's not changing the object to a different but entirely valid state, which is what TRUNCATE does. Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE VIEW and that this operation should be another one. That sounds pretty reasonable from here. regards, tom lane
Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 21.02.2013 16:38, Kevin Grittner wrote: >> Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Greg Stark<stark@mit.edu> writes: >>>> The way I was thinking about it, whatever the command is named, you >>>> might be able to tell the database to drop the storage associated with >>>> the view but that would make the view invalid until it was refreshed. >>>> It wouldn't make it appear to be empty. >>> >>> Actually, that seems like a pretty key point to me. TRUNCATE TABLE >>> results in a table that is perfectly valid, you just deleted all the >>> rows that used to be in it. Throwing away an MV's contents should >>> not result in an MV that is considered valid. >> >> It doesn't. That was one of the more contentious points in the >> earlier bikeshedding phases. Some felt that throwing away the >> contents was a form of making the MV "out of date" and as such >> didn't require any special handling. Others, including myself, >> felt that "data not present" was a distinct state from "generated >> zero rows" and that attempting to scan a materialized view for >> which data had not been generated must result in an error. The >> latter property has been maintained from the beginning -- or at >> least that has been the intent. > > Yeah, "data not present" is clearly different from "0 rows". > I'm surprised there isn't an explicit boolean column somewhere > for that, There was, in earlier versions of the patch: pg_class.relisvald. The problem is that we needed some way to determine from the heap that it was invalid to support UNLOGGED MVs. Several people were offended by my attempt to use relisvald as the primary indicator and transfer the information from the heap state to pg_class and relcache. There were some pretty big technical challenges to that. So I caved on that one and went with the pg_relation_is_scannable() function based on the heap as reported by relcache. That being one of the newer parts of the patch, it is probably not as solid as the parts which haven't changed much in the last three months. > but I guess you can use the size of the heap for that too, as > long as you're careful to not truncate it to 0 blocks when it's > empty but scannable. > > There's at least one bug left in that area: > > postgres=# create table t (id int4); > CREATE TABLE > postgres=# create materialized view tm as select * from t where id < 0;SELECT > 0 > postgres=# select * from tm; > id > ---- > (0 rows) > > postgres=# create index i_tm on tm(id);CREATE INDEX > postgres=# cluster tm using i_tm; > CLUSTER > postgres=# select * from tm; > ERROR: materialized view "tm" has not been populated > HINT: Use the REFRESH MATERIALIZED VIEW command. > > Clustering a materialized view invalidates it. Good spot. That should be easy enough to fix. Thanks. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-02-21 07:10:09 -0800, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > giving an error so its an easy to find distinction to a normal > > table seems like a good idea. > > I'm not sure I understood your concerns entirely, but wonder > whether this helps?: To explain it a bit: I assume that at some point matviews will get (auto-)updateable, just as normal views recently got. In that case application programmers might not be aware anymore that something is a view either because they just don't know or because a table got converted into a matview after the code was written. Because of the potential wish for transparency (with security views as a potential user) at least normal views might get the capability to be TRUNCATEd directly, so it might be that matviews do as well. > test=# \d > List of relations > Schema | Name | Type | Owner > --------+-------+-------------------+--------- > public | bb | materialized view | kgrittn > public | t | table | kgrittn > public | tm | materialized view | kgrittn > public | tmm | materialized view | kgrittn > public | tv | view | kgrittn > public | tvmm | materialized view | kgrittn > public | tvv | view | kgrittn > public | tvvm | materialized view | kgrittn > public | tvvmv | view | kgrittn > (9 rows) > > test=# truncate table tm; > ERROR: "tm" is not a table > test=# truncate materialized view t; > ERROR: "t" is not a materialized view > test=# truncate materialized view tm; > TRUNCATE TABLE > test=# truncate table t; > TRUNCATE TABLE Thats not bad. But what if we allow TRUNCATE on views someday (possibly only if a truncate trigger is defined). For consistency we might also want that on matvies. Having a difference between TRUNCATE view; and TRUNCATE MATERIALIZED VIEW; in that case sounds ugly to me. What about DISABLE? DISCARD or DEALLOCATE would also be nice but it seems hard to fit that into existing syntax. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That being the case, lumping them as being the "same" >>> operation feels like the wrong thing, and so we should choose a >>> different name for the MV operation. >> >> There is currently no truncation of MV data without rendering >> the MV unscannable. Do you still feel it needs a different >> command name? > > You didn't say anything that changed my opinion: it doesn't feel > like a TRUNCATE to me. It's not changing the object to a > different but entirely valid state, which is what TRUNCATE does. > > Peter claimed upthread that REFRESH is a subcommand of ALTER > MATERIALIZE VIEW It's not, nor do I think it should be. > and that this operation should be another one. That sounds > pretty reasonable from here. That feels completely wrong to me. For one thing, I can't think of any ALTER commands to populate or remove data. What did you think of the idea of something like DISCARD MATERIALIZED VIEW DATA as a new statment? Or maybe RESET MATERIALIZED VIEW? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andres Freund <andres@2ndquadrant.com> wrote: > I assume that at some point matviews will get (auto-)updateable, > just as normal views recently got. I'm dubious about that. Every use case I've seen for MVs involves aggregation, although they are a generalized feature, so that won't always be true. But if you have a view like: CREATE MATERIALIZED VIEW tm AS SELECT t.type, sum(t.amt) AS totamt FROM t GROUP BY t.type; ... I don't see how that can be updateable. If I add 5 to totamt for some row, what do you do? I expect that 99% of MVs will be updated asynchronously from changes to the underlying data -- what do you do if someone updates a row that no longer exists in the underlying data. This are just seems fraught with peril and out of sync with the usual uses of MVs. > What about DISABLE? DISCARD or DEALLOCATE would also be nice but > it seems hard to fit that into existing syntax. Thanks for the suggestions. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Peter claimed upthread that REFRESH is a subcommand of ALTER >> MATERIALIZE VIEW > It's not, nor do I think it should be. Oh, never mind then. >> and that this operation should be another one.� That sounds >> pretty reasonable from here. > That feels completely wrong to me.� For one thing, I can't think of > any ALTER commands to populate or remove data.� What did you think > of the idea of something like DISCARD MATERIALIZED VIEW DATA as a > new statment?� Or maybe RESET MATERIALIZED VIEW? I could live with either DISCARD or RESET. regards, tom lane
On 2/21/13 9:25 AM, Kevin Grittner wrote: > Peter Eisentraut <peter_e@gmx.net> wrote: >> On 2/20/13 11:14 PM, Greg Stark wrote: >>> That's not entirely true. From the database's point of view, >>> TRUNCATE is in many ways actually DDL. >> >> Whether something is DDL or DML or a read operation (query) is >> not an implementation detail, it's a user-exposed category. >> Since TRUNCATE is logically equivalent to DELETE, it's a DML >> operation, as far as the user is concerned. > > Not really. It doesn't follow the same MVCC behavior as DML. This > is user-visible, documented behavior. MVCC behavior does not determine whether something is considered DDL or DML.
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> What did you think of the idea of something like DISCARD >> MATERIALIZED VIEW DATA as a new statment? Or maybe RESET >> MATERIALIZED VIEW? > > I could live with either DISCARD or RESET. I figured this was worth a pass through the keyword list to look for all imperative verbs suitable for this, which could support the needed syntax without adding a new keyword. Here are the possibilities I came up with, along with a note about why they are keywords already. DISABLE MATERIALIZED VIEW mv; -- ALTER clause for constraints DISCARD MATERIALIZED VIEW DATA mv; -- session state RELEASE MATERIALIZED VIEW DATA mv; -- savepoint RESET MATERIALIZED VIEW DATA mv; -- run-time parameter I think any of these could work. I'm personally most inclined toward DISABLE MATERIALIZED VIEW. It seems to convey the semantics better, especially if you leave out DATA as an additonal word. Given that a materialized view will retain its query, tablespace, indexes, statistics targets, etc. with this operation, and will just not be available for scanning, some of the above seem downright misleading without DATA thrown in. Opinions? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 21, 2013 at 2:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > It doesn't. That was one of the more contentious points in the > earlier bikeshedding phases. Some felt that throwing away the > contents was a form of making the MV "out of date" and as such > didn't require any special handling. Others, including myself, > felt that "data not present" was a distinct state from "generated > zero rows" and that attempting to scan a materialized view for > which data had not been generated must result in an error. The > latter property has been maintained from the beginning -- or at > least that has been the intent. Actually this sounds like exactly what I was saying. I withdraw my concern entirely. -- greg
Greg Stark <stark@mit.edu> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> It doesn't. That was one of the more contentious points in the >> earlier bikeshedding phases. Some felt that throwing away the >> contents was a form of making the MV "out of date" and as such >> didn't require any special handling. Others, including myself, >> felt that "data not present" was a distinct state from >> "generated zero rows" and that attempting to scan a materialized >> view for which data had not been generated must result in an >> error. The latter property has been maintained from the >> beginning -- or at least that has been the intent. > > Actually this sounds like exactly what I was saying. I withdraw > my concern entirely. Reviewing your concerns and discussions of "freshness" in general got me thinking -- while it is clear that not having generated values in the MV based on its query clearly should make the view non-scannable, and will be the only criterion for that in this initial patch; later versions will almost certainly support age-based conditions for whether the MV is scannable. So in the next release the MV may become non-scannable based on the passage of time since DML was run on a source table without the MV having been refreshed or incrementally updated to reflect that DML. Which makes me wonder why DML making the MV non-scannable is such a bad thing in the case of TRUNCATE. Granted there is a difference in that it is run *on the MV* rather than *on a source relation*; but still, I'm having some second thoughts about that being a problem. The problem with TRUNCATE MATERIALIZED VIEW from a logical perspective doesn't seem to me so much that it makes the MV non-scannable, as that it is the only DML which would be allowed directly on an MV -- which is kind of a weird exception. It is pretty much a given that when we can get to implementing it, DML statements will render MVs unscannable under various conditions. Josh Berkus and Greg Stark have been the most explicit about that, but I think most of us who are interested in the feature take it as a given. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Kevin Grittner <kgrittn@ymail.com> writes: > > >> What did you think of the idea of something like DISCARD > >> MATERIALIZED VIEW DATA as a new statment? Or maybe RESET > >> MATERIALIZED VIEW? > > > > I could live with either DISCARD or RESET. > > I figured this was worth a pass through the keyword list to look > for all imperative verbs suitable for this, which could support the > needed syntax without adding a new keyword. Here are the > possibilities I came up with, along with a note about why they are > keywords already. > > DISABLE MATERIALIZED VIEW mv; -- ALTER clause for constraints > DISCARD MATERIALIZED VIEW DATA mv; -- session state > RELEASE MATERIALIZED VIEW DATA mv; -- savepoint > RESET MATERIALIZED VIEW DATA mv; -- run-time parameter > > I think any of these could work. I'm personally most inclined > toward DISABLE MATERIALIZED VIEW. It seems to convey the semantics > better, especially if you leave out DATA as an additonal word. > Given that a materialized view will retain its query, tablespace, > indexes, statistics targets, etc. with this operation, and will > just not be available for scanning, some of the above seem > downright misleading without DATA thrown in. I vote for RESET or DISCARD. DISABLE sounds more like you disable automatic refreshes or somesuch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote: >> DISABLE MATERIALIZED VIEW mv;� -- ALTER clause for constraints >> DISCARD MATERIALIZED VIEW DATA mv;� -- session state >> RELEASE MATERIALIZED VIEW DATA mv;� -- savepoint >> RESET MATERIALIZED VIEW DATA mv;� -- run-time parameter >> >> I think any of these could work.� I'm personally most inclined >> toward DISABLE MATERIALIZED VIEW.� It seems to convey the semantics >> better, especially if you leave out DATA as an additonal word. > I vote for RESET or DISCARD. DISABLE sounds more like you disable > automatic refreshes or somesuch. Yeah, I don't much like DISABLE either. I'm also concerned about overloading RESET this way --- that statement has complicated-enough syntax already, not to mention way too many shades of meaning. So that leaves me voting for DISCARD M.V. DATA, which seems pretty precise. It's a bit verbose, but since when has SQL been succinct? regards, tom lane
> That feels completely wrong to me. For one thing, I can't think of > any ALTER commands to populate or remove data. What did you think > of the idea of something like DISCARD MATERIALIZED VIEW DATA as a > new statment? Or maybe RESET MATERIALIZED VIEW? I prefer RESET, especially since it could eventually support RESET ALL MATERIALIZED VIEWS if that turns out to be useful. How does the parser like that? BTW, to contradict Peter E., for my part I would NOT want matview resets to be logged as DDL. I would only want matview definitition changes to be so logged. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus <josh@agliodbs.com> wrote: > I prefer RESET, especially since it could eventually support RESET ALL > MATERIALIZED VIEWS if that turns out to be useful. How does the parser > like that? Isn't reset currently only used for GUCs? I think that makes for a strange crossover. Really, I'm sorry for bringing this up. I really don't think "truncate" is a bad way to spell it. -- greg
On Sat, Feb 23, 2013 at 9:55 PM, Greg Stark <stark@mit.edu> wrote:
-- On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus <josh@agliodbs.com> wrote:Isn't reset currently only used for GUCs? I think that makes for a
> I prefer RESET, especially since it could eventually support RESET ALL
> MATERIALIZED VIEWS if that turns out to be useful. How does the parser
> like that?
strange crossover.
Michael
2013-02-19 Kevin Grittner wrote: [matview-system_views-v2.diff] I assumed the patches matview-v4.patch and matview-system_views-v2.diff were to be applied together. They do apply correctly but during tests, the "test rules ... FAILED". Perhaps it is solved already but I thought I'd mention it in case it is overlooked. Thanks, Erik Rijkers
On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html > DISCARD would be better. Well, personally, I'm in favor of either TRUNCATE or ALTER MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to suppose that we're going to start using DISCARD for things that have nothing to do with the existing meanings of DISCARD. Number one, I think it's confusing. Number two, it's currently possible to determine whether something is DDL, DML, or other by looking at the first word of the command. If we throw that out the window we may cause performance issues for connection pooling software that tries to be clever like that. Mind you, I'm not aware of any connection pooling software that actually does this today, because I think pgpool includes a full parser and pgbouncer includes no parser at all, but it still seems like a possibly-useful trick. And number three, the possibility of grammar conflicts with things we might want to do in the future seems unnecessarily high. If we use ALTER MATERIALIZED VIEW ... WHATEVER, we only have to worry about grammar conflicts with other ALTER MATERIALIZED VIEW commands; if we reuse DISCARD or RESET, we've got potential conflicts with completely unrelated syntax. That consideration alone would be sufficient reason for me to choose to stick everything under ALTER MATERIALIZED VIEW, no matter how poor a semantic fit it seems otherwise. Of course, if we stick with TRUNCATE, this becomes a non-issue. All that having been said, I'm not in favor of pushing this patch out to 9.4 because we can't agree on minor syntax details. In the absence of consensus, my feeling is that Kevin should exercise committer's prerogative and commit this in the way that seems best to him. If a clear contrary consensus subsequently emerges, we can always change it. It is not as if the particular choice of SQL syntax is hard to whack around. Once we release it we're stuck with it, but certainly between now and beta we can change it whenever we like. I'd rather have the core part of the feature committed and tinker with the syntax than wait longer to land the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html >> DISCARD would be better. > Well, personally, I'm in favor of either TRUNCATE or ALTER > MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to > suppose that we're going to start using DISCARD for things that have > nothing to do with the existing meanings of DISCARD. Yeah, there's actually a serious problem with choosing DISCARD: surely we are not going to include "trash all MVs" in the behavior of DISCARD ALL. So unless you would like to say that DISCARD ALL doesn't mean what it appears to mean, we can't make MV reset be one of the sub-flavors of DISCARD. So that seems to leave us with either TRUNCATE or an ALTER sub-syntax. Personally I'd prefer the latter but it's surely debatable. regards, tom lane
On 28.02.2013 16:55, Robert Haas wrote: > On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html >> DISCARD would be better. > > Well, personally, I'm in favor of either TRUNCATE or ALTER > MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to > suppose that we're going to start using DISCARD for things that have > nothing to do with the existing meanings of DISCARD. Number one, I > think it's confusing. Number two, it's currently possible to > determine whether something is DDL, DML, or other by looking at the > first word of the command. If we throw that out the window we may > cause performance issues for connection pooling software that tries to > be clever like that. FWIW, I totally agree with that. From that point of view, the best thing would be to tack this onto the REFRESH command, perhaps something like: REFRESH matview INVALIDATE; REFRESH matview UNREFRESH; REFRESH matview DISCARD; It's a bit weird that the command is called REFRESH, if the effect is the exact opposite of refreshing it. And we usually do have two separate commands for doing something and undoing the same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and so forth. I think we're being too hung up on avoiding new (unreserved) keywords. Yes, the grammar is large because of so many keywords, but surely there's some better solution to that than adopt syntax that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?), and deal with the grammar bloat separately. - Heikki
On Thu, Feb 28, 2013 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, personally, I'm in favor of either TRUNCATE or ALTER >> MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to >> suppose that we're going to start using DISCARD for things that have >> nothing to do with the existing meanings of DISCARD. > > Yeah, there's actually a serious problem with choosing DISCARD: > surely we are not going to include "trash all MVs" in the behavior > of DISCARD ALL. So unless you would like to say that DISCARD ALL > doesn't mean what it appears to mean, we can't make MV reset be > one of the sub-flavors of DISCARD. Good point. > So that seems to leave us with either TRUNCATE or an ALTER sub-syntax. > Personally I'd prefer the latter but it's surely debatable. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 28.02.2013 16:55, Robert Haas wrote: >> Well, personally, I'm in favor of either TRUNCATE or ALTER >> MATERIALIZED VIEW ... DISCARD. I think it's a dangerous >> precedent to suppose that we're going to start using DISCARD for >> things that have nothing to do with the existing meanings of >> DISCARD. Number one, I think it's confusing. Number two, it's >> currently possible to determine whether something is DDL, DML, >> or other by looking at the first word of the command. If we >> throw that out the window we may cause performance issues for >> connection pooling software that tries to be clever like that. > > FWIW, I totally agree with that. From that point of view, the > best thing would be to tack this onto the REFRESH command, > perhaps something like: > > REFRESH matview INVALIDATE; > REFRESH matview UNREFRESH; > REFRESH matview DISCARD; > > It's a bit weird that the command is called REFRESH, if the > effect is the exact opposite of refreshing it. And we usually do > have two separate commands for doing something and undoing the > same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and > so forth. > > I think we're being too hung up on avoiding new (unreserved) > keywords. Yes, the grammar is large because of so many keywords, > but surely there's some better solution to that than adopt syntax > that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?), > and deal with the grammar bloat separately. I'm OK with any grammar that we can reach consensus on, but that seems elusive and I don't want to hold up getting the meat of the patch committed while we haggle out this syntax detail. Votes have shifted frequently, but as I make out the latest opinion of people making a statement that looks like an explicit vote, dividing the value of a split vote between the choices, I get: Kevin Grittner: TRUNCATE Stephen Frost: TRUNCATE Peter Eisentraut: ALTER Andres Freund: RESET or DISCARD Josh Berkus: RESET Greg Stark: TRUNCATE Michael Paquier: DISCARD Robert Haas: TRUNCATE or ALTER Tom Lane: TRUNCATE or ALTER Heikki Linnakangas: REFRESH? Vote totals: TRUNCATE: 4.0 ALTER: 2.0 DISCARD: 1.5 RESET: 1.5 REFRESH 1.0? Barring a sudden confluence of opinion, I will go with TRUNCATE for the initial spelling. I tend to favor that spelling for several reasons. One was the size of the patch needed to add the opposite of REFRESH to the backend code: diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 2a55e02..eb7a14f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1234,11 +1234,12 @@ truncate_check_rel(Relation rel) { AclResult aclresult; - /* Only allow truncate on regular tables */ - if (rel->rd_rel->relkind != RELKIND_RELATION) + /* Only allow truncate on regular tables and materialized views. */ + if (rel->rd_rel->relkind != RELKIND_RELATION && + rel->rd_rel->relkind != RELKIND_MATVIEW) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("\"%s\" is not a table", + errmsg("\"%s\" is not a table or materialized view", RelationGetRelationName(rel)))); /* Permissions checks */ That's it. That takes it from "no way to release the space held by the current contents of a materialized view and render it unscannable until its rule's query is used to populate it again" to "working". Now, there are docs and psql support needed on top of that, and the regression tests use the verb, and bikeshedding led to a minor tweak of TRUNCATE so that you could specify TRUNCATE MATERIALIZED VIEW -- but to get working backend code, the above is sufficient. That strikes me a prima facie evidence that it's not a horribly off-target verb. In terms of things to consider in choosing a verb are that, while complete absence of data in the MV's backing table is the only thing which will render it unscannable in this initial version, there is clear demand to eventually track information on how up-to-date data is, and treat the MV as unscannable for less extreme conditions, such as the asynchronous application of changes from backing tables having fallen behind some configurable threshold. In essence, people want the error instead of scanning the relation if the generated data is not within a range of time that is "fresh" enough, and truncating the backing relation (or creating the MV WITH NO DATA) is a special case of that since the data is not representing the results of the MV's query as of *any* known point in time. (As previously discussed, that is distinct from an empty relation which is known to represent a fresh view of the results of that query.) If we pick a new pair of verbs, the connotations I think we should be looking for include the ability to deal with at least these: * Make the MV represent the result set of its query as of this moment. * Declare the MV to be too stale for the contents of its backing table to be useful, and free the space held by the current contents. Other operations will be needed, for which syntax is not settled. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 28, 2013 at 7:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Barring a sudden confluence of opinion, I will go with TRUNCATE for > the initial spelling. I tend to favor that spelling for several > reasons. One was the size of the patch needed to add the opposite > of REFRESH to the backend code: FWIW, I found Andres's point about closing the door on updatable views quite convincing. If at any point we want to add updatable materialized views, it seems like a bad inconsistency to have TRUNCATE mean something completely different from DELETE. While update capability for materialized views might not be a common use case, I don't think it's fair to completely shut the door on it to have easier implementation and shorter syntax. Especially as the shorter syntax is semantically inconsistent - normal truncate removes the data, materialized view just makes the data inaccessible until the next refresh. Sorry for weighing in late, but it seemed to me that this point didn't get enough consideration. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Ants Aasma <ants@cybertec.at> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> Barring a sudden confluence of opinion, I will go with TRUNCATE >> for the initial spelling. I tend to favor that spelling for >> several reasons. One was the size of the patch needed to add >> the opposite of REFRESH to the backend code: > > FWIW, I found Andres's point about closing the door on updatable > views quite convincing. If at any point we want to add updatable > materialized views, it seems like a bad inconsistency to have > TRUNCATE mean something completely different from DELETE. While > update capability for materialized views might not be a common > use case, I don't think it's fair to completely shut the door on > it to have easier implementation and shorter syntax. Especially > as the shorter syntax is semantically inconsistent - normal > truncate removes the data, materialized view just makes the data > inaccessible until the next refresh. > > Sorry for weighing in late, but it seemed to me that this point > didn't get enough consideration. Personally, I don't understand why anyone would want updateable materialized views. That's probably because 99% of the cases where I've seen that someone wanted them, they wanted them updated to match the underlying data using some technique that didn't require the modification or commit of the underlying data to carry the overhead of maintaining the MV. In other words, they do not want the MV to be up-to-date for performance reasons. That is a big part of the reason for *wanting* to use an MV. How do you make an asynchronously-maintained view updateable? In addtion, at least 80% of the cases I've seen where people want an MV it is summary information, which does not tie a single MV row to a single underlying row. If someone updates an aggregate number like an average, I see no reasonable way to map that to the underlying data in a meaningful way. I see the contract of a materialized view as providing a table-backed relation which shows the result set of a query as of some point in time. Perhaps it is a failure of imagination, but I don't see where modifying that relation directly is compatible with that contract. Can you describe a meaningful use cases for an udpateable materialized view? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 1, 2013 at 4:18 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Personally, I don't understand why anyone would want updateable > materialized views. That's probably because 99% of the cases where > I've seen that someone wanted them, they wanted them updated to > match the underlying data using some technique that didn't require > the modification or commit of the underlying data to carry the > overhead of maintaining the MV. In other words, they do not want > the MV to be up-to-date for performance reasons. That is a big > part of the reason for *wanting* to use an MV. How do you make an > asynchronously-maintained view updateable? > > In addtion, at least 80% of the cases I've seen where people want > an MV it is summary information, which does not tie a single MV row > to a single underlying row. If someone updates an aggregate number > like an average, I see no reasonable way to map that to the > underlying data in a meaningful way. > > I see the contract of a materialized view as providing a > table-backed relation which shows the result set of a query as of > some point in time. Perhaps it is a failure of imagination, but I > don't see where modifying that relation directly is compatible with > that contract. > > Can you describe a meaningful use cases for an udpateable > materialized view? I actually agree that overwhelming majority of users don't need or want updateable materialized views. My point was that we can't at this point rule out that people will think of a good use for this. I don't have any real use cases for this, but I can imagine a few situations where updateable materialized views wouldn't be nonsensical. One case would be if the underlying data is bulkloaded and is subsetted into smaller materialized views for processing using off-the-shelf tools that expect tables. One might want to propagate changes from those applications to the base data. The other case would be the theoretical future where materialized views can be incrementally and transactionally maintained, in that case being able to express modifications on the views could actually make sense. I understand that the examples are completely hypothetical and could be solved by using regular tables. I just have feeling that will regret conflating TRUNCATE semantics for slight implementation and notation convenience. To give another example of potential future update semantics, if we were to allow users manually maintaining materialized view contents using DML commands, one would expect TRUNCATE to mean "make this matview empty", not "make this matview unavailable". Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma <ants@cybertec.at> wrote: > . To give another example of potential future > update semantics, if we were to allow users manually maintaining > materialized view contents using DML commands, one would expect > TRUNCATE to mean "make this matview empty", not "make this matview > unavailable". Wouldn't that just be a regular table then though? How is that a materialized view? If anything someone might expect truncate to delete any rows from the source table that appear in the view. But I think it's likely that even if materialized views were updateable truncate wouldn't be one of the updateable operations. -- greg
Greg Stark <stark@mit.edu> wrote: > Ants Aasma <ants@cybertec.at> wrote: >> To give another example of potential future update semantics, if >> we were to allow users manually maintaining materialized view >> contents using DML commands, one would expect TRUNCATE to mean >> "make this matview empty", not "make this matview unavailable". > > Wouldn't that just be a regular table then though? How is that a > materialized view? > > If anything someone might expect truncate to delete any rows from > the source table that appear in the view. But I think it's likely > that even if materialized views were updateable truncate wouldn't > be one of the updateable operations. Yeah, the only way it would make sense to truncate MV contents from a user-written maintenance trigger (assuming we might have such a thing some day) would be if you decided that the change to the underlying data was so severe that you effectively needed to REFRESH, and there would probably be a better way to go about dealing with that. Two other reasons that this might not be a problem are: (1) Any DML against the MV would need to be limited to some context fired by the underlying changes. If we allow changes to the MV outside of that without it being part of some "updateable MV" feature (reversing the direction of flow of changes), the MV could not be trusted at all. If you're going to do that, just use a table. (2) I can think of a couple not-too-horrible syntax tricks we could use to escape from the corner we're worried about painting ourselves into. All of that said, some combination of Heikki's previous suggestion that maybe REFRESH could be used and my noticing that both TRUNCATE and REFRESH create a new heap for an MV, it's just a question of whether we then run the MV's query to fill it with data, led to this thought: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA This sort of mirrors the CREATE MATERIALIZED VIEW style (which was based on CREATE TABLE AS) and WITH NO DATA puts the MV into the unscannable state either way. I can change the parser to make this literally just the new spelling of TRUNCATE MATERIALIZED VIEW without dashing my hopes of pushing the patch tomorrow. (The patch has been ready to go for weeks other than this syntax issue and documentation which needs to refer to whatever syntax is chosen.) Barring objections, I will use the above and push tomorrow. The only issues which have been raised which will not be addressed at that point are: (1) The suggestion that ALTER MATERIALIZED VIEW name ALTER column support changing the collation isn't something I can see how to do without complication and risk beyond what is appropriate at this point in the release cycle. It will be left off, at least for now. To get a new collation, you will need to drop the MV and re-create it with a query which specifies the collation for the result column. (2) The sepgsql changes are still waiting for a decision from security focused folks. I have two patches for that contrib module ready based on my best reading of things -- one which uses table security labels and one which instead uses new matview labels. When we get a call on which is preferred, I suspect that one patch or the other will be good as-is or with minimal change. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2 March 2013 15:06, Kevin Grittner <kgrittn@ymail.com> wrote: > [ ... ] led to > this thought: > > REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA > [Sorry to join this discussion so late] FWIW I had a quick look at other DBs to see if there were any other precedents out there. Oracle was the only one I could find with anything similar. They use the same creation syntax: CREATE MATERIALIZED VIEW name [options] AS SELECT ... and they use ALTER for everything else, such as refreshing the MV: ALTER MATERIALIZED VIEW name REFRESH [options]; AFAICT the nearest thing they have to TRUNCATE/DISCARD is: ALTER MATERIALIZED VIEW name CONSIDER FRESH; They do also support updateable materialized views with standard DML, but it doesn't look as though they allow TRUNCATE to operate directly on a MV (although it can be made to propagate from a base table to a MV, in which case allowing TRUNCATE on the MV itself with a different meaning would likely be confusing). Oracle's MVs have lots of options, all of which hang off the 2 basic CREATE and ALTER commands. There's a certain appeal to that, rather than inventing or overloading a bunch of other commands as more options get added. The proposed REFRESH command is OK for today's options, but I think it might be overly limiting in the future. Of course, since this isn't in the SQL standard, we are free to use any syntax we like. We don't have to follow Oracle, but having a common syntax might make some people's lives easier, and I haven't seen a convincing argument as to why any alternative syntax is better. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> [ ... ] led to this thought: >> >> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA > > [Sorry to join this discussion so late] > > FWIW I had a quick look at other DBs to see if there were any > other precedents out there. Oracle was the only one I could find > with anything similar. They use the same creation syntax: > > CREATE MATERIALIZED VIEW name [options] AS SELECT ... It is a pretty obvious choice when you look at other SQL statements. > and they use ALTER for everything else, such as refreshing the > MV: > > ALTER MATERIALIZED VIEW name REFRESH [options]; No, that is for specifiying when and under what conditions an automatic refresh is done. To do an immediate action which is equivalent to what I have for the REFRESH statement, they use a REFRESH() function. That seemed too incompatible with how we've done everything else in PostgreSQL -- I felt that a statement would make more sense. Consider REINDEX, CLUSTER, and VACUUM FULL for example. > AFAICT the nearest thing they have to TRUNCATE/DISCARD is: > > ALTER MATERIALIZED VIEW name CONSIDER FRESH; No, that doesn't rebuild or discard data -- if the MV is out-of-date and therefore unscannable according to the how the MV has been set up, this overrides that indication and allows scanning in spite of that. > They do also support updateable materialized views with standard > DML, but it doesn't look as though they allow TRUNCATE to operate > directly on a MV (although it can be made to propagate from a > base table to a MV, in which case allowing TRUNCATE on the MV > itself with a different meaning would likely be confusing). They allow DML on the MV in order to update it. The default REFRESH() function executes a TRUNCATE statement followed by INSERT / SELECT using the MV's query. > Oracle's MVs have lots of options, all of which hang off the 2 > basic CREATE and ALTER commands. There's a certain appeal to > that, rather than inventing or overloading a bunch of other > commands as more options get added. The proposed REFRESH command > is OK for today's options, but I think it might be overly > limiting in the future. For what ALTER MATERIALIZED VIEW in Oracle does, I think it makes sense to use ALTER. I don't think this feature should use functions for REFRESH. Why Oracle can get away with functions for it is that they allow DML on an MV, which seems to me to compromise the integrity of the feature, at least as default behavior. I see us supporting automatic incremental updates of progressively more complex queries, and we may at some point want to add a trigger-based maintenance option; but the functionality available with a trigger-based approach is almost entirely availaable in PostgreSQL today without this feature. Rewriting queries using expressions which match the MV's query to pull from the MV instead of the underlying tables is the exception. While that is a "sexy" feature, and I'm sure one can construct examples where it helps performance, it seems to me unlikely to be very generally useful. I suspect that it exists mostly so that people who want to write an RFP to pick a particular product can include that as a requirement. In other words, I think the main benefit of automatic rewrite using an MV is marketing, not technical or performance. That's important, too; but let's focus first on getting what is most useful. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> wrote: > REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA Given the short time, I left out the "[, ...]". If people think that is important to get into this release, a follow-on patch might be possible. > Barring objections, I will use the above and push tomorrow. I'm still working on docs, and the changes related to the syntax change are still only lightly tested, but as far as I know, all is complete except for the docs. I'm still working on those and expect to have them completed late today. I'm posting this patch to allow a chance for final review of the code changes before I push. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 3 March 2013 13:12, Kevin Grittner <kgrittn@ymail.com> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> Kevin Grittner <kgrittn@ymail.com> wrote: > >>> [ ... ] led to this thought: >>> >>> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA >> >> [Sorry to join this discussion so late] >> >> FWIW I had a quick look at other DBs to see if there were any >> other precedents out there. Oracle was the only one I could find >> with anything similar. They use the same creation syntax: >> >> CREATE MATERIALIZED VIEW name [options] AS SELECT ... > > It is a pretty obvious choice when you look at other SQL > statements. > >> and they use ALTER for everything else, such as refreshing the >> MV: >> >> ALTER MATERIALIZED VIEW name REFRESH [options]; > > No, that is for specifiying when and under what conditions an > automatic refresh is done. To do an immediate action which is > equivalent to what I have for the REFRESH statement, they use a > REFRESH() function. That seemed too incompatible with how we've > done everything else in PostgreSQL -- I felt that a statement would > make more sense. Consider REINDEX, CLUSTER, and VACUUM FULL for > example. > >> AFAICT the nearest thing they have to TRUNCATE/DISCARD is: >> >> ALTER MATERIALIZED VIEW name CONSIDER FRESH; > > No, that doesn't rebuild or discard data -- if the MV is > out-of-date and therefore unscannable according to the how the MV > has been set up, this overrides that indication and allows scanning > in spite of that. > Ah, OK I see. I misunderstood what the Oracle docs were saying. ALTER only changes the MV's definition, whereas their REFRESH() function and your REFRESH statement updates the data in the MV. That makes much more sense. Regards, Dean
2013/3/3 Kevin Grittner <kgrittn@ymail.com>: > Rewriting queries using > expressions which match the MV's query to pull from the MV instead > of the underlying tables is the exception. While that is a "sexy" > feature, and I'm sure one can construct examples where it helps > performance, it seems to me unlikely to be very generally useful. > I suspect that it exists mostly so that people who want to write an > RFP to pick a particular product can include that as a requirement. > In other words, I think the main benefit of automatic rewrite > using an MV is marketing, not technical or performance. I think that automatically using materialized views even when the query doesn’t mention them directly, is akin to automatically using indexes without having to mention them in the query. That way, queries can be written the natural way, and “creating materialized views” is an optimization that can be applied by a DBA without having to update the application queries to use them. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> wrote: > 2013/3/3 Kevin Grittner <kgrittn@ymail.com>: >> Rewriting queries using expressions which match the MV's query >> to pull from the MV instead of the underlying tables is the >> exception. While that is a "sexy" feature, and I'm sure one can >> construct examples where it helps performance, it seems to me >> unlikely to be very generally useful. I suspect that it exists >> mostly so that people who want to write an RFP to pick a >> particular product can include that as a requirement. In other >> words, I think the main benefit of automatic rewrite using an MV >> is marketing, not technical or performance. > > I think that automatically using materialized views even when the > query doesn’t mention them directly, is akin to automatically > using indexes without having to mention them in the query. That > way, queries can be written the natural way, and “creating > materialized views” is an optimization that can be applied by a > DBA without having to update the application queries to use them. Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice. The cost of planning with indexes tends to go up dramatically the planner needs to evaluate all possible combinations of access paths. We've devoted quite a bit of energy keeping that from being something like the factorial of the number of indexes. If you now need to find all materialized views which could substitute for parts of a query, and look at all permutations of how those could be used, and which indexes can be used for each of those combinations, you have planning time which can explode to extreme levels. Where the number of database objects are small and their sizes are large (like some data warehouse situations), you could come out ahead; and if I wanted to showcase the capability you describe that's what I would use. With a large number of database objects with only a few tens of millions of rows per table, I doubt you will come out ahead. Granted, you could say the same thing about indexes, and they are very often useful. I'm saying that I expect the usefulness of the technique you describe is generally very low, but not zero. Except for marketing, where it's a flashy feature. I would be interested in seeing information to show where it works well, though. There is probably something to be learned by looking at the details of the environment and workload of such a site. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 2, 2013 at 3:06 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > > (1) Any DML against the MV would need to be limited to some > context fired by the underlying changes. If we allow changes to > the MV outside of that without it being part of some "updateable > MV" feature (reversing the direction of flow of changes), the MV > could not be trusted at all. If you're going to do that, just use > a table. Oh! I misunderstood what you were suggesting. I think we were talking at cross-purposes. You're imagining a user issues truncate against the underlying table(s) and the code that handles updating the materialized view will need to issue a truncate against the MV to update it. I was imagining that you wanted to be able to issue DML against the MV just as one can against an updateable view. That DML should propagate to the underlying table(s) through various magic. It's a pretty theoretical fear now but one day it may be important to avoid confusion between these two. -- greg
2013/3/3 Kevin Grittner <kgrittn@ymail.com>: > Nicolas Barbier <nicolas.barbier@gmail.com> wrote: > >> I think that automatically using materialized views even when the >> query doesn’t mention them directly, is akin to automatically >> using indexes without having to mention them in the query. That >> way, queries can be written the natural way, and “creating >> materialized views” is an optimization that can be applied by a >> DBA without having to update the application queries to use them. > > Oh, I understand that concept perfectly well, I just wonder how > often it is useful in practice. The cost of planning with indexes > tends to go up dramatically the planner needs to evaluate all > possible combinations of access paths. We've devoted quite a bit > of energy keeping that from being something like the factorial of > the number of indexes. If you now need to find all materialized > views which could substitute for parts of a query, and look at all > permutations of how those could be used, and which indexes can be > used for each of those combinations, you have planning time which > can explode to extreme levels. I guess that a basic version of such a feature would do something like this: (1) Check for each matview whether it simply consists of an optional bunch of joins + optional aggregation + optional general filter conditions (to support something akin to a partial index). If not, the optimization doesn’t take this matview into account. This step can be done beforehand. (2) Check for each (sub)query in the query-to-optimize whether it does the following (at a smart point in the optimization phase). If any of these conditions are not met, don’t use this matview: - Joins at least the tables that are joined in the matview. - Containsjoin conditions and general filter conditions that are at least as strict. - Doesn’t refer elsewhere to any attributes that the matview doesn’t contain. (3) Always replace the corresponding query parts with the matview (i.e., assume that the cost will always be lower than performing the original query). (4) If multiple matviews fit in step 3, try them all (and use the one that yields the lower total cost). (5) Always replace any aggregation with the corresponding aggregation-results (if they exist) from the matview. That doesn’t sound as if it would make planning time explode that much (except, because of step 4, when there are many matviews that contain overlapping sets of joined tables, and a query joins over the union of those sets, *and* the replaceable-by-matviews parts are in subqueries). It could even decrease it significantly (e.g., in the case where a bunch of joins would be replaced with a scan of a matview). Also, I suppose that once such functionality exists, application writers would be more inclined to write “heavy” queries that do lots of aggregation even in an OLTP environment, of the kind that is these days typically only done in OLAP environments. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> writes: > 2013/3/3 Kevin Grittner <kgrittn@ymail.com>: >> Nicolas Barbier <nicolas.barbier@gmail.com> wrote: >>> I think that automatically using materialized views even when the >>> query doesn’t mention them directly, is akin to automatically >>> using indexes without having to mention them in the query. >> Oh, I understand that concept perfectly well, I just wonder how >> often it is useful in practice. There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here. regards, tom lane
> There's a much more fundamental reason why this will never happen, which > is that the query planner is not licensed to decide that you only want > an approximate and not an exact answer to your query. I think it would be worth talking about when someone wants to implement it. I'd imagine it would require setting a GUC, though, which would be off by default for obvious reasosn. > If MVs were guaranteed always up-to-date, maybe we could think about > automatic use of them --- but that's a far different feature from what > Kevin has here. And of limited utility, as mentioned. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Kevin Grittner <kgrittn@ymail.com> wrote: > I'm still working on docs, and the changes related to the syntax > change are still only lightly tested, but as far as I know, all is > complete except for the docs. I'm still working on those and > expect to have them completed late today. I'm posting this patch > to allow a chance for final review of the code changes before I > push. Pushed. I'll be keeping an eye on the buildfarm. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 03/04/2013 08:27 AM, Josh Berkus wrote: >> There's a much more fundamental reason why this will never happen, which >> is that the query planner is not licensed to decide that you only want >> an approximate and not an exact answer to your query. > I think it would be worth talking about when someone wants to implement > it. I'd imagine it would require setting a GUC, though, which would be > off by default for obvious reasosn. I'm not a fan of this, even with a GUC. Imagine doing remote debugging by email/phone. There are enough things to check already ("does your application REALLY commit that transaction?") without also having to deal with settings that can cause a potentially out of date view of the data to be used without it being visible in the query its self. I hate to even say it, but this is where a per-query [redacted] would be good, so we could say in the query text that this query may use matviews that are not perfectly up to date. At this point it's all hand-waving anyway, since no feature to allow the planner to automatically rewrite a subtree of a query to use a matview instead exists. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03-03-2013 21:27, Josh Berkus wrote: > I think it would be worth talking about when someone wants to implement > it. I'd imagine it would require setting a GUC, though, which would be > off by default for obvious reasosn. > -1. Why not adding another storage_parameter, say auto_refresh=on? Also, that's another feature. > And of limited utility, as mentioned. > For a first release, that is fine as is. Let's not complicate a feature that has been widely discussed in this development cycle. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte24x7 e Treinamento
On 3 March 2013 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nicolas Barbier <nicolas.barbier@gmail.com> writes: >> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>: >>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote: >>>> I think that automatically using materialized views even when the >>>> query doesn’t mention them directly, is akin to automatically >>>> using indexes without having to mention them in the query. > >>> Oh, I understand that concept perfectly well, I just wonder how >>> often it is useful in practice. > > There's a much more fundamental reason why this will never happen, which > is that the query planner is not licensed to decide that you only want > an approximate and not an exact answer to your query. > > If MVs were guaranteed always up-to-date, maybe we could think about > automatic use of them --- but that's a far different feature from what > Kevin has here. Its not a different feature, its what most people expect a feature called MV to deliver. That's not a matter of opinion, its simply how every other database works currently - Oracle, Teradata, SQLServer at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over next few releases. I hope anyone isn't going to claim that "Materialized Views" have been implemented in the release notes in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere. Just to reiterate, I fully support the committing of this partial feature into Postgres in this release because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-yo-date MVs can be used like indexes in the planner. The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that and some obvious optimisations. Clearly that varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably same or less as partial indexes, which we currently think is acceptable. In any case, if you don't wish that overhead, don't use MVs. Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the "licence" it (clearly) lacks. If using MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The "licence" is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a "staleness" time specification by the user, allowing approximation. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> wrote: > On 3 March 2013 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Nicolas Barbier <nicolas.barbier@gmail.com> writes: >>> 2013/3/3 Kevin Grittner <kgrittn@ymail.com>: >>>> Nicolas Barbier <nicolas.barbier@gmail.com> wrote: >>>>> I think that automatically using materialized views even when >>>>> the query doesn’t mention them directly, is akin to >>>>> automatically using indexes without having to mention them in >>>>> the query. >> >>>> Oh, I understand that concept perfectly well, I just wonder >>>> how often it is useful in practice. >> >> There's a much more fundamental reason why this will never >> happen, which is that the query planner is not licensed to >> decide that you only want an approximate and not an exact answer >> to your query. >> >> If MVs were guaranteed always up-to-date, maybe we could think >> about automatic use of them --- but that's a far different >> feature from what Kevin has here. > > Its not a different feature, its what most people expect a > feature called MV to deliver. That's not a matter of opinion, its > simply how every other database works currently - Oracle, > Teradata, SQLServer at least. The fact that we don't allow MVs to > automatically optimize queries is acceptable, as long as that is > clearly marked in some way to avoid confusion, and I don't mean > buried on p5 of the docs. What we have here is a partial > implementation that can be improved upon over next few releases. > I hope anyone isn't going to claim that "Materialized Views" have > been implemented in the release notes in this release, because > unqualified that would be seriously misleading and might even > stifle further funding to improve things to the level already > implemented elsewhere. Just to reiterate, I fully support the > committing of this partial feature into Postgres in this release > because it will be a long haul to complete the full feature and > what we have here is a reasonable stepping stone to get there. > > Transactionally up-yo-date MVs can be used like indexes in the > planner. The idea that this is impossible because of the > permutations involved is somewhat ridiculous; there is much > published work on optimising that and some obvious optimisations. > Clearly that varies according to the number of MVs and the number > of tables they touch, not the overall complexity of the query. > The overhead is probably same or less as partial indexes, which > we currently think is acceptable. In any case, if you don't wish > that overhead, don't use MVs. > > Non-transactionally up-to-date MVs could also be used like > indexes in the planner, if we gave the planner the "licence" it > (clearly) lacks. If using MV makes a two-hour query return in 1 > minute, then using an MV that is 15 minutes out of date is likely > to be a win. The "licence" is some kind of user parameter/option > that specifies how stale an answer a query can return. For many > queries that involve averages and sums, a stale or perhaps an > approximate answer would hardly differ anyway. So I think there > is room somewhere there for a "staleness" time specification by > the user, allowing approximation. I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to usee the MV directly. It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. The good news is that it sounds like we agree on the ideal long-term feature set. I'm just a lot more excited, based on the use-cases I've seen, about the addition of incremental updates than substituting MVs into query plans which reference the underlying tables. Perhaps that indicates a chance to the final feature set sooner, through everyone scratching their own itches. :-) And we both seem to feel that some system for managing acceptable levels of MV "freshness" is a necessary feature in order to go very much further. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5 March 2013 12:15, Kevin Grittner <kgrittn@ymail.com> wrote: > I don't think I disagree with any of what Simon says other than his > feelings about the planning cost. Imagine that there are ten MVs > that might apply to a complex query, but some of them are mutually > exclusive, so there are a large number of permutations of MVs which > could be used to replace parts of the original query. And maybe > some of base tables have indexes which could reduce execution cost > which aren't present in some or all of the MVs. And each MV has a > number of indexes. The combinatorial explosion of possible plans > would make it hard to constrain plan time without resorting to some > crude rules about what to choose. That's not an unsolvable > problem, but I see it as a pretty big problem. If we are proposing that we shouldn't try to optimise because its not usefully solvable, then I would disagree. If we are saying that more plans are possible with MVs, then I'd say, yes there *could* be - that's the one of the purposes. That represents more options for optimisation and we should be happy, not sad about that. Yes, we would need some thought to how those potential optimisations can be applied without additional planning cost, but I see that as a long term task, not a problem. The question is will the potential for additional planning time actually materialise into a planning problem? (See below). > I have no doubt that someone could take a big data warehouse and > add one or two MVs and show a dramatic improvement in the run time > of a query. Almost as big as if the query were rewritten to usee > the MV directly. It would make for a very nice presentation, and > as long as they are used sparingly this could be a useful tool for > a data warehouse environment which is playing with alternative ways > to optimize slow queries which pass a lot of data. In other > environments, I feel that it gets a lot harder to show a big win. Are there realistically going to be more options to consider? In practice, no, because in most cases we won't be considering both MVs and indexes. Splatting MVs on randomly won't show any more improvement than splatting indexes on randomly helps. Specific optimisations help in specific cases only. And of course, adding extra data structures that have no value certainly does increase planning time. Presumably we'd need some way of seeing how frequently MVs were picked, so we could drop unused MVs just like we can indexes. * Indexes are great at speeding up various kinds of search queries. If you don't have any queries like that, they help very little. * MVs help in specific and restricted use cases, but can otherwise be thought of as a new kind of index structure. MVs help with joins and aggregations, so if you don't do much of that, you'll see no benefit. That knowledge also allows us to develop heuristics for sane optimisation. If the MV has a GROUP BY in it, and a query doesn't, then it probably won't help much to improve query times. If it involves a join you aren't using, then that won't help either. My first guess would be that we don't even bother looking for MV plans unless it has an aggregated result, or a large scale join. We do something similar when we look for plans that use indexes when we have appropriate quals - no quals, no indexes. As a result, I don't see MVs increasing planning times for requests that would make little or no use of them. There will be more planning time on queries that could make use of them and that is good because we really care about that. Sure, a badly written planner might cost more time than it saves. All of this work requires investment from someone with the time and/or experience to make a good go at it. I'm not pushing Tom towards it, nor anyone else, but I do want to see the door kept open for someone to do this when possible (i.e. not GSoC). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon, Kevin, all: Actually, there was already an attempt at automated MV query planning as a prior university project. We could mine that for ideas. Hmmm. I thought it was on pgfoundry, but it's not. Does anyone have access to ACM databases etc. so they could search for this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > I don't think I disagree with any of what Simon says other than his > feelings about the planning cost. Imagine that there are ten MVs > that might apply to a complex query, but some of them are mutually > exclusive, so there are a large number of permutations of MVs which > could be used to replace parts of the original query. And maybe > some of base tables have indexes which could reduce execution cost > which aren't present in some or all of the MVs. And each MV has a > number of indexes. The combinatorial explosion of possible plans > would make it hard to constrain plan time without resorting to some > crude rules about what to choose. That's not an unsolvable > problem, but I see it as a pretty big problem. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE <some stuff>. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. They'll either be better than the existing paths, or they won't. If they're better, the paths that otherwise would have gotten chosen will get kicked out. If they're worse, the materialized-view paths will get kicked out. Either way, we don't have any more paths than we would have otherwise - so no combinatorial explosion. There is one case where we might end up with more paths than we had before. Suppose there's a materialized view on the query SELECT a.x, a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the users enters just that query. Suppose further that the materialized view has an index on column z, but table a does not. In that case, the best paths for the joinrel (a b) not involving the materialized view will be an unordered path, but we could scan the materialized view using the index and so will have a path that is ordered by a.z to add to the joinrel. This path will stick around even if it's more expensive than the unordered path because we know it avoids a final sort. So in that case we do have more paths, but they are potentially useful paths, so I don't see that as a problem. It seems to me that the tricky part of this is not that it might add a lot more paths (because I don't think it will, and if it does I think they're useful paths), but that figuring out whether or not a materialized view matches any particular joinrel might be expensive. I think we're going to need a pretty accurate heuristic for quickly discarding materialized views that can't possibly be relevant to the query as a whole, or to particular joinrels. There are a couple of possible ways to approach that. The most manual of these is probably to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE MATERIALIZED y, where the user has to explicitly ask for materialized views to be considered, or else they aren't. That sort of fine-grained control might have other benefits as well. I think a more automated solution is also possible, if we want it. For a materialized view to match a query, all of the baserels in the materialized view must also be present in the query. (Actually, there are situations where this isn't true; e.g. the materialized view has an extra table, but it's joined in a way that could be pruned away by the join removal code, but I think we could ignore such somewhat-pathological cases at least initially.) It seems to me that if we could figure out a very-cheap way to throw away all of the materialized views that don't pass that basic test, we'd be reasonably close to a workable solution. A database with tons of materialized views defined on the same set of target relations might still have some planning time problems, but so might a database with tons of indexes. In that regard, what I was thinking about is to use something sort of like a Bloom filter. Essentially, produce a "fingerprint" for each materialized view. For the sake of argument, let's say that the fingerprint is a 64-bit integer, although it could be a bit string of any length. To construct the fingerprint, hash the OID of each relation involved in the view onto a bit position between 0 and 63. Set the bits for all relations involved in the materialized view. Then, construct a fingerprint for the query in the same way. Any materialized view where (matview_fingerprint & query_fingerprint) != matview_fingerprint needn't be considered; moreover, for a given joinrel, any materialized view where matview_fingerprint != joinrel_fingerprint (computed using the same scheme) needn't be considered. Of course, a matching fingerprint doesn't mean that the materialized view matches, or even necessarily that it involves the correct set of relations, so there's a lot more double-checking that has to be done afterwards before you can decide that the view is in fact a match, but at least it's a fast way to throw away some large percentage of materialized views that are definitely NOT relevant to the query, and only do more detailed evaluation on the ones that might be matches. The chances of false positives can be reduced if you care to make the fingerprints bigger (and thus more expensive to compare). All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. Automatic-query-rewrite, if and when we get it, will not be easy and will require a bunch of work from someone with a good understanding of the planner, but it strikes me as the sort of thing that might work out to one large project and then it's done. Whereas, incremental update sounds to me like a series of projects over a series of releases targeting various special cases, where we can always point to some improvements vs. release N-1 but we're never actually done and able to move on to the next thing. As a roadmap goes, I think that's OK. Even a reasonably simplistic and partial implementation of incremental update will benefit a lot of users. But in terms of relative difficulty, it's not at all obvious to me that that's the easier part of the project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Josh Berkus <josh@agliodbs.com> wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > All that having been said, it's hard for me to imagine that > anyone really cares about any of this until we have an > incremental update feature, which right now we don't. These are actually independent of one another, as long as we nail down how we're determining "freshness" -- which is probably needed for either. Someone who's immersed in tuning long-running DW queries might be interested in this before incremental update. (They might load the data once per month, so refreshing the MVs as a step in that process might be cheaper than incrementally maintaining them.) Someone could base "freshness" on pg_relation_is_scannable() and start working on automatic query rewrite right now, if they wanted to. > Actually, I'm betting that's going to be significantly harder > than automatic-query-rewrite, when all is said and done. > Automatic-query-rewrite, if and when we get it, will not be easy > and will require a bunch of work from someone with a good > understanding of the planner, but it strikes me as the sort of > thing that might work out to one large project and then it's > done. I still think we're going to hit the wall on planning time under certain circumstances and need to tweak that over the course of several releases, but now is not the time to get into the details of why I think that. We've spent way too much time on it already for the point we're at in the 9.3 cycle. I've kept my concerns hand-wavy on purpose, and am trying hard to resist the temptation to spend a lot of time demonstrating the problems. > Whereas, incremental update sounds to me like a series of > projects over a series of releases targeting various special > cases, where we can always point to some improvements vs. release > N-1 but we're never actually done Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. But there is a lot of low-hanging fruit for automation. > Even a reasonably simplistic and partial implementation of > incremental update will benefit a lot of users. Agreed. > But in terms of relative difficulty, it's not at all obvious to > me that that's the easier part of the project. I totally agree that getting something working to use MVs in place of underlying tables is not all that different or more difficult than using partial indexes. I just predict that we'll get a lot of complaints about cases where it results in worse performance and we'll need to deal with those issues. I don't seem that as being brain surgery; just a messy matter of trying to get this pretty theory to work well in the real world -- probably using a bunch of not-so-elegant heuristics. And in the end, the best you can hope for is performance not noticeably worse than you would get if you modified your query to explicitly use the MV(s) -- you're just saving yourself the rewrite. Well, OK, there is the point that, (like indexes) if you run the query which hits the base tables with different parameters, and a new plan is generated each time, it might pick different MVs or exclude them as is most efficient for the given parameters. That's the Holy Grail of all this. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2013/3/5 Robert Haas <robertmhaas@gmail.com>: > All that having been said, it's hard for me to imagine that anyone > really cares about any of this until we have an incremental update > feature, which right now we don't. Actually, I'm betting that's going > to be significantly harder than automatic-query-rewrite, when all is > said and done. I agree. E.g., things such as keeping a matview consistent relative to changes applied to the base tables during the same transaction, might be mightily difficult to implement in a performant way. OTOH, matviews that can only be used for optimization if their base tables were not changed “too recently” (e.g., by transactions that are still in flight, including the current transaction), are probably kind of useful in themselves as long as those base tables are not updated all the time. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner <kgrittn@ymail.com> wrote: >> I don't think I disagree with any of what Simon says other than his >> feelings about the planning cost. > I'm not sure I agree. Suppose you have a query like SELECT * FROM a > INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE <some > stuff>. The query planner will construct paths for scans on a, b, and > c. Then it will construct joinrels for (a b), (a c), (b c), and > eventually (a b c) and calculate a set of promising paths for each of > them. If there is a materialized view available for one of those > joinrels, all we really need to do is add the possible paths for > scanning that materialized view to the joinrel. That only works to the extent that a materialized view can be described by a path. My impression is that most of the use-cases for MVs will involve aggregates or similar data reduction operators, and we don't currently implement anything about aggregates at the Path level. Arguably it would be useful to do so; in particular, we could get rid of the currently hard-wired mechanism for choosing between sorted and hashed aggregation, and perhaps there'd be a less grotty way to deal with index-optimized MIN/MAX aggregates. But there's a great deal to do to make that happen, and up to now I haven't seen any indication that it would do much except add overhead. FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. Index matching is already pretty expensive, and that has the saving grace that we only do it once per base relation. Your sketch above implies trying to match to MVs once per considered join relation, which will be combinatorially worse. Even with a lot of sweat over reducing the cost of the matching, it will hurt. > All that having been said, it's hard for me to imagine that anyone > really cares about any of this until we have an incremental update > feature, which right now we don't. Agreed. Even if we're willing to have an "approximate results are OK" GUC (which frankly strikes me as a horrid idea), people would certainly not be willing to turn it on without some guarantee as to how stale the results could be. regards, tom lane
2013/3/5 Kevin Grittner <kgrittn@ymail.com>: > Exactly. I predict that we will eventually have some special sort > of trigger for maintaining MVs based on base table changes to > handle the ones that are just too expensive (in developer time or > run time) to fully automate. But there is a lot of low-hanging > fruit for automation. I think it would be totally OK to restrict the possible definitions for matviews that can be maintained fully incrementally to something like: SELECT attributes and aggregations FROM trivial joins WHERE trivial condition GROUP BY attributes; Those definitions are the most useful for optimizing the things that matviews are good at (joins and aggregation). Nicolas PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> wrote: > PS. Sorry for having fired off this discussion that obviously > doesn’t really relate to the current patch. I know it's hard to resist. While I think there will be a number of people for whom the current patch will be a convenience and will therefore use it, it is hard to look at what's there and *not* go "if only it also..." Perhaps it would be worth looking for anything in the patch that you think might be painting us into a corner where it would be hard to do all the other cool things. While it's late enough in the process that changing anything like that which you find would be painful, it might be a lot more painful later if we release without doing something about it. My hope, of course, is that you won't find any such thing. With this patch I've tried to provide a minimal framework onto which these other things can be bolted. I've tried hard not to do anything which would make it hard to extend, but new eyes may see something I missed. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 03/05/2013 01:09 PM, Kevin Grittner wrote: > Josh Berkus <josh@agliodbs.com> wrote: > > There is no shortage of literature on the topic, although any > papers from the ACM could certainly be of interest due to the > general high quality of papers published there. Adding anything > like this to 9.3 is clearly out of the question, though, so I > really don't want to spend time researching this now, or > encouraging others to do so until after we have a 9.3 release > candidate. Good point. Just FYI: once we start work on 9.4, some university team got planner stuff for matviews working with postgres, using version 8.0 or something. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
> Kevin Grittner <kgrittn@ymail.com> wrote: > >> REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA > > Given the short time, I left out the "[, ...]". If people think > that is important to get into this release, a follow-on patch might > be possible. > >> Barring objections, I will use the above and push tomorrow. > > I'm still working on docs, and the changes related to the syntax > change are still only lightly tested, but as far as I know, all is > complete except for the docs. I'm still working on those and > expect to have them completed late today. I'm posting this patch > to allow a chance for final review of the code changes before I > push. Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > FWIW, my opinion is that doing anything like this in the planner is > going to be enormously expensive. Index matching is already pretty > expensive, and that has the saving grace that we only do it once per > base relation. Your sketch above implies trying to match to MVs once > per considered join relation, which will be combinatorially worse. > Even with a lot of sweat over reducing the cost of the matching, it > will hurt. As we already said: no MVs => zero overhead => no problem. It costs in the cases where time savings are possible and not otherwise. The type of queries and their typical run times are different to the OLTP case, so any additional planning time is likely to be acceptable. I'm sure we can have a deep disussion about how to optimise the planner for this; I'm pretty sure there are reasonable answers to the not-small difficulties along that road. Most importantly, I want to make sure we don't swing the door shut on improvements here, especially if you (Tom) are not personally convinced enough to do the work yourself. Making transactional MVs work would be in part justified by the possible existence of automatic lookaside planning, so yes, the two things are not linked but certainly closely related. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> FWIW, my opinion is that doing anything like this in the planner is >> going to be enormously expensive. > As we already said: no MVs => zero overhead => no problem. Well, in the first place that statement is false on its face: we'll still spend cycles looking for relevant MVs, or at least maintaining a complexly-indexed cache that helps us find out that there are none in a reasonable amount of time. In the second place, even if it were approximately true it wouldn't help the people who were using MVs. > It costs in > the cases where time savings are possible and not otherwise. And that is just complete nonsense: matching costs whether you find a match or not. Could we have a little less Pollyanna-ish optimism and a bit more realism about the likely cost of such a feature? regards, tom lane
Tatsuo Ishii <ishii@postgresql.org> wrote: > Was the remaining work on docs done? I would like to test MVs and > am waiting for the docs completed. I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin, I haven't seen a reply to this. Were you able to give my notes below any consideration? On 2/15/13 12:44 PM, Peter Eisentraut wrote: > On 1/25/13 1:00 AM, Kevin Grittner wrote: >> New patch rebased, fixes issues raised by Thom Brown, and addresses >> some of your points. > > This patch doesn't apply anymore, so I just took a superficial look. I > think the intended functionality and the interfaces look pretty good. > Documentation looks complete, tests are there. > > I have a couple of notes: > > * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED. > It might be better to use that as well then. > > * You use fields named relkind in the parse nodes, but they don't > actually contain relkind values, which is confusing. I'd just name the > field is_matview or something. > > * More generally, I wouldn't be so fond of combining the parse handling > of CREATE TABLE AS and CREATE MATERIALIZED VIEW. They are similar, but > then again so are a lot of other things. > > * Some of the terminology is inconsistent. A materialized view is > sometimes called valid, populated, or built, with approximately the same > meaning. Personally, I would settle on "built", as per above, but it > should be one term only. > > * I find the name of the relisvalid column a bit confusing. Especially > because it only applies to materialized views, and there is already a > meaning of "valid" for indexes. (Recall that indexes are also stored in > pg_class, but they are concerned about indisvalid.) I would name it > something like relmvbuilt. > > > Btw., half of the patch seems to consist of updating places referring to > relkind. Is something wrong with the meaning of relkind that this sort > of thing is required? Maybe these places should be operating in terms > of features, not accessing relkind directly. > > >
On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas <robertmhaas@gmail.com> wrote: > All that having been said, it's hard for me to imagine that anyone > really cares about any of this until we have an incremental update > feature, which right now we don't. Actually, I'm betting that's going > to be significantly harder than automatic-query-rewrite, when all is > said and done. Automatic-query-rewrite, if and when we get it, will > not be easy and will require a bunch of work from someone with a good > understanding of the planner, but it strikes me as the sort of thing > that might work out to one large project and then it's done. Whereas, > incremental update sounds to me like a series of projects over a > series of releases targeting various special cases, where we can > always point to some improvements vs. release N-1 but we're never > actually done and able to move on to the next thing. As a roadmap > goes, I think that's OK. Even a reasonably simplistic and partial > implementation of incremental update will benefit a lot of users. But > in terms of relative difficulty, it's not at all obvious to me that > that's the easier part of the project. While true that's true for a lot of Postgres features. The only ones that are one-shot projects are buried deep in the internals. Anything with UI implications inevitably has limitations and then other people come along and and work on removing or extending those features. I do agree with Tom though -- the most frequently asked for materialized view in the past has always been "select count(*) from tab". People assume we already do this and are surprised when we don't. The cookie cutter solution for it is basically exactly what a incrementally updated materialized view solution would look like (with the queue of updates with transacion information that are periodically flattened into the aggregate). Rewriting this might be a bit tricky and require heuristics to determine just how much work to expend trying to match materialized views, this type of view would be where most of the win would be. I also can't see implementing query rewriting for non-transactionally-accurate materialized views. If people want a snapshot of the data that may be out of date that's great. I can tons of use cases for that. But then surely they won't be surprised to have to query the snapshot explicitly. If can't see going to all this trouble to implement transactions and snapshots and wal logging and so on and then silently rewriting queries to produce data that is not up to date. I think users would be surprised to find bog-standard SQL occasionally producing "incorrect" results. That said, there are cases where snapshots might be up to date even though we don't implement any incremental updates. If the underlying data is read-only or hasn't received any update commits since the snapshot was taken then it might still be useful. There are tons of ETL applications where you load the data once and then build MVs for it and never touch the underlying data again. -- greg
Kevin Grittner <kgrittn@ymail.com> wrote: > Tatsuo Ishii <ishii@postgresql.org> wrote: > >> Was the remaining work on docs done? I would like to test MVs and >> am waiting for the docs completed. > > I think they are done. If you notice anything missing or in need > of clarification please let me know. At this point missing docs > would be a bug in need of a fix. I decided to take another pass through to try to spot anyplace I might have missed. I found that I had missed documenting the new pg_matviews system view, so I have just pushed that. I also think that something should be done about the documentation for indexes. Right now that always refers to a "table". It would clearly be awkward to change that to "table or materialized view" everywhere. I wonder if most of thosse should be changed to "relation" with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mar 6, 2013, at 1:51 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > I also think that something should be done about the documentation > for indexes. Right now that always refers to a "table". It would > clearly be awkward to change that to "table or materialized view" > everywhere. I wonder if most of thosse should be changed to > "relation" with a few mentions that the relation could be a table > or a materialized view, or whether some less intrusive change would > be better. Opinions welcome. Isn’t a materialized view really just a table that gets updated periodically? And isn’t a non-matierialized view also thoughtof as a “relation”? If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the termincludes materialized views (and excludes foreign tables). Best, David
> Kevin Grittner <kgrittn@ymail.com> wrote: >> Tatsuo Ishii <ishii@postgresql.org> wrote: >> >>> Was the remaining work on docs done? I would like to test MVs and >>> am waiting for the docs completed. >> >> I think they are done. If you notice anything missing or in need >> of clarification please let me know. At this point missing docs >> would be a bug in need of a fix. Ok. > I decided to take another pass through to try to spot anyplace I > might have missed. I found that I had missed documenting the new > pg_matviews system view, so I have just pushed that. > > I also think that something should be done about the documentation > for indexes. Right now that always refers to a "table". It would > clearly be awkward to change that to "table or materialized view" > everywhere. I wonder if most of thosse should be changed to > "relation" with a few mentions that the relation could be a table > or a materialized view, or whether some less intrusive change would > be better. Opinions welcome. You might want to add small description about MVs to Tutorial documentation "3.2 Views". Here is the current description of views in the doc. --------------------------------------------------------------------- 3.2. Views Refer back to the queries in Section 2.6. Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table: CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city =name; SELECT * FROM myview; Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces. Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon. --------------------------------------------------------------------- -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On 6 March 2013 14:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> FWIW, my opinion is that doing anything like this in the planner is >>> going to be enormously expensive. > >> As we already said: no MVs => zero overhead => no problem. > > Well, in the first place that statement is false on its face: we'll > still spend cycles looking for relevant MVs, or at least maintaining a > complexly-indexed cache that helps us find out that there are none in > a reasonable amount of time. In the second place, even if it were > approximately true it wouldn't help the people who were using MVs. We can store info in the relcache, and reduce such a lookup to a simple if test in the planner. Populating the cache would be easy enough, approx same overhead as deriving list of constraints for the relcache. If you were using MVs, there are further heuristics to apply. MVs come in various shapes, so we can assess whether they use aggregates, joins, filters etc and use that for a general match against a query. I don't see the need for complex assessments in every case. >> It costs in >> the cases where time savings are possible and not otherwise. > > And that is just complete nonsense: matching costs whether you find a > match or not. Could we have a little less Pollyanna-ish optimism and > a bit more realism about the likely cost of such a feature? It's not a trivial feature; this is a lot of work. But it can be done efficiently, without significant effect on other workloads. If that really were to be true, then enable_lookaside = off can be the default, just as we have for another costly planning feature, constraint_exclusion. Matview lookaside is the next-best-action for further work on the planner, AFAICS. Correctly optimised query parallelism is harder, IMHO. What I'm hearing at the moment is "please don't make any changes in my area" or "don't climb the North face". Considering the rather high bar to being able to do this effectively, I do understand your interest in not having your/our time wasted by casual attempts to approach the problem, but I don't want to slam the door on a serious attempt (2 year project, 1+ man year effort). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David E. Wheeler <david@justatheory.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: > >> I also think that something should be done about the >> documentation for indexes. Right now that always refers to a >> "table". It would clearly be awkward to change that to "table >> or materialized view" everywhere. I wonder if most of thosse >> should be changed to "relation" with a few mentions that the >> relation could be a table or a materialized view, or whether >> some less intrusive change would be better. Opinions welcome. > > Isn’t a materialized view really just a table that gets updated > periodically? Not exactly. It is a relation which has some characteristics of a view (including an entry in pg_rewrite exactly like that for a view) and some characteristics of a table (including a heap and optional indexes). Whether it looks more like a table or more like a view depends on how you tilt your head. You could just as easily say that it is really just a view which periodically caches its results on disk. They really are "their own thing". As future releases add more subtle "freshness" concepts, incremental updates, and query rewrite that unique identity will become even more conspicuous, I think. > And isn’t a non-matierialized view also thought of as a > “relation”? Yes. Tables, views, and materialized views are all relations. > If the answer to both those questions is “yes,” I think the term > should remain “table,” with a few mentions that the term includes > materialized views (and excludes foreign tables). And if the answers are "not exactly" and "yes"? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mar 7, 2013, at 7:55 AM, Kevin Grittner <kgrittn@ymail.com> wrote: >> If the answer to both those questions is “yes,” I think the term >> should remain “table,” with a few mentions that the term includes >> materialized views (and excludes foreign tables). > > And if the answers are "not exactly" and "yes"? I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should beassumed to represent tables and matviews, and otherwise required disambiguations. Trying to make the least possible work for you here. :-) David
2013/3/5 Kevin Grittner <kgrittn@ymail.com>: > Perhaps it would be worth looking for anything in the patch that > you think might be painting us into a corner where it would be hard > to do all the other cool things. While it's late enough in the > process that changing anything like that which you find would be > painful, it might be a lot more painful later if we release without > doing something about it. My hope, of course, is that you won't > find any such thing. With this patch I've tried to provide a > minimal framework onto which these other things can be bolted. > I've tried hard not to do anything which would make it hard to > extend, but new eyes may see something I missed. (Without having looked at the patch, or even the documentation :-/.) I think that something that might prove useful is the following: Keeping in mind the possibility of storing something in the matview’s heap that doesn’t correspond to what a SELECT * FROM matview would yield (i.e., the “logical content”). The transformation could be performed by an INSTEAD rule (similar to how a view is expanded to its definition, a reference to a matview would expand to its heap content transformed to the “logical content”). (Note that I don’t have any reason to believe that the current implementation would make this more difficult than it should be.) <ridiculously long rationale for the previous> (All the following requires making matviews (inter- and intra-) transactionally up-to-date w.r.t. their base tables at the moment of querying them. I don’t deal with approximate results, however useful that might be.) I think that the possibility of optimizing COUNT(*) (see mail by Greg Stark in this thread with “the queue of updates with transacion information that are periodically flattened into the aggregate”) can be generalized to generic aggregation that way. The idea would be that a transaction that adds (or deletes or updates) a row in a base table causes a “delta” row version in the matview. Selecting from the matview then merges these deltas into one value (for each row that is logically present in the matview). Every once in a while (or rather quite often, if the base tables change often), a VACUUM-like clean-up operations must be run to merge all rows that are “old enough” (i.e., whose transactions are not in flight anymore). Example of trivial aggregation matview weight_per_kind defined as: SELECT kind, SUM(weight) FROM fruit GROUP BY kind; The matview would then physically contain rows such as: xmin, xmax, kind, weight 1000, 0, 'banana', 123 1000, 0, 'apple', 1 1001, 0, 'banana', 2 1002, 0, 'banana', -3 Which means: * tx 1000 probably performed a clean-up operation and merged a bunch of banana rows together to yield 123; it also inserted an apple of weight 1. * tx 1001 inserted a banana of weight 2. Any clean-up operation coming by could not merge the 2 into the first row, as long as tx 1000 is in flight. Otherwise, it would yield 125; physically this would mean adding a 125 row, marking the 123 and 2 rows as deleted, and then waiting for VACUUM to remove them). * tx 1002 deleted a banana with weight 3. The result of a SELECT * FROM weight_per_kind; would actually execute SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY kind; This would result, for tx 1001 (assuming tx 1000 committed and our snapshot can see it), in: kind, weight 'banana', 125 'apple', 1 (The -3 is not taken into account, because it is not visible to tx 1001.) The operator to use at the location of SUM_merge is something that merges multiple aggregation results (plus results that represent some kind of “negation”) together. For SUM, it would be SUM itself and the negation would be numerical negation. There might also be some kind of “difference” concept used for UPDATE: When updating a weight from 4 to 3, the difference would be -1. Those additional properties could optionally be added to the definition of each aggregation function; It must be done for each function that you want to use in such a way. Other aggregation functions such as AVG would require storing the SUM + number of rows in the matview (otherwise two AVGs could not be merged); again a difference between the heap and the logical content. Functions such as MIN and MAX are more difficult to fit in this framework: I can only see how it would work if row deletion were not allowed (which might still be a valid use-case); luckily, I think MIN and MAX are not the typical things for which you would want to use matviews, because quick computation can typically be done directly using the base tables. This whole thing would result in incremental updates of aggregation-matviews that don’t require physical serialization of the transactions that update the base tables and that query the matview, which other models (that depend on correspondence between the heap and logical content of matviews) would probably require. And that’s where I stop rambling because nobody gets this far anyway, and I urgently need some sleep :-). </ridiculously long rationale> Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
On Wed, Mar 6, 2013 at 09:16:59AM -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On 5 March 2013 22:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> FWIW, my opinion is that doing anything like this in the planner is > >> going to be enormously expensive. > > > As we already said: no MVs => zero overhead => no problem. > > Well, in the first place that statement is false on its face: we'll > still spend cycles looking for relevant MVs, or at least maintaining a > complexly-indexed cache that helps us find out that there are none in > a reasonable amount of time. In the second place, even if it were > approximately true it wouldn't help the people who were using MVs. > > > It costs in > > the cases where time savings are possible and not otherwise. > > And that is just complete nonsense: matching costs whether you find a > match or not. Could we have a little less Pollyanna-ish optimism and > a bit more realism about the likely cost of such a feature? Should we add this to the TODO list as a possibility? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Mar 5, 2013 at 08:50:39AM +0000, Simon Riggs wrote: > Its not a different feature, its what most people expect a feature > called MV to deliver. That's not a matter of opinion, its simply how > every other database works currently - Oracle, Teradata, SQLServer at > least. The fact that we don't allow MVs to automatically optimize Good points. > queries is acceptable, as long as that is clearly marked in some way > to avoid confusion, and I don't mean buried on p5 of the docs. What we > have here is a partial implementation that can be improved upon over > next few releases. I hope anyone isn't going to claim that > "Materialized Views" have been implemented in the release notes in > this release, because unqualified that would be seriously misleading > and might even stifle further funding to improve things to the level > already implemented elsewhere. Just to reiterate, I fully support the > committing of this partial feature into Postgres in this release > because it will be a long haul to complete the full feature and what > we have here is a reasonable stepping stone to get there. > > Transactionally up-yo-date MVs can be used like indexes in the > planner. The idea that this is impossible because of the permutations > involved is somewhat ridiculous; there is much published work on > optimising that and some obvious optimisations. Clearly that varies > according to the number of MVs and the number of tables they touch, > not the overall complexity of the query. The overhead is probably same > or less as partial indexes, which we currently think is acceptable. In > any case, if you don't wish that overhead, don't use MVs. While you are right that automatically using materialized views is like the optimizer choosing partial indexes, we actually already have auto-selection of row-level materialized views with expression indexes and index-only scans. When you do the insert or update, the indexed function is called and the value stored in the index. If you later query the function call, we can pull the value right from the index. This, of course, is a very crude definition of materialized view, but it seems relevant. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Mar 7, 2013 at 12:14 PM, David E. Wheeler <david@justatheory.com> wrote: > On Mar 7, 2013, at 7:55 AM, Kevin Grittner <kgrittn@ymail.com> wrote: >>> If the answer to both those questions is “yes,” I think the term >>> should remain “table,” with a few mentions that the term includes >>> materialized views (and excludes foreign tables). >> >> And if the answers are "not exactly" and "yes"? > > I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term shouldbe assumed to represent tables and matviews, and otherwise required disambiguations. This ship has already sailed. There are plenty of places where operations apply to a subset of the relation types that exist today, and we either list them out or refer to "relations" generically. Changing that would require widespread changes to both the documentation and existing error message text. We cannot decide that "table" now means "table or materialized view" any more than we can decide that it means "table or foreign table", as was proposed around the time those changes went in. Yeah, it's more work, and it's a little annoying, but it's also clear. Nothing else is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
While doing some post-commit review of the 9.3 materialized view feature, I noticed a few loose ends: On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote: > Note that [...] "ALTER TABLE ... RENAME CONSTRAINT" [is] > currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW. > > There's no documented support for table constraints on MVs, but UNIQUE > constraints are permitted: > > [local] test=# alter materialized view mymv add unique (c); > ALTER MATERIALIZED VIEW > [local] test=# alter materialized view mymv add check (c > 0); > ERROR: "mymv" is not a table > [local] test=# alter materialized view mymv add primary key (c); > ERROR: "mymv" is not a table or foreign table The above points still apply. Also, could you explain the use of RelationCacheInvalidateEntry() in ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() followed by CommandCounterIncrement() is the typical pattern; this is novel. I suspect, though, neither is necessary now that the relcache does not maintain populated status based on a fork size reading. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Apologies, but this sub-thread got lost when I changed email accounts. I found it in a final review to make sure nothing had fallen through the cracks. Noah Misch <noah@leadboat.com> wrote: > On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote: >> There's no documented support for table constraints on MVs, but >> UNIQUE constraints are permitted: >> >> [local] test=# alter materialized view mymv add unique (c); >> ALTER MATERIALIZED VIEW Fix pushed. > Also, could you explain the use of RelationCacheInvalidateEntry() > in ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() > followed by CommandCounterIncrement() is the typical pattern; > this is novel. I suspect, though, neither is necessary now that > the relcache does not maintain populated status based on a fork > size reading. Yeah, that was part of the attempt to support unlogged materialized views while also not returning bogus results if the view had not been populated, using heap file size. I agree that this line can just come out. If there are no objections real soon now, I will remove it in master and the 9.3 branch before the release candidate. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> wrote: > Noah Misch <noah@leadboat.com> wrote: >> Also, could you explain the use of RelationCacheInvalidateEntry() >> in ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() >> followed by CommandCounterIncrement() is the typical pattern; >> this is novel. I suspect, though, neither is necessary now that >> the relcache does not maintain populated status based on a fork >> size reading. > > Yeah, that was part of the attempt to support unlogged materialized > views while also not returning bogus results if the view had not > been populated, using heap file size. I agree that this line can > just come out. If there are no objections real soon now, I will > remove it in master and the 9.3 branch before the release > candidate. Done. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company